LLM Token Counting: How Tokens Work and Why They Matter for AI Cost

What Is a Token?

Language models don't process text character by character or word by word. They process tokens, which are subword units produced by a tokenizer. A token averages roughly 3–4 characters of English text, but the exact split depends on the tokenizer used by the model. The word "tokenization" might be split into "token" and "ization". A common English word like "the" is usually a single token, while an uncommon word might be split into several pieces.
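To make the idea of subword splitting concrete, here is a minimal sketch of a greedy longest-match splitter. It is a toy: the vocabulary is invented for this example, and real tokenizers (for instance byte pair encoding) learn their vocabulary from data and apply more involved rules. The function name toy_tokenize and the TOY_VOCAB set are assumptions made for illustration.

```python
# Toy vocabulary, invented for this example. Real vocabularies are learned
# from data and hold on the order of 100,000 entries or more.
TOY_VOCAB = {"token", "ization", "the", "un", "common"}


def toy_tokenize(word: str, vocab: set = TOY_VOCAB) -> list:
    """Split a word by greedy longest match, falling back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit one character as a piece.
            pieces.append(word[i])
            i += 1
    return pieces


print(toy_tokenize("tokenization"))  # ['token', 'ization']
print(toy_tokenize("the"))           # ['the']
print(toy_tokenize("zyx"))           # ['z', 'y', 'x']
```

The last call shows why uncommon words cost more: with no matching vocabulary entry, the word falls apart into many small pieces.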

Understanding tokens matters because AI APIs charge per token, and context window limits (how much text a model can process at once) are measured in tokens.

Why Different Models Have Different Token Counts

Each model family uses its own tokenizer, trained on different data with different vocabulary sizes:

GPT-4 and GPT-3.5 use OpenAI's cl100k_base tokenizer (a vocabulary of roughly 100,000 tokens); GPT-4o uses the newer o200k_base (roughly 200,000)
Claude uses Anthropic's tokenizer, which produces similar but not identical counts
Llama 1 and Llama 2 use a SentencePiece tokenizer; Llama 3 moved to a tiktoken-based BPE tokenizer with a vocabulary of about 128,000 tokens
Gemini uses Google's tokenizer

The same sentence will produce different token counts in each. For cost estimation, always use the tokenizer matching the model you're actually calling.
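One way to enforce "use the matching tokenizer" in code is a small registry that maps each model to its own counting function and refuses to guess when none is registered. The sketch below is only an illustration: the model names are hypothetical and the two counters are crude stand-ins, not real tokenizers. In practice each entry would wrap the tokenizer that the provider publishes for that model.

```python
from typing import Callable, Dict


# Stand-in counters, invented for illustration. They are NOT real tokenizers.
def whitespace_tokens(text: str) -> int:
    """Count one token per whitespace-separated word."""
    return len(text.split())


def four_char_tokens(text: str) -> int:
    """Count one token per 4 characters, rounded up."""
    return -(-len(text) // 4)  # ceiling division


# Hypothetical model names mapped to the counter each one should use.
TOKEN_COUNTERS: Dict[str, Callable[[str], int]] = {
    "model-a": whitespace_tokens,
    "model-b": four_char_tokens,
}


def count_for_model(model: str, text: str) -> int:
    """Count tokens with the counter registered for this model, or fail loudly."""
    try:
        counter = TOKEN_COUNTERS[model]
    except KeyError:
        raise ValueError(f"no tokenizer registered for {model!r}") from None
    return counter(text)


sentence = "The same sentence will produce different token counts."
for name in TOKEN_COUNTERS:
    print(name, count_for_model(name, sentence))
```

Raising an error for an unknown model is a deliberate choice here: silently falling back to another model's tokenizer would produce a plausible but wrong cost estimate.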

How Code and Non-English Text Tokenize

Code tokenizes differently from prose. Keywords and common punctuation are often single tokens, but long or unusual identifiers split into several pieces, and indentation and other whitespace consume tokens too. A dense block of Python can therefore use more tokens than its line count suggests. Non-English text is often less efficient: scripts that are rarer in the tokenizer's training data tend to yield fewer characters per token than English, so a paragraph in Chinese may use more tokens than the same content in English.
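Part of the reason is visible without any tokenizer at all. Many modern tokenizers are byte-level, meaning they start from the UTF-8 bytes of the text, so a script whose characters take several bytes each starts from a longer sequence. The sketch below only compares character and byte counts. Bytes are not tokens, but for a byte-level tokenizer the byte count is an upper bound on the token count of ordinary text. The helper name utf8_profile is an assumption made for this example.

```python
def utf8_profile(text: str) -> tuple:
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


for sample in ["hello world", "你好世界"]:
    chars, nbytes = utf8_profile(sample)
    print(f"{sample!r}: {chars} chars, {nbytes} bytes")
```

The English sample uses one byte per character, while each Chinese character in the second sample takes three bytes. How many tokens either string actually produces still depends on the model's tokenizer.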

Context Window Limits

Every model has a maximum context length — the total number of tokens for the combined system prompt, conversation history, and response. Common limits:

| Model | Context Window |
|---|---|
| GPT-4o | 128,000 tokens |
| Claude 3.5 Sonnet | 200,000 tokens |
| Llama 3.1 70B | 128,000 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens |

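Sending more tokens than the limit typically makes the API return an error rather than silently dropping text. Some chat applications and client wrappers avoid the error by truncating earlier context themselves, which means the model no longer sees it. Even below the limit, quality on very long inputs can degrade: the model may overlook details buried in the context or hallucinate.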
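Because the prompt and the response share one window, it helps to budget for both. The following is a minimal sketch that uses the limits from the table above. The function names are assumptions, and the sketch ignores any separate cap on output length that a provider may apply.

```python
# Context windows copied from the table above, in tokens.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Llama 3.1 70B": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
}


def remaining_output_budget(model: str, prompt_tokens: int) -> int:
    """Tokens left for the response after the prompt (negative if over the limit)."""
    return CONTEXT_WINDOWS[model] - prompt_tokens


def fits(model: str, prompt_tokens: int, reserved_output_tokens: int) -> bool:
    """True if the prompt plus the reserved response length fits in the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOWS[model]


print(remaining_output_budget("GPT-4o", 120_000))  # 8000
print(fits("GPT-4o", 120_000, 8_000))              # True
print(fits("GPT-4o", 120_000, 8_001))              # False
```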

Estimating API Costs

API pricing is listed per 1,000 or per 1,000,000 tokens, usually split into input (prompt) and output (completion) rates. Output tokens are typically 2–5× more expensive than input tokens. Before building a feature that makes many API calls, estimate the average token count of your prompts and expected response lengths to project monthly cost.
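A projection like this is simple arithmetic, sketched below. The prices are hypothetical placeholders, not any provider's actual rates, and the Pricing class and function names are assumptions made for this example. Substitute the figures from your provider's current price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1,000,000 input (prompt) tokens
    output_per_million: float  # USD per 1,000,000 output (completion) tokens


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def monthly_cost(pricing: Pricing, avg_input: int, avg_output: int,
                 calls_per_month: int) -> float:
    """Projected monthly cost from average token counts per call."""
    return call_cost(pricing, avg_input, avg_output) * calls_per_month


# Hypothetical rates: output priced at 5x input.
example = Pricing(input_per_million=3.0, output_per_million=15.0)
projected = monthly_cost(example, avg_input=1_500, avg_output=500,
                         calls_per_month=100_000)
print(f"${projected:,.2f}")  # $1,200.00
```

In this example the 500 output tokens cost more than the 1,500 input tokens, which is why response length deserves as much attention as prompt length.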

A rough heuristic for English text: 1,000 tokens ≈ 750 words. A 10-page document (at roughly 375–560 words per page) is about 5,000–7,500 tokens. A full novel is roughly 100,000–150,000 tokens.
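The heuristic translates directly into a quick estimator. This sketch assumes English prose and is only suitable for planning. It is not a substitute for the model's real tokenizer, and the function names are assumptions made for this example.

```python
WORDS_PER_TOKEN = 0.75  # from the heuristic: 1,000 tokens is about 750 words


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate the token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text by counting whitespace-separated words."""
    return estimate_tokens_from_words(len(text.split()))


print(estimate_tokens_from_words(750))     # 1000
print(estimate_tokens_from_words(90_000))  # 120000
```

The second call assumes a 90,000-word novel, which lands inside the 100,000–150,000 token range given above.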

Reducing Token Usage

Shorter prompts with precise instructions often reduce costs without sacrificing quality. Stripping unnecessary whitespace, removing repetition in system prompts, and summarising long conversation histories all reduce token counts. Some teams cache prompt prefixes using provider features (like Anthropic's prompt caching), so that a repeated system prompt is billed at a reduced rate instead of full price on every call.
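Two of these reductions are easy to sketch. The first helper squeezes redundant whitespace. The second keeps only the most recent messages that fit a token budget, which is a cruder alternative to summarising the history, not the same technique. The function names and the injected count_tokens callable are assumptions made for this example; pass in a counter backed by your model's real tokenizer.

```python
import re
from typing import Callable, List


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of spaces/tabs and of 3+ newlines, then trim the ends.

    Do not apply this to source code, where indentation is significant.
    """
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def trim_history(messages: List[str], budget: int,
                 count_tokens: Callable[[str], int]) -> List[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def word_count(text: str) -> int:
    """Stand-in counter for the demo: one token per word."""
    return len(text.split())


history = ["one two three", "four five", "six"]
print(trim_history(history, budget=3, count_tokens=word_count))
# ['four five', 'six']
```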

Checking Token Count Before Sending

Use a token counter to verify prompt length before making an API call, especially when building prompts dynamically from user input. A user pasting a very long document can push a prompt over the context limit, causing a failure at runtime. Checking token count first lets you truncate, summarise, or warn the user before the API call is made.
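A pre-flight check along these lines might look like the sketch below. The default counter is the rough 4-characters-per-token estimate, used here only so the example runs on its own. In real use, pass a count_tokens function backed by the tokenizer for your model. All names are assumptions made for this example.

```python
from typing import Callable, Tuple


def rough_count(text: str) -> int:
    """Fallback estimate: about 4 characters per token. A heuristic, not exact."""
    return -(-len(text) // 4)  # ceiling division


def check_prompt(prompt: str, limit: int,
                 count_tokens: Callable[[str], int] = rough_count) -> Tuple[bool, int]:
    """Return (fits, token_count) so the caller can decide what to do next."""
    n = count_tokens(prompt)
    return n <= limit, n


def truncate_to_limit(text: str, limit: int,
                      count_tokens: Callable[[str], int] = rough_count) -> str:
    """Keep the longest prefix of whole words that fits within the limit.

    Uses binary search, which assumes the count never shrinks as words are
    added. Note that runs of whitespace are normalised to single spaces.
    """
    words = text.split()
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_tokens(" ".join(words[:mid])) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return " ".join(words[:lo])


user_text = "word " * 100
ok, n = check_prompt(user_text, limit=10)
if not ok:
    user_text = truncate_to_limit(user_text, limit=10)
print(ok, n, len(user_text.split()))  # False 125 8
```

Truncation is only one of the options. Depending on the feature, summarising the input or asking the user to shorten it may be the better response to a failed check.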