Answer
What Are Tokens in AI Chats?
Tokens are the basic units that AI language models process. They are usually subword pieces produced by a tokenization algorithm such as byte-pair encoding: a token can be a whole common word, a fragment of a longer word, or a single character or punctuation mark.
What Is a Token?
```text
"Hello, world!" → ["Hello", ",", " world", "!"] → 4 tokens
"tokenization" → ["token", "ization"] → 2 tokens
"antidisestablishmentarianism" → 8+ tokens
```
These splits are illustrative; the exact pieces and counts depend on the tokenizer. Roughly: 1 token ≈ 4 characters ≈ 0.75 words in English.
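The rule of thumb can be turned into a quick estimator for cases where no tokenizer is at hand. This is a minimal sketch: `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names, and the 4-characters-per-token ratio only holds roughly for English text.

```python
import math

CHARS_PER_TOKEN = 4  # rough English average from the rule of thumb (assumption)


def estimate_tokens(text: str) -> int:
    """Estimate the token count of English text without a tokenizer."""
    if not text:
        return 0
    # Round up so short strings never estimate to zero tokens.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens("Hello, world!"))  # 13 characters -> 4
```

For billing or hard context limits, count with the provider's real tokenizer instead of an estimate like this.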
Why Tokens Matter
| Reason | Description |
|---|---|
| Pricing | AI APIs charge per token, with input and output priced separately |
| Context limit | Each model can process only a maximum number of tokens at once (e.g., 200K for Claude 3.5 Sonnet) |
| Speed | More tokens = more computation and higher latency |
| Truncation | Long conversations get cut off once they reach the context limit |
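To make the truncation row concrete, here is a minimal sketch of one common way to handle it: dropping the oldest messages until the history fits a token budget. `trim_history` and `rough_tokens` are hypothetical helpers, and the 4-characters-per-token estimate is an assumption; a real application would count with the provider's tokenizer and would likely keep the system prompt regardless.

```python
def rough_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits max_tokens."""
    kept: list[dict] = []
    used = 0
    # Walk from newest to oldest so recent context survives.
    for message in reversed(messages):
        cost = rough_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 40},  # ~10 tokens
    {"role": "user", "content": "c" * 40},       # ~10 tokens
]
print(len(trim_history(history, max_tokens=25)))  # 2: the oldest message is dropped
```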
Model Context Windows
| Model | Max Tokens | Notes |
|---|---|---|
| Claude 3.5 Sonnet | 200,000 | Larger window than GPT-4o |
| GPT-4o | 128,000 | Good for most tasks |
| Gemini 1.5 Pro | 1,000,000 | Largest in this table |
| Llama 3.1 70B | 128,000 | Open weights |
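A quick way to use these numbers is to check whether a prompt fits before sending it. The sketch below copies the window sizes from the table; `fits_in_context`, `models_that_fit`, and the 4,096-token output reserve are hypothetical choices, and it assumes the output shares the same window as the input. Check each provider's documentation for current limits.

```python
# Context window sizes copied from the table; verify against provider docs.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Llama 3.1 70B": 128_000,
}


def fits_in_context(model: str, prompt_tokens: int, reserved_output_tokens: int = 4_096) -> bool:
    """Return True if the prompt plus a reserved output budget fits the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOWS[model]


def models_that_fit(prompt_tokens: int) -> list[str]:
    """List the models from the table that can hold a prompt of this size."""
    return [name for name in CONTEXT_WINDOWS if fits_in_context(name, prompt_tokens)]


print(models_that_fit(150_000))  # ['Claude 3.5 Sonnet', 'Gemini 1.5 Pro']
```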
Checking Token Counts
```python
import anthropic
import tiktoken

# OpenAI-compatible token counting (runs locally)
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("How many tokens is this message?")
print(f"Token count: {len(tokens)}")  # → 7

# Anthropic Claude token counting
# (makes an API call; the client reads ANTHROPIC_API_KEY from the environment)
client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "How many tokens is this?"}],
)
print(f"Input tokens: {count.input_tokens}")
```
Rough Conversions
```text
100 tokens     ≈ 75 words     ≈ a short paragraph
1,000 tokens   ≈ 750 words    ≈ 1.5 pages
10,000 tokens  ≈ 7,500 words  ≈ 15 pages
100,000 tokens ≈ 75,000 words ≈ a short novel
```
Pricing Example
```python
# Cost estimation
input_tokens = 5000
output_tokens = 1000

# Claude 3.5 Sonnet pricing (example), in USD per million tokens
input_cost = input_tokens / 1_000_000 * 3.00
output_cost = output_tokens / 1_000_000 * 15.00
total = input_cost + output_cost

print(f"Estimated cost: ${total:.4f}")  # → $0.0300
```
Non-English Text Uses More Tokens
```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# English: efficient
print(len(enc.encode("Hello, how are you?")))  # → 6 tokens

# Chinese: less efficient
print(len(enc.encode("你好,你怎么样?")))  # typically more tokens, despite fewer characters
```
This means prompts in Chinese, Arabic, Japanese, and many other non-English languages tend to cost more than English prompts that say the same thing.
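One contributing factor is that byte-level BPE tokenizers such as `cl100k_base` start from UTF-8 bytes, and non-Latin scripts need more bytes per character before any merges are applied. The sketch below uses only the standard library; `utf8_bytes_per_char` is a hypothetical helper, and byte counts are only a rough proxy for token counts, not a substitute for running the tokenizer.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


for sample in ["Hello", "你好", "مرحبا"]:
    print(sample, utf8_bytes_per_char(sample))
# Hello 1.0
# 你好 3.0
# مرحبا 2.0
```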
Tips for Token Efficiency
- Remove unnecessary whitespace and filler text from prompts
- Use prompt caching for repeated system prompts (Anthropic supports this)
- Monitor token usage in production with logging
- For RAG: retrieve fewer, higher-quality chunks
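The first two tips can be sketched together. `compact_prompt` and `build_cached_request` are hypothetical helpers: the first collapses redundant whitespace, and the second builds keyword arguments for the Anthropic SDK's `client.messages.create(**kwargs)` with the system prompt marked via `cache_control`. The model ID is a placeholder, no API call is made here, and caching only takes effect above a minimum prompt length, so treat this as a starting point and check the provider's documentation.

```python
import re


def compact_prompt(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_cached_request(model: str, system_prompt: str, user_message: str,
                         max_tokens: int = 1024) -> dict:
    """Build keyword arguments for client.messages.create(**kwargs).

    The system prompt block carries cache_control so repeated calls with
    the same prefix can reuse the cached prompt.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": compact_prompt(user_message)}],
    }


request = build_cached_request(
    model="your-model-id",  # placeholder, not a real model name
    system_prompt="You are a concise assistant.",
    user_message="  How   many tokens\n is this?  ",
)
print(request["messages"][0]["content"])  # How many tokens is this?
```

Collapsing whitespace is only safe for plain prose; skip it for prompts that contain code, tables, or other layout-sensitive text.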