What are tokens in AI chats?

Question

Accepted Answer

## What Are Tokens in AI Chats?

**Tokens** are the basic units that AI language models process. A tokenization algorithm splits text into these units, which are usually subword pieces: common words often map to a single token, while rarer words are split into several.

### What Is a Token?

```
"Hello, world!" → ["Hello", ",", " world", "!"] → 4 tokens

"tokenization" → ["token", "ization"] → 2 tokens

"antidisestablishmentarianism" → several tokens (exact count depends on the tokenizer)
```

Roughly: **1 token ≈ 4 characters ≈ 0.75 words** in English.
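
The rule of thumb above can be turned into a quick estimator for cases where no tokenizer is available. This is a minimal sketch, not a real tokenizer: `estimate_tokens` is a hypothetical helper that averages the character-based and word-based heuristics, and it only makes sense for English text.

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate from the 4-characters / 0.75-words heuristics."""
    by_chars = len(text) / 4             # 1 token ≈ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ≈ 0.75 words
    return round((by_chars + by_words) / 2)


sample = "Tokens are the basic units that language models process."
print(f"Estimated tokens: {estimate_tokens(sample)}")
```

For billing or context-limit decisions, a real tokenizer count is the safer choice; the estimate is only meant for rough sizing.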

### Why Tokens Matter

| Reason | Description |
|--------|-------------|
| **Pricing** | AI APIs charge per token, with separate rates for input and output |
| **Context limit** | Each model can process only a maximum number of tokens at once (e.g., 200K for Claude 3.5 Sonnet) |
| **Speed** | More tokens mean more computation and higher latency |
| **Truncation** | Conversations that exceed the context limit must be cut off or trimmed |
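
One common way to handle the truncation problem is to drop the oldest messages until the conversation fits a token budget. The sketch below illustrates that idea under stated assumptions: `trim_history` and `rough_count` are hypothetical helpers, and `rough_count` uses the 4-characters-per-token heuristic instead of a real tokenizer.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count fits max_tokens."""
    kept = []
    total = 0
    # Walk from newest to oldest so recent context survives.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text):
    # Hypothetical stand-in for a tokenizer: ~4 characters per token, rounded up.
    return -(-len(text) // 4)


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 80},  # ~20 tokens
    {"role": "user", "content": "c" * 20},       # ~5 tokens
]
trimmed = trim_history(history, max_tokens=25, count_tokens=rough_count)
print([m["role"] for m in trimmed])
```

In practice a real token counter would be passed as `count_tokens`, and a system prompt would usually be kept outside the trimmed list.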

### Model Context Windows

| Model | Max Tokens | Notes |
|-------|-----------|-------|
| Claude 3.5 Sonnet | 200,000 | Larger window than GPT-4o |
| GPT-4o | 128,000 | Good for most tasks |
| Gemini 1.5 Pro | 1,000,000 | Largest in this table |
| Llama 3.1 70B | 128,000 | Open weights |
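
The table can be used to check whether a request is likely to fit before sending it. This is a sketch only: `fits_in_context` and the `reserved_output_tokens` default are assumptions for illustration, based on the fact that the prompt and the generated reply share the same window.

```python
# Context window sizes copied from the table above.
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Llama 3.1 70B": 128_000,
}


def fits_in_context(model, prompt_tokens, reserved_output_tokens=4_096):
    """Return True if the prompt plus room reserved for the reply fits the window."""
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOWS[model]


for model in CONTEXT_WINDOWS:
    print(f"{model}: {fits_in_context(model, prompt_tokens=125_000)}")
```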

### Checking Token Counts

```python
import tiktoken
import anthropic

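# OpenAI token counting. cl100k_base is the GPT-4 / GPT-3.5 Turbo encoding;
# GPT-4o uses o200k_base, so counts can differ between models.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("How many tokens is this message?")
print(f"Token count: {len(tokens)}")  # → 7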

# Anthropic Claude token counting (calls the API; needs ANTHROPIC_API_KEY set)
client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "How many tokens is this?"}]
)
print(f"Input tokens: {count.input_tokens}")
```

### Rough Conversions

```
100 tokens     ≈ 75 words     ≈ a short paragraph
1,000 tokens   ≈ 750 words    ≈ 1.5 pages
10,000 tokens  ≈ 7,500 words  ≈ 15 pages
100,000 tokens ≈ 75,000 words ≈ a short novel
```
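
These conversions follow from simple arithmetic. The sketch below reproduces them, assuming 0.75 words per token and roughly 500 words per page; the page size is an assumption inferred from the 750 words ≈ 1.5 pages row, and the helper names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # English rule of thumb
WORDS_PER_PAGE = 500    # assumption: a typical single-spaced page


def tokens_to_words(tokens):
    return tokens * WORDS_PER_TOKEN


def tokens_to_pages(tokens):
    return tokens_to_words(tokens) / WORDS_PER_PAGE


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} tokens ≈ {tokens_to_words(n):,.0f} words ≈ {tokens_to_pages(n):,.1f} pages")
```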

### Pricing Example

```python
# Cost estimation
input_tokens = 5000
output_tokens = 1000

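# Example rates in USD per million tokens (Claude 3.5 Sonnet: $3 input, $15 output);
# check the provider's current price list before relying on them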
input_cost = input_tokens / 1_000_000 * 3.00
output_cost = output_tokens / 1_000_000 * 15.00
total = input_cost + output_cost
print(f"Estimated cost: ${total:.4f}")  # → $0.0300
```
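
The same arithmetic can be wrapped in a reusable function to project costs across many requests. This is a sketch: `estimate_cost` is a hypothetical helper, the rates are the example figures used above, and the request volume is made up for illustration.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_mtok, output_rate_per_mtok):
    """Return the cost in USD, given rates in USD per million tokens."""
    return (
        input_tokens * input_rate_per_mtok
        + output_tokens * output_rate_per_mtok
    ) / 1_000_000


# One request with the example figures: 5,000 input and 1,000 output tokens.
single = estimate_cost(5_000, 1_000, input_rate_per_mtok=3.00, output_rate_per_mtok=15.00)

# Hypothetical volume: 10,000 such requests per month.
monthly = single * 10_000

print(f"Per request: ${single:.4f}")
print(f"Per month:   ${monthly:,.2f}")
```

Note that output tokens are billed at a higher rate in this example, so long replies can dominate the total even when prompts are short.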

### Non-English Text Uses More Tokens

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# English: efficient
print(len(enc.encode("Hello, how are you?")))  # → 6 tokens

# Chinese: more tokens per character (exact count depends on the encoding)
print(len(enc.encode("你好，你怎么样？")))
```

This means prompts in Chinese, Arabic, Japanese, and many other non-English languages often cost more than English prompts carrying the same content.
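
One contributing factor is that encodings such as `cl100k_base` are byte-level BPE tokenizers: they start from UTF-8 bytes and merge frequent byte sequences. The sketch below uses only the standard library to show how many bytes each character occupies, which is the raw material the tokenizer has to cover. It illustrates the byte overhead only and does not compute token counts; `utf8_bytes_per_char` is a hypothetical helper.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in the text."""
    return len(text.encode("utf-8")) / len(text)


samples = {
    "English": "Hello, how are you?",
    "Chinese": "你好，你怎么样？",
}

for label, text in samples.items():
    print(f"{label}: {len(text)} chars, "
          f"{len(text.encode('utf-8'))} bytes, "
          f"{utf8_bytes_per_char(text):.1f} bytes/char")
```

Scripts that need several bytes per character, and that appear less often in the tokenizer's training data, tend to receive fewer merged tokens and therefore more tokens per character.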

### Tips for Token Efficiency

* Remove unnecessary whitespace and filler text from prompts
* Use **prompt caching** for repeated system prompts (Anthropic supports this)
* Monitor token usage in production with logging
* For RAG: retrieve fewer, higher-quality chunks
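
The first tip can be sketched as a small preprocessing step that collapses redundant whitespace before a prompt is sent. `compact_prompt` is a hypothetical helper. It should not be applied to whitespace-sensitive content such as code or indented YAML, since it flattens indentation.

```python
import re


def compact_prompt(prompt: str) -> str:
    """Collapse runs of spaces/tabs and excess blank lines in a prompt."""
    # Squeeze spaces and tabs within each line, then trim the line's edges.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in prompt.splitlines()]
    text = "\n".join(lines)
    # Allow at most one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


raw = "  Summarize   this:\n\n\n\n  text  here  "
print(repr(compact_prompt(raw)))
```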
