
What are tokens in AI chats?


Answer


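Tokens are the basic units that AI language models process. They are usually subword pieces, larger than individual characters but often smaller than full words, produced by a tokenization algorithm. Short common words are frequently a single token, while long or rare words are split into several.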

What Is a Token?

text
"Hello, world!" → ["Hello", ",", " world", "!"] → 4 tokens

"tokenization" → ["token", "ization"] → 2 tokens

"antidisestablishmentarianism" → 8+ tokens

Roughly: 1 token ≈ 4 characters ≈ 0.75 words in English.
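The rule of thumb above can be turned into a quick estimator. This is a minimal sketch for English text only; the function names are illustrative, and the result is an approximation rather than a real tokenizer count.

```python
def estimate_tokens(text: str) -> int:
    """Rough English-only estimate: about 4 characters per token."""
    if not text:
        return 0
    # Ceiling division, so short strings never estimate to zero tokens.
    return -(-len(text) // 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English-only estimate: about 0.75 words per token."""
    return round(word_count / 0.75)


print(estimate_tokens("Hello, world!"))   # 13 characters -> 4
print(estimate_tokens_from_words(750))    # -> 1000
```

For billing or hard limits, use the provider's own tokenizer or token-counting endpoint instead of an estimate like this.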

Why Tokens Matter

| Reason | Description |
|---|---|
| Pricing | AI APIs charge per token, with separate rates for input and output |
| Context limit | Each model can process only a maximum number of tokens at once (200K for Claude) |
| Speed | More tokens mean more computation and slower responses |
| Truncation | Conversations that exceed the context limit get cut off |
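To make the truncation point concrete, here is one common way an application might keep a conversation under a token budget: drop the oldest messages first. It is a sketch, not how any particular provider handles overflow. `trim_history` and the 4-characters-per-token estimate are assumptions for illustration.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token (English rule of thumb).
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Keep the most recent messages whose estimated total fits max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 200},  # ~50 tokens
    {"role": "user", "content": "c" * 100},       # ~25 tokens
]
trimmed = trim_history(history, max_tokens=80)
print([m["role"] for m in trimmed])  # -> ['assistant', 'user']
```

In a real application the system prompt would normally be kept regardless, and only the conversation turns would be trimmed.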

Model Context Windows

| Model | Max Tokens | Notes |
|---|---|---|
| Claude 3.5 Sonnet | 200,000 | Larger than GPT-4o and Llama 3.1 |
| GPT-4o | 128,000 | Good for most tasks |
| Gemini 1.5 Pro | 1,000,000 | Largest in this table |
| Llama 3.1 70B | 128,000 | Open weights |
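A simple pre-flight check against these limits might look like the sketch below. The dictionary copies the figures from the table, and `fits_in_context` is a hypothetical helper. It assumes the expected output shares the same window as the input, which is the usual case but worth confirming for a given model.

```python
# Context window sizes copied from the table above (tokens).
CONTEXT_WINDOWS = {
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
    "Gemini 1.5 Pro": 1_000_000,
    "Llama 3.1 70B": 128_000,
}


def fits_in_context(model, input_tokens, reserved_output_tokens=0):
    """Return True if input plus reserved output fits the model's window."""
    limit = CONTEXT_WINDOWS[model]
    return input_tokens + reserved_output_tokens <= limit


print(fits_in_context("GPT-4o", 150_000))  # -> False
print(fits_in_context("Claude 3.5 Sonnet", 150_000, reserved_output_tokens=4_000))  # -> True
```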

Checking Token Counts

python
import tiktoken
import anthropic

# OpenAI-compatible token counting
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("How many tokens is this message?")
print(f"Token count: {len(tokens)}")  # → 7

# Anthropic Claude token counting
client = anthropic.Anthropic()
count = client.messages.count_tokens(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "How many tokens is this?"}]
)
print(f"Input tokens: {count.input_tokens}")

Rough Conversions

text
100 tokens     ≈ 75 words     ≈ a short paragraph
1,000 tokens   ≈ 750 words    ≈ 1.5 pages
10,000 tokens  ≈ 7,500 words  ≈ 15 pages
100,000 tokens ≈ 75,000 words ≈ a short novel

Pricing Example

python
# Cost estimation
input_tokens = 5000
output_tokens = 1000

# Claude 3.5 Sonnet pricing (example rates, in USD per million tokens)
input_cost = input_tokens / 1_000_000 * 3.00
output_cost = output_tokens / 1_000_000 * 15.00
total = input_cost + output_cost
print(f"Estimated cost: ${total:.4f}")  # → $0.0300
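The same arithmetic can be wrapped in a reusable function so that rates are passed in rather than hard-coded. This is a sketch: `estimate_cost` is an illustrative name, the rates are the example figures used above, and the 10,000-request volume is a made-up number for illustration. Always check the provider's current price list.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_mtok, output_rate_per_mtok):
    """Estimated cost in USD, given rates in USD per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_mtok
    output_cost = output_tokens / 1_000_000 * output_rate_per_mtok
    return input_cost + output_cost


# Example rates from the snippet above: $3.00 input, $15.00 output per million tokens.
per_request = estimate_cost(5000, 1000, 3.00, 15.00)
print(f"Per request: ${per_request:.4f}")           # -> $0.0300

# Hypothetical volume of 10,000 such requests.
print(f"10,000 requests: ${per_request * 10_000:.2f}")  # -> $300.00
```

Because output tokens are priced higher than input tokens in this example, a short prompt with a long response can cost more than a long prompt with a short response.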

Non-English Text Uses More Tokens

python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# English: efficient
print(len(enc.encode("Hello, how are you?")))  # → 6 tokens

# Chinese: less efficient, usually more tokens per character
print(len(enc.encode("你好,你怎么样?")))

This means prompts in languages such as Chinese, Arabic, or Japanese often cost more than English prompts that carry the same content. The exact gap depends on the tokenizer.

Tips for Token Efficiency

  • Remove unnecessary whitespace and filler text from prompts
  • Use prompt caching for repeated system prompts (Anthropic supports this)
  • Monitor token usage in production with logging
  • For RAG: retrieve fewer, higher-quality chunks
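As an illustration of the monitoring tip, the sketch below logs per-request token counts and keeps running totals with Python's standard `logging` module. `UsageTracker` is a hypothetical helper, and the counts passed to `record` are placeholder values. In practice they would come from the usage figures your API response reports.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("token_usage")


class UsageTracker:
    """Accumulates token usage across requests and logs each one."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0
        self.requests = 0

    def record(self, input_tokens, output_tokens):
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.requests += 1
        logger.info(
            "request=%d input=%d output=%d total_input=%d total_output=%d",
            self.requests, input_tokens, output_tokens,
            self.input_tokens, self.output_tokens,
        )


tracker = UsageTracker()
tracker.record(5000, 1000)  # placeholder counts
tracker.record(1200, 300)
print(tracker.input_tokens, tracker.output_tokens)  # -> 6200 1300
```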