How are tokens calculated in an AI conversation? (e.g., Gemini CLI + GEMINI.md)
#gen-ai#tokens
Answer
How Tokens Are Calculated in an AI Conversation
Token calculation in a multi-turn conversation is cumulative: the full message history is resent on every turn, so every earlier message counts again toward your token usage and the context window limit.
What Gets Counted
In a multi-turn conversation, all messages in the history are sent to the model each turn:
```text
Turn 1:
  Input  = system_prompt + user_message_1
  Output = assistant_response_1

Turn 2:
  Input  = system_prompt + user_message_1 + assistant_response_1 + user_message_2
  Output = assistant_response_2

Turn 3:
  Input  = system_prompt + user_msg_1 + asst_1 + user_msg_2 + asst_2 + user_msg_3
  Output = assistant_response_3
```
You pay for ALL input tokens on each turn, including conversation history.
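The growth pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a real tokenizer: the function name `per_turn_input_tokens` and the token counts passed to it are assumptions chosen for the example.

```python
def per_turn_input_tokens(system_tokens, user_tokens, assistant_tokens):
    """Return the input token count sent to the model on each turn.

    user_tokens[i] and assistant_tokens[i] are the assumed token counts
    of the i-th user message and the i-th assistant response.
    """
    inputs = []
    history = 0  # tokens of all earlier user + assistant messages
    for user, assistant in zip(user_tokens, assistant_tokens):
        # Each turn resends the system prompt, the whole history, and the new message
        inputs.append(system_tokens + history + user)
        # After the turn, this exchange becomes part of the history
        history += user + assistant
    return inputs


print(per_turn_input_tokens(100, [50, 50, 50], [200, 200, 200]))
# [150, 400, 650]
```

Each turn's input is larger than the last by exactly one full exchange (user message plus assistant response).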
Gemini CLI + GEMINI.md Token Calculation
Gemini CLI reads `GEMINI.md` and sends its contents along with each query:

```text
Total tokens per query = GEMINI.md tokens + conversation history + new message
```
For a typical setup:
- `GEMINI.md` = 500-2000 tokens (project instructions)
- Each conversation turn adds both user and assistant tokens
- After 10 turns with 500-token average exchanges: 5000+ tokens of history
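As a rough worked example of the formula above, the per-query total can be estimated as follows. The function name `estimate_query_tokens` and the specific numbers are illustrative assumptions. The sketch also ignores any other context the CLI may add (such as built-in instructions or tool definitions), so treat the result as a lower bound.

```python
def estimate_query_tokens(gemini_md_tokens, turns_so_far,
                          avg_exchange_tokens, new_message_tokens):
    """Estimate tokens sent for one query (hypothetical helper).

    Implements: GEMINI.md tokens + conversation history + new message.
    """
    # History is approximated as a fixed average size per completed exchange
    history_tokens = turns_so_far * avg_exchange_tokens
    return gemini_md_tokens + history_tokens + new_message_tokens


# A 1000-token GEMINI.md, 10 prior turns of ~500 tokens each, 50-token new message
print(estimate_query_tokens(1000, 10, 500, 50))
# 6050
```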
Measuring Token Usage
```python
import anthropic

client = anthropic.Anthropic()
conversation = []


def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=conversation
    )

    assistant_msg = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_msg})

    # Track usage
    print(f"Input tokens: {response.usage.input_tokens}")
    print(f"Output tokens: {response.usage.output_tokens}")
    print(f"Total this turn: {response.usage.input_tokens + response.usage.output_tokens}")

    return assistant_msg


# As conversation grows, input_tokens grows too
chat("What is Python?")
chat("What are its main use cases?")
chat("How does it compare to JavaScript?")
```
Cost Accumulation
```python
def calculate_conversation_cost(
    turns: int,
    avg_user_tokens: int = 50,
    avg_assistant_tokens: int = 200,
    system_tokens: int = 100,
    input_price_per_M: float = 3.0,
    output_price_per_M: float = 15.0
) -> dict:
    total_input = 0
    total_output = 0

    for turn in range(1, turns + 1):
        # Input grows with history
        turn_input = system_tokens + (turn * avg_user_tokens) + ((turn - 1) * avg_assistant_tokens)
        turn_output = avg_assistant_tokens

        total_input += turn_input
        total_output += turn_output

    input_cost = total_input / 1_000_000 * input_price_per_M
    output_cost = total_output / 1_000_000 * output_price_per_M

    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_cost": input_cost + output_cost
    }


print(calculate_conversation_cost(turns=10))
```
Context Window Management
When conversations approach the model's context limit, you have options:
| Strategy | Description | Trade-off |
|---|---|---|
| Truncate old messages | Drop oldest messages | Lose early context |
| Summarize history | Replace old turns with a summary | Summary may omit details |
| Sliding window | Keep last N turns | Miss early context |
| Prompt caching | Cache the repeated system prompt | Reduces cost of the system prompt; does not free context space |
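
The sliding-window and truncation strategies in the table can be combined into one small helper that keeps the most recent messages fitting a token budget. This is a sketch under stated assumptions: `trim_to_budget` and `rough_count` are hypothetical names, and `rough_count` is a crude word-based stand-in for a real tokenizer.

```python
def rough_count(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    # Real tokenizers will give different counts.
    return len(text.split())


def trim_to_budget(messages, max_history_tokens, count_tokens=rough_count):
    """Keep the newest messages whose combined size fits the budget."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > max_history_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
print([m["content"] for m in trim_to_budget(history, 3)])
# ['four five', 'six']
```

Some APIs expect the history to begin with a user message, so a real implementation may also need to drop a leading assistant message after trimming.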
Key Rule
In a 10-turn conversation, turn 10's input tokens = system + all 9 previous exchanges + new message. Per-turn input grows linearly with the turn number, so the cumulative input cost over the whole conversation grows quadratically with its length.
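The quadratic growth can be made explicit with a short derivation. The symbols are assumptions introduced here, matching the averaging model of `calculate_conversation_cost`: $S$ is the system prompt size, $u$ the average user message size, $a$ the average assistant response size, and $n$ the number of turns.

```latex
% Input tokens sent on turn t: system prompt, t user messages, t-1 assistant responses
I_t = S + t\,u + (t-1)\,a

% Cumulative input tokens over n turns
\sum_{t=1}^{n} I_t = nS + u\,\frac{n(n+1)}{2} + a\,\frac{n(n-1)}{2}
```

With $S = 100$, $u = 50$, $a = 200$, and $n = 10$, this gives $1000 + 2750 + 9000 = 12750$ cumulative input tokens. The $n^2$ terms dominate as $n$ grows.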
For long conversations, consider summarizing history periodically to control costs.