How are tokens calculated in an AI conversation? (e.g., Gemini CLI + GEMINI.md)

Question

Accepted Answer

## How Tokens Are Calculated in an AI Conversation

Token usage in a multi-turn conversation is cumulative: every turn resends the full history, so each earlier message counts again toward both your token usage and the context window limit.

### What Gets Counted

In a multi-turn conversation, **all messages in the history** are sent to the model each turn:

```
Turn 1:
  Input = system_prompt + user_message_1
  Output = assistant_response_1

Turn 2:
  Input = system_prompt + user_message_1 + assistant_response_1 + user_message_2
  Output = assistant_response_2

Turn 3:
  Input = system_prompt + user_msg_1 + asst_1 + user_msg_2 + asst_2 + user_msg_3
  Output = assistant_response_3
```

**You pay for ALL input tokens on each turn**, including conversation history.
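
The turn-by-turn pattern above can be sketched in a few lines. This is a minimal illustration, not a real tokenizer: the function name `input_tokens_per_turn` and the token counts passed to it are made up for the example.

```python
def input_tokens_per_turn(system_tokens, user_tokens, assistant_tokens):
    """Return the input token count sent to the model on each turn.

    user_tokens and assistant_tokens are per-turn token counts
    (illustrative numbers, not measured values).
    """
    inputs = []
    history = 0  # tokens from all completed earlier exchanges
    for user, assistant in zip(user_tokens, assistant_tokens):
        # Each turn resends the system prompt and the full history.
        inputs.append(system_tokens + history + user)
        # The finished exchange becomes part of the history for later turns.
        history += user + assistant
    return inputs


per_turn = input_tokens_per_turn(100, [50, 50, 50], [200, 200, 200])
print(per_turn)  # [150, 400, 650]
```

Each turn's input exceeds the previous one by exactly one full exchange (250 tokens here), even though the user only typed 50 new tokens.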

### Gemini CLI + GEMINI.md Token Calculation

Gemini CLI reads `GEMINI.md` as a system-level context file that is prepended to every conversation:

```
Total tokens per query = GEMINI.md tokens + conversation history + new message
```

For a typical setup:
- `GEMINI.md` = 500-2000 tokens (project instructions)
- Each conversation turn adds both user and assistant tokens
- After 10 turns averaging 500 tokens per exchange: roughly 5,000 tokens of history
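
As a worked example of the formula above, here is a small sketch. The helper `estimate_query_tokens` is hypothetical, and the 1,000-token `GEMINI.md` and 50-token new message are assumed values chosen from the ranges listed. It also ignores anything else the CLI may add to a request, so treat the result as a rough estimate.

```python
def estimate_query_tokens(gemini_md_tokens, history_tokens, new_message_tokens):
    """Apply: total = GEMINI.md tokens + conversation history + new message."""
    return gemini_md_tokens + history_tokens + new_message_tokens


# Assumed values: a 1,000-token GEMINI.md, 10 completed turns of about
# 500 tokens each, and a 50-token new message.
history = 10 * 500
total = estimate_query_tokens(1000, history, 50)
print(total)  # 6050
```

In this scenario the new message is under 1% of what is sent; the rest is `GEMINI.md` and history.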

### Measuring Token Usage

```python
import anthropic

client = anthropic.Anthropic()

# Full message history; resent to the model on every turn
conversation = []

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=conversation
    )

    assistant_msg = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_msg})

    # Track usage as reported by the API for this turn
    print(f"Input tokens: {response.usage.input_tokens}")
    print(f"Output tokens: {response.usage.output_tokens}")
    print(f"Total this turn: {response.usage.input_tokens + response.usage.output_tokens}")

    return assistant_msg

# As conversation grows, input_tokens grows too
chat("What is Python?")
chat("What are its main use cases?")
chat("How does it compare to JavaScript?")
```

### Cost Accumulation

```python
def calculate_conversation_cost(
    turns: int,
    avg_user_tokens: int = 50,
    avg_assistant_tokens: int = 200,
    system_tokens: int = 100,
    input_price_per_M: float = 3.0,
    output_price_per_M: float = 15.0
) -> dict:
    total_input = 0
    total_output = 0

    for turn in range(1, turns + 1):
        # Input grows with history
        turn_input = system_tokens + (turn * avg_user_tokens) + ((turn - 1) * avg_assistant_tokens)
        turn_output = avg_assistant_tokens
        total_input += turn_input
        total_output += turn_output

    input_cost = total_input / 1_000_000 * input_price_per_M
    output_cost = total_output / 1_000_000 * output_price_per_M

    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_cost": input_cost + output_cost
    }

print(calculate_conversation_cost(turns=10))
```

### Context Window Management

When conversations approach the model's context limit, you have options:

| Strategy | Description | Trade-off |
|---------|-------------|-----------|
| **Truncate old messages** | Drop oldest messages | Lose early context |
| **Summarize history** | Replace old turns with summary | Slightly less accurate |
| **Sliding window** | Keep last N turns | Miss early context |
| **Prompt caching** | Cache repeated system prompt | Reduces cost for system prompt |
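
A minimal sketch of the truncation / sliding-window idea, assuming messages are dicts with `role` and `content` keys. The names `trim_to_budget` and `rough_token_count` are hypothetical, and the 4-characters-per-token estimate is only a crude stand-in for a real tokenizer.

```python
def rough_token_count(text: str) -> int:
    # Assumption: about 4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)


def trim_to_budget(messages, max_tokens, count_tokens=rough_token_count):
    """Keep the most recent messages that fit within max_tokens."""
    kept = []
    used = 0
    # Walk backwards from the newest message, stopping at the first
    # message that would exceed the budget.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Some chat APIs expect the history to start with a user message,
    # so drop any leading non-user messages (an assumption of this sketch).
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


history = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 80},  # ~20 tokens
    {"role": "user", "content": "c" * 40},       # ~10 tokens
]
trimmed = trim_to_budget(history, max_tokens=35)
print([m["role"] for m in trimmed])  # ['user']
```

With a 35-token budget the oldest message no longer fits, and the assistant reply that would then lead the history is dropped too, leaving only the latest user message. Summarizing the dropped turns instead of discarding them is the alternative listed in the table.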

### Key Rule

> In a 10-turn conversation, turn 10's input tokens = system + all 9 previous exchanges + new message. Per-turn input grows linearly with the number of turns, so the cumulative input cost of the whole conversation grows quadratically.

For long conversations, consider summarizing history periodically to control costs.
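
To see the quadratic growth concretely, the sketch below sums per-turn input in a loop and compares it with the closed form T·s + u·T(T+1)/2 + a·T(T−1)/2, where T is the number of turns and s, u, a are the system, average user, and average assistant token counts. The function names are hypothetical and the default values are the same illustrative averages used in the cost example.

```python
def cumulative_input_loop(turns, system=100, user=50, assistant=200):
    """Sum the input tokens sent over all turns, one turn at a time."""
    return sum(
        system + t * user + (t - 1) * assistant
        for t in range(1, turns + 1)
    )


def cumulative_input_closed_form(turns, system=100, user=50, assistant=200):
    """Same total via the arithmetic-series formula (quadratic in turns)."""
    return (
        turns * system
        + user * turns * (turns + 1) // 2
        + assistant * turns * (turns - 1) // 2
    )


ten = cumulative_input_loop(10)
twenty = cumulative_input_loop(20)
print(ten, twenty)  # 12750 50500
print(ten == cumulative_input_closed_form(10))  # True
```

Doubling the conversation from 10 to 20 turns roughly quadruples the cumulative input tokens (12,750 to 50,500) under these assumed averages.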
