Concept #96 | Medium | extended-ai-concepts

How are tokens calculated in an AI conversation? (e.g., Gemini CLI + GEMINI.md)

#gen-ai #tokens

Answer

How Tokens Are Calculated in an AI Conversation

Token usage in a multi-turn conversation is cumulative: on every turn the full history is resent to the model, so each earlier message counts again toward that turn's input tokens and toward the context window limit.

What Gets Counted

In a multi-turn conversation, all messages in the history are sent to the model each turn:

text
Turn 1:
  Input = system_prompt + user_message_1
  Output = assistant_response_1

Turn 2:
  Input = system_prompt + user_message_1 + assistant_response_1 + user_message_2
  Output = assistant_response_2

Turn 3:
  Input = system_prompt + user_msg_1 + asst_1 + user_msg_2 + asst_2 + user_msg_3
  Output = assistant_response_3

You pay for ALL input tokens on each turn, including conversation history.

Gemini CLI + GEMINI.md Token Calculation

Gemini CLI reads GEMINI.md as a system-level context file that is prepended to every conversation:

text
Total tokens per query = GEMINI.md tokens + conversation history + new message

For a typical setup:

  • GEMINI.md = 500-2000 tokens (project instructions)
  • Each conversation turn adds both user and assistant tokens
  • After 10 turns averaging 500 tokens per exchange: 5000+ tokens of history
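The per-query formula above can be sketched in Python. This is a rough estimate only: it assumes about 4 characters per token, a common approximation that varies with language and content, and the helper names `estimate_tokens` and `tokens_per_query` are illustrative, not part of Gemini CLI. For exact figures, rely on the token counts the API reports.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def tokens_per_query(context_file: str, history: list[str], new_message: str) -> int:
    """Context file + every prior message + the new message."""
    history_tokens = sum(estimate_tokens(m) for m in history)
    return estimate_tokens(context_file) + history_tokens + estimate_tokens(new_message)


# Hypothetical sizes: a 4,000-character GEMINI.md and two prior messages
gemini_md = "x" * 4000
history = ["a" * 200, "b" * 800]
print(tokens_per_query(gemini_md, history, "c" * 100))
```

The context file is charged on every query, so trimming it saves tokens on each turn, not just once.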

Measuring Token Usage

python
import anthropic

client = anthropic.Anthropic()

conversation = []

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=conversation
    )

    assistant_msg = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_msg})

    # Track usage
    print(f"Input tokens: {response.usage.input_tokens}")
    print(f"Output tokens: {response.usage.output_tokens}")
    print(f"Total this turn: {response.usage.input_tokens + response.usage.output_tokens}")

    return assistant_msg

# As conversation grows, input_tokens grows too
chat("What is Python?")
chat("What are its main use cases?")
chat("How does it compare to JavaScript?")

Cost Accumulation

python
def calculate_conversation_cost(
    turns: int,
    avg_user_tokens: int = 50,
    avg_assistant_tokens: int = 200,
    system_tokens: int = 100,
    input_price_per_M: float = 3.0,
    output_price_per_M: float = 15.0
) -> dict:
    total_input = 0
    total_output = 0

    for turn in range(1, turns + 1):
        # Input grows with history
        turn_input = system_tokens + (turn * avg_user_tokens) + ((turn - 1) * avg_assistant_tokens)
        turn_output = avg_assistant_tokens
        total_input += turn_input
        total_output += turn_output

    input_cost = total_input / 1_000_000 * input_price_per_M
    output_cost = total_output / 1_000_000 * output_price_per_M

    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_cost": input_cost + output_cost
    }

print(calculate_conversation_cost(turns=10))

Context Window Management

When conversations approach the model's context limit, you have options:

Strategy | Description | Trade-off
Truncate old messages | Drop oldest messages | Lose early context
Summarize history | Replace old turns with summary | Slightly less accurate
Sliding window | Keep last N turns | Miss early context
Prompt caching | Cache repeated system prompt | Reduces cost for system prompt; does not free context space
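As a minimal sketch of the truncation and sliding-window strategies, the function below keeps the newest messages that fit a token budget. The names `trim_history` and `rough_count` are hypothetical, the 4-characters-per-token count is an assumption, and dropping a leading assistant message reflects the assumption that the target API expects the history to start with a user turn.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined size fits max_tokens."""
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    # Assumption: the history should begin with a user message
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept


def rough_count(text):
    # Assumption: roughly 4 characters per token
    return max(1, len(text) // 4)


history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 800},  # ~200 tokens
    {"role": "user", "content": "c" * 200},       # ~50 tokens
]
trimmed = trim_history(history, max_tokens=250, count_tokens=rough_count)
print([m["role"] for m in trimmed])
```

In practice the budget would be the model's context limit minus the system prompt and the space reserved for the response.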

Key Rule

In a 10-turn conversation, turn 10's input tokens = system + all 9 previous exchanges + new message. Per-turn input grows linearly with the number of turns, so the cumulative input cost of the whole conversation grows quadratically with its length.
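To see where the quadratic term comes from, sum the per-turn input over n turns. Assuming fixed sizes of S system tokens, U tokens per user message, and A tokens per assistant reply (a simplification), the total is n·S + U·n(n+1)/2 + A·n(n−1)/2. The sketch below checks that closed form against a plain loop; the function names and default sizes are illustrative.

```python
def total_input_loop(n, system=100, user=50, assistant=200):
    """Sum the input tokens turn by turn."""
    total = 0
    for turn in range(1, n + 1):
        total += system + turn * user + (turn - 1) * assistant
    return total


def total_input_closed_form(n, system=100, user=50, assistant=200):
    """Same total via the arithmetic-series formula."""
    return n * system + user * n * (n + 1) // 2 + assistant * n * (n - 1) // 2


for n in (1, 10, 20, 40):
    assert total_input_loop(n) == total_input_closed_form(n)
    print(n, total_input_closed_form(n))
```

With these sizes, doubling the conversation from 10 to 20 turns raises the cumulative input from 12,750 to 50,500 tokens, close to a fourfold increase.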

For long conversations, consider summarizing history periodically to control costs.
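One way periodic summarization could look is sketched below. `compact_history` is a hypothetical helper, and `summarize` is a caller-supplied function (in a real setup it would call a model); the stub used here only counts messages. Placing the summary in a user-role message is an assumption that may need adjusting for a given API.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    messages: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace all but the last keep_last messages with one summary message."""
    if len(messages) <= keep_last:
        return list(messages)
    cutoff = len(messages) - keep_last
    old, recent = messages[:cutoff], messages[cutoff:]
    summary = {
        "role": "user",
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    return [summary] + recent


def stub_summarize(old: list[Message]) -> str:
    # Placeholder: a real version would ask a model to summarize `old`
    return f"{len(old)} earlier messages omitted"


history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
compacted = compact_history(history, keep_last=4, summarize=stub_summarize)
print(len(compacted))
```

The summarization call itself consumes tokens, so compacting every turn can cost more than it saves; triggering it only once the history passes a size threshold is a common compromise.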