Concept #90 · Medium · extended-ai-concepts

How are text/chat prompts converted to AI-understandable format?

#gen-ai #tokens

Answer

How Text/Chat Prompts Are Converted to AI-Understandable Format

This covers the complete journey from a chat message to model output.

Full Pipeline

text
You type: "What is AI?"
1. Chat formatting → role labels added
2. Chat template → model-specific format string
3. Tokenization → integer token IDs
4. Embedding lookup → float vectors
5. + Positional encoding → position-aware vectors
6. Transformer blocks → context-enriched representations
7. Output layer → logits over vocabulary
8. Sampling → token IDs → detokenize → "AI stands for..."
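Step 8 produces one token at a time, so a full reply comes from running steps 4 to 8 repeatedly, appending each new token to the input until a stop token appears. The loop below is a minimal sketch of that idea. `toy_next_token` and `TOY_TRANSITIONS` are hypothetical stand-ins for a real model, and the token IDs are made up for illustration.

```python
from typing import Callable, List

def generate(prompt_ids: List[int],
             next_token_fn: Callable[[List[int]], int],
             stop_id: int,
             max_new_tokens: int = 16) -> List[int]:
    """Autoregressive loop: predict a token, append it, repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = next_token_fn(ids)  # stands in for steps 4-8 on the current sequence
        if next_id == stop_id:
            break
        ids.append(next_id)
    # Return only the newly generated tokens
    return ids[len(prompt_ids):]

# Hypothetical stand-in "model": the next token depends only on the last one
TOY_TRANSITIONS = {1: 5, 5: 6, 6: 7, 7: 0}  # 0 plays the role of the stop token

def toy_next_token(ids: List[int]) -> int:
    return TOY_TRANSITIONS.get(ids[-1], 0)

new_ids = generate([2, 1], toy_next_token, stop_id=0)
print(new_ids)
```

A real model would replace `toy_next_token` with a forward pass plus sampling; the loop structure stays the same.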

Step 1-2: Chat Template Formatting

Each chat-tuned model expects the exact prompt format it was fine-tuned on, and the tokenizer's chat template reproduces that format:

python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is AI?"},
]

# Apply the model-specific template
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(formatted)

Output for Llama 3:

text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is AI?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Different models, different formats:

Model Family      Format
Llama 3           `<|start_header_id|>role<|end_header_id|>` ... `<|eot_id|>`
ChatML (GPT)      `<|im_start|>role` ... `<|im_end|>`
Mistral/Mixtral   `[INST] ... [/INST]`
Claude API        Handled server-side by Anthropic
Gemini API        Handled server-side by Google
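To make the idea of a chat template concrete, here is a hand-written sketch of a ChatML-style formatter. `format_chatml` is a hypothetical helper written for illustration, not a library function. In practice, `apply_chat_template` should be preferred because it uses the template shipped with the model.

```python
from typing import Dict, List

def format_chatml(messages: List[Dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render role/content messages as a ChatML-style prompt string (illustrative only)."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is AI?"},
]
prompt = format_chatml(messages)
print(prompt)
```

The same message list yields a different string for each model family, which is why a prompt formatted for one model can degrade output quality on another.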

Step 3: Tokenization

python
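import tiktoken

base = tiktoken.get_encoding("cl100k_base")
# cl100k_base does not register the ChatML markers, so extend it (IDs follow the tiktoken README)
chatml = {"<|im_start|>": 100264, "<|im_end|>": 100265}
enc = tiktoken.Encoding(name="cl100k_im", pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks, special_tokens={**base._special_tokens, **chatml})

text = "<|im_start|>user\nWhat is AI?<|im_end|>"
# Special tokens must be explicitly allowed, otherwise they are not encoded as single IDs
token_ids = enc.encode(text, allowed_special=set(chatml))
# → [100264, 882, 198, 3923, 374, 15592, 30, 100265]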
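The key point in this step is that special markers map to a single reserved ID, while ordinary text is split into smaller units. The toy tokenizer below sketches that split under simplifying assumptions: it maps every UTF-8 byte to its own ID and skips the BPE merges a real tokenizer applies, and the reserved IDs 256 and 257 are made up for illustration.

```python
import re
from typing import List

# Hypothetical reserved IDs, placed right after the 256 raw byte values
SPECIAL_TOKENS = {"<|im_start|>": 256, "<|im_end|>": 257}
ID_TO_SPECIAL = {token_id: token for token, token_id in SPECIAL_TOKENS.items()}
# The capturing group makes re.split keep the special tokens in its output
SPECIAL_PATTERN = re.compile(
    "(" + "|".join(re.escape(token) for token in SPECIAL_TOKENS) + ")"
)

def toy_encode(text: str) -> List[int]:
    ids: List[int] = []
    for piece in SPECIAL_PATTERN.split(text):
        if piece in SPECIAL_TOKENS:
            ids.append(SPECIAL_TOKENS[piece])   # one ID for the whole marker
        else:
            ids.extend(piece.encode("utf-8"))   # one ID per byte (no merges)
    return ids

def toy_decode(ids: List[int]) -> str:
    out = bytearray()
    for token_id in ids:
        if token_id in ID_TO_SPECIAL:
            out.extend(ID_TO_SPECIAL[token_id].encode("utf-8"))
        else:
            out.append(token_id)
    return out.decode("utf-8")

text = "<|im_start|>user\nWhat is AI?<|im_end|>"
ids = toy_encode(text)
print(len(ids), toy_decode(ids) == text)
```

A production tokenizer merges frequent byte sequences into single tokens, which is why the real encoding of this string is much shorter than the byte-level one.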

Step 4-5: Embeddings + Position

python
import torch
import torch.nn as nn

token_ids_tensor = torch.tensor(token_ids)

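# Token embedding: ID → vector (randomly initialized here; a trained model loads learned weights)
token_embeds = nn.Embedding(100277, 4096)(token_ids_tensor)  # 100277 = cl100k_base vocabulary size

# Learned absolute position embedding, GPT-2 style
# (Llama-family models apply rotary position embeddings inside attention instead)
pos_ids = torch.arange(len(token_ids))
pos_embeds = nn.Embedding(8192, 4096)(pos_ids)  # 8192 = maximum sequence length

# Combined input to transformer
x = token_embeds + pos_embeds  # shape: (n_tokens, 4096)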
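The same two steps can be sketched without any framework to show what happens underneath: an embedding lookup is row indexing into a table, and position information is added element-wise. Note that this sketch swaps in the fixed sinusoidal encoding from the original Transformer design rather than the learned position table used above, and it uses toy sizes and random weights.

```python
import math
import random
from typing import List

def sinusoidal_position(position: int, dim: int) -> List[float]:
    """Fixed sinusoidal encoding: sin on even indices, cos on odd indices."""
    vector = []
    for i in range(dim):
        # Each sin/cos pair shares one frequency, set by the even index of the pair
        angle = position / (10000 ** ((i - i % 2) / dim))
        vector.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vector

rng = random.Random(0)
vocab_size, dim = 10, 8  # toy sizes; the example above uses 100277 and 4096

# Embedding table: one row of floats per token ID (random here, learned in a real model)
embedding_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

token_ids = [3, 7, 7, 1]  # made-up IDs; token 7 appears twice on purpose
x = []
for position, token_id in enumerate(token_ids):
    token_vector = embedding_table[token_id]  # lookup is plain row indexing
    position_vector = sinusoidal_position(position, dim)
    x.append([t + p for t, p in zip(token_vector, position_vector)])

print(len(x), len(x[0]))
```

The two occurrences of token 7 share a token vector but end up with different input vectors, which is how the model can tell positions apart.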

Step 6-8: Transformer → Output

python
# transformer_model is a placeholder (not defined here) for the stack of
# attention + FFN blocks followed by the output projection
# Output: logits over vocabulary, one row per input position
logits = transformer_model(x)  # shape: (n_tokens, vocab_size)

# Take last token's logits (predict next token)
next_token_logits = logits[-1]

# Sample from distribution (or argmax for greedy)
probs = torch.softmax(next_token_logits, dim=-1)
next_token_id = torch.multinomial(probs, 1).item()
next_token_text = enc.decode([next_token_id])
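The sampling step can also be written in plain Python to show what softmax and temperature do to the logits. This is a minimal sketch: `sample_next_token` is a hypothetical helper, the logits are made up, and treating a temperature of 0 as greedy decoding is a convention assumed here.

```python
import math
import random
from typing import List, Optional

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def sample_next_token(logits: List[float],
                      temperature: float = 1.0,
                      rng: Optional[random.Random] = None) -> int:
    if temperature == 0:
        # Greedy decoding: always pick the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    # Draw one token index according to the probability distribution
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1, -1.0]
greedy_id = sample_next_token(toy_logits, temperature=0)
sampled_id = sample_next_token(toy_logits, temperature=0.8, rng=random.Random(0))
print(greedy_id, sampled_id)
```

Lower temperatures concentrate probability on the top token, and higher temperatures flatten the distribution, which is the knob most APIs expose as `temperature`.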

Key Insight for Engineers

When you send a message via API:

  1. The API provider applies the chat template server-side
  2. The provider tokenizes the formatted prompt; embedding happens inside the model
  3. The model runs a forward pass for each generated token
  4. Output tokens are decoded back to text
  5. You receive a string response

You don't need to implement any of this yourself when using an API, but understanding it helps you debug token limits, prompt formatting issues, and unexpected model behavior.
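As one example of debugging token limits, the sketch below trims a conversation to fit a token budget by dropping the oldest non-system messages first. Everything here is an assumption for illustration: `approx_token_count` is a crude stand-in of about four characters per token, `overhead_per_message` is a made-up allowance for template markers, and a real implementation should count with the model's own tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def approx_token_count(text: str) -> int:
    """Very rough stand-in: about 4 characters per token."""
    return max(1, len(text) // 4)

def trim_to_budget(messages: List[Message],
                   max_tokens: int,
                   count_tokens: Callable[[str], int] = approx_token_count,
                   overhead_per_message: int = 4) -> List[Message]:
    """Keep system messages, then keep the most recent turns that still fit."""
    def cost(message: Message) -> int:
        return count_tokens(message["content"]) + overhead_per_message

    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(cost(m) for m in system)

    kept: List[Message] = []
    # Walk backwards so the most recent turns survive
    for message in reversed(others):
        if cost(message) > budget:
            break
        kept.append(message)
        budget -= cost(message)
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 40},
    {"role": "user", "content": "What is AI?"},
]
trimmed = trim_to_budget(history, max_tokens=40)
print([m["role"] for m in trimmed])
```

Dropping whole messages from the oldest end keeps the system prompt and the latest question intact, though summarizing old turns is another option.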