Explain chain-of-thought (CoT) prompting. Why does it work?
Answer
Chain-of-Thought Prompting
Chain-of-Thought (CoT) prompting instructs the LLM to reason step by step before producing the final answer. It often substantially improves accuracy on multi-step reasoning tasks such as arithmetic word problems and logic puzzles.
Why CoT Works
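Standard prompting forces the model to "jump" straight to an answer. CoT spends more tokens, and therefore more compute, on reasoning: each intermediate step is written into the context, so later steps can condition on it instead of having to be computed implicitly in one pass. This is similar to how a human solves a math problem by writing out the steps.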
Zero-Shot CoT
Simply add "Let's think step by step" (or similar) to the prompt.
```python
from openai import OpenAI

client = OpenAI()

# Without CoT: often wrong on multi-step problems
basic_prompt = (
    "If a train travels 120km in 1.5 hours, then slows down and travels "
    "80km in 2 hours, what is its average speed for the whole journey?"
)

# With Zero-Shot CoT
cot_prompt = '''If a train travels 120km in 1.5 hours, then slows down and travels 80km in 2 hours, what is its average speed for the whole journey?
Let's think step by step.'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0
)
print(response.choices[0].message.content)
# Step 1: Total distance = 120 + 80 = 200km
# Step 2: Total time = 1.5 + 2 = 3.5 hours
# Step 3: Average speed = 200 / 3.5 = 57.1 km/h
```
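The arithmetic in the train problem can be checked offline. The sketch below contrasts the step-by-step route with one plausible shortcut, averaging the two leg speeds; that this is the shortcut a model would take is an assumption made for illustration, not something measured here.

```python
# Figures taken from the train problem above
legs = [(120.0, 1.5), (80.0, 2.0)]  # (distance in km, time in hours)

# Assumed shortcut: average the speeds of the two legs
leg_speeds = [distance / hours for distance, hours in legs]
naive_average = sum(leg_speeds) / len(leg_speeds)

# Step-by-step route: total distance divided by total time
total_distance = sum(distance for distance, _ in legs)
total_time = sum(hours for _, hours in legs)
average_speed = total_distance / total_time

print(f"Mean of leg speeds: {naive_average:.1f} km/h")           # 60.0 km/h
print(f"Total distance / total time: {average_speed:.1f} km/h")  # 57.1 km/h
```

The two results differ because the train spends more time on the slower leg, which is exactly the kind of detail that writing out intermediate steps makes visible.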
Few-Shot CoT
Provide worked examples showing the reasoning process.
```python
few_shot_cot_prompt = '''Solve these word problems by reasoning step by step.

Problem: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?
Reasoning: Roger starts with 5 balls. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11 total.
Answer: 11

Problem: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
Reasoning: '''
# Model continues with the step-by-step reasoning
```
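When the worked examples live in data rather than in a hand-written string, the same prompt layout can be assembled programmatically. This is a minimal sketch; `build_few_shot_cot_prompt` and the `Example` tuple are hypothetical names introduced here, not part of any library.

```python
from typing import List, Tuple

Example = Tuple[str, str, str]  # (problem, reasoning, answer)


def build_few_shot_cot_prompt(examples: List[Example], question: str) -> str:
    """Format worked examples, then leave an open 'Reasoning:' slot for the new question."""
    parts = ["Solve these word problems by reasoning step by step.", ""]
    for problem, reasoning, answer in examples:
        parts += [f"Problem: {problem}", f"Reasoning: {reasoning}", f"Answer: {answer}", ""]
    parts += [f"Problem: {question}", "Reasoning:"]
    return "\n".join(parts)


examples = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?",
        "Roger starts with 5 balls. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11 total.",
        "11",
    )
]
prompt = build_few_shot_cot_prompt(
    examples,
    "A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?",
)
print(prompt)
```

Ending the prompt on an open `Reasoning:` line nudges the model to continue in the demonstrated format before it states an answer.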
Self-Consistency (Ensemble CoT)
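Generate multiple reasoning chains at a non-zero temperature, then take the majority vote on the final answer. A single chain can go wrong at any step; voting across several chains reduces the variance of the final answer, at the cost of `n_samples` model calls per question.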
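```python
import re
from collections import Counter

from openai import OpenAI

client = OpenAI()


def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    """Sample several reasoning chains and return the most common final answer."""
    # Ask for a fixed final line so equal answers compare equal when voting
    prompt = f"{question}\nLet's think step by step, then end with 'Answer: <value>'."
    answers = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # Some randomness to get diverse chains
        )
        text = (response.choices[0].message.content or "").strip()
        # Prefer the last "Answer:" line; fall back to the raw last line
        found = re.findall(r"Answer:\s*(.+)", text)
        answers.append(found[-1].strip() if found else text.split("\n")[-1])

    # Return majority vote
    return Counter(answers).most_common(1)[0][0]
```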
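Majority voting only works if equivalent answers compare equal, so the extraction step matters as much as the sampling. The offline sketch below normalizes each chain to the last number on its final line before voting. The helper names are hypothetical, and the sample chains are made-up strings for the cafeteria problem, not real model output.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_number(chain: str) -> Optional[str]:
    """Return the last number on the chain's final non-empty line, or None."""
    lines = [line for line in chain.strip().splitlines() if line.strip()]
    if not lines:
        return None
    numbers = re.findall(r"-?\d+(?:\.\d+)?", lines[-1])
    return numbers[-1] if numbers else None


def majority_vote(chains: List[str]) -> Optional[str]:
    """Vote over normalized numeric answers, ignoring chains with no number."""
    votes = [n for n in map(extract_number, chains) if n is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None


# Illustrative, hand-written chains (assumption: not produced by a model)
sample_chains = [
    "23 - 20 = 3 apples left.\n3 + 6 = 9.\nThe answer is 9.",
    "They used 20, so 3 remain. Adding 6 gives 9.\nAnswer: 9",
    "23 + 6 = 29.\nAnswer: 29",
]
print(majority_vote(sample_chains))  # 9
```

Comparing the raw final lines of these three chains would yield three distinct strings and no real majority. This sketch assumes numeric answers; free-text answers would need a different normalization.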
When CoT Helps vs Doesn't
| Task Type | CoT Benefit | Example |
|---|---|---|
| Multi-step arithmetic | High | Word problems, calculations |
| Logical reasoning | High | Syllogisms, puzzle solving |
| Code debugging | High | "Find the bug and fix it" |
| Causal reasoning | High | "Why did X happen?" |
| Simple factual retrieval | Low | "What is the capital of France?" |
| Text classification | Low | Sentiment analysis |
| Creative writing | Neutral | Story generation |
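
One way to act on this table is to add the CoT trigger only where it is expected to pay off, which avoids spending extra tokens on simple lookups. The sketch below is illustrative: the task-type keys and the `maybe_add_cot` helper are hypothetical names, and the benefit levels are copied from the table rather than measured.

```python
COT_SUFFIX = "\nLet's think step by step."

# Benefit levels copied from the table above; the keys are hypothetical labels
COT_BENEFIT = {
    "multi_step_arithmetic": "high",
    "logical_reasoning": "high",
    "code_debugging": "high",
    "causal_reasoning": "high",
    "factual_retrieval": "low",
    "text_classification": "low",
    "creative_writing": "neutral",
}


def maybe_add_cot(prompt: str, task_type: str) -> str:
    """Append the CoT trigger only for task types marked as high benefit."""
    if COT_BENEFIT.get(task_type) == "high":
        return prompt + COT_SUFFIX
    # Low, neutral, and unknown task types are sent unchanged
    return prompt


print(maybe_add_cot("Find the bug and fix it: ...", "code_debugging"))
print(maybe_add_cot("What is the capital of France?", "factual_retrieval"))
```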
Prompt Patterns
```python
# Pattern 1: "Let's think step by step"
"Solve this problem. Let's think step by step: {problem}"

# Pattern 2: "First, ... then, ... finally, ..."
"First identify the key facts. Then apply the relevant formula. Finally state the answer."

# Pattern 3: Structured reasoning
"Reasoning: <your step-by-step thinking>\nAnswer: <final answer only>"
```
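The structured pattern (Pattern 3) is useful mainly because the response can be split mechanically into reasoning and answer. A minimal parsing sketch follows, assuming the model followed the `Reasoning:` / `Answer:` layout. `split_reasoning_and_answer` is a hypothetical helper, and the sample response is hand-written from the train problem.

```python
import re
from typing import Tuple


def split_reasoning_and_answer(text: str) -> Tuple[str, str]:
    """Split a response in the Reasoning/Answer layout into its two parts."""
    match = re.search(r"Reasoning:\s*(.*?)\s*Answer:\s*(.*)", text, flags=re.DOTALL)
    if match is None:
        # The layout was not followed; treat the whole text as the answer
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()


# Hand-written sample response (assumption: not real model output)
sample = (
    "Reasoning: 120 + 80 = 200 km. 1.5 + 2 = 3.5 h. 200 / 3.5 = 57.1.\n"
    "Answer: 57.1 km/h"
)
reasoning, answer = split_reasoning_and_answer(sample)
print(answer)  # 57.1 km/h
```

Keeping the answer on its own labeled line also makes downstream checks, such as majority voting or exact-match scoring, simpler.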
When CoT helps most: Multi-step arithmetic, logical reasoning, code debugging, causal reasoning. It helps less for simple factual retrieval or classification.