Explain chain-of-thought (CoT) prompting. Why does it work?

Question

Accepted Answer

## Chain-of-Thought Prompting

**Chain-of-Thought (CoT)** prompting instructs the LLM to reason step by step before producing the final answer. It can substantially improve accuracy on multi-step reasoning tasks.

### Why CoT Works

Standard prompting forces the model to "jump" straight to an answer. CoT allocates more compute (tokens) to reasoning, allowing the model to work through intermediate steps, similar to how a human solves a math problem by writing out the steps. Because each generated token is conditioned on the tokens before it, intermediate results written into the output are available to every later step.

### Zero-Shot CoT

Simply add *"Let's think step by step"* (or similar) to the prompt.

```python
from openai import OpenAI

client = OpenAI()

# Without CoT: often wrong on multi-step problems
basic_prompt = "If a train travels 120km in 1.5 hours, then slows down and travels 80km in 2 hours, what is its average speed for the whole journey?"

# With Zero-Shot CoT
cot_prompt = '''If a train travels 120km in 1.5 hours, then slows down and travels 80km in 2 hours, what is its average speed for the whole journey?
Let's think step by step.'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0
)
print(response.choices[0].message.content)
# Typical output (exact wording varies):
# Step 1: Total distance = 120 + 80 = 200km
# Step 2: Total time = 1.5 + 2 = 3.5 hours
# Step 3: Average speed = 200 / 3.5 = 57.1 km/h
```

### Few-Shot CoT

Provide worked examples showing the reasoning process.

```python
few_shot_cot_prompt = '''Solve these word problems by reasoning step by step.

Problem: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?
Reasoning: Roger starts with 5 balls. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11 total.
Answer: 11

Problem: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
Reasoning: '''
# Model continues with the step-by-step reasoning
```

### Self-Consistency (Ensemble CoT)

Generate multiple reasoning chains, then take the **majority vote** on the final answer. Voting across independently sampled chains reduces the variance of the final answer, at the cost of `n_samples` API calls per question.

```python
from collections import Counter

from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    answers = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"{question} Let's think step by step."}],
            temperature=0.7  # Some randomness to get diverse chains
        )
        # Extract the final answer from the last line of the response.
        # This only votes correctly if chains phrase that line identically.
        answer = response.choices[0].message.content.strip().splitlines()[-1]
        answers.append(answer)

    # Return majority vote
    return Counter(answers).most_common(1)[0][0]
```

### When CoT Helps vs Doesn't

| Task Type | CoT Benefit | Example |
|-----------|------------|---------|
| Multi-step arithmetic | High | Word problems, calculations |
| Logical reasoning | High | Syllogisms, puzzle solving |
| Code debugging | High | "Find the bug and fix it" |
| Causal reasoning | High | "Why did X happen?" |
| Simple factual retrieval | Low | "What is the capital of France?" |
| Text classification | Low | Sentiment analysis |
| Creative writing | Neutral | Story generation |

### Prompt Patterns

```python
# Pattern 1: "Let's think step by step"
"Solve this problem. Let's think step by step: {problem}"

# Pattern 2: "First, ... then, ... finally, ..."
"First identify the key facts. Then apply the relevant formula. Finally state the answer."

# Pattern 3: Structured reasoning
"Reasoning:\nAnswer:"
```

> **When CoT helps most:** Multi-step arithmetic, logical reasoning, code debugging, causal reasoning. It helps less for simple factual retrieval or classification.
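
The voting step in self-consistency depends on the sampled chains' final answers being compared in the same form, and comparing whole last lines is fragile when the wording differs between chains. Below is a minimal offline sketch of one way to normalize before voting, assuming the answer is numeric and is the last number mentioned in each chain. The helper names `extract_final_number` and `majority_vote` and the sample chains are hypothetical, written for illustration; they are not part of the OpenAI SDK.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical helpers for illustration; assumes numeric final answers.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def extract_final_number(chain: str) -> Optional[str]:
    """Return the last number mentioned in a reasoning chain, or None."""
    # Drop thousands separators so "1,000" is read as one number.
    matches = _NUMBER.findall(chain.replace(",", ""))
    return matches[-1] if matches else None

def majority_vote(chains: List[str]) -> Optional[str]:
    """Return the most common final number across the sampled chains."""
    answers = [a for a in map(extract_final_number, chains) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Hand-written stand-ins for sampled model outputs (not real API responses).
sampled_chains = [
    "23 - 20 = 3 apples left. 3 + 6 = 9. Answer: 9",
    "They used 20, so 3 remain. Buying 6 more gives 9. Answer: 9",
    "23 + 6 = 29, minus 20 is 8. Answer: 8",
]
print(majority_vote(sampled_chains))  # 9
```

Two of the three chains end in 9, so the vote returns `"9"` even though one chain made an arithmetic slip. For non-numeric answers, a structured format with a fixed `Answer:` line would likely be a better extraction target than the last number.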


