Concept #15 · Medium · gen-ai-fundamentals

Explain chain-of-thought (CoT) prompting. Why does it work?

#gen-ai #prompt-engineering

Answer

Chain-of-Thought Prompting

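Chain-of-Thought (CoT) prompting instructs the LLM to reason step by step before producing the final answer. It can substantially improve accuracy on multi-step reasoning tasks such as arithmetic word problems and logic puzzles, with the largest gains typically seen on larger models.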

Why CoT Works

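Standard prompting forces the model to "jump" straight to an answer. Because an LLM generates one token at a time and each new token is conditioned on everything written so far, CoT gives the model more compute (tokens) to spend on reasoning: every intermediate result it writes down becomes context for the next step. This is similar to how a human solves a math problem by writing out the steps.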

Zero-Shot CoT

Simply add "Let's think step by step" (or similar) to the prompt.

python
from openai import OpenAI
client = OpenAI()

# Without CoT: more error-prone on multi-step problems (shown for comparison; not sent below)
basic_prompt = "If a train travels 120km in 1.5 hours, then slows down and travels 80km in 2 hours, what is its average speed for the whole journey?"

# With Zero-Shot CoT
cot_prompt = '''If a train travels 120km in 1.5 hours, then slows down and travels 80km in 2 hours,
what is its average speed for the whole journey?

Let's think step by step.'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": cot_prompt}],
    temperature=0
)
print(response.choices[0].message.content)
# Example output (exact wording will vary between runs and models):
# Step 1: Total distance = 120 + 80 = 200 km
# Step 2: Total time = 1.5 + 2 = 3.5 hours
# Step 3: Average speed = 200 / 3.5 ≈ 57.1 km/h
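
The arithmetic in the sample output can be checked directly. The sketch below also computes the tempting shortcut of averaging the two leg speeds, which is the kind of mistake a direct answer without written-out steps is more prone to. The variable names are illustrative and not part of the original example.

```python
# Each leg of the journey as (distance in km, time in hours), taken from the prompt.
legs = [(120, 1.5), (80, 2)]

total_distance = sum(distance for distance, _ in legs)
total_time = sum(hours for _, hours in legs)
average_speed = total_distance / total_time

# Tempting shortcut: average the per-leg speeds, ignoring how long each leg lasted.
leg_speeds = [distance / hours for distance, hours in legs]
naive_average = sum(leg_speeds) / len(leg_speeds)

print(f"Correct average speed: {average_speed:.1f} km/h")    # 57.1
print(f"Naive average of speeds: {naive_average:.1f} km/h")  # 60.0
```

The two results differ because the train spends more time at the slower speed, so the intermediate totals (distance and time) have to be worked out before dividing.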

Few-Shot CoT

Provide one or more worked examples that show the reasoning as well as the answer, so the model imitates the same format for the new problem.

python
few_shot_cot_prompt = '''Solve these word problems by reasoning step by step.

Problem: Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?
Reasoning: Roger starts with 5 balls. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11 total.
Answer: 11

Problem: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
Reasoning: '''

# Model continues with the step-by-step reasoning
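
When the worked examples live in data rather than in a hand-written string, the same prompt layout can be assembled programmatically. This is a minimal sketch: `WorkedExample` and `build_few_shot_cot_prompt` are hypothetical names introduced here, and the layout simply mirrors the Problem / Reasoning / Answer format shown above.

```python
from typing import NamedTuple


class WorkedExample(NamedTuple):
    """One demonstration: a problem, its written-out reasoning, and the answer."""
    problem: str
    reasoning: str
    answer: str


def build_few_shot_cot_prompt(examples: list[WorkedExample], new_problem: str) -> str:
    """Join an instruction, the worked examples, and the unanswered problem."""
    parts = ["Solve these word problems by reasoning step by step."]
    for ex in examples:
        parts.append(
            f"Problem: {ex.problem}\nReasoning: {ex.reasoning}\nAnswer: {ex.answer}"
        )
    # End on an open "Reasoning:" cue so the model writes its steps before the answer.
    parts.append(f"Problem: {new_problem}\nReasoning:")
    return "\n\n".join(parts)


examples = [
    WorkedExample(
        problem="Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?",
        reasoning="Roger starts with 5 balls. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11 total.",
        answer="11",
    ),
]
prompt = build_few_shot_cot_prompt(
    examples,
    "A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?",
)
print(prompt)
```

Keeping the examples as structured data makes it easy to swap in demonstrations that resemble the target problem, which is usually where few-shot CoT pays off.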

Self-Consistency (Ensemble CoT)

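Generate multiple reasoning chains at a non-zero temperature, then take the majority vote on the final answer. Individual chains can go wrong in different ways, but correct chains tend to agree on the same answer, so voting reduces variance at the cost of n_samples API calls per question.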

python
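from collections import Counter

# Reuses the `client` created in the Zero-Shot CoT example
def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    # Ask for a fixed final line so the answer can be extracted reliably
    prompt = (
        f"{question}\nLet's think step by step. "
        "End with a final line of the form 'Answer: <value>'."
    )
    answers = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # Some randomness to get diverse chains
        )
        text = response.choices[0].message.content or ""
        # Keep only the text after the last "Answer:" marker; skip chains without one
        _, marker, tail = text.rpartition("Answer:")
        if marker and tail.strip():
            answers.append(tail.strip().splitlines()[0].strip().lower())

    if not answers:
        raise ValueError("No chain produced a parseable final answer")
    # Return majority vote
    return Counter(answers).most_common(1)[0][0]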
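
Majority voting only works if equivalent answers compare equal, so "9", "9.0" and "Answer: 9" need to be normalized before counting. The sketch below shows one way to do this offline for numeric answers. It assumes the answer is the last number in each completion, which is a heuristic and not something the original code does; `extract_last_number` and `majority_vote` are hypothetical helpers, and the sample completions are made up for illustration, not real model output.

```python
import re
from collections import Counter
from decimal import Decimal
from typing import Optional

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(completion: str) -> Optional[str]:
    """Return the last number in the text in canonical form, or None if there is none."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    # Canonicalize so "11", "11.0" and "11.00" all vote for the same answer.
    return format(Decimal(matches[-1]).normalize(), "f")


def majority_vote(completions: list[str]) -> tuple[str, float]:
    """Return (winning answer, share of parseable chains that agreed with it)."""
    answers = [a for a in map(extract_last_number, completions) if a is not None]
    if not answers:
        raise ValueError("no completion contained a number")
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


# Made-up completions standing in for sampled reasoning chains.
samples = [
    "23 - 20 = 3, then 3 + 6 = 9. Answer: 9",
    "They used 20, leaving 3. Buying 6 gives 9.0",
    "23 + 6 = 29. Answer: 29",  # a chain that forgot to subtract
    "No idea.",                 # unparseable, so it does not vote
]
winner, agreement = majority_vote(samples)
print(winner, round(agreement, 2))
```

The agreement share is a useful by-product: a low value signals that the chains disagree and the answer may deserve a second look.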

When CoT Helps vs Doesn't

| Task Type | CoT Benefit | Example |
| --- | --- | --- |
| Multi-step arithmetic | High | Word problems, calculations |
| Logical reasoning | High | Syllogisms, puzzle solving |
| Code debugging | High | "Find the bug and fix it" |
| Causal reasoning | High | "Why did X happen?" |
| Simple factual retrieval | Low | "What is the capital of France?" |
| Text classification | Low | Sentiment analysis |
| Creative writing | Neutral | Story generation |

Prompt Patterns

python
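# Pattern 1: "Let's think step by step" (placed after the problem, as in the Zero-Shot example)
PATTERN_STEP_BY_STEP = "{problem}\n\nLet's think step by step."

# Pattern 2: "First, ... then, ... finally, ..."
PATTERN_STAGED = "{problem}\n\nFirst identify the key facts. Then apply the relevant formula. Finally state the answer."

# Pattern 3: Structured reasoning (an output template the model fills in)
PATTERN_STRUCTURED = "{problem}\n\nReply in this format:\nReasoning: <your step-by-step thinking>\nAnswer: <final answer only>"

# Fill a template with str.format, e.g. PATTERN_STEP_BY_STEP.format(problem="What is 17 * 24?")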

When CoT helps most: Multi-step arithmetic, logical reasoning, code debugging, causal reasoning. It helps less for simple factual retrieval or classification.
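
Pattern 3 under Prompt Patterns is convenient because the reply can be split mechanically into reasoning and answer. A minimal sketch of such a parser follows. `parse_structured_reply` is a hypothetical helper, and it assumes the model followed the requested layout with `Answer:` at the start of its own line; it returns None when that is not the case, so the caller can retry or fall back.

```python
import re
from typing import NamedTuple, Optional


class StructuredReply(NamedTuple):
    reasoning: str
    answer: str


# "Reasoning:" section first, then an "Answer:" line that runs to the end of the text.
_REPLY = re.compile(
    r"Reasoning:\s*(?P<reasoning>.*?)\s*^Answer:\s*(?P<answer>.*?)\s*\Z",
    re.DOTALL | re.MULTILINE,
)


def parse_structured_reply(text: str) -> Optional[StructuredReply]:
    """Split a 'Reasoning: ... / Answer: ...' reply, or return None if it is malformed."""
    match = _REPLY.search(text)
    if match is None:
        return None
    return StructuredReply(match["reasoning"], match["answer"])


# Made-up reply in the Pattern 3 layout.
reply = parse_structured_reply(
    "Reasoning: 23 - 20 = 3 apples left.\n3 + 6 = 9.\nAnswer: 9\n"
)
print(reply.answer)
```

Separating the two parts lets downstream code use only the answer while the reasoning is kept for logging or debugging.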