Concept #16 · Medium · gen-ai-fundamentals

What's few-shot vs. zero-shot prompting?

#gen-ai #prompt-engineering

Answer

Few-Shot vs Zero-Shot Prompting

Prompting strategy is one of the first decisions you make when deploying an LLM. Choosing between zero-shot and few-shot affects accuracy, cost, and latency.

Zero-Shot Prompting

Zero-shot prompting gives the model only the task description, with no examples.

python
from openai import OpenAI
client = OpenAI()

zero_shot_prompt = '''Classify the sentiment of the following customer review as Positive, Negative, or Neutral.

Review: "The product arrived on time but the packaging was damaged."
Sentiment:'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a sentiment classifier. Respond with only: Positive, Negative, or Neutral."},
        {"role": "user", "content": zero_shot_prompt}
    ],
    temperature=0
)
print(response.choices[0].message.content)  # e.g. "Neutral"; actual output may vary

When to use: the model already understands the task well, the output format is simple, or the token budget is tight.

Few-Shot Prompting

Few-shot prompting provides a few input/output examples (typically 2–10) before the actual query.

python
few_shot_prompt = '''Classify the sentiment of customer reviews as Positive, Negative, or Neutral.

Review: "Absolutely love this! Works perfectly."
Sentiment: Positive

Review: "Terrible quality. Broke after one use."
Sentiment: Negative

Review: "It's okay, nothing special."
Sentiment: Neutral

Review: "The product arrived on time but the packaging was damaged."
Sentiment:'''

# Reuses the `client` created in the zero-shot example
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0
)
print(response.choices[0].message.content)

When to use: the task needs a specific output format the model might otherwise miss, the outputs are domain-specific, or there are edge cases the examples can demonstrate.
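With chat-style APIs, the same examples can also be supplied as alternating user/assistant turns instead of one long string. The sketch below builds such a message list. `build_few_shot_messages` is a hypothetical helper name, not part of the OpenAI SDK, and whether this layout beats the single-string prompt is something to measure on your own task.

```python
def build_few_shot_messages(system_prompt, examples, query):
    """Build a chat message list with each example as a user/assistant pair."""
    messages = [{"role": "system", "content": system_prompt}]
    for review, label in examples:
        messages.append({"role": "user", "content": f'Review: "{review}"'})
        messages.append({"role": "assistant", "content": label})
    # The real query goes last, so the model answers it in the same pattern
    messages.append({"role": "user", "content": f'Review: "{query}"'})
    return messages


examples = [
    ("Absolutely love this! Works perfectly.", "Positive"),
    ("Terrible quality. Broke after one use.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]
messages = build_few_shot_messages(
    "You are a sentiment classifier. Respond with only: Positive, Negative, or Neutral.",
    examples,
    "The product arrived on time but the packaging was damaged.",
)
```

The resulting `messages` list can be passed as the `messages` argument of `client.chat.completions.create(...)` in place of the single user message.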

Comparison

Factor             | Zero-Shot              | Few-Shot
Tokens used        | Low                    | Higher (examples add tokens)
Cost               | Lower                  | Higher
Latency            | Lower                  | Higher
Accuracy           | Good for general tasks | Better for specific formats/domains
Setup effort       | None                   | Needs curated examples
Example dependency | None                   | Poor examples hurt accuracy
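To make the token row concrete, the sketch below estimates how many extra tokens the examples add to a prompt. It assumes a rough rule of thumb of about 4 characters per token for English text. That figure is only an approximation, and `estimate_tokens` is a hypothetical helper; a real tokenizer for your model would give exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


instruction = "Classify the sentiment of customer reviews as Positive, Negative, or Neutral."
query = 'Review: "The product arrived on time but the packaging was damaged."\nSentiment:'
examples = [
    ("Absolutely love this! Works perfectly.", "Positive"),
    ("Terrible quality. Broke after one use.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
]

zero_shot_prompt = f"{instruction}\n\n{query}"
example_block = "\n\n".join(
    f'Review: "{review}"\nSentiment: {label}' for review, label in examples
)
few_shot_prompt = f"{instruction}\n\n{example_block}\n\n{query}"

zero_tokens = estimate_tokens(zero_shot_prompt)
few_tokens = estimate_tokens(few_shot_prompt)
overhead = few_tokens - zero_tokens  # extra input tokens paid on every request
print(f"zero-shot: ~{zero_tokens}, few-shot: ~{few_tokens}, overhead: ~{overhead}")
```

Because the overhead is paid on every call, it scales with request volume, which is why the cost and latency rows move together with the token row.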

Few-Shot Example Selection Tips

Example quality matters as much as quantity. A skewed or unrepresentative set biases the model toward the labels it sees most often:

python
# ❌ Bad: all examples from the same class
bad_examples = [
    ("Great product!", "Positive"),
    ("Amazing quality!", "Positive"),
    ("Loved it!", "Positive"),
]  # Risk: the model over-predicts Positive

# ✅ Good: balanced, covering edge cases
good_examples = [
    ("Great product!", "Positive"),                        # Clear positive
    ("Terrible quality, broke immediately.", "Negative"),  # Clear negative
    ("It's okay, nothing special.", "Neutral"),            # Clear neutral
    ("Late delivery but product is good.", "Neutral"),     # Ambiguous case
]
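A balance check like this can be automated before examples go into a prompt. The sketch below is one possible approach: `is_balanced` is a hypothetical helper, and the `max_share=0.5` threshold is an arbitrary assumption to tune for your label set.

```python
from collections import Counter


def is_balanced(examples, expected_labels, max_share=0.5):
    """True if every expected label appears and no label exceeds max_share."""
    if not examples:
        return False
    counts = Counter(label for _, label in examples)
    total = len(examples)
    covers_all = all(counts[label] > 0 for label in expected_labels)
    no_dominant = all(n / total <= max_share for n in counts.values())
    return covers_all and no_dominant


labels = ["Positive", "Negative", "Neutral"]

skewed = [
    ("Great product!", "Positive"),
    ("Amazing quality!", "Positive"),
    ("Loved it!", "Positive"),
]
balanced = [
    ("Great product!", "Positive"),
    ("Terrible quality, broke immediately.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
    ("Late delivery but product is good.", "Neutral"),
]

print(is_balanced(skewed, labels))    # False: Negative and Neutral are missing
print(is_balanced(balanced, labels))  # True: all labels present, none above 50%
```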

Dynamic Few-Shot Selection

In production, retrieve the most relevant examples for each query using embeddings:

python
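from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder library of labeled (text, label) examples; replace with your own data
example_library = [
    ("Great product!", "Positive"),
    ("Terrible quality, broke immediately.", "Negative"),
    ("It's okay, nothing special.", "Neutral"),
    ("Late delivery but product is good.", "Neutral"),
]

# Pre-embed the example library once, not on every query
example_texts = [text for text, _ in example_library]
example_embeddings = model.encode(example_texts)

def get_relevant_examples(query: str, k: int = 3) -> list:
    """Return the k library examples most similar to the query."""
    query_embedding = model.encode([query])
    similarities = cosine_similarity(query_embedding, example_embeddings)[0]
    top_k_indices = np.argsort(similarities)[::-1][:k]  # highest similarity first
    return [example_library[i] for i in top_k_indices]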
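Once the relevant examples are retrieved, they still have to be assembled into a prompt. The sketch below shows one way to do that, reusing the Review/Sentiment layout from the few-shot prompt earlier. `build_few_shot_prompt` is a hypothetical helper, and the hard-coded `selected` list stands in for the output of a retrieval step.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join the instruction, the selected examples, and the query into one prompt."""
    parts = [instruction]
    for review, label in examples:
        parts.append(f'Review: "{review}"\nSentiment: {label}')
    # Leave the final label blank for the model to complete
    parts.append(f'Review: "{query}"\nSentiment:')
    return "\n\n".join(parts)


# Stand-in for retrieved examples (assumed (text, label) tuples)
selected = [
    ("Late delivery but product is good.", "Neutral"),
    ("Terrible quality, broke immediately.", "Negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of customer reviews as Positive, Negative, or Neutral.",
    selected,
    "The product arrived on time but the packaging was damaged.",
)
print(prompt)
```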

Production tip: Start zero-shot. If accuracy is insufficient, add few-shot examples. If it is still insufficient, consider fine-tuning. Always measure the accuracy delta on a labeled evaluation set, because zero-shot with a better system prompt sometimes beats few-shot.
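Measuring the accuracy delta can be as simple as running both strategies over the same labeled set. The sketch below is a minimal harness under stated assumptions: the two stub classifiers are placeholders for real zero-shot and few-shot LLM calls, and the four-item `eval_set` is toy data for illustration, not a real benchmark.

```python
def accuracy(classify, labeled_data):
    """Fraction of items where the classifier's label matches the expected one."""
    correct = sum(1 for text, expected in labeled_data if classify(text) == expected)
    return correct / len(labeled_data)


def accuracy_delta(baseline, candidate, labeled_data):
    """Positive when the candidate strategy beats the baseline."""
    return accuracy(candidate, labeled_data) - accuracy(baseline, labeled_data)


# Stubs standing in for real LLM calls (hypothetical behavior)
def zero_shot_stub(text):
    return "Positive"


def few_shot_stub(text):
    return "Negative" if "broke" in text.lower() else "Positive"


# Toy evaluation set for illustration only
eval_set = [
    ("Great product!", "Positive"),
    ("Terrible quality, broke immediately.", "Negative"),
    ("Loved it!", "Positive"),
    ("Broke after one use.", "Negative"),
]

delta = accuracy_delta(zero_shot_stub, few_shot_stub, eval_set)
print(f"accuracy delta (few-shot minus zero-shot): {delta:+.2f}")
```

In practice, each stub would wrap an API call with the corresponding prompt, and the evaluation set would be large enough for the delta to be meaningful.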