What's the difference between fine-tuning and prompt engineering?

## Fine-tuning vs Prompt Engineering

Both adapt an LLM to a specific task, but they differ fundamentally in cost, flexibility, and when to use them.

### Prompt Engineering

**What it is:** Crafting the input text (system prompt, instructions, examples) to steer the model's output — no weight changes.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# The system prompt carries the role, the task steps, and the output format
system_prompt = '''You are a senior Python code reviewer.
When given code, you:
1. Identify bugs and security issues
2. Suggest improvements with explanations
3. Rate code quality 1-10
Always respond in this format: Issues: ... Improvements: ... Rating: X/10'''

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Review this: def add(a,b): return a+b"},
    ],
)

# Same weights as any other call; only the input text steers the output
print(response.choices[0].message.content)
```
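
The call above steers the model with instructions only. Examples can be added too, as earlier user/assistant turns (few-shot prompting). A minimal sketch, assuming the same chat `messages` format; the two example reviews and the helper name `build_messages` are hypothetical illustrations, not part of the OpenAI API.

```python
# Few-shot prompting: show worked examples as prior chat turns before the real request.
# The example inputs/outputs below are hypothetical illustrations.
FEW_SHOT_EXAMPLES = [
    (
        "Review this: def div(a, b): return a / b",
        "Issues: ZeroDivisionError when b == 0. "
        "Improvements: validate b or document the precondition. Rating: 6/10",
    ),
    (
        "Review this: def greet(name): print('Hi ' + name)",
        "Issues: none critical. "
        "Improvements: use an f-string and add a type hint. Rating: 8/10",
    ),
]


def build_messages(system_prompt, user_input, examples=FEW_SHOT_EXAMPLES):
    """Return a chat `messages` list: system prompt, example turns, then the real input."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages
```

The returned list can be passed as `messages=` to `client.chat.completions.create`. Each example adds prompt tokens on every request, which is the latency trade-off listed in the decision table.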

### Fine-tuning

**What it is:** Updating model weights on a task-specific dataset to permanently change the model's behaviour.

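```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Each line of training_data.jsonl holds one example: {"text": "..."}
dataset = load_dataset("json", data_files="training_data.jsonl")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # example Hub ID (assumption); any base or instruction-tuned causal LM
    args=SFTConfig(  # SFTConfig extends TrainingArguments with SFT-specific options
        output_dir="./fine-tuned-model",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,  # common for LoRA; full fine-tuning usually uses a smaller value
        dataset_text_field="text",  # column that holds the training text
    ),
    train_dataset=dataset["train"],
)
trainer.train()
trainer.save_model()  # writes the updated weights to output_dir
```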
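
The trainer above reads `training_data.jsonl` and trains on its `text` field. A minimal sketch of producing that file, assuming one JSON object per line with a single `text` key holding the full prompt-plus-response string; the example pair, the `### Instruction` / `### Response` template, and the `write_jsonl` helper are hypothetical, not a format TRL requires.

```python
import json

# Hypothetical (instruction, response) pairs drawn from your own domain data
PAIRS = [
    ("Review this: def add(a,b): return a+b",
     "Issues: none. Improvements: add type hints and a docstring. Rating: 7/10"),
]

# Hypothetical prompt template; reuse the same one at inference time
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def write_jsonl(pairs, path="training_data.jsonl"):
    """Write one {"text": ...} object per line, the JSON Lines layout load_dataset("json") reads."""
    with open(path, "w", encoding="utf-8") as f:
        for instruction, response in pairs:
            record = {"text": TEMPLATE.format(instruction=instruction, response=response)}
            f.write(json.dumps(record) + "\n")
    return len(pairs)
```

Whichever template you pick, format inference inputs the same way so the fine-tuned model sees prompts shaped like its training data.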

### Decision Framework

| Criteria | Prompt Engineering | Fine-tuning |
|----------|-------------------|-------------|
| **Upfront cost** | Near-zero | Medium (GPU compute) |
| **Speed to deploy** | Minutes | Hours to days |
| **Data needed** | 0–20 examples | 100–10K+ examples |
| **Latency** | Long prompts add latency and per-request tokens | Shorter prompts, so faster on the same model |
| **Style/format control** | Good | Excellent |
| **Factual knowledge** | Limited to the model's training data plus what fits in the prompt | Can add domain knowledge |
| **Iteration speed** | Instant | Slow (retrain loop) |
| **API models (GPT-4o)** | ✅ Works | Limited (only some models, at extra cost) |

### When to Choose Each

**Choose prompt engineering first if:**
- You can get good results with a well-crafted system prompt
- Task changes frequently
- Budget is limited

**Choose fine-tuning if:**
- You need format/style consistency at scale that a prompt alone can't achieve
- You have proprietary domain knowledge not in the base model
- You want to compress a long system prompt into the model weights (fewer input tokens per request, so faster and cheaper inference)
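
Whether compressing a long system prompt pays off depends on how many tokens that prompt costs you. A back-of-envelope sketch; every input in the example (prompt length, traffic, price) is a hypothetical number, not a real quote, and the estimate ignores provider-side prompt caching and any higher per-token price a provider may charge for fine-tuned models.

```python
def monthly_prompt_overhead(prompt_tokens, requests_per_day, usd_per_million_input_tokens, days=30):
    """Return (tokens, usd) spent per month just re-sending a fixed system prompt."""
    tokens = prompt_tokens * requests_per_day * days
    usd = tokens / 1_000_000 * usd_per_million_input_tokens
    return tokens, usd


# Hypothetical inputs: 1,500-token system prompt, 10,000 requests/day, $2.50 per 1M input tokens
tokens, usd = monthly_prompt_overhead(1_500, 10_000, 2.50)
print(f"{tokens:,} tokens ~ ${usd:,.2f}/month")  # prints: 450,000,000 tokens ~ $1,125.00/month
```

Compare the result with the one-off training cost plus the ongoing cost of serving the fine-tuned model; if the overhead is small, keeping the prompt may stay cheaper.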

> **Best practice in production:** Start with prompt engineering, and fine-tune only when prompt engineering plateaus. Before committing to a fine-tuning pipeline, measure the accuracy gain on a held-out evaluation set.
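
Measuring the improvement can be as simple as scoring each candidate on the same held-out examples. A minimal sketch, assuming exact-match scoring and that each candidate is wrapped as a plain function from input string to output string; `prompted` and `fine_tuned` are hypothetical toy stand-ins for real model calls.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output)


def exact_match_accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of held-out examples where the prediction equals the label (whitespace-trimmed)."""
    if not examples:
        raise ValueError("need at least one held-out example")
    hits = sum(predict(x).strip() == y.strip() for x, y in examples)
    return hits / len(examples)


def compare(candidates: Dict[str, Callable[[str], str]], examples: Sequence[Example]) -> Dict[str, float]:
    """Score every candidate on the same examples so the numbers are directly comparable."""
    return {name: exact_match_accuracy(fn, examples) for name, fn in candidates.items()}


# Hypothetical toy stand-ins; replace with real prompted / fine-tuned model calls
held_out = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
prompted = lambda x: "4"
fine_tuned = lambda x: str(sum(int(n) for n in x.split("+")))

print(compare({"prompted": prompted, "fine_tuned": fine_tuned}, held_out))
```

Exact match suits short, structured outputs such as the `Rating: X/10` format above; free-form answers usually need a task-specific metric or rubric instead.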
