Explain RAG. When would you use it over fine-tuning?
#gen-ai#rag#fine-tuning
Answer
RAG vs Fine-tuning
RAG and fine-tuning are both ways to customise an LLM, but they solve different problems. Understanding the distinction is one of the most common Gen AI learning topics.
Core Difference
- RAG — gives the model access to external knowledge at inference time (retrieval)
- Fine-tuning — changes the model's weights to alter its behaviour, style, or embedded knowledge
Comparison Table
| Feature | RAG | Fine-tuning |
|---|---|---|
| Knowledge source | External documents retrieved at runtime | Baked into model weights |
| Knowledge freshness | Real-time (update the vector store) | Stale until retrained |
| Training required | No | Yes (GPU compute) |
| Cost | Embedding + retrieval costs | Training cost (one-time) |
| Traceability | Can show source documents | Black box |
| Hallucination risk | Lower (grounded in retrieved docs) | Higher for factual claims |
| Domain adaptation | Good for factual Q&A | Good for style, format, task behaviour |
| Data needed | Documents to index | Labelled training examples |
| Best for | "What does our policy say about X?" | "Write code in our internal style" |
RAG Example
pythonfrom langchain_community.vectorstores import Chroma from langchain_openai import OpenAIEmbeddings, ChatOpenAI from langchain.chains import RetrievalQA # Index your documents once vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings()) # Retrieve + generate at query time qa_chain = RetrievalQA.from_chain_type( llm=ChatOpenAI(model="gpt-4o"), retriever=vectorstore.as_retriever(search_kwargs={"k": 5}) ) answer = qa_chain.invoke("What is our refund policy?") print(answer["result"])
When to Use What
Use RAG when:
- Knowledge changes frequently (product docs, policies, news)
- You need citations/traceability
- You want to avoid hallucinations on factual questions
- Your data is confidential and can't be used for training
Use fine-tuning when:
- You need a specific output format/style consistently
- You want to teach the model a proprietary task (e.g. domain-specific code patterns)
- You want to distil a long system prompt into weights (saves tokens per request)
Use both (RAG + Fine-tuning):
- Fine-tune for style and format
- RAG for up-to-date factual grounding
- This combination outperforms either approach alone in production
Best practice in production: Combine both — fine-tune for style and format, RAG for dynamic knowledge. This is called RAG + Fine-tuning and outperforms either approach alone.