Concept #7Mediumgen-ai-fundamentals

Explain RAG. When would you use it over fine-tuning?

#gen-ai#rag#fine-tuning

Answer

RAG vs Fine-tuning

RAG and fine-tuning are both ways to customise an LLM, but they solve different problems. Understanding the distinction is one of the most common Gen AI learning topics.

Core Difference

  • RAG — gives the model access to external knowledge at inference time (retrieval)
  • Fine-tuning — changes the model's weights to alter its behaviour, style, or embedded knowledge

Comparison Table

FeatureRAGFine-tuning
Knowledge sourceExternal documents retrieved at runtimeBaked into model weights
Knowledge freshnessReal-time (update the vector store)Stale until retrained
Training requiredNoYes (GPU compute)
CostEmbedding + retrieval costsTraining cost (one-time)
TraceabilityCan show source documentsBlack box
Hallucination riskLower (grounded in retrieved docs)Higher for factual claims
Domain adaptationGood for factual Q&AGood for style, format, task behaviour
Data neededDocuments to indexLabelled training examples
Best for"What does our policy say about X?""Write code in our internal style"

RAG Example

python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Index your documents once
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())

# Retrieve + generate at query time
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5})
)

answer = qa_chain.invoke("What is our refund policy?")
print(answer["result"])

When to Use What

Use RAG when:

  • Knowledge changes frequently (product docs, policies, news)
  • You need citations/traceability
  • You want to avoid hallucinations on factual questions
  • Your data is confidential and can't be used for training

Use fine-tuning when:

  • You need a specific output format/style consistently
  • You want to teach the model a proprietary task (e.g. domain-specific code patterns)
  • You want to distil a long system prompt into weights (saves tokens per request)

Use both (RAG + Fine-tuning):

  • Fine-tune for style and format
  • RAG for up-to-date factual grounding
  • This combination outperforms either approach alone in production

Best practice in production: Combine both — fine-tune for style and format, RAG for dynamic knowledge. This is called RAG + Fine-tuning and outperforms either approach alone.