Concept #141 | Medium | extended-ai-concepts

What are the different types of RAG?

#gen-ai #rag

Answer

Types of RAG (Retrieval-Augmented Generation)

RAG has evolved significantly from a simple retrieve-and-generate pattern into a family of specialized architectures.

Core RAG Types

| Type | Description | Key Innovation |
|---|---|---|
| Basic RAG | Simple retrieve → generate pipeline | Foundation |
| Advanced RAG | Improved retrieval (pre/post processing) | Better retrieval |
| Modular RAG | Plug-and-play components | Flexibility |
| Agentic RAG | LLM controls retrieval decisions | Autonomy |
| Graph RAG | Knowledge graph + retrieval | Relationships |
| Self-RAG | Model decides when to retrieve | Efficiency |
| Corrective RAG | Evaluates and corrects retrieval | Quality |
| CAG | Cache full context instead of retrieval | Speed |
| Speculative RAG | Draft then refine | Accuracy |

1. Basic RAG

python
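# Assumes `vector_db` (a vector store client whose search() returns chunk strings)
# and `llm` (an LLM client whose invoke() returns a string) already exist.
def basic_rag(query: str) -> str:
    chunks = vector_db.search(query, k=3)  # Top-3 chunks most similar to the query
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")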

One fixed pipeline: it always retrieves, then always generates.
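The snippets in this answer rely on `vector_db` and `llm` objects that are never defined. To run them locally, toy stand-ins are enough. The sketch below is only an illustration: `ToyVectorDB` (word-overlap ranking instead of real embeddings) and `EchoLLM` (returns the prompt unchanged) are hypothetical names, not part of any library.

```python
import re


class ToyVectorDB:
    """Hypothetical stand-in for a vector store: ranks chunks by word overlap."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self._words(query)
        # Most shared words first; a real store would compare embeddings instead
        ranked = sorted(self.chunks, key=lambda c: len(q & self._words(c)), reverse=True)
        return ranked[:k]


class EchoLLM:
    """Hypothetical stand-in for an LLM client: returns the prompt unchanged."""

    def invoke(self, prompt: str) -> str:
        return prompt


vector_db = ToyVectorDB([
    "RAG retrieves documents before generating an answer.",
    "BM25 is a keyword ranking function.",
    "Bananas are rich in potassium.",
])
llm = EchoLLM()


def basic_rag(query: str) -> str:
    chunks = vector_db.search(query, k=3)
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")


# Prints the assembled prompt, since EchoLLM does not generate anything
print(basic_rag("What does RAG do before generating?"))
```

Because the stand-in LLM only echoes, the output shows exactly what prompt a real model would receive, which is handy for debugging the retrieval step in isolation.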

2. Advanced RAG

Adds pre-retrieval and post-retrieval improvements:

python
def advanced_rag(query: str) -> str:
    # PRE-RETRIEVAL: Query transformation
    expanded_query = llm.invoke(f"Rewrite for better retrieval: {query}")
    hypothetical = llm.invoke(f"Generate a hypothetical answer to: {query}")

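    # RETRIEVAL: Hybrid search (semantic on both rewrites, keyword on the expanded query)
    semantic_results = vector_db.search(expanded_query, k=5) + vector_db.search(hypothetical, k=5)
    keyword_results = bm25_index.search(expanded_query, k=5)
    # merge_results is assumed to fuse the rankings and drop duplicate chunks
    combined = merge_results(semantic_results, keyword_results)

    # POST-RETRIEVAL: Rerank against the original query
    # (cohere_reranker is assumed to be a wrapper that returns chunk strings)
    reranked = cohere_reranker.rerank(query, combined, top_n=3)
    context = "\n".join(reranked)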

    return llm.invoke(f"Context: {context}\n\nQ: {query}")
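The `merge_results` helper used above is not defined in the original. One common way to combine semantic and keyword rankings is reciprocal rank fusion (RRF). The following is a minimal sketch under that assumption; the constant `k=60` is a conventional default for RRF, not something the source specifies.

```python
from collections import defaultdict


def merge_results(*ranked_lists: list[str], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunks with reciprocal rank fusion (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        seen = set()
        for rank, chunk in enumerate(ranked, start=1):
            if chunk in seen:  # Count each chunk once per list
                continue
            seen.add(chunk)
            # Chunks ranked highly in any list receive a larger contribution
            scores[chunk] += 1.0 / (k + rank)
    # Highest fused score first; duplicates across lists collapse into one entry
    return sorted(scores, key=scores.get, reverse=True)


semantic = ["a", "b", "c"]
keyword = ["b", "d"]
fused = merge_results(semantic, keyword)
```

RRF only needs ranks, so it sidesteps the problem that vector similarity scores and BM25 scores live on different scales.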

3. Self-RAG

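In Self-RAG, the model emits special reflection tokens to decide whether to retrieve and to critique what it retrieved and generated. The code here approximates that behavior with plain prompts: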

python
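def self_rag(query: str) -> str:
    # Step 1: Does this need retrieval?
    decision = llm.invoke(f"Does answering '{query}' require looking up documents? YES/NO")
    if "YES" not in decision.upper():
        return llm.invoke(query)  # Direct answer from the model's training data

    chunks = vector_db.search(query, k=4)

    # Step 2: Keep only the chunks the model judges relevant (YES or PARTIALLY)
    relevant = []
    for chunk in chunks:
        verdict = llm.invoke(f"Is this relevant to '{query}'? {chunk[:200]} YES/NO/PARTIALLY")
        if not verdict.strip().upper().startswith("NO"):
            relevant.append(chunk)

    # Step 3: Generate, then check that the answer is supported by the context
    context = "\n".join(relevant)
    response = llm.invoke(f"Context: {context}\n\nQ: {query}")
    supported = llm.invoke(
        f"Is this answer fully supported by the context? YES/NO\n"
        f"Context: {context}\nAnswer: {response}"
    )
    if "YES" not in supported.upper():
        # One possible handling (an assumption): flag the answer instead of dropping it
        response += "\n\n(Note: may not be fully supported by the retrieved context.)"
    return response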

4. Corrective RAG (CRAG)

Evaluates retrieval quality and corrects if needed:

python
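import re

def corrective_rag(query: str) -> str:
    # 1. Retrieve
    chunks = vector_db.search(query, k=4)

    # 2. Evaluate quality (ask for a bare number so the reply is easy to parse)
    joined = "\n".join(chunks)
    quality = llm.invoke(
        f"Rate relevance 1-5 of these chunks for '{query}'. "
        f"Reply with a single number.\n{joined}"
    )
    match = re.search(r"\b[1-5](?:\.\d+)?\b", quality)
    score = float(match.group()) if match else 0.0  # Unparseable reply counts as poor

    if score < 3:
        # 3. If poor quality, use web search as a fallback
        web_results = web_search(query)  # Assumed to return a list of text snippets
        chunks = chunks + web_results  # Combine

    # 4. Generate with (corrected) context
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")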

5. HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer first, embed it, and use that embedding (rather than the query's) for retrieval:

python
def hyde_rag(query: str) -> str:
    # Generate hypothetical answer
    hypothetical_answer = llm.invoke(
        f"Write a hypothetical document that would answer: {query}"
    )

    # Embed the hypothetical answer (not the query)
    hyp_embedding = embedder.encode(hypothetical_answer)

    # Search with the hypothetical embedding to find real docs that match
    chunks = vector_db.search_by_embedding(hyp_embedding, k=4)

    # Generate the real answer from the retrieved chunks
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")
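To see why embedding a hypothetical answer can help, here is a small self-contained sketch. It is an illustration only: `embed` is a toy bag-of-words stand-in for a real encoder, `search_by_embedding` is a hypothetical in-memory version of the vector store call, and the hypothetical answer is hard-coded rather than generated by an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_by_embedding(query_vec: Counter, docs: list[str], k: int = 4) -> list[str]:
    """Return the k documents whose embeddings are closest to query_vec."""
    return sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)[:k]


docs = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "The mitochondria produce ATP through cellular respiration.",
]
query = "How do plants make food?"
# In a real pipeline this text would come from the LLM; it is hard-coded here.
hypothetical_answer = "Plants make food by photosynthesis, which converts light energy into glucose."

by_query = cosine(embed(query), embed(docs[0]))  # Question shares no terms with the doc
by_hyde = cosine(embed(hypothetical_answer), embed(docs[0]))  # Answer-like text overlaps
top = search_by_embedding(embed(hypothetical_answer), docs, k=1)
```

The question and the target document share no vocabulary, while the answer-shaped text does. Real dense encoders are far less literal than this toy, but the same intuition applies: answers tend to sit closer to documents than questions do.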

Choosing the Right RAG Type

| Use Case | Recommended |
|---|---|
| Simple Q&A | Basic RAG |
| High accuracy needed | Advanced RAG (hybrid + reranking) |
| Cost optimization | Self-RAG or CAG |
| Complex entity queries | Graph RAG |
| Unreliable retrieval | Corrective RAG |
| Autonomous research | Agentic RAG |
| Dense technical docs | HyDE |
| Stable small knowledge base | CAG |

RAG Evaluation Metrics

| Metric | Measures |
|---|---|
| Faithfulness | Is answer grounded in retrieved context? |
| Answer relevance | Does answer address the question? |
| Context precision | Are retrieved chunks actually useful? |
| Context recall | Were all relevant chunks retrieved? |
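The two context metrics can be illustrated with simple set arithmetic when relevance labels are available. This is a simplified sketch, assuming each chunk has an ID and a hand-labeled set of relevant IDs exists; evaluation frameworks often use LLM judges or rank-weighted variants instead. Faithfulness and answer relevance usually need a judge model, so they are not shown.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


retrieved = ["c1", "c2", "c3", "c4"]  # Hypothetical chunk IDs returned by the retriever
relevant = {"c1", "c3", "c7"}  # Hypothetical ground-truth labels
precision = context_precision(retrieved, relevant)  # 2 of 4 retrieved are relevant
recall = context_recall(retrieved, relevant)  # 2 of 3 relevant were retrieved
```

Low precision suggests noisy retrieval that reranking might fix; low recall suggests the retriever is missing material, which points toward hybrid search or query rewriting.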