What are the different types of RAG?

Question

Accepted Answer

## Types of RAG (Retrieval-Augmented Generation)

RAG has evolved significantly from a simple retrieve-and-generate pattern into a family of specialized architectures.

### Core RAG Types

| Type | Description | Key Innovation |
|------|-------------|--------------|
| **Basic RAG** | Simple retrieve → generate pipeline | Foundation |
| **Advanced RAG** | Improved retrieval (pre/post processing) | Better retrieval |
| **Modular RAG** | Plug-and-play components | Flexibility |
| **Agentic RAG** | LLM controls retrieval decisions | Autonomy |
| **Graph RAG** | Knowledge graph + retrieval | Relationships |
| **Self-RAG** | Model decides when to retrieve | Efficiency |
| **Corrective RAG** | Evaluates and corrects retrieval | Quality |
| **CAG** (Cache-Augmented Generation) | Preload the full knowledge base into the context and cache it, instead of retrieving | Speed |
| **Speculative RAG** | Smaller model drafts answers, larger model verifies them | Accuracy |

### 1. Basic RAG

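```python
# Assumes `vector_db` and `llm` are client objects created elsewhere.
def basic_rag(query: str) -> str:
    chunks = vector_db.search(query, k=3)  # top-3 most similar chunks
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")
```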

One fixed pipeline: it always retrieves and always generates, whether or not the query needs external context.
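The snippet above assumes `vector_db` and `llm` already exist. To make the flow concrete, here is a minimal self-contained sketch with stand-ins: `ToyVectorDB` ranks documents by word overlap rather than real embeddings, and `EchoLLM` simply returns its prompt. Both classes are hypothetical placeholders for illustration, not any library's API.

```python
import re


class ToyVectorDB:
    """Hypothetical stand-in for a vector store: ranks docs by shared words."""

    def __init__(self, docs: list[str]):
        self.docs = docs

    @staticmethod
    def _tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = self._tokens(query)
        # sorted() is stable, so ties keep insertion order
        ranked = sorted(self.docs, key=lambda d: len(q & self._tokens(d)), reverse=True)
        return ranked[:k]


class EchoLLM:
    """Hypothetical stand-in for an LLM client: returns the prompt unchanged."""

    def invoke(self, prompt: str) -> str:
        return prompt


def basic_rag(query: str, vector_db: ToyVectorDB, llm: EchoLLM, k: int = 3) -> str:
    chunks = vector_db.search(query, k=k)
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")


db = ToyVectorDB([
    "Paris is the capital of France.",
    "BM25 is a keyword ranking function.",
    "The Eiffel Tower is in Paris.",
])
# Prints the assembled prompt, which shows exactly what the model would receive
print(basic_rag("What is the capital of France?", db, EchoLLM(), k=1))
```

Passing the store and the model in as arguments, instead of relying on globals, also makes the pipeline easy to test with fakes like these.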

### 2. Advanced RAG

Adds pre-retrieval and post-retrieval improvements:

```python
# Assumes llm, vector_db, bm25_index, merge_results, and cohere_reranker are
# defined elsewhere, and that rerank() returns the chunk texts.
def advanced_rag(query: str) -> str:
    # PRE-RETRIEVAL: Query transformation
    expanded_query = llm.invoke(f"Rewrite for better retrieval: {query}")

    # RETRIEVAL: Hybrid search (dense + keyword)
    semantic_results = vector_db.search(expanded_query, k=5)
    keyword_results = bm25_index.search(expanded_query, k=5)
    combined = merge_results(semantic_results, keyword_results)

    # POST-RETRIEVAL: Rerank against the original query, keep the top 3
    reranked = cohere_reranker.rerank(query, combined, top_n=3)
    context = "\n".join(reranked)

    return llm.invoke(f"Context: {context}\n\nQ: {query}")
```
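`merge_results` is not defined in the snippet above. One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below is one possible implementation, assuming each result list is ordered best-first and holds hashable items such as chunk strings or IDs. The constant `k=60` is a conventional default, not something the original answer specifies.

```python
from collections import defaultdict


def merge_results(*rankings: list[str], k: int = 60) -> list[str]:
    """Fuse ranked lists with reciprocal rank fusion: score = sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_d", "doc_a"]
print(merge_results(semantic, keyword))
# -> ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF uses only rank positions, so it needs no score normalization between the vector index and BM25, whose raw scores are on different scales.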

### 3. Self-RAG

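In the original Self-RAG method, the model is trained to emit special reflection tokens that decide whether to retrieve and that critique relevance and support. The prompt-based version below approximates those three checks: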

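```python
def self_rag(query: str) -> str:
    # Step 1: Does this need retrieval?
    decision = llm.invoke(f"Does answering '{query}' require looking up documents? YES/NO")
    if "YES" not in decision.upper():
        return llm.invoke(query)  # Direct answer from the model's own knowledge

    chunks = vector_db.search(query, k=4)

    # Step 2: Is the retrieved content relevant? Drop chunks judged "NO".
    relevant = []
    for chunk in chunks:
        relevance = llm.invoke(f"Is this relevant to '{query}'? {chunk[:200]} YES/NO/PARTIALLY")
        if not relevance.strip().upper().startswith("NO"):
            relevant.append(chunk)

    # Step 3: Is the answer supported by retrieved content?
    context = "\n".join(relevant)
    response = llm.invoke(f"Context: {context}\n\nQ: {query}")
    supported = llm.invoke(
        f"Is this answer fully supported by the context? YES/NO\n"
        f"Context: {context}\nAnswer: {response}"
    )
    if supported.strip().upper().startswith("NO"):
        # Flag rather than silently return an unsupported answer
        response += "\n\n[Warning: not fully supported by the retrieved context]"
    return response
```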

### 4. Corrective RAG (CRAG)

Evaluates retrieval quality and corrects if needed:

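```python
import re

def corrective_rag(query: str) -> str:
    # 1. Retrieve
    chunks = vector_db.search(query, k=4)

    # 2. Evaluate quality
    quality = llm.invoke(
        f"Rate relevance 1-5 of these chunks for '{query}'. "
        f"Reply with a single number.\n{chunks}"
    )
    # The reply may contain extra text, so pull out the first number
    match = re.search(r"\d+(?:\.\d+)?", quality)
    score = float(match.group()) if match else 0.0  # unparseable: treat as poor

    if score < 3:
        # 3. If poor quality, use web search as fallback
        web_results = web_search(query)
        chunks = chunks + web_results  # Combine

    # 4. Generate with (corrected) context
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")
```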
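The snippet above uses a single cutoff. The CRAG paper describes three outcomes instead: keep the retrieved chunks when confidence is high, replace them with web results when it is low, and combine both when it falls in between. A minimal sketch of that routing follows. The thresholds, the `route_context` name, and the `fake_web` helper are illustrative assumptions, and the score is presumed to be already parsed to a number.

```python
from typing import Callable


def route_context(
    score: float,
    chunks: list[str],
    web_search: Callable[[], list[str]],
    upper: float = 4.0,  # assumed cutoff for "correct"
    lower: float = 2.0,  # assumed cutoff for "incorrect"
) -> tuple[str, list[str]]:
    """Pick the context to generate from, based on a 1-5 relevance score."""
    if score >= upper:
        return "correct", chunks                 # trust retrieval as-is
    if score <= lower:
        return "incorrect", web_search()         # discard retrieval
    return "ambiguous", chunks + web_search()    # keep both sources


def fake_web() -> list[str]:
    """Hypothetical stand-in for a web search call."""
    return ["web result"]


print(route_context(5, ["kb chunk"], fake_web))
print(route_context(3, ["kb chunk"], fake_web))
print(route_context(1, ["kb chunk"], fake_web))
```

Taking `web_search` as a callable means the search only runs when the score actually requires it.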

### 5. HyDE (Hypothetical Document Embeddings)

HyDE is a query-transformation technique: generate a hypothetical answer first, embed it, and use that embedding (instead of the query's) for retrieval:

```python
def hyde_rag(query: str) -> str:
    # Generate hypothetical answer
    hypothetical_answer = llm.invoke(
        f"Write a hypothetical document that would answer: {query}"
    )

    # Embed the hypothetical answer (not the query)
    hyp_embedding = embedder.encode(hypothetical_answer)

    # Search with the hypothetical embedding to find real docs that match
    chunks = vector_db.search_by_embedding(hyp_embedding, k=4)

    # Generate the real answer from the retrieved (real) documents
    context = "\n".join(chunks)
    return llm.invoke(f"Context: {context}\n\nQ: {query}")
```
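Why this can help: a short question often shares little vocabulary with the document that answers it, while an answer-shaped text shares more. The toy example below illustrates that effect with a bag-of-words stand-in for the embedder and a hand-written hypothetical answer in place of an LLM call. Both are assumptions for demonstration, and a real dense encoder will behave differently in detail.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_by_embedding(vec: Counter, docs: list[str], k: int = 1) -> list[str]:
    return sorted(docs, key=lambda d: cosine(vec, embed(d)), reverse=True)[:k]


docs = [
    "how do I contact support",
    "restart the service with systemctl restart nginx",
]
query = "how do I reload my web server"
# Hand-written stand-in for what an LLM might produce as a hypothetical answer
hypothetical = "run systemctl restart nginx to restart the web server"

# The raw query shares only question words with the corpus
print(search_by_embedding(embed(query), docs))
# The answer-shaped text shares content words with the right document
print(search_by_embedding(embed(hypothetical), docs))
```

Here the raw query retrieves the support page, while the hypothetical answer retrieves the nginx instructions.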

### Choosing the Right RAG Type

| Use Case | Recommended |
|---------|-------------|
| Simple Q&A | Basic RAG |
| High accuracy needed | Advanced RAG (hybrid + reranking) |
| Cost optimization | Self-RAG or CAG |
| Complex entity queries | Graph RAG |
| Unreliable retrieval | Corrective RAG |
| Autonomous research | Agentic RAG |
| Dense technical docs | HyDE |
| Stable small knowledge base | CAG |
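Agentic RAG, recommended above for autonomous research, has no example in this answer. As a rough sketch of the idea, the loop below lets the model choose at each step between searching again and answering. The `SEARCH:`/`ANSWER:` reply protocol, the `agentic_rag` signature, and the scripted stand-ins are all assumptions made for illustration, not a particular framework's API.

```python
from typing import Callable


def agentic_rag(
    query: str,
    llm: Callable[[str], str],
    search: Callable[[str], list[str]],
    max_steps: int = 3,
) -> str:
    """Let the model choose between searching again and answering."""
    notes: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {query}\nNotes so far: {notes}\n"
            "Reply with 'SEARCH: <query>' or 'ANSWER: <text>'."
        )
        reply = llm(prompt).strip()
        if reply.startswith("SEARCH:"):
            # The model asked for more evidence: run the search, keep the results
            notes.extend(search(reply[len("SEARCH:"):].strip()))
        else:
            return reply.removeprefix("ANSWER:").strip()
    # Step budget exhausted: force an answer from what was gathered
    return llm(f"Question: {query}\nNotes: {notes}\nAnswer now.").strip()


# Scripted stand-ins so the loop can run without a model or an index
replies = iter(["SEARCH: capital of France", "ANSWER: Paris"])
print(agentic_rag(
    "What is the capital of France?",
    llm=lambda prompt: next(replies),
    search=lambda q: [f"note about {q}"],
))
```

The `max_steps` budget matters in practice: without it, a model that keeps requesting searches would loop indefinitely and run up cost.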

### RAG Evaluation Metrics

| Metric | Measures |
|--------|---------|
| **Faithfulness** | Is answer grounded in retrieved context? |
| **Answer relevance** | Does answer address the question? |
| **Context precision** | Are retrieved chunks actually useful? |
| **Context recall** | Were all relevant chunks retrieved? |
