Concept #87 · Medium · extended-ai-concepts

What is the difference between Basic RAG and Agentic RAG?

#gen-ai #rag #agents

Answer

Basic RAG vs Agentic RAG

Basic RAG runs the same retrieve-then-generate pipeline for every query. Agentic RAG puts the LLM in control: retrieval becomes one of several tools the model can call, skip, or repeat.

Basic RAG

A fixed, linear pipeline — always retrieves, always generates:

text
Query → Embed → Search → Retrieve top-k → Generate
python
def basic_rag(question: str) -> str:
    results = vector_db.similarity_search(question, k=4)
    context = "\n".join([r.page_content for r in results])
    return llm.invoke(f"Context: {context}\n\nQuestion: {question}")
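The snippet above depends on a `vector_db` and an `llm` defined elsewhere. To make the fixed pipeline runnable end to end, here is a minimal offline sketch. The bag-of-words `embed`, the `DOCS` list, and `echo_llm` are hypothetical stand-ins for a real embedding model, vector store, and LLM; they are not part of any library.

```python
import math
import re
from collections import Counter
from typing import Callable

# Made-up placeholder documents standing in for an indexed corpus.
DOCS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Premium plans include priority support and a dedicated manager.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(question: str, k: int) -> list[str]:
    """Rank every document against the question and keep the top k."""
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def echo_llm(prompt: str) -> str:
    """Stub generator: returns the prompt so the retrieved context is visible."""
    return prompt


def basic_rag(question: str, k: int = 2, llm: Callable[[str], str] = echo_llm) -> str:
    # Always retrieve, always generate: no branching anywhere in the pipeline.
    context = "\n".join(search(question, k))
    return llm(f"Context: {context}\n\nQuestion: {question}")


print(basic_rag("How long is the refund window?", k=1))
```

Every query follows the same path regardless of whether retrieval helps, which is the property the limitations below refer to.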

Limitations:

  • Single retrieval step — misses multi-hop questions
  • Same process for all queries — no decision-making
  • Can't choose between data sources
  • No self-correction if retrieval quality is poor

Agentic RAG

The LLM decides when, what, and how many times to retrieve:

python
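from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool

# Assumed to be defined elsewhere: llm, vector_db, web_search, sql_query, react_prompt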

# Retrieval is just one tool the agent can choose
tools = [
    Tool(
        name="search_knowledge_base",
        description="Search internal docs for company-specific info",
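        # Return plain text so the agent sees passage contents, not Document reprs
        func=lambda q: "\n".join(
            d.page_content for d in vector_db.similarity_search(q, k=3)
        )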
    ),
    Tool(
        name="search_web",
        description="Search internet for current public information",
        func=web_search
    ),
    Tool(
        name="query_database",
        description="Get specific records from SQL database",
        func=sql_query
    )
]

# Agent decides: which tool? how many times? stop when confident
agent = create_react_agent(llm, tools, react_prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=10)
result = executor.invoke({"input": "Compare our Q3 revenue to industry average"})
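To show what "the agent decides" means mechanically, here is a framework-free sketch of the control loop that an agent executor runs. It is not LangChain's implementation. `scripted_policy` is a hypothetical stand-in for the LLM's reasoning step, and the tool outputs are made-up placeholder values.

```python
from typing import Callable

# Hypothetical tools: each maps a query string to a text observation.
# The figures are invented placeholders, not real data.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_knowledge_base": lambda q: "Internal docs: our Q3 revenue was $12M.",
    "search_web": lambda q: "Web: the industry average Q3 revenue was $10M.",
}


def scripted_policy(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: pick the next action from what has been seen so far.

    Returns (action, argument), where action is a tool name or "finish".
    """
    if len(observations) == 0:
        return "search_knowledge_base", "our Q3 revenue"
    if len(observations) == 1:
        return "search_web", "industry average Q3 revenue"
    return "finish", " ".join(observations)


def run_agent(
    question: str,
    policy: Callable[[str, list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> tuple[str, list[str]]:
    """Loop: ask the policy for an action, run the tool, feed the result back."""
    observations: list[str] = []
    trace: list[str] = []  # which tools were called, in order
    for _ in range(max_iterations):
        action, argument = policy(question, observations)
        if action == "finish":
            return argument, trace
        observations.append(tools[action](argument))
        trace.append(action)
    return "Stopped: iteration limit reached.", trace


answer, trace = run_agent(
    "Compare our Q3 revenue to industry average", scripted_policy, TOOLS
)
print(trace)
print(answer)
```

The number of retrievals is not fixed in advance: the loop ends when the policy returns `finish` or when `max_iterations` is hit, which is why latency and cost vary per query.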

Comparison

| Dimension | Basic RAG | Agentic RAG |
| --- | --- | --- |
| Control | Fixed pipeline | LLM decides |
| Retrievals | Exactly one | 0, 1, or many |
| Multi-hop | Poor | Excellent |
| Tool selection | No | Yes |
| Self-correction | No | Yes (re-retrieve on poor results) |
| Latency | Lower | Higher |
| Cost | Lower | Higher |
| Complexity | Low | High |

Agentic RAG Patterns

| Pattern | Description |
| --- | --- |
| Adaptive RAG | Agent decides if retrieval is even needed |
| Iterative RAG | Retrieve → assess → retrieve again if needed |
| Multi-source RAG | Agent chooses between vector DB, web, SQL |
| Self-RAG | Model generates `[Retrieve]` / `[No Retrieve]` tokens |
| Corrective RAG | Evaluate retrieved docs; search web if quality is poor |
| Query decomposition | Break complex query into sub-queries |
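As an illustration of the query decomposition pattern, here is a minimal sketch that splits a question into sub-questions, retrieves for each, and answers from the combined notes. `stub_llm`, `stub_retrieve`, and `DECOMPOSE_PROMPT` are hypothetical placeholders. A real system would call an actual model and retriever, and the prompt wording is only an assumption.

```python
from typing import Callable

DECOMPOSE_PROMPT = "Break the question into simpler sub-questions, one per line:"


def decompose(question: str, llm: Callable[[str], str]) -> list[str]:
    """Ask the LLM for sub-questions and strip list markers from each line."""
    raw = llm(f"{DECOMPOSE_PROMPT}\n{question}")
    cleaned = (line.strip().lstrip("-*• ").strip() for line in raw.splitlines())
    return [c for c in cleaned if c]


def decomposed_rag(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> str:
    # Fall back to the original question if the LLM returns nothing usable.
    sub_questions = decompose(question, llm) or [question]
    notes = []
    for sub_question in sub_questions:
        passages = retrieve(sub_question)  # one retrieval per sub-question
        notes.append(f"Sub-question: {sub_question}\n" + "\n".join(passages))
    prompt = "Notes:\n" + "\n\n".join(notes) + f"\n\nQuestion: {question}"
    return llm(prompt)


def stub_llm(prompt: str) -> str:
    """Stand-in LLM: canned decomposition, otherwise echoes the prompt."""
    if prompt.startswith(DECOMPOSE_PROMPT):
        return "- What was our Q3 revenue?\n- What was the industry average Q3 revenue?"
    return prompt


def stub_retrieve(query: str) -> list[str]:
    """Stand-in retriever: returns a labeled placeholder passage."""
    return [f"(passage retrieved for: {query})"]


print(decomposed_rag("Compare our Q3 revenue to industry average", stub_llm, stub_retrieve))
```

Each sub-question gets its own retrieval, so evidence that no single search would surface together can still reach the final prompt.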

Adaptive and Corrective RAG Decision Logic

python
def adaptive_rag(question: str) -> str:
    # Assumes llm.invoke returns a plain string (completion-style LLM);
    # with a chat model, read .content from the returned message instead.
    # First: does this question need retrieval?
    decision = llm.invoke(
        f"Does answering '{question}' require looking up specific documents? "
        "Answer YES or NO only."
    )

    if decision.strip().upper().startswith("YES"):
        # Retrieve, then ask the LLM to grade relevance
        docs = vector_db.similarity_search(question, k=5)
        context = "\n".join(d.page_content for d in docs)
        quality = llm.invoke(
            f"Are these chunks relevant to '{question}'? "
            f"Answer YES or NO only.\n\n{context}"
        )

        if quality.strip().upper().startswith("NO"):
            context = web_search(question)  # Fallback to web

        return llm.invoke(f"Context: {context}\n\nQuestion: {question}")
    else:
        return llm.invoke(question)  # Answer from training knowledge
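Routing on an LLM's YES/NO reply is only as reliable as the parsing. A plain substring test can misfire, since `"NO"` is also a substring of `"I DO NOT KNOW"`. Here is a minimal sketch of a stricter parser. `parse_yes_no` is a hypothetical helper name, and defaulting to retrieval on unclear replies is an assumed design choice, not a rule from any library.

```python
import re


def parse_yes_no(reply: str, default: bool = True) -> bool:
    """Return True for YES, False for NO, and `default` when the reply is unclear.

    Only a leading whole-word yes/no counts; surrounding punctuation such as
    markdown asterisks is skipped.
    """
    match = re.match(r"\s*\W*(yes|no)\b", reply, flags=re.IGNORECASE)
    if match is None:
        return default
    return match.group(1).lower() == "yes"


print(parse_yes_no("YES"))
print(parse_yes_no("No, these chunks are unrelated."))
print(parse_yes_no("I do not know"))  # unclear reply, falls back to default
```

Falling back to `True` means an ambiguous reply triggers retrieval, which costs an extra lookup but avoids answering from memory when documents were needed.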

When to Use Each

| Use Basic RAG | Use Agentic RAG |
| --- | --- |
| Simple Q&A, FAQ bot | Complex research tasks |
| Single knowledge source | Multiple data sources |
| Consistent, predictable queries | Diverse, open-ended queries |
| Low latency requirement | High accuracy requirement |
| Small team, prototype | Production enterprise |