Concept #26Mediumpython-for-gen-ai

Compare LangChain, LlamaIndex, and Haystack. When would you use each?

#gen-ai#langchain#llm

Answer

LangChain vs LlamaIndex vs Haystack

These are the three major Python frameworks for building LLM applications. Each has a different focus and sweet spot.

LangChain

Focus: General-purpose LLM application framework — chains, agents, tools, memory.

Best for: Agentic workflows, tool use, multi-step reasoning, complex pipelines.

python
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain import hub

llm = ChatOpenAI(model="gpt-4o")

tools = [
    Tool(name="Search", func=search_web, description="Search the internet"),
    Tool(name="Calculator", func=calculate, description="Perform math"),
]

prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "What is the current Bitcoin price in EUR?"})

Pros: Huge ecosystem, excellent agent/tool support, active community Cons: Abstraction can be leaky; frequent breaking changes between versions

LlamaIndex

Focus: Data ingestion, indexing, and retrieval — purpose-built for RAG.

Best for: Building robust RAG systems over complex document collections, structured data querying.

python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load and parse documents
documents = SimpleDirectoryReader("./data").load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)

# Build index (handles chunking, embedding, storage)
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[splitter],
)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is our refund policy?")
print(response)
print(response.source_nodes)  # See which chunks were used

Pros: Best-in-class RAG primitives, great data connectors, structured data support Cons: Less flexible for non-RAG use cases; smaller community than LangChain

Haystack

Focus: Production-grade NLP pipelines — especially search, Q&A, and document processing.

Best for: Enterprise document search, hybrid (keyword + semantic) search, batch processing.

python
from haystack import Pipeline
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

template = '''Given these documents, answer the question.
Documents: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{question}}
Answer:'''

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))
pipeline.connect("retriever", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")

result = pipeline.run({"retriever": {"query": "What is RAG?"}, "prompt_builder": {"question": "What is RAG?"}})

Pros: Excellent hybrid search, production-tested, clean pipeline abstraction Cons: Steeper learning curve; less agent/tool support

Comparison Table

FeatureLangChainLlamaIndexHaystack
Primary focusChains & agentsRAG & retrievalSearch pipelines
RAG supportGoodExcellentExcellent
Agent/tool useExcellentBasicLimited
Hybrid searchBasicGoodExcellent
Learning curveModerateModerateHigh
CommunityLargestGrowingSmaller
Production maturityImprovingImprovingHigh

When to Choose Each

  • LangChain — building autonomous agents, complex multi-step workflows, chatbots with tools
  • LlamaIndex — RAG over structured/unstructured documents, complex retrieval strategies
  • Haystack — enterprise search, hybrid retrieval, production document Q&A pipelines

Pragmatic advice: Start with LangChain for most projects — it's the most versatile. Switch to LlamaIndex if your RAG retrieval quality is the bottleneck. Consider Haystack if you need robust hybrid search at enterprise scale.