What is Graph RAG?

Answer

Graph RAG (Graph-based Retrieval-Augmented Generation) is an advanced RAG approach that uses a knowledge graph instead of (or in addition to) a vector database to organize and retrieve information, enabling multi-hop reasoning and better understanding of relationships between entities.

Why Graph RAG?

Standard RAG retrieves text chunks by embedding similarity. It works well for direct questions but struggles with:

  • Multi-hop questions: "Who is the CEO of the company that acquired OpenAI's competitor?"
  • Relational questions: "What are all the products from companies that Apple acquired?"
  • Structural queries: "How does entity A connect to entity B?"

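Graph RAG addresses these cases by storing information as a knowledge graph in which entities are nodes and relationships are edges, so a query can follow edges across several facts instead of relying on a single similar chunk.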
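
As a rough illustration of the relational question above, the sketch below answers "all products from companies that X acquired" with a two-hop lookup over stored triples. It uses only the standard library. The company and product names are hypothetical placeholders, not real acquisition data, and `build_index` and `two_hop` are helper names invented for this example.

```python
from collections import defaultdict

# Toy triples (subject, relation, object). All names are hypothetical placeholders.
TRIPLES = [
    ("AcmeCorp", "acquired", "StartupA"),
    ("AcmeCorp", "acquired", "StartupB"),
    ("StartupA", "makes", "WidgetOne"),
    ("StartupB", "makes", "WidgetTwo"),
    ("StartupB", "makes", "WidgetThree"),
    ("OtherCorp", "acquired", "StartupC"),
    ("StartupC", "makes", "GadgetX"),
]


def build_index(triples):
    """Index edges by (subject, relation) so each hop is a dictionary lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def two_hop(index, start, first_relation, second_relation):
    """Follow first_relation from start, then second_relation from each result."""
    results = []
    for middle in index.get((start, first_relation), []):
        for target in index.get((middle, second_relation), []):
            results.append((middle, target))
    return results


index = build_index(TRIPLES)
products = two_hop(index, "AcmeCorp", "acquired", "makes")
for company, product in products:
    print(f"{company} makes {product}")
```

A similarity search would need a single chunk that happens to mention the acquirer and the product together; the graph version only needs each fact to be stored once as an edge.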

Knowledge Graph Structure

text
Standard Vector RAG:
  Documents → Chunks → Embeddings → Vector DB
  Query → Similar chunks → LLM answers

Graph RAG:
  Documents → Entity extraction → Knowledge Graph
  Query → Graph traversal + vector search → LLM answers

text
Example Knowledge Graph:
  [Anthropic] --founded_by--> [Dario Amodei]
  [Anthropic] --created--> [Claude]
  [Claude] --is_a--> [LLM]
  [Dario Amodei] --previously_at--> [OpenAI]
  [OpenAI] --created--> [GPT-4]
  [GPT-4] --is_a--> [LLM]
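
To make the structural query "How does entity A connect to entity B?" concrete, here is a minimal sketch that loads the example edges above and finds a shortest connection with breadth-first search. It uses only the standard library and ignores edge direction when searching, which is an assumption made for this illustration; `undirected_adjacency` and `shortest_path` are names invented here.

```python
from collections import deque

# The example edges from the text, as (source, relation, target).
EDGES = [
    ("Anthropic", "founded_by", "Dario Amodei"),
    ("Anthropic", "created", "Claude"),
    ("Claude", "is_a", "LLM"),
    ("Dario Amodei", "previously_at", "OpenAI"),
    ("OpenAI", "created", "GPT-4"),
    ("GPT-4", "is_a", "LLM"),
]


def undirected_adjacency(edges):
    """Map each node to its neighbors, ignoring edge direction."""
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of one shortest path, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):  # sorted for a deterministic order
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = undirected_adjacency(EDGES)
path = shortest_path(adjacency, "Anthropic", "OpenAI")
print(" - ".join(path))  # Anthropic - Dario Amodei - OpenAI
```

The returned path crosses two edges (founded_by, then previously_at), which is the kind of multi-hop connection that no single text chunk needs to state explicitly.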

Microsoft GraphRAG (Popular Implementation)

Microsoft released GraphRAG as an open-source library:

bash
pip install graphrag
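
python
# GraphRAG workflow (illustrative only: module paths and parameters differ
# between graphrag releases, so check the documentation for the installed
# version; the await calls must run inside an async function)
from graphrag.index import create_pipeline_config
from graphrag.index.run import run_pipeline_with_config

# 1. Indexing: extract entities and relationships from documents
await run_pipeline_with_config(
    config=create_pipeline_config(
        input_dir="./documents",
        output_dir="./graphrag_output",
        llm_model="gpt-4o"
    )
)

# 2. Query: local search starts from the entities related to the question;
#    global search answers from summaries that span the whole graph
from graphrag.query.indexer_adapters import read_indexer_entities
from graphrag.query.llm.oai.chat_openai import ChatOpenAI

# local_search and global_search are assumed to be search engine objects
# already built from the index output; their construction is omitted here.

# Local search (specific entity questions)
result = await local_search.asearch(
    "What are Anthropic's main AI safety initiatives?"
)

# Global search (broad thematic questions)
result = await global_search.asearch(
    "What are the major themes in AI safety research?"
)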

Building a Simple Graph RAG

python
import json

import chromadb
import networkx as nx
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

class SimpleGraphRAG:
    def __init__(self):
        self.graph = nx.DiGraph()
        # In-memory Chroma collection; get_or_create avoids an error on reuse
        self.vector_db = chromadb.Client().get_or_create_collection("docs")

    def add_document(self, doc_id: str, text: str):
        # 1. Extract entities and relationships using LLM
        extraction = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content":
                "Extract entities and relationships from this text. "
                "Reply with JSON only, in the form "
                '{"entities": ["..."], "relationships": [["entity1", "relation", "entity2"]]}'
                f"\n\n{text}"}]
        )
        # Keep only the outermost {...} in case the model adds text around the JSON
        raw = extraction.content[0].text
        data = json.loads(raw[raw.find("{"):raw.rfind("}") + 1])

        # 2. Add to knowledge graph
        for entity in data.get("entities", []):
            self.graph.add_node(entity)
        for rel in data.get("relationships", []):
            if len(rel) == 3:
                self.graph.add_edge(rel[0], rel[2], relation=rel[1])

        # 3. Also add to vector store for similarity search
        self.vector_db.add(documents=[text], ids=[doc_id])

    def query(self, question: str) -> str:
        # 1. Vector search for relevant chunks
        results = self.vector_db.query(query_texts=[question], n_results=3)

        # 2. Find graph entities mentioned in the question (simple substring match)
        question_lower = question.lower()
        entities_in_question = [
            node for node in self.graph.nodes
            if str(node).lower() in question_lower
        ]

        # 3. Collect one-hop relationships, outgoing and incoming, for each match
        graph_context = []
        for entity in entities_in_question:
            edges = list(self.graph.out_edges(entity, data="relation", default="relates_to"))
            edges += list(self.graph.in_edges(entity, data="relation", default="relates_to"))
            for source, target, relation in edges:
                graph_context.append(f"{source} --{relation}--> {target}")

        # 4. Combine vector + graph context (dict.fromkeys drops duplicate edges)
        context = "\n".join(results["documents"][0] + list(dict.fromkeys(graph_context)))

        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=512,
            messages=[{"role": "user", "content":
                f"Context:\n{context}\n\nQuestion: {question}"}]
        )
        return response.content[0].text
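
The query method above only collects relationships one hop away from each matched entity, so a question that needs two or three hops would miss part of the chain. One possible extension, sketched here with the standard library rather than networkx, is a depth-limited expansion from the matched entities. The function name `k_hop_triples` and the `max_hops` parameter are assumptions for this illustration, and edges are followed only in their stored direction.

```python
from collections import deque

# Example edges from the knowledge graph section, as (source, relation, target).
EDGES = [
    ("Anthropic", "founded_by", "Dario Amodei"),
    ("Anthropic", "created", "Claude"),
    ("Claude", "is_a", "LLM"),
    ("Dario Amodei", "previously_at", "OpenAI"),
    ("OpenAI", "created", "GPT-4"),
    ("GPT-4", "is_a", "LLM"),
]


def k_hop_triples(edges, seeds, max_hops=2):
    """Collect triples reachable within max_hops of the seed entities.

    Edges are followed in their stored direction; each node is expanded once.
    """
    outgoing = {}
    for source, relation, target in edges:
        outgoing.setdefault(source, []).append((relation, target))

    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(unique_seeds)
    queue = deque((seed, 0) for seed in unique_seeds)
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in outgoing.get(node, []):
            triples.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples


context = k_hop_triples(EDGES, ["Anthropic"], max_hops=2)
for source, relation, target in context:
    print(f"{source} --{relation}--> {target}")
```

With `max_hops=2` the context includes the edge from Dario Amodei to OpenAI, which a one-hop lookup from Anthropic would not reach. Larger limits pull in more of the graph, so the limit is a trade-off between recall and prompt size.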

Graph RAG vs Standard RAG

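| Aspect               | Standard RAG       | Graph RAG                   |
|----------------------|--------------------|-----------------------------|
| Structure            | Flat chunks        | Entities + relationships    |
| Multi-hop            | Poor               | Excellent                   |
| Relationship queries | Poor               | Excellent                   |
| Setup complexity     | Low                | High                        |
| Cost                 | Lower              | Higher (entity extraction)  |
| Best for             | Direct fact lookup | Complex reasoning, analysis |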
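
The cost difference comes mainly from indexing. A back-of-the-envelope sketch, assuming one generative extraction call per chunk as in the simple implementation above and no generative calls for plain embedding-based indexing, might look like this. The function name and the chunk count are hypothetical, and real pipelines may add further calls (for example, for summarization).

```python
def indexing_llm_calls(num_chunks, extraction_calls_per_chunk=1):
    """Rough count of generative LLM calls needed to index a corpus.

    Standard RAG only embeds chunks, so it is counted as zero generative calls.
    Graph RAG additionally runs entity/relationship extraction on each chunk.
    """
    return {
        "standard_rag": 0,
        "graph_rag": num_chunks * extraction_calls_per_chunk,
    }


estimate = indexing_llm_calls(num_chunks=1_000)
print(estimate)  # {'standard_rag': 0, 'graph_rag': 1000}
```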