Concept #122 · Easy · extended-ai-concepts

What is a chunk in an AI agent or AI model?

#gen-ai #rag

Answer

What is a Chunk in AI Agents and AI Models?

A chunk is a segment of text that has been split from a larger document for processing, embedding, or retrieval. Chunking is a core technique in retrieval-augmented generation (RAG) systems and in any workflow where the text is too large to process at once.

Why Chunking?

Two fundamental constraints make chunking necessary:

  1. Context window limit: LLMs can only process a fixed number of tokens at once, so a large document cannot be passed in whole.
  2. Embedding quality: a single embedding of a very long text blends many topics and loses specificity; shorter, focused chunks embed more accurately.
text
Full 50-page document (150,000 tokens)
         ↓ chunk
Chunk 1: paragraphs 1-3   (500 tokens)
Chunk 2: paragraphs 4-6   (500 tokens)
Chunk 3: paragraphs 7-9   (500 tokens)
...
Chunk N: last paragraphs  (400 tokens)
         ↓ embed each chunk
         ↓ store in vector DB
         ↓ retrieve relevant chunks per query
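
The flow above (chunk, embed, store, retrieve) can be sketched in plain Python. This is a minimal illustration under loud assumptions: `embed` is a toy word-count vector standing in for a real embedding model, the "vector DB" is just a list, and all function names here are hypothetical rather than taken from any library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Stand-in for a vector DB: a list of (chunk, embedding) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Pretend these chunks came out of a chunking step.
chunks = [
    "Artificial Intelligence is the simulation of human intelligence by machines.",
    "Machine Learning is a subset of AI that learns patterns from data.",
    "Vector databases store embeddings and support similarity search.",
]
index = build_index(chunks)
top = retrieve(index, "How does machine learning learn from data?", k=1)
print(top[0])
```

Only the best-matching chunk is returned to the model, which is how the pipeline keeps the prompt within the context window.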

Chunking Strategies

| Strategy | How | Best For |
|---|---|---|
| Fixed-size | Split every N tokens | Simple, consistent |
| Sliding window | Fixed size with overlap | Avoid losing context at boundaries |
| Sentence-based | Split at sentence boundaries | More natural, readable chunks |
| Paragraph-based | Split at paragraph boundaries | Preserves natural thought units |
| Semantic | Split when topic changes | Most accurate, harder to implement |
| Recursive | Try paragraph → sentence → word | LangChain's RecursiveCharacterTextSplitter |
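
To make "split when topic changes" concrete, here is a rough sketch of semantic chunking. It assumes a naive regex sentence splitter and a toy word-count similarity in place of real embeddings; the `threshold` value and all function names are hypothetical choices for illustration, not a standard algorithm.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_vector(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text: str, threshold: float = 0.1) -> list[str]:
    """Start a new chunk whenever a sentence is dissimilar to the one before it."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(word_vector(prev), word_vector(curr)) < threshold:
            groups.append([curr])      # similarity dropped: treat as a topic change
        else:
            groups[-1].append(curr)    # same topic: extend the current chunk
    return [" ".join(group) for group in groups]


text = (
    "Cats are small pets. Cats sleep most of the day. "
    "Python is a programming language. Python code runs in an interpreter."
)
result = semantic_chunks(text)
for chunk in result:
    print(chunk)
```

With a real embedding model the same loop applies; only `word_vector` and the threshold would need to change.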

Implementation Examples

python
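# Requires: pip install langchain-text-splitters
# (older LangChain releases exposed this class as langchain.text_splitter)
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Recursive chunking (a common default in LangChain-based pipelines)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,          # Max characters per chunk (length_function defaults to len, not a tokenizer)
    chunk_overlap=50,        # Characters shared between neighboring chunks (reduces boundary loss)
    separators=["\n\n", "\n", " ", ""]  # Try these separators in order
)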

document = '''
Chapter 1: Introduction to AI

Artificial Intelligence is the simulation of human intelligence...

Chapter 2: Machine Learning

Machine Learning is a subset of AI that learns from data...
'''

chunks = splitter.split_text(document)
print(f"Created {len(chunks)} chunks")
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i}: {len(chunk)} chars — '{chunk[:60]}...'")

Fixed-Size Chunking with Overlap

python
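def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of chunk_size words; neighbors share 'overlap' words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()  # Sizes are counted in whitespace-separated words, not model tokens
    chunks = []
    start = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break  # Last chunk reached; avoids a trailing chunk that only repeats the overlap
        start += chunk_size - overlap  # Step forward, keeping 'overlap' words of context

    return chunks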

chunks = chunk_text(document, chunk_size=100, overlap=20)
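
Overlap is not free: it increases the number of chunks to embed and store. The sketch below works out the count arithmetically and checks it against a simple sliding window. It assumes the window stops as soon as it reaches the end of the text; `expected_chunk_count` and `sliding_windows` are hypothetical helper names.

```python
import math


def expected_chunk_count(n_units: int, chunk_size: int, overlap: int) -> int:
    """Number of windows needed to cover n_units (words or tokens)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if n_units <= 0:
        return 0
    step = chunk_size - overlap  # how far each window advances
    return max(1, math.ceil((n_units - overlap) / step))


def sliding_windows(n_units: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """(start, end) index pairs, stopping once a window reaches the end."""
    windows, start = [], 0
    step = chunk_size - overlap
    while start < n_units:
        end = min(start + chunk_size, n_units)
        windows.append((start, end))
        if end == n_units:
            break
        start += step
    return windows


for n, size, overlap in [(1000, 100, 20), (1000, 100, 0), (150_000, 500, 50)]:
    count = expected_chunk_count(n, size, overlap)
    # The closed-form count should agree with actually sliding the window.
    assert count == len(sliding_windows(n, size, overlap))
    print(f"{n} units, size={size}, overlap={overlap}: {count} chunks")
```

For 1,000 words and 100-word chunks, a 20-word overlap yields 13 chunks instead of 10, so the overlap setting trades storage and embedding cost for continuity at chunk boundaries.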

Chunk Size Guidelines

| Chunk Size | Trade-offs |
|---|---|
| 128-256 tokens | High-precision retrieval, but chunks may lack surrounding context |
| 512 tokens | Common starting point; balances precision and context |
| 1024 tokens | More context per chunk, less precise retrieval |
| 2048+ tokens | Retrieval accuracy is likely to suffer |

Parent-Child Chunking

An advanced pattern: index small child chunks for retrieval, then return the larger parent chunk each match came from as context for the LLM:

python
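from langchain.retrievers import ParentDocumentRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Assumes chroma_db (a vector store) and in_memory_store (a document store)
# have already been created elsewhere.
# Child chunks: small (128 characters) for precise retrieval
# Parent chunks: large (512 characters) returned as context
# Note: chunk_size counts characters unless a token-based length function is supplied.

retriever = ParentDocumentRetriever(
    vectorstore=chroma_db,
    docstore=in_memory_store,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=128),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=512)
)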
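
The idea behind parent-child retrieval can also be shown without any library. This is a simplified sketch, not how LangChain implements it: chunks are sized in words, matching uses plain word overlap instead of embeddings, and every function name is hypothetical.

```python
import re


def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` words (no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_parent_child_index(text: str, parent_size: int, child_size: int):
    """Return (parents, children); each child is stored with its parent's index."""
    parents = split_words(text, parent_size)
    children = [
        (child, parent_id)
        for parent_id, parent in enumerate(parents)
        for child in split_words(parent, child_size)
    ]
    return parents, children


def retrieve_parent(query: str, parents: list[str], children: list[tuple[str, int]]) -> str:
    """Score the small child chunks, then return the parent of the best match."""
    query_words = tokenize(query)
    _, best_parent_id = max(
        children, key=lambda item: len(query_words & tokenize(item[0]))
    )
    return parents[best_parent_id]


text = (
    "Artificial Intelligence is the simulation of human intelligence by machines. "
    "It covers reasoning, planning and perception. "
    "Machine Learning is a subset of AI that learns from data. "
    "Models improve with labeled examples."
)
parents, children = build_parent_child_index(text, parent_size=16, child_size=4)
context = retrieve_parent("Which models improve with labeled examples?", parents, children)
print(context)
```

The query matches a four-word child chunk, but the caller receives the full 16-word parent, which carries the surrounding sentences an LLM would need to answer.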

Key Insight

Chunk quality directly determines RAG quality. Poor chunking is one of the most common reasons RAG systems underperform: chunks that are too large dilute retrieval precision, while chunks that are too small lack the context needed to answer the question.