Answer
What is a Chunk in AI Agents and AI Models?
A chunk is a segment of text that has been split from a larger document for processing, embedding, or retrieval. Chunking is a core technique in RAG systems and any workflow where text is too large to process at once.
Why Chunking?
Two fundamental constraints make chunking necessary:
- Context window limit: an LLM can process only a fixed number of tokens per request
- Embedding quality: a single vector for a very long text blends many topics and loses specificity; shorter chunks produce more focused embeddings
```text
Full 50-page document (150,000 tokens)
        ↓ chunk
Chunk 1: paragraphs 1-3 (500 tokens)
Chunk 2: paragraphs 4-6 (500 tokens)
Chunk 3: paragraphs 7-9 (500 tokens)
...
Chunk N: last paragraphs (400 tokens)
        ↓ embed each chunk
        ↓ store in vector DB
        ↓ retrieve relevant chunks per query
```
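The flow above (chunk, embed, store, retrieve) can be sketched end to end with only the standard library. This is a minimal illustration, not a production setup: the word-count `embed` function is a stand-in assumption for a real embedding model, the in-memory list stands in for a vector DB, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text: str, size: int) -> list[tuple[str, Counter]]:
    """Chunk the text and keep (chunk, vector) pairs in memory."""
    return [(chunk, embed(chunk)) for chunk in chunk_words(text, size)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

A call such as `retrieve(build_index(text, 200), "your question", k=3)` would return the three closest chunks, which are then passed to the LLM as context.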
Chunking Strategies
| Strategy | How | Best For |
|---|---|---|
| Fixed-size | Split every N tokens | Simple pipelines that need uniform chunk sizes |
| Sliding window | Fixed size with overlap | Avoid losing context at boundaries |
| Sentence-based | Split at sentence boundaries | More natural, readable chunks |
| Paragraph-based | Split at paragraph boundaries | Preserves natural thought units |
| Semantic | Split when topic changes | Most accurate, harder to implement |
| Recursive | Try separators in order: paragraph → line → word → character | General-purpose text (e.g. LangChain's RecursiveCharacterTextSplitter) |
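As an illustration of the sentence-based row in the table above, here is a minimal sketch that packs whole sentences into chunks under a word budget. The regex splitter is a naive assumption (it will mis-split abbreviations such as "e.g."), and the function names are hypothetical; a real pipeline would likely use a proper sentence tokenizer.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_chunks(text: str, max_words: int = 100) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because chunks only ever end at sentence boundaries, no sentence is cut in half, at the cost of chunk sizes that vary with sentence length.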
Implementation Examples
```python
# Newer LangChain releases expose this class from the langchain_text_splitters package
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Recursive chunking: try coarse separators first, fall back to finer ones
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,      # Max chunk length in characters (the default length function is len)
    chunk_overlap=50,    # Characters shared between neighbouring chunks (avoids boundary loss)
    separators=["\n\n", "\n", " ", ""],  # Try these separators in order
)

document = '''
Chapter 1: Introduction to AI
Artificial Intelligence is the simulation of human intelligence...

Chapter 2: Machine Learning
Machine Learning is a subset of AI that learns from data...
'''

chunks = splitter.split_text(document)
print(f"Created {len(chunks)} chunks")
for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i}: {len(chunk)} chars - '{chunk[:60]}...'")
```
Fixed-Size Chunking with Overlap
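```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break  # Last chunk reached; avoid emitting a redundant tail
        start += chunk_size - overlap  # Step forward, keeping `overlap` words
    return chunks

# `document` is the sample string defined in the previous example
chunks = chunk_text(document, chunk_size=100, overlap=20)
```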
Chunk Size Guidelines
| Chunk Size | Trade-offs |
|---|---|
| 128-256 tokens | High precision retrieval, may miss context |
| 512 tokens | Common starting point; balances retrieval precision and context |
| 1024 tokens | More context per chunk, less precise retrieval |
| 2048+ tokens | Each chunk tends to mix several topics, so retrieval accuracy can drop |
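To make the trade-off concrete, the number of chunks a sliding window produces can be worked out from the stride (`chunk_size - overlap`). The helper below is a hypothetical sketch, not part of any library, and the 10% overlap in the loop is an arbitrary assumption; it uses the 150,000-token document from the earlier example.

```python
import math


def chunk_count(total_tokens: int, chunk_size: int, overlap: int = 0) -> int:
    """Number of chunks a sliding window produces over total_tokens tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if total_tokens <= chunk_size:
        return 1 if total_tokens > 0 else 0
    stride = chunk_size - overlap
    # The first chunk covers chunk_size tokens; each later chunk adds `stride` new ones
    return 1 + math.ceil((total_tokens - chunk_size) / stride)


# Compare how many chunks (and therefore embeddings) each size would need
for size in (128, 512, 2048):
    print(size, chunk_count(150_000, size, overlap=size // 10))
```

Smaller chunks mean more vectors to embed, store, and search, which is a cost to weigh alongside retrieval precision.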
Parent-Child Chunking
An advanced pattern: index small child chunks for precise retrieval, then return the larger parent chunk that contains the match as context:
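```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Child chunks: small (128 characters) for precise retrieval
# Parent chunks: larger (512 characters) returned as context
# chroma_db and in_memory_store are assumed to be created elsewhere:
# a vector store for the child chunks and a document store for the parents
retriever = ParentDocumentRetriever(
    vectorstore=chroma_db,
    docstore=in_memory_store,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=128),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=512),
)
```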
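The idea behind the pattern can also be shown without any library. The sketch below is a simplified assumption of how parent-child lookup works, not LangChain's implementation: chunks are measured in words, relevance is a toy word-overlap score instead of vector similarity, and all names are hypothetical.

```python
def split_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_parent_child_index(text: str, parent_size: int, child_size: int):
    """Return (parents, children); each child is a (child_text, parent_id) pair."""
    parents = split_words(text, parent_size)
    children = [
        (child, parent_id)
        for parent_id, parent in enumerate(parents)
        for child in split_words(parent, child_size)
    ]
    return parents, children


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct query words found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parent(query: str, parents: list[str], children: list[tuple[str, int]]) -> str:
    """Match the query against small child chunks, then return the full parent."""
    _, parent_id = max(children, key=lambda c: overlap_score(query, c[0]))
    return parents[parent_id]
```

The search runs over the small, focused children, while the caller receives the larger parent, so the LLM sees the surrounding context of the matched passage.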
Key Insight
Chunk quality directly determines RAG quality. Poor chunking is one of the most common reasons RAG systems underperform: chunks that are too large lose retrieval precision, and chunks that are too small lose the context needed to answer.