Glossary
RAG & Retrieval · Foundational

Chunking

Breaking large documents into smaller, semantically meaningful text segments for retrieval.

Definition

Chunking is the process of splitting large documents into smaller, self-contained text segments that can be independently embedded, indexed, and retrieved. It is a critical preprocessing step in any RAG pipeline because embedding models and LLM context windows have token limits, and retrieval quality depends heavily on how well chunks align with meaningful units of information.

Key characteristics of chunking include:

  1. Fixed-Size vs. Semantic Chunking: Simple approaches split text by character or token count with overlap. More sophisticated methods use sentence boundaries, paragraph breaks, or section headings to create chunks that preserve semantic coherence.

  2. Chunk Size Trade-Offs: Smaller chunks provide more precise retrieval but may lack sufficient context. Larger chunks carry more context but risk diluting relevance. Typical sizes range from 256 to 1024 tokens depending on the use case.

  3. Overlap Strategy: Adjacent chunks often share overlapping text (e.g., 50-100 tokens) to prevent important information at chunk boundaries from being lost during retrieval.

  4. Hierarchical Chunking: Advanced strategies create parent-child relationships between chunks, allowing retrieval of a small relevant section while providing the surrounding context from its parent chunk to the LLM.
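The fixed-size and overlap strategies above (items 1-3) can be sketched in a few lines. This is a minimal illustration that splits on whitespace words as a stand-in for tokens; a real pipeline would use the embedding model's own tokenizer, and the `chunk_text` helper name and its defaults are illustrative, not from any particular library.

```python
def chunk_text(text, chunk_size=256, overlap=50):
    """Split text into fixed-size chunks of `chunk_size` words,
    where adjacent chunks share `overlap` words so information
    at chunk boundaries is not lost."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window reached the end of the document
    return chunks

# Example: a 600-word document yields three chunks, and the last 50
# words of each chunk reappear as the first 50 words of the next.
chunks = chunk_text(" ".join(f"w{i}" for i in range(600)))
```

Tuning `chunk_size` and `overlap` per corpus is usually where the retrieval-quality gains come from: smaller windows sharpen precision, larger ones preserve context, exactly the trade-off described in item 2.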

An effective chunking strategy is one of the highest-leverage optimizations in RAG system design.
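As a concrete illustration of the hierarchical strategy described in item 4, the sketch below builds small child chunks for retrieval while keeping a pointer to the larger parent chunk that gets passed to the LLM. It again uses whitespace words in place of tokens, and the function name and sizes are illustrative assumptions rather than a specific library's API.

```python
def hierarchical_chunks(text, parent_size=400, child_size=100):
    """Split text into large parent chunks, then split each parent
    into small child chunks. Children are what gets embedded and
    retrieved; each record carries its parent's text so the LLM can
    be given the surrounding context at generation time."""
    words = text.split()
    index = []
    for p_start in range(0, len(words), parent_size):
        parent_words = words[p_start:p_start + parent_size]
        parent_text = " ".join(parent_words)
        for c_start in range(0, len(parent_words), child_size):
            child_text = " ".join(parent_words[c_start:c_start + child_size])
            index.append({"child": child_text, "parent": parent_text})
    return index

# Example: a 500-word document yields two parents (400 + 100 words)
# and five children; retrieval matches a child, the LLM sees its parent.
records = hierarchical_chunks(" ".join(f"w{i}" for i in range(500)))
```

At query time, the system would embed and search the `child` texts for precision, then substitute the matching record's `parent` text into the prompt for context.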

Last updated: 3/11/2026