RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is an AI framework that improves large language model (LLM) outputs by supplying the model with information retrieved from external sources at query time. Instead of relying solely on what the model learned during training, a RAG system first retrieves relevant documents or data from a knowledge base, then passes that material to the model as context so it can generate more accurate and grounded responses.

The RAG architecture typically consists of three main components:

Retriever: Searches a vector database or document store for information relevant to the user's query. Common retrieval methods include dense vector search (comparing embeddings of the query and the documents) and sparse keyword-based search (such as BM25); the two are often combined.
Knowledge Base: A collection of documents, databases, or other information sources that the retriever can search. This can include internal documentation, product catalogs, research papers, or any domain-specific content.
Generator: The LLM that receives both the original query and the retrieved context to produce a final response that is grounded in the provided information.
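
The interaction between the three components can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the KNOWLEDGE_BASE contents, the function names, and the use of word-count cosine similarity as a simple stand-in for a real embedding or BM25 retriever. The generator step is shown only as prompt assembly, since the call to an actual LLM depends on the provider.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would use a document
# store or vector database instead of an in-memory list.
KNOWLEDGE_BASE = [
    "The return window for all products is 30 days from delivery.",
    "Premium support is available on weekdays from 9am to 5pm.",
    "Firmware updates are released on the first Monday of each month.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Retriever: return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the generator's input: retrieved context plus the query."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


question = "How many days is the return window?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE, k=1))
```

The resulting prompt string would then be sent to whichever LLM serves as the generator. Swapping the word-count similarity for embedding-based search changes only the retrieve function; the rest of the flow stays the same.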

RAG is particularly valuable for enterprise applications where accuracy, source attribution, and access to proprietary or frequently updated information are critical. It helps reduce hallucinations and enables models to provide verifiable answers with citations.
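
One way source attribution can work in practice is to number each retrieved passage in the context and ask the model to cite those numbers, then map the markers in its answer back to the underlying sources. The sketch below assumes a hypothetical convention of bracketed markers such as [1]; the function names and the sample file names are illustrative, not part of any particular library.

```python
import re


def format_context(passages):
    """Render (source_id, text) pairs as a numbered context block."""
    return "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(passages, start=1)
    )


def cited_sources(answer, passages):
    """Map [n] markers in an answer back to source ids.

    Markers that point outside the retrieved passages are ignored,
    and each source is reported once, in order of first citation.
    """
    sources = []
    for marker in re.findall(r"\[(\d+)\]", answer):
        index = int(marker)
        if 1 <= index <= len(passages):
            source = passages[index - 1][0]
            if source not in sources:
                sources.append(source)
    return sources


# Hypothetical retrieved passages and a hypothetical model answer.
passages = [
    ("policy.md", "Returns are accepted within 30 days of delivery."),
    ("support.md", "Support is available on weekdays."),
]
answer = "Returns are accepted within 30 days of delivery [1]."
sources = cited_sources(answer, passages)
```

A check like this only confirms that a citation points at a retrieved passage; whether the passage actually supports the claim still has to be verified separately.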

Definition