RAG

RAG (Retrieval-Augmented Generation) is an AI framework that combines language models with external knowledge retrieval. Instead of relying solely on the model's training data, RAG enhances AI responses by retrieving relevant information from a separate knowledge base before generating the final output.

How RAG Works

  1. Retrieval: When a query is received, RAG searches through its knowledge base to find relevant documents or information
  2. Augmentation: The retrieved information is combined with the original query
  3. Generation: The language model generates a response using both the query and the retrieved context (see the sketch below)
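
These three steps map onto a small pipeline. The sketch below is a minimal illustration: it uses an in-memory document list as the knowledge base, a naive keyword-overlap retriever, and a placeholder generate function standing in for a real language-model call; all names are illustrative rather than any specific library's API.

```python
# Minimal sketch of the retrieve -> augment -> generate loop.
# The knowledge base is an in-memory list and `generate` is a stand-in
# for a real LLM call; a production system would use a vector store
# and an actual model client.

KNOWLEDGE_BASE = [
    "RAG combines retrieval with text generation.",
    "BM25 is a standard lexical ranking function.",
    "Vector search compares embedding similarity.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """1. Retrieval: score documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc) for doc in KNOWLEDGE_BASE]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def augment(query: str, passages: list[str]) -> str:
    """2. Augmentation: combine the retrieved passages with the original query."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """3. Generation: placeholder for a call to an actual language model."""
    return f"[model response to a {len(prompt)}-character prompt]"

question = "What does RAG combine?"
print(generate(augment(question, retrieve(question))))
```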

Benefits

  • More accurate and up-to-date responses
  • Reduced hallucinations as responses are grounded in retrieved facts
  • Ability to access domain-specific knowledge
  • Transparent source attribution
  • Faster responses than approaches that keep every relevant fact in the model's context window

Downsides

  • Requires additional setup for data indexing and retrieval
  • Higher infrastructure and operational costs due to additional components
  • More complex system architecture and maintenance
  • Performance depends heavily on retrieval quality
  • May introduce latency due to retrieval step
  • Requires careful tuning of retrieval algorithms
  • Data freshness and synchronization challenges
  • Limited by quality and coverage of knowledge base

Data retrieval algorithms

  • Semantic (vector) search - retrieves documents by embedding similarity, matching on meaning rather than exact wording
  • Lexical search - traditional keyword-based search; BM25 is the standard ranking function for building such a pipeline
  • Hybrid search - merges the results of the semantic and lexical pipelines and usually gives the best results (see the sketch below)
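
One common way to merge the two result lists in hybrid search is reciprocal rank fusion (RRF). The sketch below assumes two hypothetical ranked lists of document IDs, one from a vector-search pipeline and one from a BM25 pipeline; the function name and sample IDs are illustrative, and k=60 is the constant commonly used for RRF.

```python
# Merge semantic and lexical rankings with reciprocal rank fusion (RRF).
# The two input lists are hypothetical outputs of a vector-search pipeline
# and a BM25 pipeline; only document IDs are shown.

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Score each document as the sum of 1 / (k + rank) over every list it appears in."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_results = ["doc3", "doc1", "doc7"]  # ranked by embedding similarity
lexical_results = ["doc1", "doc5", "doc3"]   # ranked by BM25 score
print(reciprocal_rank_fusion([semantic_results, lexical_results]))
# ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear near the top of both lists (doc1 and doc3 here) end up ranked first, which is the behaviour hybrid search relies on.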