RAG

RAG (Retrieval-Augmented Generation) is an AI framework that combines language models with external knowledge retrieval. Instead of relying solely on the model's training data, RAG enhances AI responses by retrieving relevant information from a separate knowledge base before generating the final output.

How RAG Works

  1. Retrieval: When a query is received, RAG searches through its knowledge base to find relevant documents or information
  2. Augmentation: The retrieved information is combined with the original query
  3. Generation: The language model generates a response using both the query and the retrieved context (see the sketch below)
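
These three steps map onto a small pipeline. The sketch below is a minimal illustration: it uses an in-memory document list as the knowledge base, a naive keyword-overlap retriever, and a placeholder generate function standing in for a real language-model call; all names are illustrative rather than any specific library's API.

```python
# Minimal sketch of the retrieve -> augment -> generate loop.
# The knowledge base is an in-memory list and `generate` is a stand-in
# for a real LLM call; a production system would use a vector store
# and an actual model client.

KNOWLEDGE_BASE = [
    "RAG combines retrieval with text generation.",
    "BM25 is a standard lexical ranking function.",
    "Vector search compares embedding similarity.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """1. Retrieval: score documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc) for doc in KNOWLEDGE_BASE]
    scored.sort(reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def augment(query: str, passages: list[str]) -> str:
    """2. Augmentation: combine the retrieved passages with the original query."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """3. Generation: placeholder for a call to an actual language model."""
    return f"[model response to a {len(prompt)}-character prompt]"

question = "What does RAG combine?"
print(generate(augment(question, retrieve(question))))
```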

Benefits

  • More accurate and up-to-date responses
  • Reduced hallucinations as responses are grounded in retrieved facts
  • Ability to access domain-specific knowledge
  • Transparent source attribution
  • Faster responses than approaches that keep every relevant fact in the model's context window

Downsides

  • Requires additional setup for data indexing and retrieval
  • Higher infrastructure and operational costs due to additional components
  • More complex system architecture and maintenance
  • Performance depends heavily on retrieval quality
  • May introduce latency due to retrieval step
  • Requires careful tuning of retrieval algorithms
  • Data freshness and synchronization challenges
  • Limited by quality and coverage of knowledge base

Data retrieval algorithms

  • Semantic (vector) search - retrieves documents by embedding similarity, matching on meaning rather than exact wording
  • Lexical search - traditional keyword-based search; BM25 is the standard ranking function for building such a pipeline
  • Hybrid search - merges the results of the semantic and lexical pipelines and usually gives the best results (see the sketch below)
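
One common way to merge the two result lists in hybrid search is reciprocal rank fusion (RRF). The sketch below assumes two hypothetical ranked lists of document IDs, one from a vector-search pipeline and one from a BM25 pipeline; the function name and sample IDs are illustrative, and k=60 is the constant commonly used for RRF.

```python
# Merge semantic and lexical rankings with reciprocal rank fusion (RRF).
# The two input lists are hypothetical outputs of a vector-search pipeline
# and a BM25 pipeline; only document IDs are shown.

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Score each document as the sum of 1 / (k + rank) over every list it appears in."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic_results = ["doc3", "doc1", "doc7"]  # ranked by embedding similarity
lexical_results = ["doc1", "doc5", "doc3"]   # ranked by BM25 score
print(reciprocal_rank_fusion([semantic_results, lexical_results]))
# ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear near the top of both lists (doc1 and doc3 here) end up ranked first, which is the behaviour hybrid search relies on.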