
RAG (Retrieval-Augmented Generation): Prompting Guide & Examples

RAG combines an AI language model with an external knowledge retrieval system. Instead of relying solely on training data, the model first retrieves relevant documents from a database, then generates answers grounded in those specific sources.

How It Works

The pipeline has three stages: (1) a vector database stores your documents as embeddings; (2) when a query arrives, relevant documents are retrieved via semantic search; (3) the retrieved documents are injected into the prompt as context, and the model generates an answer grounded in that context.
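The three stages can be sketched end to end in plain Python. This is a toy illustration, not a production setup: keyword overlap stands in for embedding similarity, and an in-memory list stands in for the vector database.

```python
import re

def tokens(text: str) -> set[str]:
    """Crude word tokenizer; a real system would embed the text instead."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stage 2: rank documents by word overlap with the query (a stand-in
    for cosine similarity over embeddings) and return the top k."""
    q = tokens(query)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Stage 3: inject the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Based ONLY on the following context, answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Stage 1 would normally embed these documents into a vector database;
# this list is the stand-in index.
corpus = [
    "Acme widgets ship within 5 business days.",
    "Returns are accepted within 30 days of purchase.",
    "Acme was founded in 1999.",
]
question = "How long do widgets take to ship?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)
```

The final prompt contains only the two highest-scoring documents, so the model never sees (and cannot leak) the irrelevant founding-date fact.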

When to Use

Use RAG when you need answers based on specific documents, proprietary data, or frequently changing information. Essential for customer support bots, internal knowledge bases, legal document analysis, and any task requiring source-grounded answers.

Model-Specific Tips

ChatGPT / GPT-4

Use OpenAI's Assistants API with file search for built-in RAG. For custom setups, inject retrieved context into the system or user message.
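For the custom route, a minimal sketch of injecting retrieved context into the messages list. The `retrieved` string and the model name are placeholders; the actual SDK call is shown in comments.

```python
# Place retrieved context in the system message and the question in the
# user message. `retrieved` would come from your vector store.
retrieved = "Doc 12: Widgets ship within 5 business days."

messages = [
    {
        "role": "system",
        "content": (
            "Answer using ONLY the context below. If the answer is not "
            "in the context, say you don't know.\n\n"
            f"Context:\n{retrieved}"
        ),
    },
    {"role": "user", "content": "How long does shipping take?"},
]

# With the official SDK this would be sent roughly as:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4o", messages=messages)
```

Keeping the grounding rules and the context together in the system message makes it harder for a user turn to override them.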

Claude

Claude's 200K context window is ideal for RAG: you can include many retrieved documents. Use XML tags to clearly separate context from instructions.
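A sketch of assembling an XML-tagged prompt for Claude; the tag names and document strings are illustrative choices, not a required schema.

```python
def build_claude_prompt(docs: list[str], question: str) -> str:
    """Wrap each retrieved chunk in a numbered <document> tag inside
    <context>, keeping instructions clearly outside the context block."""
    doc_blocks = "\n".join(
        f'<document index="{i}">\n{d}\n</document>' for i, d in enumerate(docs, 1)
    )
    return (
        "<context>\n" + doc_blocks + "\n</context>\n\n"
        "Answer the question using ONLY the documents in <context>. "
        "Cite document indexes for each claim.\n"
        f"<question>{question}</question>"
    )

prompt = build_claude_prompt(
    ["Returns are accepted within 30 days of purchase."],
    "What is the return window?",
)
print(prompt)
```

The numbered `index` attribute gives the model something concrete to cite, which makes answers easier to verify against the sources.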

Gemini

Gemini 1.5 Pro's 1M token context excels at RAG. Google's Vertex AI provides built-in grounding with Google Search or custom data stores.

Pros & Cons

Pros

  • ✓ Grounds responses in actual sources
  • ✓ Dramatically reduces hallucinations
  • ✓ Works with proprietary/private data
  • ✓ Answers stay up-to-date with source updates

Cons

  • ✗ Requires infrastructure (vector DB, embeddings)
  • ✗ Retrieval quality limits answer quality
  • ✗ More complex to set up than simple prompting
  • ✗ Chunking strategy significantly impacts results
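The chunking caveat above is concrete: chunk size and overlap decide what retrieval can find. A minimal sliding-window chunker (word counts stand in for token counts here):

```python
def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words; the overlap
    keeps sentences that straddle a boundary retrievable from at least
    one chunk."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i : i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

chunks = chunk("word " * 120, size=50, overlap=10)
print(len(chunks))  # 3 windows over 120 words with step 40
```

Too-small chunks strip away context the model needs; too-large chunks dilute the embedding and waste context-window budget, which is why this parameter is worth tuning per corpus.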

Example Prompts

Based ONLY on the following documentation, answer the user's question. If the answer isn't in the provided context, say "I don't have information about that in my current sources."

Context: {retrieved_documents}

Question: {user_question}

You are a support agent for Acme Corp. Answer using ONLY the knowledge base articles provided below. Cite the specific article number for each claim.

KB Articles: {retrieved_articles}

Customer question: {question}

Analyze this contract clause using the legal precedents provided. Do not reference any legal knowledge outside of these sources.

Precedents: {retrieved_cases}

Clause to analyze: {contract_clause}
