
RAG (Retrieval-Augmented Generation): Prompting Guide & Examples

RAG combines an AI language model with an external knowledge retrieval system. Instead of relying solely on training data, the model first retrieves relevant documents from a database, then generates answers grounded in those specific sources.

How It Works

The pipeline has three stages: (1) a vector database stores your documents as embeddings; (2) when a query arrives, relevant documents are retrieved via semantic search; (3) the retrieved documents are injected into the prompt as context, and the model generates an answer grounded in that context.
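The three stages can be sketched end to end in plain Python. This is a toy illustration, not a production setup: keyword overlap stands in for embedding similarity, and an in-memory list stands in for the vector database.

```python
import re

def tokens(text: str) -> set[str]:
    """Crude word tokenizer; a real system would embed the text instead."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stage 2: rank documents by word overlap with the query (a stand-in
    for cosine similarity over embeddings) and return the top k."""
    q = tokens(query)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Stage 3: inject the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {d}" for d in docs)
    return (
        "Based ONLY on the following context, answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Stage 1 would normally embed these documents into a vector database;
# this list is the stand-in index.
corpus = [
    "Acme widgets ship within 5 business days.",
    "Returns are accepted within 30 days of purchase.",
    "Acme was founded in 1999.",
]
question = "How long do widgets take to ship?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)
```

The final prompt contains only the two highest-scoring documents, so the model never sees (and cannot leak) the irrelevant founding-date fact.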

When to Use

Use RAG when you need answers based on specific documents, proprietary data, or frequently changing information. Essential for customer support bots, internal knowledge bases, legal document analysis, and any task requiring source-grounded answers.

Model-Specific Tips

ChatGPT / GPT-4

Use OpenAI's Assistants API with file search for built-in RAG. For custom setups, inject retrieved context into the system or user message.
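For the custom route, a minimal sketch of injecting retrieved context into the messages list. The `retrieved` string and the model name are placeholders; the actual SDK call is shown in comments.

```python
# Place retrieved context in the system message and the question in the
# user message. `retrieved` would come from your vector store.
retrieved = "Doc 12: Widgets ship within 5 business days."

messages = [
    {
        "role": "system",
        "content": (
            "Answer using ONLY the context below. If the answer is not "
            "in the context, say you don't know.\n\n"
            f"Context:\n{retrieved}"
        ),
    },
    {"role": "user", "content": "How long does shipping take?"},
]

# With the official SDK this would be sent roughly as:
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(model="gpt-4o", messages=messages)
```

Keeping the grounding rules and the context together in the system message makes it harder for a user turn to override them.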

Claude

Claude's 200K context window is ideal for RAG: you can include many retrieved documents. Use XML tags to clearly separate context from instructions.
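A sketch of assembling an XML-tagged prompt for Claude; the tag names and document strings are illustrative choices, not a required schema.

```python
def build_claude_prompt(docs: list[str], question: str) -> str:
    """Wrap each retrieved chunk in a numbered <document> tag inside
    <context>, keeping instructions clearly outside the context block."""
    doc_blocks = "\n".join(
        f'<document index="{i}">\n{d}\n</document>' for i, d in enumerate(docs, 1)
    )
    return (
        "<context>\n" + doc_blocks + "\n</context>\n\n"
        "Answer the question using ONLY the documents in <context>. "
        "Cite document indexes for each claim.\n"
        f"<question>{question}</question>"
    )

prompt = build_claude_prompt(
    ["Returns are accepted within 30 days of purchase."],
    "What is the return window?",
)
print(prompt)
```

The numbered `index` attribute gives the model something concrete to cite, which makes answers easier to verify against the sources.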

Gemini

Gemini 1.5 Pro's 1M token context excels at RAG. Google's Vertex AI provides built-in grounding with Google Search or custom data stores.

Pros & Cons

Pros

  • ✓ Grounds responses in actual sources
  • ✓ Dramatically reduces hallucinations
  • ✓ Works with proprietary/private data
  • ✓ Answers stay up-to-date with source updates

Cons

  • ✗ Requires infrastructure (vector DB, embeddings)
  • ✗ Retrieval quality limits answer quality
  • ✗ More complex to set up than simple prompting
  • ✗ Chunking strategy significantly impacts results
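The chunking caveat above is concrete: chunk size and overlap decide what retrieval can find. A minimal sliding-window chunker (word counts stand in for token counts here):

```python
def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words; the overlap
    keeps sentences that straddle a boundary retrievable from at least
    one chunk."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i : i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

chunks = chunk("word " * 120, size=50, overlap=10)
print(len(chunks))  # 3 windows over 120 words with step 40
```

Too-small chunks strip away context the model needs; too-large chunks dilute the embedding and waste context-window budget, which is why this parameter is worth tuning per corpus.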

Example Prompts

Based ONLY on the following documentation, answer the user's question. If the answer isn't in the provided context, say "I don't have information about that in my current sources."

Context: {retrieved_documents}

Question: {user_question}

You are a support agent for Acme Corp. Answer using ONLY the knowledge base articles provided below. Cite the specific article number for each claim.

KB Articles: {retrieved_articles}

Customer question: {question}

Analyze this contract clause using the legal precedents provided. Do not reference any legal knowledge outside of these sources.

Precedents: {retrieved_cases}

Clause to analyze: {contract_clause}
