The maximum amount of text an AI model can consider at once — including your prompt, conversation history, and the response being generated.
The context window is the AI's working memory for a single conversation. It includes everything: your current question, previous messages in the conversation, any documents you've shared, and the response being generated. When you hit the context limit, older messages get dropped and the AI 'forgets' them. Modern models have huge context windows — Claude supports up to 1 million tokens, roughly 750,000 words.
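The tokens-to-words conversion above is a rule of thumb (English averages roughly 0.75 words per token), and it can be sketched as a quick budget check. This is a heuristic only; real counts come from the model's tokenizer, and the `200_000` default is just an illustrative window size.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English text averages ~0.75 words per token,
    so tokens ~= words / 0.75. A heuristic, not a real tokenizer count."""
    words = len(text.split())
    return round(words / 0.75)

def fits_in_context(text: str, context_window: int = 200_000) -> bool:
    """Check whether a text roughly fits in a given context window."""
    return estimate_tokens(text) <= context_window
```

By this estimate, a 750,000-word book comes out to about 1 million tokens, matching the conversion in the definition above.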
Think of the context window as the AI's desk space. A small desk (4K tokens) only fits a few documents; you have to shuffle them to see different ones. A huge desk (1M tokens) can hold a whole book open at once, letting the AI reference any part instantly. The bigger the desk, the more complex the task the AI can handle without losing track.
The context window is the maximum sequence length the transformer architecture can process in a single forward pass. It's constrained by (1) memory requirements that scale quadratically with context length in standard attention, (2) positional encoding schemes that affect long-context generalization, and (3) training cost — models must be trained to use long contexts effectively. Techniques like sparse attention, FlashAttention, and rotary positional embeddings (RoPE) have enabled context windows to grow from 512 tokens (original BERT) to over 1M tokens in modern flagship models.
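The quadratic memory cost mentioned in point (1) comes from the attention score matrix: standard attention materializes an n-by-n matrix of scores for a sequence of length n. A minimal single-head sketch (using NumPy, with toy dimensions) makes the quadratic term visible:

```python
import numpy as np

def attention_scores_memory(seq_len: int, bytes_per_float: int = 4) -> int:
    """Bytes needed for one head's full attention score matrix.
    The n x n matrix is why memory grows quadratically with length:
    4K tokens -> 64 MiB per head; 128K tokens -> 64 GiB per head."""
    return seq_len * seq_len * bytes_per_float

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention over n tokens of dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # shape (n, n) -- the quadratic term
    # Numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                  # shape (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
```

Techniques like FlashAttention avoid materializing the full score matrix at once, which is what makes long contexts practical.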
Older models had much smaller contexts — GPT-3 launched with a 2K-token window, and GPT-3.5 with 4K. Fine for single-turn Q&A, but long conversations quickly fell out of memory.
Standard for most models today. GPT-4o has a 128K-token window, enough for roughly 300 pages of text.
Claude 3/4 models support 200K tokens natively, with 1M available in some tiers. Gemini 1.5 Pro handles up to 2M tokens.
300 pages of a book, 10 research papers, a mid-size codebase, or several hours of transcribed audio.
Not necessarily. A bigger context window is like a bigger hard drive — useful, but doesn't make the model smarter. Research shows most models get worse at retrieval and reasoning as they use more of their context ('lost in the middle' effect). For many tasks, a well-designed RAG system with a smaller context outperforms stuffing everything into a large context.
Different providers handle this differently. Most APIs return an error requiring you to truncate. ChatGPT and Claude's chat interfaces silently drop older messages when the context fills, causing the AI to 'forget' earlier parts of long conversations. Some tools use summarization to compress older context while preserving key information.
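The "silently drop older messages" behavior can be sketched as a sliding window over the conversation history. The message format and the `estimate_tokens` heuristic below are illustrative assumptions, not any provider's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Heuristic token count: English averages ~0.75 words per token."""
    return round(len(text.split()) / 0.75)

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined token estimate fits
    the budget, dropping the oldest first -- the way chat interfaces
    'forget' the start of a long conversation."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                       # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

A summarization-based approach would instead replace the dropped prefix with a compressed summary message, trading exact recall for continuity.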
Put the most important information near the beginning or end of your prompt — models show 'lost in the middle' effects for very long contexts. Structure long inputs with clear headers or delimiters. For complex tasks, chunking with explicit references often outperforms putting everything in one giant context. Test both approaches for your specific use case.
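The placement advice above can be sketched as a simple prompt builder: instructions up front, long material fenced with clear delimiters, and the question restated at the end. The section labels and delimiter style here are illustrative choices, not a required convention:

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Assemble a long prompt with key information at the start and end,
    where models attend most reliably, and delimited document sections
    in the middle."""
    parts = [f"## Instructions\n{instructions}"]            # important info first
    for i, doc in enumerate(documents, 1):
        parts.append(f"## Document {i}\n<<<\n{doc}\n>>>")   # clear delimiters
    parts.append(f"## Question\n{question}")                # restated at the end
    return "\n\n".join(parts)
```

For the chunking alternative, you would call the model once per document chunk and combine the answers, rather than building one giant prompt.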
The basic unit AI models use to process text — roughly corresponding to word parts, common words, or character sequences.
📚 A neural network trained on massive text data to understand and generate human-like language.
🔍 A technique that lets AI models look up information before answering, improving accuracy and reducing hallucinations.
✍️ The skill of writing instructions to AI models to get the best possible output.
Our free AI course teaches you to use these ideas in real projects.
Start Free AI Course →