The maximum amount of text an AI model can consider at once — including your prompt, conversation history, and the response being generated.
The context window is the AI's working memory for a single conversation. It includes everything: your current question, previous messages in the conversation, any documents you've shared, and the response being generated. When you hit the context limit, older messages get dropped and the AI 'forgets' them. Modern models have huge context windows — Claude supports up to 1 million tokens, roughly 750,000 words.
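The tokens-to-words conversion above is a rule of thumb (English averages roughly 0.75 words per token), and it can be sketched as a quick budget check. This is a heuristic only; real counts come from the model's tokenizer, and the `200_000` default is just an illustrative window size.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English text averages ~0.75 words per token,
    so tokens ~= words / 0.75. A heuristic, not a real tokenizer count."""
    words = len(text.split())
    return round(words / 0.75)

def fits_in_context(text: str, context_window: int = 200_000) -> bool:
    """Check whether a text roughly fits in a given context window."""
    return estimate_tokens(text) <= context_window
```

By this estimate, a 750,000-word book comes out to about 1 million tokens, matching the conversion in the definition above.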
Think of the context window as the AI's desk space. A small desk (4K tokens) only fits a few documents; you have to shuffle them to see different ones. A huge desk (1M tokens) can hold a whole book open at once, letting the AI reference any part instantly. The bigger the desk, the more complex the task the AI can handle without losing track.
The context window is the maximum sequence length the transformer architecture can process in a single forward pass. It's constrained by (1) memory requirements that scale quadratically with context length in standard attention, (2) positional encoding schemes that affect long-context generalization, and (3) training cost — models must be trained to use long contexts effectively. Techniques like sparse attention, FlashAttention, and rotary positional embeddings (RoPE) have enabled context windows to grow from 512 tokens (original BERT) to over 1M tokens in modern flagship models.
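The quadratic memory cost mentioned in point (1) comes from the attention score matrix: standard attention materializes an n-by-n matrix of scores for a sequence of length n. A minimal single-head sketch (using NumPy, with toy dimensions) makes the quadratic term visible:

```python
import numpy as np

def attention_scores_memory(seq_len: int, bytes_per_float: int = 4) -> int:
    """Bytes needed for one head's full attention score matrix.
    The n x n matrix is why memory grows quadratically with length:
    4K tokens -> 64 MiB per head; 128K tokens -> 64 GiB per head."""
    return seq_len * seq_len * bytes_per_float

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention over n tokens of dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)      # shape (n, n) -- the quadratic term
    # Numerically stable softmax over each row of scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                  # shape (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
```

Techniques like FlashAttention avoid materializing the full score matrix at once, which is what makes long contexts practical.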
Older models had much smaller contexts — GPT-3 launched with a 2K-token window, and GPT-3.5 with 4K. Fine for single-turn Q&A, but long conversations quickly fell out of memory.
Standard for most models today. GPT-4o has a 128K-token window, enough for roughly 300 pages of text.
Claude 3/4 models support 200K tokens natively, with 1M available in some tiers. Gemini 1.5 Pro handles up to 2M tokens.
300 pages of a book, 10 research papers, a mid-size codebase, or several hours of transcribed audio.
Not necessarily. A bigger context window is like a bigger hard drive — useful, but doesn't make the model smarter. Research shows most models get worse at retrieval and reasoning as they use more of their context ('lost in the middle' effect). For many tasks, a well-designed RAG system with a smaller context outperforms stuffing everything into a large context.
Different providers handle this differently. Most APIs return an error requiring you to truncate. ChatGPT and Claude's chat interfaces silently drop older messages when the context fills, causing the AI to 'forget' earlier parts of long conversations. Some tools use summarization to compress older context while preserving key information.
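The "silently drop older messages" behavior can be sketched as a sliding window over the conversation history. The message format and the `estimate_tokens` heuristic below are illustrative assumptions, not any provider's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Heuristic token count: English averages ~0.75 words per token."""
    return round(len(text.split()) / 0.75)

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined token estimate fits
    the budget, dropping the oldest first -- the way chat interfaces
    'forget' the start of a long conversation."""
    kept, used = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                       # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

A summarization-based approach would instead replace the dropped prefix with a compressed summary message, trading exact recall for continuity.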
Put the most important information near the beginning or end of your prompt — models show 'lost in the middle' effects for very long contexts. Structure long inputs with clear headers or delimiters. For complex tasks, chunking with explicit references often outperforms putting everything in one giant context. Test both approaches for your specific use case.
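The placement advice above can be sketched as a simple prompt builder: instructions up front, long material fenced with clear delimiters, and the question restated at the end. The section labels and delimiter style here are illustrative choices, not a required convention:

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Assemble a long prompt with key information at the start and end,
    where models attend most reliably, and delimited document sections
    in the middle."""
    parts = [f"## Instructions\n{instructions}"]            # important info first
    for i, doc in enumerate(documents, 1):
        parts.append(f"## Document {i}\n<<<\n{doc}\n>>>")   # clear delimiters
    parts.append(f"## Question\n{question}")                # restated at the end
    return "\n\n".join(parts)
```

For the chunking alternative, you would call the model once per document chunk and combine the answers, rather than building one giant prompt.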
The basic unit AI models use to process text — roughly corresponding to word parts, common words, or character sequences.
📚 A neural network trained on massive text data to understand and generate human-like language.
🔍 A technique that lets AI models look up information before answering, improving accuracy and reducing hallucinations.
✍️ The skill of writing instructions to AI models to get the best possible output.
Our free AI course teaches you to use these ideas in real projects.
Start Free AI Course →