Glossary
Prompt Engineering · Foundational

Context Window

The maximum number of tokens a model can process in a single prompt and response cycle.

Definition

A context window is the maximum number of tokens that a large language model can process in a single interaction, encompassing both the input prompt and the generated output. It represents the model's working memory, determining how much information it can consider at once when generating a response.

Key characteristics of context windows include:

  1. Token-Based Measurement: Context windows are measured in tokens, not words. A token is typically a word fragment, punctuation mark, or common word. English text averages roughly 1.3 tokens per word.

  2. Shared Budget: The context window is shared between input and output. A model with a 200K token window that receives a 150K token prompt can generate at most 50K tokens in response.

  3. Rapid Expansion: Context windows have grown dramatically, from GPT-3's 2,048 tokens in 2020 to models like Claude and Gemini supporting 200K or more tokens by 2025, enabling processing of entire codebases or books.

  4. Attention Trade-offs: While larger windows allow more context, models may attend less reliably to information in the middle of very long contexts, a phenomenon known as the "lost in the middle" problem.

  5. Cost Implications: Larger context windows increase inference costs, since processing scales with the number of tokens. This makes efficient context management a key engineering concern.
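The token-estimation and shared-budget points above can be sketched in a few lines of Python. This is a rough illustration only: the 1.3 tokens-per-word ratio is the approximate figure quoted above, and the function names are made up for this example; a real tokenizer library would be needed for exact counts.

```python
TOKENS_PER_WORD = 1.3  # rough average for English text, per the figure above

def estimate_tokens(text: str) -> int:
    """Estimate token count from word count using the rough ratio."""
    return round(len(text.split()) * TOKENS_PER_WORD)

def max_output_tokens(context_window: int, prompt_tokens: int) -> int:
    """Remaining budget for the response: input and output share the window."""
    return max(context_window - prompt_tokens, 0)

# A 200K-token window minus a 150K-token prompt leaves 50K tokens for output.
print(max_output_tokens(200_000, 150_000))  # 50000
```

In practice, applications reserve part of this budget explicitly (e.g. via a max-output-tokens parameter) so a long prompt cannot silently truncate the response.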

Last updated: 3/11/2026