LLMs can only consider a limited amount of text at once. This "context window" is their working memory, and understanding it explains many AI behaviors.
Why does ChatGPT sometimes "forget" what you told it earlier?
It hasn't forgotten. It simply can't see that part of your conversation anymore.
LLMs have a context window: a maximum number of tokens they can consider at once. Everything the model knows about your conversation must fit within this window. Your messages, its responses, any system instructions. All of it competes for the same limited space.
When a conversation grows too long, older content gets pushed out. The model isn't storing memories elsewhere. If it's outside the window, it's gone.
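To make the sliding-window behavior concrete, here is a minimal sketch in Python. The window size, the four-characters-per-token estimate, and the message format are illustrative assumptions, not any particular vendor's implementation.

```python
# Minimal sketch of how a chat application might keep a conversation inside
# a fixed context window. The window size and the 4-characters-per-token
# estimate are illustrative assumptions, not real vendor values.

MAX_CONTEXT_TOKENS = 8_000


def estimate_tokens(text: str) -> int:
    """Rough approximation: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def fit_to_window(messages: list[dict], max_tokens: int = MAX_CONTEXT_TOKENS) -> list[dict]:
    """Drop the oldest turns (after the system prompt) until everything fits.
    Whatever is dropped is simply gone: the model will never see it again."""
    system, history = messages[:1], messages[1:]
    while history and sum(estimate_tokens(m["content"]) for m in system + history) > max_tokens:
        history.pop(0)  # the oldest user/assistant turn falls off the "desk"
    return system + history
```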
How big is the context window?
It varies by model and keeps growing:
Context Window Sizes (Tokens)
GPT-3 (2020): 4,096
GPT-4 (2023): 128,000
GPT-5.1 (2025): 256,000
Claude 4.5 (2025): 200,000
Gemini 2.0 (2025): 2,000,000
These numbers sound large, but they fill up fast. A back-and-forth conversation accumulates tokens quickly. Every message you send, every response generated, every piece of context provided by the application: all of it counts against the limit.
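You can watch the accumulation yourself with the tiktoken tokenizer library, as in the short sketch below. The cl100k_base encoding and the sample messages are assumptions for illustration; the right encoding depends on the model you actually use, and real chat APIs add a few formatting tokens per message on top of this count.

```python
# Count how many tokens a transcript has already consumed.
# Encoding name and messages are illustrative assumptions.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this 20-page report for me: ..."},
    {"role": "assistant", "content": "Here is a summary of the report: ..."},
]

total = sum(len(enc.encode(m["content"])) for m in conversation)
print(f"Tokens used so far: {total}")  # every new turn adds to this running total
```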
Why can't they just make it bigger?
They're trying. Context windows have grown dramatically. But there are real constraints.
Computational cost scales with context length. The attention mechanism requires comparing every token to every other token. Double the context and you roughly quadruple the computation (a sketch after these three constraints makes the arithmetic concrete).
Quality can degrade with length. Models trained on shorter contexts may struggle to use very long ones effectively. Research shows that information in the middle of long contexts often gets less attention than information at the beginning or end.
Memory requirements grow. Long contexts require storing more intermediate values. A million-token context needs substantially more GPU memory than a thousand-token one.
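A back-of-the-envelope sketch of both effects is below. The layer, head, and dimension counts are invented illustrative values, not the configuration of any real model.

```python
# Why longer contexts get expensive: pairwise attention grows quadratically,
# and the cache of intermediate values grows linearly with context length.
# All model dimensions below are made-up illustrative values.

def attention_pairs(n_tokens: int) -> int:
    """Self-attention compares every token with every other token: n^2 pairs."""
    return n_tokens * n_tokens


def kv_cache_bytes(n_tokens: int, layers: int = 32, heads: int = 32,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Memory for cached keys and values (2 tensors) per layer, per head, per token."""
    return 2 * layers * heads * head_dim * bytes_per_value * n_tokens


for n in (1_000, 2_000, 1_000_000):
    print(f"{n:>9} tokens: {attention_pairs(n):.1e} token pairs, "
          f"{kv_cache_bytes(n) / 1e9:.2f} GB of cached values")
```

Doubling from 1,000 to 2,000 tokens quadruples the number of token pairs, and the million-token row shows why memory, not just compute, becomes a bottleneck.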
The 'lost in the middle' problem
This is a documented phenomenon. LLMs often attend well to the beginning (primacy) and the end (recency), but middle content can get overlooked.
It's a trained behavior, not a hardcoded bug. Training data often puts key information at beginnings and endings. The model learns these positions matter more.
Put critical information on page 50 of a 100-page document, and the model may effectively ignore it, even though it's technically within the context window. Researchers are actively working on architectures that handle long contexts more uniformly, but it remains an open challenge.
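One way researchers probe this is a "needle in a haystack" test: bury a single fact at different depths in a long filler document and ask the model to retrieve it. A rough scaffold of that experiment is sketched below; ask_model is a hypothetical placeholder for whatever client you use, and the filler text and needle are invented.

```python
# Sketch of a "needle in a haystack" probe for the lost-in-the-middle effect.
# `ask_model` is a hypothetical stand-in for your own LLM client; the filler
# text, needle, and question are invented for illustration.

FILLER = "The committee reviewed routine operational matters. " * 2000
NEEDLE = "The launch code for Project Aurora is 7-4-1-9."
QUESTION = "What is the launch code for Project Aurora?"


def build_prompt(depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + "\n" + NEEDLE + "\n" + FILLER[cut:] + "\n\n" + QUESTION


def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")


# Models typically do well near depth 0.0 and 1.0; accuracy often dips around 0.5.
for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    answer = ask_model(build_prompt(depth))
    print(depth, "7-4-1-9" in answer)
```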
What does this mean for how you use AI?
Understanding context windows changes how you interact with AI tools.
Front-load important information. Put critical context early in your prompt where it's less likely to be truncated and more likely to receive attention.
Be concise. Verbose prompts waste tokens. Every unnecessary word is space that could hold useful context.
Start fresh when needed. If a conversation has gone on too long and the AI seems confused, starting a new conversation with a clear summary of relevant context often works better than continuing (a pattern sketched after this list). Many users discover this through trial and error.
Provide context explicitly. Don't assume the model remembers previous conversations. Each session typically starts with an empty context window.
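Here is a sketch of that summarize-and-restart pattern. summarize is a hypothetical helper (in practice you would ask the model itself to compress the earlier turns), and the token budget is an illustrative assumption.

```python
# Sketch of the "start fresh with a summary" pattern. The trigger threshold
# is an illustrative assumption; `summarize` is a hypothetical helper.

SUMMARY_TRIGGER_TOKENS = 6_000


def summarize(messages: list[dict]) -> str:
    """Placeholder: ask the model to compress old turns into a short paragraph."""
    raise NotImplementedError


def maybe_compact(messages: list[dict], count_tokens) -> list[dict]:
    """If the conversation is getting long, replace older turns with a summary
    and keep only the most recent exchange verbatim."""
    total = sum(count_tokens(m["content"]) for m in messages)
    if total < SUMMARY_TRIGGER_TOKENS:
        return messages
    system, old, recent = messages[:1], messages[1:-2], messages[-2:]
    summary_msg = {"role": "user",
                   "content": "Summary of earlier discussion: " + summarize(old)}
    return system + [summary_msg] + recent
```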
[Interactive: Context Window Visualizer. Watch the context fill up and see what gets forgotten as the conversation grows.]
The context window shapes AI capabilities
Many limitations people attribute to AI "intelligence" are actually context window constraints.
Can't maintain a coherent project over weeks? Context window. Contradicts earlier statements? Context window. Needs repeated reminders? Context window.
As context windows grow, some of these limitations will ease. But the fundamental constraint remains: the model can only reason about what it can currently see. There's no background knowledge store, no long-term memory, no filing cabinet. Everything happens on a desk of limited size, and when the desk fills, things fall off.