How do LLMs remember conversations?

LLMs are stateless. Each request starts fresh. What feels like memory is the conversation history being re-sent every time. Real memory requires external systems.

How does the AI remember what we discussed earlier?

It doesn't. Not really.

LLMs are stateless. Each API call is independent. The model doesn't retain information between requests. It doesn't remember you, your previous questions, or what it said before.

What feels like memory is an illusion created by resending the entire conversation history with each new message.

The context window trick

When you chat with an AI:

Message 1: "My name is Alex"
Response 1: "Nice to meet you, Alex!"

Message 2: "What's my name?"

Behind the scenes, the second request actually contains:

User: My name is Alex
Assistant: Nice to meet you, Alex!
User: What's my name?

The model sees the full conversation and "remembers" by reading its own previous responses. This isn't memory. It's reading a transcript.
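The transcript-resending pattern above can be sketched in a few lines. The payload shape mirrors common chat-completion APIs, but the exact field names vary by provider; `build_request` is a hypothetical helper, not any specific SDK's function.

```python
# Each request must carry the full transcript; nothing is stored server-side.
def build_request(history, new_message):
    """Append the new user turn and return the full message list to send."""
    return history + [{"role": "user", "content": new_message}]

history = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Nice to meet you, Alex!"},
]

payload = build_request(history, "What's my name?")
# The model receives all three turns, not just the latest one.
assert len(payload) == 3
```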

Why statelessness?

Stateless design has benefits:

  • Scalability: Any server can handle any request; no session affinity needed
  • Reliability: No state to lose if a server fails
  • Simplicity: Each request is self-contained
  • Privacy: No persistent storage of conversations (by default)

But it means "memory" must be explicitly managed by the application layer.
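What "managed by the application layer" looks like in practice: the client keeps the transcript and resends it on every call. A minimal sketch, where `call_model` is a stand-in for any chat API; here it is faked with a lambda so the example is self-contained.

```python
class ChatSession:
    """Minimal application-layer 'memory': keep the transcript client-side
    and resend the whole thing on every request."""

    def __init__(self, call_model):
        self.call_model = call_model
        self.history = []

    def send(self, text):
        self.history.append({"role": "user", "content": text})
        reply = self.call_model(self.history)  # full history every time
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Fake model that just reports how many turns it was shown.
session = ChatSession(lambda msgs: f"I see {len(msgs)} messages")
session.send("My name is Alex")
print(session.send("What's my name?"))  # the model saw 3 turns, not 1
```

If the application drops `self.history`, the "memory" vanishes instantly: the next call starts from zero.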

Beyond context: external memory systems

True memory requires storage outside the model:

Vector databases: Store conversation snippets as embeddings. Retrieve relevant past interactions when needed. "Remember when we discussed X?" triggers retrieval.
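The retrieval idea can be shown without any real database. In the sketch below, `embed()` is a crude bag-of-words stand-in for a learned embedding model, and a plain list stands in for the vector database; only the store-then-rank-by-similarity pattern is the point.

```python
import math

def embed(text, vocab=("budget", "deadline", "name", "alex", "project")):
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

store = []  # list of (embedding, snippet) pairs; a vector DB in real systems

def remember(snippet):
    store.append((embed(snippet), snippet))

def recall(query, k=1):
    ranked = sorted(store, key=lambda e: cosine(e[0], embed(query)), reverse=True)
    return [snippet for _, snippet in ranked[:k]]

remember("We agreed the project deadline is Friday")
remember("Alex prefers short answers")
print(recall("remind me about the deadline"))  # -> the deadline snippet
```

The retrieved snippets are then pasted into the prompt; the model still only "knows" what the current request contains.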

Key-value stores: Store specific facts. "User prefers dark mode" persists across sessions without consuming context.
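A fact store can be as simple as one table. A sketch using SQLite (any durable key-value store works the same way); the schema and helper names are illustrative, not from any particular framework.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path for real persistence
db.execute(
    "CREATE TABLE IF NOT EXISTS facts "
    "(user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
)

def set_fact(user_id, key, value):
    db.execute("INSERT OR REPLACE INTO facts VALUES (?, ?, ?)",
               (user_id, key, value))

def get_fact(user_id, key):
    row = db.execute("SELECT value FROM facts WHERE user_id=? AND key=?",
                     (user_id, key)).fetchone()
    return row[0] if row else None

set_fact("alex", "theme", "dark mode")
# In a later session, inject the fact into the system prompt
# instead of replaying old conversation turns.
print(get_fact("alex", "theme"))  # -> dark mode
```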

Conversation summaries: Periodically summarize conversations. Store summaries for later retrieval. Compress history without losing key points.
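A rolling-summary loop might look like this: once the transcript exceeds a budget, the oldest turns are collapsed into a summary turn. In a real system `summarize()` would be another LLM call; here it is a stub so the sketch runs on its own.

```python
MAX_TURNS = 4  # assumed budget for full-fidelity turns

def summarize(turns):
    """Stub summarizer: keeps the first sentence of each old turn."""
    points = "; ".join(t["content"].split(".")[0] for t in turns)
    return {"role": "system",
            "content": f"Summary of earlier conversation: {points}"}

def compact(history):
    if len(history) <= MAX_TURNS:
        return history
    old, recent = history[:-MAX_TURNS], history[-MAX_TURNS:]
    return [summarize(old)] + recent  # 1 summary turn replaces many old ones

history = [{"role": "user", "content": f"Turn {i}."} for i in range(6)]
compacted = compact(history)
assert len(compacted) == 5  # 1 summary + 4 recent turns
```

The trade-off: summaries save tokens but lose detail, so what the summarizer drops is gone for good.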

Hybrid approaches: Keep recent messages in full context, retrieve from long-term storage for older interactions.
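The hybrid pattern amounts to assembling each prompt from two sources: the last few turns verbatim, plus whatever the long-term store returns. A sketch, where `retrieve` stands in for any of the stores above and is faked with a lambda here.

```python
def build_prompt(recent_turns, retrieve, query, n_recent=4):
    """Combine retrieved long-term context with the most recent turns."""
    context = retrieve(query)  # e.g. top-k snippets from older sessions
    system = {"role": "system",
              "content": "Relevant past context: " + "; ".join(context)}
    return [system] + recent_turns[-n_recent:]

prompt = build_prompt(
    recent_turns=[{"role": "user", "content": "Let's continue the plan."}],
    retrieve=lambda q: ["User's name is Alex", "Deadline is Friday"],
    query="continue the plan",
)
assert prompt[0]["role"] == "system" and len(prompt) == 2
```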

Working memory vs. long-term memory

Think of it in two layers:

Working memory (context window):

  • Immediate, limited capacity
  • Exactly what's in the current prompt
  • Fast but ephemeral
  • Lost when context is cleared

Long-term memory (external systems):

  • Persistent across sessions
  • Requires explicit retrieval
  • Can be structured or semantic
  • Application-dependent

Neither is truly "in" the model. Working memory is the prompt. Long-term memory is databases the application queries.

The memory illusion

When AI feels like it remembers you:

  • An application stored your preferences
  • Your conversation history was retrieved and included
  • Previous summaries were injected into the prompt
  • An external database was queried

The model is told these things. It doesn't recall them. The magic is in the application layer, not the model.

Understanding this helps you work with AI more effectively. Want it to "remember" something important? Make sure the application stores it. Long conversation losing coherence? Start fresh with a summary of key points.
