How do LLMs remember conversations?

LLMs are stateless. Each request starts fresh. What feels like memory is the conversation history being re-sent every time. Real memory requires external systems.

How does the AI remember what we discussed earlier?

It doesn't. Not really.

LLMs are stateless. Each API call is independent. The model doesn't retain information between requests. It doesn't remember you, your previous questions, or what it said before.

What feels like memory is an illusion created by resending the entire conversation history with each new message.

The context window trick

When you chat with an AI:

Message 1: "My name is Alex"
Response 1: "Nice to meet you, Alex!"

Message 2: "What's my name?"

Behind the scenes, the second request actually contains:

User: My name is Alex
Assistant: Nice to meet you, Alex!
User: What's my name?

The model sees the full conversation and "remembers" by reading its own previous responses. This isn't memory. It's reading a transcript.
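The transcript-resending pattern above can be sketched in a few lines. The payload shape mirrors common chat-completion APIs, but the exact field names vary by provider; `build_request` is a hypothetical helper, not any specific SDK's function.

```python
# Each request must carry the full transcript; nothing is stored server-side.
def build_request(history, new_message):
    """Append the new user turn and return the full message list to send."""
    return history + [{"role": "user", "content": new_message}]

history = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Nice to meet you, Alex!"},
]

payload = build_request(history, "What's my name?")
# The model receives all three turns, not just the latest one.
assert len(payload) == 3
```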

Why statelessness?

Stateless design has benefits:

  • Scalability: Any server can handle any request; no session affinity needed
  • Reliability: No state to lose if a server fails
  • Simplicity: Each request is self-contained
  • Privacy: No persistent storage of conversations (by default)

But it means "memory" must be explicitly managed by the application layer.
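What "managed by the application layer" looks like in practice: the client keeps the transcript and resends it on every call. A minimal sketch, where `call_model` is a stand-in for any chat API; here it is faked with a lambda so the example is self-contained.

```python
class ChatSession:
    """Minimal application-layer 'memory': keep the transcript client-side
    and resend the whole thing on every request."""

    def __init__(self, call_model):
        self.call_model = call_model
        self.history = []

    def send(self, text):
        self.history.append({"role": "user", "content": text})
        reply = self.call_model(self.history)  # full history every time
        self.history.append({"role": "assistant", "content": reply})
        return reply

# Fake model that just reports how many turns it was shown.
session = ChatSession(lambda msgs: f"I see {len(msgs)} messages")
session.send("My name is Alex")
print(session.send("What's my name?"))  # the model saw 3 turns, not 1
```

If the application drops `self.history`, the "memory" vanishes instantly: the next call starts from zero.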

Beyond context: external memory systems

True memory requires storage outside the model:

Vector databases: Store conversation snippets as embeddings. Retrieve relevant past interactions when needed. "Remember when we discussed X?" triggers retrieval.
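The retrieval idea can be shown without any real database. In the sketch below, `embed()` is a crude bag-of-words stand-in for a learned embedding model, and a plain list stands in for the vector database; only the store-then-rank-by-similarity pattern is the point.

```python
import math

def embed(text, vocab=("budget", "deadline", "name", "alex", "project")):
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

store = []  # list of (embedding, snippet) pairs; a vector DB in real systems

def remember(snippet):
    store.append((embed(snippet), snippet))

def recall(query, k=1):
    ranked = sorted(store, key=lambda e: cosine(e[0], embed(query)), reverse=True)
    return [snippet for _, snippet in ranked[:k]]

remember("We agreed the project deadline is Friday")
remember("Alex prefers short answers")
print(recall("remind me about the deadline"))  # -> the deadline snippet
```

The retrieved snippets are then pasted into the prompt; the model still only "knows" what the current request contains.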

Key-value stores: Store specific facts. "User prefers dark mode" persists across sessions without consuming context.
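A fact store can be as simple as one table. A sketch using SQLite (any durable key-value store works the same way); the schema and helper names are illustrative, not from any particular framework.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path for real persistence
db.execute(
    "CREATE TABLE IF NOT EXISTS facts "
    "(user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
)

def set_fact(user_id, key, value):
    db.execute("INSERT OR REPLACE INTO facts VALUES (?, ?, ?)",
               (user_id, key, value))

def get_fact(user_id, key):
    row = db.execute("SELECT value FROM facts WHERE user_id=? AND key=?",
                     (user_id, key)).fetchone()
    return row[0] if row else None

set_fact("alex", "theme", "dark mode")
# In a later session, inject the fact into the system prompt
# instead of replaying old conversation turns.
print(get_fact("alex", "theme"))  # -> dark mode
```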

Conversation summaries: Periodically summarize conversations. Store summaries for later retrieval. Compress history without losing key points.
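A rolling-summary loop might look like this: once the transcript exceeds a budget, the oldest turns are collapsed into a summary turn. In a real system `summarize()` would be another LLM call; here it is a stub so the sketch runs on its own.

```python
MAX_TURNS = 4  # assumed budget for full-fidelity turns

def summarize(turns):
    """Stub summarizer: keeps the first sentence of each old turn."""
    points = "; ".join(t["content"].split(".")[0] for t in turns)
    return {"role": "system",
            "content": f"Summary of earlier conversation: {points}"}

def compact(history):
    if len(history) <= MAX_TURNS:
        return history
    old, recent = history[:-MAX_TURNS], history[-MAX_TURNS:]
    return [summarize(old)] + recent  # 1 summary turn replaces many old ones

history = [{"role": "user", "content": f"Turn {i}."} for i in range(6)]
compacted = compact(history)
assert len(compacted) == 5  # 1 summary + 4 recent turns
```

The trade-off: summaries save tokens but lose detail, so what the summarizer drops is gone for good.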

Hybrid approaches: Keep recent messages in full context, retrieve from long-term storage for older interactions.
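The hybrid pattern amounts to assembling each prompt from two sources: the last few turns verbatim, plus whatever the long-term store returns. A sketch, where `retrieve` stands in for any of the stores above and is faked with a lambda here.

```python
def build_prompt(recent_turns, retrieve, query, n_recent=4):
    """Combine retrieved long-term context with the most recent turns."""
    context = retrieve(query)  # e.g. top-k snippets from older sessions
    system = {"role": "system",
              "content": "Relevant past context: " + "; ".join(context)}
    return [system] + recent_turns[-n_recent:]

prompt = build_prompt(
    recent_turns=[{"role": "user", "content": "Let's continue the plan."}],
    retrieve=lambda q: ["User's name is Alex", "Deadline is Friday"],
    query="continue the plan",
)
assert prompt[0]["role"] == "system" and len(prompt) == 2
```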

Working memory vs. long-term memory

Think of it in two layers:

Working memory (context window):

  • Immediate, limited capacity
  • Exactly what's in the current prompt
  • Fast but ephemeral
  • Lost when context is cleared

Long-term memory (external systems):

  • Persistent across sessions
  • Requires explicit retrieval
  • Can be structured or semantic
  • Application-dependent

Neither is truly "in" the model. Working memory is the prompt. Long-term memory is databases the application queries.

The memory illusion

When AI feels like it remembers you:

  • An application stored your preferences
  • Your conversation history was retrieved and included
  • Previous summaries were injected into the prompt
  • An external database was queried

The model is told these things. It doesn't recall them. The magic is in the application layer, not the model.

Understanding this helps you work with AI more effectively. Want it to "remember" something important? Make sure the application stores it. Long conversation losing coherence? Start fresh with a summary of key points.
