What is an LLM?
Large Language Models are AI systems that have learned to predict and generate text by studying vast amounts of human writing.
What actually happens when you talk to ChatGPT?
When you type a message and hit send, the system receives your text and does something that sounds almost disappointingly simple: it predicts what text should come next.
That's the core of what a Large Language Model does. Given some text, predict what text would naturally follow. Your prompt becomes the beginning; the model's response is its best guess at a plausible continuation.
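To make that idea concrete, here's a toy next-word predictor in Python. It's nothing like a real LLM internally (it just counts which word follows which in a tiny corpus), but it performs the same task in miniature: given some text, guess the continuation.

```python
# A deliberately tiny "predict what comes next" model: bigram counts.
# Real LLMs use neural networks, not lookup tables, but the task is the same.
from collections import Counter, defaultdict

corpus = ("the capital of france is paris . "
          "the capital of italy is rome .").split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often in training."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("capital"))  # -> "of"
print(predict_next("france"))   # -> "is"
```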
If it's just predicting text, why does it seem to understand things?
The model was trained by showing it enormous amounts of human writing (books, websites, conversations, code) and asking it, over and over: "What comes next?"
To predict well across such diverse text, the model had to develop something resembling comprehension. Consider what good prediction requires: to predict how a mystery story continues, you need to track clues and suspects. To predict the next line of code, you need to grasp what the code is doing. To predict how a physics explanation continues, you need to follow the logical thread.
Prediction, at sufficient scale, requires building internal models of how the world works. Not because understanding was the goal, but because understanding is useful for prediction.
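As a rough sketch of what that training signal looks like: at each position, the model assigns a score to every token in its vocabulary, and the loss punishes it in proportion to how little probability it gave the token that actually came next. The NumPy example below uses made-up scores over a six-word vocabulary.

```python
# A minimal sketch of the next-token training signal, with invented numbers.
import numpy as np

vocab = ["the", "capital", "of", "france", "is", "paris"]
logits = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 2.5])  # hypothetical model scores

# Softmax turns raw scores into a probability distribution over the vocab.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The training text continued with "paris"; cross-entropy loss is the
# negative log of the probability the model assigned to that token.
target = vocab.index("paris")
loss = -np.log(probs[target])
print(f"P(paris) = {probs[target]:.3f}, loss = {loss:.3f}")
```

Training nudges the parameters to shrink this loss across trillions of positions, which is the entire "what comes next?" game stated mathematically.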
How is this different from autocomplete on my phone?
Your phone's predictive text uses a small model that considers maybe a sentence and suggests common words. An LLM uses billions of parameters, considers thousands of words of context, and has absorbed patterns from trillions of words of training data.
This isn't just "bigger and more." Scale creates qualitative change. A small model learns that "Paris" often follows "The capital of France is." A larger model can write you a detailed historical analysis of Parisian urban development, synthesizing information it never saw explicitly combined during training.
Researchers call these emergent capabilities: abilities that arise spontaneously as models grow larger, without being explicitly programmed.[2] Nobody taught the model to summarize documents or translate between languages it rarely saw paired. These abilities surfaced from the pressure to predict text at massive scale.
How does it generate long, coherent responses?
One token at a time. Tokens are chunks of text, typically words or word-pieces. At each step, the model computes a probability distribution over every possible next token, samples one from that distribution, appends it to the context, and repeats. A multi-paragraph response emerges token by token, each one conditioned on everything that came before.
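Here is a minimal sketch of that loop in Python. The `next_token_probs` function is a stand-in for the real neural network (just a lookup table over a handful of contexts), but the loop around it has the genuine structure: compute a distribution, sample, append, repeat.

```python
import random

def next_token_probs(context):
    """Stand-in for the neural network: maps a context to a
    probability distribution over possible next tokens."""
    table = {
        (): {"the": 1.0},
        ("the",): {"cat": 0.6, "dog": 0.4},
        ("the", "cat"): {"sat": 0.7, "ran": 0.3},
        ("the", "dog"): {"ran": 0.8, "sat": 0.2},
        ("the", "cat", "sat"): {"<end>": 1.0},
        ("the", "cat", "ran"): {"<end>": 1.0},
        ("the", "dog", "ran"): {"<end>": 1.0},
        ("the", "dog", "sat"): {"<end>": 1.0},
    }
    return table[tuple(context)]

def generate():
    context = []
    while True:
        probs = next_token_probs(context)
        # Sample the next token in proportion to its probability.
        token = random.choices(list(probs), weights=probs.values())[0]
        if token == "<end>":
            return " ".join(context)
        context.append(token)  # condition the next step on this choice

print(generate())  # e.g. "the cat sat"
```

Run it a few times and you'll get different sentences, which is exactly the point of the next question.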
Why would you want randomness in the answer?
For language, there usually isn't one correct continuation. Ask someone "How was your weekend?" and there are thousands of valid responses. The model faces the same situation at every token.
Without randomness, the model would always pick the highest-probability token. This sounds ideal but causes problems. The output becomes repetitive and mechanical, and the model can get trapped in loops, repeating a phrase because it keeps being the most likely continuation of itself.
Randomness lets the model explore the space of reasonable responses. You can adjust this through a parameter called "temperature." Lower temperature means more predictable, focused responses. Higher temperature means more creative, surprising ones.
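A short sketch of how temperature works under the hood, assuming the standard approach of dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The scores here are invented.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Dividing by temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution before softmax.
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
for t in (0.2, 1.0, 2.0):
    print(f"T={t}:", np.round(softmax_with_temperature(logits, t), 3))
# Low T concentrates probability on the top token (predictable output);
# high T spreads it out, so more surprising choices survive sampling.
```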
What does this mean for how you use AI?
Understanding how LLMs work changes how you evaluate them.
When someone says an LLM "knows" something, you can understand that it has stored patterns around that concept. When it makes a confident mistake, you can understand that it predicted a plausible-sounding continuation that happened to be false. When new capabilities emerge in larger models, you can see that they are the result of the model optimizing for better prediction.
You're not working with a magical oracle or a search engine with personality. You're using a sophisticated pattern-matcher that has absorbed more human writing than any person could read in a thousand lifetimes. Its strengths and limitations flow directly from what it is: a system that learned to predict text, and, to do that well, learned to make predictions about the world the text describes.