What is emergent behavior?

Emergence is when simple rules produce complex behavior. In LLMs, capabilities like reasoning appear spontaneously at scale, not from explicit programming.

Why do capabilities suddenly appear at certain scales?

You might expect AI capabilities to improve smoothly as models get bigger. Instead, some abilities are absent in smaller models, then suddenly present in larger ones. Like flipping a switch.

This is emergence: capabilities that arise from scale without being explicitly programmed. The model wasn't trained to do arithmetic. Yet at sufficient size, it can. Nobody taught it to translate between languages it rarely saw paired. Yet it does.

Emergence beyond AI

This phenomenon isn't unique to language models. It appears throughout nature and mathematics:

  • Water: Individual H₂O molecules aren't wet. But enough molecules together are.
  • Flocking: Each bird follows simple rules. The flock exhibits complex, coordinated patterns.
  • Consciousness: Individual neurons aren't conscious. Yet somehow, enough neurons together produce experience.
  • Cities: No central planner designs traffic patterns. They emerge from individual decisions.

In each case, the whole exhibits properties absent from the parts. Simple components, simple rules, complex outcomes.

Emergent capabilities in LLMs

Documented emergent capabilities include:

  • Multi-step arithmetic: Small models can't add three-digit numbers. Large models can.
  • Chain-of-thought reasoning: The ability to work through problems step by step.
  • Cross-lingual transfer: Learning a skill in one language, applying it in another.
  • Theory of mind: Modeling what other agents might believe or want.
  • Code execution tracing: Mentally stepping through code to predict outputs.

These capabilities appear at different scale thresholds. Below the threshold: failure. Above it: success. The transition can be sharp.
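
How sharp these jumps really are is itself debated; Schaeffer et al. (under Sources below) argue that some of the sharpness comes from the metric rather than the model. A toy way to see their point: if per-token accuracy improves smoothly with scale but the task only counts as solved when every token of the answer is correct, exact-match accuracy hugs zero and then climbs steeply. A sketch with a made-up logistic improvement curve (the numbers are illustrative, not measurements):

```python
import math

def per_token_accuracy(log10_params):
    # Assumed smooth improvement with scale (a logistic centered near 1e9 params).
    return 1 / (1 + math.exp(-(log10_params - 9.0)))

answer_length = 8  # tokens in, say, a multi-step arithmetic answer

for log10_params in [7, 8, 9, 10, 11, 12]:   # 10M .. 1T parameters
    p = per_token_accuracy(log10_params)
    exact_match = p ** answer_length          # every token must be right
    print(f"1e{log10_params} params: per-token {p:.2f}, exact-match {exact_match:.3f}")
```

The per-token column improves gradually at every scale; the exact-match column looks like it switches on only near the largest models.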

Why does prediction create reasoning?

This is the core mystery. The training objective is simple: predict the next token. How does this create capabilities like reasoning or planning?
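
Concretely, "predict the next token" is an ordinary cross-entropy loss over the vocabulary. A minimal sketch of that objective (assuming PyTorch; the model output here is random noise standing in for a real network, and the sizes are toy values):

```python
import torch
import torch.nn.functional as F

# Toy setup: 2 sequences of 8 tokens, vocabulary of 1,000 tokens.
vocab_size = 1000
tokens = torch.randint(0, vocab_size, (2, 8))   # placeholder token ids

# Pretend these logits came from a language model reading the sequence.
logits = torch.randn(2, 8, vocab_size)          # (batch, position, vocab)

# Next-token prediction: the output at position t is scored against token t+1.
predictions = logits[:, :-1, :]                 # predictions for positions 0..6
targets = tokens[:, 1:]                         # the tokens that actually came next

loss = F.cross_entropy(predictions.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())  # the single number the whole training process pushes down
```

Everything the rest of this section discusses is, during training, pressure applied through that one scalar.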

One hypothesis: to predict well across diverse text, you must model the processes that generate text. Humans reason, plan, and know facts. Text reflects this. To predict text well, the model must develop something like reasoning, planning, and factual knowledge.

The prediction task is simple. But achieving good prediction on all human text is not simple. It requires modeling the full complexity of human thought. LLMs are better at tasks heavily documented in text than at purely physical ones, partly because humans writing for other humans assume a baseline of embodied physical understanding that LLMs lack.

[Interactive demo: Conway's Game of Life, a glider pattern on a 32×24 grid.]
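
The demo above is a compact version of the same point: the Game of Life has a handful of rules about counting neighbors, yet gliders travel across the grid even though nothing in the rules mentions movement. A minimal plain-Python sketch of those rules and the glider pattern from the demo:

```python
from collections import Counter

def step(live_cells):
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for x, y in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # The complete rules: a cell is alive next generation if it has exactly 3
    # live neighbors, or if it is currently alive and has exactly 2.
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live_cells)}

# The glider: five live cells that translate diagonally forever,
# even though "movement" appears nowhere in the rules above.
cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for generation in range(5):
    print(f"gen {generation}, pop {len(cells)}: {sorted(cells)}")
    cells = step(cells)
```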

Emergence and unpredictability

Emergence is partly why AI capabilities are hard to forecast. We can predict loss improvements from scaling laws. We cannot easily predict which capabilities will emerge at which scales.

This makes frontier AI development partly exploratory. You build bigger models partly to discover what they can do. The capabilities weren't specified in advance; they're found empirically.
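
To make the contrast concrete: the predictable half is a smooth power law. A sketch using a common scaling-law form, loss ≈ floor + k / N^alpha, where N is the parameter count; the constants below are invented for illustration and not fit to any real model:

```python
def predicted_loss(n_params, floor=1.7, k=8.0, alpha=0.08):
    # Assumed power-law form; all three constants are made up for illustration.
    return floor + k * n_params ** -alpha

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f}")

# The loss column extrapolates cleanly. Nothing in this curve says at which
# row three-digit arithmetic or theory-of-mind behavior starts to work.
```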

What should you take away?

Complex capabilities can arise from simple objectives applied at scale. Predicting text well enough requires modeling the world. We can't fully predict what will emerge as models grow.

Emergence is the bridge between simple training objectives and surprising capabilities. It's both the source of LLMs' impressive abilities and the reason their development contains genuine uncertainty.

Sources & Further Reading

📄 Paper
Emergent Abilities of Large Language Models
Wei et al. · Google Research · 2022
📄 Paper
Are Emergent Abilities of Large Language Models a Mirage?
Schaeffer et al. · Stanford · 2023
🔗 Article
Emergence
Wikipedia