What is temperature in AI?

Temperature controls randomness in text generation. Low temperature means predictable, focused responses. High temperature means creative, varied ones.

Why does the same prompt sometimes give different answers?

When a model generates text, it doesn't just pick the most likely next token. It samples from a probability distribution. This controlled randomness produces variety.
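
A toy sketch of the difference (the tokens and probabilities here are invented for illustration):

    import random

    # Invented next-token distribution for the prompt "The sky is ..."
    tokens = ["blue", "clear", "falling", "lavender"]
    probs = [0.60, 0.25, 0.10, 0.05]

    # A lookup system would always return the top candidate:
    greedy = tokens[probs.index(max(probs))]  # always "blue"

    # A language model samples, so repeated calls can differ:
    print(greedy, random.choices(tokens, weights=probs, k=5))
    # e.g. blue ['blue', 'clear', 'blue', 'falling', 'blue']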

Temperature is the dial that controls this randomness. Low temperature makes the distribution sharper, concentrating probability on likely tokens. High temperature flattens the distribution, giving unlikely tokens more chance.

How temperature works

The model outputs raw scores (logits) for each possible next token. Before sampling, these scores are divided by the temperature value, then converted to probabilities with the softmax function: p_i = exp(z_i / T) / Σ_j exp(z_j / T), where z_i are the logits and T is the temperature.
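
A minimal sketch of this step, assuming a NumPy array of logits (real inference stacks do the equivalent inside their sampling loop):

    import numpy as np

    def sample_with_temperature(logits, temperature, rng=None):
        """Scale logits by 1/temperature, softmax, then sample one token id."""
        rng = rng or np.random.default_rng()
        if temperature == 0:
            return int(np.argmax(logits))   # greedy: no sampling at all
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()              # subtract max for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(probs), p=probs))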

  • Temperature = 0: Treated as greedy decoding; always pick the highest-probability token. Effectively deterministic (see below).
  • Temperature = 0.7: Moderate randomness. Usually sensible with some variety.
  • Temperature = 1.0: Standard randomness. Probabilities used as-is.
  • Temperature = 2.0: High randomness. Less likely tokens get much more chance.

Lower temperature means safer, more predictable text. Higher temperature means riskier, more surprising text.
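
Running the same invented logits through that scaling step makes the sharpening and flattening concrete:

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5, -1.0])  # invented scores for four tokens

    for t in (0.5, 1.0, 2.0):
        scaled = logits / t
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        print(f"T={t}: {np.round(probs, 3)}")
    # T=0.5 piles probability onto the top token (~0.84);
    # T=2.0 spreads it out (~0.43 for that same token).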

When to use different temperatures

Low temperature (0-0.3):

  • Factual questions
  • Code generation
  • Data extraction
  • Anything where there's a "right" answer

Medium temperature (0.5-0.8):

  • General conversation
  • Explanations
  • Problem-solving
  • Most everyday use

High temperature (1.0+):

  • Creative writing
  • Brainstorming
  • Exploring alternatives
  • When you want surprises
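
In practice, temperature is a per-request parameter. A sketch using the OpenAI Python client (the model name is a placeholder; other providers expose an equivalent setting):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Low temperature for a task with a "right" answer:
    extraction = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Extract the dates from: ..."}],
        temperature=0.1,
    )

    # High temperature for brainstorming:
    ideas = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Invent five product names."}],
        temperature=1.2,
    )
    print(ideas.choices[0].message.content)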

The determinism question

With temperature=0, is output fully deterministic? Mostly, but not always.

Sources of non-determinism:

  • GPU floating-point arithmetic is non-associative, so results can vary slightly with execution order
  • Batch composition might affect results
  • API providers may use sampling internally even at temperature=0
  • Model updates can change behavior

If you need reproducibility, some APIs offer seed parameters. Even then, exact reproduction isn't guaranteed across different hardware or model versions.
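
As a concrete example, the OpenAI API accepts a seed parameter and reports a system_fingerprint you can compare across calls; a matching fingerprint plus the same seed makes repeats likely, but still not guaranteed:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Define temperature in one line."}],
        temperature=0,
        seed=42,  # best-effort determinism, not a hard guarantee
    )
    # If system_fingerprint changes between calls, the backend changed
    # and outputs may differ even with identical seed and temperature.
    print(resp.system_fingerprint, resp.choices[0].message.content)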

Temperature and quality

Higher temperature doesn't mean better or worse. It means different.

  • Too low: repetitive, boring, gets stuck in loops
  • Too high: incoherent, random, loses the thread

The sweet spot depends on the task. There's no universally optimal temperature. Experimentation reveals what works for your specific use case.
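
A simple way to experiment is a sweep: run the same prompt at several temperatures and compare the outputs side by side (a sketch, same placeholder client as above):

    from openai import OpenAI

    client = OpenAI()
    prompt = "Write a tagline for a coffee shop."

    for t in (0.0, 0.4, 0.8, 1.2):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=t,
        )
        print(f"T={t}: {resp.choices[0].message.content}")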

๐ŸŒก๏ธ
Temperature Comparison
Adjust temperature and see how the same prompt produces different outputs

Temperature is a symptom

The need for temperature reveals something about how LLMs work. They don't compute "the answer." They compute a probability distribution over possible answers. Sampling from that distribution is where the specific response emerges.

This is fundamentally different from a calculator or search engine, which return definite results. The LLM always sees multiple possibilities. Temperature determines how it navigates them.
