What is LLM Temperature?
The knob that makes the model focused or creative.
Temperature is a sampling parameter that controls how random an LLM's token selection is. It divides the model's logits before the softmax, so low values sharpen the probability distribution and high values flatten it. At temperature 0 the model picks the single most probable next token every time (greedy decoding, deterministic in principle); at higher values it samples from the probability distribution, yielding more varied output. Typical ranges are 0-0.3 for factual tasks, 0.5-0.8 for general chat, and 0.8-1.2 for creative writing. Temperature does not change what the model knows, only how it chooses words.
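A minimal sketch of the mechanism described above, assuming the usual formulation in which logits are divided by the temperature before the softmax. The function names and the three example logits are hypothetical and not taken from any particular library.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 1.5):
    # Lower temperature concentrates probability on the top token.
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Printing the probabilities at a few temperatures makes the sharpening and flattening effect visible without calling a real model.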
In depth
Examples
- Temperature 0: asking 'what is 2+2' and getting '4' every time: predictable, correct, boring
- Temperature 0.3: customer-support replies, with reliable answers and a touch of natural variation
- Temperature 0.7: a common default in chat applications, giving a good balance of coherence and liveliness
- Temperature 1.0: creative writing prompts, with unexpected word choices, vivid phrasing, and occasional weirdness
- Temperature 1.5: intentionally weird generation for brainstorming or artistic exploration
- Self-consistency CoT: sample 10 responses at temperature 0.7 and take the majority answer; this works because temperature-driven variety lets different reasoning paths vote
- Code generation at temperature 0.1: minimizes syntactic hallucination while keeping output stable enough to debug
- Tycoon Astra uses temperature 0.2 for structured reports, 0.7 for strategic exploration, 0.3 for customer-facing copy
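The self-consistency example in the list above can be sketched as a simple majority vote. This is an illustration under stated assumptions: `fake_llm_answer` is a hypothetical stand-in for a real model call at temperature 0.7, and its answer weights are invented for the demo.

```python
import random
from collections import Counter

def majority_answer(sample_fn, n_samples=10):
    """Call sample_fn n_samples times; return the most common answer and its vote share."""
    answers = [sample_fn() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples

# Hypothetical stand-in for an LLM call at temperature 0.7.
# The candidate answers and weights are made up for illustration.
rng = random.Random(0)

def fake_llm_answer():
    return rng.choices(["42", "41", "24"], weights=[0.6, 0.25, 0.15], k=1)[0]

answer, share = majority_answer(fake_llm_answer, n_samples=10)
print(answer, share)
```

At temperature 0 every sample would be the same, so the vote would add nothing; the technique depends on the samples differing.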
Frequently asked questions
Does temperature 0 give me reproducible results?
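In theory yes, in practice not reliably. At temperature 0 the model always picks the highest-probability token, so the output should be deterministic. But real deployments have floating-point nondeterminism on GPUs (parallel reductions can add numbers in a different order from run to run), batch-size effects (what else shares the batch can change how values are computed and accumulated), and, for mixture-of-experts models, router nondeterminism. When two candidate tokens have nearly equal probability, those tiny numeric differences can flip the top choice, and every later token then diverges. OpenAI and Anthropic both publish caveats about this. For better reproducibility, use the provider's seed parameter where one exists (OpenAI offers one on a best-effort basis); even then, exact repeatability is not guaranteed. For 'close enough' reproducibility, where you get the same answer most of the time, temperature 0 is fine. For production tests where you assert exact output equality, don't rely on temperature alone.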
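A small sketch of why greedy decoding can still vary: floating-point addition is not associative, so the same numbers summed in a different order can give slightly different results, and a near-tie between tokens can flip. The scores below are hypothetical and chosen to make the effect visible; real logits come from far larger sums.

```python
def greedy_pick(scores):
    """Return the first token with the highest score (greedy decoding)."""
    return max(scores, key=scores.get)

parts = [0.1, 0.2, 0.3]  # hypothetical partial sums contributing to one logit

left_to_right = (parts[0] + parts[1]) + parts[2]
right_to_left = parts[0] + (parts[1] + parts[2])

# Token "a" has a fixed score. Token "b" has the same mathematical value,
# but it is computed with two different summation orders.
run_1 = {"a": 0.6, "b": left_to_right}
run_2 = {"a": 0.6, "b": right_to_left}

print(left_to_right == right_to_left)
print(greedy_pick(run_1), greedy_pick(run_2))
```

On hardware that reorders additions between runs, the same kind of rounding difference can change which token wins, even though no randomness was requested.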
Should I always use temperature 0 for factual tasks?
Mostly yes, but not always. Low temperature (0-0.2) is right when you want consistent, focused answers: extraction, classification, structured output. Going all the way to 0 sometimes hurts because the top token is occasionally wrong (e.g., a subtle formatting difference), the model commits to it, and every retry repeats the same mistake. Temperature 0.1-0.3 gives you nearly deterministic behavior with a little variation, so a retry has a chance of taking a different path. For QA and RAG, 0.1 is a common sweet spot. For code generation, 0 or 0.1. For creative factual tasks like 'write a business plan,' 0.3-0.5 lets the model vary phrasing while still staying grounded.
What's the difference between temperature and top-p?
Temperature rescales the entire probability distribution before sampling. Top-p (nucleus sampling) truncates the distribution to the smallest set of top tokens whose probabilities sum to at least p, then samples from those. High temperature + low top-p: the model is restricted to the most probable tokens but samples fairly evenly among them. Low temperature + high top-p: the model has access to nearly the full distribution but strongly favors the top token. In practice they produce similar-feeling output, so adjust one at a time. The OpenAI docs recommend altering one or the other, not both. Default advice: start with temperature alone, and reach for top-p only once you've hit the limits of what temperature can do.
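To make the two combinations concrete, here is a minimal sketch that applies temperature first and top-p second. It assumes that ordering and the common "smallest set reaching p" definition of nucleus sampling; the function names and the four example logits are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

logits = [3.0, 2.0, 1.0, 0.0]  # hypothetical scores for four candidate tokens

# High temperature + low top-p: few tokens survive, with fairly even weights.
print([round(p, 3) for p in top_p_filter(apply_temperature(logits, 1.5), 0.5)])

# Low temperature + high top-p: the top token dominates whatever survives.
print([round(p, 3) for p in top_p_filter(apply_temperature(logits, 0.3), 0.95)])
```

Both settings narrow the effective choice, which is why changing them together makes it hard to tell which one caused a difference in output.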
Why is higher temperature considered 'more creative'?
Because creativity in language is partly about making less-obvious word choices. At low temperature the model almost always picks the most probable token at each step, which produces highly templated, cliché-heavy output. At higher temperature the model more often picks less probable but still plausible tokens, producing more varied phrasing, unexpected turns, and novel combinations. The tradeoff is coherence: beyond temperature 1.2 or so (the exact point varies by model), outputs start to derail mid-sentence. For genuine creative writing, a moderate temperature (0.8-1.0) with good prompting beats extreme values. 'Creative' doesn't mean maximally random.
Does temperature affect hallucination?
Yes, but less than people assume. Higher temperature increases variance in the output, including variance in the errors — a hallucination at temperature 0 and a hallucination at temperature 1 are differently wrong but both wrong. Low temperature does not prevent hallucination; it just makes it more consistent. The root cause of hallucination is that the model's highest-probability next token is confidently wrong when the question falls outside its reliable knowledge. Temperature modulates how often you see the most-probable answer vs alternatives, not whether that answer is true. For reducing hallucination you need RAG, grounding, and refusal training — temperature is a minor factor.