What is Chain-of-Thought Prompting?
The prompting technique that made LLMs do math.
Chain-of-thought (CoT) prompting is a technique where an LLM is instructed to write out its intermediate reasoning steps before producing a final answer. Introduced by Wei et al. at Google Research in 2022, CoT dramatically improves accuracy on arithmetic, commonsense, and symbolic reasoning tasks by forcing the model to decompose problems instead of jumping to an answer.
In depth
Examples
- Zero-shot: appending 'Let's think step by step' to a math word problem and getting the full derivation
- Few-shot: showing 3 solved algebra problems with worked steps, then the model solves a 4th in the same format
- Self-consistency: sampling 10 independent CoT traces for a tricky logic puzzle and taking the majority answer
- OpenAI o1 and GPT-5 with internal reasoning tokens, where the user never sees the chain but benefits from it
- Anthropic Claude 4.5 thinking mode that produces an extended-thought block before the visible reply
- DeepSeek R1 showing its full reasoning trace in the output — cheap, transparent, and popular with developers
- AI support agents using CoT to diagnose a bug report: 'First check the error, then compare to known issues, then propose a fix'
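The self-consistency pattern from the list above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_model` is a hypothetical stand-in for your LLM client, assumed to return one sampled completion (temperature > 0), and the 'Answer:' marker convention is an assumption of this sketch.

```python
from collections import Counter

def self_consistency(question, call_model, n_samples=10):
    """Sample several independent CoT traces and majority-vote the final answer.

    `call_model` is a placeholder for your LLM client: it takes a prompt string
    and returns the full text of one sampled completion.
    """
    prompt = (
        f"{question}\n"
        "Let's think step by step, then state the result after 'Answer:'."
    )
    answers = []
    for _ in range(n_samples):
        trace = call_model(prompt)
        # Keep only what follows the last 'Answer:' marker as the final answer.
        if "Answer:" in trace:
            answers.append(trace.rsplit("Answer:", 1)[1].strip())
    # Majority vote across the sampled traces; None if no trace parsed.
    return Counter(answers).most_common(1)[0][0] if answers else None
```

The key design choice is that traces must be sampled independently at nonzero temperature; greedy decoding would return the same trace ten times and the vote would be meaningless.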
Frequently asked questions
Does chain-of-thought actually make answers more accurate or just longer?
Both, but the accuracy gain is real and measurable. On GSM8K (grade-school math), PaLM 540B went from 18% to 57% accuracy with CoT. On BIG-Bench hard reasoning tasks, gains of 10-30 absolute points are typical. The mechanism is not just 'more text equals better' — randomly padding output does nothing. What works is generating genuine intermediate steps that decompose the problem. On tasks that don't require multi-step reasoning (simple factual recall, translation), CoT doesn't help and can slightly hurt by introducing extraneous detail.
Is chain-of-thought still necessary if I'm using a reasoning model like o1 or Claude 4.5 thinking?
Largely no, for the tasks those models were trained on. Reasoning models produce internal CoT automatically and their training teaches them when longer thinking helps. You usually don't need to prompt 'think step by step' — the model decides. But for tasks outside their training distribution (domain-specific workflows, unusual formats), explicit CoT prompting still helps because it shapes how the model structures its reasoning. The rule of thumb: if you're getting wrong answers from a reasoning model, add explicit CoT instructions describing the reasoning structure you want.
What's the cost difference between CoT and direct answering?
CoT typically produces 5-20x more output tokens than a direct answer. Output tokens are 3-5x more expensive than input tokens on most providers, so a CoT response can cost 15-100x the price of a direct one. For batch scoring or classification at scale this matters a lot. Workarounds: use self-consistency only for ambiguous cases, cache the reasoning trace when the same question repeats, or switch to a smaller reasoning-trained model like DeepSeek R1 Distill, which delivers much of the benefit at lower cost. For one-off strategic questions the cost is trivial and CoT is almost always worth it.
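The token arithmetic above is easy to make concrete. The prices and token counts below are illustrative assumptions only (check your provider's current pricing), chosen to show how the output-token multiplier dominates the bill:

```python
def response_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost of one call; prices are per 1M tokens."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# Assumed prices for illustration: $3/M input, $15/M output.
direct = response_cost(in_tokens=200, out_tokens=50, in_price=3.0, out_price=15.0)
cot = response_cost(in_tokens=200, out_tokens=50 * 15, in_price=3.0, out_price=15.0)

ratio = cot / direct  # output tokens dominate, so the ratio tracks the 15x multiplier
```

With these numbers a 15x output multiplier yields roughly a 9x cost increase per call, because the fixed input cost dilutes the ratio slightly; at larger multipliers or with self-consistency sampling on top, the multiplier compounds toward the high end of the range.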
When does CoT hurt performance?
Three scenarios. (1) Small models under ~10B parameters — they don't have enough capability to reason and the extra tokens just add noise. Skip CoT and use direct prompting. (2) Tasks that require no reasoning — sentiment classification, translation, grammar correction — where CoT introduces overconfident rationales. (3) Latency-sensitive interfaces — a 400ms direct answer beats a 4-second CoT answer for autocomplete or voice. The mitigation is to reserve CoT for hard multi-step tasks and use direct prompting elsewhere.
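The mitigation described above amounts to a routing decision. A minimal sketch, with threshold values that are illustrative rather than benchmarked:

```python
def choose_prompt_style(task_type, model_params_b, latency_budget_ms):
    """Route between CoT and direct prompting, mirroring the three scenarios above.

    All thresholds here are assumptions for illustration, not measured cutoffs.
    """
    if model_params_b < 10:  # (1) small models: extra tokens add noise
        return "direct"
    if task_type in {"sentiment", "translation", "grammar"}:  # (2) no multi-step reasoning
        return "direct"
    if latency_budget_ms < 1000:  # (3) latency-sensitive UI (autocomplete, voice)
        return "direct"
    return "cot"  # reserve CoT for hard multi-step tasks
```

In practice you would key the `task_type` check off your own task taxonomy; the point is that the CoT/direct choice can be made per request rather than globally.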
Who invented chain-of-thought prompting?
Jason Wei and colleagues at Google Research published the seminal paper 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models' at NeurIPS 2022. The zero-shot variant 'Let's think step by step' was popularized by Kojima et al. in a concurrent 2022 paper. The technique built on earlier work on scratchpads and rationale generation, but the Wei et al. paper is the one that made it standard practice. Both papers are still among the most-cited in LLM research.