Chain-of-thought (CoT) prompting was the first widely adopted technique to show that you could improve LLM reasoning without retraining the model. The original Wei et al. paper in 2022 demonstrated that giving a large model a handful of worked examples with intermediate reasoning steps raised GSM8K math-word-problem accuracy from around 18% to over 56% on PaLM 540B. The finding generalized across providers and model families and became a foundation of modern prompt engineering.
There are three common flavors. Zero-shot CoT, introduced by Kojima et al. in 2022, appends the literal phrase 'Let's think step by step' to a prompt and relies on the model's pretraining to produce a rationale. Few-shot CoT includes worked examples in the prompt showing the reasoning trace, so the model mimics the pattern. Self-consistency CoT samples multiple reasoning chains at temperature > 0 and takes a majority vote over their final answers, trading inference cost for accuracy. All three are still in production use in 2026 alongside newer techniques.
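To make the three flavors concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the function names, the 'Q:/A:' prompt layout, and the rule that the final answer is the last number in a completion are all hypothetical choices, and sample_fn stands in for whatever model call you use (sampled at temperature > 0).

```python
import re
from collections import Counter
from typing import Callable, Optional

ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the trigger phrase so the model writes a rationale."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(question: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot CoT: prepend (question, worked rationale) pairs for the model to mimic."""
    shots = "\n\n".join(f"Q: {q}\nA: {rationale}" for q, rationale in examples)
    return f"{shots}\n\nQ: {question}\nA:"


def extract_answer(completion: str) -> Optional[str]:
    """Assumed convention: the final answer is the last number in the completion."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistency(
    prompt: str, sample_fn: Callable[[str], str], n: int = 5
) -> Optional[str]:
    """Self-consistency: sample n reasoning chains and majority-vote the answers."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


# Demo with a stub sampler; a real sample_fn would call a model instead.
canned_chains = iter([
    "5 + 6 = 11. The answer is 11.",
    "Two cans of 3 is 6, plus 5 is 11. The answer is 11.",
    "5 + 3 = 8. The answer is 8.",
])
prompt = zero_shot_cot("Roger has 5 balls and buys 2 cans of 3. How many now?")
print(self_consistency(prompt, lambda _prompt: next(canned_chains), n=3))  # → 11
```

The vote runs over extracted final answers rather than over whole chains, because two chains can reach the same answer by different reasoning. The cost is n model calls per question, which is the inference-for-accuracy trade described above.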
CoT works because language models generate one token at a time, conditioning each new token on everything written so far. Writing out intermediate work lets the model condition its final answer on explicit partial results instead of producing it in a single step. Each reasoning step narrows the probability distribution over what comes next, which is especially valuable for problems that require multi-step calculation or case analysis. Critically, CoT is emergent at scale: in the original experiments, gains appeared only at around 100B parameters, and small models (under ~10B parameters) often do worse with CoT than without it, while frontier models like Anthropic Claude 4.5 and OpenAI GPT-5 show strong gains. This is why CoT exploded in usage around 2022: it required models large enough to actually benefit.
The modern successor to raw CoT is the 'reasoning model' pattern introduced by OpenAI o1 in 2024 and extended by Claude 4.5 thinking, DeepSeek R1, and Gemini 2.5 Flash Thinking. These models are trained with reinforcement learning to reason before answering, so they produce long internal chains of thought automatically. Whether the user sees the raw trace varies by provider: o1 hides it, while DeepSeek R1 exposes it. Under the hood the mechanism is the same (generate more reasoning tokens, get better answers), but the behavior is trained in rather than prompted.
For AI employees, CoT matters because it turns an LLM from an autocomplete engine into a problem-solver. When an AI CFO needs to reconcile an invoice discrepancy, CoT lets it walk through each line item rather than guessing. When an AI developer debugs a stack trace, CoT lets it enumerate hypotheses in order. Tycoon's AI CEO Astra uses CoT implicitly through Claude 4.5's thinking mode: you give her a strategic question, she works through options before recommending one, and you can optionally see the trace.