The word 'hallucination' is somewhat misleading — LLMs aren't having perceptual experiences. What they do is generate text that is statistically plausible given their training distribution, regardless of whether the underlying facts are correct. When you ask an LLM 'which 2023 paper proved X,' it will cheerfully invent a citation because inventing a plausible citation is exactly what the training data would predict. The model has no ground-truth lookup mechanism.
Researchers distinguish several types. Intrinsic hallucination: the output contradicts the input (e.g., summary contradicts the source document). Extrinsic hallucination: the output introduces claims not supported by any input (e.g., invented facts about a person). Factual hallucination: claims are verifiable and wrong. Faithfulness hallucination: claims don't match what the retrieved context actually says — a common RAG failure mode. Each type requires different mitigations; lumping them together leads to bad diagnoses.
The causes are well understood by 2026. (1) Training data contains errors and contradictions; the model averages over them. (2) The next-token prediction objective rewards plausibility, not truth: nothing in it distinguishes a plausible falsehood from a plausible truth. (3) Neither parameter count nor context window is large enough to hold every fact, so the model interpolates, and interpolation between facts produces plausible-sounding fictions. (4) RLHF/alignment training inadvertently teaches the model to sound confident, because confident answers score higher in human preference data even when they are wrong. Frontier models (Claude 4.5, GPT-5) hallucinate less than 2023-era models on factual QA benchmarks (down from 20%+ error rates to low single digits on well-covered topics) but still fail reliably on long-tail facts, recent events, and specific citations.
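Cause (2) above can be read directly off the standard autoregressive training loss. The form below is the usual textbook cross-entropy objective; the notation (tokens x_1..x_T, parameters θ) is chosen here for illustration and is not taken from any particular model's documentation.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

The loss depends only on the probability the model assigns to the tokens that actually appear in the training text. It contains no term that checks whether the resulting statement is true, so a fluent falsehood and a fluent truth are treated alike whenever both are equally likely under the training distribution.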
Mitigations fall into five categories. (1) Retrieval-augmented generation (RAG): give the model the actual source documents and instruct it to cite them. This dramatically reduces hallucination on factual QA but doesn't eliminate it, because models still sometimes misread their context. (2) Citations and groundedness checks: require the model to emit citations, then verify that each citation points to a real source and that the source supports the claim. (3) Chain-of-thought with self-verification: have the model write out its reasoning, then run a second pass that checks it for consistency. (4) Temperature control: a lower temperature (0.0 to 0.3) reduces creative divergence, but the model can still hallucinate confidently at any temperature. (5) Guardrails: hallucination-detection classifiers (evaluated on benchmarks such as HaluEval) or an LLM-as-judge such as GPT-4 flag suspicious outputs for review, or trigger a refusal when confidence is low.
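The citation check in (2) can be sketched in a few lines. This is a minimal illustration, not a production verifier: the citation format (a quoted span followed by a bracketed source ID), the function name verify_citations, and the sample documents are all assumptions made for the example. It checks only verbatim quotes, so paraphrased claims would need a stronger entailment check.

```python
import re

# Assumed citation format for this sketch: "quoted span" [source_id]
CITATION = re.compile(r'"([^"]+)"\s*\[(\w+)\]')


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def verify_citations(answer: str, sources: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (source_id, quote, status) for every citation found in the answer.

    status is one of:
      "supported"       - the source exists and contains the quoted span
      "quote_not_found" - the source exists but does not contain the span
      "unknown_source"  - the cited ID is not among the retrieved sources
    """
    results = []
    for quote, source_id in CITATION.findall(answer):
        if source_id not in sources:
            status = "unknown_source"
        elif normalize(quote) in normalize(sources[source_id]):
            status = "supported"
        else:
            status = "quote_not_found"
        results.append((source_id, quote, status))
    return results


# Hypothetical retrieved documents and model answer.
sources = {
    "doc1": "In Q3, revenue grew 12% year over year.",
    "doc2": "Headcount was flat.",
}
answer = (
    'The report says "revenue grew 12%" [doc1], '
    '"margins doubled" [doc2], and "churn fell" [doc9].'
)

for source_id, quote, status in verify_citations(answer, sources):
    print(f"{source_id}: {quote!r} -> {status}")
```

On the sample input, the first citation is supported, the second cites a real document that does not contain the quote, and the third cites a document that was never retrieved. The last two correspond to the faithfulness and invented-citation failures described earlier.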
The frontier research directions in 2026 are (a) calibrated confidence: teaching the model to say 'I don't know' when uncertain, which language models have historically been bad at; (b) tool use for ground truth: the model queries web search or structured databases instead of relying on parametric memory; and (c) verifier models trained to catch the hallucinations that production models produce. Anthropic, OpenAI, and Google have all published progress here, but the problem is not solved.
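Calibration in (a) is measurable. One common metric is expected calibration error (ECE), which compares stated confidence with observed accuracy across confidence bins. The sketch below is a plain-Python illustration; the function names, the ten-bin default, and the 0.8 abstention threshold are assumptions for the example, not values from any vendor.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: floats in [0, 1], the model's stated confidence per answer
    correct:     booleans, whether each answer was actually right
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


def should_abstain(confidence, threshold=0.8):
    """Answer 'I don't know' below the threshold (threshold is an assumption)."""
    return confidence < threshold


# An overconfident model: claims 95% confidence but is right half the time.
overconfident = expected_calibration_error(
    [0.95, 0.95, 0.95, 0.95], [True, True, False, False]
)
print(round(overconfident, 2))
```

An abstention threshold only helps if the confidence scores are calibrated in the first place. A model with a large ECE will clear the threshold on exactly the answers it gets wrong.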
For AI agents like those in Tycoon, hallucination is a product-killer. An AI CFO that hallucinates a revenue figure erodes trust permanently; an AI developer that hallucinates an API signature wastes hours. The defenses are layered: RAG for all factual claims, citations with source verification, low temperature for factual tasks, and explicit 'I don't know' instructions in the agent prompts. Astra is instructed to refuse factual claims she can't ground in retrieved context, and to say so when the user asks for something she doesn't know.
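The layering described above can be sketched as a thin wrapper around the model call. This is not Tycoon's or Astra's actual implementation, which the text does not show. Passage, REFUSAL, grounded_answer, and the generate callable (standing in for whatever LLM client is used) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Passage:
    source_id: str
    text: str


# Hypothetical refusal wording; the real agent prompt is not given in the text.
REFUSAL = "I don't know: I could not find a source for that."


def grounded_answer(
    question: str,
    passages: Sequence[Passage],
    generate: Callable[[str, float], str],
    temperature: float = 0.0,
) -> str:
    """Answer only from retrieved passages; refuse when there are none.

    generate(prompt, temperature) is an assumed stand-in for an LLM call.
    """
    # Layer 1: no retrieved context means no factual claim at all.
    if not passages:
        return REFUSAL

    # Layer 2: put the sources in the prompt and require citations by ID.
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    prompt = (
        "Answer using only the sources below and cite them by ID. "
        f"If they do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

    # Layer 3: low temperature for factual tasks.
    return generate(prompt, temperature)


# Usage with a fake model, so the sketch runs without any API.
def fake_model(prompt: str, temperature: float) -> str:
    return "Q3 revenue grew 12% [doc1]."


print(grounded_answer("How did Q3 revenue change?", [], fake_model))
print(
    grounded_answer(
        "How did Q3 revenue change?",
        [Passage("doc1", "In Q3, revenue grew 12% year over year.")],
        fake_model,
    )
)
```

Short-circuiting before the model call means an empty retrieval can never produce an invented figure. The returned answer would still need its citations verified afterward, since the model can misread context it was given.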