What is LLM Fine-Tuning?
When prompting isn't enough — teach the model directly.
Fine-tuning is the process of continuing to train a pretrained LLM on a smaller, task-specific dataset so that the model internalizes a style, domain, or output format. Unlike prompting, which leaves the model unchanged, it updates model weights. Modern fine-tuning typically uses parameter-efficient methods such as LoRA, which train a small set of added adapter weights while the base model stays frozen. That makes a run affordable at roughly $10-$500, rather than the millions of dollars required for pretraining.
Updated Apr 2026
In depth
Pretrained LLMs like Claude 4.5 or GPT-5 come out of the factory knowing general language and broad world knowledge. Fine-tuning teaches them specific behaviors that prompting can only approximate: your brand voice, your API response format, or a rare domain vocabulary. The model goes from 'smart generalist' to 'smart specialist for this task' while keeping most of its general capability.
There are four main flavors, and they are not mutually exclusive: the first, third, and fourth describe the training objective, while the second describes how the weights are updated and can be combined with any of the others.
(1) Supervised fine-tuning (SFT): train on input-output pairs that show the model what you want. 500-5,000 examples is typical. This is the bread-and-butter approach; most 'fine-tunes' in production are SFT.
(2) Parameter-efficient fine-tuning (PEFT), usually LoRA or QLoRA: the base weights stay frozen and only small low-rank adapter matrices are trained, making training 10-100x cheaper and letting you store dozens of LoRA adapters per base model. Open-source PEFT is standard practice in 2026.
(3) Reinforcement learning from human feedback (RLHF): train a reward model on pairs of better/worse outputs, then optimize the LLM against that reward model with reinforcement learning. Expensive, but it produces the best quality for subjective tasks; this is how ChatGPT became ChatGPT.
(4) Direct preference optimization (DPO), introduced by Rafailov et al. (2023): a simpler alternative to RLHF that optimizes directly on preference pairs without a separate reward model. DPO has largely replaced classical RLHF for alignment fine-tuning because it's simpler and nearly as effective.
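To make the 'no separate reward model' point about DPO concrete, here is a minimal sketch of the per-example DPO loss from Rafailov et al. (2023), written in plain Python. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the model being trained and a frozen reference model. The function names, argument names, and the default `beta=0.1` are illustrative assumptions, not taken from any particular library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for a single preference pair.

    Each argument is the total log-probability of a full response
    (chosen or rejected) under the trained policy or the frozen reference.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more strongly.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))
```

When the policy is identical to the reference, the two margins cancel and the loss equals log 2. Training pushes it below that by raising the relative likelihood of the preferred response, which is the role the reward model and RL step play in classical RLHF.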
Fine-tuning is not always the right answer. The two most common mistakes are fine-tuning when you should have used RAG (you need the model to know current facts, not new behaviors) and fine-tuning when you should have invested more in prompting (you have 10 examples, which is not enough to train on). Rough decision rule: if the task is 'produce outputs in this specific style/format I can show you' and you have 200+ examples, fine-tune. If the task is 'answer questions using my current knowledge base,' use RAG. If you're unsure, start with prompting plus a handful of few-shot examples; escalate to fine-tuning only when prompting plateaus below your quality bar.
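The rough decision rule above can be written down as a small function. This is only a sketch of the heuristic as stated. The function name, the argument names, and the idea of reducing each question to a boolean are assumptions made for illustration; the 200-example threshold is the one given in the text.

```python
MIN_EXAMPLES_FOR_FINE_TUNING = 200  # threshold from the rule of thumb above


def choose_approach(
    needs_current_facts: bool,
    wants_style_or_format: bool,
    num_examples: int,
    prompting_meets_bar: bool,
) -> str:
    """Return 'rag', 'prompting', or 'fine-tune' per the rough rule."""
    # Knowledge problems are retrieval problems, not behavior problems.
    if needs_current_facts:
        return "rag"
    # If prompting already clears the quality bar, there is nothing to fix.
    if prompting_meets_bar:
        return "prompting"
    # Escalate only for style/format tasks with enough training data.
    if wants_style_or_format and num_examples >= MIN_EXAMPLES_FOR_FINE_TUNING:
        return "fine-tune"
    # Otherwise keep iterating on prompts (and collecting examples).
    return "prompting"
```

For instance, `choose_approach(False, True, 10, False)` returns `"prompting"`, matching the '10 examples is not enough' case, while the same call with 500 examples returns `"fine-tune"`.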
The economics changed dramatically with LoRA and QLoRA. Full fine-tuning of a 70B model requires multiple A100/H100 GPUs and thousands of dollars. A LoRA run on the same model costs roughly $10-$100 in rented compute, and QLoRA, which also quantizes the frozen base model to 4 bits, brings a model of that scale within reach of a single 48 GB GPU (a 24 GB consumer card is generally limited to much smaller models). Managed fine-tuning services (OpenAI's fine-tuning API, Anthropic's fine-tuning for Claude, Fireworks, Together, Modal) make it a near zero-ops operation for open and closed models. The practical cost for most startups is dominated by data preparation (curating and cleaning 1K-5K examples), not compute.
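A quick back-of-the-envelope calculation shows where the savings come from. LoRA replaces the update to a weight matrix of shape `d_out x d_in` with two low-rank factors of shape `d_out x r` and `r x d_in`, so the trainable parameter count per adapted matrix is `r * (d_in + d_out)`. The sketch below uses a hypothetical 4096 x 4096 projection and rank 8. Both numbers are illustrative assumptions, not figures from any specific model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: B (d_out x r) and A (r x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)        # 16,777,216
lora = lora_params(d_out, d_in, rank)  # 65,536
reduction = full // lora               # 256

print(f"full: {full:,}  lora: {lora:,}  reduction: {reduction}x")
```

Under these assumptions the adapter trains 256 times fewer parameters for that matrix. Gradients and optimizer state shrink in step with the trainable parameters, which is a large part of why LoRA-style runs fit on far less hardware than full fine-tuning.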
For AI agents, fine-tuning matters for a narrow but important slice of use cases: enforcing output schema reliably, internalizing a brand voice, or learning a rare API grammar. Tycoon generally doesn't fine-tune the base models that power Astra — the frontier models from Anthropic and OpenAI are strong enough for most tasks and getting stronger. We reach for fine-tuning when a specific skill (programmatic SEO page generation, code refactoring to an unusual house style) needs consistent, format-locked outputs that prompting struggles to enforce.
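One way to judge whether prompting is 'struggling to enforce' a format is to measure how often outputs actually satisfy the schema. The sketch below assumes the target format is a JSON object with a fixed set of required keys. The function name and the example keys (`title`, `slug`) are hypothetical and only illustrate the idea.

```python
import json
from typing import Iterable, List


def schema_pass_rate(outputs: List[str], required_keys: Iterable[str]) -> float:
    """Fraction of outputs that parse as a JSON object with all required keys."""
    required = set(required_keys)
    if not outputs:
        return 0.0
    passed = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # not valid JSON at all
        # Must be an object (dict), not a list or scalar, with every key present.
        if isinstance(obj, dict) and required.issubset(obj.keys()):
            passed += 1
    return passed / len(outputs)


# Hypothetical model outputs for a page-generation task.
samples = [
    '{"title": "Best CRMs", "slug": "best-crms"}',
    '{"title": "Missing slug"}',
    "Sure! Here is your JSON: {...}",
    "[1, 2, 3]",
]
rate = schema_pass_rate(samples, ["title", "slug"])
```

Tracking this rate on a fixed evaluation set before and after fine-tuning gives a concrete measure of whether the fine-tune bought the consistency it was meant to.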