Before function calling, using an LLM to trigger an action required brittle prompt engineering — 'output your answer as JSON in this exact format' — and frequent regex parsing of free text. Function calling formalized the pattern by making structured output a first-class capability of the model.
The mechanics are simple. You pass the LLM a list of functions with JSON Schema describing each parameter. The model, after reading the user's message, either replies with normal text or returns a 'function_call' (OpenAI terminology) / 'tool_use' (Anthropic terminology) block containing the function name and a JSON object of arguments conforming to the schema. Your application then executes that function, captures the result, passes it back to the model as a function result message, and the model continues the conversation using the returned data.
A concrete example: you define a function get_weather(city: string, unit: 'celsius' | 'fahrenheit'). The user asks 'What's the weather in Tokyo?'. Instead of hallucinating, the model returns { name: 'get_weather', arguments: { city: 'Tokyo', unit: 'celsius' } }. You call your real weather API, pass { temp: 18, condition: 'rain' } back to the model, and the model writes the natural-language reply 'It's 18°C and raining in Tokyo right now.'
This single primitive unlocks almost everything people call 'AI agents' in 2026. Web browsing is a function. Database queries are functions. Sending emails, booking meetings, running code, reading files, controlling other agents — all function calls. The Model Context Protocol (MCP) is a standardized way to expose functions to LLMs; frameworks like LangChain,
CrewAI, and AutoGen are orchestrators that manage the function-calling loop.
Important details most tutorials skip. Parallel function calls: modern models can return multiple function calls in one turn, letting you parallelize API calls and cut latency. Forced function calling: you can force the model to always call a specific function (useful for structured extraction). Function call history: keeping prior function results in the context is what makes agents remember what they already did. Schema quality: the clearer your parameter descriptions, the better the model's arg selection — treat function schemas like prompts, not just types.
Function calling reliability varies by model. Claude 4.5 and GPT-5 call functions correctly ~98% of the time with well-designed schemas. Smaller models (Llama 3.1 8B, Mistral Small) can drop to 80-90% and may return malformed JSON. For production systems, you almost always want JSON-mode or constrained decoding on top of function calling to guarantee parseability. Tycoon uses function calling extensively — every time Astra (the
AI CEO) takes an action (assign a task, query a metric, send a message to a teammate), that's a function call under the hood.