What is Function Calling?
How LLMs stop talking and start doing — the mechanism that lets models request calls into your code.
Function calling is an LLM capability where the model, given a set of function schemas, outputs structured JSON indicating which function to call and with what arguments — letting your application invoke real code, APIs, or database queries. OpenAI introduced the feature in June 2023; it is now standard across GPT, Claude, Gemini, and open-source models like Llama 3.1 and Mistral.
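Concretely, a function schema is JSON Schema wrapped in a small envelope. A minimal sketch in the OpenAI Chat Completions style (the `get_weather` function and its parameters are illustrative, not from any real API):

```python
# A function schema in the OpenAI Chat Completions style (illustrative).
# The model never executes anything itself -- it only sees this schema and
# may respond with a structured call naming the function and its arguments,
# which your application then executes.
get_weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
```

The `description` fields matter: they are the only documentation the model sees when deciding whether and how to call the function.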
Examples
- OpenAI GPT-5 tools parameter — pass an array of function schemas, get tool_calls back in responses
- Anthropic Claude 4.5 tools API — pass tools array, receive tool_use blocks the client executes and returns as tool_result messages
- Google Gemini function calling — tools parameter with FunctionDeclaration objects; supports forced function calling mode
- Tycoon's AI CEO (Astra) — every tool Astra uses (assign_task, query_metric, send_message, search_memory) is defined as a function and invoked via function calling
- Cursor and Claude Code — read_file, write_file, run_shell_command are functions the coding agent calls constantly
- ChatGPT plugins (deprecated, replaced by GPTs and now custom GPT actions) — each plugin API endpoint was exposed as a function the model could call
- Tool-using agents in LangChain, LlamaIndex, CrewAI — all build on function calling underneath
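Under all of these, the runtime loop is the same shape: receive a structured call, dispatch it to real code, send the result back. A minimal sketch (the tool-call dict mimics the shape most APIs return; `get_weather` and the dispatch table are hypothetical stand-ins):

```python
import json

def get_weather(city, unit="celsius"):
    # Stand-in for a real weather API call.
    return {"city": city, "temp": 21, "unit": unit}

# Dispatch table mapping tool names (as declared in schemas) to real code.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(tool_call):
    """Dispatch a model-emitted tool call to real code.

    `tool_call` mimics what most APIs return: a function name plus
    arguments serialized as a JSON string.
    """
    fn = TOOLS[tool_call["name"]]              # look up the real function
    args = json.loads(tool_call["arguments"])  # model emits arguments as JSON
    return fn(**args)

# Example: what a model might emit for "What's the weather in Paris?"
call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = execute_tool_call(call)
# `result` would then be sent back to the model as a tool-result message,
# and the model would compose its final natural-language answer from it.
```

The key point: the model only ever produces the `call` dict; your application owns execution, which is what keeps side effects under your control.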
Frequently asked questions
Is function calling the same as tool use?
They're synonyms in practice but have different origins. OpenAI coined 'function calling' in 2023 for its Chat Completions API. Anthropic uses 'tool use' in the Claude API. Google uses 'function calling' in Gemini. Semantically they're the same primitive — the model outputs a structured call matching a schema, your code executes it, you return the result. Most people now say 'tool use' as a broader term that includes function calling plus specialized tools like code execution, computer use, and web search.
Which models support function calling?
As of 2026: all frontier commercial models (GPT-5, Claude 4.5 / Opus 4.5, Gemini 2.5, Grok 4) support it natively. Most open-source models do too: Llama 3.1 and 3.3, Mistral Large and Small 3, Qwen 2.5, DeepSeek V3. Quality varies — Claude 4.5 and GPT-5 are the most reliable at complex multi-tool scenarios; smaller open-source models work but may need more explicit schemas and benefit from JSON-mode constraints on decoding.
How is function calling different from MCP?
Function calling is the low-level capability: the model outputs a structured call, your app executes it. MCP (Model Context Protocol) is a standardized protocol for how function schemas and results get exchanged between LLM clients and tool servers. Think of function calling as the raw primitive and MCP as a networking standard on top of it. A function-calling integration is bespoke per app; an MCP server exposes tools that any MCP-compatible client can use without custom glue code. Most apps today use function calling directly; MCP is the emerging cross-app standard.
Can the model make up function arguments (hallucinate)?
Yes, especially with smaller models or poorly described schemas. The most common failure: the model invents a plausible-looking value for a parameter it didn't have information to fill. Mitigations: (1) mark parameters as optional where they truly are, so the model doesn't feel pressured to invent values; (2) use detailed parameter descriptions that specify valid formats and constraints; (3) validate arguments against your schema before executing and ask the model to retry on failure; (4) for critical actions (payments, deletions), require human confirmation regardless of model confidence.
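Mitigation (3) can be as simple as checking required fields and types before dispatching. A stdlib-only sketch (the schema shape follows JSON Schema conventions; the checker covers only primitives, which is enough to catch the common case of a missing or mistyped argument):

```python
def validate_args(args, parameters):
    """Check model-supplied arguments against a JSON-Schema-like spec.

    Returns a list of error strings; an empty list means the arguments
    are safe to pass to the real function. On errors, feed the list back
    to the model and ask it to retry the call.
    """
    errors = []
    props = parameters.get("properties", {})
    type_map = {"string": str, "number": (int, float),
                "integer": int, "boolean": bool}
    # Catch the model omitting a parameter it was required to fill.
    for name in parameters.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    # Catch invented parameters and wrong types.
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected parameter: {name}")
            continue
        expected = type_map.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{name}: expected {props[name]['type']}")
    return errors

params = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}
validate_args({"city": "Paris", "days": 3}, params)  # [] -> safe to execute
validate_args({"days": "three"}, params)             # errors -> ask the model to retry
```

For production schemas with nesting, enums, and formats, a full validator library is a better fit than hand-rolled checks; the point is only that validation must happen before execution, not after.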
How do I decide what to expose as a function versus putting it in the prompt?
Put it in the prompt if it's static context (instructions, role, style). Expose it as a function if it involves fetching current data, taking an action, or retrieving something too large for the prompt. Rule of thumb: if the answer to 'when would this change?' is 'every time someone asks', it's a function; if the answer is 'rarely', it belongs in the prompt. Also prefer functions for anything with side effects (sending email, writing to a DB, spending money) so you keep human control over when they actually execute.