What is Function Calling?
How LLMs stop talking and start doing — the mechanism that lets models request calls into your code.
Function calling is an LLM capability where the model, given a set of function schemas, outputs structured JSON indicating which function to call and with what arguments — letting your application invoke real code, APIs, or database queries. OpenAI introduced the feature in June 2023; it is now standard across GPT, Claude, Gemini, and open-source models like Llama 3.1 and Mistral.
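Concretely, a function schema is JSON Schema wrapped in a small envelope. A minimal sketch in the OpenAI Chat Completions style (the `get_weather` function and its parameters are illustrative, not from any real API):

```python
# A function schema in the OpenAI Chat Completions style (illustrative).
# The model never executes anything itself -- it only sees this schema and
# may respond with a structured call naming the function and its arguments,
# which your application then executes.
get_weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
```

The `description` fields matter: they are the only documentation the model sees when deciding whether and how to call the function.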
Examples
- OpenAI GPT-5 tools parameter — pass an array of function schemas, get tool_calls back in responses
- Anthropic Claude 4.5 tools API — pass tools array, receive tool_use blocks the client executes and returns as tool_result messages
- Google Gemini function calling — tools parameter with FunctionDeclaration objects; supports forced function calling mode
- Tycoon's AI CEO (Astra) — every tool Astra uses (assign_task, query_metric, send_message, search_memory) is defined as a function and invoked via function calling
- Cursor and Claude Code — read_file, write_file, run_shell_command are functions the coding agent calls constantly
- ChatGPT plugins (deprecated, replaced by GPTs and now custom GPT actions) — each plugin API endpoint was exposed as a function the model could call
- Tool-using agents in LangChain, LlamaIndex, CrewAI — all build on function calling underneath
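Under all of these, the runtime loop is the same shape: receive a structured call, dispatch it to real code, send the result back. A minimal sketch (the tool-call dict mimics the shape most APIs return; `get_weather` and the dispatch table are hypothetical stand-ins):

```python
import json

def get_weather(city, unit="celsius"):
    # Stand-in for a real weather API call.
    return {"city": city, "temp": 21, "unit": unit}

# Dispatch table mapping tool names (as declared in schemas) to real code.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(tool_call):
    """Dispatch a model-emitted tool call to real code.

    `tool_call` mimics what most APIs return: a function name plus
    arguments serialized as a JSON string.
    """
    fn = TOOLS[tool_call["name"]]              # look up the real function
    args = json.loads(tool_call["arguments"])  # model emits arguments as JSON
    return fn(**args)

# Example: what a model might emit for "What's the weather in Paris?"
call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = execute_tool_call(call)
# `result` would then be sent back to the model as a tool-result message,
# and the model would compose its final natural-language answer from it.
```

The key point: the model only ever produces the `call` dict; your application owns execution, which is what keeps side effects under your control.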
Frequently asked questions
Is function calling the same as tool use?
They're synonyms in practice but have different origins. OpenAI coined 'function calling' in 2023 for its Chat Completions API. Anthropic uses 'tool use' in the Claude API. Google uses 'function calling' in Gemini. Semantically they're the same primitive — the model outputs a structured call matching a schema, your code executes it, you return the result. Most people now say 'tool use' as a broader term that includes function calling plus specialized tools like code execution, computer use, and web search.
Which models support function calling?
As of 2026: all frontier commercial models (GPT-5, Claude 4.5 / Opus 4.5, Gemini 2.5, Grok 4) support it natively. Most open-source models do too: Llama 3.1 and 3.3, Mistral Large and Small 3, Qwen 2.5, DeepSeek V3. Quality varies — Claude 4.5 and GPT-5 are the most reliable at complex multi-tool scenarios; smaller open-source models work but may need more explicit schemas and benefit from JSON-mode constraints on decoding.
How is function calling different from MCP?
Function calling is the low-level capability: the model outputs a structured call, your app executes it. MCP (Model Context Protocol) is a standardized protocol for how function schemas and results get exchanged between LLM clients and tool servers. Think of function calling as the raw primitive and MCP as a networking standard on top of it. A function-calling integration is bespoke per app; an MCP server exposes tools that any MCP-compatible client can use without custom glue code. Most apps today use function calling directly; MCP is the emerging cross-app standard.
Can the model make up function arguments (hallucinate)?
Yes, especially with smaller models or poorly described schemas. The most common failure: the model invents a plausible-looking value for a parameter it didn't have information to fill. Mitigations: (1) mark parameters as optional where they truly are, so the model doesn't feel pressured to invent values; (2) use detailed parameter descriptions that specify valid formats and constraints; (3) validate arguments against your schema before executing and ask the model to retry on failure; (4) for critical actions (payments, deletions), require human confirmation regardless of model confidence.
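Mitigation (3) can be as simple as checking required fields and types before dispatching. A stdlib-only sketch (the schema shape follows JSON Schema conventions; the checker covers only primitives, which is enough to catch the common case of a missing or mistyped argument):

```python
def validate_args(args, parameters):
    """Check model-supplied arguments against a JSON-Schema-like spec.

    Returns a list of error strings; an empty list means the arguments
    are safe to pass to the real function. On errors, feed the list back
    to the model and ask it to retry the call.
    """
    errors = []
    props = parameters.get("properties", {})
    type_map = {"string": str, "number": (int, float),
                "integer": int, "boolean": bool}
    # Catch the model omitting a parameter it was required to fill.
    for name in parameters.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    # Catch invented parameters and wrong types.
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected parameter: {name}")
            continue
        expected = type_map.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{name}: expected {props[name]['type']}")
    return errors

params = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
    "required": ["city"],
}
validate_args({"city": "Paris", "days": 3}, params)  # [] -> safe to execute
validate_args({"days": "three"}, params)             # errors -> ask the model to retry
```

For production schemas with nesting, enums, and formats, a full validator library is a better fit than hand-rolled checks; the point is only that validation must happen before execution, not after.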
How do I decide what to expose as a function versus putting it in the prompt?
Put it in the prompt if it's static context (instructions, role, style). Expose it as a function if it involves fetching current data, taking an action, or retrieving something too large for the prompt. Rule of thumb: if the answer to 'when would this change?' is 'every time someone asks', it's a function; if the answer is 'rarely', it belongs in the prompt. Also prefer functions for anything with side effects (sending email, writing to a DB, spending money) so you keep human control over when they actually execute.