
Tool Use and Function Calling: The Practical Developer's Guide for 2026

Tool use is what separates agents from chatbots. This guide covers implementing function calling across OpenAI, Anthropic, and Google — with patterns for validation, error handling, and production deployment.

Alex Chen

March 29, 2026

8 min read

What Tool Use Actually Means

Tool use — also called function calling — is the mechanism by which an LLM decides to invoke an external function, API, or system based on the conversation context. Instead of generating a text response, the model outputs a structured call to a specific tool with specific arguments. Your application executes the tool and returns the result to the model, which then continues reasoning.

This is the fundamental capability that turns a language model into an agent. Without tool use, an LLM can only generate text. With tool use, it can search the web, query databases, send emails, create tickets, execute code, and interact with any system that exposes an API.

How It Works Across Providers

OpenAI Function Calling

OpenAI's implementation uses a tools parameter in the chat completions API. You define tools as JSON Schema objects describing the function name, description, and parameters.

The key fields in a tool definition are the function name, a description the model reads to decide when to call it, and a parameters object expressed as JSON Schema; with strict mode enabled, the model's arguments are guaranteed to match that schema.
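As a minimal sketch, an OpenAI-style definition for the search_database tool used in this guide's examples (the tool itself and its parameters are hypothetical) looks like this:

```python
# A hypothetical OpenAI-style tool definition. The "search_database"
# tool and its parameters are illustrative, not a real API.
search_database_tool = {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": (
            "Search customer records by date range. Use this when the user "
            "asks about orders or activity within a specific time window."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "start_date": {
                    "type": "string",
                    "description": "Inclusive start date in YYYY-MM-DD format, e.g. 2026-03-01.",
                },
                "end_date": {
                    "type": "string",
                    "description": "Inclusive end date in YYYY-MM-DD format.",
                },
            },
            "required": ["start_date", "end_date"],
        },
    },
}
```

Note that the description explains when to use the tool, and each parameter documents its format with an example value.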

OpenAI also supports tool_choice to force the model to call a specific tool (tool_choice: {"type": "function", "function": {"name": "search_database"}}) or to let the model decide (tool_choice: "auto"). In production, use forced tool calling at the start of workflows where you know the first step, then switch to auto for the reasoning loop.
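That forced-then-auto pattern can be sketched with a small helper that picks the tool_choice value per turn (tool_choice_for_turn is a hypothetical name; search_database is the example tool used throughout):

```python
# Force the known first step of the workflow, then let the model decide.
forced_choice = {"type": "function", "function": {"name": "search_database"}}

def tool_choice_for_turn(turn_index: int):
    """Return the tool_choice value for this turn of the agent loop:
    force the known first step, then hand control back to the model."""
    return forced_choice if turn_index == 0 else "auto"
```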

Anthropic Tool Use

Anthropic's implementation passes tools in the top-level tools array of the Messages API. The structure is similar to OpenAI's, with notable differences: the parameter schema lives under input_schema rather than parameters, the definition is not wrapped in a {"type": "function"} envelope, and tool results are sent back to the model as tool_result content blocks inside a user message.
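The same hypothetical search_database tool, sketched in Anthropic's format:

```python
# The hypothetical search_database tool in Anthropic's format: no
# {"type": "function"} wrapper, and the schema lives under "input_schema".
search_database_tool = {
    "name": "search_database",
    "description": (
        "Search customer records by date range. Use this when the user "
        "asks about orders or activity within a specific time window."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "description": "YYYY-MM-DD, e.g. 2026-03-01."},
            "end_date": {"type": "string", "description": "YYYY-MM-DD."},
        },
        "required": ["start_date", "end_date"],
    },
}
```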

Google Gemini Function Calling

Gemini's tool use is defined through function_declarations passed in the request's tools configuration. The JSON Schema for parameters follows the same standard, but Gemini adds a tool_config whose function_calling_config mode controls how aggressively the model uses tools: AUTO lets the model decide, ANY forces it to call a function (optionally restricted to allowed_function_names), and NONE disables function calling entirely.
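A sketch of a Gemini-style declaration and tool config as plain dictionaries, using the same hypothetical search_database tool:

```python
# Gemini-style function declaration plus a tool_config controlling the mode.
tools = [
    {
        "function_declarations": [
            {
                "name": "search_database",
                "description": "Search customer records by date range.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "start_date": {"type": "string", "description": "YYYY-MM-DD."},
                        "end_date": {"type": "string", "description": "YYYY-MM-DD."},
                    },
                    "required": ["start_date", "end_date"],
                },
            }
        ]
    }
]

tool_config = {
    "function_calling_config": {
        # AUTO: model decides; ANY: must call a function; NONE: no tools.
        "mode": "ANY",
        "allowed_function_names": ["search_database"],
    }
}
```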

Gemini also supports code_execution as a built-in tool, allowing the model to write and run Python code in a sandboxed environment. This is especially powerful for math, data analysis, and data transformation tasks.

Tool Description Best Practices

The quality of your tool descriptions is the single biggest factor in whether your agent uses tools correctly. Patterns that work consistently across all providers: explain when to use the tool, not just what it does; document each parameter's expected format with a concrete example value; state what the tool returns; and call out side effects explicitly, so the model does not invoke a destructive tool speculatively.
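As a quick illustration (the wording is invented for this example), compare a vague description with one that spells out usage, formats, return value, and side effects:

```python
# A vague description gives the model almost nothing to decide with.
vague = "Searches the database."

# A specific description covers when to use it, parameter formats,
# the return value, and side effects.
specific = (
    "Search customer orders by date range. Use this when the user asks "
    "about orders placed within a specific time window. Dates must be "
    "YYYY-MM-DD (e.g. 2026-03-01). Returns a list of order IDs and totals. "
    "Read-only: this tool never modifies records."
)
```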

Error Handling in Production

Tool execution failures are inevitable. Your agent framework needs to handle them gracefully:

Retry Logic

Implement retries with exponential backoff for transient failures (network timeouts, rate limits). But do not retry tool calls where the parameters were wrong — send the error back to the model so it can correct its approach.
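A minimal sketch of that policy; the two exception classes are assumptions standing in for however your framework classifies failures:

```python
import time

class ToolParameterError(Exception):
    """The model supplied invalid arguments -- never retried."""

class TransientToolError(Exception):
    """Network timeout, rate limit, etc. -- safe to retry."""

def call_with_retries(tool_fn, args, max_attempts=3, base_delay=0.5):
    """Retry transient failures with exponential backoff, but surface
    parameter errors immediately so the model can correct its approach."""
    for attempt in range(max_attempts):
        try:
            return tool_fn(**args)
        except ToolParameterError:
            raise  # wrong arguments: send the error back to the model
        except TransientToolError:
            if attempt == max_attempts - 1:
                raise  # out of attempts
            time.sleep(base_delay * (2 ** attempt))
```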

Fallback Tools

For critical workflows, define fallback tools that provide degraded but functional alternatives. If the primary database search fails, a cached or simplified search can keep the agent moving forward.
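One way to sketch this is a wrapper that flags degraded results so downstream steps know they are working with partial data (with_fallback is a hypothetical helper):

```python
def with_fallback(primary, fallback):
    """Run the primary tool; on failure, fall back to a degraded but
    functional alternative, flagging the result as degraded."""
    def wrapped(**args):
        try:
            return {"degraded": False, "result": primary(**args)}
        except Exception:
            return {"degraded": True, "result": fallback(**args)}
    return wrapped
```

Usage: wrap the live database search with a cached search, and register the wrapped function as the tool implementation.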

Error Messages to the Model

When a tool call fails, the error message you send back to the model matters enormously. Instead of a generic "Error occurred," send structured feedback: "The search_database tool returned an error: invalid date format '2026/03/30'. Expected format is YYYY-MM-DD. Please retry with the correct format." This gives the model the information it needs to self-correct.
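A small helper in that spirit (format_tool_error is a hypothetical name):

```python
def format_tool_error(tool_name, error, expected=None):
    """Build a corrective error message for the model instead of a
    generic 'Error occurred'."""
    msg = f"The {tool_name} tool returned an error: {error}."
    if expected:
        msg += f" Expected: {expected}. Please retry with corrected arguments."
    return msg
```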

Validation and Safety

Never execute tool calls without validation. The model can hallucinate tool names, generate malformed parameters, or request actions outside its authorized scope.
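A hand-rolled sketch of pre-execution validation against a tool registry; in practice a JSON Schema validator library does this more thoroughly:

```python
def validate_tool_call(name, args, registry):
    """Return an error string for hallucinated tool names, missing
    required parameters, or wrong argument types; None if valid."""
    if name not in registry:
        return f"Unknown tool '{name}'. Available: {sorted(registry)}"
    schema = registry[name]["parameters"]
    props = schema.get("properties", {})
    for required in schema.get("required", []):
        if required not in args:
            return f"Missing required parameter '{required}'."
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    for key, value in args.items():
        if key not in props:
            return f"Unexpected parameter '{key}'."
        expected = type_map.get(props[key].get("type"))
        if expected and not isinstance(value, expected):
            return f"Parameter '{key}' should be {props[key]['type']}."
    return None  # valid: safe to execute
```

Any non-None return value goes back to the model as a tool error, using the structured feedback pattern above.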

Tool use and function calling are foundational skills for any AI agent engineer. Companies hiring for these skills are listed daily on AgenticCareers.co.

Advanced Patterns

Parallel Tool Calling

Both OpenAI and Anthropic support parallel tool calling — the model can request multiple tool calls in a single response. This is essential for performance: if an agent needs to check inventory and look up pricing simultaneously, parallel calls cut latency in half.

To enable effective parallel tool calling, design your tools to be independent — each tool should accept all the parameters it needs without depending on the output of another tool. When tools have dependencies (tool B needs the output of tool A), the model will naturally sequence them across multiple turns.
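Once the model has requested several independent calls, your application can execute them concurrently. A sketch with a thread pool (execute_parallel_calls and the registry shape are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def execute_parallel_calls(tool_calls, registry):
    """Execute independent tool calls concurrently; results come back
    in the order the model requested them."""
    def run(call):
        fn = registry[call["name"]]
        return {"name": call["name"], "result": fn(**call["arguments"])}
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(run, tool_calls))
```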

Streaming with Tool Calls

In streaming mode, tool calls arrive as partial JSON that must be accumulated until complete. This is a common source of bugs. Use the provider's streaming helpers (OpenAI's stream event handler, Anthropic's stream manager) rather than parsing the raw stream yourself. The edge cases around partial JSON, multiple tool calls in a single chunk, and interleaved text and tool content are tricky to handle correctly.
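To make the failure mode concrete, here is a hand-rolled accumulator of the kind those helpers implement for you; the delta shape is simplified relative to what providers actually stream:

```python
import json

class ToolCallAccumulator:
    """Accumulate partial tool-call JSON from stream deltas, keyed by
    the call's index, then parse each call's arguments once complete."""
    def __init__(self):
        self.calls = {}

    def add_delta(self, index, name=None, arguments_fragment=""):
        call = self.calls.setdefault(index, {"name": None, "arguments": ""})
        if name:
            call["name"] = name
        call["arguments"] += arguments_fragment  # partial JSON: not parseable yet

    def finalize(self):
        # Only safe to json.loads once the stream has ended.
        return [
            {"name": call["name"], "arguments": json.loads(call["arguments"])}
            for _, call in sorted(self.calls.items())
        ]
```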

Tool Call Chaining

Sophisticated agents chain tool calls — using the output of one tool as the input to another in a multi-step workflow. The agent might: (1) search for a customer by email, (2) retrieve their order history, (3) find the specific order they are asking about, (4) check the return eligibility, and (5) initiate the return. Each step depends on the previous one.

For reliable chaining, implement state management between tool calls. Store intermediate results in a session object that persists across the agent loop. This prevents the agent from losing context between steps and allows you to resume chains that are interrupted by errors or timeouts.
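A minimal sketch of such a session object (AgentSession is a hypothetical name):

```python
class AgentSession:
    """Persist intermediate tool results across the agent loop so a
    chain can be resumed after an error or timeout."""
    def __init__(self):
        self.steps = []   # ordered record of (tool_name, result)
        self.state = {}   # named intermediate values for later steps

    def record(self, tool_name, result, save_as=None):
        self.steps.append((tool_name, result))
        if save_as:
            self.state[save_as] = result

    def last_completed_step(self):
        return len(self.steps)  # resume point after an interruption
```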

Dynamic Tool Registration

In production systems, the set of available tools may change based on the user's permissions, the conversation context, or the current step in a workflow. Rather than loading all tools at the start, dynamically adjust the tool list at each step. This reduces the model's decision space (fewer tools to choose from means more accurate selection) and enforces authorization at the tool-availability level.
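A sketch of per-request tool filtering; the required_permission and steps fields are illustrative conventions of this example, not a provider feature:

```python
def tools_for_request(all_tools, user_permissions, workflow_step=None):
    """Narrow the tool list to what this user may call at this step:
    a smaller decision space, with authorization enforced by availability."""
    visible = []
    for tool in all_tools:
        perm = tool.get("required_permission")
        if perm is not None and perm not in user_permissions:
            continue  # unauthorized: the tool is simply never offered
        steps = tool.get("steps")
        if workflow_step is not None and steps is not None and workflow_step not in steps:
            continue  # not relevant at this step of the workflow
        visible.append(tool)
    return visible
```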

Testing Tool Use

Testing tool use requires a specific strategy: unit-test each tool function directly with known inputs, mock the model's tool selection so the agent loop can be tested deterministically, and maintain an evaluation suite of prompts with expected tool calls so regressions in tool selection surface before deployment.
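A deterministic loop test along those lines, with the model mocked out (run_agent_turn and the registry shape are hypothetical):

```python
def run_agent_turn(model_fn, registry, user_message):
    """One agent turn: ask the (possibly mocked) model for a decision,
    then execute any tool it picks."""
    decision = model_fn(user_message)
    if decision["type"] == "tool_call":
        result = registry[decision["name"]](**decision["arguments"])
        return {"type": "tool_result", "name": decision["name"], "result": result}
    return {"type": "text", "content": decision["content"]}

def fake_model(message):
    # Mocked model: always selects the lookup tool with fixed arguments,
    # so the test never depends on a live API.
    return {"type": "tool_call", "name": "lookup", "arguments": {"key": "a"}}

registry = {"lookup": lambda key: {"a": 1}[key]}
turn = run_agent_turn(fake_model, registry, "what is a?")
```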

Real-World Tool Use at Scale

At production scale, tool use introduces challenges that do not appear in prototypes:

Tool latency budgets: Each tool call adds latency to the agent's response time. If an agent makes 3 tool calls averaging 500ms each, that is 1.5 seconds of just tool execution time. Set latency budgets for each tool and optimize aggressively — caching, connection pooling, and pre-fetching can all reduce tool latency significantly.

Tool versioning: When you update a tool's behavior or interface, you need to consider that the LLM was trained (or prompted) with the old tool description. Update tool descriptions alongside tool implementations, and run evaluation suites to verify the model still uses updated tools correctly.

Graceful degradation: Design your agent to function (with reduced capability) when individual tools are unavailable. If the search tool is down, can the agent still answer questions using its training data? If the database tool is slow, can it provide a partial answer while the query completes? Graceful degradation is the difference between a production-ready agent and a fragile demo.
