
Schemas That Models Can Follow

A tool schema is only as reliable as the definition behind it. Vague descriptions produce vague calls. Constrained types produce correct ones.

When you give an LLM a function to call, you're giving it two things: a description of what the function does and a JSON Schema specifying the arguments [1]. The description determines whether the model chooses the right tool. The schema determines whether the model calls it correctly.


Most developers get the description roughly right and the schema completely wrong.

Loose schema

{
    "name": "search",
    "description": "Search for things",
    "parameters": {
        "query": {"type": "string"},
        "status": {"type": "string"},
        "date": {"type": "string"}
    }
}

Likely model output
status: "canceled" | "stopped" | "off"
date: "last week" | "Jan 2024"
query: free-form, anything goes
Output space: effectively unbounded

Tight schema

{
    "name": "search_products",
    "description": "Search catalog by keyword.",
    "parameters": {
        "query": {"type": "string"},
        "status": {"enum": ["active", "paused", "cancelled"]},
        "date": {"format": "YYYY-MM-DD"}
    },
    "required": ["query"]
}

Likely model output
status: one of three exact strings
date: "2024-09-15"
query: required, on-topic keyword
Output space: small, validated, reliable

The same tool, defined two ways.

On the left, the schema accepts whatever the model decides to emit: three string fields with no enums, no formats, and no required list. On the right, every added constraint narrows the space of outputs the model can plausibly generate. An enum names the legal status values, a format pins the date shape, and a required list refuses any call missing the parameter that actually matters. Same function signature, different output space.
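
The tight definition above is abbreviated for comparison. Spelled out in full, it might look like the sketch below, written as a Python dict. This assumes the common envelope in which `parameters` is itself a JSON Schema object with `properties` and `required`; the exact wrapper differs between providers. The parameter descriptions, the `additionalProperties` flag, and the helper `undeclared_required` are illustrative additions, and the date constraint is expressed with `"format": "date"`, which is JSON Schema's name for the YYYY-MM-DD shape.

```python
import json

# Hypothetical full form of the "tight" tool definition.
# Assumes `parameters` is a JSON Schema object; provider envelopes differ.
search_products = {
    "name": "search_products",
    "description": "Search catalog by keyword.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keyword to search for"},
            "status": {
                "type": "string",
                "enum": ["active", "paused", "cancelled"],
            },
            "date": {
                "type": "string",
                "format": "date",  # JSON Schema's built-in YYYY-MM-DD format
                "description": "Date in YYYY-MM-DD format",
            },
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}


def undeclared_required(tool):
    """Return names listed in `required` that have no entry in `properties`."""
    params = tool["parameters"]
    declared = params.get("properties", {})
    return [name for name in params.get("required", []) if name not in declared]


print(json.dumps(search_products, indent=2))
print("undeclared required fields:", undeclared_required(search_products))
```

A quick check like `undeclared_required` is cheap insurance against a typo that makes a required parameter impossible to satisfy.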

The Description Is a Prompt

The function description isn't metadata. It's a prompt [2]. The model reads it the same way it reads any other instruction, so the usual rules of zero-shot prompting apply to it.

# Bad: vague, ambiguous
{
    "name": "search",
    "description": "Search for things"
}

# Good: specific, bounded
{
    "name": "search_products",
    "description": "Search the product catalog by keyword. Returns up to 10 results with name, price, and availability. Only searches products currently in stock."
}

The good description tells the model three things: what the function searches (products, not everything), what it returns (name, price, availability), and what it won't do (return out-of-stock items). Each constraint reduces the space of possible misuse.

When you have multiple tools that could plausibly apply to the same request, the description is the main thing the model has to disambiguate them. "Search for things" gives the model nothing to work with. "Search the product catalog by keyword" gives it a clear decision boundary.

Name the Function Like You'd Name an API Endpoint

Function names should be verb-noun pairs that describe the action precisely. The model uses the name as a strong signal for when to invoke the tool.

# Ambiguous
"name": "data"        # get data? set data? delete data?
"name": "process"     # process what?
"name": "handle"      # handle how?

# Clear
"name": "get_order_status"
"name": "cancel_subscription"
"name": "list_recent_invoices"

A model that sees cancel_subscription alongside a user saying "I want to cancel" will make the right call. A model that sees process might do anything [4, 5].
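
One way to keep names honest is a small lint check run over your tool list. The sketch below is a minimal illustration, not a standard: the `ACTION_VERBS` set and the `check_tool_name` helper are hypothetical, and the verb list should be adapted to your own API's vocabulary.

```python
import re

# Illustrative verb list (an assumption); extend it to match your own API.
ACTION_VERBS = {"get", "list", "search", "create", "update", "delete", "cancel", "send"}


def check_tool_name(name):
    """Return a list of problems with a tool name (empty list means it passes)."""
    # Require snake_case with at least two words, e.g. verb_noun.
    if not re.fullmatch(r"[a-z]+(_[a-z]+)+", name):
        return ["use snake_case with at least two words (verb_noun)"]
    verb = name.split("_")[0]
    if verb not in ACTION_VERBS:
        return [f"'{verb}' is not a recognized action verb"]
    return []


for name in ["data", "process", "get_order_status", "cancel_subscription"]:
    print(name, "->", check_tool_name(name) or "ok")
```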

. . .

Constrain Everything

JSON Schema exists to express constraints [3]. Every constraint you add is a guardrail the model can lean against. Every constraint you omit is an invitation to hallucinate.

Use Enums Aggressively

If a parameter has a finite set of valid values, list them. Don't rely on the model to figure out what's valid from context.

# Unconstrained: the model will invent values
"status": {"type": "string"}

# Constrained: the model can only pick from these
"status": {
    "type": "string",
    "enum": ["active", "paused", "cancelled"],
    "description": "Subscription status to filter by"
}

Without the enum, a model might generate "status": "canceled" (American spelling), "status": "inactive", or "status": "stopped". Each is plausible English, and each will fail your equality check downstream. The enum eliminates this entire class of error by reducing the valid output set to exactly the values your code accepts.
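
To see why near-misses matter, here is a minimal sketch of the downstream check. `VALID_STATUSES` mirrors the enum above, and `check_status` is a hypothetical helper rather than part of any library.

```python
VALID_STATUSES = ["active", "paused", "cancelled"]  # mirrors the schema's enum


def check_status(value):
    """Return None if the value is in the enum, else an error message listing the options."""
    if value in VALID_STATUSES:
        return None
    return f"status must be one of {VALID_STATUSES}, got {value!r}"


# "canceled" is one letter away from valid, and still fails an exact match.
for candidate in ["cancelled", "canceled", "inactive", "stopped"]:
    print(candidate, "->", check_status(candidate) or "ok")
```

Listing the legal values in the error message matters as much as rejecting the bad one, because it gives the model something concrete to correct toward.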

Mark Required Fields

Every parameter that must be present should be in the required array. Models treat optional parameters casually. If they can skip a field, they sometimes will, even when the user provided the information.

"required": ["query", "category"]

Be deliberate about the split. If your function can operate with a default value for a parameter, make it optional and document the default in the description. If the function will fail without the parameter, make it required.
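
On the server side, that split might be handled as in the sketch below. The optional parameters `limit` and `sort`, their default values, and the `prepare_arguments` helper are assumptions made for illustration.

```python
REQUIRED = ["query", "category"]

# Hypothetical optional parameters and the defaults documented in their descriptions.
DEFAULTS = {"limit": 10, "sort": "relevance"}


def prepare_arguments(arguments):
    """Reject calls missing required fields, then fill in optional defaults."""
    missing = [name for name in REQUIRED if name not in arguments]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Values supplied by the model override the defaults.
    return {**DEFAULTS, **arguments}


print(prepare_arguments({"query": "laptop", "category": "electronics", "limit": 3}))
```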

Add Descriptions to Every Parameter

Parameter-level descriptions are where most schemas fall short. The top-level function description explains what the tool does. Parameter descriptions explain what each argument means.

"date_range": {
    "type": "string",
    "description": "ISO 8601 date range in format YYYY-MM-DD/YYYY-MM-DD. Start date must be within the last 90 days."
}

Without that description, the model might generate "last week", "2024-01-01 to 2024-02-01", or "January". All are reasonable interpretations of a date range, and none match the format your function expects.

Specify formats, boundaries, and units explicitly in the description. The model will follow constraints it can see, and guess at constraints it can't.
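
Whatever the description promises, the function still has to enforce it. A minimal sketch of a parser for the `date_range` format above, using only the standard library; the function name `parse_date_range` and the injectable `today` argument are assumptions made to keep the example testable.

```python
from datetime import date, timedelta


def parse_date_range(value, today=None):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' and enforce the 90-day start-date window."""
    today = today or date.today()
    parts = value.split("/")
    if len(parts) != 2:
        raise ValueError("expected YYYY-MM-DD/YYYY-MM-DD")
    try:
        start, end = (date.fromisoformat(part) for part in parts)
    except ValueError:
        raise ValueError("expected YYYY-MM-DD/YYYY-MM-DD") from None
    if start > end:
        raise ValueError("start date must not be after end date")
    if start < today - timedelta(days=90):
        raise ValueError("start date must be within the last 90 days")
    return start, end


# A fixed "today" keeps the example deterministic.
print(parse_date_range("2024-09-01/2024-09-10", today=date(2024, 9, 15)))
```

Inputs like "last week" or "January" fail the split on `/` and are rejected with a message that restates the expected format.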

. . .

The Nesting Problem

JSON Schema supports deeply nested objects. Models support them poorly.

# This schema is technically valid but practically fragile
{
    "type": "object",
    "properties": {
        "filter": {
            "type": "object",
            "properties": {
                "conditions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "field": {"type": "string"},
                            "operator": {"type": "string"},
                            "value": {}
                        }
                    }
                }
            }
        }
    }
}

Four levels of nesting, with an array of objects buried inside two outer objects. Models can generate this kind of structured output, but error rates tend to climb with each level: the model emits the structure one token at a time, has to keep track of every open brace and bracket, and cannot go back to repair one it got wrong. The failure modes are familiar to anyone who has parsed model output: missing braces, mismatched nesting, arguments placed at the wrong level.
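
Nesting depth is easy to measure mechanically. The sketch below counts container levels (objects and arrays) in a schema dict; `nesting_depth` is a hypothetical helper, and it only follows `properties` and `items`, which is enough for schemas shaped like the one above.

```python
def nesting_depth(schema):
    """Count nested container levels (objects and arrays) in a JSON Schema dict."""
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    is_container = schema.get("type") in ("object", "array")
    deepest_child = max((nesting_depth(child) for child in children), default=0)
    return (1 if is_container else 0) + deepest_child


# The nested filter schema from the example above.
nested = {
    "type": "object",
    "properties": {
        "filter": {
            "type": "object",
            "properties": {
                "conditions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "field": {"type": "string"},
                            "operator": {"type": "string"},
                            "value": {},
                        },
                    },
                }
            },
        }
    },
}

print(nesting_depth(nested))
```

A check like this can run in CI with a threshold of your choosing, so deep schemas get flagged before they reach a model.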

The fix is to flatten where possible.

# Flattened: one level of nesting, clear semantics
{
    "type": "object",
    "properties": {
        "field": {"type": "string", "enum": ["price", "category", "date"]},
        "min_value": {"type": "number"},
        "max_value": {"type": "number"},
        "category": {"type": "string"}
    }
}

The flat version loses some expressiveness, but it gains a substantial amount of reliability, and for most applications that is the right tradeoff. If you need complex filtering, split it into multiple simpler tool calls rather than one elaborate schema.
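
Flattening the schema does not mean giving up the richer internal representation. The server can rebuild it from the flat arguments. The sketch below assumes one plausible reading of the flat schema, in which `field` names the attribute that `min_value` and `max_value` bound and `category` is an exact-match filter; the `to_conditions` helper and the tuple format are hypothetical.

```python
def to_conditions(args):
    """Expand flat tool arguments into a list of (field, operator, value) conditions."""
    conditions = []
    field = args.get("field")
    # Assumption: min_value/max_value apply to whichever attribute `field` names.
    if field is not None and "min_value" in args:
        conditions.append((field, ">=", args["min_value"]))
    if field is not None and "max_value" in args:
        conditions.append((field, "<=", args["max_value"]))
    if "category" in args:
        conditions.append(("category", "==", args["category"]))
    return conditions


print(to_conditions({"field": "price", "min_value": 10, "max_value": 50, "category": "books"}))
```

The model only ever sees the flat shape; the nested structure stays on the side of the boundary where ordinary code can build it reliably.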

Arrays Are Simpler Than You Think

Arrays of primitives work well in practice, while arrays of objects tend to be fragile. If you need the model to pass a list of items, keep the item type simple.

# Reliable: array of strings
"tags": {
    "type": "array",
    "items": {"type": "string"},
    "description": "List of tags to filter by"
}

# Fragile: array of objects with multiple fields
"items": {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "product_id": {"type": "string"},
            "quantity": {"type": "integer"},
            "notes": {"type": "string"}
        }
    }
}

If you must use arrays of objects, keep them short and keep the objects flat. State the limit in the description ("maximum 5 items") and, where your validator supports it, enforce it with the JSON Schema maxItems keyword.
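
A length hint in the description is only advice, so it is worth checking on arrival. The sketch below validates the array of objects from the fragile example by hand. It assumes `product_id` and `quantity` are mandatory and `notes` is optional, which the original schema does not state; `MAX_ITEMS` and `check_order_items` are hypothetical names.

```python
MAX_ITEMS = 5  # mirrors a "maximum 5 items" hint in the description


def check_order_items(items):
    """Return a list of error strings for an array of flat order-item objects."""
    if not isinstance(items, list):
        return ["items must be an array"]
    errors = []
    if len(items) > MAX_ITEMS:
        errors.append(f"too many items: {len(items)} (maximum {MAX_ITEMS})")
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"items[{i}] must be an object")
            continue
        if not isinstance(item.get("product_id"), str):
            errors.append(f"items[{i}].product_id must be a string")
        quantity = item.get("quantity")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(quantity, int) or isinstance(quantity, bool):
            errors.append(f"items[{i}].quantity must be an integer")
        if "notes" in item and not isinstance(item["notes"], str):
            errors.append(f"items[{i}].notes must be a string")
    return errors


print(check_order_items([{"product_id": "p1", "quantity": "2"}]))
```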

. . .

Validation as Architecture

Schema design is your first layer of validation. Server-side validation is your second. Together, they form a defense that neither can provide alone. Note that each tool definition consumes context window space, so keeping schemas lean has a double benefit: reliability and efficiency.

The schema tells the model what to generate. Server-side validation catches what the model gets wrong. Between these two layers, you can achieve remarkably high reliability even with imperfect models.

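import jsonschema

def validate_and_execute(tool_name, arguments, schema):
    """Validate model-generated arguments before running the tool."""
    try:
        jsonschema.validate(instance=arguments, schema=schema)
    except jsonschema.ValidationError as e:
        # Return the failure to the model instead of raising, so it can retry.
        return {
            "error": f"Invalid arguments: {e.message}",
            # e.schema is the subschema that rejected the value,
            # not necessarily the full tool schema.
            "hint": f"Expected schema: {e.schema}",
        }
    # execute() is the application's own tool dispatcher, defined elsewhere.
    return execute(tool_name, arguments)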

When validation fails, return the error to the model with enough detail for it to self-correct. Most models will fix their arguments on the second attempt if you tell them what went wrong. Including the expected schema in the error response gives the model exactly the information it needs.

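The arithmetic behind this retry pattern is simple. If 90% of first attempts are valid and the informed retry fixes 90% of the remaining failures, two attempts succeed 0.90 + 0.10 × 0.90 = 99% of the time. The numbers are illustrative, but the second attempt has more to work with: the first try is the model's best guess, and the second is a correction conditioned on the validator's feedback.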
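
The loop itself is short. In the sketch below, `generate` stands in for whatever call produces arguments from the model and `validate` for your server-side check; both are hypothetical callables, and `fake_model` simulates a model that fixes its spelling once it sees feedback.

```python
def call_with_retry(generate, validate, max_attempts=2):
    """Request arguments, validate them, and feed errors back until valid or out of attempts."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        arguments = generate(feedback)
        error = validate(arguments)
        if error is None:
            return arguments, attempt
        feedback = error  # the next attempt sees what went wrong
    raise ValueError(f"still invalid after {max_attempts} attempts: {feedback}")


def fake_model(feedback):
    # Stand-in for a real model call: corrects itself once it receives feedback.
    return {"status": "cancelled"} if feedback else {"status": "canceled"}


def validate_status(arguments):
    allowed = ["active", "paused", "cancelled"]
    if arguments.get("status") in allowed:
        return None
    return f"status must be one of {allowed}"


arguments, attempts = call_with_retry(fake_model, validate_status)
print(arguments, "after", attempts, "attempt(s)")
```

Capping `max_attempts` keeps a persistently confused model from looping forever; what to do after the final failure is an application decision.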

. . .

Designing for the Model, Not for the Developer

There's a persistent temptation to design tool schemas the way you'd design an API for human developers: complex types, flexible inputs, polymorphic fields, sensible defaults that require reading the docs to understand.

Models don't read docs. They read the schema and the description, once, in context, alongside everything else in the conversation. Your schema needs to be self-documenting to a reader that will see it exactly once and then immediately generate arguments, with no chance to look anything up.

The defaults that work are roughly the opposite of the defaults you'd choose for a human-facing API: prefer flat schemas over nested ones, enums over free text, required fields over optional, and explicit formats over implicit conventions. Every ambiguity you resolve in the schema is a failure you prevent in production.
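
These defaults are mechanical enough to lint for. The sketch below walks a schema's `properties` and flags the patterns discussed in this article. The rules and the `lint_properties` helper are one hypothetical encoding of that advice, and the results are warnings to review, not errors: a free-text `query` field, for example, is often exactly what you want.

```python
def lint_properties(properties):
    """Return warnings for parameters that leave the model room to guess."""
    warnings = []
    for name, spec in properties.items():
        if "description" not in spec:
            warnings.append(f"{name}: missing description")
        is_string = spec.get("type") == "string"
        if is_string and not any(key in spec for key in ("enum", "format", "pattern")):
            warnings.append(f"{name}: free-text string with no enum, format, or pattern")
        if spec.get("type") == "object":
            warnings.append(f"{name}: nested object; consider flattening")
    return warnings


example = {
    "query": {"type": "string"},
    "status": {
        "type": "string",
        "enum": ["active", "paused", "cancelled"],
        "description": "Subscription status to filter by",
    },
}

for warning in lint_properties(example):
    print(warning)
```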

The best tool schemas are boring. They have obvious names, explicit constraints, and descriptions that leave nothing to interpretation. The model fills in the rest, and boring schemas give it the least room to fill in something wrong.

. . .

References

  1. OpenAI. (2023). "Function calling." OpenAI API Documentation.
  2. Anthropic. (2024). "Tool use." Anthropic Documentation.
  3. JSON Schema. (2020). "JSON Schema Specification." json-schema.org.
  4. Patil, S., et al. (2023). "Gorilla: Large Language Model Connected with Massive APIs." arXiv.
  5. Qin, Y., et al. (2023). "ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs." arXiv.