Sources

Grounding, citations, and further reading for Tool Loops: Multi-Step and Parallel Calls.

All of this is optional. These are the sources used to write the article, listed here for grounding and so anyone who wants to go deeper on a specific point knows where to look.

The article itself is self-contained. Nothing on this page is required reading.

About the Sources

Provider documentation: OpenAI and Anthropic

Vendor-side reference for the function-calling and tool-use protocols.

The OpenAI and Anthropic guides are the canonical wire-format references. They describe the request shape, the response shape, the parallel-call structure, and the JSON-mode contract that every modern function-calling implementation follows. Useful as a baseline for what the protocol guarantees and where implementations diverge.

Foundational research papers (2022-2024)

ReAct, Toolformer, Gorilla, ToolLLM, and the survey.

The arXiv papers that shaped the modern function-calling pattern. ReAct established interleaved reasoning-and-action. Toolformer trained a model to emit tool calls as a learned token-emission pattern. Gorilla and ToolLLM scaled tool use to thousands of APIs. Wang et al. survey the resulting agentic-systems literature.

Distributed-systems writing

Fowler on contracts, AWS on retries, Barnett on RAG failure modes.

Three sources that pre-date the LLM era but apply directly to tool loops. Martin Fowler on consumer-driven contracts (the model is a consumer, the tool service is a producer). The AWS Builders' Library on timeouts, retries, and exponential backoff with jitter. Barnett et al.'s catalog of seven failure points, originally framed for RAG, that maps cleanly onto tool-call pipelines.

From One Call to Many

1The single-turn function-calling primitive ↩ Back to article

OpenAI's function-calling guide is the canonical vendor description of the single round trip the article opens with: the model emits a tool call, the runtime executes it, the result returns, the model produces text. The guide presents this as a complete capability rather than the building block it actually is. Useful as a starting point precisely because it makes the limitations of the single-turn shape obvious by omission.

OpenAI Platform Documentation. Read the guide

The Loop, Step by Step

2The four-move cycle in vendor terms ↩ Back to article

Anthropic's tool-use documentation describes the four-move cycle the article walks through. Move one: send the conversation plus tool definitions. Move two: model emits a tool_use block. Move three: runtime executes. Move four: runtime appends the tool_result and re-enters. The doc is explicit that each move belongs to a different actor, which is the point the article opens with.

Anthropic Documentation. Read the guide

3The probabilistic stopping rule ↩ Back to article

Yao et al.'s ReAct paper introduces the pattern of interleaved reasoning and action with the action layer external to the model. The article's claim that "the presence or absence of a tool call is the entire stopping rule" is the practical consequence of ReAct's design: the model cannot signal completion through any channel other than not asking for another action. ReAct also documents the failure mode where the model continues to reason after the task is functionally complete, which is the source of many runaway loops.

Yao et al., 2022. Read on arXiv

Sequential vs Parallel

4Multiple tool calls in one assistant message ↩ Back to article

Anthropic's parallel-tool-calls documentation specifies the wire format for emitting multiple tool calls in a single assistant message and returning multiple tool_result blocks in a single user message. The doc is explicit that the runtime is responsible for collecting all results before the next inference and for preserving the model's tool-use IDs unchanged through the round trip. This is the source for the article's discussion of ID linkage as a quiet failure mode.

Anthropic Claude API Documentation. Read the guide

When Calls Fail

5Seven failure points, applied to tool calls ↩ Back to article

Barnett et al. catalog seven failure points when engineering a RAG system. The taxonomy maps cleanly onto tool-call pipelines: missing content, schema drift, invalid arguments, downstream service failure, unexpected response shape, network timeout, and retry-induced duplicates. The article's "five distinct ways a tool call can fail" is essentially the Barnett taxonomy collapsed and rephrased for the tool-loop case.

Barnett et al., 2024. Read on arXiv

6Structured outputs and JSON-mode validation ↩ Back to article

OpenAI's structured-outputs guide explains the constrained-decoding mechanism that makes JSON validity nearly free in modern providers. The article's observation that "modern providers enforce JSON validity at the decoder level, so syntactically broken arguments will almost always parse" is the practical takeaway. The remaining failure modes are semantic, not syntactic.

OpenAI Platform Documentation. Read the guide

7Tool calls as consumer-driven contracts ↩ Back to article

Martin Fowler's 2006 essay on consumer-driven contracts is the pre-LLM antecedent to the article's framing. The model is a consumer; the tool service is a producer. When the producer changes its response shape and the consumer's expectations do not move with it, you get a successful call that returns the wrong shape. Fowler's prescription, that the consumer publishes its expectations and the producer respects them, is exactly the contract a robust tool-loop runtime should enforce on top of vendor JSON validation.

Fowler, 2006. Read on martinfowler.com

8Timeouts, retries, and idempotency ↩ Back to article

The AWS Builders' Library essay on timeouts, retries, and exponential backoff with jitter is the canonical engineering reference for the timeout-vs-failure problem the article describes. A timeout is not a failure; it is the absence of a response. Retrying a non-idempotent endpoint can cause the same destructive operation to run twice. Every recommendation in the article's failure-mode section that involves retry behavior is grounded in this essay.

Amazon Builders' Library. Read on amazon.com

When to Terminate

9The agentic-systems literature ↩ Back to article

Wang et al.'s 2023 survey on LLM-based autonomous agents catalogs the architectures, planning strategies, and termination conditions that the field has explored. The survey is useful for situating the article's stance: "even sophisticated agentic loops are still while-loops with a probabilistic termination condition." Most published agent designs add structure on top of that loop (planners, verifiers, memory) but none replace the underlying probabilistic stopping rule.

Wang et al., 2023. Read on arXiv