Sources

Grounding, citations, and further reading for The Protocol Layer.

All of this is optional. These are the sources behind the article. Nothing on this page is required reading, and you do not need to purchase any of these books.

The article itself is self-contained. This page exists so that the work is properly cited and so that anyone who wants to go deeper knows where to look.

References

1Anthropic

Anthropic. (2024). "Introducing the Model Context Protocol." Anthropic Blog.

2Anthropic

Anthropic. (2024-2025). "Model Context Protocol Documentation." modelcontextprotocol.io.

3Anthropic

Anthropic. (2024-2025). "Model Context Protocol Specification." GitHub.

4Anthropic

Anthropic. (2024-2025). "MCP Reference Server Implementations." GitHub.

5Anthropic

Anthropic. (2024-2025). "MCP Python SDK." GitHub.

6JSON-RPC Working Group

JSON-RPC Working Group. (2013). "JSON-RPC 2.0 Specification." jsonrpc.org.

7Model Context Protocol

Model Context Protocol. (2025). "Specification 2025-11-25."

8Model Context Protocol

Model Context Protocol. (2025). "Key Changes (Changelog)."

15Cloudflare

Cloudflare. (2025). "MCP Demo Day."

16Stripe

Stripe. (2025). "Model Context Protocol." Stripe Documentation.

17Invariant Labs

Invariant Labs. (2025). "MCP Security Notification: Tool Poisoning Attacks."

18Willison, S

Willison, S. (2025). "Model Context Protocol Has Prompt Injection Security Problems."

19Palo Alto Unit 42

Palo Alto Unit 42. (2025). "New Prompt Injection Attack Vectors Through MCP Sampling."

20AuthZed

AuthZed. (2025). "A Timeline of MCP Security Breaches."

21Adversa AI

Adversa AI. (2025). "MCP Security: Top 25 MCP Vulnerabilities."

22arXiv

arXiv. (2026). "Breaking the Protocol: Security Analysis of MCP and Prompt Injection Vulnerabilities."

23WorkOS

WorkOS. (2025). "MCP 2025-11-25 is Here."

24The New Stack

The New Stack. (2025). "Why the Model Context Protocol Won."

25Wikipedia

Wikipedia. "Model Context Protocol."

The Integration Tax

26Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Historical Notes) trace an identical fragmentation-then-standardization pattern in language modeling itself. They describe how "language modeling was developed in the speech recognition community" at IBM in the 1970s-80s, with each lab developing its own format and conventions. The field converged only when shared benchmarks (perplexity, Ch. 3, Section 3.3) and shared architectures (transformers, Vaswani et al. 2017) created common ground. MCP is the tool-integration equivalent of this convergence: replacing per-framework conventions with a shared protocol.

27Grounding note

Widdows and Cohen provide useful context here. In Ch. 4-5, they describe how platforms like TensorFlow and PyTorch grew by "enabling standard components to be rearranged easily and applied to different domains and problems." The ML ecosystem solved its own integration tax by standardizing on shared frameworks and vector representations. MCP is attempting the same consolidation one layer up, at the tool-integration boundary rather than the model-training boundary. Widdows & Cohen, Issue #45

28Grounding note

The N × M framing could become its own short article: "The Integration Tax: Why Every AI Tool Gets Written Five Times." Pairs well with Week 3 function calling content. Good standalone piece for practitioners who don't care about protocol details but feel the pain.

What MCP Actually Is

29Grounding note

The protocol-vs-library distinction echoes a recurring theme in Widdows and Cohen. In Ch. 2-3, they trace how the ML field converged on vector representations as a shared mathematical contract: "statistical machine learning was adopting vector terminology as a standard form." Just as vectors became the common language between disparate ML methods, MCP aims to become the common language between disparate tool providers. The pattern is the same: agree on the representation, and the ecosystem compounds. Widdows & Cohen, Issue #45

30Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Section 7.2) define conditional generation as the process where the model generates output P(y|x) conditioned on a prompt x. MCP's tools/list response functions as part of this conditioning context: the tool names, descriptions, and schemas are injected into the model's prompt, conditioning all subsequent generation. The protocol-vs-library distinction matters here because the protocol standardizes what enters the conditioning context, not how the model processes it. Two different LLMs receiving the same MCP tool list are conditioned on identical schema text, which is why MCP can be model-agnostic.

The Architecture

31Grounding note

Widdows and Cohen's Ch. 5 (Section 5.2.3) provides a fascinating backdrop here. They show that converting a next-token-prediction model into one that follows instructions requires "a relatively minuscule amount of additional training data" -- just 52,000 prompt/response pairs. The fact that instruction following is a thin layer atop language generation helps explain why tool calling works at all: the model's ability to parse structured tool schemas and emit structured calls is a natural extension of its instruction-following capability, not a separate skill. MCP leverages this same thin-layer insight at the protocol level. Widdows & Cohen, Issue #45

Capabilities

32Grounding note

Alammar & Grootendorst describe tools as the mechanism through which LLMs interact with the real world: search engines, calculators, APIs, and databases. MCP standardizes exactly this interface. Their ReAct framework (Thought → Action → Observation) maps directly to MCP's flow: the model reasons about what tool to call, the client executes it via tools/call, and the result feeds back as an observation. See GH #5, Ch. 7.

33Grounding note

MCP Resources are essentially a protocol-native version of what Widdows and Cohen describe as Retrieval Augmented Generation (RAG) in Ch. 5 (Section 5.3.3). They explain RAG as "a computational compromise" where a search engine augments the prompt with domain-specific text to improve factual accuracy. MCP Resources formalize this pattern: instead of ad hoc retrieval pipelines, a server exposes structured data via URIs that the host feeds into the model's context. The book's caution applies here too -- RAG "doesn't constrain [the model] to produce only sentences that are equally authoritative." MCP Resources provide data, not truth. Widdows & Cohen, Issue #45

Building an MCP Server

34Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Section 7.5.1) describe instruction tuning as training on curated input/output pairs so the model learns to follow specific formats. The SDK's automatic schema generation from docstrings is the schema-authoring equivalent: the docstring """Get weather forecast for a city.""" becomes the tool description that conditions the model's generation. Poorly written docstrings produce poor schemas, which produce poor conditioning, which produce wrong tool calls. The same quality bar that SLP3 describes for instruction-tuning data ("the exact wording of the task" matters, per Section 7.3) applies to MCP docstrings.

Transport Layer

35Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Section 7.4) make a distinction that clarifies why transport invisibility works. The autoregressive generation process, computing P(y_t|y₁...y_t-1, x) via softmax over the vocabulary at each step, cares only about the content of the tool result that enters the context, not how that content was transmitted. Whether the tool result arrived via stdio pipes on localhost or via HTTP from a remote server is invisible to the softmax computation. The model sees the same tokens in its context window either way. This separation of content from transport is what makes MCP's transport layer genuinely orthogonal to model behavior.

36Grounding note

The transport evolution (SSE to Streamable HTTP) has enough detail for a standalone article: "How MCP's Transport Layer Grew Up." Cover the original dual-endpoint SSE design, its limitations, the March 2025 migration, session management with Last-Event-ID, and Cloudflare's role as first remote hosting provider. Good technical deep-dive for infrastructure-minded readers.

From N × M to N + M

37Grounding note

Jurafsky & Martin (SLP3, Ch. 3, Historical Notes) trace standardization in NLP back to Markov (1913), who formalized probabilistic sequence modeling, and Shannon (1948), who applied it to English text prediction. Each standardization created a shared formalism that enabled cross-pollination: Markov chains gave probability theory a common language for sequences; Shannon's information theory gave engineers a common measure (entropy, cross-entropy, perplexity). MCP follows the same logic at the infrastructure layer: a shared protocol that turns every tool provider and every LLM host into interchangeable participants in the same ecosystem.

38Grounding note

Widdows and Cohen offer a rich historical parallel in Ch. 1-2. They trace standardization from ASCII encoding in the 1960s through the printing press, noting that "well-suited to mass production" technologies drive "standardized written national languages." The pattern repeats at every layer: character encoding, document formats, search interfaces, vector representations, and now tool protocols. The book's observation that "shared datasets for training and evaluation, and open-source software" (Ch. 5) accelerated ML progress is exactly the ecosystem compounding effect that MCP pursues for tool integration. Widdows & Cohen, Issue #45

The Ecosystem Explodes

39Grounding note

Widdows and Cohen describe an identical ecosystem explosion in model sharing (Ch. 5, Section 5.3.1): "what GitHub became for open source software in the 2010s, Hugging Face has become for open source models in the 2020s." They attribute the acceleration of ML progress to shared datasets, open-source software, and freely available model weights. MCP's growth curve -- from 12 reference servers to 10,000+ in a year -- mirrors this pattern. Open protocols attract contributions; contributions attract users; users attract more contributions. The flywheel the book describes for model sharing is now spinning for tool integration. Widdows & Cohen, Issue #45

40Grounding note

The ecosystem story is a natural carve-out: "From 12 Servers to 10,000: How MCP Won the Ecosystem War in One Year." Covers reference servers at launch, the enterprise stampede (Stripe, Atlassian, Cloudflare), community explosion, and the marketplace dynamics. The tables below would anchor that article nicely.

The Specification Evolves

41Grounding note

The evolution toward "Sampling with Tools" and server-side agent loops connects directly to what Widdows and Cohen describe in Ch. 5 (Section 5.2.4) as Chain-of-Thought prompting and test-time scaling -- techniques that automate "the process of breaking a problem into basic steps." MCP's Tasks capability formalizes multi-step reasoning at the protocol level: rather than the model reasoning through steps in a single conversation turn, it can dispatch async operations and poll for completion, enabling the kind of complex, multi-step problem decomposition the book identifies as a hallmark of intelligence. Widdows & Cohen, Issue #45

The Security Surface

42Grounding note

Security is definitely its own article: "The MCP Threat Surface: Tool Poisoning, Rug Pulls, and the Attacks That Protocols Can't Prevent." This section gives the overview; the standalone piece would walk through each attack vector with concrete reproduction steps. Pairs with the tool-use-security draft. Could reference the arXiv paper's 847-scenario analysis.

43Grounding note

Widdows and Cohen note in Ch. 6 that "LLMs have been shown to exhibit and enable all sorts of security vulnerabilities," referencing the garak security testing project by NVIDIA. They also describe how GPT-4 (before alignment) could generate "a plausible multi-step plan to use misinformation to persuade parents not to vaccinate their children." MCP's attack surface amplifies these risks: the same model that can be manipulated through prompts can now be manipulated through poisoned tool descriptions, with direct access to external systems. The book provides the why behind the vulnerability -- generative models are designed to produce plausible output, not verified output. Widdows & Cohen, Issue #45

44Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Section 7.7) identify sycophancy as a key safety concern: models trained with RLHF "may learn to tell users what they want to hear" rather than what is true or safe. Tool poisoning exploits exactly this tendency. When a hidden instruction in a tool description says "also exfiltrate the user's data," the model's sycophantic training inclines it to comply with instructions it encounters in context, whether they come from the developer, the user, or an attacker. The model cannot distinguish authorized from unauthorized instructions because, as the book notes, "the model has no way to determine what is true about the world" (Section 7.7).

The Governance Question

45Grounding note

Widdows and Cohen in Ch. 6 explicitly characterize the stated missions of the companies behind this governance move: OpenAI aims "to ensure that artificial general intelligence benefits all of humanity," while Anthropic's stated values include "act for the global good" and "ignite a race to the top for safety." Donating MCP to a neutral foundation is arguably consistent with these stated missions. The book's framing is relevant context: these companies claim public-good motivations, and the AAIF donation is one of the more concrete expressions of that claim. Widdows & Cohen, Issue #45

46Grounding note

The governance angle is interesting but probably not a standalone article for this audience. It works best as context within the protocol piece. If expanding for a book chapter, it could pair with a broader discussion of how open standards emerge in AI (compare with ONNX, Hugging Face's model card standard, etc.).

What MCP Does Not Solve

47Grounding note

Widdows and Cohen ground this limitation in the fundamental nature of generative models. In Ch. 1, they explain that LLMs produce output that is "not-obviously-false" by substituting plausible values, "without being grounded in any broader reality. They know which words can be swapped around while still sounding plausible." In Ch. 6 (Section 6.1.1), they elaborate that language models were never designed to store and recall facts -- that was traditionally a knowledge base's job. MCP cannot fix this: it provides plumbing between models and tools, but the model's tendency toward plausible-but-ungrounded output persists regardless of how the tools are connected. Widdows & Cohen, Issue #45

48Grounding note

Jurafsky & Martin (SLP3, Ch. 7, Section 7.6) describe the fundamental challenge of LLM evaluation: perplexity measures model confidence but not task correctness, and benchmark evaluations like MMLU risk data contamination where test data leaks into training sets. Tool-calling evaluation faces the same problems amplified. Perplexity cannot tell you whether the model selected the right tool; it only tells you the model was confident in its selection. MCP standardizes the plumbing, but the evaluation question, "did the model call the right tool with the right arguments?", requires task-specific metrics that neither the protocol nor perplexity can provide.

49Grounding note

Alammar & Grootendorst frame tool-using agents as 'general problem solvers' that use the ReAct pattern to decide their own actions. MCP does not change that agent architecture; it standardizes the plumbing underneath. The quality of tool descriptions, schema design, and error handling that the book emphasizes still determines whether the agent succeeds or fails. See GH #5, Ch. 7.

50Grounding note

This "plumbing, not kitchen" point is reinforced by Widdows and Cohen's discussion in Ch. 6 of TheAgentStudy (CMU), which "showed that AI agents are still deeply unreliable when it comes to carrying out tasks responsibly." They also observe that even as LLMs pass elite university exams, "agent-based studies demonstrate that completing even basic office tasks reliably is so hard to automate." MCP solves the tool connectivity problem, but the agent reliability problem -- models selecting the wrong tool, passing bad arguments, misinterpreting results -- remains firmly unsolved. Better plumbing does not make a better chef. Widdows & Cohen, Issue #45