← All Articles

How Testing Proves Code

"The programmer should make a number of definite assertions which can be checked individually, and from which the correctness of the whole program easily follows."

In June 1949, Alan Turing delivered a three-page paper at the inaugural EDSAC conference in Cambridge. He proposed that programs could be checked statically by attaching propositions to the arcs of a flowchart, and that termination could be proved separately by attaching a quantity that strictly decreases. The paper was effectively lost for thirty-five years. When Robert Floyd published the same idea in 1967, he had never heard of it.

This article is the history of that paper: what it actually says, why it disappeared, and why the discipline it foretold rediscovered it three times independently.

First page of Turing (1949) Checking a Large Routine from the EDSAC conference proceedings — Read the original: Turing, A. M. (1949). "Checking a Large Routine." *Report of a Conference on High Speed Automatic Calculating Machines*, University Mathematical Laboratory, Cambridge, pp. 67-69.¹

. . .

Part 1: The Paper (1949)

The paper is short. Three foolscap pages, delivered as a talk on Friday, 24 June 1949, at the inaugural conference of the Electronic Delay Storage Automatic Calculator at Cambridge.

The EDSAC computer at the University Mathematical Laboratory, Cambridge, around 1949 — EDSAC at the University Mathematical Laboratory, Cambridge
*The machine the conference was named after, six weeks after its first program.¹¹*

EDSAC had run its first program six weeks earlier¹⁰. Turing had spent the previous winter and spring at Manchester, where he was deputy director of the prototype computer there, and he travelled down for the Cambridge conference hosted by Maurice Wilkes. The printed text appears in the conference proceedings under the title "Checking a Large Routine," pages 67 to 69, and is catalogued as AMT-B-8 in the King's College Cambridge Turing Digital Archive.

EDSAC belongs to the first generation of stored-program computers¹², the machines that ended programming-by-rewiring. The Manchester Baby ran its first stored program in June 1948¹³, the Manchester Mark 1 followed in April 1949, and EDSAC followed a month later. Before these machines, programming meant setting switches and routing patch cables on a wiring harness. By the time Turing read his paper at Cambridge, the practice now called writing software was less than a year old. That is what makes Turing's question sharp at this moment in the history: a routine is now a written artifact held in memory, and a written artifact can be checked by another written artifact. You cannot write a routine to test a routine when there is no routine, only wires.

EDSAC and the vocabulary of 1949

Three pieces of 1949 vocabulary need translating before the paper reads cleanly. A "routine" is a program: a sequence of stored instructions that a stored-program computer will execute when control reaches its first instruction. To "check" a routine, in Turing's usage, is to make sure it is right. The opening sentence of the paper poses the question directly: "How can one check a routine in the sense of making sure that it is right?"

The accepted answer in 1949 was to load the routine into the machine, run it on a few inputs, and inspect the output. The technique scaled poorly. Each run took minutes, the operator had to interpret a cathode-ray display by eye⁹, and the state of an intermediate variable at the moment of interest was often not directly visible.

Animation of an operator adjusting an EDSAC cathode-ray-tube display — Reading the Output by Eye

For a small program the loop of run-and-look was workable. For a "large routine" the loop became unaffordable. Turing's paper is the alternative he proposes.

The factorial routine

The worked example is the factorial routine. \(n!\) is computed by repeated addition only: multiplication itself is a loop. The outer loop counts up through \(r = 1, 2, \ldots, n\), and at each step the inner loop adds \(r\) to a running total \(s\) times, so that the running total becomes \(s \cdot r\). When the inner counter catches up to the outer one, the outer counter advances by one. The variables held in store across the two loops are \(r\), \(s\), and \(u\): \(r\) tracks the outer multiplier, \(s\) tracks the inner addition counter, and \(u\) holds the partial result. At termination \(u\) contains \(n!\).

The flowchart, reproduced as Figure 1 of the paper, names seven control points labelled A through G and connects them with arrows for the two nested loops. The flowchart is the load-bearing artefact of the paper. Everything else is annotation hung on the arrows.

Turing's 1949 flowchart for the factorial routine, redrawn with the seven control points A through G that anchor his assertions.

Assertions on arcs

The technique introduced by the paper is to annotate the flowchart. At each control point, the programmer records what is true of the machine's state when control arrives there. Turing tabulates these claims as Figure 2: one column per ringed control point, listing the contents of each store line and the inductive variables that govern the local situation. The reader who wants to check the routine no longer has to follow the entire computation in a single sitting. The checker takes one column at a time and verifies, locally, that the assertion at the head of an arrow follows from the assertion at the tail combined with the operation in the box between them.

This is the move that makes the difference. A claim about the routine as a whole is reduced to a finite set of arc-level checks, each one small enough that the checker can finish it without keeping the entire program in mind. Turing notes explicitly that the columns of the assertion table can be taken "in any order and quite independently." The verification of the routine is the conjunction of the local verifications. Each arc check is a test of a definite property; the composition of arc checks is the proof of correctness.

Variant functions and termination

The assertion discipline as described so far is incomplete. The arc checks establish that if the routine ever stops, then the value left in line 31 is \(n!\). They cannot, on their own, establish that the routine stops at all. Turing addresses this directly. The programmer should also give "a further definite assertion": a quantity that decreases continually and vanishes when the machine stops. For the factorial routine he provides two versions. The pure-mathematics version uses an ordinal, \((n - r)\omega^2 + (r - s)\omega + u\), which lexicographically orders the three nested loop counters. The "less highbrow" version replaces \(\omega\) with a large finite weight, \(2^{80}(n - r) + 2^{40}(r - s) + u\), which is an integer that strictly decreases at every step of either loop.

This is the first appearance in the literature of what is now called a variant or ranking function for termination. The partial-correctness argument and the termination argument are kept separate. The local arc checks deliver partial correctness, the variant delivers termination, and the conjunction delivers total correctness. Jones, in his 2013 commentary on the paper⁷, observes that the two-track structure (local checks for partial correctness, a ranking function for termination) is exactly the structure used in modern verification systems. The vocabulary has shifted in seventy-five years. The structure of the argument has not.

The deterministic premise

The technique rests on a premise Turing does not argue for. The programmer can enumerate the states the machine can reach because the machine is deterministic: the same input and the same starting state produce the same trace of intermediate states every time. Without that premise, an assertion at a control point has no single state to be true of, and a variant has no single quantity to decrease. The premise is so naturally satisfied by the kind of routine running on EDSAC in 1949 that Turing does not pause to name it. It has carried forward into the verification tradition and into the broader testing tradition that runs alongside it, still mostly unnamed. The whole framework presupposes a determinate program, and that presupposition has remained the default mental model of software ever since.

. . .

Part 2: The Dormant Decades (1949 to 1983)

For thirty-five years the paper sat unread. The conference proceedings circulated in small numbers, the transcription of Turing's manuscript was riddled with errors (the factorial sign was omitted everywhere it appeared, and ten other identifiers were printed incorrectly), and the paper does not appear in the bibliographies of the people who eventually rediscovered its content. Morris and Jones, in their 1984 reprint², are explicit on this point: "In spite of the early date of Turing's paper and the insight shown, we know of no evidence that the paper influenced the later contributors to the ideas of program proofs." The technique was independently rediscovered three times between 1967 and 1975.

Wilkes does not adopt it

Maurice Wilkes hosted the EDSAC conference at which Turing read the paper. Two years later, with David Wheeler and Stanley Gill, he co-authored "The Preparation of Programs for an Electronic Digital Computer," the first textbook of programming. The book covers instruction sets, subroutine libraries, scaling of fixed-point arithmetic, and the practical discipline of getting a routine onto paper tape and through the machine. None of Turing's machinery from the EDSAC conference appears in it, neither the assertion discipline nor the variant function. The technique presented at Cambridge in June 1949 did not propagate into the textbook that Turing's host wrote two years later.

Cover of The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, 1951 — The First Textbook of Programming
*Wilkes, Wheeler, and Gill (1951), The Preparation of Programs for an Electronic Digital Computer.*

Floyd 1967: the first rediscovery

Eighteen years pass between the EDSAC conference and Robert Floyd's "Assigning Meanings to Programs"³. Floyd's 1967 paper is the standard origin point cited in the verification literature for the inductive-assertion method. He attaches propositions to the arcs of a flowchart. He defines verification as the requirement that each command preserve the assertion across its outgoing arc. He gives a termination argument based on a well-founded ordering on a quantity attached to each loop. The construction is, point for point, the construction in Turing 1949.

Floyd does not cite Turing 1949, and he does not appear to have known about it. His bibliography reaches back to John McCarthy's 1963 call for a separate theoretical treatment of program semantics and to Peter Naur's 1966 paper on "general snapshots," but it does not include the EDSAC conference proceedings. The reprint of Turing's paper is still seventeen years in the future.

Hoare 1969 and Dijkstra 1975: the second and third

Tony Hoare's "An Axiomatic Basis for Computer Programming"⁴ introduces the triple \(\{P\}\,S\,\{Q\}\): if \(P\) holds before \(S\) executes, then \(Q\) holds after. A program is annotated with triples at each step, and proof rules compose them into a claim about the program as a whole. The shape of the argument is the same as Turing's. Local claims, locally checkable, composed into a global claim. The notation has migrated from the arcs of a flowchart to the lines of a textual program, but the unit of work has not changed.

Dijkstra's 1975 "Guarded Commands, Nondeterminacy and Formal Derivation of Programs"⁵ shifts the framing once more. Instead of attaching pre- and post-conditions and checking that they compose, Dijkstra defines, for each command, a predicate transformer: given the postcondition, compute the weakest precondition that guarantees it. The composition of weakest preconditions across a program reduces verification to evaluating a single derived expression. The vocabulary is heavier than Hoare's and very different from Turing's, but the underlying claim is still that a program proof is the composition of finitely many local correctness arguments.

Neither Hoare 1969 nor Dijkstra 1975 cites Turing 1949. Hoare returns to the question in his 2003 retrospective⁶ and concedes the point: the technique was Turing's first, and both he and Floyd had been unaware of the 1949 paper at the time their own papers appeared. The same conclusion is reached, from the historical side, in the Morris and Jones reprint: no evidence has surfaced of any of the three rediscoveries having been influenced by the original.

. . .

Part 3: The Recognition (1984 to present)

The inflection point is 1984. By then the verification community has its three canonical sources for the inductive-assertion method (Floyd 1967, Hoare 1969, Dijkstra 1975), has built early mechanized proof systems on top of them, and has begun to compile the historiography of the field. Two researchers go back to the original 1949 proceedings, find that Turing got there first, and decide that this ought to be put on the record.

Morris and Jones reprint the paper

F. L. Morris of Syracuse University and Cliff Jones of Manchester collaborate during Jones's 1980 to 1981 sabbatical at Oxford on a corrected reprint of the 1949 paper. The IEEE Annals of the History of Computing publishes the result in April 1984 under the title "An Early Program Proof by Alan Turing." The article reproduces Turing's text in full, with the proceedings' transcription errors corrected to the best of the authors' reading, and adds a commentary that situates the technique in the modern verification literature. It is the readable version. Almost every subsequent citation of Turing 1949 traces back to this reprint rather than to the original conference text.

The Morris and Jones commentary makes the key historical claim. No evidence has surfaced that the 1949 paper influenced any of the later contributors. The standard explanation is the combination of limited circulation and the unusable state of the printed text. The reprint exists to make the paper available so that nobody else has to rediscover it.

Hoare claims it as ancestor

Tony Hoare's 2003 retrospective "Assertions: A Personal Perspective," in the same journal that had run the Morris and Jones reprint nineteen years earlier, settles the lineage from the inside. Hoare names Turing 1949 as the precursor of Hoare logic, acknowledges that neither he nor Floyd was aware of the paper at the time their own work was published, and traces the technique forward through the intervening decades. The retrospective is the closest the verification community has come to a formal statement that Turing 1949 is the origin of the inductive-assertion method, written by the person who has the strongest standing to make that statement.

Modern proof assistants implement the original argument

The Toccata project at INRIA hosts a working mechanization of Turing's 1949 factorial routine in the Why3 verification language⁸. The mechanization names the inductive variables Turing named, attaches the loop invariant Turing wrote in Figure 2, and supplies a variant clause matching the integer \(2^{80}(n - r) + 2^{40}(r - s) + u\) that Turing chose. Why3 generates the verification conditions implied by the invariant and the variant, hands them to an SMT solver, and reports them all discharged. Coq, Isabelle/HOL, and Lean have parallel exercises in their own libraries.

The point of the Toccata mechanization is not the verification itself. The factorial routine is trivial to verify. The point is that a 1949 proof technique, written on foolscap by a man who would not live to see a computer with more than a few thousand words of memory, is still the technique that a 2024 proof assistant uses. The vocabulary has been formalized, the bookkeeping has been mechanized, and the back end is an SMT solver rather than a graduate student with a pencil. The shape of the argument has not changed.

(* Why3 mechanization of Turing 1949, faithful to the original variables *)
let factorial (n: int) : int
  requires { n >= 0 }
  ensures  { result = fact n }
=
  let ref r = 1 in
  let ref s = 0 in
  let ref u = 0 in
  while r <= n do
    invariant { 1 <= r <= n + 1 }
    invariant { 0 <= s <= r }
    invariant { u = s * r }                          (* Turing's assertion at B *)
    variant   { (n - r), (r - s) }                   (* lexicographic, matches his ordinal *)
    if s = r then
      r := r + 1; s := 0; u := 0                (* outer loop: increment r *)
    else
      u := u + r; s := s + 1                    (* inner loop: accumulate r *)
  done;
  u

Why3 mechanization of Turing's factorial routine.

. . .

What survives

Every working programmer today tests their code in some form: unit tests, integration tests, property-based tests, fuzz tests, type checks, runtime assertions, contract checks.

Correctness is something the programmer constructs alongside the code, an idea that descends from this paper.

The 1949 paper is the first written articulation of the underlying principle, namely that when you write a routine, you also write something else whose job is to check the routine. The specific techniques Turing proposed (assertions on flowchart arcs, a ranking function for termination) became the specialized practice now called formal verification. The framework has lasted partly because the programs it was built to reason about have stayed deterministic, the same assumption Turing baked in without arguing for it.

A 1949 lab-coat inspector pressing a stethoscope against a single vacuum tube in a wall of thousands — The unit test.

These are the first tests.

. . .

References

Textbook grounding, chapter-level citations, and further reading for each numbered reference in this article live on the companion sources page.

Turing, A. M. (1949). Checking a Large Routine. In Report of a Conference on High Speed Automatic Calculating Machines, University Mathematical Laboratory, Cambridge, pp. 67-69. Archive: AMT-B-8, King's College Cambridge.
Morris, F. L. and Jones, C. B. (1984). An Early Program Proof by Alan Turing. IEEE Annals of the History of Computing, Vol. 6, No. 2, pp. 139-143. DOI.
Floyd, R. W. (1967). Assigning Meanings to Programs. Proceedings of Symposium in Applied Mathematics, Vol. 19, American Mathematical Society.
Hoare, C. A. R. (1969). An Axiomatic Basis for Computer Programming. Communications of the ACM, Vol. 12, No. 10.
Dijkstra, E. W. (1975). Guarded Commands, Nondeterminacy and Formal Derivation of Programs. Communications of the ACM, Vol. 18, No. 8.
Hoare, C. A. R. (2003). Assertions: A Personal Perspective. IEEE Annals of the History of Computing, Vol. 25, No. 2, pp. 14-25.
Jones, C. B. (2013). Turing's "Checking a Large Routine". In S. B. Cooper and J. van Leeuwen (eds.), Alan Turing: His Work and Impact, Elsevier, pp. 455-460. Newcastle ePrints 183999.
INRIA Toccata gallery. Why3 verification of Turing's factorial routine.
Campbell-Kelly, M. The EDSAC Simulator. Department of Computer Science, University of Warwick. www.dcs.warwick.ac.uk/~edsac/.
University of Cambridge Computer Laboratory (1999). EDSAC99: 50th Anniversary of EDSAC. cl.cam.ac.uk/events/EDSAC99/.
University of Cambridge Computer Laboratory. Archive Photographs of EDSAC. cl.cam.ac.uk/relics/archive_photos.html.
von Neumann, J. (1945). First Draft of a Report on the EDVAC. Moore School of Electrical Engineering, University of Pennsylvania. Overview.
Williams, F. C. and Kilburn, T. (1948). The Manchester Baby (Small-Scale Experimental Machine), Manchester. en.wikipedia.org/wiki/Manchester_Baby.

History Program Verification Turing Hoare Logic LLM Evaluation