Exhaled, Trembling, Dark
251,022 tokens across five books, measured against 100 million words of the British National Corpus. The numbers confirmed what readers always felt: Bradbury wrote with his whole body.
I met Ray Bradbury several times. I didn't know him personally. I was mostly overawed. But he was impossible not to watch. He believed in libraries. He told the internet to go to hell. He didn't believe in the future so much as he insisted on building one worth having, and spent his career warning us about the ones that weren't.
Years later, I built a toolkit that measures how writers use words. Corpus linguistics, frequency analysis, morphological classification. The kind of computational machinery that sounds like everything Bradbury would have found suspicious. He wrote on a typewriter in his garage. He didn't trust computers much.
But when I pointed this machinery at his books, something happened that no algorithm was designed to produce. The numbers drew a portrait. Not of technique or structure, but of a man who reached for the physical world every time he sat down to write.
The Method
The analysis uses pystylometry, an open-source Python toolkit for stylometric analysis. The relevant module here is BNC frequency analysis: for every unique word in a corpus, the tool computes how many times the author used it (observed count) versus how many times it would be expected to appear in a text of that length, based on the word's frequency in the British National Corpus (BNC), a 100-million-word reference corpus of general English.
The ratio of observed to expected is the signal. A ratio of 10.0 means the author used a word ten times more often than a typical English writer would. A ratio of 0.3 means they used it at less than a third of the expected rate. When a word is overused or underused consistently across all five books, it stops being coincidence and starts being signature.
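To make the arithmetic concrete, here is a minimal Python sketch of the ratio, using two observed/expected pairs from the tables in this article. The function name is illustrative and is not part of pystylometry.

```python
def usage_ratio(observed: int, expected: float) -> float:
    """Observed count divided by the count the reference corpus predicts."""
    if expected <= 0:
        # A word the reference corpus never saw has no meaningful ratio.
        raise ValueError("expected must be positive")
    return observed / expected


# (word, observed, expected) as reported in this article's tables.
for word, observed, expected in [("dark", 363, 33.8), ("has", 101, 650.6)]:
    print(f"{word}: {usage_ratio(observed, expected):.2f}x")
```

Values well above 1.0 mean overuse and values well below 1.0 mean underuse. Because the published expected counts are rounded to one decimal place, ratios recomputed this way can differ slightly from the published ones for rare words.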
The toolkit also classifies every word using a morphological taxonomy derived from established linguistic frameworks (Zwicky & Pullum's clitic theory, Plag's compound classification, Bauer's morphological analysis). This lets us filter the results to pure lexical words, stripping away contractions, possessives, hyphenated compounds, and other orthographic noise. What remains is the author's vocabulary, uncontaminated by the mechanics of English punctuation.
The corpus: five Ray Bradbury books from the 1950s. 251,022 tokens. 15,522 unique word forms.
What Bradbury Reached For
Here is a selection of the words Ray Bradbury used far more than a typical English writer, consistently across all five books. Lexical words only, sorted by overuse ratio.
| Word | Observed | Expected | Ratio |
|---|---|---|---|
| exhaled | 14 | 0.2 | 59.4x |
| humming | 28 | 0.7 | 40.6x |
| whirled | 20 | 0.5 | 37.6x |
| trembled | 28 | 0.9 | 32.7x |
| cried | 202 | 7.5 | 27.0x |
| blew | 87 | 3.3 | 26.1x |
| avalanche | 11 | 0.5 | 24.0x |
| numb | 16 | 0.7 | 23.2x |
| vanish | 18 | 0.8 | 22.2x |
| quivering | 16 | 0.7 | 21.5x |
| trembling | 58 | 2.7 | 21.3x |
| shriek | 9 | 0.4 | 20.6x |
| screamed | 54 | 2.7 | 19.8x |
| burned | 74 | 3.8 | 19.5x |
| sky | 224 | 12.5 | 17.9x |
| whisper | 36 | 2.2 | 16.7x |
| thunder | 31 | 2.0 | 15.5x |
| dust | 106 | 7.0 | 15.1x |
| earth | 351 | 24.4 | 14.4x |
| whispered | 93 | 6.6 | 14.2x |
| civilization | 24 | 1.8 | 13.2x |
| roar | 21 | 1.7 | 12.7x |
| monsters | 14 | 1.1 | 12.4x |
| dark | 363 | 33.8 | 10.7x |
| moon | 79 | 7.5 | 10.6x |
| silent | 96 | 9.5 | 10.1x |
| wind | 191 | 19.2 | 9.9x |
| stood | 330 | 33.3 | 9.9x |
| smell | 84 | 9.4 | 8.9x |
| silence | 127 | 14.9 | 8.5x |
Each word was classified as overused independently in every one of the five Bradbury titles.
Read that list again. Not as data points, but as a vocabulary.
Exhaled. Humming. Whirled. Trembled. Cried. Blew. Avalanche. Numb. Vanish. Quivering. Trembling. Shriek. Screamed. Burned. Sky. Whisper. Thunder. Dust. Earth. Whispered.
This is a man who wrote with his lungs and his skin. Every top-ranked word is physical, sensory, elemental. Things you feel before you think about them. Air moving. Bodies shaking. Sound arriving before meaning does.
The overuse ratios cluster into recognizable themes. There is a vocabulary of fire and heat: burned (19.5x), blazed (16.6x), blazing (13.3x), ashes (14.8x), coals (10.9x). There is a vocabulary of darkness and night: dark (10.7x), moon (10.6x), silent (10.1x), silence (8.5x). There is a vocabulary of violent motion: whirled (37.6x), jerked (17.8x), screamed (19.8x), roared (12.5x). And there is a vocabulary of the body under stress: trembled (32.7x), numb (23.2x), quivering (21.5x), trembling (21.3x), shivering (19.9x), eyelids (19.6x), flesh (11.0x), wrist (9.1x), breath (8.4x).
Bradbury's characters do not merely observe the world. They are assaulted by it. The wind does not blow; it blew (26x expected). Things do not disappear; they vanish (22x). People do not speak quietly; they whisper (16.7x) and whispered (14.2x). The sensory register is turned up to the point where the nervous system itself becomes the subject.
And then, floating quietly in the middle of this storm: civilization (13.2x). The word he kept returning to, the concept he kept interrogating, while everything around it burned and trembled and screamed.
What Bradbury Avoided
The underused words are equally revealing. These are words Bradbury consistently used at less than half their expected frequency, across all five books.
| Word | Observed | Expected | Ratio |
|---|---|---|---|
| has | 101 | 650.6 | 0.16x |
| number | 23 | 123.8 | 0.19x |
| form | 21 | 86.5 | 0.24x |
| report | 17 | 69.5 | 0.24x |
| is | 839 | 2,504.7 | 0.33x |
| use | 52 | 157.8 | 0.33x |
| within | 39 | 116.0 | 0.34x |
| which | 332 | 932.9 | 0.36x |
| should | 100 | 278.9 | 0.36x |
| early | 33 | 85.4 | 0.39x |
| does | 69 | 172.4 | 0.40x |
| are | 502 | 1,180.9 | 0.43x |
| already | 37 | 86.0 | 0.43x |
| such | 88 | 189.4 | 0.46x |
| by | 588 | 1,286.6 | 0.46x |
These words were consistently below expected frequency across all five titles.
Look at the register of these words. Number. Form. Report. Use. Within. Which. Should. Such. Already. This is the vocabulary of administration. Of memos and committee minutes and policy documents. The language of people who describe the world from a desk.
Bradbury used "which" and "should" at just over a third of their expected rates. He used "has" at about one-sixth. These are not rare words he failed to learn. They are common words he instinctively refused. He wrote fiction the way people talk to each other in kitchens, not the way they write reports in offices.
The underuse data also reveals something about Bradbury's narrative mode. The copula "is" appears at just 0.33x expected. The auxiliary "has" at 0.16x. "Are" at 0.43x. These are the verbs of states, of things that simply exist. Bradbury's prose is built on actions: things that burn, tremble, scream, vanish. His characters do not exist in a condition. They are always in motion, always arriving at or fleeing from something.
The difference between the overused and underused columns is the difference between a poet and a bureaucrat.
Words the Corpus Has Never Seen
The third category is words Bradbury used that do not appear in the BNC at all. When filtered to pure lexical forms confirmed by WordNet (real English words, not character names or OCR artifacts), a peculiar list emerges.
| Word | Count | Books |
|---|---|---|
| dentifrice | 12 | 1/5 |
| internes | 13 | 1/5 |
| sawhorses | 5 | 1/5 |
| shoveling | 3 | 2/5 |
| clamor | 3 | 2/5 |
| huckleberries | 2 | 2/5 |
| earthman | 2 | 1/5 |
| squinched | 3 | 1/5 |
| bankbook | 2 | 1/5 |
| chinaberry | 1 | 1/5 |
| coffeepot | 1 | 1/5 |
| thundershower | 1 | 1/5 |
| sabertooth | 1 | 1/5 |
| pinochle | 1 | 1/5 |
| liverwurst | 1 | 1/5 |
These are real English words the 100-million-word British corpus never encountered.
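The filter behind this table (absent from the reference corpus, yet confirmed as a real word by a dictionary) can be sketched in a few lines of Python. This is an illustration under assumptions: the two sets below are tiny hypothetical stand-ins for the BNC vocabulary and for WordNet, and the count for the character name is invented.

```python
def unseen_words(observed: dict, reference_vocab: set, dictionary: set) -> dict:
    """Keep words the reference corpus lacks but the dictionary recognizes."""
    return {
        word: count
        for word, count in observed.items()
        if word not in reference_vocab and word in dictionary
    }


# Stand-in data: real tools would consult the BNC and WordNet instead.
reference_vocab = {"dark", "sky", "wind"}
dictionary = {"dark", "sky", "wind", "huckleberries", "liverwurst"}
observed = {"dark": 363, "huckleberries": 2, "liverwurst": 1, "montag": 50}

print(unseen_words(observed, reference_vocab, dictionary))
```

Here "dark" is dropped because the reference corpus knows it, and "montag" is dropped because the dictionary does not, which is how character names and OCR debris fall out of the list.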
These words tell two stories.
The first is about the Atlantic. The BNC is a British corpus. Words like "shoveling" (American spelling), "clamor" (American spelling of "clamour"), and "liverwurst" (an American deli word) are absent not because they are rare, but because they are American. Bradbury grew up in Waukegan, Illinois, and his prose is saturated with mid-century American domestic life. Bankbooks. Coffeepots. Pinochle games on the porch. The BNC, compiled from British sources, has no frame of reference for this world.
The second story is about genre. "Earthman" and "sabertooth" do not appear in a balanced British corpus because the BNC samples general English, not science fiction. These words exist at the boundary where Bradbury's imagination exceeded the reference frame.
And then there is "huckleberries." A word that appears nowhere in the BNC's 100 million words, but that Bradbury reached for in two separate books. Because huckleberries are what you pick in the summer in Illinois, barefoot, with stained fingers, in the kind of childhood that Bradbury spent his entire career trying to preserve in amber.
How the Toolkit Works
The analysis pipeline has three stages, each addressing a distinct problem in corpus linguistics.
Stage 1: Frequency comparison. For each unique word in the input corpus, the tool queries bnc-lookup, an O(1) hash-based lookup library that provides relative frequency data for 669,417 word forms from the BNC. The expected count is computed as relative_frequency(word) × corpus_length. The ratio of observed to expected yields the overuse/underuse signal.
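As a rough sketch of this stage, the Python below counts tokens, scales a relative frequency by corpus length, and divides. It does not call bnc-lookup: `REFERENCE_FREQ` is a hypothetical stand-in with made-up values, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter

# Hypothetical relative frequencies (share of all reference-corpus tokens).
# These numbers are invented; real ones would come from a BNC frequency source.
REFERENCE_FREQ = {"the": 0.06, "dark": 0.0001, "sky": 0.00005}


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_table(text: str, reference: dict) -> dict:
    """Map each word to its observed count, expected count, and ratio."""
    tokens = tokenize(text)
    corpus_length = len(tokens)
    rows = {}
    for word, observed in Counter(tokens).items():
        rel = reference.get(word)
        if rel is None:
            # Not in the reference: no expected count, so no ratio.
            rows[word] = {"observed": observed, "expected": None, "ratio": None}
            continue
        expected = rel * corpus_length  # relative_frequency(word) x corpus_length
        rows[word] = {
            "observed": observed,
            "expected": expected,
            "ratio": observed / expected,
        }
    return rows


table = frequency_table("The sky was dark. The dark sky hummed.", REFERENCE_FREQ)
for word, row in table.items():
    print(word, row)
```

Words missing from the reference are kept with a ratio of `None` rather than discarded, since those are exactly the words the "never seen" category is built from.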
Stage 2: Morphological classification. Every word is classified using a three-layer taxonomy rooted in established linguistic theory. The top layer identifies orthographic class (lexical, apostrophe, hyphenated, unicode, numeric). The second layer identifies morphological category (contraction, possessive, compound, prefixed). The third layer identifies specific pattern (negative contraction, copula, noun-compound, self-compound). This classification draws on Zwicky & Pullum's (1983) criteria for distinguishing clitics from affixes, Plag's (2003) compound taxonomy, and the BNC's own CLAWS4 tokenization conventions.
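A heavily simplified sketch of such a classifier is shown below. The rules, their order of precedence, and the extra "other" bucket are assumptions made for illustration; the toolkit's actual taxonomy is richer and this is not its code.

```python
from typing import Optional, Tuple

Label = Tuple[str, Optional[str], Optional[str]]


def classify(token: str) -> Label:
    """Return (orthographic class, morphological category, specific pattern)."""
    t = token.replace("\u2019", "'").lower()  # treat curly apostrophes as straight
    if any(ch.isdigit() for ch in t):
        return ("numeric", None, None)
    if not t.isascii():
        return ("unicode", None, None)
    if "'" in t:
        if t.endswith("n't"):
            return ("apostrophe", "contraction", "negative contraction")
        if t.endswith(("'re", "'ve", "'ll", "'d", "'m")):
            return ("apostrophe", "contraction", None)
        if t.endswith("'s") or t.endswith("s'"):
            # 's is ambiguous (possessive vs. contracted "is"/"has");
            # this sketch files it under possessive without looking at context.
            return ("apostrophe", "possessive", None)
        return ("apostrophe", None, None)
    if "-" in t:
        pattern = "self-compound" if t.startswith("self-") else None
        return ("hyphenated", "compound", pattern)
    if t.isalpha():
        return ("lexical", None, None)
    return ("other", None, None)  # catch-all, not one of the article's classes


def lexical_only(tokens: list) -> list:
    """Keep only tokens whose top-layer class is 'lexical'."""
    return [tok for tok in tokens if classify(tok)[0] == "lexical"]


print(lexical_only(["trembling", "didn't", "well-lit", "451", "sky"]))
```

Filtering on the top layer alone is enough to produce the "lexical words only" view used in the tables; the lower layers matter when you want to study contractions or compounds in their own right.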
Stage 3: Combinatoric validation. The --combinatoric flag runs the frequency analysis on the concatenated corpus (standard mode) and then independently on each input file. For every word, the tool counts how many individual books classify it the same way. A word overused in 5/5 books is a vocabulary fingerprint. A word overused in 1/5 might be a topic artifact of a single story. This turns a single ratio into a measure of consistency, which is the distinguishing feature of style versus subject matter.
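The consistency count can be sketched as follows. The per-book ratios are invented for illustration (they are not from the Bradbury data), and the 2.0 and 0.5 cut-offs are assumptions rather than the toolkit's documented thresholds.

```python
from collections import Counter


def label(ratio: float, over: float = 2.0, under: float = 0.5) -> str:
    """Bucket a ratio; the 2.0 and 0.5 cut-offs are assumptions for this sketch."""
    if ratio >= over:
        return "overused"
    if ratio <= under:
        return "underused"
    return "typical"


def consistency(per_book: dict, word: str) -> Counter:
    """Count how many books place `word` in each category."""
    votes = Counter()
    for ratios in per_book.values():
        if word in ratios:
            votes[label(ratios[word])] += 1
    return votes


# Invented per-book ratios, for illustration only.
per_book = {
    "book_1": {"trembling": 19.0, "rocket": 45.0},
    "book_2": {"trembling": 24.5, "rocket": 1.1},
    "book_3": {"trembling": 20.2},
    "book_4": {"trembling": 22.8, "rocket": 0.9},
    "book_5": {"trembling": 18.7, "rocket": 1.4},
}

for word in ("trembling", "rocket"):
    votes = consistency(per_book, word)
    print(f"{word}: overused in {votes['overused']}/{len(per_book)} books")
```

In this toy data "trembling" is overused in 5/5 books while "rocket" is overused in only 1/5, which is the pattern that separates a habit of style from the subject matter of a single book.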
The command that produced the Bradbury analysis:
    poetry run bnc \
      --input-dir ~/bradbury \
      --format excel \
      --output-file /tmp/bradbury.xlsx \
      --combinatoric
Download the full analysis (Excel, 676 KB). Contains observed counts, expected frequencies, overuse ratios, morphological classifications, and combinatoric scores for all 15,522 unique word forms.
What the Linguistics Tells Us
Corpus frequency analysis is not literary criticism. It cannot tell you that Fahrenheit 451 is about censorship, or that The Martian Chronicles is an elegy for colonialism. What it can tell you is something more primitive and, in its own way, more honest: what a writer's hands reached for when they were not thinking about reaching.
Bradbury overused "trembling" at 21x its expected frequency across five books spanning a decade. He did not decide to do this. No one sits down and says, "I will use the word trembling twenty-one times more than average." It happened because when Bradbury imagined a person standing in a room, the first thing he felt was the tremor in their hands. When he imagined a landscape, the first thing he saw was the sky. When he imagined sound, it arrived as a whisper or a scream, never as a remark or a comment.
The underuse data is just as telling. Bradbury's avoidance of formal connectives, of the copula, of abstract nouns like "form" and "number" and "report," reveals a writer who built prose out of concrete images rather than logical scaffolding. His sentences are not arguments. They are events. Something happens, and then something else happens, and the reader is carried forward by momentum rather than by reason.
This is what linguists call register: the systematic adaptation of language to context and purpose. Bradbury's register is so consistent across five books and a decade of writing that it shows up as a statistical signature. The overused words are overwhelmingly from the physical and sensory domains. The underused words are overwhelmingly from the analytical and administrative domains. The man wrote fiction the way you would tell a story around a campfire, not the way you would present findings to a committee.
The Fingerprint
In forensic linguistics, a stylometric fingerprint is a pattern of word choice stable enough to identify an author across different works, topics, and time periods. The gold standard is the function word profile: how often a writer uses "the," "of," "and," "but." These words are largely invisible to conscious control, which makes them resistant to imitation.
But Bradbury's fingerprint extends well beyond function words. His overuse of sensory and elemental vocabulary is not a stylistic affectation that could be turned on or off. It persists across science fiction (The Martian Chronicles), dystopian allegory (Fahrenheit 451), horror (The October Country), and nostalgic realism (Dandelion Wine). Regardless of genre, regardless of subject, the man reached for fire and darkness and trembling.
That consistency is what makes it a fingerprint rather than a theme. A theme is deliberate. A fingerprint is involuntary.
Ray Bradbury died on June 5, 2012, at 91. He had written something almost every day of his adult life. He once said, "I don't need an alarm clock. My ideas wake me."
The numbers in this analysis are just numbers. Ratios and counts and expected frequencies. But look at them long enough and they resolve into something unmistakable: a man who experienced the world as a sequence of physical sensations, and who spent sixty years finding words for what that felt like. Exhaled. Trembling. Dark. Sky. Wind. Dust. Silence.
A personal note: Bookfellows, 238 N. Brand Blvd.
References
- Zwicky, Arnold M. and Geoffrey K. Pullum. (1983). "Cliticization vs. Inflection: English N'T." Language, 59(3), 502-513.
- Plag, Ingo. (2003). Word-Formation in English. Cambridge University Press.
- Bauer, Laurie. (2003). Introducing Linguistic Morphology. 2nd ed. Georgetown University Press.
- Burnard, Lou. (2007). "Wordclass Tagging in BNC XML." Users Reference Guide for the British National Corpus.
- Burrows, John F. (1987). Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method. Oxford University Press.
- Miller, George A. (1995). "WordNet: A Lexical Database for English." Communications of the ACM, 38(11), 39-41.
Tools
- pystylometry — Python toolkit for stylometric analysis (BNC frequency, Burrows' Delta, Yule's K, and more).
- bnc-lookup — O(1) word existence and frequency lookup against 669,417 BNC word forms.




