
Attaining Tonality

Everyone wants their chatbot to sound like them. The problem is that "sounding like you" means different things depending on how you actually write.

In the television series Silicon Valley, the character Gilfoyle builds a chatbot trained on his own communications. The bot works. It responds to colleagues with terse, cutting remarks. One-word put-downs. Deadpan dismissals.

Video: Dinesh and Gilfoyle from Silicon Valley. Gilfoyle's bot passes the Turing test by being consistently hostile.

The joke is that Gilfoyle's personality is so narrow, so consistently acerbic, that a simple pattern-matcher can replicate it. His "voice" consists almost entirely of attitude. No elaborate sentence structures, no layered arguments, no extended prose. Just sarcasm, delivered in as few words as possible.

This is the easy case.

The Hard Case

Now consider someone whose writing involves extended arguments, professional contexts, and careful reasoning. Their voice emerges not in catchphrases but in the aggregate shape of their prose. The length of their sentences. Their preferred transitions. Whether they favor active or passive construction. How they introduce examples. When they use semicolons.

None of these features is distinctive in isolation. Every literate professional uses semicolons sometimes. Everyone writes both short and long sentences. The signature lies in the distribution, the proportions, the unconscious habits that accumulate across thousands of words.

How do you train a bot to capture that?

Tonality Is Not Topicality

First, clarify the target. Tonality is not the same as topicality.

If someone writes obsessively about Victorian literature or competitive chess or 18th-century naval history, that content becomes identifying. Mention the Battle of the Nile and readers familiar with the author will recognize a telltale interest. But content changes. Today's memo concerns quarterly projections; yesterday's concerned vendor negotiations. Neither topic is distinctive. The writer's voice persists anyway.

Topicality is what you write about. Tonality is how you write about anything.

A chatbot could, in principle, adopt someone's favorite topics through retrieval augmentation or fine-tuning on domain-specific content. That would produce a simulacrum of the author's interests. It would not produce a simulacrum of the author's voice. The challenge of tonality remains even when topics are generic or imposed from outside.

This distinction matters because most personal voice-cloning projects conflate the two. They feed a model examples of someone's writing and expect it to reproduce the style. What the model often reproduces is the subject matter. The tone flattens into standard LLM prose.

The Function Word Path

Researchers solved a version of this problem some sixty years ago.

When statisticians Mosteller and Wallace investigated the Federalist Papers in 1964, they faced a question of authorship attribution. Twelve essays had been claimed by both Alexander Hamilton and James Madison. The subject matter provided no clues; both wrote about constitutional governance. The vocabulary of nouns, verbs, and adjectives overlapped substantially.

The solution came from an unexpected direction: function words. Articles, prepositions, conjunctions. Words like "upon," "whilst," "also," "enough." These carry grammatical rather than semantic meaning. They are used unconsciously. And they differ between authors.

Hamilton used "upon" 3.24 times per thousand words. Madison used it 0.23 times. A fourteen-fold difference.
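Rates like these are simple to compute. The following is a minimal sketch, assuming a crude regex tokenizer; the function name `rate_per_thousand` and the sample sentence are illustrative and not taken from the study.

```python
import re

def rate_per_thousand(text: str, word: str) -> float:
    """Occurrences of `word` per 1,000 tokens of `text`, case-insensitive."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)

# Illustrative sentence only; real profiles need thousands of words.
sample = "Upon reflection, the decision rests upon the facts."
sample_rate = rate_per_thousand(sample, "upon")

# Ratio of the two rates quoted in the text (Hamilton vs. Madison).
upon_ratio = 3.24 / 0.23
```

On a text this short the rate is meaningless. The figures quoted above come from whole bodies of essays.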

The word "upon" was the single strongest discriminator in the entire study.

Hamilton preferred "while"; Madison preferred "whilst." Hamilton used "commonly"; Madison never did. Most individual differences were modest, but combined through Bayesian inference they produced strong evidence. All twelve disputed papers were attributed to Madison, most of them by overwhelming odds.
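To see how small differences add up, here is a simplified sketch of the combination step. It assumes word counts follow a Poisson distribution, that words are independent, and that prior odds are even. This is a teaching simplification, not a reproduction of the original analysis. Only the "upon" rates come from the text above; the function name and the 2,000-word essay are hypothetical.

```python
import math

# Rates per 1,000 words. Only "upon" is quoted in the text; add more words
# here only if you have measured rates for them.
RATES = {"upon": {"hamilton": 3.24, "madison": 0.23}}

def log_odds_madison(counts: dict[str, int], n_words: int) -> float:
    """Summed Poisson log-likelihood ratio, Madison vs. Hamilton.

    Positive values favor Madison, negative values favor Hamilton.
    """
    total = 0.0
    for word, k in counts.items():
        lam_m = RATES[word]["madison"] * n_words / 1000   # expected count, Madison
        lam_h = RATES[word]["hamilton"] * n_words / 1000  # expected count, Hamilton
        # log[Pois(k; lam_m) / Pois(k; lam_h)]; the k! terms cancel.
        total += k * math.log(lam_m / lam_h) - (lam_m - lam_h)
    return total

# Hypothetical 2,000-word essay with different counts of "upon".
no_upon = log_odds_madison({"upon": 0}, 2000)
six_upon = log_odds_madison({"upon": 6}, 2000)
```

An essay of that length with no "upon" at all leans toward Madison; one with six leans toward Hamilton. Each additional word in `RATES` adds its own term to the sum, which is how many weak signals become one strong one.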

This is stylometry. The measurement of style through features that operate below conscious awareness.

What Function Words Reveal

Function words are particularly useful for authorship because they resist conscious manipulation. An author choosing to write about the Constitution will select content words appropriate to that subject: "ratification," "federalism," "amendment." These choices are deliberate. But that same author will distribute "the," "of," "upon," and "which" according to deep-seated habit. The choices are automatic.

This is why function-word profiles are sometimes called linguistic fingerprints. They encode style independent of content. They persist across topics, genres, and decades of an author's career. They are difficult to fake deliberately.

For the tonality problem, this suggests a path forward. If you want a chatbot to sound like a specific person, analyze that person's function-word distributions. Then bias the model's output accordingly. Weight toward "upon" for a Hamilton-like voice; weight away from it for a Madison-like voice.

The path has limits. Modern language models operate on subword tokens rather than whole words, so the intervention is less straightforward than it sounds. But the principle holds: tonality lives in the small choices, not the big ones.
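As a toy illustration of the weighting idea, the sketch below adds a bias to word-level scores and renormalizes. Everything in it is an assumption made for illustration: the scores are invented, the vocabulary has two entries, and a real model would need the bias applied to token IDs through whatever mechanism its API provides.

```python
import math

def apply_bias(logits: dict[str, float], bias: dict[str, float]) -> dict[str, float]:
    """Add a per-word bias to raw scores, then renormalize with softmax."""
    shifted = {w: s + bias.get(w, 0.0) for w, s in logits.items()}
    peak = max(shifted.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - peak) for w, s in shifted.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Hypothetical next-word scores for two competing prepositions.
logits = {"on": 2.0, "upon": 0.0}
probs_plain = apply_bias(logits, {})
probs_hamilton = apply_bias(logits, {"upon": 1.5})  # nudge toward "upon"
```

The bias raises the probability of "upon" without forbidding "on," which is the point: the goal is to shift a distribution, not to substitute words.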

Beyond Function Words

Function words are the beginning, not the end. Researchers have identified dozens of stylometric features that contribute to authorial fingerprints.

Sentence length distribution. Some writers favor short, punchy sentences. Others prefer complex constructions that span multiple clauses. The average matters less than the variance. Does the writer alternate between short and long, or maintain a consistent rhythm?

Vocabulary richness. The type-token ratio measures how many unique words appear relative to total words. A low ratio indicates repetitive vocabulary; a high ratio indicates lexical diversity. Different writers operate at different points on this spectrum.

N-gram patterns. Beyond individual words, writers have characteristic phrases. "In order to" versus "so that." "However" at the start of a sentence versus mid-sentence. Character trigrams capture sublexical patterns: preferences for certain letter combinations regardless of which words contain them.

Syntactic structure. Parse a thousand sentences and you can measure clause depth, passive voice frequency, the ratio of subordinate to coordinate conjunctions. These features capture the architecture of thought, how an author constructs arguments at the grammatical level.
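Three of these features can be sketched with the standard library alone. The code below assumes a naive sentence splitter and a regex tokenizer, and all function names are illustrative. Syntactic features are left out because they need a parser.

```python
import re
import statistics
from collections import Counter

def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def char_trigrams(text: str) -> Counter:
    """Counts of overlapping three-character sequences, whitespace collapsed."""
    cleaned = re.sub(r"\s+", " ", text.lower())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

sample = "Short one. This sentence is a good deal longer than the first. Short again."
lengths = sentence_lengths(sample)
mean_length = statistics.mean(lengths)
length_spread = statistics.pstdev(lengths)  # the variance the text emphasizes
```

One caution: the type-token ratio falls as texts get longer, so it is only comparable between samples of similar length.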

Each feature contributes small amounts of signal. The power comes from aggregation. Burrows' Delta, a standard stylometric distance measure, standardizes the frequencies of the most frequent words (typically the top fifty to a few hundred) and combines them into a single distance score. Texts by the same author cluster together in this high-dimensional space. Texts by different authors separate.
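A rough sketch of the Delta idea follows: take the most frequent words in a reference corpus, convert each text's relative frequencies to z-scores against that corpus, and average the absolute differences. The helper names, the tokenizer, and the two toy sentences are assumptions for illustration; a serious analysis would use a dedicated stylometry package and far more text.

```python
import re
import statistics
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def rel_freqs(text: str) -> dict[str, float]:
    """Relative frequency of each word in `text`."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    return {w: c / len(tokens) for w, c in Counter(tokens).items()}

def burrows_delta(text_a: str, text_b: str, corpus: list[str], n_words: int = 50) -> float:
    """Mean absolute difference of z-scored frequencies over the corpus's top words."""
    pooled = Counter()
    for doc in corpus:
        pooled.update(tokenize(doc))
    vocab = [w for w, _ in pooled.most_common(n_words)]
    corpus_freqs = [rel_freqs(doc) for doc in corpus]
    freq_a, freq_b = rel_freqs(text_a), rel_freqs(text_b)
    diffs = []
    for w in vocab:
        values = [f.get(w, 0.0) for f in corpus_freqs]
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # word does not vary across the corpus, so it carries no signal
        mu = statistics.mean(values)
        z_a = (freq_a.get(w, 0.0) - mu) / sd
        z_b = (freq_b.get(w, 0.0) - mu) / sd
        diffs.append(abs(z_a - z_b))
    return sum(diffs) / len(diffs) if diffs else 0.0

# Toy texts, far too short for real use.
text_a = "the cat sat upon the mat and the dog sat upon the rug"
text_b = "a cat sat on a mat while a dog sat on a rug"
same = burrows_delta(text_a, text_a, [text_a, text_b])
different = burrows_delta(text_a, text_b, [text_a, text_b])
```

Smaller values mean more similar word-frequency profiles. A text compared with itself scores zero.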

Burstiness

One feature deserves special attention: burstiness.

Human writing is variable. Within a single document, the style shifts. Paragraphs introducing a new topic may be more formal; paragraphs elaborating familiar points may relax into colloquial rhythm. Vocabulary clusters in bursts around themes, then gives way to different clusters. The texture is uneven.

Machine-generated text, by contrast, tends toward monotony. A language model optimizes for likely continuations, which means it regresses toward the mean of its training data. It smooths out idiosyncrasy. The result is prose that maintains consistent vocabulary diversity, consistent sentence length, consistent formality throughout. Too consistent.

Burstiness measures this variance. Compute any stylometric feature in rolling windows across a document, then measure how much the feature fluctuates. Human writing fluctuates more than machine writing.
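The rolling-window computation might look like the sketch below. It assumes type-token ratio as the feature and standard deviation as the measure of fluctuation; the window size, step, and function names are arbitrary illustrative choices, and any other per-window feature could be swapped in.

```python
import re
import statistics

def window_ttr(tokens: list[str]) -> float:
    """Type-token ratio of one window of tokens."""
    return len(set(tokens)) / len(tokens)

def burstiness(text: str, window: int = 50, step: int = 25, feature=window_ttr) -> float:
    """Standard deviation of `feature` across rolling windows of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    values = [
        feature(tokens[i:i + window])
        for i in range(0, len(tokens) - window + 1, step)
    ]
    # With fewer than two full windows there is nothing to compare.
    return statistics.pstdev(values) if len(values) > 1 else 0.0

# A perfectly repetitive text has no fluctuation at all.
flat_score = burstiness("the cat sat " * 100)
```

A score near zero means every window looks like every other window, which is the monotony described above. Comparing scores only makes sense between texts measured with the same window settings.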

Tools like GPTZero use burstiness as a primary signal for detecting AI-generated text. The logic is simple: if the style never varies, something is wrong.

For tonality cloning, burstiness is both obstacle and opportunity. A chatbot that replicates someone's average style but lacks their variance will feel flat. The aggregate statistics might match, but the texture will differ. Real voices have rhythm, departures, surprises. Capturing those requires modeling not just the central tendency of someone's style but its distribution.

Injecting Variance

One approach is to train on more than examples; train on contrasts. Show the model passages where the author deviates from their own baseline alongside passages where they adhere to it. Let the model learn when deviation is appropriate.

Another approach uses stylistic steering during generation. Rather than sampling uniformly from the model's probability distribution, introduce periodic perturbations. Force an unusually short sentence after several long ones. Insert a rare vocabulary item. These interventions create artificial burstiness, mimicking the natural variation that characterizes human writing.
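The trigger for such a perturbation can be a few lines of bookkeeping. This sketch covers only the decision of when to intervene; how to actually force a short sentence depends on the generation setup and is not shown. The function name and both thresholds are assumptions.

```python
def should_force_short(sentence_lengths: list[int], run: int = 3, long_threshold: int = 20) -> bool:
    """True when the last `run` sentences were all at least `long_threshold` words."""
    recent = sentence_lengths[-run:]
    return len(recent) == run and all(n >= long_threshold for n in recent)

# Word counts of the sentences generated so far (hypothetical).
history = [8, 24, 31, 22]
force_now = should_force_short(history)
```

Ideally the thresholds come from the target author's own sentence-length distribution rather than from fixed defaults.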

Neither approach is perfect. Both require an extensive corpus of the target author's writing, extensive enough to characterize not just averages but ranges. For most individuals, such corpora don't exist. Even prolific writers rarely produce the hundreds of thousands of words needed for reliable stylometric profiling.

There is an irony here. The same features that once distinguished Hamilton from Madison now distinguish human from machine. Variance, burstiness, the irregular rhythm of natural prose: these were originally markers of individual authorship. Now they are markers of authorship at all. The problem has shifted levels. We are no longer asking "does this sound like Craig?" but "does this sound like a person?" The techniques for achieving individual tonality have become the techniques for achieving human plausibility.

The Persona Problem

There's a deeper difficulty. What you're trying to capture isn't just style; it's persona. And persona involves more than how words are arranged.

It involves what the writer notices. What they find worth commenting on. How they structure an argument: bottom-up from evidence or top-down from claims. Whether they address counterarguments before or after presenting their own position. How much they expect the reader to know. How much patience they have for preamble.

These are cognitive habits, not linguistic ones. They show up in word choice and sentence structure, but they originate upstream, in how a person thinks through a problem before writing a word.

A chatbot can be tuned to prefer certain words and certain sentence lengths. It cannot be tuned to think like a specific person. The stylometric features are downstream symptoms of upstream cognition. Matching the symptoms does not replicate the cause.

This is why Gilfoyle's chatbot is a joke. The voice being captured is so thin, so devoid of cognitive content, that surface features suffice. For richer voices, surface features produce only parody.

What Problem Are You Actually Solving?

Before pursuing tonality, ask a prior question: what does your audience actually need?

The answer changes everything.

Voice, it turns out, has components. Four of them, roughly: the linguistic markers you can measure (function words, sentence patterns), the natural variance that makes writing feel human, the cognitive patterns that reflect how someone thinks, and the audience layer that shapes how all of this gets deployed. True tonality sits at the intersection of all four. But most use cases don't need all four.

Walk through the use cases and most of the components drop out of each one. Self-attribution? You need linguistic markers and natural variance to avoid AI-detection tells, but you don't need the cognitive layer or audience adaptation. A course Q&A bot? Cognitive patterns and audience needs matter; the instructor's linguistic fingerprint doesn't.

Picture the four components as overlapping circles. The center, where all four meet, is true tonality. Few use cases live there. Most live in the overlapping regions of two circles. They need some components of voice but not all.

Consider the specific cases:

The uncanny valley. A chatbot representing a departed loved one. The audience knows this person intimately. They will sense wrongness they cannot articulate. This sounds like the hardest case, but consider: the audience is not adversarial. They are not running stylometric analysis. They want to connect. They are meeting an emotional need, not conducting forensics. Constrain the scope to brief interactions. Include characteristic phrases, familiar topics, expressions of affection. Optimize for recognition, not perfection. The audience is not trying to catch the bot; they are trying to feel something.

The institutional extension. A virtual teaching assistant, a customer service bot "from" a known figure. The audience knows it's a bot. They want expertise and appropriate register, not deception. Topicality matters; tonality is a nice-to-have. You need domain knowledge and professional warmth. Individual voice is secondary.

The self-attribution case. Content you will claim as your own. The audience is readers of your future work who might notice discontinuity with your past work. But if they don't know your past work well, the bar is low. You might just need to avoid AI-detection tells: inject variance, maintain burstiness, break the monotony of machine prose.

This is less about sounding like you and more about sounding like someone.

The anonymous professional context. Reports, documentation, emails where no one cares whose voice it is. You need clarity, domain knowledge, appropriate formality. Tonality is irrelevant. This is not the problem you have.

Many people pursuing tonality discover, upon reflection, that they need something else. Topicality: the right subject matter and domain expertise. Sentiment: warmth, authority, or Gilfoyle-style acerbity. Register: formal versus casual, technical versus accessible. These are simpler problems than full stylometric replication. Solve the problem you actually have.

If You Still Need Tonality

For cases where individual voice genuinely matters, several approaches help.

Focus on the most distinctive features. Analyze your target author. What stands out? Do they have strong function-word preferences? Unusual sentence-length distributions? Characteristic phrases that recur? Build a feature profile and focus attention on the features with highest variance from population norms.

Use few-shot prompting aggressively. Include multiple examples of the target author's writing in the system prompt. Models pick up on stylistic patterns from examples even without explicit instruction. The examples should span topics and contexts to capture range, not just typical cases.

Iterate with human feedback. Style is subjective and contextual. A feature that matters in one context may be irrelevant in another. The only reliable test is whether readers who know the target author find the output convincing. Test, adjust, repeat.

Accept the limits. A language model is not a person. It does not have experiences, opinions, blind spots, or grudges. It cannot replicate the cognitive layer that gives writing its deepest character. What it can do is approximate the surface layer, well enough for some purposes, not well enough for others.
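For the first of these approaches, building a feature profile and ranking it against population norms, a minimal sketch follows. It assumes you already have an author's feature values plus a mean and standard deviation for each feature from some reference population. Every number and feature name below is invented for illustration.

```python
def rank_distinctive(
    author: dict[str, float],
    population_mean: dict[str, float],
    population_sd: dict[str, float],
) -> list[tuple[str, float]]:
    """Sort features by the absolute z-score of the author against population norms."""
    z_scores = {
        name: (value - population_mean[name]) / population_sd[name]
        for name, value in author.items()
        if population_sd.get(name, 0) > 0  # skip features with no usable norm
    }
    return sorted(z_scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical profile and norms, not measured from any real corpus.
author = {"upon_per_1000": 3.0, "mean_sentence_len": 21.0, "ttr": 0.48}
norm_mean = {"upon_per_1000": 0.5, "mean_sentence_len": 20.0, "ttr": 0.50}
norm_sd = {"upon_per_1000": 0.5, "mean_sentence_len": 5.0, "ttr": 0.05}

ranking = rank_distinctive(author, norm_mean, norm_sd)
```

Features at the top of the ranking are the ones worth spending prompt space and evaluation effort on. Features near zero are ones the model's default prose probably already matches.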

The Signature That Remains

Mosteller and Wallace solved the Federalist Papers authorship problem by focusing on what the authors couldn't control: their unconscious preferences for "upon" versus "on," "whilst" versus "while." These markers persisted regardless of topic, audience, or argumentative strategy.

Some sixty years later, we face the inverse problem. Not attribution but generation. Not identifying who wrote something, but making something that reads as if a specific who wrote it.

Image: An empty leather office chair in the corner. He hasn't written anything in years.

The stylometric tradition offers guidance. Tonality lives in the small words, the sentence rhythms, the phrase patterns that accumulate below conscious awareness. Burstiness matters: real voices vary in ways that averaged-out models do not. And the hardest features to capture are the cognitive ones, how a person thinks, that manifest in style but originate elsewhere.

Gilfoyle's chatbot worked because Gilfoyle had no there there. Sarcasm and brevity, end of profile. For writers with actual voices, the task is harder. The fingerprint exists, but the fingers are complicated.

. . .

References

Mosteller, F. and Wallace, D.L. (1964). Inference and Disputed Authorship: The Federalist. Addison-Wesley.
Burrows, J. (2002). "'Delta': A Measure of Stylistic Difference and a Guide to Likely Authorship." Literary and Linguistic Computing, 17(3): 267-287.
Stamatatos, E. (2009). "A survey of modern authorship attribution methods." Journal of the American Society for Information Science and Technology, 60(3): 538-556.
Tian, E. (2023). "GPTZero." AI detection using burstiness and perplexity metrics.
Eder, M., Rybicki, J., and Kestemont, M. (2016). "Stylometry with R: A Package for Computational Text Analysis." The R Journal, 8(1): 107-121.
