A Brief History of Text Generation
From Shannon's hand-picked letters to modern LLMs. The real outputs from ELIZA, RACTER, char-rnn, and GPT, and why each generation felt like a breakthrough.
How a 1994 data compression algorithm became the foundation of modern AI. The untold story of Byte Pair Encoding's journey from C Users Journal to GPT-4.
Understanding how LLMs transform text into tokens, and why this seemingly simple process has profound implications for cost, context limits, and model behavior.
BPE, WordPiece, SentencePiece, Unigram. Four algorithms, four trade-offs, and none of them knows what a word is.
Tamil speakers pay 7x more tokens than English speakers for the same meaning. The hidden cost of tokenization and why morphology sets a compression ceiling.
Over a million AI agents registered on a social network built exclusively for them. They formed religions and drafted constitutions. A critical analysis of what is actually happening, and what it means for security.
Reddit usernames that break GPT. Invisible characters that bypass filters. The edge cases where tokenization fails spectacularly.
How a 2001 method for comparing corpora became a detector for AI-generated text, pasted content, and ghostwriters. Chi-squared drift detection across sliding windows.
251,022 tokens across five books, measured against the British National Corpus. The frequency data draws a portrait of a man who wrote with his lungs and his skin.
How words turn into coordinates, and why "king minus man plus woman" equals "queen". The story of how meaning became geometry.
Static embeddings gave "oracle" one vector for priestess, database, and Matrix character. Then attention learned to compute meaning from context.
Explore how LLMs manage context windows, from quadratic attention scaling to truncation strategies, and why the most expensive tokens are often the ones you never meant to send.