In September 1995, The Washington Post published a 35,000-word manifesto titled “Industrial Society and Its Future.” The author was unknown. The FBI had spent 17 years and millions of dollars hunting the person who wrote it, a serial bomber who had killed three people and injured nearly two dozen more. Traditional forensic evidence had turned up nothing: no fingerprints, no DNA, no traceable materials.[s] What finally cracked the case was not a strand of hair or a chemical residue. It was a phrase: “eat your cake and have it too.”
The field that made this breakthrough possible is forensic stylometry, the quantitative study of writing style applied to questions of authorship.[s] Every person uses language in subtly distinctive ways: specific word choices, punctuation habits, sentence structures, and unconscious preferences for certain prepositions over others. Linguists call this an idiolect, a personal version of a shared language.[s] Forensic stylometry treats these patterns as evidence, measuring them with statistical and computational tools to determine who wrote a disputed text.
The Unabomber’s Words
The FBI’s decision to publish the manifesto was a calculated gamble. Agents hoped someone would recognize the writing.[s] In Schenectady, New York, a woman named Linda Patrik read the essay and thought it sounded like her brother-in-law, Ted Kaczynski. She showed it to her husband, David Kaczynski, who recognized unusual expressions his estranged brother favored, including “cool-headed logicians.”[s] David approached the FBI with his suspicions.
FBI Supervisory Special Agent James Fitzgerald then conducted a systematic linguistic comparison between Kaczynski’s known writings and the manifesto. The similarities were striking. Both texts used “analyse” instead of “analyze,” “licence” instead of “license,” and “wilfully” instead of “willfully.” Both inverted the common idiom into “you can’t eat your cake and have it too.”[s]
Sociolinguist Roger Shuy identified another revealing detail: the manifesto’s unusual spellings, such as “clew” for “clue,” matched spelling reforms championed by The Chicago Tribune during the 1940s and 1950s. These reforms were never widely adopted, but someone who grew up reading that newspaper would have absorbed them. Kaczynski was born in Chicago in 1942.[s]
The FBI’s linguistic analysis, combined with biographical facts, provided the basis for a search warrant.[s] On April 3, 1996, agents arrested Kaczynski at his Montana cabin, where they found bomb-making materials and a carbon copy of the manifesto. Fitzgerald’s work marked the first time forensic stylometry was used in a federal case to obtain a search warrant.[s]
Forensic Stylometry Goes Digital
The Unabomber case proved that writing style could function as evidence. In the decades since, computers have transformed forensic stylometry from a painstaking manual process into a rapid, scalable discipline. The foundational work dates to 1964, when statisticians Frederick Mosteller and David Wallace spent three years analyzing function words (grammatical words such as articles, prepositions, and conjunctions) in the Federalist Papers by hand, ultimately attributing twelve disputed essays to James Madison.[s] Modern software can perform equivalent analyses in seconds.
The most public demonstration came in 2013, when Patrick Juola, a computer scientist at Duquesne University, used his Java Graphical Authorship Attribution Program (JGAAP) to analyze a detective novel called The Cuckoo’s Calling, credited to debut author Robert Galbraith. A Sunday Times reporter had received a tip that J.K. Rowling was the real author. Juola’s program compared the novel against works by Rowling and three other British female crime novelists, tracking word length distributions, the 100 most common words, character four-grams, and word bigrams.[s]
Rowling was the only author who consistently matched across all four tests. As Juola explained, “everyone has a particular way of writing that’s almost impossible to hide.”[s] Confronted with the evidence, Rowling admitted the pseudonym was hers.
When Anonymity Is Life or Death
Rowling’s unmasking was embarrassing but harmless. For whistleblowers, dissidents, and anonymous sources, the same technology poses a far graver threat. If forensic stylometry can identify a bestselling novelist from her prepositions, it can identify a government employee who leaks classified documents, or an activist who publishes criticism of an authoritarian regime.
Researchers at Drexel University have explored both sides of this problem. Their Privacy, Security and Automation Lab developed two competing tools: JStylo, which identifies authors, and Anonymouth, which helps writers disguise their style. JStylo can select the correct author from a pool of 40 candidates with 80 to 85 percent accuracy, given a writing sample of about 6,500 words.[s]
“When people want to speak anonymously, whether it be for reporting on human rights issues or whistleblowing or simply voicing unpopular opinions, they need to know how to be safe and whether stylometry may reveal their identity,” said Rachel Greenstadt, the lab’s director.[s]
Anonymouth works by running the same analyses as JStylo, then suggesting changes the author can make to mask their writing fingerprint. The tool does not encode text; it coaches writers on which habits to alter. The approach reflects a growing subfield called adversarial stylometry, the practice of deliberately altering writing to evade authorship detection.
Code Has a Fingerprint Too
Forensic stylometry no longer applies only to prose. Researchers have demonstrated that programmers leave stylistic signatures in source code, from variable naming conventions to the structure of their abstract syntax trees. A 2024 study from the University of Bologna assembled a dataset of 114,400 code snippets from 104 open-source developers and achieved 69 to 71 percent accuracy in attributing code to the correct author, even for programmers not seen during training.[s]
This matters because anonymous code contributions are common in open-source projects, leak repositories, and cybercrime investigations. If forensic stylometry can attribute a piece of malware or a leaked codebase to a specific programmer, the implications for both law enforcement and civil liberties are significant.
The Limits of Linguistic Evidence
Forensic stylometry is powerful, but it is not a fingerprint in the forensic sense. Juola himself was careful to note that his analysis of Rowling’s novel did not prove authorship; it showed that Rowling “or someone who writes surprisingly like Rowling” was the most likely candidate.[s] The technique works best as corroborating evidence alongside other investigative methods.
Authorship attribution also requires a comparison corpus: a collection of texts of known authorship against which a disputed text can be measured. When Miles Taylor revealed himself as the anonymous “resister” who wrote the 2018 New York Times op-ed criticizing the Trump administration from within, forensic stylometry had been unable to identify him because he had never published anything else to compare against.[s]
Courts treat the admissibility of forensic stylometry evidence with caution. As linguists have noted, many American judges and lawyers have little experience with linguistic expertise, and the journey from linguistic coincidence to admissible evidence remains a case-by-case determination.[s]
The tension at the heart of forensic stylometry is unlikely to resolve. The same science that brought a serial bomber to justice can strip protection from those who speak truth to power. Every advance in detection generates new research in evasion, and every tool built to protect anonymity can also shield criminals. The Unabomber’s shadow falls in both directions.
On September 19, 1995, The Washington Post printed a 35,000-word manifesto titled “Industrial Society and Its Future” at the demand of an unknown serial bomber. The FBI had spent 17 years pursuing the case, designated UNABOM for its university and airline bombing targets, without identifying a suspect. The bomber had killed three people, injured 23, and deliberately left false forensic clues. He ripped the skins off batteries to prevent tracing. He made his own epoxy from melted deer hooves instead of using commercial glue.[s] No fingerprints, no DNA, no traceable materials remained on any device.[s]
What the bomber could not erase was his writing style. The decision to publish his manifesto handed investigators the one form of evidence he could not scrub: his idiolect, the unique constellation of vocabulary, syntax, and unconscious linguistic habits that constitutes a personal fingerprint in language.[s] The field that exploited this evidence is forensic stylometry, the quantitative analysis of writing style for authorship attribution.[s]
Forensic Stylometry in the Unabomber Investigation
The manifesto’s publication produced thousands of tips. The decisive one came from David Kaczynski, whose wife Linda Patrik recognized the writing as reminiscent of her brother-in-law, Ted. David identified distinctive phrases, including “cool-headed logicians,” a term his brother favored.[s]
FBI Supervisory Special Agent James Fitzgerald, who later became the Bureau’s first trained forensic linguist, conducted a systematic comparison. He catalogued lexical, orthographic, and syntactic parallels between the manifesto and Kaczynski’s known correspondence. Both texts used British-influenced spellings: “analyse” for “analyze,” “licence” for “license,” “wilfully” for “willfully,” “instalment” for “installment.” Both reversed the standard American idiom into “you can’t eat your cake and have it too.” Both employed unusual vocabulary including “chimerical” and “middle-class vacuity.”[s]
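The kind of marker comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reconstruction of Fitzgerald's procedure: the marker list is taken from the examples in this article, the function names are hypothetical, and the sample sentences in the usage note are invented.

```python
import re

# Markers drawn from the examples quoted in this article; a real analysis
# would work from a much larger, case-specific list.
MARKERS = [
    "analyse",
    "licence",
    "wilfully",
    "instalment",
    "chimerical",
    "eat your cake and have it too",
]

def marker_hits(text, markers=MARKERS):
    """Return the set of markers that occur as whole words or phrases in text."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return {
        m for m in markers
        if re.search(r"\b" + re.escape(m) + r"\b", normalized)
    }

def shared_markers(questioned, known, markers=MARKERS):
    """Markers present in both the questioned text and the known writing."""
    return sorted(marker_hits(questioned, markers) & marker_hits(known, markers))
```

Calling `shared_markers("They analyse it, but you cannot eat your cake and have it too.", "I analyse the licence; nobody can eat your cake and have it too.")` returns the two markers the invented sentences have in common. A shared list like this only suggests a link; rare markers carry far more weight than common ones.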
Sociolinguist Roger Shuy contributed a geographic inference. The manifesto’s spellings, such as “clew” for “clue,” matched reforms The Chicago Tribune had promoted from the 1940s through the 1950s. These reforms never gained widespread adoption, meaning the writer had likely absorbed them during formative years in or near Chicago. Kaczynski was born there in 1942. The manifesto also used “rearing children” rather than “raising children,” a dialectal marker consistent with the northern United States, and slang terms like “broad” and “chick” that suggested a man who came of age in the 1960s.[s]
The FBI’s search warrant affidavit included detailed side-by-side textual comparisons. The FBI stated that “our linguistic analysis determined that the author of those papers and the manifesto were almost certainly the same.”[s] This analysis provided the legal basis for a search warrant, the first time forensic stylometry was used in a federal criminal case for that purpose.[s] On April 3, 1996, agents arrested Kaczynski at his Montana cabin. Inside they found bomb-making materials, 40,000 handwritten journal pages, and a carbon copy of the manifesto.
A notable caveat: according to the search warrant affidavit, none of the outside academic experts consulted had independently named Kaczynski as a suspect. The identification depended on David Kaczynski’s family knowledge combined with Fitzgerald’s linguistic analysis.[s]
Computational Forensic Stylometry
The Unabomber case relied on manual linguistic comparison. The discipline’s computational roots trace to 1964, when Frederick Mosteller and David Wallace published a three-year statistical study of the Federalist Papers. They measured the frequency of function words (articles, prepositions, and conjunctions) in the disputed essays, ultimately attributing twelve papers to James Madison based on Bayesian inference.[s]
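A heavily simplified sketch of the function-word idea follows. It is not Mosteller and Wallace's actual model: it assumes a plain Poisson count model, equal prior odds for two candidate authors, and a short illustrative word list. All names in it are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative list only; a real study selects its marker words empirically.
FUNCTION_WORDS = ["upon", "while", "whilst", "on", "by", "to", "of", "the"]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def function_word_rates(sample, smoothing=0.5):
    """Smoothed per-token rate of each function word in a known-author sample."""
    tokens = tokenize(sample)
    counts = Counter(tokens)
    denom = len(tokens) + smoothing * len(FUNCTION_WORDS)
    return {w: (counts[w] + smoothing) / denom for w in FUNCTION_WORDS}

def poisson_log_likelihood(disputed, rates):
    """Log-likelihood of the disputed text's function-word counts under rates."""
    tokens = tokenize(disputed)
    if not tokens:
        raise ValueError("disputed text contains no word tokens")
    counts = Counter(tokens)
    total = 0.0
    for word, rate in rates.items():
        expected = rate * len(tokens)   # Poisson mean for this word
        observed = counts[word]
        total += observed * math.log(expected) - expected - math.lgamma(observed + 1)
    return total

def log_odds(disputed, sample_a, sample_b):
    """Positive values favor author A, negative favor author B (equal priors)."""
    return (poisson_log_likelihood(disputed, function_word_rates(sample_a))
            - poisson_log_likelihood(disputed, function_word_rates(sample_b)))
```

Function words suit this purpose because writers use them constantly and largely without thinking, so their rates depend less on topic than content words do.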
Modern forensic stylometry automates and scales this approach. Patrick Juola’s JGAAP (Java Graphical Authorship Attribution Program) analyzes “literally millions of different features,” according to Juola, tracking word length distributions, the frequency of the 100 most common words, character four-grams (groups of four adjacent characters, capturing word stems and cross-word patterns), and word bigrams (pairs of adjacent words).[s]
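The four feature families named above are simple to compute. The sketch below is a minimal Python illustration, not JGAAP's implementation: the tokenizer, the function names, and the use of cosine similarity to compare profiles are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def word_length_distribution(tokens):
    """Share of tokens at each word length."""
    counts = Counter(len(t) for t in tokens)
    total = sum(counts.values())
    return {length: c / total for length, c in counts.items()}

def most_common_word_freqs(tokens, n=100):
    """Relative frequency of the n most common words."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.most_common(n)}

def char_ngrams(text, n=4):
    """Counts of overlapping character n-grams, spaces included."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))

def word_bigrams(tokens):
    """Counts of adjacent word pairs."""
    return Counter(zip(tokens, tokens[1:]))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse feature profiles (dicts)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

One way to use these is to build each profile for the disputed text and for every candidate author, then rank candidates by similarity within each feature family. Agreement across several families is stronger evidence than a high score on any single one.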
In 2013, Juola applied JGAAP to determine whether J.K. Rowling had written The Cuckoo’s Calling under the pseudonym Robert Galbraith. He compared the novel against Rowling’s The Casual Vacancy and novels by Ruth Rendell, P.D. James, and Val McDermid. Rowling was the only candidate who consistently matched across all four independent tests. Peter Millican at the University of Oxford conducted a parallel analysis and reached the same conclusion.[s] Rowling subsequently confirmed authorship.
Juola emphasized the method’s limitations: “Stylometry is much less reliable and accurate than DNA. All we really knew was that it was either by Rowling herself, or by someone who wrote in a very similar style to Rowling.”[s]
Adversarial Stylometry and the Privacy Arms Race
The same forensic stylometry techniques that identify criminals can strip anonymity from whistleblowers and dissidents. Drexel University’s Privacy, Security and Automation Lab, directed by Rachel Greenstadt, developed JStylo and Anonymouth to address both sides of this equation. JStylo attributes authorship with 80 to 85 percent accuracy from a pool of 40 candidates, given a 6,500-word sample. Anonymouth coaches writers on modifying their style to evade detection.[s]
The subfield of adversarial stylometry, the deliberate alteration of writing to prevent attribution, has produced increasingly sophisticated tools. Researchers have demonstrated that manual obfuscation can reduce forensic stylometry accuracy to the level of random guessing. Automated tools can iteratively modify text while preserving semantic content, though such obfuscation remains imperfect: altered texts can often be detected as machine-modified, meaning the act of disguise itself leaves traces.
Code Stylometry
Forensic stylometry now extends beyond natural language into source code. Programmers exhibit distinctive patterns in variable naming, indentation, comment style, and the structural choices reflected in abstract syntax trees. A 2024 study at the University of Bologna assembled 114,400 code snippets from 104 open-source developers and trained a k-nearest neighbors classifier on code2seq embeddings. The system achieved 69 to 71 percent accuracy in attributing code to individual authors, including authors absent from the training set.[s]
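The classification step can be illustrated with a small k-nearest neighbors sketch in Python. It assumes the embeddings have already been computed by some model and are passed in as plain lists of floats; it does not call code2seq, and the function names, the cosine distance metric, and the value of k are assumptions rather than details from the study.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """1 minus cosine similarity; the maximum (1.0) if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def knn_attribute(query, labeled_embeddings, k=3):
    """Attribute a query embedding to the majority author among its k nearest
    labeled neighbors. labeled_embeddings is a list of (vector, author) pairs.
    Vote ties go to the author whose snippet is nearest to the query."""
    nearest = sorted(
        labeled_embeddings,
        key=lambda item: cosine_distance(query, item[0]),
    )[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]
```

With toy two-dimensional vectors, `knn_attribute([0.8, 0.2], [([1.0, 0.0], "alice"), ([0.9, 0.1], "alice"), ([0.0, 1.0], "bob"), ([0.1, 0.9], "bob")])` returns `"alice"`. Because a nearest-neighbor index only stores labeled examples, reference snippets for a new author can be added without retraining the classifier itself.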
Earlier work by Aylin Caliskan-Islam and colleagues at Drexel University, presented at USENIX Security 2015, demonstrated that abstract syntax tree features are particularly resistant to obfuscation, making code stylometry more robust than text-based approaches against deliberate disguise.[s]
Evidentiary Standards and Limitations
Forensic stylometry evidence faces scrutiny in courts. Its admissibility depends on jurisdiction and methodology, typically evaluated under established standards for scientific reliability.[s] As legal scholars Peter Tiersma and Lawrence Solan observed, “the vast majority of American lawyers and judges have little or no experience with linguistic expertise in a legal matter.”[s]
The technique also has structural constraints. It requires a comparison corpus of known writing; when Miles Taylor revealed himself in 2020 as the anonymous author of a New York Times op-ed and book criticizing the Trump administration, forensic stylometry had failed to identify him because he had no prior publications.[s] Short texts remain difficult to analyze reliably. And authorship attribution in multilingual or collaborative writing contexts introduces additional complications that current models handle poorly.
The field continues to advance, with machine learning models improving accuracy in controlled settings where the pool of candidate authors is known. But the fundamental tension persists: every improvement in forensic stylometry that helps catch a bomber also narrows the space in which a whistleblower can safely speak. The Unabomber’s shadow, cast by 35,000 words in 1995, still falls across both sides of that line.



