Dead Link Epidemic: 38% Gone in Critical Digital Memory Crisis


The dead link epidemic is not a tidy web maintenance problem. It is a record problem: the news story, court footnote, government page, scientific article, or Wikipedia reference can still look stable after the linked evidence behind it disappears. Pew Research Center found that 25% of webpages sampled from 2013 to 2023 were no longer accessible by October 2023, and that 38% of pages from 2013 were gone a decade later^[s].

That matters because journalism has long carried the burden of the “first rough draft of history,” a phrase often attributed to Washington Post publisher Philip L. Graham, though Slate traced older uses and variants^[s]. In the print age, a source could be difficult to reach, buried in a library stack or a courthouse box. In the web age, a dead URL can sever the evidence itself.

Why the dead link epidemic matters

For historians, the dead link epidemic changes the survival conditions of evidence. Pew researchers found broken links on 23% of sampled news pages and 21% of sampled government pages; they also found that 54% of sampled English Wikipedia pages had at least one dead link in the references section^[s]. Those are not marginal spaces. They are the daily record of policy, war, elections, health scares, school board disputes, court cases, and public memory.

The fast-moving social record is even more fragile. Pew tracked a sample of public tweets posted in spring 2023 and found that 18% were no longer publicly visible at the end of its monitoring period^[s]. A future historian trying to reconstruct a protest, a rumor, a government warning, or an eyewitness account may find the article that mentioned the post, but not the post itself.

From broken links to lost context

A dead link is the simplest failure. Harvard Law Review’s Perma study described link rot as a URL no longer serving content, while reference rot means the page still loads but no longer contains the cited information^[s]. That second failure is quieter. The reader sees a working page and may never know the evidence has changed.

The legal record shows why this is not only a newsroom problem. The Perma researchers reported reference rot in more than 70% of URLs in the sampled Harvard law journals and in 50% of URLs in U.S. Supreme Court opinions^[s]. When a court opinion, a law review article, or a policy brief points to a web source that later changes, later readers inherit a citation that looks authoritative but may no longer prove what it once proved.

The dead link epidemic is visible inside a major U.S. news institution. Columbia Journalism Review described a Harvard Law School project that examined New York Times links from the launch of the Times website in 1996 through mid-2019^[s]. The researchers found that 25% of deep links were completely inaccessible, with 6% of 2018 links rotted, 43% of 2008 links rotted, and 72% of 1998 links rotted^[s]. They also found that 13% of reachable links in a human-reviewed sample had drifted significantly from the content the Times originally linked^[s].

The archive race

The story is not only disappearance. The Internet Archive said in April 2026 that its analysis of Pew’s dataset found the Wayback Machine had rescued roughly 15% of otherwise dead pages, and that it had archived about 72% of the full dataset it checked^[s]. By October 22, 2025, the Internet Archive said the Wayback Machine had preserved 1 trillion web pages^[s].

Public institutions have also treated the web as part of the historical record. The National Archives says long-term preservation of government website content is critical to public understanding of government and history, and that it began capturing congressional websites at the end of every Congress in 2006^[s]. The International Internet Preservation Consortium says the Library of Congress began its U.S. Elections Web Archive as a pilot project in 2000 and now maintains more than 100 event-based and thematic web archive collections^[s].

What survives becomes history

The dead link epidemic does not erase the early 21st century in a single dramatic collapse. It erodes the connective tissue between claim and evidence. That kind of loss can bias the surviving record. Large institutions, high-traffic pages, archived public documents, and content that somebody thought to preserve may have better odds of surviving. Small local pages, temporary campaign sites, deleted posts, revised government pages, and ordinary personal sites may be easier to lose.

The answer is not nostalgia for paper. Paper records burned, molded, and disappeared too. The lesson is that the web needs archival habits suited to its speed. Citations should point to preserved copies when the original page is evidence. Newsrooms and scholars should treat link preservation as part of publishing, not as cleanup work after the link fails. The dead link epidemic is a reminder that the first draft of history is only useful when future readers can still inspect the sources underneath it.

The dead link epidemic is best understood as a provenance failure, not a convenience failure. A web citation has to identify a resource, deliver the cited content, and preserve enough context for a later reader to test the claim. HTTP links can do the first two jobs for a while. Without capture, versioning, and durable citation practice, they do not reliably do the third.

What the dead link epidemic measures

Researchers use several lenses to measure the dead link epidemic. Pew Research Center sampled just under 1 million webpages from Common Crawl for its decade-scale analysis, then checked whether those pages were still reachable^[s]. That method found 25% of pages from 2013 to 2023 inaccessible by October 2023, with the oldest cohort showing the largest loss: 38% of sampled 2013 pages were unavailable a decade later^[s].

That is link rot. Reference rot is broader. Harvard Law Review’s Perma study defined link rot as a URL no longer serving content and reference rot as a still-working URL whose cited information is gone or changed^[s]. For historical work, reference rot can be more dangerous than a 404 page because it hides the failure inside a page that appears healthy.
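The two failure modes can be told apart mechanically, at least in rough form. The sketch below is a minimal illustration, not the Perma study's method: it assumes the citing author saved the quoted passage at citation time, and the names `classify_citation`, `CitationStatus`, and `normalize` are hypothetical. It works on an already-fetched response, so no network access is needed.

```python
from enum import Enum
from typing import Optional


class CitationStatus(Enum):
    INTACT = "intact"                # page loads and still contains the cited text
    LINK_ROT = "link rot"            # URL no longer serves content
    REFERENCE_ROT = "reference rot"  # page loads, but the cited text is gone or changed


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial reformatting is not flagged."""
    return " ".join(text.split()).lower()


def classify_citation(
    status_code: Optional[int], body: str, cited_passage: str
) -> CitationStatus:
    """Classify one citation from an already-fetched response.

    status_code is None when the request failed outright (DNS error, timeout).
    The substring check is an assumption made for illustration only.
    """
    if status_code is None or status_code >= 400:
        return CitationStatus.LINK_ROT
    if normalize(cited_passage) in normalize(body):
        return CitationStatus.INTACT
    return CitationStatus.REFERENCE_ROT


# A page that still loads but has been replaced by unrelated content.
print(classify_citation(200, "Welcome to our new homepage", "budget rose 4%"))
```

A substring test is a deliberately crude stand-in for human review. It would treat a "page not found" notice served with a 200 status as reference rot, and it would miss a page that keeps the quoted words while changing their meaning.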

In law, that distinction changes the diagnosis. The Perma researchers reported that more than 70% of URLs in three sampled Harvard law journals and 50% of URLs in U.S. Supreme Court opinions suffered reference rot^[s]. In scholarship, a PLOS One study of science, technology, and medicine articles found one in five STM articles suffered reference rot, rising to seven in ten when considering only STM articles that contained web references^[s].

Decay has a shape

The dead link epidemic is not evenly distributed. A URL lifespan study archived at Zenodo examined 27.3 million URLs archived from 1996 to 2021 by the Internet Archive and found that only 35% remained active in 2023^[s]. The same study found root URLs had a half-life of nine years, compared with one year for deep links^[s]. That matters because citations rarely point only to a domain home page. They point to a report, a press release, a docket page, a staff biography, a local notice, or a vanished article.
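To get a feel for what those half-lives imply, here is a minimal sketch that assumes simple exponential decay. That model is an assumption made for illustration, not something the study is claimed to use, and the function name `surviving_fraction` is hypothetical. Only the two half-life figures (nine years for root URLs, one year for deep links) come from the text above.

```python
def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of links still alive after `years`, assuming exponential decay."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


# Compare the two half-lives reported in the study at a few horizons.
for years in (1, 5, 9):
    root = surviving_fraction(years, half_life_years=9)
    deep = surviving_fraction(years, half_life_years=1)
    print(f"after {years} yr: root {root:.1%}, deep {deep:.1%}")
```

Under this assumed model, a deep link with a one-year half-life has roughly a 3% chance of surviving five years, while a root URL with a nine-year half-life would still be more likely alive than dead. Real decay curves need not be exponential, so the numbers are a way to read the half-lives, not a forecast.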

The New York Times study shows the same pattern in journalism. Columbia Journalism Review reported that the Harvard team examined links in Times articles from 1996 through mid-2019, using a dataset provided by the Times^[s]. Among deep links, 25% were completely inaccessible. By publication year, 6% of 2018 links had rotted, compared with 43% of 2008 links and 72% of 1998 links^[s]. In a separate human review, 13% of reachable links had drifted significantly from their original context^[s].

Archiving changes the denominator

The dead link epidemic looks less absolute when web archives are counted, but it does not disappear. Internet Archive researchers said their April 2026 review of Pew’s dataset found the Wayback Machine had archived about 72% of the full dataset, including 16% that were dead on the live web but rescued by an archived copy^[s]. The same article said Turn All References Blue had fixed more than 30 million broken links across hundreds of wikis using InternetArchiveBot, WaybackMedic, and the Wayback Machine^[s].
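Checking whether a dead page has an archived copy can be sketched against the Wayback Machine's public availability endpoint. This is a minimal illustration and not a description of how InternetArchiveBot or WaybackMedic work. The helper names are hypothetical, and the response shape handled here (`archived_snapshots` containing a `closest` entry) reflects the endpoint as commonly documented, so it should be verified before relying on it.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an availability query; timestamp is YYYYMMDD[hhmmss] if given."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Network call: ask the Wayback Machine for a capture of `url`."""
    with urlopen(build_query(url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Passing a timestamp near the citation date matters for historical work: the question is whether the page was captured in the state the author saw, and a later capture may already reflect a revised page.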

Scale helps, but coverage is not total. Internet Archive identified limits including resource constraints, JavaScript-heavy pages, bot blocking, login walls, paywalls, deep web content, and delayed discovery^[s]. That means the archived web is not a perfect mirror of the live web. It is a second historical layer, shaped by crawl policy, site permissions, technical barriers, and chance.

How historians should read the web record

For historians, the dead link epidemic makes web citations artifacts in their own right. A citation is not only a route to a page. It is evidence of what an author thought could support a claim at a particular moment. When the page disappears, the citation becomes a trace of a missing source. When the page changes, it can become more misleading because the outward form of evidence remains.

Government web preservation shows the stakes. NARA says preserving government website content is critical to public understanding of government and history, and it began capturing congressional websites at the end of every Congress in 2006^[s]. The Library of Congress Web Archive, according to the International Internet Preservation Consortium, includes more than 100 event-based and thematic collections, and its U.S. Elections Web Archive began as a pilot project in 2000^[s].

The practical conclusion is strict but simple. A citation to the live web should be treated as incomplete unless the cited state is also preserved. Perma.cc was built for that premise, tying citation to capture so authors and editors can preserve the cited page at publication time^[s]. The web can still serve as the first draft of digital history, but only if historians, librarians, journalists, courts, and publishers preserve the draft before revision, deletion, and domain failure turn it into hearsay.
