More than 340 local news sites across the United States are now limiting the Internet Archive’s ability to access and preserve their stories.[s] This is not a minor technical dispute. It is the public record of local journalism eroding in real time, driven by a panicked response to AI scraping that may do more long-term damage than the scraping itself.
The position is straightforward: news organizations blocking the Internet Archive are trading their own historical record for the illusion of AI protection. They are accelerating the very erasure they should be fighting. Independent journalists who grasp this are building alternative infrastructure, and they will be the ones whose work survives the next decade.
The Journalism Archive Preservation Crisis
In January 2026, Nieman Lab reported that 241 news sites from nine countries explicitly disallow at least one Internet Archive crawling bot.[s] By May, Nieman Lab’s updated sample had grown to 382 sites that disallow such a bot, 342 of them local. Many of those sites are owned by five of the seven largest local news publishers: USA Today Co., McClatchy, Advance Local, MediaNews Group, and Tribune Publishing. The latter two are subsidiaries of Alden Global Capital, the hedge fund infamous for asset-stripping newsrooms.
The stated rationale is AI scraping. The New York Times said it was blocking the Internet Archive’s bot because the Wayback Machine provides unauthorized access to Times content, including by AI companies.[s]
This reasoning has a fatal flaw. As computer scientist Michael Nelson put it: “Common Crawl and Internet Archive are widely considered to be the ‘good guys’ and are used by ‘the bad guys’ like OpenAI. In everyone’s aversion to not be controlled by LLMs, I think the good guys are collateral damage.”[s]
The collateral damage is the preservation of the journalistic record itself. When a local paper shuts down or switches content management systems, the Wayback Machine is often a crucial surviving record. In 2024, thousands of articles vanished from Western Massachusetts papers during a CMS migration. When The Hook, a Charlottesville weekly, closed in 2012, its archived site went offline a decade later, erasing over 22,000 stories.[s]
“Blocking the Internet Archive’s web crawlers threatens one of the most effective ways that we capture and store news content for the long term,” said Edward McCain, a journalism librarian at the University of Missouri. “In the present we may have some workarounds, but in the long run, it weakens a vital link in primary source materials that we need to understand where we’ve been and where we want to go.”[s]
History Repeats: Archives Destroyed Gradually Through Neglect
The pattern is familiar. Like the Library of Alexandria, whose collections were lost gradually through neglect rather than in a single dramatic fire, journalism’s digital record is eroding through a thousand small decisions. Each outlet that blocks the Wayback Machine, each CMS migration that breaks old URLs, each closure without an archival handoff chips away at the public record.
Internet Archive founder Brewster Kahle warned that “if publishers limit libraries, like the Internet Archive, then the public will have less access to the historical record.”[s]
NYU professor Meredith Broussard noted the deeper problem: “Every news organization, especially local news organizations, generally launch thinking, ‘we’re going to put stuff on the internet and it’s going to be there forever,’ and that’s not true. Anybody who told you the internet is forever lied.”[s]
The Decentralized Response
While corporate chains limit archive access, independent journalists are building alternative infrastructure. In its 2025 impact material, the Filecoin Foundation for the Decentralized Web (FFDW) says it partnered with Fasila to preserve critical journalism by supporting more than 20 journalists and archiving important reporting and associated digital records. It says those materials are stored on the Filecoin network and made accessible via IPFS through Fasila’s Alive-In archive.[s]
This is not a marginal experiment. The project “presents the archived stories on a dedicated, metadata-rich platform that supports discovery and use by journalists, researchers, educators, and filmmakers, demonstrating a replicable model for safeguarding cultural memory through decentralized storage.”[s]
In the same 2025 material, FFDW said its work with the Freedom of the Press Foundation supported infrastructure, user experience, and security enhancements for decentralized tools used by journalists around the globe.[s] That collaboration includes SecureDrop, the open-source submission system used by newsrooms for secure document exchange and communication between journalists and sources.
The logic is simple: if platforms can deplatform you, and if archives can exclude you, then a durable way to preserve your journalism is to run infrastructure you control.
Platform Independence as Survival Strategy
The broader trend points toward self-hosted infrastructure. “Newsrooms will reclaim control over technology,” predicted Ben Werdmuller of ProPublica. “Collaboration and independent, mission-aligned open-source teams will create tools that serve core newsroom needs, including secure communication, privacy-preserving analytics, and sustainable distribution.”[s]
LaSharah S. Bunting of The 19th argued that “the strongest newsrooms in 2026 won’t be the ones chasing every platform shift or business trend, but the ones prepared to survive multiple futures at once.”[s]
For individual journalists, this means reconsidering platform dependency. One practitioner warned that as Substack became more popular, it turned into a fuller social platform and began locking users into a closed environment that is increasingly hard to leave.[s]
The alternative is open-source, self-controlled infrastructure. The same practitioner cited Ghost’s advantages: it is open-source software, it gives users control over their data, and it supports the Fediverse.[s] WordPress, WriteFreely, and other self-hosted options offer similar independence.
The Stakes Are Political
This is not merely a technical or business problem. The context is major media consolidation. The FCC approved the Nexstar-Tegna merger by waiving the rule that prohibits any chain from reaching more than 39 percent of U.S. households; the combined entity would reach 80 percent.[s]
“The merger, if it survives legal challenges, would further consolidate broadcast and cable news in the hands of Trump-friendly owners with right wing leanings,” noted Investigative Post.[s]
Journalist and author Nora Benavidez, quoted by Investigative Post, argued that the biggest billionaire-owned media companies curry favor with Trump to protect financial and business interests that can clash with journalistic missions.[s]
When ownership capture and platform dependency combine with archive blocking, the result is journalism that can be erased, redirected, or silenced at will. Reporters who recognize this, like colleagues working in countries where press freedom has already collapsed and journalists are killed for their reporting, understand that infrastructure is not neutral.
The Counterargument
The publishers blocking the Internet Archive are not acting without reason. AI companies have aggressively scraped copyrighted content. Gannett CEO Mike Reed said OpenAI sent about 70 million bot requests to Gannett’s local and USA Today platforms in September 2025.[s] The financial pressure on news organizations is real, and licensing deals represent potential revenue.
The Atlantic’s CEO Nick Thompson explained the logic: “Because of the damages that can be done when you let all your content be scraped, because of all the leverage you lose, there will be worthy products that you previously gave your data to and now you can’t.”[s]
This is understandable. But it is also shortsighted. The leverage being preserved is leverage over AI licensing negotiations. The cost is the public record itself. Publishers are optimizing for a revenue stream that may or may not materialize while undermining the archive their readers depend on.
Large government data breaches, which have exposed millions of citizens to identity theft and surveillance, are a reminder that centralized digital infrastructure can fail. The lesson extends to journalism: relying on any single preservation mechanism is a risk.
What Should Change
First, news organizations should separate AI licensing negotiations from archive access. Blocking the Internet Archive does not stop AI crawlers from hitting publishers’ sites directly; it only weakens the public record. These are different problems requiring different solutions.
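One way to picture the separation at the crawler level is a robots.txt that names AI-training crawlers individually instead of shutting out the archive. The sketch below is hypothetical: the file contents, the example URL, and the helper `crawl_policy` are illustrative, and the user-agent tokens (GPTBot, Google-Extended, and archive.org_bot, assumed here to be the Internet Archive’s crawler) should be checked against each operator’s current documentation. It uses Python’s standard-library `urllib.robotparser` to report which agents a compliant crawler would treat as blocked. robots.txt is a voluntary convention, so rules like these only bind crawlers that choose to honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a local news site. The user-agent tokens are
# assumptions to check against each operator's current documentation:
#   GPTBot, Google-Extended -> AI-training crawlers, disallowed
#   archive.org_bot         -> assumed Internet Archive crawler, allowed
# Any agent not named falls through to the "*" group and is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: archive.org_bot
Allow: /

User-agent: *
Allow: /
"""


def crawl_policy(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether robots.txt lets it fetch `url`.

    This only reports what a compliant crawler would do; robots.txt
    cannot technically stop a crawler that ignores it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Placeholder story URL for illustration only.
    story = "https://news.example.com/2024/05/city-council-vote"
    agents = ["GPTBot", "Google-Extended", "archive.org_bot", "SomeOtherBot"]
    for agent, allowed in crawl_policy(ROBOTS_TXT, agents, story).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Run as a script, it reports the two AI-training tokens as blocked and both the archive crawler and the unnamed bot as allowed. Naming crawlers one by one is more work than a blanket block, but it keeps the licensing question and the preservation question separate.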
Second, independent journalists should build infrastructure they control. This means self-hosted websites, email lists they own, archives on decentralized storage. Ghost, WordPress, IPFS: the tools exist.
Third, funders and foundations should support the preservation of news archives directly. In December 2025, the Internet Archive partnered with the Poynter Institute and Investigative Reporters and Editors on an initiative that aims to train 300 newsrooms in digital preservation and in using the Internet Archive’s services by the end of 2027.[s] It needs to scale.
Fourth, newsrooms should treat their archives as institutional assets. The Reuters Institute documented how The Economist, Charlie Hebdo, and Nigeria’s Archivi.ng are reviving their archives as editorial tools. “The first thing newsrooms can start by doing is making it ridiculously easy for their own journalists to discover their internal archives,” said Fu’ad Lawal of Archivi.ng.[s]
Charlie Hebdo uses its archives to onboard new journalists, connecting them to the publication’s history and editorial identity. “Many of our readers have followed the paper for decades. They often know the paper better than we do. So when we receive criticism, it helps to understand where it’s coming from, historically.”[s]
The Opening
There is a structural opportunity in the chaos. Social media’s credibility collapse creates demand for verified, community-rooted journalism. “In 2026, social media platforms will face a significant supply problem: The supply is increasingly fake, and the fake is increasingly indistinguishable from the real,” wrote Suffolk University professor Jonas Kaiser.[s]
Kaiser argued that as social media embraces inauthenticity, community-focused journalism will find both an audience and a renewed democratic purpose.[s]
This is the moment for archive preservation to become a priority for independent journalism, not an afterthought. The journalists who control their own infrastructure, who maintain their own archives, who build direct relationships with their communities: they will be the ones still publishing when the platforms have moved on.
When the record disappears, the ability to hold power accountable disappears with it.
The choice is not between protecting content from AI and preserving it for history. The choice is whether to own your infrastructure or rent it from entities that will eventually optimize against your interests. Independent journalism has always required independence. Now it requires independent archives.