AI copyright law is broken. Not in the slow, dignified way that legal frameworks typically age into obsolescence, but in the spectacular, contradictory way that happens when a centuries-old system collides with technology it was never designed to anticipate. Over 70 copyright infringement lawsuits have been filed against AI companies as of early 2026[s], more than double the count at the end of 2024. The largest copyright settlement in U.S. history, $1.5 billion, was paid by a single AI company for downloading pirated books[s]. And three federal judges, ruling on related questions about AI training and copyright in the span of a few months, reached conclusions that pointed in entirely different directions[s].
The intellectual property commons, that shared legal understanding of what creators own, what the public may use, and where fair use begins and ends, is dead. What replaces it will determine whether the next generation of artists, journalists, and researchers gets compensated for the raw material that powers the most profitable technology sector on earth, or whether their work simply becomes free fuel for someone else’s billion-dollar model.
AI Copyright Law in the Courtroom: Three Rulings, Three Realities
The clearest proof that AI copyright law has no coherent doctrine came in June 2025, when two federal courts in Northern California issued summary judgment opinions within 48 hours of each other, and a third in Delaware had already set a contradictory baseline months earlier.
In Bartz v. Anthropic, Judge William Alsup ruled that training a large language model on copyrighted books was “transformative, spectacularly so”[s]. He compared the process to human reading and learning: the purpose of ingesting text to build statistical patterns, he reasoned, is fundamentally different from the authors’ purpose in writing books for entertainment or education. Fair use, granted. But Alsup drew a sharp line at how Anthropic obtained the books. Downloading millions of pirated copies from shadow libraries like Library Genesis was not fair use, regardless of the intended training purpose. That distinction drove the $1.5 billion settlement, covering approximately 482,460 pirated works at roughly $3,000 per book[s].
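As a quick sanity check, the per-work figure and the headline total are consistent (this restates only the numbers reported above):

$$
482{,}460 \text{ works} \times \$3{,}000 \text{ per work} \approx \$1.45 \text{ billion},
$$

which rounds to the reported $1.5 billion settlement.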
Two days later, in Kadrey v. Meta, Judge Vince Chhabria reached a similar conclusion on training but disagreed on the piracy question. He refused to treat the shadow-library downloads as a separate act, analyzing Meta’s copying and training as a single instance of reproduction under fair use[s]. Yet Chhabria’s opinion came with an important warning: the fourth fair use factor, market harm, could swing against AI companies in future cases as licensing markets mature. The ruling was a win for Meta, but a narrow one.
Meanwhile, in Thomson Reuters v. ROSS Intelligence, Judge Stephanos Bibas had already granted summary judgment to Thomson Reuters in February 2025, finding that ROSS’s use of Westlaw headnotes (short editorial summaries of the legal points in a published opinion) to train a competing legal research tool was not fair use as a matter of law[s]. The critical difference: ROSS built a tool that directly competed with the product it copied from. When AI training produces a market substitute rather than something genuinely new, the fair use defense collapses.
Three rulings. Training is transformative, except when the output competes directly. Piracy is a separate offense, except when it is not. Market harm barely matters today, except it might matter enormously tomorrow. This is not a legal framework. It is three judges improvising with a statute written before anyone imagined a machine that could read every book ever published in an afternoon.
The U.S. Copyright Office Weighs In
The U.S. Copyright Office released Part 3 of its comprehensive AI policy report on May 9, 2025, the most detailed articulation yet of how AI copyright law applies to generative model training[s]. The Office did not mince words. Compiling a training dataset using copyrighted works “clearly implicates the right of reproduction.” Fair use cannot be assessed by looking at training alone; courts must evaluate what those trained models actually produce in the field[s].
The Office was particularly blunt about market dilution. AI systems generate content at a speed and scale that pose “a serious risk of diluting markets for works of the same kind as in their training data.” Thousands of AI-generated romance novels flooding the market mean fewer sales for the human authors whose work taught the model to write romance in the first place. If a licensing market exists, or could plausibly develop, for training on copyrighted works, then bypassing it cuts directly against a fair use defense.
The report also took a hard line on retrieval-augmented generation (RAG) systems, which pull real-time content from the web during output generation. The Copyright Office treats RAG as categorically different from conventional training: both the initial unauthorized reproduction and the later output of that material are potential infringements. This matters for companies like Perplexity AI, which now faces multiple lawsuits from news publishers including the New York Times and the Chicago Tribune[s].
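To make the distinction concrete, here is a minimal sketch of how a RAG pipeline works; the function names are hypothetical placeholders, not any real product’s API. Unlike conventional training, the copying happens at answer time, and the copied text can resurface in the output:

```python
# Hypothetical RAG sketch. No real retrieval service or model API is used;
# retrieve() and generate() are stand-ins to show where copying occurs.

def retrieve(query: str) -> list[str]:
    """Stand-in for live web retrieval; a real system would fetch and
    store passages from news sites or other third-party pages."""
    return [
        "Excerpt from a publisher's article relevant to the query...",
    ]

def generate(prompt: str) -> str:
    """Stand-in for a language model call."""
    return "Answer that may quote or closely paraphrase the excerpts."

def answer(query: str) -> str:
    passages = retrieve(query)          # first reproduction: copying at retrieval time
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)             # second risk: retrieved text can surface in the output

if __name__ == "__main__":
    print(answer("What did the court rule in Bartz v. Anthropic?"))
```

That second step is what sets RAG apart in the Copyright Office’s analysis: the system does not merely learn statistical patterns from a work, it can redistribute portions of the work itself.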
Europe Chooses Transparency, America Chooses Litigation
While U.S. AI copyright law develops through expensive, contradictory case law, the European Union opted for regulation. Starting August 2, 2025, every provider of a general-purpose AI model must implement a copyright policy and publish a “sufficiently detailed summary” of its training data using a mandatory template issued by the European AI Office[s]. From 2026, developers must check whether a data source carries a copyright reservation and exclude or license that content before using it[s].
The enforcement mechanism has teeth. The AI Office can fine non-compliant providers up to 3% of annual global turnover or EUR 15 million, whichever is higher, with enforcement powers fully operational by August 2026[s]. The EU’s approach represents a philosophical choice: creators have the right to opt out, companies must prove compliance before training, and transparency is the default. It is the inverse of the American model, where companies train first and argue fair use later.
The gap between these two systems creates a legal patchwork that global AI companies must navigate simultaneously. A training pipeline acceptable in New York may be illegal in Brussels. Content scraped lawfully in one jurisdiction becomes a liability in another.
Japan and the Outlier Model
Japan stands at the opposite end of the spectrum. Article 30-4 of its Copyright Act, amended in 2018, allows copyrighted works to be used for machine learning without prior authorization, provided the use does not involve enjoying the expressive content itself and does not “unreasonably prejudice the legitimate interests” of copyright owners[s]. Singapore adopted a similar framework. These permissive regimes make Tokyo and Singapore attractive for AI development precisely because they eliminate the AI copyright law uncertainty that plagues the U.S. and EU.
Japan’s permissiveness is not unlimited. Under pressure from the domestic manga and anime industries, the country’s Intellectual Property Strategic Program has added caveats: if a model is predominantly trained on a specific artist’s style to create a direct market substitute, the copyright exception may not apply, exposing developers to infringement liability[s]. It is a targeted exception, but it signals that even the most innovation-friendly jurisdictions recognize that AI copyright law must eventually account for the economic displacement of working creators.
The Counterargument: Innovation Cannot Wait for Licensing
The strongest argument for broad AI training rights is practical: modern language models require billions of tokens to achieve competence. Negotiating individual licenses for every copyrighted work in a training corpus would be impossibly slow and prohibitively expensive. The companies that built the current generation of AI systems did so because they could access the entire open web. Restrict that access, and you do not get better-compensated authors; you get fewer, worse AI models concentrated in the hands of companies rich enough to afford licensing deals.
There is genuine force to this argument. The music industry’s early response to digital piracy, suing individual downloaders while resisting streaming, delayed legitimate innovation by a decade. A similarly heavy-handed approach to AI copyright enforcement could push development to jurisdictions with weaker protections, enriching nobody.
But the analogy has limits. Spotify pays artists (however poorly). The AI training pipeline, as it existed through 2024, paid nobody. The Anthropic settlement did not emerge from a licensing negotiation; it emerged from a company downloading half a million pirated books and getting caught. The settlements and licensing deals that followed in 2025, including Universal Music Group’s agreement with Udio and Warner Music Group’s deal with Suno[s], happened only after litigation made the cost of not licensing higher than the cost of paying. Copyright enforcement, in other words, is what created the licensing market that innovation advocates now say should replace copyright enforcement.
What Has to Change
The current AI copyright law framework is not merely outdated; it is incoherent. Fair use, a doctrine designed for cases involving a handful of works and identifiable transformative purposes, cannot scale to a technology that ingests the entire written output of human civilization to produce a general-purpose prediction engine. Four observations about what must follow:
First, compulsory licensing is coming. The alternative, asking every judge in every jurisdiction to reinvent fair use doctrine from first principles for every new model architecture, is producing contradictions faster than appellate courts can resolve them. A statutory license for AI training, with mandatory compensation and transparent reporting, would give companies legal certainty and creators guaranteed income.
Second, transparency requirements will spread. The EU’s mandatory training data disclosure requirement is the template that other jurisdictions will copy. The 20 million ChatGPT logs that a federal judge ordered OpenAI to hand over in January 2026[s] demonstrate that courts are willing to look inside the black box. Companies that cannot account for what they trained on will find themselves in the same position as Anthropic: writing very large checks.
Third, the human authorship requirement is settled but insufficient. The U.S. Supreme Court’s March 2026 denial of certiorari in Thaler v. Perlmutter left standing the rule that a work must have a human author to be copyrightable[s]. But that outcome says nothing about the millions of works that are partially AI-assisted, which is where the real boundary disputes will occur.
Fourth, and most critically, the global fragmentation of AI copyright law is itself a policy failure. A television production team using AI-generated assets might find materials acceptable in Japan but infringing in the EU[s]. Without international coordination, the default outcome is a race to the bottom, where AI training migrates to the jurisdiction with the weakest protections, and creators everywhere lose.
The intellectual property commons is not coming back. The question is whether what replaces it will be a coherent, enforceable system that compensates the people whose work makes AI possible, or an incoherent patchwork where the only winners are the companies that can afford the best lawyers.