Summary: US Copyright Office AI Report Part 3 (Generative AI Training)
A plain-English summary of the U.S. Copyright Office's Part 3 report on generative AI training: what it says about fair use, market harm, and pirated training data, and why courts still cite it a year on.

On May 9, 2025, the United States Copyright Office released the pre-publication version of Part 3 of its landmark report on Copyright and Artificial Intelligence — the single most anticipated document on AI training and fair use published by any US government agency. One year later, Part 3 remains the analytical backbone that courts, regulators, and AI companies cite when arguing about whether training generative AI on copyrighted works is lawful.
If you only have time to read one summary of the US Copyright Office AI report Part 3, this is it. Below is a plain-English breakdown of what the Office actually said, what it stopped short of saying, and why Part 3 still drives the conversation in May 2026.
Quick background: where Part 3 fits in
The Copyright Office's AI initiative, launched in early 2023, is publishing its findings as a multi-part report:
- Part 1 (July 2024) — Digital replicas (deepfakes, voice clones, unauthorized likenesses).
- Part 2 (January 2025) — Copyrightability of AI outputs. Confirmed the human-authorship requirement: purely AI-generated material is not copyrightable, but works with sufficient human creative contribution can be.
- Part 3 (May 2025, pre-publication) — The use of copyrighted works to train generative AI systems.
Part 3 is the hardest and most consequential piece, because it addresses the question at the center of every major AI copyright lawsuit: Is it fair use to copy billions of copyrighted works to train a model?
The Office released the pre-publication version "in response to congressional inquiries and expressions of interest from stakeholders." The agency has said a final version will follow, with no substantive changes expected in the analysis or conclusions.
The headline answer: "it depends," but with real guardrails
Part 3 does not declare training either legal or illegal. It rejects both extremes: the idea that training is categorically fair use, and the idea that it is categorically infringement. Instead, the Office lays out a structured fair use analysis and reaches conclusions that cut against the most aggressive industry positions.
In short: the Office concluded that training a generative AI model on copyrighted works generally involves acts of reproduction that implicate copyright, and that whether those acts are excused as fair use depends heavily on what the model does, where the data came from, and what the outputs compete with.
How the Office analyzed fair use
Part 3 walks through the four statutory fair use factors from Section 107 of the Copyright Act.
Factor 1 — Purpose and character of the use
The Office recognized that training can be transformative when the model learns patterns, relationships, and statistical features rather than reproducing expressive content. But it pushed back hard on the idea that all training is automatically transformative.
Key takeaways:
- Research and analysis uses sit closer to classic transformative use.
- Commercial deployment of models that generate outputs competing with the training data is less likely to be transformative.
- The purpose at the output layer matters, not just at the ingestion layer. A model marketed to produce text, images, or code in the same market as the originals faces a harder Factor 1 argument.
- How the training data was accessed matters. Knowingly using works obtained through pirated sources or circumvented paywalls weighs against fair use, though the Office did not treat it as determinative on its own. This is an especially significant signal for ongoing litigation involving shadow libraries.
Factor 2 — Nature of the copyrighted work
Generative AI systems are typically trained on creative, expressive works — novels, journalism, photography, code, music, film. Because fair use traditionally gives more latitude when the underlying work is factual rather than creative, this factor generally weighs against AI developers who train on expressive corpora.
Factor 3 — Amount and substantiality
Training usually involves copying works in their entirety, often at massive scale. The Office acknowledged that some intermediate copying may be necessary and reasonable in relation to the purpose. But wholesale ingestion of complete works at the scale of modern foundation models generally weighs against the developer under Factor 3, and how strongly depends on whether the purpose justifies copying that much.
Factor 4 — Effect on the market
This is the factor where Part 3 arguably broke new ground.
The Office accepted that training can harm rights holders through more than direct substitution. It specifically recognized market dilution and loss of licensing revenue as cognizable harms:
- If a model can produce outputs that substitute for the original works (for example, AI-generated articles, images, or music in the same style and market), that is a classic Factor 4 harm.
- Even without direct substitution, the emergence of licensing markets for training data means the failure to license can itself be a market harm that disfavors fair use.
- Harm to the broader market for works of a type — even if no single output replaces a specific original — can count.
This reasoning pushes back against the "no identifiable copy comes out, therefore no harm" argument that some AI defendants have made in litigation.
What the Office did (and did not) recommend
Part 3 is analytical, not legislative. But it made several important policy observations.
Licensing markets can and should develop
The Office was notably optimistic that voluntary licensing markets for training data can emerge. Early commercial deals between publishers, image libraries, music catalogs, and AI developers were cited as evidence that the market is maturing without new legislation.
Because of this, the Office suggested it is premature for Congress to impose a compulsory (statutory) licensing regime for AI training. Collective licensing, extended collective licensing, or a compulsory regime might become appropriate later if markets fail, but the Office did not see that failure yet.
Transparency is important, but Part 3 did not mandate it
Part 3 identified transparency about training data as a useful input to functioning licensing markets. However, unlike the EU AI Act — which now requires providers of general-purpose AI models to publish sufficiently detailed summaries of training content — the US Copyright Office stopped short of recommending a specific federal transparency mandate. The ball was left with Congress and ongoing litigation.
Opt-outs are helpful but not sufficient
The Office addressed technical opt-out mechanisms (for example, robots.txt signals for AI crawlers, metadata flags, and dedicated headers). It described them as useful, but noted that making opt-out the default does not change the underlying legal question: whether a use is fair does not turn on whether a website owner knew how to block an AI scraper.
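To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check an opt-out signal before fetching a page. It uses Python's standard-library `urllib.robotparser`. The crawler names (`ExampleAIBot`, `SomeSearchBot`), the robots.txt content, and the URL are hypothetical and chosen only for illustration; real AI crawlers publish their own user-agent strings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/essay.html"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", page))   # blocked crawler
    print(is_allowed(ROBOTS_TXT, "SomeSearchBot", page))  # falls under "*"
```

The check only reports what the site owner has asked for. Nothing in the protocol forces a crawler to run it or to obey the result, which is one reason the signal is useful without settling the legal question.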
Why the release was controversial
The timing of Part 3 created one of the more remarkable stories in modern copyright policy.
The pre-publication draft went out on Friday, May 9, 2025. The following day, reports emerged that the Trump administration had fired Register of Copyrights Shira Perlmutter, who had led the Office's AI initiative. Whether the firing was connected to the contents of Part 3 has been widely debated; what is clear is that the draft's conclusions are noticeably less favorable to unlimited AI training than some in the industry had hoped.
The pre-publication version itself states that a final Part 3 will be published without substantive changes expected in its analysis or conclusions. That commitment mattered, because it signaled that the pre-publication version can be treated as the Office's settled view for now.
What Part 3 means for each audience
For AI developers
- Training on copyrighted works is not per se infringement, but it is not per se fair use either. Expect a work-by-work, market-by-market analysis in court.
- The source of training data matters. Knowingly using pirated copies weighs against fair use. Using licensed datasets removes the fair use question for the works the license covers.
- Model outputs that directly compete in the markets of training works raise serious Factor 4 risk, even without verbatim reproduction.
- Invest in licensing pipelines and documentation. Courts reading Part 3 will expect AI companies to have tried.
For creators and rights holders
- Part 3 validates the legal theory that AI training implicates copyright and that lost licensing opportunities count as market harm.
- It supports the position that works obtained through pirated or unauthorized channels are the weakest ground for fair use defenses.
- It does not, however, give rights holders a clean categorical win. The Office clearly contemplated that some training uses will qualify as fair use.
For businesses deploying AI
- Part 3 raises the stakes for indemnification language in vendor contracts. Ask your AI vendors where their training data came from and what licenses they hold.
- Pair deployment with internal policies on AI-generated output (see our guide on drafting a corporate policy for AI-generated content).
- Expect customer and partner due diligence questions about training-data provenance.
For policymakers
- The Office does not currently favor new compulsory licensing.
- Transparency and opt-out tooling are the areas most likely to see near-term legislative movement.
- International alignment, especially with the EU AI Act's training-data transparency regime, remains an open question. For context on the EU side, see Breaking down the EU AI Act's copyright transparency requirements.
How Part 3 is showing up in 2026 litigation
One year after release, Part 3 is cited routinely in motion practice and at oral argument. A few patterns stand out:
- Plaintiffs lean on Part 3's treatment of market dilution (Factor 4) and its view that pirated sources weigh against fair use (Factor 1).
- Defendants cite the Office's rejection of categorical infringement and its acknowledgment that training can be transformative.
- Courts have generally treated Part 3 as persuasive but not binding — useful analytical framing rather than a rulebook.
The NVIDIA shadow-library ruling and the continuing Meta AI training litigation both show courts taking seriously the argument that the source of the training data is material to fair use — a point that tracks Part 3's reasoning closely. For a case-by-case view of where the litigation stands, see the AI copyright lawsuit tracker.
Key takeaways
- Part 3 is the first comprehensive US government analysis of whether training generative AI on copyrighted works is fair use.
- The Office's answer is structured, not absolute: fair use depends on purpose, data sources, outputs, and market effects.
- Market dilution and lost licensing revenue count as Factor 4 harms.
- Knowingly using pirated training data weighs against fair use, though it is not determinative on its own.
- The Office favors voluntary licensing markets over new compulsory licensing, for now.
- Part 3 is a pre-publication draft, but the Office has stated that it expects no substantive changes to the analysis or conclusions in the final version.
- Courts, litigants, and regulators continue to use Part 3 as the default analytical framework one year on.
Related reading
- Is AI Training Fair Use? How Global Copyright Laws Are Evolving in 2026
- The Ultimate 2026 AI Copyright Lawsuit Tracker
- AI Copyright Compliance: The 2026 Survival Guide for Businesses
- Breaking Down the EU AI Act Copyright Transparency Requirements
- Primary source: U.S. Copyright Office — Copyright and Artificial Intelligence
This article summarizes the U.S. Copyright Office's pre-publication report and related legal developments as of May 2026. It is not legal advice. For specific questions about your situation, consult a qualified attorney.