Analysis May 15, 2026 10 min read

Summary: US Copyright Office AI Report Part 3 (Generative AI Training)

A plain-English summary of the U.S. Copyright Office's Part 3 report on generative AI training: what it says about fair use, market harm, and pirated training data, and why courts still cite it a year on.

On May 9, 2025, the United States Copyright Office released the pre-publication version of Part 3 of its landmark report on Copyright and Artificial Intelligence — the single most anticipated document on AI training and fair use published by any US government agency. One year later, Part 3 remains the analytical backbone that courts, regulators, and AI companies cite when arguing about whether training generative AI on copyrighted works is lawful.

If you only have time to read one summary of the US Copyright Office AI report Part 3, this is it. Below is a plain-English breakdown of what the Office actually said, what it stopped short of saying, and why Part 3 still drives the conversation in May 2026.

Quick background: where Part 3 fits in

The Copyright Office's AI initiative, launched in early 2023, is publishing its findings as a multi-part report:

Part 1 (July 2024) — Digital replicas (deepfakes, voice clones, unauthorized likenesses).
Part 2 (January 2025) — Copyrightability of AI outputs. Confirmed the human-authorship requirement: purely AI-generated material is not copyrightable, but works with sufficient human creative contribution can be.
Part 3 (May 2025, pre-publication) — The use of copyrighted works to train generative AI systems.

Part 3 is the hardest and most consequential piece, because it addresses the question at the center of every major AI copyright lawsuit: Is it fair use to copy billions of copyrighted works to train a model?

The Office released the pre-publication version "in response to congressional inquiries and expressions of interest from stakeholders." It said a final version would follow, with no substantive changes expected in the analysis or conclusions.

The headline answer: "it depends," but with real guardrails

Part 3 does not declare training either legal or illegal. It rejects both extremes: the idea that training is categorically fair use, and the idea that it is categorically infringement. Instead, the Office lays out a structured fair use analysis and reaches conclusions that cut against the most aggressive industry positions.

In short: the Office concluded that training a generative AI model on copyrighted works generally involves acts of reproduction that implicate copyright, and that whether those acts are excused as fair use depends heavily on what the model does, where the data came from, and what the outputs compete with.

How the Office analyzed fair use

Part 3 walks through the four statutory fair use factors from Section 107 of the Copyright Act.

Factor 1 — Purpose and character of the use

The Office recognized that training can be transformative when the model learns patterns, relationships, and statistical features rather than reproducing expressive content. But it pushed back hard on the idea that all training is automatically transformative.

Key takeaways:

Research and analysis uses sit closer to classic transformative use.
Commercial deployment of models that generate outputs competing with the training data is less likely to be transformative.
The purpose at the output layer matters, not just at the ingestion layer. A model marketed to produce text, images, or code in the same market as the originals faces a harder Factor 1 argument.
Access to the training data matters. The knowing use of works obtained from pirated sources or through circumvented paywalls weighs against fair use, though the Office did not treat it as determinative on its own. That is an especially significant signal for ongoing litigation involving shadow libraries.

Factor 2 — Nature of the copyrighted work

Generative AI systems are typically trained on creative, expressive works — novels, journalism, photography, code, music, film. Because fair use traditionally gives more latitude when the underlying work is factual rather than creative, this factor generally weighs against AI developers who train on expressive corpora.

Factor 3 — Amount and substantiality

Training usually involves copying works in their entirety, often at massive scale. The Office acknowledged that some intermediate copying may be necessary and reasonable in relation to the purpose. But wholesale ingestion of complete works at the scale of modern foundation models still tends to weigh against fair use under Factor 3.

Factor 4 — Effect on the market

This is the factor where Part 3 arguably broke new ground.

The Office accepted that training can harm rights holders through more than direct substitution. It specifically recognized market dilution and loss of licensing revenue as cognizable harms:

If a model can produce outputs that substitute for the original works (for example, AI-generated articles, images, or music in the same style and market), that is a classic Factor 4 harm.
Even without direct substitution, the emergence of licensing markets for training data means the failure to license can itself be a market harm that disfavors fair use.
Harm to the broader market for works of a type — even if no single output replaces a specific original — can count.

This reasoning pushes back against the "no identifiable copy comes out, therefore no harm" argument that some AI defendants have made in litigation.

What the Office did (and did not) recommend

Part 3 is analytical, not legislative. But it made several important policy observations.

Licensing markets can and should develop

The Office was notably optimistic that voluntary licensing markets for training data can emerge. Early commercial deals between publishers, image libraries, music catalogs, and AI developers were cited as evidence that the market is maturing without new legislation.

Because of this, the Office suggested it is premature for Congress to impose compulsory licensing or statutory licenses for AI training. Collective licensing, extended collective licensing, or a compulsory regime might become appropriate later if markets fail — but the Office did not see that failure yet.

Transparency is important, but Part 3 did not mandate it

Part 3 identified transparency about training data as a useful input to functioning licensing markets. However, unlike the EU AI Act — which now requires providers of general-purpose AI models to publish sufficiently detailed summaries of training content — the US Copyright Office stopped short of recommending a specific federal transparency mandate. The ball was left with Congress and ongoing litigation.

Opt-outs are helpful but not sufficient

The Office addressed technical opt-out mechanisms (for example, robots.txt signals for AI crawlers, metadata flags, and dedicated headers). It described them as useful, but noted that making opt-out the default does not change the underlying legal question. Whether a use is fair use does not turn on whether a website owner knew how to block an AI scraper.
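For readers unfamiliar with how a robots.txt opt-out works in practice, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The crawler name `ExampleAIBot`, the robots.txt content, and the URL are hypothetical and chosen only for illustration; nothing here comes from the report itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print("SomeBrowser allowed:", is_allowed(ROBOTS_TXT, "SomeBrowser", page))
```

A check like this only tells a crawler what the site owner has asked for. Consistent with the point above, honoring or ignoring the signal is a separate matter from whether the underlying copying qualifies as fair use.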

Why the release was controversial

The timing of Part 3 created one of the more remarkable stories in modern copyright policy.

The pre-publication draft went out on Friday, May 9, 2025. The following day, reports emerged that the Trump administration had fired Register of Copyrights Shira Perlmutter, who had led the Office's AI initiative. Whether the firing was connected to the contents of Part 3 has been widely debated; what is clear is that the draft's conclusions are noticeably less favorable to unlimited AI training than some in the industry had hoped.

The Office later confirmed that a final Part 3 will be published without substantive changes to its analysis or conclusions. That commitment mattered, because it signaled that the pre-publication version can be treated as the Office's settled view for now.

What Part 3 means for each audience

For AI developers

Training on copyrighted works is not per se infringement, but it is not per se fair use either. Expect a work-by-work, market-by-market analysis in court.
The source of training data matters. Knowingly using pirated copies weighs against fair use, even if it is not determinative. Using licensed datasets strengthens the Factor 1 argument.
Model outputs that directly compete in the markets of training works raise serious Factor 4 risk, even without verbatim reproduction.
Invest in licensing pipelines and documentation. Courts reading Part 3 will expect AI companies to have tried.

For creators and rights holders

Part 3 validates the legal theory that AI training implicates copyright and that lost licensing opportunities count as market harm.
It supports the position that works obtained through pirated or unauthorized channels are the weakest ground for fair use defenses.
It does not, however, give rights holders a clean categorical win. The Office clearly contemplated that some training uses will qualify as fair use.

For businesses deploying AI

Part 3 raises the stakes for indemnification language in vendor contracts. Ask your AI vendors where their training data came from and what licenses they hold.
Pair deployment with internal policies on AI-generated output (see our guide on drafting a corporate policy for AI-generated content).
Expect customer and partner due diligence questions about training-data provenance.

For policymakers

The Office does not currently favor new compulsory licensing.
Transparency and opt-out tooling are the areas most likely to see near-term legislative movement.
International alignment, especially with the EU AI Act's training-data transparency regime, remains an open question. For context on the EU side, see Breaking down the EU AI Act's copyright transparency requirements.

How Part 3 is showing up in 2026 litigation

One year after release, Part 3 is cited routinely in motion practice and at oral argument. A few patterns stand out:

Plaintiffs lean on Part 3's treatment of market dilution and of pirated sources (the Factor 1 access analysis and the Factor 4 harm analysis).
Defendants cite the Office's rejection of categorical infringement and its acknowledgment that training can be transformative.
Courts have generally treated Part 3 as persuasive but not binding — useful analytical framing rather than a rulebook.

The NVIDIA shadow-library ruling and the continuing Meta AI training litigation both show courts taking seriously the argument that the source of the training data is material to fair use — a point that tracks Part 3's reasoning closely. For a case-by-case view of where the litigation stands, see the AI copyright lawsuit tracker.

Key takeaways

Part 3 is the first comprehensive US government analysis of whether training generative AI on copyrighted works is fair use.
The Office's answer is structured, not absolute: fair use depends on purpose, data sources, outputs, and market effects.
Market dilution and lost licensing revenue count as Factor 4 harms.
Knowing use of pirated training data weighs against fair use, though it is not determinative on its own.
The Office favors voluntary licensing markets over new compulsory licensing, for now.
Part 3 is a pre-publication draft, but the Office has stated that it expects no substantive changes to the analysis or conclusions in the final version.
Courts, litigants, and regulators continue to use Part 3 as the default analytical framework one year on.
