Copyright Office Part 3: AI Training Fair Use

The Copyright Office concludes some AI training goes beyond fair use.

On May 9, 2025, the U.S. Copyright Office released a pre-publication version of Part 3 of its report on Copyright and Artificial Intelligence. It is arguably the most significant U.S. government analysis of AI and copyright to date, and it addresses one question directly: whether using copyrighted works to train generative AI systems qualifies as fair use.

The Bottom Line

Some uses of copyrighted works for AI training will qualify as fair use, and some will not.

The critical factor: whether the AI is trained to generate "expressive content that competes with" the original copyrighted works.

Key Findings

1. No Blanket Fair Use for AI Training

The Copyright Office explicitly rejected the argument that all AI training is automatically fair use. That undercuts AI companies that had argued training is inherently transformative.

2. The Competition Test

The report set out a guiding principle, though one that is advisory and does not bind courts: AI developers who use copyrighted works to train models that generate expressive content competing with the original works go beyond fair use.

In plain English: If you train an AI on romance novels and it generates romance novels, that is likely NOT fair use. If you train an AI on medical papers to build a diagnostic tool, that is more likely fair use.

3. The Andy Warhol Factor

The Copyright Office relied heavily on the Supreme Court's 2023 decision in Andy Warhol Foundation v. Goldsmith, which raised the bar for transformative use claims. Under Warhol, even significant transformation may not qualify as fair use if the new work serves the same market purpose as the original.

The Four Factors Analysis

Factor 1: Purpose and Character

  • Favors fair use: Non-commercial research, building non-expressive tools (search engines, medical diagnostics)
  • Against fair use: Commercial AI that generates content in the same category as training data
  • Key question: Does the AI serve the same purpose as the original works?

Factor 2: Nature of the Copyrighted Work

  • Favors fair use: Training on factual works, data, public records
  • Against fair use: Training on highly creative works (fiction, art, music, poetry)
  • Key question: How creative and original are the training materials?

Factor 3: Amount and Substantiality

  • Reality: AI training typically requires copying entire works
  • Nuance: Courts have allowed full copying for transformative purposes (Google Books)
  • Key question: Was copying the entirety necessary for the purpose?

Factor 4: Market Effect (Most Important)

  • Strongly against fair use: When AI outputs substitute for or compete with originals
  • Favors fair use: When AI serves a completely different market
  • Key question: Does the AI reduce demand for the original works?

What This Means in Practice

Likely Fair Use

  • Training AI for scientific research tools
  • Building search and retrieval systems
  • Creating translation or accessibility tools
  • Training AI for non-expressive analysis (sentiment analysis, classification)

Likely NOT Fair Use

  • Training image generators on artists' portfolios
  • Training writing AI on copyrighted books to generate similar books
  • Training music AI on copyrighted songs to generate competing music
  • Training code AI on proprietary codebases

Gray Area

  • Training general-purpose AI (like ChatGPT) on diverse copyrighted content
  • Using copyrighted works for AI that generates content in different formats
  • Training on publicly available but copyrighted web content

Impact on Pending Lawsuits

Although the report is not binding on courts, its reasoning may strengthen the position of plaintiffs in several major cases:

  • NYT v. OpenAI: The Times alleges that ChatGPT generates news-like content that competes with NYT journalism
  • Getty v. Stability AI: Getty alleges that Stable Diffusion generates images competing with Getty's library
  • Kadrey v. Meta (the authors' suit): The authors allege that Llama generates text competing with their books

Recommendations from the Copyright Office

The report stopped short of recommending new legislation but suggested:

1. Transparency: AI companies should disclose what copyrighted works they use for training

2. Licensing markets: The development of licensing frameworks for AI training data

3. Technical measures: Respect for robots.txt and opt-out mechanisms

4. Case-by-case analysis: Courts should evaluate each situation individually

What Creators Should Do

1. Register your copyrights — Required before filing suit

2. Document your works — Establish clear publication dates and ownership

3. Use opt-out tools — Block AI crawlers, submit formal opt-out requests

4. Monitor AI outputs — Check if AI systems reproduce your work

5. Consider legal action — The legal landscape increasingly favors creators
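
For the opt-out step above, here is a minimal sketch of what blocking AI crawlers through robots.txt can look like, written in Python so the rules can be checked before they are published. The user-agent tokens in the list are assumptions for illustration; crawler names change, so confirm them against each vendor's current documentation. The helper names `build_robots_txt` and `is_allowed` are hypothetical, and `example.com` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Assumed AI-crawler user-agent tokens, for illustration only.
# Verify the current names with each crawler's operator.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that blocks the given agents site-wide."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Every other crawler (for example, ordinary search engines) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Report whether a compliant crawler identifying as `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_CRAWLER_AGENTS)
print(robots_txt)
print(is_allowed(robots_txt, "GPTBot", "https://example.com/portfolio/"))         # False
print(is_allowed(robots_txt, "SomeSearchBot", "https://example.com/portfolio/"))  # True
```

Keep in mind that robots.txt is a voluntary convention: it only stops crawlers that choose to honor it, which is why the list above pairs it with formal opt-out requests and monitoring.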

What AI Companies Should Do

1. Audit training data — Know what copyrighted works you are using

2. Pursue licenses — Proactively license content where possible

3. Implement guardrails — Prevent AI from reproducing training data verbatim

4. Respect opt-outs — Honor robots.txt and creator preferences

5. Prepare for litigation — Budget for potential licensing costs or settlements
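
As one illustration of the guardrail step above, the sketch below flags model output that shares long word sequences with a protected text. It is a simple n-gram overlap heuristic under stated assumptions, not a legal test for infringement or substantial similarity: the function names, the n-gram length, and the 0.5 threshold are all hypothetical choices made for the example, and the sample passage is a public-domain line from Dickens.

```python
def word_ngrams(text, n):
    """Return the set of n-word sequences in `text`, ignoring case."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output, protected, n=8):
    """Fraction of the output's n-grams that also appear in the protected text."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output is shorter than n words, so nothing to compare
    return len(out_grams & word_ngrams(protected, n)) / len(out_grams)


def should_block(output, protected_texts, n=8, threshold=0.5):
    """Flag output whose overlap with any protected text meets the threshold."""
    return any(overlap_ratio(output, p, n) >= threshold for p in protected_texts)


protected = ("it was the best of times it was the worst of times "
             "it was the age of wisdom")
copied = "it was the best of times it was the worst of times"
fresh = "the weather today is mild with a light breeze from the west"

print(overlap_ratio(copied, protected, n=4))   # 1.0
print(overlap_ratio(fresh, protected, n=4))    # 0.0
print(should_block(copied, [protected], n=4))  # True
```

A production system would need normalization for punctuation, an index that scales beyond pairwise comparison, and thresholds tuned on real data. The same kind of check could also help creators who want to monitor whether an AI system reproduces their work.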


This article is for informational purposes only and does not constitute legal advice. Last updated: April 2026
