NEWS

Debating Fair Use in Generative AI Training Data Copyright Issues: An Analysis of Kadrey et al. v. Meta Platforms, Inc.

IPRdaily
Nov 4, 2025

The rapid iteration of generative artificial intelligence has triggered a profound legal challenge worldwide. To train powerful models, developers have used massive quantities of copyrighted works as training data. This practice places the practical needs of technological innovation in tension with the basic principles of copyright law, which are designed to incentivize creative expression—forming a core conflict between the technology sector and the creative-content community.


On June 25, 2025, the U.S. District Court for the Northern District of California issued its ruling in Kadrey et al. v. Meta Platforms, Inc. In this case, the court held that Meta’s use of the plaintiffs’ works to train its large language model constituted fair use. Although the plaintiffs ultimately lost, the ruling did not grant unconditional approval for the use of copyrighted works in AI training. Instead, through extensive doctrinal reasoning, the court provided highly valuable insights for future judicial practice and copyright governance.


Basic Facts


The plaintiffs were a group of fourteen prominent authors, including Richard Kadrey. The defendant, the technology giant Meta, developed the large language model known as LLaMA.


The plaintiffs alleged that Meta, without permission, attribution, or compensation, copied their books at scale and used them as training corpora for LLaMA, amounting to direct and vicarious copyright infringement. They argued that Meta’s training datasets included works obtained from “shadow libraries” such as Books3, Bibliotik, and LibGen—platforms widely known as repositories of pirated content. The plaintiffs contended that this “bad-faith” method of data acquisition reflected the unlawfulness of Meta’s conduct.


The court first dismissed the plaintiffs’ assertion that the LLaMA model itself constituted an infringing derivative work, reasoning that the model was neither a transformation nor an adaptation of the plaintiffs’ books. The claim of vicarious infringement was also dismissed because the plaintiffs failed to show that LLaMA generated any specific content that was “substantially similar” to their works. These rulings substantially narrowed the scope of the dispute. The remaining issue was whether Meta’s act of copying books to train LLaMA constituted direct copyright infringement. Meta raised the fair use doctrine as its defense.


Issues in Dispute


The court conducted a systematic review of the four statutory fair-use factors under Section 107 of the U.S. Copyright Act. This analysis formed the backbone of the entire decision.


1. Purpose and character of the use.

The court found that the plaintiffs’ and defendant’s uses served fundamentally different purposes. Plaintiffs’ books were written to be read for entertainment or knowledge, whereas Meta copied them to train a functional software tool capable of performing translation, summarization, “creative ideation” assistance, and other tasks. LLaMA did not merely “repackage” the original texts—it learned statistical patterns to create something new in purpose and function. This, the court held, is the essence of transformative use.

On the contentious issue of “bad-faith” acquisition of pirated data, the court held that even if Meta knowingly used pirated books, such bad faith did not alter the transformative nature of the subsequent training use. Fair use aims to allow the emergence of new expressive outputs that do not serve as market substitutes. Whether the underlying material is lawfully sourced does not change the fundamental nature of the later use.


2. Nature of the copyrighted works.

The court held that the works at issue—novels, memoirs, and similar texts—are highly creative and expressive, falling squarely within the core of copyright protection. The court rejected Meta’s argument that it used only the “functional elements” of the texts, observing that the expressive relationships among words are themselves “products of creative authorship.” Nonetheless, the court noted that this factor rarely plays a decisive role in modern fair-use analysis.


3. Amount and substantiality of the portion used.

The court held that copying entire works was reasonable and necessary for effective model training; using only excerpts would significantly impair performance. Since copying entire works was required to achieve a legitimate transformative purpose, the copying was reasonable in both quantity and substance.


4. Effect of the use upon the potential market.

The court rejected two central market-harm theories advanced by the plaintiffs:


Output-substitution theory: The claim that LLaMA’s outputs could substitute for the original works. Evidence showed that—even under “adversarial prompting”—the model did not “regurgitate” meaningful excerpts, and thus could not threaten the original works’ markets.


Lost licensing-market theory: The claim that Meta harmed a potential licensing market for AI-training use. The court found this circular: copyright holders cannot create a legally cognizable market merely by asserting a right to license an otherwise transformative fair use.


Holding


The court ultimately held that Meta’s use of plaintiffs’ books for model training, given the specific arguments and evidence in this case, constituted fair use. After weighing the four factors, the court found that Factors 1, 3, and 4 favored the defendant, while only Factor 2 favored the plaintiffs. The plaintiffs’ failure on the pivotal fourth factor directly led to their loss.


The court emphasized in the opinion that “this decision does not hold that Meta’s use of copyrighted materials to train its language models is lawful,” but only that “these specific plaintiffs advanced the wrong arguments and failed to build an evidentiary record to support the right ones.”


Notably, the court discussed—at length—a more persuasive theory of harm that the plaintiffs failed to raise effectively: market dilution. This theory posits that even if a model’s outputs are not infringing copies, the model itself makes it possible to “rapidly generate innumerable works that compete with the originals.” By flooding the market with works of similar type or style, AI products could “destroy” the market for human-authored books—particularly for lesser-known authors.


The court stated that this was a “far more promising” argument and suggested that, in many cases, demonstrating this type of harm could give plaintiffs “a decisive victory on the fourth factor—and thus on the entire fair-use issue.” In this case, however, the plaintiffs failed to build an evidentiary record supporting the market-dilution theory or to rebut Meta’s claim of no market harm, and therefore lost.


Reflections and Implications


1. Fair use faces new challenges.

This case highlights the difficulty of applying the flexible fair-use doctrine to the challenges posed by AI technologies. Simply characterizing AI training as “transformative,” and relying on that characterization alone as a fair-use defense, may struggle to gain broad traction in future cases. As the opinion notes, it is “hard to imagine” that using copyrighted books to develop a tool capable of generating trillions of dollars in economic value, while potentially producing limitless competing works that could heavily erode the originals’ markets, could always be deemed fair use. Courts are likely to require more detailed empirical and technical evidence showing that training does not materially harm the market value of the original works, or that the training process is fundamentally distinct from traditional forms of infringement.


2. Technical evidence will play a greater role in litigation.

The court not only ruled on the legality of using literary works in AI training but also clarified the evidentiary burden on copyright holders in future cases. The decision sends a strong signal: such complex cases cannot be won through doctrinal argument alone. They require deep, fact-based evidence—particularly sophisticated economic modeling of market impact.


3. AI developers should proactively assume social responsibility.

AI developers should use lawful and compliant data sources wherever possible and explore business models that create win-win dynamics with copyright holders, rather than relying primarily on fair use to obtain legal immunity. Governments should encourage industry self-regulation, establish technological ethics guidelines, and promote the development of laws and regulations—forming a governance framework integrating technology, industry, and law.