Two Northern District of California Judges Find Some Copying to Train Generative AI Is Fair Use
On June 23 and 25, 2025, two judges in the U.S. District Court for the Northern District of California issued highly anticipated rulings in generative AI copyright infringement cases, granting partial summary judgment for the defendant creators of large language model (“LLM”) AI. In each case, the court found that the defendant’s copying of plaintiffs’ copyrighted works to train its LLMs constituted fair use. Fair use determinations are highly fact-specific, however, and neither ruling holds that the unauthorized use of copyrighted materials for training LLMs is categorically fair use. In addition to analyzing different alleged acts of infringement, the two rulings diverged in their analysis of the fourth fair use factor: the effect of the use on the market for the copyrighted work.
Takeaways for Practitioners:
- Market harm is emerging as the linchpin of the fair use analysis for training generative AI. Both rulings reject a theory of market harm to the copyrighted works based on a potential market for licensing the works for AI training.
- Market harm to the copyrighted works may be shown by evidence of direct substitution (i.e., the LLM can be used to reproduce meaningful portions of the original works) or indirect substitution (i.e., the LLM can be used to generate competitive works that displace the original works). According to Judge Chhabria, indirect substitution at mass scale (i.e., market dilution) is potentially a winning argument against the fair use defense.
- Attorneys counseling clients regarding the “use of copyrighted materials for generative AI” should parse out and separately examine the initial acquisition of the materials (access), their use in training (input), and the AI work product (output).
- Unlawfully obtaining otherwise available copyrighted materials for LLM training may still provide a basis for infringement notwithstanding the subsequent transformative use of such materials.
- Copying protected works to create a repository intended for AI training is not equivalent to using such works to train AI when determining whether the fair use defense applies.
In Bartz v. Anthropic PBC, Judge Alsup Finds Fair Use in Anthropic’s Use of Copyrighted Material for LLM Training, But Not in the Use of Pirated Materials for Creating a Central Library
On June 23, 2025, in Bartz v. Anthropic PBC, Judge Alsup granted summary judgment that Anthropic’s use of copyrighted materials to train its Claude LLM is fair use, but denied summary judgment as to unlawfully obtained copies maintained in a central library for further use.
Anthropic makes the generative AI LLM, Claude. Plaintiffs are several authors of books that Anthropic copied (from both allegedly pirated downloads and legitimate purchases) to build a central research library, from which it drew datasets to train its LLMs. The central library allegedly contained over seven million pirated book downloads as well as millions of lawfully purchased print books that Anthropic had manually converted to digital format. While a subset of these materials was used to train LLMs, all of the materials were retained in the repository for any “further use.”
To evaluate whether Anthropic’s use of the copyrighted materials qualifies as fair use, the court applied the familiar four-factor test as discussed in our prior AI training fair use blog post, here. Notably, the court sided with the plaintiffs in evaluating Anthropic’s actions as two distinct uses: (1) training LLMs, and (2) creating a central library.
Using Copyrighted Materials to Train LLMs:
Judge Alsup found the use of copyrighted materials to train LLMs to constitute fair use. Analyzing the first factor, the purpose and character of the use, Judge Alsup found that copying books to train LLMs to generate new text in response to user input is “spectacularly” transformative. In so finding, he focused on the generative nature of the technology and emphasized that Anthropic’s LLM is restricted from reproducing any of the copied works in whole or in part.
The second factor, the nature of the copyrighted work, weighed against fair use due to the highly expressive nature of the plaintiffs’ works.
Judge Alsup analyzed the third factor, the amount and substantiality of the portion copied, in relation to the reasonable necessity for the purpose of the copying. This factor favored fair use since training LLMs requires a “monumental” volume of text, and Claude’s output had no “traceable connection” with the copyrighted works. Anthropic’s copying, therefore, was “reasonable and compelling.”
The final factor, the effect of the use on the market for or value of the copyrighted work, also favored fair use. Judge Alsup analogized that Anthropic’s use to train its LLMs is “no different than . . . training schoolchildren to write well[.]” Therefore, LLMs do not displace demand for plaintiffs’ works in a way that the Copyright Act prohibits. Notably, Judge Alsup addressed market harm in the context of the LLM’s generative output as a potential competitive substitute. He did not evaluate harm to a potential market for licensing materials to train LLMs, reasoning that “such a market for that use is not one the Copyright Act entitles Authors to exploit.” Thus, on balance, all of Anthropic’s alleged copying to train its LLMs qualified for the fair use defense.
Using Copyrighted Materials to Build a Central Research Library:
Anthropic also created a central library of copied books, retained indefinitely, that would function as a research repository “for further use.” The books in the central library included both lawfully acquired hard copy books that Anthropic had manually digitized and allegedly pirated downloads of books.
Judge Alsup held that Anthropic’s conversion of print books to digital files was fair use, analogizing to prior fair use determinations, including Google’s scanning of books for searchability and the videotape recording of TV shows at issue in the Sony Betamax case. Of significance was the fact that the original print copies were legitimately purchased and Anthropic’s digitized copies were not distributed. Therefore, “one copy entirely replaced the other” and “the format change did not itself usurp the Authors’ rightful entitlements.”
By contrast, Judge Alsup held that the allegedly pirated downloads in the central library are not fair use. The court rejected Anthropic’s claim that the central library itself was intended for purposes of training LLMs, because fair use “look[s] past the subjective intent of the user to the objective use made[.]” (italics added) Accordingly, “pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another” is not transformative. In notable dicta, Judge Alsup wrote that “piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”
In Kadrey v. Meta Platforms, Judge Chhabria Finds Meta’s Use of Copyrighted Material to Train Its LLM Is Fair Use, But Largely Due to Plaintiffs’ Lack of Market Dilution Evidence
Days later, on June 25, 2025, in Kadrey v. Meta Platforms, Judge Chhabria granted summary judgment that Meta’s use of copyrighted materials to train its Llama LLM is fair use on the record before him. He pointedly criticized plaintiffs for the lack of market dilution evidence in the record, which compelled his conclusion, and noted that the outcome likely could have been different had the record included such evidence.
Meta, like Anthropic, was accused of copyright infringement by several authors whose creative works were used to train Meta’s LLM, Llama. Initially, Meta tried to license training works from major publishers, but it eventually decided to download and use copyrighted works from “shadow libraries,” including at least 666 copies of plaintiffs’ books. Also like Anthropic’s Claude, Meta’s Llama cannot currently be used to read or otherwise meaningfully access plaintiffs’ books.
Meta’s Use of Plaintiffs’ Copyrighted Works to Train Its LLM Is Fair Use Based on the Scant Record:
As in Anthropic, the purpose and character of using plaintiffs’ works to train Meta’s LLM is highly transformative, so the first fair use factor favored fair use. Like Judge Alsup, Judge Chhabria downplayed the significance of the second fair use factor, finding that it adds little to the overall analysis. Also as in Anthropic, the third factor supports fair use because copying plaintiffs’ entire works was “reasonably necessary” in relation to the transformative purpose of LLM training.
Judge Chhabria devoted the bulk of his analysis to the fourth factor, potential market harm to the copyrighted works, the “single most important” factor. Like Anthropic’s Claude, Meta’s Llama does not allow users to reproduce any meaningful portion of plaintiffs’ books. Therefore, Judge Chhabria rejected plaintiffs’ theory of market harm from direct substitutionary reproduction. Judge Chhabria also rejected theoretical harm to a potential market for licensing plaintiffs’ works for LLM training purposes: “harm from the loss of fees paid to license a work for a transformative purpose is not cognizable.”
Judge Alsup’s and Judge Chhabria’s discussions differ in their evaluation of market harm in the form of dilution: the rapid generation of countless competitive (but individually non-infringing) works flooding the market as indirect substitutes for plaintiffs’ works. Judge Chhabria rejected Judge Alsup’s analogy equating LLM training to teaching children to write well, because the LLM technology enables the creation of “millions” of potentially substitutionary secondary works “with a miniscule fraction of the time and creativity used to create the original” copyrighted work. This technological aspect also renders market dilution a uniquely relevant consideration in cases involving LLM training. Judge Chhabria went so far as to observe that “it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor—and thus win the fair use question overall—in cases like this.”
Judge Chhabria reluctantly reached the opposite conclusion in this case, however, because plaintiffs made only a “half-hearted” argument on market harm, with no meaningful evidence of market dilution. Meta’s own expert, in contrast, testified that the release of the “Llama 3” version of Meta’s LLM had no discernible effect on sales of plaintiffs’ works, at least in the period shortly after its release. Thus, in the absence of a meaningful record on the market effect of training an LLM with plaintiffs’ works, the fourth factor, and on balance all of the factors, weighed in favor of finding Meta’s copying protected as fair use.
The Court expressly cabined the future applicability of the decision, stating: “[T]his ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
Copyright issues in the context of AI remain a rapidly evolving area of technology and law, with numerous other copyright infringement cases pending across the country. Moreover, no appellate court has yet weighed in, on the fair use defense or otherwise. It remains to be seen whether other courts will reach similar conclusions on the facts before them, or how cases will be resolved on different factual records.