AI Platforms’ Use of Copyrighted Materials

Within a one-week period in June 2025, two federal judges in the Northern District of California entered summary judgment rulings on the application of the fair use defense under the Federal Copyright Act in connection with generative AI platforms’ use of copyrighted materials owned by several authors who did not give the AI platforms permission to use their works. The rulings in the two cases – Bartz v. Anthropic PBC and Kadrey v. Meta Platforms, Inc. – reached different conclusions about the application of the fair use defense to a claim of copyright infringement against a generative AI company that uses millions of pages of copyrighted materials to train its large language models without the authors’ consent. The decisions are the first two such rulings among the more than three dozen copyright infringement lawsuits pending in U.S. federal district courts across the country.

The headlines have largely cast both rulings as wins for the AI platforms, and to some extent that is accurate. The decisions have resolved some unsettled questions in favor of the AI platforms, at least as far as two federal judges in the Northern District of California are concerned when confronted with the facts of those particular cases. At the same time, however, the rulings clearly indicate there are a number of fact-specific, case-by-case issues that preclude drawing any blanket conclusions about the use of copyrighted content by generative AI platforms without the consent of the authors of those materials.

This blog reviews the major issues addressed in the court’s decision in Bartz v. Anthropic PBC. A second blog that we’ll publish in the near future will discuss the major issues addressed by the court in Kadrey v. Meta Platforms, Inc. There are likely many more such decisions to follow, which will be particularly important for the many AI companies operating in Silicon Valley and across California.

Background

Anthropic PBC is an AI software firm founded by former OpenAI employees in January 2021. Its core offering is an AI software service called Claude. When a user prompts Claude with text, Claude quickly responds with text, mimicking human reading and writing. Claude can do so because Anthropic trained Claude – or rather trained the large language models (LLMs) underlying various versions of Claude – using books and other texts selected from a central library Anthropic assembled. From the start, Anthropic had many places from which it could have purchased books, but it preferred to steal them to avoid the “legal, practice, business slog” that purchasing them would involve, as co-founder and CEO Dario Amodei stated at his deposition. Claude was first released publicly in March 2023. Seven successive versions of Claude have been released since. Users can ask Claude some questions for free. Demanding users and corporate clients pay to use Claude, generating over one billion dollars in annual revenue.

Plaintiffs Andrea Bartz, Charles Graeber and Kirk Wallace Johnson are authors of books that Anthropic copied from pirated and purchased sources. Anthropic assembled these copies into a central library of its own, further copied various sets and subsets of those library copies to include in various data mixes, and used the mixes to train various LLMs. Anthropic kept the library copies in place as a permanent, general-purpose resource even after deciding it would not use certain copies to train LLMs or would never use them again to do so. All of Anthropic’s copying was without plaintiffs’ authorization. In total, Anthropic pirated over seven million copies of books, including copies of at least two works at issue for each of the plaintiff authors. From these sources, Anthropic created a general research library – or “generalized data area” – which served as a way of creating information that would be voluminous and that Anthropic would use for research or to otherwise inform its products. And indeed Anthropic used all of this information to train its LLMs.

In sum, the copies of books pirated or purchased and destructively scanned were placed into a central research library or “generalized data area.” Sets or subsets were copied again to create training copies for data mixes. The training copies were successively copied to be cleaned, tokenized and compressed into any given trained LLM. Once trained, an LLM did not output any further copies to the public through Claude. Finally, once Anthropic decided a copy of a pirated or scanned book in the library would not be used for training at all – or ever again – it still retained that work as a hard resource for other uses or future uses. At least one work from each plaintiff author was present in every phase of Anthropic’s work to train Claude.

Initial Litigation in the Trial Court

In August 2024, three individual authors brought a putative class action complaining that Anthropic had infringed their federal copyrights by pirating copies of their works for its library and by reproducing the works to train its LLMs. The parties consented to an early motion for summary judgment. Anthropic then moved for summary judgment on the sole issue of fair use.

The Copyright Act recognizes a defense to copyright infringement if the alleged copying constitutes a “fair use.” Section 107 of the Act identifies four factors to determine whether a given use of a copyrighted work is a fair use:

[T]he fair use of a copyrighted work … for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

(1) the purpose and character of the use, including whether it is of a commercial nature or is for non-profit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use on the potential market for or value of the copyrighted work.

These factors presuppose that a “use” has occurred. The court therefore first decided whether a copyrighted work had been used in one or more ways and whether any of those uses constituted infringement under the statute. It then evaluated each use against the four fair use factors to determine whether the defense applied to that use.

Not surprisingly, the parties disagreed as to what uses were at issue in the case. Anthropic argued it copied the authors’ books only to train the LLMs. The plaintiffs argued Anthropic copied the works to build a large central library of content and train the LLMs. They also alleged the print-to-digital format change was infringement not covered by fair use. Plaintiffs did not allege that any LLM outputs that infringed on their works reached users of Claude’s services.

On June 23, 2025, U.S. District Court Judge William Alsup granted partial summary judgment on the plaintiffs’ complaint on the grounds that certain allegedly infringing behavior was protected as fair use. The court found the uses to train the LLMs and the conversion of the purchased works from print to digital constitute fair use under the Act. The court found that use of pirated books was not fair use. How did the court reach these decisions?

Trial Court’s Decision to Grant Partial Summary Judgment Based on Fair Use

The court’s decision addressed how each fair use factor applied to each allegedly infringing behavior, including (1) copies used to train LLMs, (2) purchased hard copies converted to digital copies for the central library, and (3) pirated copies for the central library.

When evaluating the factors of the fair use defense, whether a use is transformative, rather than derivative, is an important consideration. A transformative use refers to instances where the use adds something new and alters the original work with new expression, meaning or message. If an AI algorithm’s use of copyrighted material in its inputs or outputs is deemed a transformative use, that finding is not dispositive of fair use, but it strengthens a fair use claim.

Many copyright holders have argued that use of their works to train AI algorithms directly infringes the copyright or, at best, infringes by creating unauthorized derivative works of the original work. The right to prepare derivative works is an exclusive right reserved for the copyright holder, allowing them to create works substantially similar to the original with the addition of new elements or modifications.

In simple terms, a court must consider whether the use only adds new elements to the original work, and is thus derivative, or adds an entirely new expression, meaning or message, and is thus transformative. This evaluation can be subjective and depends on the specific facts and evidence of each case.

Copies Used to Train LLMs

With respect to the training copies, the court analyzed each of the four fair use factors. As to the first factor, the purpose and character of the use, the court found that using copyrighted works to train LLMs is quintessentially transformative. It analogized to a human reading books and then using the accumulated knowledge to write new creative works. The court glossed over the distinction between commercial and non-profit educational uses of the copyrighted works. As for the pirated copies used for training, even though the court later noted that piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded, it did not exclude the pirated works from its fair use finding on the first factor. It found this factor weighed in favor of fair use.

With respect to the second factor, the nature of the copyrighted work, the court explained this factor recognizes that some works are closer to the core purpose of intended copyright protection than other works, and thus fair use is more difficult to establish when such works are copied. Less protection is due published works than unpublished ones. Less protection is due factual works than works of fiction. Anthropic acknowledged that all of the selected works contained expressive elements and were chosen for those qualities when it included them in the central library and used them to train specific LLMs. The court concluded this factor weighed against finding fair use.

With respect to the third factor, the amount and substantiality of the portion of the work used in relation to the entire copyrighted work, the crux of the factor is whether the amount copied was reasonable in relation to the purpose of copying. In analyzing this issue, the amount copied is first considered against the work itself, and then against the proposed transformative purpose. Plaintiffs argued the copying used in training the LLMs was extremely extensive and not strictly necessary. In response to the first argument, the court found, “the copies that count for this factor are those that would merely serve the same use as the work’s ordinary one.” That conclusion is neither clear nor persuasive. The court minimized – if not completely ignored – the fact Anthropic used millions of copyrighted works to train its LLMs without paying for them. In response to the second argument, the court acknowledged Anthropic needed billions of words to train any given LLM, and also acknowledged the company could have used other books or no books at all to train its LLMs. But it concluded that, because using so many works was reasonably necessary to train the LLMs, using any one work for actually training LLMs was about as reasonable as the next. It also noted that the resulting works produced by Claude did not infringe on any of plaintiffs’ works. Again, the reasoning is neither clear nor persuasive, but the court found this factor favored fair use.

With respect to the fourth factor, the effect of the use on the potential market for or value of the copyrighted work, the court stated this factor weighs against fair use when an infringer makes copies available in a marketplace that replaces the demand for copies the copyright holder makes or could make available in that marketplace. The court found that the copies used to train Anthropic’s LLMs did not (and will not) replace the demand for copies of the plaintiffs’ works. The court also found the copies used to train the LLMs did not result in any exact copies, or even infringing knock-offs, of the plaintiffs’ works being provided to the public. The court concluded the fourth factor favored fair use.

Purchased Copies to Build a Central Library

With respect to the copyrighted works Anthropic purchased, converted from print to digital media, and then used to build a central library, the court first found the company spent millions of dollars to purchase the authors’ copyrighted works, and with each purchase came the right to dispose of each copy as it saw fit. Anthropic was entitled to keep the copies in its central library for all the ordinary uses. There was no evidence the digital copies were shown, shared or sold outside the company. There was no evidence that converting print copies into digital copies constituted creating derivative works, a right reserved for the copyright holder. The court concluded that converting printed copies to digital copies to save space and enable searchability was transformative for this reason alone, and thus found fair use on the first factor.

As to the second factor, the court reasoned the evidence weighed against finding fair use for the purchased and converted copies for the same reasons it weighed against finding fair use for the training copies.

As to the third factor, the court found that for the print copies of books Anthropic purchased it already had the right to keep copies in its central library. The purpose of converting them to digital format was to make storage easier and searching possible. Copying all of the works – by converting them from print to digital – was exactly what was required. No extraneous copying occurred. The hard copies were destroyed after conversion. The court found the third factor favors fair use for converting hard copies to digital of the purchased works.

As to the fourth factor, the court found that changing the format from print to digital may have displaced purchases of digital copies of the works that Anthropic would otherwise have made from the plaintiffs, but those losses are not protected by the Act. Nonetheless, the court found the fourth fair use factor neutral for the purchased library copies.

Pirated Copies Used for the Central Library

With respect to the pirated copies used to build a central library, the court found that before buying books for its central library Anthropic downloaded over 7 million pirated copies of books, paid nothing, and kept these pirated copies in its library even after deciding it would not use them to train Claude, at all or ever again. The authors argued Anthropic should have paid for these pirated library copies, and the court ultimately agreed.

In evaluating the first factor, the court determined that pirating copies of copyrighted works to build a central research library without paying for them, and retaining copies should they prove useful for one thing or another in the future, was its own use, and not a transformative use. It concluded this factor weighed against fair use.

As to the second factor, the court reasoned the evidence weighed against finding fair use of the pirated copies for the same reasons it weighed against finding fair use for training copies.

As to the third factor, the court found that Anthropic had no legal right to hold any of the pirated copies. While the company said it stole the works to train its LLMs – an arguably transformative use – its conduct indicated it was seeking to acquire and retain “all the books in the world” even after deciding it wouldn’t make further copies to train its LLMs, indicating there were other illegal uses contemplated. Under these circumstances, any unauthorized copying would have been too much, and Anthropic copied millions of books in toto. This factor weighed against fair use.

As to the fourth factor, the court found that stolen copies of the plaintiffs’ copyrighted works to build a central library clearly replaced the demand for the authors’ books, copy for copy. It found this factor weighed against fair use.

The court granted summary judgment on the fair use of copies of the authors’ works used to train Anthropic’s LLMs and the purchased hard copies converted to digital format to ease storage and enable searching. It concluded that there will be a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory, which will also evaluate the issue of willful infringement.

The Court’s Conclusions

First, the court concluded that copies used to train specific LLMs were justified as a fair use. Every factor but the nature of the copyrighted work favors this result. The court found the technology at issue was among the most transformative many will see in their lifetimes. It granted summary judgment on this basis.

Second, the court found that copies used to convert the purchased print library into a digital library were justified as a fair use, particularly because once converted the purchased print copies were destroyed and their digital replacements were not redistributed. It granted summary judgment on this basis too.

Third, the court determined that pirated copies of plaintiffs’ works could not be justified by a fair use. Anthropic’s own evidence showed that copies of pirated (and purchased) works would be retained forever for general purposes even after the company determined they would never be used for training LLMs. A separate justification was required for each use. None was even offered except for Anthropic’s pocketbook and convenience. The court denied summary judgment on this ground.

Fourth, with respect to copies made from central library copies but not used for training, the court denied summary judgment. It found the central library copies were retained even when no longer serving as sources for training copies, hundreds of engineers could access them to make copies for other uses, and they indeed made such copies. Anthropic dodged discovery on these issues, and was denied summary judgment.

Anthropic Settles

On September 5, 2025, Anthropic agreed to pay $1.5 billion to settle claims it used pirated books to train its AI, marking the largest U.S. copyright settlement in history and sending a clear warning to AI companies about the high cost of using copyrighted material without permission.

This is just one ruling among 40 or more pending cases with more cases likely to follow. The application of the law to this stunning and disruptive new technology has plenty of room to continue to develop and likely will not be resolved anytime soon.

About Finkel Law Group

Finkel Law Group, with offices in San Francisco and Oakland, has close to 30 years of experience representing clients in prosecuting and defending copyright infringement cases in federal district courts across California. When you need intelligent, insightful, conscientious and cost-effective legal counsel to assist your company in prosecuting or defending a copyright lawsuit filed in federal court in California, please contact us at (415) 252-9600, (510) 344-6601, or info@finkellawgroup.com. Our attorneys are ready to help you resolve your case.

Lonnie Finkel


U.S. District Court for Northern District of California Enters Partial Summary Judgment Finding Using Copyrighted Material to Train AI Platforms is a Fair Use
