Access to diverse kinds of materials is vital for building and fine-tuning Large Language Models (LLMs). These materials may include works in the public domain (for example, works whose copyright has expired or whose authors have relinquished copyright) as well as works still under copyright protection. Apart from gathering data from sources such as Common Crawl, AI firms often scan copies of books and other materials and convert them into machine-readable text from which training data can be extracted.
Whether using copyrighted materials for training without permission from the copyright holders constitutes copyright infringement is a challenging legal question, and lawsuits over this issue are under way around the world.
One of the key factors that could determine the outcome of these lawsuits is whether courts find that the activities in question fall within any of the exceptions to infringement under the relevant copyright laws. In the U.S., this means that one of the primary determinants of the outcome will be how courts apply the ‘fair use’ doctrine under U.S. copyright law. Two U.S. trial courts have recently delivered summary judgments on fair use, and these may be seen as the beginning of judicial adjudication of this complex issue.
Factors considered
U.S. courts generally consider four factors when assessing whether a use constitutes ‘fair use’. They are: (i) the purpose and character of the use, including the extent to which the use can be considered ‘transformative’; (ii) the nature of the copyrighted work (fair use is more likely to apply to works that are factual in character than to works of fiction or fantasy); (iii) the amount and substantiality of the portion taken, assessed both qualitatively and quantitatively; and (iv) the effect of the use on the potential market for, or value of, the plaintiff’s works. The questions of transformative use and of the impact on the potential market for, or value of, the plaintiff’s works have historically played critical roles in determining the final outcome of fair use litigation.
The Anthropic case
Anthropic trained the LLMs underlying Claude, its popular GenAI assistant, using books and other texts from a library it had compiled. The library consisted of works obtained from different sources, including books that were purchased and converted to digital form as well as books acquired from potentially illegal sources. The plaintiffs, whose works were used for training without their authorisation, brought the copyright infringement action.
Applying the four factors to the specific facts of the case, and relying in particular on the highly transformative nature of the use of copyrighted materials, the court in Andrea Bartz et al. versus Anthropic PBC granted summary judgment in favour of Anthropic on the question of whether training the AI was fair use. The court also held that converting the books Anthropic had purchased from print to digital format constituted fair use. However, it rejected Anthropic’s request that downloading and storing copies obtained from illegal sources be treated as fair use. It remains to be seen how the court will decide infringement and remedies with respect to those activities.
The Meta judgment
In Richard Kadrey et al. versus Meta Platforms, Inc., 13 authors sued Meta for downloading books from illegal sources and using them to train Llama, Meta’s LLM. Based on the specific facts of the case and the parties’ submissions on the four fair use factors, the court granted summary judgment in favour of Meta.
The court was of the view that using the works for training was highly transformative, and that in such instances plaintiffs must produce substantial evidence on the fourth factor (whether the use has harmed the market for, or value of, their works) to avoid summary judgment against them. Because the plaintiffs in this case could not produce any meaningful evidence on that point, the court granted summary judgment in favour of Meta with regard to the copying and use of the plaintiffs’ books as training data. However, the court will continue proceedings against Meta on the plaintiffs’ claim that Meta also unlawfully distributed their works during the torrenting process.
Comparative analysis
A common thread in both summary judgments is the recognition that using copyrighted works to train LLMs is highly transformative, and this substantially influenced the fair use analysis in both cases. The two courts also aligned on the third factor, as both considered the amount of material used reasonable in the broader context of training.
On the fourth factor, however, the judgments differ substantially. Judge Chhabria, who authored the Meta summary judgment, rejected the plaintiffs’ argument that Meta harmed their potential licensing market, primarily on the ground that it is not a market the plaintiffs are legally entitled to monopolise. At the same time, he observed that in many cases, AI training on copyrighted materials may be unlawful because of “market dilution”. In his view, the rapid generation of countless works that compete with the originals, even if those works are not themselves infringing, can dilute the market through indirect substitution. The plaintiffs’ inability in this case to produce sufficient empirical evidence of such dilution illustrates how difficult this kind of harm is to prove.
On the other hand, Judge Alsup, who authored the judgment in the Anthropic case, categorically rejected the market dilution argument and observed that the “[a]uthors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.”
It is also worth noting that the judge in the Anthropic case treated downloading pirated copies or building a permanent library of infringing works as a separate use that warranted its own analysis and a different outcome. The Meta summary judgment did not take that approach and focused only on the ultimate purpose, namely the training of models.
Other AI cases
Earlier this year, in Thomson Reuters versus Ross Intelligence, the court concluded that the fair use exception did not apply. However, this was not a GenAI case: the AI in question merely retrieved and shared judicial opinions in response to user queries. Because the court did not consider this use transformative, and because the AI in question competed directly with the plaintiff’s works, it concluded that using those materials without permission was not fair use.
Broader implications
The summary judgments in the Anthropic and Meta cases both recognise the highly transformative character of using materials to train GenAI, and thereby favour a finding of fair use for the use of copyrighted materials for training purposes. Yet both judgments also reflect many of the anxieties of copyright holders. Whether sourcing materials from potentially illegal sources can defeat a claim of fair use is an issue on which scholarly opinion is divided and further discussion is warranted.
It is also evident that the evidence copyright holders bring to show harm to their markets will play a prominent role in determining the final outcome in many cases. This also means that copyright infringement issues in AI training are far from settled, and that outcomes may differ considerably depending on the specific facts and evidence in each case.
Arul George Scaria is a professor at the National Law School of India University (NLSIU)
Published – July 29, 2025 08:30 am IST