A group of prominent intellectual property law professors has weighed in on the high-stakes AI copyright battle between several authors and Meta. In an amicus brief, the scholars argue that using copyrighted content as training data can be considered fair use under U.S. copyright law, if the goal is to create a new and 'transformative' tool. This suggests that fair use could potentially apply to Meta's training process, even if the underlying data was obtained without permission.

In the race to build the most capable LLMs, several tech companies have sourced copyrighted content for use as training data, without obtaining permission from content owners.

Many of those companies are now being sued for alleged copyright infringement. The list includes Meta, which faces a class action lawsuit filed by authors Richard Kadrey, Sarah Silverman, and Christopher Golden, among others.

This case has a clear piracy angle, as Meta used BitTorrent to download archives of pirated books to use as training material. Notably, the authors argue that, in addition to copying pirated books from Anna’s Archive and Z-Library, in the same process Meta also uploaded pirated books to third parties.

Last month, both parties filed motions for summary judgment. Meta’s motion relied heavily on a fair use defense. Meanwhile, the authors argued that the downloading of millions of books cannot be classified as fair use, since the source of the books is clearly copyright infringing.

“The uncontroversial implication is that for fair use to apply, the work that was copied must have been lawfully acquired in the first place,” the authors wrote.

IP Professors Back Meta’s Fair Use Argument

This week, a group of IP law professors submitted a “friend of the court” or amicus brief, backing Meta’s fair use defense. The professors, including scholars from Harvard, Emory, Boston University, and Santa Clara University, have different views on the impact of AI but are united in their copyright stance.

The brief stresses that Meta’s alleged use of pirated books as training data can be considered fair use. The source of the training data is not determinative, as long as it’s used to create a new and transformative product, they argue.

“The case law, including binding circuit precedent, holds that internal copying, made in the course of creating new knowledge, is a transformative use that is heavily favored by fair use doctrine,” the professors write.

Transformative Use: Piracy or Not

The professors’ argument centers on the concept of “transformative use.” They note that using books to create an AI model, rather than for their original ‘reading’ purpose, transforms the purpose of the use. This internal copying, they argue, falls into a category courts have consistently recognized as fair use, also known as “non-expressive use”.

The amicus brief cites several cases to back up its line of reasoning. These include the Perfect 10 v. Amazon lawsuit, where the Ninth Circuit found that it was fair use when Google created thumbnails using images copied from unauthorized “pirate” sites, because the resulting image search tool was transformative.

The authors cited conflicting cases, but the professors note that cases where fair use was denied typically involved copyright infringement related to personal consumption, rather than use of content to create something new.

The brief distinguishes this case from those cited by the plaintiffs, which involved unauthorized copying for direct consumptive use (e.g., downloading for personal enjoyment). In contrast, Meta’s internal copies were allegedly not perceived by humans but used to build a new tool.

“Fair use, like copyright as a whole, ‘is not a privilege reserved for the well behaved’,” the brief notes. “Fair use doctrine should focus on the consequences of a ruling for knowledge and expression. Other considerations should be left for other legal regimes.”

The professors’ comments appear to relate to Meta’s internal use of the books, as training material for LLMs. It’s worth noting, however, that the authors also accuse Meta of uploading these books to other file-sharers while obtaining their own copies.

The amicus brief doesn’t address this issue directly, but previous back-and-forths in court showed that uploading is likely to be a central point of contention as the case moves forward.

‘Copyright Infringement Is Not the Answer’

The amicus brief is mostly targeted at the potential use of books as training input, which Meta and other companies have publicly acknowledged. Whether this is fair use is a key question that this and other courts have to decide.

Other countries, including Japan, have reportedly crafted exceptions in their law to allow tech companies to train LLMs on copyrighted material, without permission.

The U.S. has no such exceptions, but the professors urge the court to consider fair use. As the VCR and other innovations showed, copyright shouldn’t stand in the way of new tools and developing technologies.

“Copyright owners have often predicted that new technologies, from photocopying to home VCRs to the internet, would create disasters for copyright owners and that fair use needed to be shrunk to protect them; instead, new technologies have routinely created new markets,” they write.

“Whatever the risks of AI—and there may be many—condemning the act of creating large-scale training datasets as copyright infringement is not the answer.”

A copy of the proposed amicus curiae brief, which was accepted by the court yesterday, is available here (pdf).