AI Training and Copyright Infringement: Lessons from the Ross Intelligence Case

Published: November 30, 2023

The Delaware District Court’s ruling on cross-motions for summary judgment in Thomson Reuters v. Ross Intelligence Inc. will provide guidance for similar AI training/copyright infringement cases. As a bonus, it provides a bit of clarity (or muddies the waters, depending on your point of view) on how a fair use defense applies after Warhol.

These are the basic facts underlying this lawsuit. Ross was a legal research AI startup. (I say “was” because Ross closed down as an operating company in 2020. Ross says the closure was due to the Thomson Reuters lawsuit, but its insurance coverage has allowed it to continue defending the case.) Ross hired a subcontractor to create memos with legal questions and answers. The questions were meant to be those “that a lawyer would ask,” and the answers were direct quotations from legal opinions. These memos were used to train Ross’s AI tool. Thomson Reuters, the provider of the Westlaw service, contended that these questions were essentially Westlaw case headnotes. The court found, as a matter of law, that Ross copied portions of the Westlaw headnotes. Ross challenged Thomson Reuters’ copyright in the headnotes and raised a fair use defense.

Before analyzing Ross’s fair use defense, the court spent a significant amount of time on the scope of Thomson Reuters’ copyright, which extends to its headnotes and to the arrangement of the headnotes and opinions, but not to the opinions themselves. Ross challenged Thomson Reuters’ copyright in the headnotes, claiming that the Westlaw headnotes are not copyrightable because they “follow or closely mirror the language of judicial opinions.” In the Ninth Circuit, this analysis is part of the extrinsic test, which is used in the determination of substantial similarity. After the plaintiff has identified the specific criteria that it alleges have been copied, the court separates the unprotectable elements, such as facts or ideas, from those that are protectable. It then asks whether the protectable elements are similar enough that a reasonable jury could find the defendant’s work substantially similar.

With regard to the headnotes, the court observed that if a headnote merely copies a judicial opinion, it is uncopyrightable. However, if the headnote varies more than trivially from the opinion, then Thomson Reuters would own a valid copyright in the headnote. The court found this question of the headnotes’ originality to be a genuine factual dispute for the trier of fact to decide. If the headnotes are mere regurgitation of parts of an opinion, this will severely limit the strength and extent of Thomson Reuters’ copyright.

In addition to challenging Thomson Reuters’ copyright claim, Ross raised a fair use defense. Fair use balances four factors: (1) the purpose and character of the use, (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and (4) the effect of the use upon the potential market for the copyrighted work. The first factor assesses whether the use is “transformative.” As established in the Supreme Court case of Campbell v. Acuff-Rose Music, transformativeness occurs where the new work “adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message.” The Warhol decision now requires courts to ask, as part of the first factor, whether and to what extent the use at issue has a purpose or character different from the original and whether that difference supports a justification for copying. Under Warhol, a use is not transformative merely because it adds some new expression, meaning, or message. The purpose of the use must be distinct enough from the purpose of the original to reasonably justify the copying.

Under Warhol, if the intended use is commercial, this tends to weigh against a finding of fair use. The court’s decision in Ross seems to pull away from this. In Warhol, the Supreme Court determined that the use in question was not fair use largely by emphasizing its commercial nature. In Ross, the judge said that he declined to overread one decision, especially because the Supreme Court itself recognized that “a use’s transformativeness may outweigh its commercial character” and that in Warhol, “both elements point[ed] in the same direction.” Further supporting the court’s position is Google v. Oracle, which arose in a technological context much more like this one. In that case, the Supreme Court placed much more weight on transformation than on commercialism.

In arguing that its copying of the headnotes constituted fair use, Ross contended that the copying is part of building a search engine that “avoids human intermediated materials.” Once the plain-language entries are entered into the Ross database, they are converted into numerical data, which is then fed into its machine-learning algorithm to teach the artificial intelligence about legal language. This allows the AI to recognize patterns, and those patterns can be used to find answers not just to the exact questions the system was trained on, but to all sorts of legal questions users might ask. This argument follows the cases holding that “intermediate copying” constitutes fair use. The idea of intermediate copying is that a user copies material to discover unprotectable information or as a minor step towards developing an entirely new product, with the final output being transformative despite using copied material as an input. These cases have been cited favorably, particularly in the context of “adapt[ing] the doctrine of fair use in light of rapid technological change.”

Ross said that its AI studied the headnotes and opinion quotes only to find language patterns, not to replicate Westlaw’s expression. Those patterns would allow Ross to develop a search tool that produces highly relevant quotations from judicial opinions in response to natural language questions. The court said that if Ross’s characterization of its activities is accurate, Ross’s final product would not contain or output infringing material, and Ross’s use would be transformative intermediate copying. The court is leaving it to the jury to determine whether Ross’s stated intention is actually its intention, so the outcome of this case is not yet clear. However, one thing is crystal clear: the intermediate copying cases will have a great impact on all of the other AI training copyright cases.