Universal Music Publishing Group, Concord, and ABKCO have filed a lawsuit against Anthropic, an artificial intelligence firm, alleging copyright infringement in connection with the use of the publishers’ songs. As reported by Rolling Stone, the lawsuit against the San Francisco-based startup was filed in a federal court in Tennessee.
The plaintiffs contend that Anthropic’s AI assistant, Claude, a large language model akin to OpenAI’s ChatGPT, violated the publishers’ copyrights by using their song lyrics to train the model and by reproducing those lyrics in Claude’s responses without a licensing agreement. Anthropic is also accused of removing copyright management information, violating the Copyright Act of 1976. The lawsuit pertains to approximately 500 copyrighted works owned by the plaintiffs, including notable songs like Sam Cooke’s “A Change Is Gonna Come,” the Police’s “Every Breath You Take,” and Beyoncé’s “Halo.” The lawsuit asserts that this infringement is not isolated but rather “systematic and widespread,” holding Anthropic responsible not only for Claude’s actions but also for its users’ infringing activities.
“Publishers bring this action to address the systematic and widespread infringement of their copyrighted song lyrics by the artificial intelligence (‘AI’) company Anthropic,” attorneys said. “In the process of building and operating AI models, Anthropic unlawfully copies and disseminates vast amounts of copyrighted works—including the lyrics to myriad musical compositions owned or controlled by publishers.”
Anthropic was founded in 2021 by former OpenAI employees and secured funding from Google and Zoom. In April 2022, the company attracted $500 million in investment from a group led by Sam Bankman-Fried, the founder of the now-defunct cryptocurrency exchange FTX. Allegations from the Department of Justice suggest that these investments originated from FTX customer funds. Additionally, Anthropic recently announced a substantial investment of “up to $4 billion” from Amazon, granting the tech giant a minority stake in the company.
The crux of the lawsuit revolves around the proprietary and undisclosed nature of LLM (large language model) training datasets. The plaintiffs lack precise knowledge of how Anthropic obtained their copyrighted content, basing their claims on the fact that Claude generates copyrighted material.
“Anthropic claims to be different from other AI businesses. It calls itself an AI ‘safety and research’ company, and it claims that, by training its AI models using a so-called ‘constitution,’ it ensures that those programs are more ‘helpful, honest, and harmless,’” the publishers’ attorneys added. “Yet, despite its purportedly principled approach, Anthropic infringes on copyrights without regard for the law or respect for the creative community whose contributions are the backbone of Anthropic’s infringing service.”
In Anthropic’s documentation for Claude 2, the company stated that “Claude models are trained on a proprietary mix of publicly available information from the Internet, datasets that we license from third-party businesses, and data that our users affirmatively share or that crowd workers provide.” Due to the inherent opacity of LLMs and their training data, the publishers assert that they have suffered substantial and incalculable harm.
The lawsuit presented by the publishers includes instances of Claude producing lyrics for songs like Katy Perry’s “Roar,” Gloria Gaynor’s “I Will Survive,” Garth Brooks’ “Friends in Low Places,” and the Rolling Stones’ “You Can’t Always Get What You Want” upon request.
It also highlights instances where Claude generated lyrics for “new” songs that incorporated existing copyrighted works. For example, when prompted to compose a song about Buddy Holly’s death, it offered the lyrics from Don McLean’s “American Pie.” When asked to create a song about relocating from Philadelphia to Bel Air, it generated lyrics from the theme song of The Fresh Prince of Bel-Air.
While the U.S. Copyright Office has issued guidance stating that content generated by AI without human authorship is ineligible for copyright protection, it has not yet addressed the legality of using copyrighted material for LLM training. If the court grants the relief sought by the publishers, it could set a precedent that requires companies to exclude copyrighted content from LLM-generated output and training datasets.
The publishers are seeking a permanent injunction against the infringement of their copyrighted works, damages ($75 million), and attorneys’ fees. They also asked the court to order Anthropic to disclose the publishers’ lyrics and other copyrighted content used in its AI models, divulge the methods employed to collect and copy the training data (including any shared with third parties), and, under the court’s supervision, eliminate all infringing copies of copyrighted works in Anthropic’s possession or control.