Creators Focus a Wider Lens on AI and Licensing
Privacy, contractual terms and the role of middlemen come in for scrutiny
More than three dozen lawsuits have been filed since the introduction of ChatGPT in the fall of 2022 charging OpenAI and other AI developers with copyright infringement over the unlicensed use of protected works to train generative AI models. And that’s just in the U.S. Cases have also been brought in several European jurisdictions, Canada and Australia.
Yet, as the technology has advanced and become more widely used, it is increasingly clear there are many considerations and potential sources of conflict at stake besides copyright — not just for technology companies but for creators and rights holders as well.
As discussed here in a recent post, the Italian data privacy regulator last month sent notice to the GEDI Group, owner of la Repubblica, la Stampa and other news outlets, warning the publisher that its agreement with OpenAI to provide Italian-language content for training AI models could conflict with the European Union’s General Data Protection Regulation (GDPR). Under the GDPR, “It is prohibited to process personal data revealing racial or ethnic origin… as well as to process genetic data, biometric data intended to uniquely identify a natural person.” According to the regulator, “processing” data includes using or providing it for use by an AI system, and news stories are often rife with such data.
Closer to home, a group of photographers last week filed a putative class-action lawsuit in federal court against Photobucket, charging the image-hosting service with violating various state privacy laws.
Once a leading cloud-based storage service dating back to the dot-com era, Photobucket largely fell out of favor with shutterbugs as social media platforms rose. But it managed to compile an archive of 13 billion images before falling off most users’ radar. In a bid to revive its fortunes, Photobucket has sought to license its archive to AI developers for training and issued a change to its terms of service purporting to grant it the rights to do so. The notice gave users 45 days to respond and stipulated that Photobucket would treat non-responses as consent to the new terms. As many users have long since stopped using the service, however, the ToS change largely went unnoticed.
One anticipated use of the archive by AI companies, according to the complaint, is to extract biometric data from people depicted in the photographs, such as hand and facial geometry, body proportions, retinal images and other individual characteristics, to train image generators. The plaintiffs allege that use runs afoul of various state laws against the collection or commercial exchange of such data without an individual’s consent, including laws on the books in California, Colorado, Illinois, New York and Virginia.
As in the Italian case, the use of the photographs could also implicate the privacy rights of individuals who may be depicted in the images but who are not party to any agreement with Photobucket, let alone any AI companies.
Notably, while the lawsuit does throw in a DMCA §1202(b) claim over the removal of copyright management information from the images, almost as an afterthought, the body of the complaint includes barely any discussion of copyright, apart from the purported conveyance to Photobucket of AI training rights to the images. Even there, the complaint focuses on the allegedly coercive nature of the unilateral contractual change rather than on infringement of those rights. Instead, the plaintiffs’ case is framed almost entirely around privacy interests.
A debate over who has the rights, and under what terms, to license the use of creative works for AI has also erupted recently in the book industry. After HarperCollins inked a deal to allow Microsoft to use the publisher’s non-fiction backlist titles to train AI models, the Authors Guild issued a policy statement claiming AI training rights are not covered by standard publishing contracts.
“A trade publishing agreement grants just that: a license to publish. AI training is not publishing, and a publishing contract does not in any way grant that right,” the statement said. “Licensing for AI training is a right entirely unrelated to publishing, and is not a right that can simply be tacked onto a subsidiary-rights clause. It is a right reserved by authors, a right that must be negotiated individually for each publishing contract, and only if the author chooses to license that right at all.”
The Association of American Literary Agents also takes the position that rights not explicitly granted in publishing contracts, including AI rights, are reserved by the author and need to be individually negotiated.
HarperCollins is allowing its authors to decide whether to allow their books to be included in any AI licensing deal, offering a one-time $2,500 payment per title. The Authors Guild expressed appreciation for the Big Five publisher’s opt-in approach, but insisted in its statement that any AI licensing of authors’ works by publishers should be on a revenue-sharing basis, with the revenue split depending on the publisher’s role in the deal.
It wasn’t always so. The Guild was initially supportive of publishers’ AI licensing efforts, proposing a 50-50 split of any revenues. With its latest statement, however, it appears to have done a near-complete about-face on the issue, and it recently partnered with startup Created by Humans to develop a marketplace for individual authors to license their works for use in AI.
The seeming turnaround points to an evolution in the creative community’s views on AI and licensing. Once focused almost exclusively on alleged copyright infringement by AI companies — the Authors Guild itself joined one such lawsuit against OpenAI and Microsoft last December — authors and other creators are widening their lenses onto the broader AI marketplace to focus on the terms and conditions by which their works are licensed for use in generative AI systems.
As part of that broader focus, creators also have begun to scrutinize more closely how content aggregators and other middlemen are offering their works to AI developers, adding still another layer of complexity — and potential source of friction — to the still ill-defined business model for licensing the use of copyrighted works to train AI models.
Even where copyright and authorship remain creators’ primary concern, the role of middlemen is not escaping scrutiny, as evidenced by the Writers Guild of America’s recent demand for legal action by the Hollywood studios against AI companies it claims are taking its members’ scripts and using them to “plagiarize stolen works.”
“The studios, as copyright holders of works written by WGA members, have done nothing to stop this theft,” the Guild said in a letter addressed to the heads of the seven major studios including Netflix. “They have allowed tech companies to plunder entire libraries without permission or compensation. The studios’ inaction has harmed WGA members.”
In their headlong rush to build bigger and better generative AI models, fueled by staggering sums of capital pouring in from Wall Street and other investors, AI developers often have run roughshod over creative industry norms, accepted practices, and, critics would say, the law. But as media companies and the creative community each look to get a piece of that bounty, they’re finding technology companies are not the only hands reaching for the pot.