Building a Scale Model for AI Data Licensing
I wouldn’t call it a rapprochement exactly, but a kind of tentative peace, or at least a ceasefire, has lately broken out between some rights owners and some generative AI developers after nearly two years of legal warfare over the use of copyrighted works to train AI models.
The Variety Intelligence Platform (where I am a contributing analyst) recently compiled a list of 18 licensing deals that have been publicly disclosed between rights owners — primarily news publishers and photo agencies — and AI developers since the beginning of 2023. I’ve since found about a dozen more. There are almost certainly more that have not been publicly disclosed.
Yet, while encouraging to see, those deals represent mere drops in the ocean amid the vast seas of training data being pumped into AI models, still mostly without licenses or compensation to the owners of the data. If licensing were to become the norm for training data, whether by legislation, legal precedent or industry agreement, relying on bespoke direct deals between individual publishers and individual AI companies is unlikely to prove a practical approach.
Source: Variety Intelligence Platform
“We need to be able to license at scale,” says James Smith, co-founder and CEO of the London-based startup Human Native AI. Smith, a long-time Google executive, and co-founder Jack Galilee, another Google refugee, formerly of GRAIL, recently set out to build an online platform where AI developers can license archives from multiple publishers to build their training datasets, and where publishers can make their data available to AI developers broadly, without needing to approach potential customers individually.
“I think we’re still in the Napster era of generative AI, when it’s very disruptive and there’s a lot of litigation flying around,” Smith tells me. “We want to build a platform for data for the AI age.”
Human Native officially launched in April, after Smith’s contract with Google expired. The company has raised a seed round and is currently “deploying the capital,” according to Smith, mostly in the form of staffing up. It recently opened a U.S. office in Silicon Valley and is looking to establish a presence in Los Angeles. A Series A round of financing may not be far off.
“It may happen faster than I expected,” Smith says.
Human Native is not the only startup attracting capital to build the infrastructure to support AI licensing at scale. Boston-based TollBit raised $7 million in March to build out its tools for collecting tolls from web scrapers for access to a publisher’s content. It uses cybersecurity tools to redirect AI agents to TollBit’s platform, where they are offered a chance to pay a toll for access without needing to negotiate an agreement with the publisher in advance.
Newly minted ScalePost attracted headlines last month by facilitating Perplexity’s revenue-sharing deals with a group of leading publishers including Fortune, TIME, Der Spiegel and the Texas Tribune, among others. The startup claims it now has 30 publishers using its platform to monetize their content via AI and boasts a roster of notable advisors, including Rajiv Pant, former CTO at The New York Times and the Wall Street Journal; Adam Cheyer, co-founder of Siri; Gideon Lichfield, former global editorial director at Wired; and Peter Norvig, former engineering director at Google.
Smith reports strong interest in the Human Native platform from both AI companies and rights owners but declines to disclose whether any have yet formally signed on.
“There are a lot of deals being done out there that aren’t getting reported,” he says. “What AI companies are telling me is that price is not the issue. They’re willing to pay. The issue is convenience and simplicity. It makes no sense for them to have deals with a hundred or a thousand different data sources. They need data to be available from a limited number of sources.”
The company is targeting a broad spectrum of archive owners, including news media publishers, music companies, film & television companies, and photo and image agencies.
“All the models are going to be multi-modal at some point so developers need access to a range of different types” of content, Smith explains.
As for publishers, Smith says he is encouraging them to focus their outreach on smaller AI startups rather than the big tech companies.
“If you make a deal with OpenAI, or with Google, what do you do in two years when that deal expires and they don’t need you anymore?” he says. “I tell them to focus on smaller startups where you’ll have more leverage and an ongoing base of customers.”
Smaller developers are also more open to a variety of business models, according to Smith. “I’ve been encouraged to see companies starting to experiment with revenue sharing, or with performance bonuses, where a developer will pay a publisher a certain amount after hitting various revenue targets,” he says. “We want to be a brokerage, where we’re connecting buyers and sellers.”