Facing a potentially ruinous lawsuit from the New York Times over the unlicensed use of the newspaper’s reporting to train its GPT Large Language Model, OpenAI is putting out the word that it is not opposed to paying publishers for access to their content, as it recently did with Axel Springer.
“We are in the middle of many negotiations and discussions with many publishers. They are active. They are very positive,” Tom Rubin, OpenAI’s chief of intellectual property and content, told Bloomberg News. “You’ve seen deals announced, and there will be more in the future.”
It’s apparently just opposed to paying very much. According to reporting by The Information, OpenAI has been offering publishers between $1 million and $5 million a year to license their copyrighted news articles.
The Times was no doubt looking for much more than token payments. And OpenAI might even have been willing to up its offer a bit for the Times’s prestigious content. But the fact that the two sides were unable to close the gap is another indication that their dispute is as much about price as it is about copyright principles.
Yet it could also be an indication of what might be an unbridgeable gap in how AI companies and publishers (as well as other rights owners) value any particular tranche of content.
Once upon a time, the expensive, labor-intensive process of gathering and reporting the news that the Times and other publishers undertook was primarily subsidized by high-margin advertising sales. While unfettered independent news reporting plays a vital role in democracy, from a purely business perspective the reporting that process yielded was valuable primarily for the advertiser-favored demographics of the readership it attracted. Some of those readers paid to subscribe to their favorite newspaper; others purchased single copies from newsstands. For the most part, though, they were paying for the cost of delivery, not for the content per se.
These days, that print-era economic model has evaporated, done in by the triumph of low-margin, programmatic digital advertising and the Google-Meta duopoly on the sell-side of ad placement.
It’s taken more than a decade, but some publishers at least have begun to figure out how to make their content pay more of the freight for its production via digital subscriptions, while the marginal cost of delivery is now effectively zero.
The Times, for instance, had 9.4 million digital-only subscribers worldwide as of the end of Q3 2023, far more than it ever had in its print-only, big-city heyday and far more than the 643,000 print subs it still has.
Rather than merely part of the cost of doing business, in other words, the Times’s content is now literally its bread and butter. It’s what its readers are paying for. The idea that anyone should be allowed to simply scrape it all up and mash it into a generative AI soup that can then substitute for the real thing is both ideologically anathema and economically disastrous. And five million bucks a year just ain’t gonna cut it.
What that perspective doesn’t reckon with, however, is the sheer scale of the content an LLM like GPT needs to be fed to learn how syntax, grammar and vocabulary work. OpenAI’s GPT-4, for instance, reportedly ingested something on the order of 13 trillion tokens in its training data, each token roughly the equivalent of a word or part of a word, comprising something like 45 terabytes of data. That’s orders of magnitude more content than the Times has ever produced.
While Times content may be marginally more valuable to OpenAI for training than that of other publishers due to its high quality and broad range of subjects, no single rights owner’s content is so essential that an LLM couldn’t adequately be trained without it. And the idea of paying big bucks for all of it is a non-starter with AI companies.
That fundamental difference in what news content is “worth” to each side will be a challenge to resolve.
ICYMI
Breaking the franchise fever
2023 was the year of “Barbenheimer” on the big screen. Between them, Warner Bros.’ Barbie and Universal’s Oppenheimer grossed just under $2.4 billion worldwide – blockbusters by any measure. But their success has left Hollywood scratching its collective head over what makes a movie a hit anymore. For much of the past 20 years, the major studios have relied on “tent pole” releases to prop up their profits, based on preexisting IP and often the third, fourth or fifth installment of a “franchise.” But that formula no longer seems to be working, and the movies that are working don’t really lend themselves to enfranchisement. Barbie was based on existing IP, but it was so original that a second act will be a stretch. Oppenheimer was optioned from a book, but its lead character is a spent force by the end and the story is hardly the stuff of sequels. Meanwhile, the latest installments of Spider-Man, Indiana Jones and Mission: Impossible underperformed expectations and several big-budget comic-book flicks outright bombed. Maybe IP isn’t destiny after all. THR.
Streaming sticker shock
For years, the music industry looked enviously at its video streaming counterparts, who seemed able to raise prices at will without losing subscribers while music streamers were stuck on an inflation-eroding $9.99 price point. No more. As major video streaming services raise prices, one-quarter of subscribers to Apple TV+, Discovery+, Hulu, Max, Netflix, Paramount+, Peacock and/or Starz have cancelled at least three of them over the past two years. Music streaming services, on the other hand, from Spotify to Amazon to Apple, all hiked prices last year without seeing the bottom fall out of their subscriber bases. What gives? All music streaming services offer more or less the same content, making it hard for any one of them to differentiate itself from competitors. Video streamers, meanwhile, could offer exclusive original content. The cost of that exclusivity has proved staggering, however, forcing streamers to try to pass ever more of it along to subscribers, who seem to have reached their limit. WSJ. The Messenger.
Pods and ends
While the podcasting business took a bit of a hit last year, with advertising growth sagging and layoffs and programming cuts hitting Spotify, major podcast producers are looking to leverage AI to take their business global over the next few years. The CEOs of Amazon-owned Wondery and iHeartMedia’s Digital Audio Group say they are looking at ways to use AI to translate their pods into different languages to reach audiences beyond their native tongues. “It’s not just meaningful for the creators to be able to access audiences in the language they want, but also to actually build businesses in those territories,” iHeart’s Conal Byrne says. While the technology isn’t yet scalable, Byrne says he expects it to get there by the second half of 2024. THR.