Searching for Balance in Generative AI
AI-powered search, once a cause of panic among news publishers, could become a source of leverage for them instead
When Google started testing Overviews, news publishers of all types and sizes felt a cold chill. The new feature delivered AI-generated summaries of information collected from the web in response to users’ search queries. If a query concerned anything to do with current events, it was likely the Overview summary would draw heavily from news sites.
While Overview searches still provided links to relevant websites, the summaries, by design, were frequently sufficient to answer the user’s question without any need to click through to the links provided. Publishers, many of which had grown dependent on search-driven traffic from Google and had spent decades carefully optimizing their sites and content to appear high in its search results, feared a catastrophic loss of traffic.
Seeing an opportunity to dethrone Google, OpenAI, Microsoft, Perplexity and others have all piled into providing AI-powered search products, reinforcing publishers’ fears.
But a funny thing seems to be happening on the way to news-site Armageddon. AI-powered search has become a factor helping drive a flurry of licensing deals and partnerships between search providers and publishers.
Consider:
Startup Perplexity, which vows to share ad revenue with publishers once it has some, launched its publisher partnership program in July with Automattic (WordPress), Der Spiegel, Entrepreneur, Fortune, The Texas Tribune and TIME signing on to allow their content to be used in the startup’s search summaries. As part of the multi-year deals, the publishers also get access to Perplexity’s APIs and developer support so they can create their own custom answer engines on their sites.
Last week, Perplexity expanded the program, adding the LA Times, Adweek, Mexico News Daily, Blavity, NewsPicks, Minkabu the Infonoid, Gear Patrol, MediaLab, DPReview and World History Encyclopedia as partners, along with the RTL Germany outlets NTV and Stern, the Spanish-language publishing group Prisa Media and the U.S. newspaper publisher Lee Enterprises.
ProRata, a startup out of Idealab Studio that claims to have cracked the problem of measuring the contribution of individual input sources to an AI-generated output, this week began beta testing Gist, an AI-powered search engine “based entirely on high-quality licensed content.” Licensors include the publishers of Adweek, TIME, The Atlantic, Sky News, Fortune, the Guardian, Foreign Policy, Healthline, Outdoor Life, International Business Times, Slate, TheStreet and more. Like Perplexity, ProRata plans to share advertising revenue with publishers when it surfaces their content in summaries.
Last month, ProRata announced the signing of a letter of intent (LOI) with the Danish Press Publications’ Collective Management Organization (DPCMO) to “explore collaborative opportunities aimed at ensuring fair credit and compensation for the use of Danish media content in the era of generative AI.”
Reddit, with its nearly 20 years of user comments on a breathtaking array of topics organized into more than 100,000 subreddits, has become a darling of LLM developers, both for the depth of its archive and for the colloquial, conversational style of user posts. It announced a data licensing deal with Google just prior to going public in March, followed shortly after by a deal with OpenAI. While the terms of those deals are not public, it’s a fair bet the Reddit API is being tapped for Google’s Overviews and OpenAI’s ChatGPT Search. This week, Reddit launched its own chatbot to offer summaries drawn from its own vast library in response to user search queries, using technology from Google and OpenAI.
The trendlet is born partly of policy choices by AI developers. “Generative AI cannot thrive on a foundation of stolen or uncredited content—it’s neither sustainable nor just,” ProRata CEO Bill Gross said in announcing the launch of Gist. “Content creators are the backbone of the knowledge economy, and they deserve fair recognition and compensation for their contributions.”
But there are practical considerations behind the partnerships as well. AI-powered search, also referred to as retrieval-augmented generation (RAG), combines a generative AI engine with traditional search technology. Its greatest utility, compared with LLMs alone, lies in answering questions on current events.
LLMs are static; they do not continue to learn once they are trained, so their replies to prompts draw on data that may be out of date. By incorporating search technology, RAG systems can retrieve up-to-date information at the moment a question is asked. News organizations are in the business of gathering and publishing the most current information they can, making them particularly valuable partners for AI-powered search providers.
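For readers who want to see the mechanics, here is a minimal sketch of the retrieve-then-generate pattern, written in Python. It is an illustration under stated assumptions, not any provider's actual system: the publishers, articles, and function names are hypothetical, the retrieval step is a crude keyword match standing in for a real search index, and the final call to a language model is left out.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical stopword list; a real search index would be far more refined.
STOPWORDS = {"the", "a", "an", "in", "of", "who", "what"}


@dataclass
class Article:
    publisher: str
    published: date
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct non-stopword terms."""
    words = {w.strip(".,?!:").lower() for w in text.split()}
    return words - STOPWORDS


def retrieve(query: str, index: list[Article], k: int = 2) -> list[Article]:
    """Return up to k articles sharing terms with the query.

    Ranking is by term overlap, with newer articles winning ties.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(a.text)), a.published, a) for a in index]
    hits = [s for s in scored if s[0] > 0]
    hits.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [a for _, _, a in hits[:k]]


def build_prompt(query: str, sources: list[Article]) -> str:
    """Assemble the text a generative model would be asked to summarize."""
    lines = ["Answer using only the sources below and cite the publisher."]
    for a in sources:
        lines.append(f"[{a.publisher}, {a.published.isoformat()}] {a.text}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Hypothetical index of licensed articles (invented for illustration).
INDEX = [
    Article("Example Daily", date(2024, 11, 6),
            "Rivera won the city council election."),
    Article("Example Gazette", date(2020, 11, 4),
            "Chen won the city council election in 2020."),
    Article("Example Sports", date(2024, 11, 5),
            "The home team lost the final game."),
]

if __name__ == "__main__":
    question = "Who won the city council election?"
    print(build_prompt(question, retrieve(question, INDEX)))
```

The point of the sketch is the division of labor: the model itself never changes, while the index of articles can be refreshed every time a publisher files a new story.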
The symbiosis between the needs of RAG models and news organizations’ continuous production of new material also lends itself to longer-term or renewable agreements between the parties. Unlike more static datasets such as book publishers’ backlists or photo archives, which need to be accessed only once to train a model, RAG search engines need ongoing access to news publishers’ content.
That’s not to say the advent of AI-powered search engines has eliminated or will eliminate all friction between news publishers and AI companies. Perplexity, for instance, relies on OpenAI’s GPT and Anthropic’s Claude models for its generative AI functionality, and both of those developers have been the target of copyright infringement lawsuits by publishers. Perplexity itself has been accused of plagiarizing news stories from Forbes, Bloomberg and CNBC, and has been sued for infringement by News Corp properties, including Dow Jones and the New York Post.
OpenAI has deals with Condé Nast, News Corp, The Atlantic, Vox Media and others, but has been the target of multiple infringement suits.
The ad-revenue-sharing business model meant to support many of the partnerships between AI companies and publishers, moreover, remains unproven and speculative. While Perplexity, OpenAI and ProRata may have visions of using AI to topple Google from the search throne, luring ad dollars away from the world’s dominant search platform will not be easy, even in light of Google’s ongoing antitrust troubles with the Department of Justice. Google itself, of course, is also a major player in the AI-powered search game.
But AI-powered search holds the potential, for the first time, to provide publishers a measure of genuine strategic and economic leverage with AI developers beyond merely angling for redress from the courts or policymakers.