Reuters' RAG to Riches Story
Deal with Meta brings publisher closer to usage-based licensing for AI
Last month, Meta CEO Mark Zuckerberg offered the obnoxious but not entirely incorrect observation that publishers “tend to overestimate the value of their specific content in the grand scheme of” generative AI.
Earlier this month, the Reuters newswire erected a digital paywall around its content in the U.S., after introducing it in Canada and ahead of a planned worldwide rollout.
This week, Meta and Reuters announced a “multiyear deal” to use Reuters reporting to provide real-time answers to queries on current events in Meta’s AI chatbot, which can be accessed via Facebook, Instagram, WhatsApp, and Messenger.
The three events are not completely unconnected.
Zuckerberg was referring to the use of copyrighted works to train (or pre-train) generative AI models. In the context of the tens of billions of works ingested in the pre-training stage of a large language model (LLM), the works of any individual rights owner, whether included or not, will have a de minimis impact on the resulting model’s capability.
Thus, from the model maker’s perspective, any price put on that content for use in pre-training will inevitably be arbitrary.
From the publishers’ perspective, on the other hand, leaving aside copyright considerations, content has, or should have, an absolute economic value independent of the application for which it’s being licensed. That value is related to its cost of production and distribution, the brand equity in the publisher’s masthead, and other quantifiable factors (to say nothing of its social or cultural relevance), and it should be reflected in the price.
Thus, the inherent conflict.
It’s unclear whether Meta’s deal with Reuters includes the use of the newswire’s archive to train Meta’s AI models. Where Reuters’ content definitely will be used is in Meta’s new AI chatbot, which is based on retrieval-augmented generation (RAG).
A RAG chatbot uses search technology to find relevant sources of information in response to a user’s query. Rather than simply providing links to those sources, however, RAG leverages the generative capabilities of an LLM to provide a precis on the topic or a summary of the information found in those sources (which it can do with or without attribution).
News on current events, by definition, will not have been included in an LLM’s pre-training dataset. To generate relevant and factually accurate responses to such queries, then, the chatbot needs access to the latest reporting on them.
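To make the retrieve-then-summarize pattern concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Meta’s system: the `Article` type, the sample corpus, the keyword-overlap scoring, and the `generate` stub (which stands in for a real LLM call) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical stopword list; a real retriever would use a search index or embeddings.
STOPWORDS = {"the", "a", "of", "on", "with", "did", "do", "what", "its", "after"}


@dataclass
class Article:
    source: str
    headline: str
    body: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def retrieve(query: str, corpus: list[Article], k: int = 2) -> list[Article]:
    """Retrieval step: rank articles by keyword overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(a.headline + " " + a.body)), a) for a in corpus]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [article for _, article in ranked[:k]]


def generate(query: str, sources: list[Article]) -> str:
    """Generation step (stub): a real system would pass the query and the
    retrieved text to an LLM. Here we simply join headlines and cite sources."""
    if not sources:
        return "No recent reporting found."
    summary = " ".join(a.headline + "." for a in sources)
    cited = ", ".join(sorted({a.source for a in sources}))
    return f"{summary} (Sources: {cited})"


def answer(query: str, corpus: list[Article]) -> str:
    """RAG pipeline: retrieve fresh reporting, then generate a response from it."""
    return generate(query, retrieve(query, corpus))


# Made-up sample corpus for illustration.
CORPUS = [
    Article("Wire A", "Central bank holds rates steady",
            "The central bank kept interest rates unchanged on Tuesday."),
    Article("Wire B", "Storm closes coastal highway",
            "Heavy rain forced the closure of a coastal highway."),
    Article("Wire A", "Markets rise after rates decision",
            "Stocks rose after the bank announced its rates decision."),
]

print(answer("What did the central bank do with rates?", CORPUS))
```

The point of the sketch is the division of labor: the model’s pre-training supplies the language ability, while the freshness of the answer depends entirely on what the retrieval step can reach.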
Users of Meta’s various platforms frequently turn to them both for news and to discuss breaking news. The AI bot currently available on Facebook and Instagram relies on Google and Microsoft Bing to answer questions on current events. But Meta is keen to reduce its dependence on its two rivals without having to reinvent the traditional search wheel, and it views AI-powered search as its best path to independence.
Meta is also in a RAG race with Google, which is aggressively rolling out its AI-powered search summaries tool, AI Overviews. For Meta, being able to leverage the strengths of a widely trusted breaking-news source like Reuters for its chatbot has obvious value as it seeks to capture greater market share in search.
But only if it doesn’t have to compete with other chatbots availing themselves of the same content for free. That’s where Reuters’ new paywall comes in. Placing its content behind a wall, even a low wall, gives Reuters the predicate and the technical means to limit access to its content by AI bots to paying customers. Partnering with Meta also furthers Reuters’ efforts to build up the consumer-facing side of its business as its customer base among traditional news publishers dwindles.
More critically, structuring the deal around RAG moves Reuters closer to a usage-based licensing model. Focusing on the use of the content in responses to individual search queries allows both Meta and Reuters to treat each instance effectively like an API call, providing a mutually quantifiable metric for pricing.
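As a rough illustration of what treating each use "like an API call" could mean in practice, the Python sketch below counts how often a publisher’s content is used in chatbot responses and prices the total at a flat per-use rate. The terms of the Meta-Reuters deal have not been disclosed, so the `UsageMeter` class, the publisher names, and the rate are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class UsageMeter:
    """Counts licensed-content uses per publisher, like metered API calls."""
    rate_per_use: Decimal  # hypothetical flat price per article used in a response
    counts: Counter = field(default_factory=Counter)

    def record(self, publisher: str, articles_used: int = 1) -> None:
        """Log one chatbot response that drew on a publisher's articles."""
        if articles_used < 0:
            raise ValueError("articles_used must be non-negative")
        self.counts[publisher] += articles_used

    def invoice(self) -> dict[str, Decimal]:
        """Amount owed to each publisher: number of uses times the per-use rate."""
        return {pub: n * self.rate_per_use for pub, n in self.counts.items()}


# Made-up rate and publishers, for illustration only.
meter = UsageMeter(rate_per_use=Decimal("0.002"))
meter.record("Publisher A")                   # one response citing one article
meter.record("Publisher A", articles_used=2)  # one response citing two articles
meter.record("Publisher B")
print(meter.invoice())
```

Because both sides can observe the same count, the price becomes a function of measurable use rather than a lump-sum guess at the content’s contribution to a model.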
It’s an encouraging sign that, at least in some circumstances, the interests of AI developers and publishers can be aligned, providing benefits — and value — to each.
But as with other AI licensing deals signed to date, it leaves open the question of whether the model is scalable beyond a limited pool of publishers.