In an op-ed that appeared earlier this month in the MIT Technology Review, Shayne Longpre, a PhD candidate there and head of the Data Provenance Initiative, raised an alarm about an emerging threat to the functioning of the internet he dubbed the AI Crawler Wars. Crawlers are the automated agents used by search engines to visit websites behind the scenes for indexing purposes. They’re also used by websites that aggregate information from multiple sites, by online retailers to set prices, by academics conducting research and by many others. These days they’re ubiquitous on the web and currently account for roughly half of all web traffic.
Over the past few years, the volume of crawler (or bot) traffic on the web has increased dramatically with the rise of generative AI models and their incessant scraping of websites for training data. Over the past 18 months, that growth has accelerated still further with the development of retrieval-augmented generation (RAG) engines, which use crawlers to collect fresh information from online sources to generate up-to-date responses to user search queries.
This week, Longpre’s metaphor erupted into a live-fire exercise. Online education platform Chegg sued Google in federal district court in Washington, DC, over Google’s scraping of content from Chegg to feed its AI Overviews RAG engine.
Chegg is hardly the first or only publisher to sue over the unlicensed use of its content by RAG engines. My previous post reported on the lawsuit filed last week by 14 U.S. and Canadian publishers against the Canadian AI company Cohere, and in October 2024 Dow Jones sued RAG pioneer Perplexity.
Unlike those other cases, however, Chegg’s lawsuit eschews any allegation of copyright infringement. Instead, its complaint charges Google with violating multiple provisions of the Sherman Antitrust Act, along with the common-law tort of unjust enrichment.
The alleged Sherman Act violations have two main prongs. The first involves Google’s alleged abuse of its adjudicated search monopoly to coerce Chegg into making its content available for use by AI Overviews, or what the lawyers call “reciprocal dealing.” As I discuss in a new VIP+ report ($$) to be released next week, Google uses its regular search crawlers to collect information for Overviews, leaving online publishers with a sort of Hobson’s choice. Block access to their content for use in Overviews and they effectively become invisible in search results. It’s all or nothing.
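To make the all-or-nothing bind concrete, here is a minimal Python sketch using the standard library’s robots.txt parser. It assumes, as described above, that Overviews is fed by the same crawler as regular search, which identifies itself with the user-agent token Googlebot. The robots.txt contents, the publisher.example URL, the `allowed` helper and the `SomeOtherBot` name are hypothetical illustrations, not anything drawn from Chegg’s complaint.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher (not Chegg's actual file).
# It tries to keep Google's crawler out while leaving every other bot alone.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page on a placeholder domain.
PAGE = "https://publisher.example/study-guide"


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumption from the article: one crawler serves both products, so the
# same user-agent check decides both outcomes.
search_indexing = allowed(ROBOTS_TXT, "Googlebot", PAGE)
overviews_input = allowed(ROBOTS_TXT, "Googlebot", PAGE)

# A different, made-up crawler falls through to the wildcard rule.
other_bot = allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)

print("search indexing allowed:", search_indexing)
print("Overviews input allowed:", overviews_input)
print("other bot allowed:", other_bot)
```

With only one user-agent token to target in this scenario, there is no rule the publisher can write that keeps its search listing while withholding the same pages from Overviews.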
The second prong involves Google’s alleged monopoly maintenance. In this case, that means leveraging its monopoly in search to unlawfully extend its dominance into an adjacent market.
Just as Chegg is not the first publisher to sue over RAG engines, it’s also not the first to raise competition concerns involving generative AI. Both former Federal Trade Commission chair Lina Khan and former head of the Justice Department’s antitrust division Jonathan Kanter raised the issue in various forums. Many of the 39 copyright infringement lawsuits filed against AI companies have also involved allegations of unfair competition, albeit generally in the context of contesting a claim of fair use.
But as I’ve discussed in posts going back as far as 2023, antitrust law has remained a mostly sheathed sword in the battle over the unlicensed use of copyrighted works to train and operate generative AI systems. Whether Chegg can wield it effectively against Goliath is another story, however.
Chegg is a tiny outfit compared to Alphabet. It reported net revenue of just $617.6 million in 2024, and a net loss of $837 million. It forecasted further decline in Q1 2025. After growing rapidly during the Covid shutdown as students were sent home and schools shifted to online education, it has struggled as the pandemic has faded and students returned to classrooms. The company has now undertaken a strategic review to consider its options, a step CEO Nathan Schultz said would not be necessary but for the significant decline in search-driven web traffic it has sustained since the introduction of Overviews.
Rather than engage directly and risk an adverse outcome in court, Google could simply drag out the litigation as long as possible and wait for Chegg to bleed out.
Nor is it clear whether Chegg’s allegations against Google would fly against another RAG provider that has not previously been found to be otherwise operating a relevant monopoly. But that doesn’t mean antitrust law could never be a potent weapon in the battle over AI and human creativity.