On AI & Copyright, No Fair Use, Or Fair Play (Updated)
Mother's Day Massacre in the Madison Building
The hurried Friday release of a pre-publication draft was a tell. The Copyright Office was in the final stages of preparing the keenly awaited third and final installment of its report on Copyright & AI, and Register of Copyrights Shira Perlmutter had to know trouble was coming after her boss, Librarian of Congress Carla Hayden, was summarily fired by someone at the White House on Thursday. The possibility that the report could be suppressed before its release was suddenly very real.
On Saturday, the trouble came. Perlmutter herself was unceremoniously sacked, in a one-sentence email notifying her that “your position as the Register of Copyrights and Director at the U.S. Copyright Office is terminated effective immediately.”
But the report is now on the record.
The legality of the firings appears suspect. Although the Librarian is nominated by the president and confirmed by the Senate, the Library of Congress is not an executive branch agency answerable to the White House. It reports to the leaders of the House and Senate. The Register of Copyrights is appointed by the Librarian and is not a presidential nominee, so the president’s authority to sack her is unclear.
I’ll leave it to the lawyers to pass on whether Hayden and/or Perlmutter could have a legal cause of action. But it’s hard to read the timing of their defenestration as unrelated to the substance of the third installment, dealing with the politically charged issue of the use of copyrighted works to train generative AI models. It certainly will not please the tech-bros and Muskovites who have President Trump’s ear.
The report, more than two years in the making, concludes that the unlicensed use of copyrighted works to train generative AI models, in many if not most cases, cannot be excused as fair use, contrary to the position taken by the tech-company defendants in the dozens of copyright infringement lawsuits over AI training.
The views of the Copyright Office, by themselves, do not carry the weight of law. But courts hearing copyright cases often look to the office for its expertise in the nuances and intricacies of copyright law and frequently defer to its judgments. If that were to happen in this case, it could topple the AI empires built atop hundreds of billions of dollars of Wall Street and tech company money.
Perlmutter’s firing also comes one day after a court hearing on a motion for summary judgment in one of the most high-profile of those dozens of lawsuits, Kadrey v. Meta, in which the judge expressed concern that AI-generated content could harm the market for human-created works. That’s one of the four — and often the most decisive — statutory factors courts must weigh in evaluating claims of fair use, and one on which the Copyright Office put significant emphasis in its report.
“The speed and scale at which AI systems generate content pose a serious risk of diluting markets for works of the same kind as in their training data,” the report said. “That means more competition for sales of an author’s works and more difficulty for audiences in finding them.”
As Dr. Barry Scannell of the Irish law firm William Fry, a close observer of international copyright trends, put it, “The timing [of the firings] is not subtle.” The report “offered the most detailed and sober government analysis yet of how U.S. copyright law applies to the training of generative AI systems. It acknowledged the complexity of the issue. It accepted that some uses might qualify as fair use. But it also warned against assuming that all AI training is exempt from copyright law. It emphasi[z]ed the economic harm unlicensed training can cause to creators. And it questioned the tech industry’s mantra that more data always means better AI.”
“This careful legal balancing act,” he added, “appears to have crossed a political red line.”
“The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction”
Politics aside, the report highlighted an issue that has not factored heavily in the lawsuits targeting AI companies over copying during training, but which could disrupt the entire AI supply chain: the preparation of training datasets.
“The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction,” the report said. “Developers make multiple copies of works by downloading them; transferring them across storage mediums; converting them to different formats; and creating modified versions or including them in filtered subsets. In many cases, the first step is downloading data from publicly available locations, but whatever the source, copies are made—often repeatedly.”
While the office acknowledges that the data may be discarded after training, a point tech companies often cite as exculpatory, “that does not affect the infringement analysis” with respect to the initial collection, preparation, and curation, the report said. “Moreover, public reporting indicates that major developers often maintain training datasets for use in future projects.”
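The pipeline the report describes is easy to picture in code. The sketch below is a minimal, hypothetical illustration in Python (the URL, file names, and the 500-character “quality filter” are my own stand-ins, not anything named in the report); the point is simply that each stage of collection, conversion, filtering, and transfer writes the work to storage again, so copies accumulate at every step.

```python
import json
from pathlib import Path

import requests  # third-party HTTP client

WORK_URL = "https://example.com/articles/sample.html"  # hypothetical source

def build_dataset(workdir: str = "dataset") -> None:
    root = Path(workdir)
    root.mkdir(exist_ok=True)

    # Copy 1: download the work from a publicly available location.
    raw = root / "raw" / "sample.html"
    raw.parent.mkdir(exist_ok=True)
    raw.write_bytes(requests.get(WORK_URL, timeout=30).content)

    # Copy 2: convert it to a different format (plain text inside a JSON record).
    converted = root / "converted" / "sample.json"
    converted.parent.mkdir(exist_ok=True)
    converted.write_text(json.dumps({"url": WORK_URL, "text": raw.read_text(errors="ignore")}))

    # Copy 3: include the record in a filtered training subset.
    record = json.loads(converted.read_text())
    if len(record["text"]) > 500:  # stand-in for a real quality/length filter
        subset = root / "filtered" / "train.jsonl"
        subset.parent.mkdir(exist_ok=True)
        with subset.open("a") as f:
            f.write(json.dumps(record) + "\n")

        # Copy 4: transfer the subset to another storage medium, the kind of
        # retained dataset the report says developers often keep for reuse.
        backup = Path("backup") / "train.jsonl"
        backup.parent.mkdir(exist_ok=True)
        backup.write_bytes(subset.read_bytes())

if __name__ == "__main__":
    build_dataset()
```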
The report further tackles another contentious issue in a way not likely to please tech companies: the question of whether models can “memorize” content they are trained on and reproduce it verbatim as output.
“As discussed in the Technological Background, the extent to which models memorize training examples is disputed,” the report states. “When, however, a specific model can generate verbatim or substantially similar copies of a training example, without that expression being provided externally in the form of a prompt or other input, it must exist in some form in the model’s weights… In such instances, there is a strong argument that copying the model’s weights implicates the right of reproduction for the memorized examples.”
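The kind of claim the office is weighing can be probed directly. Below is a minimal sketch in Python of a simple memorization test, using the Hugging Face transformers library and the small public GPT-2 checkpoint purely as placeholders (the report names no model or tooling): feed the model only the opening characters of a candidate passage, decode greedily with no sampling, and check whether the rest of the passage comes back verbatim. Since that continuation was never supplied in the prompt, a match suggests it is encoded in the model’s weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small public checkpoint, chosen only for illustration

def is_memorized(passage: str, prefix_chars: int = 100) -> bool:
    """Crude memorization probe: does greedy decoding from a short prefix
    reproduce the rest of the passage verbatim?"""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    prefix, continuation = passage[:prefix_chars], passage[prefix_chars:]
    inputs = tokenizer(prefix, return_tensors="pt")

    # Greedy decoding: no sampling, so any verbatim match comes from the
    # model's weights rather than from randomness or from the prompt itself.
    output_ids = model.generate(
        **inputs,
        do_sample=False,
        max_new_tokens=max(1, len(tokenizer(continuation)["input_ids"])),
        pad_token_id=tokenizer.eos_token_id,
    )
    generated = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # Count it as memorized if the first stretch of the true continuation
    # appears verbatim at the start of the model's output.
    return generated.strip().startswith(continuation.strip()[:50])

# Hypothetical usage: is_memorized(open("suspect_passage.txt").read())
```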
The report also comes at a critical time in the debate over AI and copyright internationally. The British House of Commons last week passed the U.K. Data (Use and Access) Bill, which contains a provision that creates a copyright exemption for text-and-data mining unless a rights owner affirmatively opts out of allowing their works to be used. The rule is modeled on a similar TDM exemption with an opt-out in the European Union’s AI Act, and Prime Minister Keir Starmer’s government had called the measure “essential” to maintaining Britain’s technological competitiveness with the rest of Europe.
The provision has triggered widespread pushback from the creative industries in the U.K., however, including from such luminaries as Sir Paul McCartney and Elton John, and the government has been accused of bowing to pressure from U.S. tech companies and the Trump administration. Opponents managed to attach an amendment to the bill requiring AI companies to comply with U.K. copyright law, effectively eliminating the TDM exemption. But the government stripped the amendment from the bill ahead of last week’s vote. Baroness Beeban Kidron, who authored the original amendment, has vowed to try to reinstate the measure when the bill comes back to the House of Lords this week, but the government remains opposed.
In a bid to mollify critics, Culture Minister Chris Bryant told MPs last week that the government would likely revisit the issue in a separate bill expected to emerge from a public consultation on AI and copyright later this year or next.
Across the Channel in Europe, meanwhile, U.S. tech giants, again with the backing of the Trump administration, have poured millions into an effort to weaken the AI Code of Practice being developed by a committee of experts as mandated by the AI Act. Among their targets have been provisions concerning compliance with EU copyright law.
The Copyright Office’s analysis cuts mostly in the opposite direction, however. If followed by the courts, it could put the U.S. legal stance on copyright & AI at odds with the views of the White House and the tech magnates it listens to, and with the demands they’ve made of other countries.
Whether or not that was what triggered the Mother’s Day Massacre at the Copyright Office, the gap between what the U.S. says and what it does on AI and copyright could linger.
(Update) The AP reported Monday that Trump has named Deputy Attorney General Todd Blanche, formerly Trump’s personal lawyer, acting Librarian of Congress. Associate Deputy Attorney General Paul Perkins was named acting Register of Copyrights. Both appointments raise serious legal and constitutional questions: Blanche and Perkins are confirmed officials of the executive branch, while the LOC and USCO are legislative branch agencies. “You can’t name an executive branch official to head a legislative branch agency,” Daniel Schuman, executive director of the American Governance Institute and a former official of the Congressional Research Service, told Roll Call. “It is unconstitutional, illegal and imprudent.” More to come, no doubt.