Rolling in the Deep
Creative workers enjoy a taste of schadenfreude with the release of DeepSeek R1, but danger may lurk beneath the waves
Creative workers and copyright holders could be forgiven their moment of schadenfreude at the expense of OpenAI last week over its complaint that the developers of the Chinese chatbot rival, DeepSeek R1, may have misappropriated ChatGPT’s intellectual property to train their own model. OpenAI, after all, has been accused of misappropriating creators’ IP on a massive scale and turnabout seems only fair play. But as gratifying as it may be for creators to see the AI bogeyman hoist on its own petard, DeepSeek could pose its own threat.
The makers of DeepSeek appear to have used a technique called “distillation” to train R1. At a basic level, it involves feeding thousands of prompts into a large language “teacher” model, allegedly in this case ChatGPT, and then using the generated responses as training data for the “student” model, DeepSeek. In the process, the student learns to approximate the teacher’s behavior without ever seeing its weights or original training data, yielding a far smaller model with comparable abilities at a fraction of the cost and computing power needed to train the teacher.
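For readers who want to see the mechanics, here is a minimal sketch of the idea in PyTorch. Everything in it is a toy stand-in rather than a real LLM, and the approach shown is the generic technique, not DeepSeek’s actual pipeline; the key point is that the student never touches the teacher’s weights, only its outputs.

```python
# Toy sketch of "black-box" distillation: train a small student network to
# imitate a larger teacher it can only query with inputs and observe outputs.
# All names, sizes, and data here are invented for illustration; this is the
# general technique, not DeepSeek's actual pipeline.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "teacher": a larger, frozen network we can only query.
teacher = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 8))
for p in teacher.parameters():
    p.requires_grad_(False)

# Far smaller "student" that learns to mimic the teacher's responses.
student = nn.Linear(16, 8)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(2000):
    prompts = torch.randn(32, 16)                # synthetic "prompts"
    with torch.no_grad():
        targets = teacher(prompts).softmax(-1)   # teacher's "responses"
    # Push the student's output distribution toward the teacher's.
    loss = nn.functional.kl_div(
        student(prompts).log_softmax(-1), targets, reduction="batchmean"
    )
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final imitation loss: {loss.item():.4f}")
```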
Just how small a fraction in the case of DeepSeek is a matter of dispute among tech experts. The $6 million figure its backers released almost certainly understates the full investment. It appears to refer only to the final round of training, excluding previous R&D and pre-training costs. The Chinese hedge fund behind DeepSeek may also have had access to more, and more powerful, chips than acknowledged, notwithstanding the restrictions the Biden Administration placed on exports of the most advanced chips to China.
Whether out of cheek or simple coincidence, DeepSeek R1 was released one day before OpenAI CEO Sam Altman appeared at a high-profile White House press conference to announce a plan to invest $500 billion to build out yet more computing capacity to train future iterations of GPT. Microsoft, Alphabet, Amazon, and Meta, collectively, have announced plans to spend $320 billion in 2025 alone on additional computing capacity, most of it to support their AI ambitions. The DeepSeek announcement caused panic in the industry. Investors were already concerned over whether all that capex would produce a meaningful return on anything but a geological time scale, and the possibility that equally capable models could be developed for far less sent a temblor through Wall Street. Big Tech share prices cratered.
Whatever the true costs of training DeepSeek R1, its biggest impact, and to rights holders the most concerning one, could be on the operational, or “inference,” side of AI models. In addition to being fantastically expensive to train, LLMs like Claude, Gemini, and ChatGPT are expensive to operate because every response consumes significant computing capacity, most of it supplied by massive data centers.
DeepSeek’s developers appear to have made significant strides in increasing their model’s inference efficiency, greatly reducing the computing requirements, and therefore the cost, of responding to a prompt.
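DeepSeek’s specific optimizations are beyond the scope of this piece, but a toy example of one generic efficiency lever, weight quantization, shows why such gains matter: storing weights at lower precision shrinks the memory and bandwidth every response consumes. A hypothetical sketch, not DeepSeek’s actual method:

```python
# Toy illustration of one generic inference-efficiency lever: quantization.
# Storing weights as 8-bit integers instead of 32-bit floats cuts the memory
# (and memory bandwidth) each response consumes to a quarter. Hypothetical
# numbers; not a description of DeepSeek's specific optimizations.
import torch

weights = torch.randn(4096, 4096)                     # fp32 layer: ~67 MB
scale = weights.abs().max() / 127
quantized = (weights / scale).round().to(torch.int8)  # int8 copy: ~17 MB

print(f"fp32 size: {weights.numel() * 4 / 1e6:.0f} MB")
print(f"int8 size: {quantized.numel() * 1 / 1e6:.0f} MB")

# At inference time the int8 weights are rescaled on the fly, with only a
# small loss of precision.
dequantized = quantized.float() * scale
print(f"max reconstruction error: {(weights - dequantized).abs().max():.4f}")
```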
DeepSeek also made its model open source, allowing anyone to integrate it with their own platforms and applications. This week, the RAG search engine Perplexity.ai announced it is now hosting R1 on its own servers to support “deep web research,” free of the censorship restrictions and data collection imposed by the Chinese government.
The ability to run high-performance generative models on ordinary, non-specialized servers or even mobile devices is likely to lead to a proliferation of AI apps and platforms powered by open-source models, greatly accelerating consumer adoption. It also puts the power to create synthetic media and deepfakes with just a few taps in the palm of anyone’s hand, all but guaranteeing a surge in their production.
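How low is the barrier already? Assuming the Hugging Face transformers library and one of the smaller distilled checkpoints DeepSeek has published, running a capable open-source model on an ordinary laptop takes just a few lines of Python:

```python
# Minimal sketch: running a distilled open-source model locally with the
# Hugging Face transformers library. The model ID below is one of the smaller
# distilled R1 checkpoints DeepSeek published; any open checkpoint would do.
# Assumes: pip install transformers torch, plus enough RAM for a ~1.5B model.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)

result = generate("Explain fair use in one sentence.", max_new_tokens=80)
print(result[0]["generated_text"])
```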
Even the process of training lightweight models like R1 could pose new threats to rights owners. The distillation tactic used by DeepSeek is at least one step removed from the scraping, ingesting, and processing of organic copyrighted data used to pre-train models like ChatGPT. That distance could make it harder for rights owners to show direct infringement by the distillers, even where it’s known that infringement occurred somewhere upstream in the training chain.
It could also greatly complicate the case for licensing, at least with respect to distilled models. If a model is not actually making use of protected content, why would it need to be licensed?
DeepSeek R1’s greatest legacy could prove to be a proliferation of highly capable, unlicensed and unlicensable models running on low-cost, consumer-grade hardware.
Not a good look from a rights holder’s perspective.