It seems this was the week for music to take the top spot on the AI crack-up hit parade. StabilityAI, which is currently undergoing its own crack-up featuring the departure of three of its founding researchers followed by the abrupt “resignation” of its CEO, took time out from its internal chaos to release Stable Audio 2.0, the newest version of its AI music generator, featuring enhanced song-creation tools and the ability to generate tracks up to three minutes in length. Several new music AI startups also made their debuts this week, including Sonauto, a Y Combinator graduate whose home page features tracks with “vocals” by Katy Perry, Louis Armstrong, Johnny Cash and Frank Sinatra.
None of those new entrants, however, compares with the jaw-dropping — or deeply unsettling, depending on your perspective — capabilities of the latest version of Suno’s music generator released last month. Suno version 3 can create songs in virtually any style or genre in a matter of seconds in response to a simple text prompt, complete with persuasively human-sounding vocals and lyrics, the latter actually sourced from ChatGPT via API.
The prompt “solo acoustic Mississippi Delta blues about a sad AI,” for instance, produced the following:
Listen to ‘Soul of the Machine’
When Rolling Stone demoed Suno 3 for working musicians last month, their reactions ranged from “Oh, boy” to “Holy shit.”
Unlike Sonauto, Suno will not generate a track if you use a specific artist’s name or the name of an existing track in your prompt. But Ed Newton-Rex, the former VP of audio at StabilityAI who resigned in November over a disagreement with management concerning the use of copyrighted works in training AI models, was able to elicit near copies of known artists’ tracks from Suno by simple workarounds such as misspelling the artist’s name.
To Newton-Rex, those results are a near-certain sign that Suno was trained on copyrighted music including the original recordings he was able to mimic.
Since leaving Stability, Newton-Rex has launched Fairly Trained, a non-profit that certifies AI models that are trained without any unlicensed copyrighted works.
“I was very aware [while at Stability] that there were two radically different approaches to training. One that involved scraping content and claiming it falls under the fair use exception in the U.S., and the other that doesn’t make that claim and is much more respectful of the owners of the content and the creators of the content,” he tells me. “We want to make that division clear.”
In an open letter released this week and signed by more than 200 artists, the Artists Rights Alliance added their own freak out to the mix.
“This assault on human creativity must be stopped,” the letter exclaimed. “We must protect against the predatory use of AI to steal professional artists’ voices and likenesses, violate creators’ rights, and destroy the music ecosystem.”
We weren’t supposed to be here yet. Many technology experts until recently thought fully synthetic music on a par with the current generation of text and image generators was still years off. Audio is far more complex than either text or static images. It’s a wave, and it unfolds over time. Training a model to produce 48kHz audio means encoding and processing 48,000 tokens per second, which would require immense computing capacity unless some technical means are devised to reduce the processing load.
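The scale gap is easy to see with some back-of-envelope arithmetic. A minimal sketch in Python, comparing raw 48kHz audio samples for a three-minute track against the few hundred text tokens its lyrics might occupy (the lyric token count is an illustrative ballpark, not a measured figure):

```python
SAMPLE_RATE = 48_000  # samples per second for 48kHz audio


def raw_samples(seconds: float) -> int:
    """Raw audio samples a model must represent for a clip of this length."""
    return int(SAMPLE_RATE * seconds)


# A three-minute track at 48kHz:
track_seconds = 3 * 60
print(f"{raw_samples(track_seconds):,} samples")  # 8,640,000 samples

# By contrast, the lyrics of a typical song run to a few hundred
# text tokens -- several orders of magnitude less to process.
lyric_tokens_ballpark = 500
print(raw_samples(track_seconds) // lyric_tokens_ballpark)  # ~17,000x more
```

Real audio models shrink this load by compressing the waveform into a much coarser sequence of learned codes before the generative model ever sees it, which is one of the “technical means” referred to above.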
But as with everything about generative AI, it’s later than you think.
ICYMI
Custom Made AI
OpenAI this week announced an expansion of its customized model program and enhancements to its fine-tuning API to help enterprises create models tailored to their industry or use case. “We believe that in the future, the vast majority of organizations will develop customized models that are personalized to their industry, business, or use case,” the company said in a blog post announcing the enhancements. “For any organizations that need to more deeply fine-tune their models or imbue new, domain-specific knowledge into the model, our Custom Model programs can help.” Infusing a model with domain-specific knowledge will, of course, mean fine-tuning it with domain-specific training data. That’s likely to mean that some sources of data will be more valuable for training a particular model than others, and more valuable than any given dataset is for training large foundation models. As discussed here in our previous post, the need for specialized datasets will likely bolster the case for licensing and payment to rights owners, at least at the fine-tuning stage.
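For a sense of what “domain-specific training data” looks like in practice: OpenAI’s fine-tuning API takes training examples as JSONL, one chat-formatted example per line. A minimal sketch in Python, where the music-licensing content is hypothetical and stands in for whatever licensed domain corpus an organization would actually supply:

```python
import json

# Hypothetical domain-specific training examples (music licensing, for
# illustration only). Each record is one chat exchange the fine-tuned
# model should learn to reproduce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a music-licensing assistant."},
            {"role": "user", "content": "Do I need a sync license to use a song in a video?"},
            {"role": "assistant", "content": "Generally yes: a synchronization license from the publisher for the composition, plus a master-use license from the label for the recording."},
        ]
    },
]

# Write the JSONL file that would be uploaded when creating a fine-tuning job.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The resulting file is what gets uploaded and referenced when the fine-tuning job is created; the economic point above is that the value lies in the examples themselves, which is where rights owners come in.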
A Piece of the Action
The movie and television industry is still nursing the hangover from last year’s protracted work stoppages, and it could be on the cusp of another. The studios’ contract with the Hollywood craft unions represented by IATSE expires at the end of July. And as with last year’s contentious negotiations with the writers and actors, the use of AI by the studios is at the center of the contract talks with IATSE members. Also like the writers and actors, the craft guilds want to put some guardrails around AI use, and they want to share in any windfall it brings to the studios. “We want some of the spoils of artificial intelligence,” IATSE president Matthew Loeb said at a rally in Los Angeles last month. This week, IATSE leadership claimed to have “momentum” in their talks with the studios in the wake of the script supervisors of Local 871 reaching a favorable settlement. One down and a dozen or more to go.