A new generative model croons like Elvis and raps like Eminem. It might even make you think you’re listening to a lost demo by the Beatles.
What’s new: OpenAI released Jukebox, a deep learning system that has generated thousands of songs in styles from country to metal and soul. It even mimics the voices of greats like Frank Sinatra.
How it works: Jukebox generates music by drawing from a database of 1.2 million songs. Where some other AI-powered systems use symbolic generators to create tunes, Jukebox uses audio recordings, which capture more of music’s subtleties.
- In working with raw audio, the biggest bottleneck is its sheer size and complexity, the authors write. They used vector quantized variational autoencoders, or VQ-VAEs, to compress the training set to a lower-dimensional space. Then they trained the model to generate audio in this compressed space. Transformers create successively higher-resolution versions of a new song. Finally, a decoder turns that output into audio.
- The researchers paired each song with metadata including its artist, lyrics, and genre. That helps guide the model as it generates made-to-order music in any designated style.
- The model made cohesive music, but it struggled to produce coherent lyrics. To overcome this, researchers added existing lyrics into the conditioning information. It also had a hard time associating chunks of words with musically appropriate passages, so the researchers used open source tools to manually align words with the music windows in which they appear.
- The model requires upward of nine hours of processing to render one minute of audio.
Results: OpenAI released over 7,000 songs composed by Jukebox. Many have poor audio quality and garbled lyrics, but there are more than a few gems. Have a listen — our favorites include the Sinatra-esque “Hot Tub Christmas,” with lyrics co-written by OpenAI engineers and a natural language model, and a country-fied ode to deep learning.
Behind the news: AI engineers have been synthesizing music for some time, but lately the results have been sounding a lot more like human compositions and performances.
- In 2016, Sony’s Flow Machine, trained on 13,000 pieces of sheet music, composed a pop song reminiscent of Revolver-era Beatles.
- The production company AIVA sells AI-generated background music for video games, patriotic infomercials, and tech company keynotes.
- Last April, OpenAI released MuseNet, a music generator that predicts a sequence of notes in response to a cue.
Why it matters: Jukebox’s ability to match lyrics and voices to the music it generates can be uncanny. It could herald a new way for human musicians to produce new work. As a percentage of all music consumed, computer generated music is poised to grow.
We’re thinking: Human artists already produce a huge volume of music — more than any one person can listen to. But we’re particularly excited about the opportunity for customization. What if you could have robo-Beyonce sing a customized tune for your home movie, or robo-Elton John sing you a song celebrating your birthday?