Musicians and filmmakers adopted AI as a standard part of the audio-production toolbox.
What happened: Professional media makers embraced neural networks that generate new sounds and modify old ones. Voice actors bristled.
Driving the story: Generative models can learn from existing recordings to create convincing facsimiles. Some producers used the technology to generate original voices, others to mimic existing ones.
- Modulate, a U.S. startup, uses generative adversarial networks to synthesize a new voice for a human speaker in real time. It enables gamers and voice chatters to inhabit a fictional character, and trans people have used it to bring their voices closer to their gender identities.
- Sonantic, a startup that specializes in synthetic voices, created a new voice for actor Val Kilmer, who lost much of his vocal ability to throat surgery in 2015. The company trained its model on audio from the Top Gun star’s body of work.
- Filmmaker Morgan Neville hired a software company to re-create the voice of the late travel-show host Anthony Bourdain for his documentary Roadrunner: A Film About Anthony Bourdain. The move prompted outrage from Bourdain’s widow, who said she had not given her permission.
Yes, but: Bourdain’s widow isn’t the only one disturbed by AI’s ability to mimic performers. Voice actors expressed worry that the technology threatens their livelihoods. They were upset by a fan-built modification of the 2015 video game The Witcher 3: Wild Hunt that included cloned voices of the original actors.
Behind the news: The recent mainstreaming of generated audio followed earlier research milestones.
- OpenAI’s Jukebox, which was trained on a database of 1.2 million songs, employs a pipeline of autoencoders, transformers, and decoders to produce fully realized recordings (with lyrics co-written by the company’s engineers) in styles from Elvis to Eminem.
- In 2019, an anonymous AI developer devised a technique that allows users to clone the voices of animated and video game characters from as little as 15 seconds of audio and have them speak arbitrary lines of text.
Where things stand: Generative audio models (not to mention video models) let media producers not only buff up archival recordings but also create new, sound-alike recordings from scratch. But the ethical and legal questions are mounting. How should voice actors be compensated when AI stands in for them? Who has the right to commercialize cloned voices of a deceased person? Is there a market for a brand-new, AI-generated Nirvana album, and should there be?