The most important thing about a documentary deepfaking Anthony Bourdain's voice isn't that it happened, but that it happened and almost nobody noticed.
Director Morgan Neville faced skepticism and outright revulsion on social media this month when it was revealed he used artificial intelligence to create a model of Bourdain's voice for 45 seconds of narration in the new documentary "Roadrunner," about the life and 2018 death by suicide of the beloved chef and journalist.
Bourdain's voice was one of his trademarks, known to fans the world over from his TV travelogues "Parts Unknown" and "No Reservations." Fans also loved how authentic he seemed, always able to level with the viewer. Faking his voice, to some, was a step too far.
"In the end I understood this technique was boundary-pushing," Neville said. "But isn't that Bourdain?"
Yet the boundaries have already been pushed far beyond Bourdain's legacy or the mere confines of documentarian ethics. The voice imitation revolution is already here, and artists, technologists and companies in several industries who use the new tech are grappling with the big question of what happens when you separate speech from the speaker.
Need a synthetic voice that can read text for the visually impaired? A human voice actor can't preread every possible sentence in the world, but an AI-built voice could cope. Have a video game that's been in interminable production for years and want to avoid hauling in voice actors for rerecording every time there's a script change? Tweak their dialogue in production.
"There are endless possibilities. We believe this is the CGI of audio," says Zeena Qureshi, co-founder and CEO of Sonantic, a start-up formed in 2018. "We made the first AI that can cry last year. We made the first AI that can shout early this year."
Potential commercial uses abound. Sonantic's website touts AI voices with "STUNNING REALISM, CAPTIVATING EMOTION" that can "deliver compelling, lifelike performances for games and films with fully expressive AI-generated voices." It also promises to "reduce production timelines from months to minutes by rapidly transforming scripts into audio."
Another synthetic-voice company, VocaliD, pitches to potential corporate clients that "the volume and speed with which written content must be transformed into brand-consistent sound bytes cannot be met by traditional voice talent or generic text-to-speech." It imagines scenarios in which "you need audible content fast, but your voice talent isn't available. Don't miss the mark by delivering flat or generic sounding content that doesn't foster the connection you've built with your audience."
A third company, Resemble AI, offers services like voice "cloning" and has short clips of synthetic speech from former President Barack Obama and actors Morgan Freeman and Jon Hamm (though potential clients are forbidden from cloning the voices of celebrities without permission). The company says it can start building a voice clone once it has 50 sentences from a real speaker to synthesize.