Photoshopped Voices, Now With Less Data And More AI

These Lyrebird people haven’t yet reached Adobe VoCo levels of voice fakery, but they’re getting there. And though their aim is to sell their technology to companies whose products include speech synthesis, once it’s widely available, the implications are much the same as VoCo’s.

What you’re listening to is a Lyrebird-generated Donald Trump, Barack Obama and Hillary Clinton talking about the company. Right now they sound quite low quality and computer generated, but with the arguable exception of Hillary, it isn’t hard to figure out which voices those are. And though it may be tempting to write that sample off as digitized garbage and move on, it’s worth keeping in mind that those voices get this close to the genuine article using voice samples under a minute long, as opposed to the 20 minutes required by VoCo.

This is all made possible through the use of artificial neural networks, which are loosely modeled on the biological neural networks in the human brain. Essentially, the algorithm learns to recognize patterns in a particular person’s speech, and then reproduces those patterns during simulated speech.

“We train our models on a huge dataset with thousands of speakers,” Jose Sotelo, a team member at Lyrebird and a speech synthesis expert, told Gizmodo. “Then, for a new speaker we compress their information in a small key that contains their voice DNA. We use this key to say new sentences.”
The end result is far from perfect—the samples still exhibit digital artifacts, clarity problems, and other weirdness—but there’s little doubt who is being imitated by the speech generator. Changes in intonation are also discernible. Unlike other systems, Lyrebird’s solution requires less data per speaker to produce a new voice, and it works in real time. The company plans to offer its tool to companies in need of speech synthesis solutions.
“We are currently raising funds and growing our engineering team,” said Sotelo. “We are working on improving the quality of the audio to make it less robotic, and we hope to start beta testing soon.”
