Software developed by Chinese technology giant Baidu can clone a human's voice with just seconds of training audio, with implications for both trust and security.
Baidu has been working on Deep Voice for over a year, and had already managed to reproduce speaker identities with about half an hour of training data. With new developments, it has lowered that time to 3.7 seconds.
A believable, if low-quality, false voice can now be produced from only a single sentence of speech. Of course, more training leads to higher-quality results, especially if there is more than one sample to learn from.
Baidu has uploaded an array of system demos to GitHub, showing the capabilities of Deep Voice (the single-sample demos, to be fair, range from hilarious to creepy). These include voice cloning and various manipulations, such as changing the voice from male to female or British to American.
While the company writes that ‘Voice cloning is expected to have significant applications in the direction of personalisation in human-machine interfaces’, there are naturally concerns about identity theft.
Tom Harwood, CPO and co-founder at voice security solutions provider Aeriandi, said:
"This technology is poised to transform personalisation in human-machine interfaces, but it raises serious concerns about voice biometric security systems. Soon, criminals will need just a few seconds of someone's voice to cheat a voice recognition security system - voice biometric authentication will be rendered useless.
"Organisations need to be thinking now about how they can implement new technologies to ensure they stay ahead of the curve. Voice fraud detection technology is the primary candidate, as it looks at far more than the user's voice print; it considers hundreds, if not thousands, of other parameters. For example, is the phone number being used legitimate? Has it been used fraudulently before? Increasingly, phone fraud attacks come from overseas. Voice fraud technology has been proven to protect against this as well as domestic threats."
Now that Deep Voice requires only minimal training speech, it could further raise distrust in internet media, echoing the ‘deepfake’ fake celebrity porn videos that began popping up earlier this year.