Microsoft's AI-driven speech recognition system has come even closer to achieving true parity with human listeners.
An announcement from Xuedong Huang, a Microsoft Technical Fellow, reveals that the firm's latest tests show a word error rate of 5.1 per cent, an improvement on the 5.9 per cent previously announced, which was already on a par with a human transcribing casual conversation (and twice that of a Loose Women panellist).
The results are based on the standard Switchboard test, a corpus of recorded telephone conversations which the system being tested must transcribe.
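Word error rate, the metric behind these figures, is conventionally the minimum number of word substitutions, deletions and insertions needed to turn the machine's transcript into the human reference, divided by the number of words in the reference. A score of 5.1 per cent therefore means roughly five errors per hundred words. The sketch below computes it with a plain word-level Levenshtein alignment; the function name is our own, and real scoring tools such as NIST's sclite also normalise the text before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the minimum edits turning the reference words seen so far
    # into the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + mismatch,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Dropping one word from a six-word reference gives 1/6, or about 16.7 per cent. Because inserted words also count as errors, the rate can exceed 100 per cent when a system produces far more words than were spoken.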
Huang said, "We introduced an additional CNN-BLSTM (convolutional neural network combined with bidirectional long-short-term memory) model for improved acoustic modelling. Additionally, our approach to combine predictions from multiple acoustic models now does so at both the frame/senone and word levels."
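Combining predictions "at the frame/senone level" means merging the per-frame outputs of several acoustic models before the decoder picks words. One simple, widely used way to do that is a weighted average of each model's senone posterior probabilities. The sketch below assumes that approach; the function names, weights and toy numbers are hypothetical and illustrative, not a description of Microsoft's actual recipe.

```python
from typing import List, Sequence

# One row per audio frame, one column per senone; each row is a probability distribution.
Posteriors = List[List[float]]


def combine_frame_posteriors(
    model_outputs: Sequence[Posteriors], weights: Sequence[float]
) -> Posteriors:
    """Hypothetical helper: weighted average of several models' per-frame senone posteriors."""
    if not model_outputs or len(model_outputs) != len(weights):
        raise ValueError("need one weight per model output")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    # Normalise so the weights sum to 1; each combined row then stays a distribution.
    norm = [w / total for w in weights]

    num_frames = len(model_outputs[0])
    num_senones = len(model_outputs[0][0])
    combined = [[0.0] * num_senones for _ in range(num_frames)]
    for w, output in zip(norm, model_outputs):
        for t in range(num_frames):
            for s in range(num_senones):
                combined[t][s] += w * output[t][s]
    return combined


def best_senones(posteriors: Posteriors) -> List[int]:
    """Most probable senone per frame (a stand-in for a real decoder, which
    would also weigh these scores against a language model)."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


# Toy example: two hypothetical models, two frames, three senones.
model_a = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
combined = combine_frame_posteriors([model_a, model_b], weights=[0.6, 0.4])
print(best_senones(combined))  # [0, 2]
```

On its own, model_a would pick senone 1 for the second frame; the combination lets model_b's more confident prediction win there, which is the intuition behind pooling several acoustic models.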
Microsoft goes on to explain that the system's language model can now draw on the earlier parts of a conversation to better predict what will be said next, effectively adapting to its topic and context. That is something its current public bots do less well.
The post warns that the community still has much to do, citing noisy surroundings, distant microphones, heavy accents and, more fundamentally, speaking styles and languages for which there is limited training data.
In a week when the BBC has announced a dedicated Pidgin English language radio service, how likely is it that AI will actually be able to speak it, or even recognise it as a valid language distinct from British English?
A further issue is whether computers actually understand the words, contextualising them and attaching meaning. A low error rate is one thing, but to meaningfully interpret speech, computers will have to get better at understanding it. As Huang puts it, "Moving from recognising to understanding speech is the next major frontier for speech technology."
Mozilla, meanwhile, is publicly collecting new voice data to make its speech library more accurate. It then plans to release the audio as an open library, making it easier for smaller businesses and individual coders to incorporate speech data into their projects.