My Irish grandmother used to say there were three signs of madness. First was hairy palms (I am not claiming these signs are scientific in any way). Second was talking to yourself. And third was looking for the hairs on your palms.
However, the next time you see someone standing on a street corner apparently talking to themselves, they are more likely to be in conversation with Apple’s Siri, Microsoft’s Cortana, Google Now, Baidu’s Duer, or Facebook’s M—rather than exhibiting the second sign of madness as defined by my grandmother.
Voice recognition is coming of age and it will completely change the way we interact with all the technology around us.
From the beginning of the modern computing era in the 1950s, the idea of humans interfacing with technology through natural spoken language has been a key ambition. However, this ambition has long been frustrated. Despite computers outperforming humans in other complex tasks, quality, contextual speech interaction with technology has, until very recently, proven difficult.
In the last few years though, due to vastly improved computing power combined with Artificial Intelligence (AI) deep learning algorithms and increased volumes of user data, speech recognition has come along in leaps and bounds. Google, Apple, Microsoft, Baidu, and Amazon are all investing heavily to improve web-wide voice search. Facebook has now entered the fray, launching Facebook M—its AI-powered personal assistant tool that sits inside its Messenger app.
Google announced late last year that it has reduced its speech-recognition error rate to just 8 percent. Compare that to 20 years ago, when Microsoft launched its first ever speech-recognition technology along with Windows 95, and the project lead stated the error rate was almost 100 percent.
Mobile users, particularly 16- to 34-year-olds, are leading the charge in voice recognition, with only 13 percent claiming that they have not used the voice features on their device, and 50 percent saying their frequency of use is growing rapidly. According to comScore, 200 billion searches per month will be done with voice by 2020.
The really interesting development, though, is how quickly speech recognition is moving into the technology in our homes, cars, and everyday lives. Samsung now produces a Smart TV that is 100 percent voice controlled. No more looking for that remote: Switch it on and off, change channel, access apps, and search the web all by voice. LG has a voice-controlled vacuum cleaner. Tell it to clean (and where) and off it goes. And the Vocca Light is a little piece of tech that allows any ordinary light bulb to be voice activated.
More than this though, connective voice technology such as Homey—which connects all your devices in your environment so you can manage them with the sound of your voice, allowing you to adjust your Facebook status, light switches, thermostats, and more from one place—is becoming readily available.
Everyday technology with advanced natural speech interfaces will soon be ubiquitous, and many companies you currently interact with are driving forward this version of the future.
For example, the new Apple TV rejects any apps whose core functionality does not support Siri, and let’s not forget the very interesting Amazon Dash. This little gadget is for Amazon’s Fresh grocery service. It allows you to scan bar codes to replace products, but it also has a mini-microphone. Why scan, when you can just say “milk, eggs, juice” and it adds them to your grocery list, using historical preferences to determine selection?
To me this is the most interesting proposition: the idea that any environment—home, work, car, outdoors—could be populated with mini-microphones through which we interact with an invisible, omnipresent AI (or bundle of AIs) that assists us in our daily lives.
This, however, presumes that these microphones are in continuous listening mode: waiting for our interaction, but also listening to our daily patterns, generating deep insight into who we are and how we like to live our lives, and possibly proactively responding to situations. That has dark overtones.
I prefer to think the potential for voice is along the lines of Star Trek: The Next Generation, where the crew said “computer” (from any location) and generally then asked some pointless question about Klingons, black holes, Q or what was happening, and the benevolent AI would happily tell them the answer.
Kristian Barnes is CEO of Vizeum APAC