“Hey, Alexa … what’s new with voice assistant technology?”
Over the last decade, our understanding of voice technologies has grown and evolved as voice assistants continue to integrate into every aspect of our daily lives. Far more than voices that respond to verbal requests to play your favorite playlist, turn off the lights or report the weather, voice assistant devices now serve as the point of contact between you and many of your connected devices.
Today, voice assistants make our lives easier and streamline our relationship with technology as machines have become better at hearing, recognizing and processing human speech. Much of that evolution can be credited to developments in neural network software and new hardware that allow voice assistant technology to be used in low-power applications.
The History of Voice Recognition Technology
In the 1950s, researchers at Bell Labs built Audrey, a system that could actually recognize speech. But because of its size, its power requirements, its cost to produce and maintain and the fact that it could understand only spoken digits, Audrey lacked mass appeal. By the mid-1980s, some machines had vocabularies of about 20,000 words and could process speech by predicting the most likely result from what they had already interpreted, but they still couldn't automatically adapt to individual speakers. While the 1990s brought broader access to both personal computers and speech recognition technology, speech recognition software still cost well north of $500, which prevented widespread adoption.
Things really took off in the last decade, with significant steps forward in both voice recognition and software intelligence. Apple integrated Siri as the voice assistant in every voice-capable product, IBM's Watson beat Jeopardy! grand champion Ken Jennings by answering questions posed in natural language, and tech giants Google, Microsoft and Amazon all revealed their own voice assistant technologies.
Today, more than 3 billion voice assistants are in use around the globe, and that number is expected to reach 8 billion in 2023. But without the development of neural networks that enable their use in low-power applications, voice assistants could not have become so ubiquitous.
From Plugged In to Portable: The Future of Voice Assistants
From our homes to our cars, as well as in retail, education, health care and telecommunications environments, voice assistants are pretty much everywhere these days. They are digital assistants that use voice recognition, speech synthesis and natural language processing (NLP) to provide a service through a particular application. The earliest voice assistants needed an internet connection so they could constantly stream audio to the cloud, and they had to be plugged into the wall.
However, these devices struggled to recognize when users actually wanted to talk to them, and always-on listening raised privacy concerns. As a result, there was a significant push to move intelligence onto the device and limit how much audio is streamed to the cloud. Voice assistants eventually moved to a model where they still had to be plugged in but streamed audio to the internet only when actually being commanded to do so. The next step was to bring power requirements down so voice assistants could be integrated into smaller devices that didn't need to be tethered to a power source.
The current state of the art is to use a hardware neural network engine that recognizes when the user wants to talk to the assistant and starts streaming audio only when they are actually issuing a command. This makes it possible to use the technology in smaller, lower-power devices that can go anywhere the user goes.
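The gating behavior described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular product works: `WakeWordGate`, the `detector` callback, the 0.8 threshold and the fixed three-frame command length are all hypothetical, and audio frames are represented as strings so the control flow is easy to follow. A real device would run the detector on dedicated neural network hardware and decide when a command ends from the speech itself rather than from a fixed frame count.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class WakeWordGate:
    """Hypothetical sketch: audio stays on the device until a wake word is detected."""

    # Stand-in for the on-device neural network: maps one audio frame to a
    # wake-word confidence score between 0.0 and 1.0 (an assumption, not a real API).
    detector: Callable[[str], float]
    threshold: float = 0.8    # hypothetical decision threshold
    command_frames: int = 3   # hypothetical fixed command length, in frames
    uploaded: List[str] = field(default_factory=list)  # stand-in for the cloud stream

    def process(self, frames: Iterable[str]) -> List[str]:
        remaining = 0  # frames left to stream for the current command
        for frame in frames:
            if remaining > 0:
                # Command in progress: this frame would be streamed to the cloud.
                self.uploaded.append(frame)
                remaining -= 1
            elif self.detector(frame) >= self.threshold:
                # Wake word heard locally: open the stream for the next frames.
                remaining = self.command_frames
            # Otherwise the frame is discarded on the device and never leaves it.
        return self.uploaded


if __name__ == "__main__":
    # Toy detector that only "hears" the wake word in frames labeled "alexa".
    toy_detector = lambda frame: 1.0 if frame == "alexa" else 0.0
    gate = WakeWordGate(detector=toy_detector)
    frames = ["music", "chatter", "alexa", "turn", "off", "lights", "chatter"]
    print(gate.process(frames))  # ['turn', 'off', 'lights']
```

Only the three frames following the wake word reach the stand-in upload list; the background music and chatter before and after are dropped locally, which is the privacy and bandwidth benefit of keeping detection on the device.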
Neural Network Software — and Hardware
Neural networks are sets of algorithms, loosely modeled after the human brain, that are designed to interpret data and recognize patterns. Today, much of that work runs in the cloud on backend servers at Google, Amazon, Microsoft and elsewhere, helping us cluster and classify real-world data.
However, the industry is also pushing more voice recognition out of the cloud and onto the device. Doing so means devices don't need to be tethered to wires or even a wireless network for much of their functionality. Plus, moving more of the language processing and servicing out of the cloud and onto the device addresses privacy concerns that are becoming increasingly important to people.
With voice, there are two things you want from your neural network:
- When you say the wake word, you want it to notice and switch into the mode where it’s going to service your command.
- When you’re not saying the wake word, you don’t want it switching into that mode.
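These two requirements pull against each other: a detector that wakes up more readily misses fewer wake words but also triggers by accident more often. The sketch below, a hypothetical helper named `error_rates`, measures both failure modes for a given decision threshold, assuming the detector produces one confidence score per utterance. The scores in the usage example are made-up toy values, not measurements from any real device or model.

```python
from typing import List, Tuple


def error_rates(samples: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (false_reject_rate, false_accept_rate) for a wake-word threshold.

    Each sample is (detector_score, wake_word_was_spoken). Hypothetical helper
    for illustration only.
    """
    positives = [score for score, spoken in samples if spoken]
    negatives = [score for score, spoken in samples if not spoken]
    # First requirement violated: wake word spoken, but the device did not wake.
    false_rejects = sum(1 for score in positives if score < threshold)
    # Second requirement violated: no wake word, but the device woke up anyway.
    false_accepts = sum(1 for score in negatives if score >= threshold)
    frr = false_rejects / len(positives) if positives else 0.0
    far = false_accepts / len(negatives) if negatives else 0.0
    return frr, far


if __name__ == "__main__":
    # Toy (score, wake_word_spoken) pairs, invented for illustration.
    samples = [(0.95, True), (0.70, True), (0.85, True),
               (0.10, False), (0.60, False), (0.82, False)]
    for threshold in (0.5, 0.8, 0.9):
        frr, far = error_rates(samples, threshold)
        print(f"threshold={threshold}: false rejects={frr:.2f}, false accepts={far:.2f}")
```

In this toy data, raising the threshold from 0.5 to 0.9 drives false accepts to zero while false rejects climb, which is exactly the tension between the two bullets above.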
Today, the same software model a developer would run on a software-based neural network engine in the cloud can now be put in a chip for use in a smartphone or in a device that fits in a user’s ear.
Thanks to this development, voice assistant technology has evolved to the point where, as we see with Apple's AirPods, the device can sit in your ear without being plugged into a power source. It is always listening locally, but it streams voice audio to the cloud only when given a command. In short, always-on, low-power neural networks can now listen for wake words in voice assistants, a job that used to require several different technologies because none of them scaled up or down particularly well.
At Cardinal Peak, we have deep experience with neural networks, as well as with voice capture, detection/recognition and response applications. We know the space, and we can work with you to refine your product requirements and turn them into a manufacturable, shippable design. We are an Alexa Voice Service (AVS) and Alexa Mobile Accessory (AMA) solution provider, and we're prepared to help you bring your voice-enabled product to market.
From capturing to recognizing and responding, voice assistants are changing the world around us. After all, your voice is their command.
To discover how we deliver best-in-class connected and standalone voice solutions that meet the challenges of noisy and acoustically difficult environments, get in touch with us!