The year 2018 has been dubbed the year of voice technology. The sheer number and diversity of the demos at this year’s CES, the buzz on tech sites, and the rapid uptake of voice-activated assistants like the Google Home and Amazon's Alexa all point to the coming boom of this UX revolution.
This significant growth is due to the fact that voice technology fits human behaviour so well; voice interactions are the most intuitive, natural and efficient way to communicate for a small, but growing, number of tasks. Voice is the only form of communication that can be used while our eyes, hands, or whole bodies are engaged in some other activity, like driving, exercising, getting dressed, or doing household chores. According to a recent report from Accenture, nearly half of people in the US are already using voice-activated digital assistants on their smartphones or tablets, and ownership of standalone digital assistants, like Google Home and Amazon Echo, is expected to double in 2018. As of last month, both assistants are available in Australia, predominantly as 'smart speakers' sold at major electronics retailers, and it is only a matter of time before the technology is available in everything from microwaves to cars, and from TVs to mirrors.
Speech recognition is still imperfect, but it has now reached an acceptable level of accuracy for most consumers, with all major platforms reporting an error rate of under 5%. With recognition accuracy improving, today’s challenge lies more in the design of the conversation itself, which is deceptively difficult to codify in a way that will be perceived as high-quality, natural, and user-friendly.
The fact is, natural conversation is hard - it's complex and subtle. Humans do an incredible job of communicating verbally without realising how hard it is, particularly with what we call 'error recovery' - clarifying misunderstandings, mis-statements, and false starts and stops. We communicate unclearly all the time, but in human-to-human interactions we are able to reach a common understanding via complex conversational strategies that we learnt in the schoolyard and now take for granted. For automated systems, the challenge of high-quality voice design is achieving anywhere near this level of error detection and recovery - not only for speech recognition errors, but also for the mistakes, misunderstandings and 'mis-speaks' that humans themselves regularly make. As we say in the industry, the 'happy path' is easy; quality comes from the handling of the 'non-happy paths', which is about 80 percent of the work.
It may surprise some, but automated voice assistance is not a brand-new field: VUI (Voice User Interface) design has been a small but vibrant specialist field for more than two decades. Companies like Salmat have long worked in this space, originally designing and developing sophisticated voice-driven technology for call centres. This work has ranged from natural language understanding of responses to "how can I help you?" on a telephone call, through to complex multi-step interactions such as taking an order for multiple pizzas, drinks and side orders.
For businesses, it’s an opportune time to join the voice revolution, but it is up to brands to make the relationship with their audience mutually beneficial by designing and delivering a smooth experience that is useful, or at least entertaining or brand-building. When done well, the technology gives brands a platform to communicate credibly and gives customers a low-friction channel to engage with. Our own research found that almost half (46%) of consumers are willing or excited to use an in-home voice assistant to interact with and shop from brands. Google recently reported that 62% of Google Home users plan to make a purchase through their speaker over the coming month, while 58% use theirs to create a weekly shopping list.
With this in mind, many savvy local brands, like NAB, Australia Post and Woolworths, are already strategising and experimenting with voice technology. They are exploring how their products can fit seamlessly into smart voice ecosystems, and what level of integration will work best to reach, convert and serve customers. At Salmat, we are experienced in voice design, and our team of developers is working on a number of internal and external voice and conversation projects to help brands reach, convert and serve their customers.
There is little doubt that voice assistants will affect consumers' day-to-day behaviours - and quite likely their purchasing habits. Australian businesses therefore need to consider carefully how they approach the technology with respect to brand messaging and business objectives, while consciously focusing on the user experience. The quality of conversation design and practical implementation is ultimately the differentiator in the voice-activated virtual assistant space.
We are still some distance from realising the full potential of speech recognition technology. This applies both to the sophistication of the technology itself and to its integration into our lives. It's likely that, over time, customers will begin to expect, even demand, this convenience from their brands - and brands that are not on voice will effectively be silent when their customers want to interact on this new channel. Voice is the new frontier, but as always, design matters.
By Salmat senior technical lead of speech solutions, Peter Nann