Words are the source of misunderstandings

Nowadays, Voice User Interfaces (VUIs) seem to be popping up everywhere. We find them in mobile phones, televisions, smart homes and a whole range of other products. With rapid advances in this sector and in smart home technology, it is safe to say that voice interaction will only become more common.

The way users interact with VUIs is vastly different from the way they interact with the more common graphical ones. The main and most obvious difference is that, unlike in a graphical user interface (GUI), you cannot create visual affordances to guide users through the flow. Without visual cues, users are left in the dark about what they can expect from the interaction.

To create great voice interactions, you need to understand how people naturally communicate with their voices, as well as the fundamentals of voice interaction. Since a VUI cannot fully live up to users’ expectations of a natural conversation partner, it becomes even more important to design it so that it contains the right amount of information and handles those expectations elegantly. To help with this, I will draw on guidelines inspired by Amazon’s best practices for creating voice skills for Alexa.

Guidelines for designing VUIs

Provide users with information about what they can do. Present the available options by proactively asking questions. When doing so, always give the user an easy way to exit and cancel the request.

Where am I? Users can quickly lose track of where they are in the interaction, or inadvertently activate a feature, because they are understandably ‘running blind’. To ease these issues, answer in full sentences, such as “Today’s weather forecast is mostly sunny and dry”, rather than just “sunny and dry”.

Express intentions. With a voice system, the risk of ambiguity increases, because the speech recognition system may correctly recognize the words a customer says yet be unable to map them to an appropriate action or response. For example, suppose a customer asks “Is it nice outside?” The answer depends on mapping the question to a weather intent, knowing the customer’s location, and delivering an appropriate forecast, while making sure you answer the specific question asked. The bare forecast, “it’s 23 degrees today,” is a weaker response than what a human might say, such as “Yes, it’s warm and sunny.”
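As a rough illustration of answering the question that was asked rather than reading out raw data, the Python sketch below covers only the last step: it assumes the question has already been mapped to a weather intent and the customer’s location resolved to a forecast. The `Forecast` type, its fields, the condition words and the temperature thresholds for “nice” weather are all hypothetical choices for illustration, not part of any Alexa or weather API.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    # Hypothetical fields; not taken from any real weather API.
    temperature_c: float
    condition: str  # adjective form, e.g. "sunny", "cloudy", "rainy"


# Assumed thresholds for what counts as "nice"; a real skill would tune these.
NICE_CONDITIONS = {"sunny", "clear", "partly cloudy"}
MIN_NICE_C = 18.0
MAX_NICE_C = 28.0


def answer_is_it_nice(forecast: Forecast) -> str:
    """Answer "Is it nice outside?" with yes/no first, then the details."""
    warm_enough = MIN_NICE_C <= forecast.temperature_c <= MAX_NICE_C
    pleasant_sky = forecast.condition in NICE_CONDITIONS
    temp = round(forecast.temperature_c)

    if warm_enough and pleasant_sky:
        return f"Yes, it's warm and {forecast.condition}, about {temp} degrees."
    if not pleasant_sky:
        # The sky condition is the main reason it is not nice.
        return f"Not really, it's {forecast.condition} right now, about {temp} degrees."
    # The sky is fine, so the temperature must be outside the assumed range.
    feel = "cold" if forecast.temperature_c < MIN_NICE_C else "hot"
    return f"Not really, it's {forecast.condition} but quite {feel}, about {temp} degrees."
```

For a sunny 23 degree forecast, the sketch replies “Yes, it's warm and sunny, about 23 degrees.”, leading with the yes or no the customer asked for instead of only the temperature.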

Limit the amount of information. When users browse visual content or lists, they can go back to information they overlooked or forgot. Spoken content does not allow this, so keep sentences and information brief so that users do not become confused or forget items on a list. It is recommended that you list no more than three options for an interaction. If there are more, group them or ask users whether they would like to hear more options.
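A minimal Python sketch of the three-option rule follows. It assumes the options are plain strings and that the skill keeps track of which group of options (`page`) the user is currently hearing; the function name, the prompt wording and the “cancel” exit phrase are illustrative assumptions, not any voice platform’s API.

```python
from typing import List

MAX_OPTIONS_PER_PROMPT = 3  # the suggested upper limit from the guideline


def build_option_prompt(options: List[str], page: int = 0) -> str:
    """Return a spoken prompt that lists at most three options.

    `page` selects which group of three to read out. If more options remain
    after this group, the prompt offers to read them rather than listing
    everything at once; otherwise it asks for a choice and offers an exit.
    """
    start = page * MAX_OPTIONS_PER_PROMPT
    chunk = options[start:start + MAX_OPTIONS_PER_PROMPT]
    if not chunk:
        return "There are no more options. You can say 'cancel' to stop."

    # Join as natural speech: "a", "a or b", "a, b or c".
    if len(chunk) == 1:
        listed = chunk[0]
    else:
        listed = ", ".join(chunk[:-1]) + " or " + chunk[-1]

    prompt = f"You can choose {listed}."
    if start + MAX_OPTIONS_PER_PROMPT < len(options):
        prompt += " Would you like to hear more options?"
    else:
        prompt += " Which one would you like? You can also say 'cancel'."
    return prompt
```

For five music genres, page 0 reads out the first three and offers more, while page 1 reads the remaining two and offers the easy exit recommended in the first guideline.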

Use visual feedback. Where possible, use some form of visual feedback to let the user know that the system is listening. Users get frustrated when they are unsure whether the voice user interface has registered that they are, in fact, trying to interact with it. Amazon’s Echo Dot, for example, handles this with a blue light that swirls around the top rim of the device, signalling that Alexa is listening.


In conclusion, to design great VUIs, you must find an elegant way to give users the missing information about what they can do and how they can do it, without overwhelming them. You must also manage the expectations users bring from everyday conversation: tell them what they can do and which functionality they are using, show them how to express their intentions in a way the system understands, keep sentences brief, and provide visual feedback so that they know when the system is listening.

Compared with graphically based systems, voice interaction poses challenges in other areas too. However, it is fair to say that it will improve as the technology becomes more integrated into our day-to-day lives.


