Over the last few weeks, I’ve learned a lot about how to design for a Voice User Interface (VUI).
One of the most important parts of designing a VUI is the writing. Since the main interaction is through voice, it’s important to make the conversation with Alexa sound as natural as possible. Without formal scriptwriting training, it can be hard to write a conversation that doesn’t sound a little weird. One thing I learned from watching the webinars is how the developers practice writing a skill. The first thing they did was roleplay, where one person pretended to be the user and the other pretended to be the Alexa device. They also wrote out short scripts that played out different scenarios. I tried the scriptwriting method when writing my own skill, and it did help me work out how I wanted Alexa to respond to certain types of questions and make guesses about how a user might interact with her. However, I still wasn’t able to anticipate everything a user might say.
To get closer to that, I’ve learned that you really need to do user testing with someone who wasn’t involved in writing the skill. It’s impossible to guess every single thing a person might say or every way a person might phrase a question, so user testing is really important for filling in those blanks.
I think it’s really important to write the skill so that users can easily figure out what it can do and how to ask for it. However, it’s not as easy as just front-loading the commands in the introduction of the skill. Really involved skills can contain a lot of commands, and a user won’t necessarily want to listen to an entire list of them, especially after they’ve used the skill for a while and already know all the commands.
Another thing the developers advised in the webinars was to actually say things out loud. This is useful for a couple of reasons. Sometimes, just reading the dialogue you’ve written out loud can help you realize how awkward it sounds. We all have conversations of some kind or another pretty much every day, so we can usually tell when something is off when we hear it. If we just write it down and never read it out loud, we might never notice the oddness, because we never get the audio cue.
It’s also important to read the skill name/trigger phrase out loud to see how hard it is to say or how weird it sounds. When I created my first skill, I named it “Magic Tool.” However, it sounded very similar to an existing Alexa skill called “Magic Door.” Other people gave their skills names that were too long or awkward to say out loud. If people can’t say the skill’s name, or they can’t pull the skill up consistently because the name is too hard to say, they’ll stop using it.
Another lesson that was repeated a few times is that a voice skill has to make things easier: it has to add something to a user’s experience. If a person finds using the voice skill more inconvenient than just turning to their phone, tablet, or computer, then maybe there’s no need to create a voice skill. Even though voice is the exciting new thing, it might not translate well to every app or task.
Originally, I was a little skeptical about owning a voice assistant. I couldn’t really see myself using one for more than the novelty of it, and it seemed like a bit of an investment to make it really functional, since you’d have to buy new lights and electronics that link to the device.
However, I now find myself using it practically every day. I like playing Jeopardy (even though I usually only get half the clues correct on a given day) and Question of the Day. I wake up and get my daily weather and news updates. I’ve even started using it to do really simple things, like setting timers when I’m cooking and adding things to my shopping list, and I’ve learned that it’s really nice not to have to stop what I’m doing to do fairly tedious things. I’ve also started to think about how helpful it could be to someone who is less mobile.
There are some things that are still less than ideal about the experience. I’ve found that using Alexa to control my Roku is usually more trouble than just picking up the remote. For example, by the time I’ve gotten out the command to have Alexa pause whatever I’m watching, I’ve already missed at least 30 seconds of dialogue. Rewinding is problematic for the same reason, but it’s made worse because I have to repeat the command over and over to rewind for any significant amount of time; for some reason, Alexa only rewinds a few seconds at a time. I think this is a case where a skill was designed that isn’t necessarily useful. I wonder if the voice functions on the Fire TV are handled better.
For now, I think I’ll keep using it, and I might even invest in some smart lights at some point. I look forward to seeing how the technology grows.