On this episode of Product Hacker, we’re chatting with Arcweb Head of Design, Len Damico, a user experience (UX) expert. We’re going to revisit the voice assistant conversation but, this time, from the UX perspective. Len will share his point of view on voice user interfaces. What limitations do household voice applications have? Are they doing what consumers want? If not, where does the gap lie?
Kurt Schiller [00:00:01]: Welcome to Product Hacker, the Arcweb business innovation podcast. We bring you the latest from the world of business innovation from emerging technologies to game-changing ideas. Product Hacker connects you with the people and concepts that are changing the face of business. I’m your host, Arcweb Head of Marketing, Kurt Schiller.
Kurt Schiller [00:00:21]: If you’re in Philadelphia and work in the healthcare field, check out our new meetup, “Healthcare Experience Engineers”. We’ll be exploring the design and implementation of healthcare experiences from a user and patient-centric point of view. You can also check out our recent white paper about healthcare interoperability and patient experience. We’ll put the link in the episode description.
Kurt Schiller [00:00:41]: After years of waiting and use cases that are so narrow that they’re basically just proofs of concept, voice interfaces are finally here. They’re in your car, on your phone, in your house. And it seems like they’re not going away anytime soon. We previously covered some of the products and technology challenges of voice. And now we’re back to talk about the other side of the equation. User experience.
Kurt Schiller [00:01:01]: Joining me today is Arcweb Head of Design, Len Damico. Len, welcome to the show.
Len Damico [00:01:05]: Thank you.
Kurt Schiller [00:01:06]: So, I want to start this off by prying into your personal life a little bit. Do you use any voice interfaces in your day to day life?
Len Damico [00:01:13]: I do. We have a number of the Amazon products in the home currently, and I’ve been using Siri since whenever my first iPhone was, which was, I think the four maybe.
Kurt Schiller [00:01:23]: Do you find them overall good or annoying or?
Len Damico [00:01:26]: When they’re good, they’re really, really good. But when they don’t work, they’re really, really frustrating.
Kurt Schiller [00:01:32]: In our pre-show discussions, we talked about the sudden and kind of rapid adoption of voice interfaces. Now, I think you compared it to touch screens a little bit where there was this sudden success and suddenly they were everywhere. Is that — is the same thing happening now with voice?
Len Damico [00:01:47]: I think so. I think we’re seeing a lot of mimicry without really understanding the specific use cases that went into driving the adoption of touch screens. For instance, touch screens allow you a lot more flexibility in terms of real estate with a physical product. Then we all of a sudden started seeing lots of laptops and such with touch screens. It’s like, well, now you still have the keyboard there. I’m starting to see a lot of that with voice interfaces, where we’re just adding them into products just because we can, rather than thinking about what are the specific constraints that voice interfaces are meant to solve.
Kurt Schiller [00:02:19]: So you mentioned not keeping in mind these specific constraints that make voice work or not work. What are some of those constraints that kind of come to mind as being either prerequisites or things to avoid or things to consider when you’re deciding about implementing voice?
Len Damico [00:02:33]: There are a couple of different ways to think about voice. There is the input aspect of voice, which is the user speaking to a system and asking it for something. And then there’s the output of voice, which is the system telling someone something using voice. With respect to input, voice is great when the user knows exactly what they want and exactly how the system thinks about it.
Len Damico [00:02:55]: So for instance, if we were to ask Alexa to play you some music or to say, “What’s the weather today?”, that’s a very well-defined use case. And that’s an area where voice can shine, whereas where voice struggles is when the user doesn’t know exactly what they want and has to review a list of options. And you can think of some of your own experiences with, say, a phone tree where the option you want is option 7, but you’re obliged to listen to options 2 through 6 to get there.
Kurt Schiller [00:03:21]: In terms of needing to know a specific thing that you want, well, one thing that I’ve noticed that I think is interesting is that the marketing around these voice interfaces often seems to be marketed around things that are actually not a good use case for it. So, you know, there’s commercials where someone is shown asking Alexa, “Hey, Alexa, tell me some facts about gorillas.” And that’s a very vague inquiry. But, in practice, you know, it seems like you’re much more likely to want to say, Alexa, play me this specific song. Do you think there’s any sense of like a user expectation mismatch or the things about voice that seem cool to us aren’t what they’re actually good at?
Len Damico [00:04:00]: That feels to me like an example of something that engineering has decided voice can do well, so let’s expose this out to our users. So we have a finite universe of facts about gorillas, for example, and it’s easy enough for us to return those to users. And “haha”, it gets a laugh. Whereas if we’re thinking about voice from the standpoint of how much value this drives for a user, it’s probably not where you’d start. But it’s an interesting proof of concept.
Kurt Schiller [00:04:25]: It certainly seems like the sort of thing that fascinates us. Like it’s what we ultimately want our futurology-minded relationship with computers to be or this concept of computers as people. Which certainly seems to be the direction that Apple and Amazon have been going in terms of presenting these interfaces to us. They’re not just interfaces, they want us to think about them as people. Is that something that we see reflected in the design of these tools? Kind of the personification of them?
Len Damico [00:04:57]: The personification, sure. And that’s one thing that I personally struggle with in my own interactions with voice UIs. I would prefer to not think about them as people because they do not– they do not meet the criteria for personhood. And so often they fall into the uncanny valley where the human mind starts to get icked out about the idea of this is coming close to human, but it’s just not all the way there.
Kurt Schiller [00:05:20]: Yeah, I’ve certainly had instances where I became angry at the voice interface and I yelled at it and then I felt bad for yelling at it.
Len Damico [00:05:28]: I’ve found myself– I just say please and thank you a lot. And I found myself that when I say those words to the voice interfaces, it often confuses them more than it does acknowledgments. It’s just– it’s just noise they throw away rather than signal.
Kurt Schiller [00:05:41]: You used an interesting analogy again when we were kind of doing the background for this episode where you likened voice commands to a book of spells that we have to learn. Can you elaborate on that a little bit?
Len Damico [00:05:54]: I’ve noticed that a lot. As much as we’d like to think that voice UIs are a very open greenfield area for users, that users can just talk to these machines and they’ll understand, there’s still a very finite list of commands that these systems will understand. And it really is like pulling incantations out of a spellbook. Like asking for something in a specific order to get the appropriate album played from Spotify, for instance, or not understanding commas in a band name and things like that. There is definitely an order that the system expects the command to come in. And when you vary from that even slightly, you are disappointed, which is one of the clearest ways that these systems are still different from interacting with an actual human.
Kurt Schiller [00:06:41]: You alluded to two specific, I think, UX conventions in a roundabout way, one of which is discoverability. Obviously, you know, when we look at a visual interface, we can immediately take in all this different information about it. We see that, okay, there’s ten buttons, and that suggests there’s ten things that I could do. Short of literally giving a user a list of “Here’s 10 commands, learn them”, how do you make a discoverable voice interface?
Len Damico [00:07:07]: I think the ultimate best practice there is to worry less about discovery and to make your system as broad as possible with respect to the things that it will understand.
Len Damico [00:07:18]: And not just that, but then taking in input that it can’t understand and matching it up against what the system can actually do. So those types of things like, “I heard this. Did you actually mean this?” And that’s a way that you can sanely expose lots of other functionality to the user in a way that’s not onerous.
Kurt Schiller [00:07:36]: So, again, kind of using that emotional or conversational cue to lead someone back to the correct behavior for the system to understand without being like, “Here’s the ten things I understand, please say one of them”. And I think you alluded there to feedback as well, which there was a pretty well-publicized news story a few weeks ago about a couple that accidentally sent a– or I should say Alexa accidentally sent a recording of their private conversation to a contact in their contact list because Alexa thought they said “message this person” and then heard them or thought it heard them say, “Okay, recording”. And they didn’t receive– and they didn’t notice any feedback from that interaction. And then it sent a 12-minute audio recording of them. Fortunately, they were just talking about like home remodeling or something.
Len Damico [00:08:26]: Right.
Kurt Schiller [00:08:26]: In terms of a voice interface giving good feedback, are there any best practices that we can learn from other mediums or from other human experiences?
Len Damico [00:08:36]: In that case, I’m familiar with the case study you’re referencing. I struggle with it honestly, because more or less the device behaved exactly as intended. It wasn’t a case of it being hacked or doing something nefarious. It thought it heard what it thought it heard. So in terms of how we can do a better job of designing user interfaces for these voice devices that protect our users from things like that, I think it comes down to things like smart defaults and not having things like the messaging component turned on by default. Make that a very purposeful and intentional thing that a user has to do. And there are other opportunities for you to say, let’s put the user in a sandbox as much as possible until they really understand the system and understand its power and get a sense of what it can actually do for them.
Kurt Schiller [00:09:22]: Is there something about voice that is uniquely suited for performing a very wide range of tasks versus something like a visual interface?
Len Damico [00:09:32]: Sure, sure. If we get the voice recognition to be competent enough that it can recognize a wide range of inputs, then yes. I think voice is also really great for that last mile type of stuff. When I hear people talk about their Alexa devices, one of the first use cases they present is that it’s great for kitchen timers, and it’s great for that last mile type of thing where you are almost all the way into the kitchen, but you aren’t quite all the way there and you want to be able to shout something from across the room. There are a couple of different ways you could do this, but let’s use voice since that is the most convenient, rather than pulling out your phone and setting a timer or going to the stove timer.
Kurt Schiller [00:10:09]: But at the same time, I think there’s an interesting UX constraint there where I’ve certainly been upstairs and an alarm that I set on my Alexa downstairs starts going off and I want it to stop. And I find myself trying to shout down the stairs to it. If this were just on my phone, I would just be able to take my phone out and stop it. But because it’s tied to a physical location and generally doesn’t move, it places kind of additional constraints on the user to think about, Okay, I’m in this room. The device is in that room. Maybe it can hear me. Maybe it can’t. Maybe it will misunderstand me.
Len Damico [00:10:41]: I find it very interesting, as a home that also has a multi-Alexa setup, that it never works the way I think it will. Whenever I want it to be considering my voice throughout the entire house, it’s only constrained to the one location. And whenever I want it to be constrained to the one location, it’s acting throughout the house. So that very much feels like an unsolved problem still.
Kurt Schiller [00:11:00]: So earlier you mentioned lists specifically as being something that voice is not great for. Do you think that there is a specific kind of list of use cases or types of user experience that voice is just not well-suited for, or is it a case of we just don’t– we just haven’t figured out how to do that effectively yet.
Len Damico [00:11:19]: I think it’s still– I think there’s still a lot of learning to be done. I think lists are generally bad. I think things where ambiguity is possible. For instance, browsing a catalog of things that have a lot of sub-options, for instance, and there’s not a whole lot of differentiation between them. That’s obviously a struggle for a voice interface system.
Kurt Schiller [00:11:40]: So we talked a lot about bad or aggravating voice experiences for users, especially in terms of kind of the spellbook approach. Are there other things that you’re seeing out in the wild that voice is attempting but simply isn’t working or things that are common user frustrations or shortcomings that are kind of low hanging fruit we can just stop doing, as people, you know, designing products?
Len Damico [00:12:06]: I think the lowest hanging fruit is really just an opportunity for restraint for those of us who are actually building the products. For instance, not just doing voice because we can. There are so many cases of someone just forcing a voice interface onto something that just doesn’t need it.
Len Damico [00:12:22]: I think that is where you start to see a lot of the bad experiences when we’ve just decided we have to do voice because we have to do voice.
Len Damico [00:12:30]: And Amazon has done 80 percent of the work for us. Well, what about that? Let’s say Amazon hadn’t done 80 percent of the work for you. Would you still be interested in exploring this voice interface?
Kurt Schiller [00:12:40]: Could also be that maybe designers should look to take advantage of existing interfaces and infrastructure for voice instead of trying to make their own that’s specialized to that. So it seems to me that one of the things that Alexa has done so well is it allows lots of different people making lots of different products to kind of all tap into the same one interface and conventions for their voice. So it becomes a much more organic experience of learning it versus here’s this new product, learn all these voice commands for it.
Len Damico [00:13:11]: Yeah, certainly. It’s just like anything else in user interface. When you are trying to create new conventions, you’re always going to have bumps at the beginning and for a while you may be better served to tap into existing conventions because people have already used them and they’re not going to feel those pain points of learning how to interact with the device when they are interacting with your particular product and they won’t take that out on you as a result.
Kurt Schiller [00:13:35]: So you mentioned again on our kind of pre-show discussions that there’s a level of abstraction present in a graphical user interface that isn’t there in voice. Could you talk a little bit more about that?
Len Damico [00:13:47]: Sure. One of the reasons that voice feels so right when it’s right and so frustrating when it’s wrong is that we’ve been, as humans, talking basically since we came out of the womb. We just are used to interacting with the world through our voice, whereas touching things and then a system doing something and returning feedback, that’s not as natural an experience. So we’re a little bit prepared to put up with more frustration or less obviousness in those types of interfaces. Whereas when we say, “Could you please hand me that glass of water?”, we expect a human to do that. When we are tapping a soda fountain to say, “Please pour me this drink”, that’s a little bit of an abstraction and that’s a little bit of a different experience as opposed to asking for something with your voice.
Kurt Schiller [00:14:35]: Yeah, we definitely expect a very high fidelity of comprehension in most scenarios where we’re using voice currently, which is basically just talking to people. You know, if someone is not able to understand, it’s immediately frustrating and seen as this, you know, as a very big problem, whereas with the voice interface, you know, you’re very likely to keep encountering, “Sorry, I don’t know how to do that” or “Sorry, I’m not sure what you said, could you say it again” or something I’ve encountered a lot with voice interfaces is just a nonspecific failure where I’ll ask for something and it’ll say okay, and then nothing happens. And then I think there’s something uniquely aggravating about that because you don’t actually even know what went wrong or what to do differently.
Len Damico [00:15:20]: It’s just like any other system. Failing silently is among the worst things you can do because you as a user don’t know what to do next.
Kurt Schiller [00:15:27]: So I think we’ve covered a lot of ground already. But I did have one thing that I wanted to ask before we wrap up. I originally had a strange experience where I encountered an outdated voice interface, one that was about 10 years old. Really, before the advent of, I guess what I would say is the modern era of Siri- and Alexa-type interfaces. It made me think about how these interfaces will age over time versus a visual interface and kind of where the overall medium is going.
Kurt Schiller [00:15:59]: Are there ways that we can design voice interfaces to be timeless? Or is it just going to become the norm that we’ll understand oh well, this is an old voice system and so it probably doesn’t work very well.
Len Damico [00:16:11]: I think my answer would be pretty similar to that for any other user interface, which is: design it well and it will age well. So if we are allowing the interface to be useful, if we are not trying to follow trends and be cute in the user interface but provide value, if the interface works well, it will age well. And that’s the same for a digital product and same for an iPhone app, same for a web system and same for voice UI.
Kurt Schiller [00:16:42]: Awesome. Well, thanks so much for joining us today, Len. I really appreciate it.
Len Damico [00:16:46]: It was a pleasure. Thank you.
Kurt Schiller [00:16:49]: Thanks for joining us today. Next up, we’ll be doing a mini-episode about consumer giants like CVS, Amazon and Apple entering the healthcare space and what that means for product people. As always, Product Hacker is brought to you by Arcweb Technologies, a digital design and development firm in Old City, Philadelphia. To learn more, visit Arcwebtech.com. Product Hacker is produced by Martin Schneider and hosted by me, Kurt Schiller. Don’t forget to like and subscribe and see you next time.