
MEALS WITH MANDY
Designing a recipe finder skill for Alexa
My Role
VUI Designer
Main Goal
Make meal preparation easier with a voice assistant.
PROJECT OVERVIEW
PURPOSE
Making a meal can be difficult in today’s busy world. To help solve this problem, I've designed an Alexa skill that allows users to browse several recipes, then select one and have Alexa assist with the meal preparation. This project is part of the Voice User Interface Design with Amazon Alexa course.
TEAM
1 VUI Designer
TIMEFRAME
12 weeks
SOFTWARE
Alexa Skills Kit
AWS Lambda
Google Suite
DELIVERABLES
- User persona
- System persona
- User flows
- Scripts
- Skill code
FUNCTIONAL REQUIREMENTS
In order to provide the best experience, the following functional requirements need to be met.
- At least 10 different recipes with at least 3 steps each
- Breakfast, lunch, dinner, and snack recipe buckets
- The ability for users to choose their preferred meal
- The ability to have Alexa select a recipe for a user
- The ability to ask for a different recipe if the user doesn’t like a suggestion
- Confirmation for the chosen recipe
- A strategy for when users miss a preparation step or need something repeated
- A way to check whether users are ready to move on to the next step
- A response for an unsupported utterance
- A custom help message relevant to your skill
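The first two requirements are concrete enough to check automatically. Below is a minimal sketch of such a check, assuming each recipe is stored as a dictionary with `name`, `meal_type`, and `steps` keys. That layout and the function name `validate_catalog` are illustrative assumptions, not the skill's actual data model.

```python
# Hypothetical catalog check; the recipe layout is an assumption for illustration.
REQUIRED_MEAL_TYPES = {"breakfast", "lunch", "dinner", "snack"}
MIN_RECIPES = 10
MIN_STEPS = 3


def validate_catalog(recipes):
    """Return a list of problems; an empty list means the catalog passes."""
    problems = []
    if len(recipes) < MIN_RECIPES:
        problems.append(f"need at least {MIN_RECIPES} recipes, found {len(recipes)}")
    for recipe in recipes:
        if len(recipe["steps"]) < MIN_STEPS:
            problems.append(f"{recipe['name']} has fewer than {MIN_STEPS} steps")
    # Every meal bucket must contain at least one recipe.
    missing = REQUIRED_MEAL_TYPES - {recipe["meal_type"] for recipe in recipes}
    for meal_type in sorted(missing):
        problems.append(f"no recipes in the {meal_type} bucket")
    return problems
```

Running a check like this whenever recipes are added would catch a thin catalog before it reaches users.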
PERSONAS
When designing for voice, there are two sides to consider. It's important to define both the user and system personas.
I interviewed 5 people about their cooking habits and based the user persona on those interviews. I created the system persona to complement it, focusing on what I thought the people I interviewed would find useful and enjoyable.
Just like designing for a digital interface, these personas will help keep us grounded when it's time to write the dialog.


USER STORIES & FLOWS
With personas identified, it's time to consider instances and context in which users would interact with the skill.
I started by developing user stories that mapped back to the functional requirements listed above. Then I grouped them into broader categories to determine the most common/critical intents. From there, I developed user flows to demonstrate how the users and system would interact during each intent. To ensure the flows make sense, I added sample dialog for a point of reference. This helped me build a solid foundation for what the user would say and what the system would say.
CHOOSE MEAL TYPE INTENT

Corresponding user stories:
- As an amateur cook, I need meal suggestions to help me get started.
- As a nurse with an inconsistent work schedule, I need easy recipes so I can quickly make meals in between shifts.
- As someone who is on the go, I need meals and snacks that can last throughout the day.
SEARCH BY INGREDIENT INTENT
Corresponding user stories:
- As a picky eater, I want to filter recipes by ingredients so that I only get recipes with ingredients I like.
- As someone who likes to host, I want a variety of recipes so I can cater to my guests’ tastes and dietary restrictions.
- As a budget shopper, I want flexible recipes so my groceries don’t go to waste.
SAVE RECIPE INTENT

Corresponding user stories:
- As a busy professional, I need a way to save recipes so that I can access favorite recipes when I’m short on time.
- As a picky eater, I want to revisit recipes that are tried and true.
- As someone who likes to host, I want go-to recipes that I know my guests will enjoy.

SCRIPTS
Using the flows and sample scripts as a guide, I began to flesh out the prompts and responses in the form of a script.
Scripts are comprehensive and serve as the definitive source of truth for engineers and quality assurance on the voice application. Since they act as the high-fidelity, pixel-perfect deliverable, it's important to be organized. For the purpose of this assignment, I wrote scripts for the user flows outlined above, as well as some global intents.
I used a simple spreadsheet to document the contents of my script. I organized it by intent and housed each intent on a separate tab, where respective prompts, responses, logic, and system actions are detailed. Each tab contains sections that represent all the states of that intent. All utterances that lead to the state, as well as the system’s responses, are defined here.

The most important thing is to create a dialog that feels natural. In order to do so, I employed the following strategies in my script:
Echo User Information
When the user gives dynamic information, such as a date, a time, or in this case, a meal type, I wanted the system to read that information back to the user as a way of confirmation. Letting the user know they are on the right path not only provides reassurance, but allows opportunities for error recovery early on.
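As a rough illustration of echoing, the sketch below reads the meal type back in the next prompt. The prompt wording and the function name `echo_meal_type` are hypothetical and are not taken from the actual script.

```python
# Hypothetical prompt builder; the wording is illustrative only.
SUPPORTED_MEAL_TYPES = ("breakfast", "lunch", "dinner", "snack")


def echo_meal_type(slot_value):
    """Read the user's meal type back to them before moving on."""
    meal_type = slot_value.strip().lower()
    if meal_type not in SUPPORTED_MEAL_TYPES:
        # Unrecognized value: list the options so the user can recover early.
        return "Sorry, I didn't catch that. Would you like breakfast, lunch, dinner, or a snack?"
    # Implicit confirmation: echo the value, then keep the dialog moving.
    return f"Okay, {meal_type}. Want me to pick a recipe for you?"
```

If the system misheard "lunch" as "brunch", the user finds out on the very next turn instead of several steps into a recipe.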

Encourage Optimal Phrasing
One way to help users be successful in their voice interactions is to use prompts as a way to model good behavior. In this instance, the system is teaching the user the best way to enter the next state.

Earcons
Earcons are the equivalent of icons and help users understand meaning more quickly than by reading or listening to words. They should be used sparingly and consistently for the same action throughout the experience so that users can learn what they mean. I used an earcon to indicate the very end of an interaction.

Variety
Adding multiple variations of the prompts or responses is a good way to make the dialog feel fresh and natural. Although they give the user the same information, the variety will make the dialog feel less robotic over multiple uses.
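One simple way to serve variations is to choose a prompt at random while avoiding the one used most recently. The sketch below assumes that approach. The prompt texts and the names `RECIPE_OFFER_VARIANTS` and `pick_variant` are made up for illustration.

```python
import random

# Hypothetical variations that all carry the same information.
RECIPE_OFFER_VARIANTS = [
    "How about {recipe}?",
    "I think you'd like {recipe}. Want to try it?",
    "Let's go with {recipe}. Sound good?",
]


def pick_variant(variants, last_used=None, rng=random):
    """Pick a prompt variation, skipping the most recent one when possible."""
    candidates = [v for v in variants if v != last_used]
    if not candidates:
        # Only one variation exists, so repeating it is unavoidable.
        candidates = list(variants)
    return rng.choice(candidates)


template = pick_variant(RECIPE_OFFER_VARIANTS)
speech = template.format(recipe="a veggie omelet")
```

Tracking `last_used` keeps the same wording from appearing twice in a row, which is when repetition is most noticeable.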

Error Handling
In cases where there are no matches, the way we handle the error is critical in providing a good experience. Since this particular skill won’t be able to accommodate all ingredients and recipes right from the start, I’ve decided to let the system be transparent and acknowledge when there are no matches.

In the case of no input from the user, the system should reprompt.
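Both cases can be handled in one place: the response carries a transparent no-match message when nothing is found, plus a reprompt to play if the user stays silent. The sketch below uses a simplified response dictionary rather than the real Alexa response format. The names, prompt wording, and the `recipes_by_ingredient` lookup are assumptions.

```python
# Hypothetical reprompt text for when the user says nothing.
NO_INPUT_REPROMPT = "Which ingredient would you like to cook with?"


def search_by_ingredient(ingredient, recipes_by_ingredient):
    """Build a simplified response for an ingredient search."""
    key = ingredient.strip().lower()
    matches = recipes_by_ingredient.get(key, [])
    if matches:
        speech = f"I found {matches[0]}, which uses {key}. Would you like to make it?"
        reprompt = f"Would you like to make {matches[0]}?"
    else:
        # Be transparent about the gap and offer a way forward.
        speech = (
            f"I don't have any recipes with {key} yet. "
            "What other ingredient would you like to try?"
        )
        reprompt = NO_INPUT_REPROMPT
    # Keep the session open so the reprompt can play if there is no reply.
    return {"speech": speech, "reprompt": reprompt, "end_session": False}
```

The no-match branch still ends with a question, so the user always has a clear next move.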

Ability to Repeat
Since this skill will be presenting a lot of new information to users, most likely while they are cooking, I thought it would be helpful to offer the ability to repeat a prompt in case they missed a step. By supporting “repeat,” the skill allows the user to remain in control. While the "repeat" prompt doesn’t have to be a word-for-word repetition, it should contain all the relevant details.
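A common way to support "repeat" is to remember the last thing the system said and replay it on request, for example when Alexa's built-in AMAZON.RepeatIntent is triggered. The sketch below assumes a plain dictionary stands in for session storage. The function names and wording are illustrative.

```python
def speak(session, speech):
    """Record the outgoing speech so it can be repeated later."""
    session["last_speech"] = speech
    return speech


def handle_repeat(session):
    """Replay the last prompt with a short acknowledgement in front."""
    last_speech = session.get("last_speech")
    if last_speech is None:
        return "There's nothing to repeat yet. What would you like to make?"
    # Return without calling speak() so repeated requests don't stack "Sure."
    return f"Sure. {last_speech}"


session = {}
speak(session, "Step two: whisk three eggs in a bowl.")
repeated = handle_repeat(session)
```

The acknowledgement makes the reply feel conversational while the step itself is preserved in full.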

Memory
Part of creating a personal and natural interaction comes from remembering information about a user. A novice user, for example, will require introductory prompts to orient them to the basic functionalities. These novice prompts provide just enough information to get started and explain enough to ensure a user knows how to continue.
Power users, on the other hand, will require a technique called tapering to ensure that they don’t receive information they no longer need. Tapering typically makes the regular prompt even shorter. The logic and rules around implementation vary by experience. In this particular example, I decided to taper the prompt after the first use, but the system could serve the novice prompt again if the user has not opened the skill within 30 days.
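The tapering rule described here can be sketched as a single decision based on when the user last opened the skill. The prompt texts and names below are hypothetical, and treating exactly 30 days as still "within 30 days" is an assumption about the boundary.

```python
from datetime import datetime, timedelta

NOVICE_WINDOW = timedelta(days=30)

# Hypothetical prompt wording for illustration.
NOVICE_PROMPT = (
    "Welcome to Meals with Mandy. I can suggest a recipe for breakfast, lunch, "
    "dinner, or a snack, and walk you through it step by step. "
    "Which meal are you making?"
)
TAPERED_PROMPT = "Welcome back. Which meal are you making?"


def welcome_prompt(last_opened, now):
    """Choose the novice or tapered welcome based on the last visit."""
    if last_opened is None:
        # First use: orient the user to the basic functionality.
        return NOVICE_PROMPT
    if now - last_opened > NOVICE_WINDOW:
        # Long absence: the user may need the introduction again.
        return NOVICE_PROMPT
    return TAPERED_PROMPT
```

In practice `last_opened` would need to be stored somewhere that persists between sessions, which this sketch leaves out.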

Context
Context plays a big role in making a voice experience more natural. One way that the design can leverage context is by maintaining the current or last-known status of the system so that it can properly interpret the next turn. In this instance, if the user asks to save a recipe upon completion, the system should remember and understand which recipe is being referred to without having to clarify with the user.
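A minimal sketch of that behavior follows, assuming the current recipe is kept in a session dictionary under a `current_recipe` key and saved recipes live in a simple list. Both are illustrative choices, as are the function name and the wording.

```python
def handle_save_recipe(session, saved_recipes):
    """Save the recipe the user was just cooking, using session context."""
    recipe = session.get("current_recipe")
    if recipe is None:
        # No context available, so the system has to ask.
        return "Which recipe would you like to save?"
    if recipe not in saved_recipes:
        saved_recipes.append(recipe)
    # Echo the recipe name so the user can confirm the right one was saved.
    return f"Okay, I saved {recipe} to your favorites."


session = {"current_recipe": "veggie omelet"}
favorites = []
reply = handle_save_recipe(session, favorites)
```

The clarifying question only appears when the context is missing, so the common case stays a single turn.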

USABILITY TESTING
Using the scripts, I tested the conversational aspect of the skill using the Wizard of Oz method.
It's a good idea to assess the utterances, prompts, and responses early on in the process. Similar to testing a visual interface, there are different test methods at various stages of fidelity. Since we are at low fidelity at this point, I leveraged the Wizard of Oz method to gauge usability and voice interactions. In this method, a “Wizard” (me) plays the part of the system and guides the user through a simulated experience of the VUI being designed.
Objectives
- Are utterances comprehensive enough to meet user needs?
- Are prompts clear and direct?
- Are users overwhelmed by the amount of information?
- Are there instances where there is too much information or not enough information?
- Are users able to recover from error easily?
- Are there any system prompts or responses that are unexpected?
Methodology
The skill was tested in person in each user's kitchen to provide a representative environment. In lieu of having a second researcher, test sessions were recorded (audio and video) for later reference. I acted as the “Wizard” and presented users with a mix of direct and scenario-based tasks. After each task, we discussed debrief questions to capture users' immediate reactions.
Tasks & Scenarios
- Scenario 1: It’s almost time to head out for work, but you want to pack something for lunch. How would you know what to make?
- Scenario 2: You’re in need of inspiration for dinner, but you’re too tired to make anything elaborate.
- Direct Task 1: Ask the system to repeat a step.
- Direct Task 2: Ask for a brunch recipe.
Test Results
Issue 1: The system does not support ingredient substitutes.
Users find out which ingredients they need after they have chosen a recipe. Users are unsure which ingredients are acceptable to substitute or if they should just choose a new recipe. This skill is not always well positioned to enable users to make a quick meal on the fly; it’s better for planning ahead.
Issue 2: Utterances were not comprehensive for some intents.
Users had various ways of asking for information. Sometimes it took a couple of tries to successfully invoke the intents. This provided a lot of great insight into additional utterances to consider. We need to add utterances for ChooseTypeIntent, IngredientsIntent, and DifferentRecipeIntent.
Issue 3: A voice-only interface can be hard to follow.
Users felt that there was a missed opportunity for some visuals. While they appreciated being hands-free, some wanted visuals to reinforce that they were cooking something correctly or to validate if what they created looked right. Because most of these users are amateur cooks, it’s important for them to have a visual guide and a benchmark to compare their results to.
Conclusion
Overall, users were able to get through the core functionality with limited errors. They felt the prompts were direct and were always able to make a decision based on the system prompt. The language was easy to understand and there wasn’t really an instance of cognitive overload. The big takeaway is that users would have appreciated a multimodal experience to help clarify certain steps and to make cooking more interesting. They cited things like Tasty videos or BuzzFeed blogs, which are quite rich in visuals and/or sounds that make the experience more engaging and friendly. Visuals would also help users assess at a glance whether this is indeed a recipe they want to make and, if not, give them a chance to bail out sooner. With a voice-only interface, users don’t have the chance to take a sneak peek at a recipe to see if they want to commit to it.
LEARNINGS
When designing a voice interface, there are important considerations beyond the dialog, such as accessibility and privacy.
This project helped me understand and notice what makes a conversation more human, natural, and personalized. Even smaller details like using contractions or conversational markers can make the dialog sound more fluid and keep it moving forward. Many tenets of digital design also translate to voice design, such as providing just-in-time instruction or collecting information one piece at a time to maintain forward progress without inducing cognitive load or confusion.
Designing for voice brings about different challenges relating to accessibility, privacy, and safety. I learned that while a voice interface can be more accessible than a digital one, there are still hurdles to consider for users with speech impairments or heavy accents. When it comes to protecting users' rights to privacy, understanding policies and being thoughtful in presenting sensitive information go a long way. As a designer, it's important to keep informed and be aware of the issues that might arise in order to make good decisions that keep users and their data secure.