Conducting Better UX Research for Conversation Design: Measuring Chatbot Success

Overview

Conducting UX research on a chatbot while architecting it is a lot like studying the engine of a car as you’re building it. Still, testing early and often is just as important for chatbots as it is for other digital products. Here is an in-depth look at how I used UX research methods to improve Máximo, our dinosaur chatbot for the Field Museum, along with some tips for using the same methods to improve your own conversational agents.

Use a variety of UX research methods to inform your agent’s scope

Because setting expectations with users is so important to a conversation’s success, keep the question of “What do users need to know from my bot?” in mind at all times. UX research can help you determine which decisions your bot can help users make and which motivations need to be triggered for them to make those decisions. Máximo, our educational chatbot for the Field Museum, had an open-ended scope of knowledge; after all, a 101-million-year-old titanosaur is a bit of a blank slate. I used a variety of methods to identify what he needed to know.

Stakeholder interviews

Stakeholder interviews with key museum team members helped surface the top scientific facts that the team wanted Máximo to be able to share with users. We finalized these topic areas (Máximo’s size, his Cretaceous world, his discovery, and a few others) as business requirements for our agent and later validated them through on-site user testing.

Sentiment analysis

I conducted sentiment analysis on user tweets from a period when Máximo was tweeting from the museum’s Twitter account, to learn how users interacted with his personality. This helped us anticipate constraints on Máximo’s knowledge areas: it made sense for him to understand his Cretaceous world thoroughly, for example, but not whether Soldier Field has available parking.

Competitive analysis

Competitive analysis can provide a benchmark for how well your bot needs to perform compared to agents with a similar purpose. It can also inform you of common affordances and pitfalls.

I tested other conversational agents, like National Geographic’s Tina the T. rex and the Westworld chatbot, in order to understand how successful these bots were at answering my questions, how they dealt with error prevention, and what I might do to improve their experiences. Though these agents were entertaining, I saw opportunities for improvement on conversation depth, fallback control, and variety of responses.

Test chatbot content early and often with users that match your hypothesized personas

We all struggle with bias in UX research; you never know your users as well as you think you do. Chatbots (and voice assistants) push researchers toward compassionate UX design more than other products do, simply because they are literally an engine of user intent. Unlike with other products, you know exactly what your users need from your chatbot, because they’re typing it! In this way, your chatbot is an incredible tool for UX research in itself.

For Máximo, I ran quick sessions with PRPL employees who were unfamiliar with the chatbot project early on to test Máximo’s conversational framing. I quickly found that their conversations failed because they weren’t sure what to ask him. This motivated me to incorporate stronger framing for what Máximo could answer in his welcome greeting.

Since users were going to be texting Máximo through our mobile web and SMS clients, I designed a study to be conducted on the museum lobby floor, with target numbers of participants from demographic groups that matched typical museum traffic, including:

Families with toddler- to elementary-school-aged children
Families with middle-school-aged children
Families with high-school-aged children
Millennial adults
Older adults

Results from this testing helped us re-prioritize content that we had previously flagged for the backlog. Our results also informed us that users weren’t sure how to summon the iOS keyboard on mobile, which led to the inclusion of instructional text in the chat window.

Quantitative methods can help improve chatbot accuracy

One of the most important drivers of a satisfying chatbot experience is how accurately the agent responds to users. Sampling user sessions from a live chatbot and tracking bot performance over time is essential. We used Dialogflow analytics to study how Máximo has responded to users since his soft launch. If you’ve used Google Analytics before, you’ll recognize that Dialogflow’s analytics tools include a conversation flow that looks very much like GA’s behavior flow. The conversation flow is helpful in exploring how users move through an ecosystem of agent intents.

In addition to studying insights from Dialogflow, we were able to sample our training data and rate our chatbot’s accuracy for ourselves. This matters because the natural language processing engine reports a response “failure” only when the agent triggers the fallback response, or error response. Máximo can therefore still be wrong when he matches a user’s question to an intent other than the one designed for it, even though Dialogflow marks that exchange as “matched.”
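To make the gap between platform-reported failures and hand-rated accuracy concrete, here is a minimal Python sketch. It is not the rating process used for Máximo: the `Turn` record, the sample turns, and the intent names are hypothetical. It also assumes the fallback intent keeps the name "Default Fallback Intent", so substitute whatever your agent calls it.

```python
from dataclasses import dataclass

# Assumed name of the fallback intent; change it to match your own agent.
FALLBACK_INTENT = "Default Fallback Intent"


@dataclass
class Turn:
    """One user message, the intent the agent matched, and a reviewer's rating."""
    user_text: str
    matched_intent: str
    rated_correct: bool  # hand rating: did the reply actually answer the question?


def reported_success_rate(turns):
    """Share of turns that did NOT hit the fallback intent (what analytics counts as matched)."""
    if not turns:
        return 0.0
    return sum(t.matched_intent != FALLBACK_INTENT for t in turns) / len(turns)


def rated_accuracy(turns):
    """Share of turns a human reviewer judged to be answered correctly."""
    if not turns:
        return 0.0
    return sum(t.rated_correct for t in turns) / len(turns)


# Hypothetical sample: the third turn is "matched" but answers the wrong question.
sample = [
    Turn("How big are you?", "size", True),
    Turn("Where were you found?", "discovery", True),
    Turn("What did you eat?", "size", False),
    Turn("Is there parking?", FALLBACK_INTENT, False),
]

print(f"reported success: {reported_success_rate(sample):.0%}")
print(f"rated accuracy:   {rated_accuracy(sample):.0%}")
```

In this made-up sample, the mismatched third turn counts as a success in the first number but not in the second, which is the difference hand rating is meant to expose.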

To get a true sense of how well Máximo was doing, I took a sample of 50 user sessions at soft launch and 50 sessions at public launch, then compared the results after training Máximo. By organizing user session accuracy into Airtable, we were able to identify some valuable insights on his performance.

During the week of his soft launch, Máximo was consistently accurate across short, medium, and long conversations. This surprised us, since we had hypothesized that he would be more accurate in shorter conversations. After two weeks of training, we found that Máximo’s accuracy for long conversations had actually increased to 79%. It’s possible that over time and with a larger sample size, Máximo would prove more accurate in short conversations, since users don’t present him with as many questions. It’s also possible that users who engage with Máximo for short conversations tend to ask more conventional questions, which are easier for him to answer.
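A comparison like this can be sketched in a few lines of Python once each sampled session has been hand-rated. This is an illustration only: the bucket thresholds (`short_max`, `medium_max`) and the sample ratings are hypothetical, since the article does not state where the cutoffs between short, medium, and long conversations fell.

```python
from collections import defaultdict


def length_bucket(n_turns, short_max=3, medium_max=8):
    """Label a session by its number of user turns (thresholds are assumptions)."""
    if n_turns <= short_max:
        return "short"
    if n_turns <= medium_max:
        return "medium"
    return "long"


def accuracy_by_length(sessions):
    """sessions: list of sessions, each a list of booleans (True = reply rated correct).

    Returns the share of correctly answered turns per length bucket.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ratings in sessions:
        if not ratings:
            continue  # skip sessions with no rated turns
        bucket = length_bucket(len(ratings))
        correct[bucket] += sum(ratings)
        total[bucket] += len(ratings)
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


# Hypothetical hand-rated sessions: one short, one medium, one long.
rated_sessions = [
    [True, False],
    [True, True, True, True, False],
    [True] * 9 + [False],
]

for bucket, accuracy in accuracy_by_length(rated_sessions).items():
    print(f"{bucket}: {accuracy:.0%}")
```

Pooling turns within each bucket, rather than averaging per-session scores, keeps a single very short session from swinging the result; either choice is defensible as long as it stays the same between the soft launch and public launch samples.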

[Figure: Máximo Soft Launch Quick Stats]

Listen to your users

One of the great advantages of building chatbots is that you’re creating a direct line to your customers’ motivations. By incorporating UX research methods early and often in chatbot product design, you can build a bot that meets user expectations, engages with users accurately, and hopefully inspires them to engage with your brand more fully than they did before.

Illustration by Marie Wohl.