Tips on building chatbot experiences for delightful conversation

A Dinosaur Chatbot for the Field Museum

A conversational chatbot experience should be many things: delightful, roundabout, surprising, and organic. However, it’s probably safe to say that most of us have spoken with a chatbot agent or voice user interface and experienced none of these things. For an agent that’s meant to be exploratory and conversational, if it doesn’t sound human, then it’s just not...fun.

So when our partner the Field Museum in Chicago approached us to help build a chat personality for their newest—and biggest—dinosaur addition, Máximo the Titanosaur, we saw an opportunity to create a remarkable conversational experience. With visual and messaging support from partner Leo Burnett, the Field Museum did an amazing job creating a truly lovable character for Máximo; his voice evolved through the Field’s social media work (congrats on that Webby award, Field Museum!). I worked directly with the museum, mapping conversation logic, testing and translating user needs into chatbot intents, setting up the chatbot management system, and creating the chat interfaces that Máximo uses to converse with users.

Here are some of the most impactful things I learned bringing Máximo back to life.

Framing the conversation reduces user frustration

At their core, chatbot conversations can be relatively simple: a user query is matched against example “training phrases,” which trigger an “intent” response from the chatbot agent. You can think of intents as the things your chatbot can do, or similarly, the questions it can answer. An in-depth look at how to identify, test, and prioritize your bot’s intents based on user needs is outside the scope of this article; however, educating users on what they should expect of a bot absolutely is.
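To make that concrete, here’s a minimal sketch of the query → training phrase → intent relationship. The structure, names, and wording below are illustrative; they aren’t tied to any particular bot platform’s API.

```python
# A minimal, platform-agnostic sketch of an intent definition.
# All names and phrases here are illustrative, not production content.
favorite_food_intent = {
    "name": "favorite_food",
    # Training phrases: example queries the NLP engine learns to match.
    # A real user query like "what do u eat??" should still match these.
    "training_phrases": [
        "What do you like to eat?",
        "What's your favorite food?",
        "What did titanosaurs eat?",
    ],
    # The response(s) the agent gives when this intent is matched.
    "responses": [
        "Ferns! I ate hundreds of pounds of plants every day to get this big.",
    ],
}
```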

The goal for all chatbot conversations is to maximize the rate at which the bot correctly detects a user’s intent. For conversational bots, there’s no specific purchase path to guide the conversation the way there would be for a transactional bot. As a result, conversational chatbot experiences become a subtle balance of leading users through a maze of desired intents without making it seem like you’re dictating the conversation. It’s important to expose what your bot can do up front, and just as important to ensure it’s listening for the ways that users make those intents known.

In this earlier iteration, Máximo’s framing is unclear. As a user, I’m not sure what Máximo knows about from his welcome greeting alone. Giving the user the bounds of what your agent knows about sets the tone for the entire conversation.

In initial user testing, we found that we didn’t frame Máximo’s conversational abilities quite well enough. As a result, we got a lot of left-field questions that resulted in unmatched, or “fallback,” responses. Once we framed the conversation with what Máximo could and couldn’t do, we got user queries that were much more aligned with what we wanted them to ask.
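One lightweight way to frame the conversation is to put the agent’s main topics directly into the welcome response, paired with suggestion chips that map to intents the bot actually serves. The copy and topic list below are hypothetical, not Máximo’s actual welcome message.

```python
# A hypothetical welcome intent that frames what the agent can discuss.
# Both the copy and the suggestions below are illustrative.
welcome_intent = {
    "name": "welcome",
    "responses": [
        "Hey there! I'm Máximo, a 101-million-year-old titanosaur. "
        "Ask me about my life in Argentina, what I ate, or how big I am!"
    ],
    # Suggestion chips nudge users toward intents the bot is trained on,
    # which cuts down on left-field questions and fallback responses.
    "suggestions": [
        "Where are you from?",
        "What did you eat?",
        "How big are you?",
    ],
}
```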

Grammatical listeners lead to organic conversation

I wanted Máximo to be able to handle as much spontaneous human conversation as possible. Sure, he needed to handle his Cretaceous knowledge areas with the wisdom of a 101-million-year-old titanosaur, but I wanted to make conversing with this gentle giant as delightful and organic as possible, even when he was dealing with things he didn’t know much about. This included situations like:

  • Ambiguous pronouns in follow-up responses
  • Organic user reactions to Máximo responses
  • The range of random requests and assertions that users might make to a dinosaur

As you might imagine, this quickly became difficult to manage. I discovered early on that I could only go so far with framing. Random user queries were still there: delightful, frustrating, and impossible to keep up with. In the beginning, I routed that randomness to fallback answers whenever the natural language processing (NLP) engine couldn’t match an intent on its own. With every user testing session, however, I grew more and more frustrated that Máximo could only respond with a variant of the same fallback response.

After initial user testing, I hypothesized that the key to managing this randomness was to create grammatical listeners. Using Dialogflow as our bot management system, I designed these to pick up the “gray area” of clauses that were a) common patterns of conversation in English and b) some of the most frequently used sentence constructions found in our user testing sessions. Here are just a few examples:

“Why” clauses
  • The “why” clause is designed to detect queries like: “Why is the sky blue?” or “Why do you think SUE likes Jeff Goldblum so much?” It’s not intended to detect a phrase like “Why are you so big?” since we have a specific response for that.

Máximo’s personality helps him deflect a “why” clause question with ease. He really, really, really likes eating ferns. I would definitely ask him about it sometime.

“I think” clauses
  • The “I think” clause is designed to detect a variety of user opinions, such as, “I think that is a stupid idea,” or “I think the non-canon Old Republic content in the Star Wars Expanded Universe is superior to anything post episode VI,” but not “I think you are an awesome friend,” since we also have a specific intent for that.
“Surprise-related” clauses
  • For these, I used NLP-powered sentiment analysis to group like phrases of positive user emotion. I then provided on-brand responses to users expressing surprise at Máximo’s earlier statements (a sketch of this grouping follows below).

The user phrase “oh wow” helps to trigger Máximo’s “surprise” intent.
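Here’s a rough sketch of how that grouping might work. Máximo’s production bot relied on Dialogflow’s NLP; the code below substitutes TextBlob’s sentiment scoring as a stand-in, and the marker list and threshold are assumptions you would tune against real transcripts.

```python
# Sketch: route surprised-sounding phrases to a "surprise" intent.
# TextBlob stands in for the NLP engine's sentiment analysis
# (pip install textblob). Markers and threshold are illustrative.
from textblob import TextBlob

SURPRISE_MARKERS = ("oh wow", "whoa", "no way", "amazing", "incredible")

def is_surprise(user_text: str) -> bool:
    """Return True if the phrase should trigger the surprise intent."""
    lowered = user_text.lower()
    # Either an explicit surprise marker...
    if any(marker in lowered for marker in SURPRISE_MARKERS):
        return True
    # ...or a strongly positive sentiment score (polarity is -1.0 to 1.0).
    return TextBlob(user_text).sentiment.polarity > 0.5

print(is_surprise("Oh wow!"))  # True, via the marker check
```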

The listeners allow for precise responses while remaining delightfully flexible. If a more specific instance of a clause exists, the query matches that intent; if it doesn’t, it gets swept up by the listener.
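In Dialogflow, this precedence falls out of how training phrases and intent priorities are configured; the sketch below fakes it with plain string and regex matching just to show the matching order. All intent names and phrases are made up for illustration.

```python
import re

# Specific intents always win over generic clause listeners.
# Names and phrases below are illustrative, not the production set.
SPECIFIC_INTENTS = {
    "why are you so big": "size_story",
    "i think you are an awesome friend": "friendship",
}

CLAUSE_LISTENERS = [
    (re.compile(r"^why\b"), "why_clause"),          # "Why is the sky blue?"
    (re.compile(r"^i think\b"), "i_think_clause"),  # "I think that's a stupid idea"
]

def match_intent(user_text: str) -> str:
    normalized = user_text.lower().strip(" ?!.")
    # 1. A specific intent beats any grammatical listener.
    if normalized in SPECIFIC_INTENTS:
        return SPECIFIC_INTENTS[normalized]
    # 2. Otherwise, listeners sweep up common clause patterns.
    for pattern, listener_name in CLAUSE_LISTENERS:
        if pattern.search(normalized):
            return listener_name
    # 3. Anything else falls through to the fallback intent.
    return "fallback"

print(match_intent("Why are you so big?"))   # size_story (specific wins)
print(match_intent("Why is the sky blue?"))  # why_clause (listener)
```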

Linguistic ambiguity creates conversational forgiveness

Just as keywords power specific ad groups in a sound SEM strategy, agent responses should be as comprehensive as the phrases that power them. Your agent shouldn’t just provide entertaining responses to a few niche phrases; it should be equally entertaining when responding to a grab bag of user queries with an on-brand, ambiguous response.

For example, it’s not realistic to expect Máximo to have a response to every single Chicago attraction that he might enjoy. Admitting that Máximo hasn’t had a chance to try everything Chicago has to offer isn’t a cop-out; it’s the truth! (Also, he is certainly far too large to fit through most doors.)

In this way, you can use your bot’s character to respond as they would, without expecting them to be more perfect than a human. Do YOU know everything people ask of you? Have grace with your agents, and allow them ambiguity in their responses. User testing revealed that users generally didn’t abandon conversations when Máximo was ambiguous with them; in fact, a clever, ambiguous response usually prolonged the conversation.

It’s okay for your agent to admit defeat! If users still see something of their own question in the agent’s response, they’ll see it less as a failure and more as an aspect of the agent’s personality.

“Mistakes are design opportunities”

My high school art teacher kept this quote framed on the wall. I probably think about it at least three or four times every day. Your bot will use its fallback response even more frequently. There is no avoiding this.

Take heart, though. An unmatched intent is an opportunity, depending on how you word your agent’s fallback responses. These are arguably some of your most powerful responses, so it is important to consider the following when crafting them:

  • Avoid making them dead ends
  • Ensure you have many varieties
  • Use them to pivot users into new threads of conversation

Avoid seeing fallbacks as failures and leverage them as opportunities to keep users engaged, as in the sketch below.

Máximo doesn’t have a follow-up for this query, but mentioning a fun fact about when Stegosaurus and T. rex lived keeps this conversation from going extinct.
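Here’s what those guidelines might look like in code: a small pool of varied fallbacks, each ending with a pivot into a thread the bot can actually handle. The wording is invented for illustration; it isn’t Máximo’s production copy.

```python
import random

# Illustrative fallback pool: varied, never a dead end, and each one
# pivots the user toward a conversation thread the bot can handle.
FALLBACK_RESPONSES = [
    "Hmm, that's beyond even my 101 million years of wisdom. "
    "Want to hear what I ate to get this big?",
    "You've stumped this titanosaur! Ask me how I compare to SUE the T. rex.",
    "I'm not sure about that one, but I could tell you about my home in "
    "Argentina. Interested?",
]

def fallback_response() -> str:
    # Rotate through varieties so repeated misses don't feel robotic.
    return random.choice(FALLBACK_RESPONSES)
```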

You can learn a lot by reading, but you learn the most from users

None of the above insights would have been possible without frequent user testing. Whether testing was conducted in-person at the museum, remotely, or in-office, the sessions included diverse audiences that helped us assess the content of user responses and the usability of the web and SMS chatbot interfaces. Without the testing we conducted, we would not have anticipated the huge effect that proper framing would have on conversation. Similarly, I would not have anticipated the need for grammatical listeners without studying the patterns from these user conversations.

Whether you’re designing a conversational agent, a checkout flow, or a landing page for a new product launch, you can’t expect to know your users without listening to them. Test your agent thoroughly up front so that it can be the best conversational experience for your users, and the best user-sentiment-gathering tool for you. If your chatbot can listen to your users better than you can, it won’t just talk with them, it’ll connect in a way that delights them and provides you with priceless insight.

Text Máximo at 70221 or send him a message online at fieldmuseum.org/máximo.