Jenny Chan 陳詠欣
Oct 6, 2017

Teaching clueless chatbots the art of conversation

The gap between human expectations of chatbots and their actual capabilities means this field has a lot of development still to come.

Air New Zealand may have made some of the most memorably wacky in-flight safety videos in the industry, but its first foray into chatbots, via 'Bravo Oscar Tango' – Oscar for short – doesn't seem quite such a success.

When questioned by this reporter, Oscar was unable to answer basic questions about flight bookings unless they were asked in an exact scripted phrase. Oscar also couldn't recognise abbreviated spellings of city names, 'think ahead' to offer a few options at a time, or reuse information entered earlier in the same chat, despite supposedly being a 'learning' bot.

Oscar is not the only bot in the market showing obvious flaws. But the limitations of this technology don't seem to be discouraging marketers from investing. 

Chatbots have come a long way since Eliza, an application built at MIT in 1966 that could carry out a text dialogue with human interlocutors. They are now on the up and up, proliferating on messaging apps like Facebook Messenger, WeChat, Kik and Line, and predicted to have a transformational impact on the digital workplace in the next five to 10 years, according to Gartner's 'Hype Cycle', which looks at the bell curve of emerging technologies. 

 
The hype is certainly building: since Facebook opened up its Messenger platform to chatbot developers in April 2016, many brands have started thinking about chat as their fourth platform after desktop, mobile and apps, according to Travis Johnson, former global president of Ansible.
 
In China, meanwhile, WeChat chatbot APIs have now developed to the point where they can be quickly and sustainably implemented by brands, Kevin Gentle, director and lead strategist at Madjor, told Campaign Asia. Add to this the fact that consumers are not yet overwhelmed with chatbot marketing - meaning potentially high conversion rates - and the whole field looks an attractive one.
 
Bots must be respectful conversationalists - particularly in the health sphere

Spend a few minutes with your average chatbot, however, and you will quickly realise that they cannot sustain a real conversation. Yeong Yee, senior consultant at Ogilvy & Mather Singapore, wrote as much in an opinion column for Campaign earlier this year: "Chatbots are perennially question-and-answer-based. Ask a question or provide a trigger, and the chatbot gives an answer. Often, conversations do not extend beyond this scripted give-and-take." 

But customers still want even simplistic chatbots to 'respect' them, no matter how unpredictable their own behaviour. Developing chatbots that somehow "know" how customers feel is crucial, said Gentle at a recent Shanghai event. Brands can do this by using language cues to 'sense' emotions, then responding accordingly. If a user expresses anger, for example, an emotionally intelligent chatbot would try to resolve the situation through further questioning, or pass the baton to a human customer-service agent, he said.
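In code terms, the hand-off Gentle describes might look something like the minimal sketch below, assuming a crude keyword-based 'language cue' detector; the cue words and responses are illustrative, not any vendor's actual implementation.

```python
# Illustrative only: look for anger cues in a message and decide whether to
# probe further or hand off to a human agent. The cue list is invented.
import re

ANGER_CUES = {"angry", "furious", "ridiculous", "useless", "complaint"}

def detect_anger(text: str) -> bool:
    """Crude emotion 'sensing': check the user's words against anger cues."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & ANGER_CUES)

def respond(text: str) -> str:
    if detect_anger(text):
        # Probe first; a real bot would hand over to a human agent if the
        # follow-up confirmed the customer is upset.
        return ("I'm sorry to hear that. Could you tell me more, "
                "or shall I connect you to one of our agents?")
    return "Happy to help. What would you like to know?"

print(respond("This is useless, I want a refund"))
```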

Most chatbots do not exhibit this level of understanding, leading to potential consumer disappointment. "The conversational manner of the user interface lulls us into imagining we are talking with 'someone'," says Yee. "We expect it to display empathy. Or maybe, regardless of how the chatbot is designed, we want to break them to test the limits of their ability to sustain a human facade."

What is a bot? 

At its simplest, a chatbot is a computer programme that mimics conversations with people and is used as a means of automatically dispensing information. It works by picking up keywords that, when entered by users through a graphical user interface (GUI, pronounced 'gooey', one form of human-computer interface), trigger automated replies.
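As a rough sketch of that trigger mechanism, a minimal keyword bot is little more than a dictionary lookup; the keyword table below is invented for illustration.

```python
# Minimal keyword-trigger bot: scan the message for a known keyword and
# return its canned reply. Keywords and replies are invented examples.
REPLIES = {
    "booking": "You can view or change bookings under 'Manage my trip'.",
    "baggage": "Each passenger may check in one 23kg bag.",
}

def reply(message: str) -> str:
    text = message.lower()
    for keyword, answer in REPLIES.items():
        if keyword in text:
            return answer
    return "Sorry, I didn't understand that."  # the dreaded fallback

print(reply("How do I change my booking?"))
```

Anything outside the keyword table hits the fallback, which is exactly the scripted give-and-take Yee describes above.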

All chatbots are bots, but not all bots are chatbots, clarifies Jeffrey Broer, founder of bot-development platform Recime.io. He identifies three types of bots: task bots, option bots and chatbots. While all three are good for aggregating data to qualify leads and optimise conversions, chatbots are "the new apps," says Broer, and conversation "the new interface". 

 
The need to layer emotional intelligence onto the chatbot interface is even more acute in the medical and pharmaceutical fields, where customer emotions tend to run higher than usual. If properly implemented, however, this is also an area that stands to benefit the most from chatbots.

Pharma firm Boehringer Ingelheim (BI), for example, developed a Facebook Messenger chatbot called Tabatha in May this year to allow people with asthma to engage with the brand in a more private environment, based on the insight that sufferers tend to accept asthma attacks as the norm, meaning many don't bother seeking treatment.

Since the campaign first launched in 2014, the brand had previously been driving users to an interactive website where they could self-identify as symptomatic by responding to the Royal College of Physicians' '3 Questions' (RCP3Qs).

Switching an online questionnaire for a conversation with a Facebook chatbot was a timely move, says Valerie Hargreaves, senior director of healthcare at Cohn & Wolfe (BI's agency), given both the increasing popularity of 'dark social' - private social channels where conversations are not visible to the wider public (900 million people a month use Facebook Messenger, according to eMarketer) - and people's apparent willingness to engage with AI about their health.

A chatbot can collect richer data and enable deeper measurement of campaign success. In BI's case, this included capturing, for the first time, consumers' intention to act on the information received, adds Hargreaves, who says Tabatha generated more than double the number of engagements on World Asthma Day in 2017 compared with 2016, when there was no Tabatha.

After privacy, proactiveness is another vital attribute of a successful healthcare chatbot. The Hong Kong branch of French insurance firm AXA, for example, created an 'intelligent' personal coachbot called Alex as part of the brand's wellness coaching app, which claims to connect seamlessly with more than 150 wearables and health platforms, including Fitbit and Apple Health.

Apart from answering broad questions on wellness goals and objectives, Alex can apparently tackle more abstract issues relating to gym routines, healthy food recommendations or even what sporting activities are available in Hong Kong. It can even initiate suggestions to users based on observations of their behaviour recorded on other devices (such as last night's sleep pattern, missed schedules, or the time of day or week), according to Brady Ambler, strategy director with Publicis Worldwide Hong Kong, who worked on the campaign.
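A rule-based version of that proactive behaviour could be as simple as the hypothetical sketch below; the field names and thresholds are assumptions for illustration, not AXA's actual logic.

```python
# Hypothetical proactive-coach rules over data synced from wearables.
from typing import Optional

def proactive_suggestion(user: dict) -> Optional[str]:
    """Return a volunteered message if the user's device data warrants one."""
    if user.get("sleep_hours", 8.0) < 6.0:
        return "You slept under six hours last night - maybe a lighter workout today?"
    if user.get("missed_sessions", 0) >= 2:
        return "You've missed two gym sessions. Want me to reschedule your plan?"
    return None  # nothing worth volunteering; stay quiet

print(proactive_suggestion({"sleep_hours": 5.5, "missed_sessions": 0}))
```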

A final consideration that's "critical" for healthcare bots is the design of a really detailed response framework that can anticipate sensitive situations, says Ambler. This means ensuring the chatbot knows exactly when to transfer the conversation to a human colleague, a protocol many bots lack. It's part of the reason that, once the novelty wears off, many users will come to regard chatbots as a "useless" brand interaction point, remarks Forrester Research senior analyst Xiaofeng Wang.

Training a bot is "vastly different from just content marketing on Facebook"

Despite their promising future, chatbots are still at a very early stage of development, says Wang, and most of today's successful chatbots are driven more by keywords than by machine learning. They can deliver "quick-hit information" and "shortcuts" to content such as tutorials, but not context- or intent-based personalisation, or advice about complex products such as life insurance.
 
 
Fundamentally, humans use a different language than computers do. After a person talks to a chatbot, their words must be translated into machine grammar, explains Forrester's vice president and principal analyst, Julie Ask. This means turning words or characters into SQL (Structured Query Language) queries, but there's also the human-intent challenge: mapping "that sounds good" to a "yes" the program can act on.
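A toy illustration of those two translation steps - fuzzy utterance to canonical intent, then intent to SQL - might look like the sketch below. The table and column names are invented, and real code would use parameterised queries.

```python
# Step 1: map a human utterance to a canonical intent.
AFFIRMATIONS = {"yes", "sure", "ok", "sounds good", "that sounds good"}

def to_intent(utterance: str) -> str:
    return "CONFIRM" if utterance.strip().lower() in AFFIRMATIONS else "UNKNOWN"

# Step 2: map the intent to machine grammar - here, a SQL statement.
def to_sql(intent: str, booking_id: int) -> str:
    if intent == "CONFIRM":
        return f"UPDATE bookings SET status = 'confirmed' WHERE id = {booking_id};"
    raise ValueError(f"No SQL mapping for intent: {intent}")

print(to_sql(to_intent("that sounds good"), 42))
```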
 
"You need a human to train the system very early on. Machines also take a long time to learn," says Ask, who predicts that even Facebook Messenger chatbots are on a 10-year roadmap. "The more difficult bits are [working out how] the information architecture of a chatbot that must involve a security layer of third-party platforms [such as Facebook or WeChat], must also be integrated with CRM and payment systems of the brand itself."
 
This is vastly different from just doing content marketing on Facebook, she asserts.
 
Making a chatbot truly conversational involves incorporating a lot of NLP (natural language processing) training, comments Recime's Broer. It's about teaching chatbots to have 'chat IQ' so they can recognise ad hoc keywords (not just complete scripted phrases) in natural, free-flowing conversations. A chatbot can even tell a joke if it knows a user's interest graph well enough.
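In practice, that 'chat IQ' starts with spotting known keywords anywhere in a sentence rather than demanding an exact scripted phrase, as in this sketch; the intent vocabulary is invented for illustration.

```python
import re

# Each intent owns a bag of keywords; the utterance is scored against each.
INTENT_KEYWORDS = {
    "check_schedule": {"schedule", "timetable", "when", "next"},
    "book_flight": {"book", "flight", "ticket"},
}

def detect_intent(utterance: str) -> str:
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    best = max(INTENT_KEYWORDS, key=lambda i: len(INTENT_KEYWORDS[i] & tokens))
    # Fall back when no keyword matched at all.
    return best if INTENT_KEYWORDS[best] & tokens else "fallback"

print(detect_intent("hey, can I book a ticket on the next flight?"))
```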
 
Functionality and feasibility are key in a chatbot's design
 
OCBC is still the only case Wang has seen that has been able to link business results to chatbot usage: the Singapore bank closed S$33 million of new loans in six months via its Emma chatbot. But the fact that many chatbots disappoint today doesn’t mean that marketers should give up and stop experimenting. For now, users still accept low levels of 'correctness' from chatbots; even a failure rate of 70 percent can still be acceptable and enjoyable for users, says Broer. In contrast, a failure rate of 70 percent in an AI tool for self-driving cars or cancer detection could very well cost lives.
 
"As part of the chatbot developer community, we have to be cautious not to just put bots out there because we can, technically. It's our responsibility to prevent 'bot fatigue' that may lead to a 'chatbot winter'," warns Broer. 
 
When planned and executed well, it is possible to deploy chatbots to deliver business value - with one major condition: brands have to define a very clear use case. From a marketing perspective, this means they must at the very least improve engagement and save consumers' time. There is already plenty of company information on official websites and apps, for instance, but chatbots can help consumers screen out irrelevant facts.
 
Ogilvy's Yee emphasises that chatbots should be designed to serve a brand purpose, and knowing the purpose means knowing which conversation threads are essential, and which might be extraneous to the chatbot design. A logistics company helping customers to track deliveries will need to provide access to real-time package-location data, for example. If that is not technically feasible, the thread should not be built into the chatbot interface, he says.
 
 
 
 
Adding humour can be an effective tool: one example is Bus Uncle, which helps locals in Singapore find bus schedules in a conversation peppered with 'Ah Beng' jokes in Singlish
 
There is no “one natural language understanding (NLU) AI that rules them all” as all chatbots are based on a use case and an “intent”, echoes Broer. "A chatbot about choosing the right present for Valentine's Day would not know the weather in Beijing, because it doesn’t have that data source, it was not trained for that because the intention by the maker wasn’t there."
 
It's up to the bot developer to set expectations right, but users should also be made aware of how a bot works and what its intent is, especially with 'general purpose' chatbots like Xiaoice or its English equivalent Tay (the Microsoft bot that went wrong very quickly).
 
Chatbots that are too generic simply won't work, says Forrester's Wang. Another Singapore bank, POSB, rolled out a chatbot to handle general inquiries about its products and services, touting itself as the first bank in the region to launch one. This tactic may generate some PR value, continues Wang, but all too often firms fail to clearly define their chatbot’s purpose and communicate that to users. Any PR gains will evaporate when customers end up confused or frustrated after asking questions that far exceed the chatbot’s abilities.
 
"As a brand, it is not about making a chatbot to pass the Turing test—an assessment of a machine’s ability to exhibit intelligent behaviour," argues Ogilvy's Yee. "Functionality takes precedence over conversation. Chatbots need to be effective at conversation commerce, rather than making human conversation comfortable."

However, once chatbots are capable of inferring user context and intent in the near future, marketing use cases will grow in complexity and sophistication, thinks Wang. Think minimum viable product (MVP) and iterate as you learn, advises Ask. "Building a chatbot with a conversational interface is even less one-and-done than an app."

Services that let chatbots leverage deep learning to detect emotion in a user utterance are already available via open APIs from most of the big players, such as IBM Watson, says Rory McElearney, developer and chatbot UX expert at Filament. "The reason they are not widely used is down to the inability of chatbot creators to clearly see how they would be of value and how exactly the chat flow will adapt based on user emotion, so the hurdle is a conversation-UX one, not a deep learning one."

As chatbots evolve, the best will depend heavily both on AI and on the humans who teach the machine. Says Ambler: "I believe we’ll see our first super chatbot within the next 5 years. Something that makes the chatbots of today seem cute and antiquated."

CASE STUDY: Lost in translation in Hong Kong
 
Replicating the intricacies of human language is no small task, and having a clueless chatbot master the nuances of varied speech patterns, slang, sarcasm and insinuation common in Hong Kong, a city already fragmented with multiple marketing languages, is even more difficult. 

Kinni Mew, CEO of NLP technology provider Mindlayer, says one of his projects was almost killed because the company made some NLP mistakes in Cantonese. Speaking at a recent startup event, he offered these rules for chatbot marketers in Hong Kong:

You'll fail if you think your dataset is ready. 

Many brands, especially banks and insurers in Hong Kong, do not yet have usable data to train their chatbots. Audio archives of phone recordings with customers do not count as data, for example. "We rely on text to process syntax. Banks and insurers, both in highly-regulated industries, cannot use speech-to-text conversion," he says.

You'll fail if you think users will follow chatbot instructions.

Say a chatbot issues the instruction that a user should enquire only about policy details or promotion mechanics. Fine, but Hong Kong users will always ask beyond a chatbot's scope. "For one client, initially we typed everything using a FAQ format, but we discovered that a lot of customers were asking questions outside of the agreed FAQ. There needs to be a lot of iteration. You have to release an updated chatbot version every single day so people can train the chatbot," adds Mew.

You'll fail if you treat Cantonese as English.

When Cantonese is translated into English, it becomes meaningless. "Garbage in, garbage out, we say. Brands can’t just purely translate user input. You need to have data from scratch to train your chatbot," Mew says.

You'll fail if you expect users to follow a defined language.

Hong Kong consumers usually type in broken sentences, mixing languages. Chatbots judge things in a binary fashion: it's either this or that, meaning untrained bots are unable to handle 'Hongklish' amalgamations, typos or emoticons. Probability analysis helps determine fuzzy logic for every single word or character, and deep learning helps make a word coherent enough to look beyond spelling errors, says Mew, but there is still some way to go.
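A tiny typo-tolerant matcher of the kind Mew alludes to can be built from Python's standard library alone; the keyword list and similarity cutoff below are illustrative assumptions.

```python
# Fuzzy matching: accept the closest known keyword above a similarity
# threshold, so a typo like 'balanse' still resolves to 'balance'.
from difflib import get_close_matches

KNOWN_KEYWORDS = ["balance", "credit card", "statement", "transfer"]

def normalise(word: str) -> str:
    matches = get_close_matches(word.lower(), KNOWN_KEYWORDS, n=1, cutoff=0.75)
    return matches[0] if matches else word  # leave unmatched words untouched

print(normalise("balanse"))  # -> 'balance'
```

Mixed-language input is far harder than simple typos, which is why, as Mew says, there is still some way to go.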

You'll fail if your chatbot doesn't have a strong use case.

When your chatbot sucks, Hong Kongers will drop out immediately, so any sales are just a pipe dream.

You'll fail if you measure the wrong metrics.

"Wrong metrics" include things like asking customers to rate the conversation after a chatbot session. A plain numbered grade does not reveal the customer’s pain points, such as whether they like the chatting experience because they don’t have to call the bank's call centre, but maybe they still haven’t solved their core problem of a wrong account balance or a lost credit card. Brands need to have very specific questionnaires when they ask Hong Kongers for feedback. 

You'll fail if you launch your chatbot before it is ready.

The term 'ready' means different things to different marketers in Hong Kong. To some, a chatbot may be "99 percent ready" based on accuracy testing with a 100-person sample. But this is not complete, Mew emphasises. A template of test scenarios should be built every day and fed with fresh data to continuously train the chatbot, he says. "The best tester is your actual audience, of course. So you have to strike a balance in terms of launch date."

You'll fail if you don't think 'small talk' is important. 

Brands must anticipate 'stupid' questions from users, like "who is your boss?" or "are you a boy or a girl?". Also, because people know they are not talking to a real human, they are cavalier about the consequences of improper etiquette, and may deliberately swear or say something inappropriate to the chatbot. "For this reason, scraping the web (Quora, Facebook, online forums, etc) for responses is very dangerous, in case you pick up foul language or negative political opinions from these sites," warns Mew.

 

Source:
Campaign Asia
