Making simple chatbots better

Written by Livio Pugliese

Many deployed chatbots are far from holding real conversations. But they too can be enabled for fluent dialog. This is how we do it.

When you think of chatbots these days, you think of ChatGPT, Google Bard, Bing Chat, and the like. These are all based on Large Language Models (LLMs) and can answer pretty much any question users may ask. But in fact, there are thousands of deployed chatbots that help users with customer service issues every day, and they are far more limited. Many chatbots in use today have simple interfaces with predefined questions that users select by typing keywords or clicking buttons. They work, sometimes well, depending on the domain and the type of business they serve. But there is no question that using these chatbots for voice interactions would result in a bad customer experience. That is, unless we at Interactive Media intervene.

In this article I will describe how to enable even these chatbots for voice, with an excellent customer experience.

Types of chatbots

According to a blog post by IBM, there are four categories of chatbots:

1. Menu- or button-based chatbots. These are the simplest chatbots, widely deployed on web pages, that guide the user through explicit choices. They are the equivalent of traditional tone-based IVRs for voice, and they can work well if the domain is simple and the choices are clear. Obviously, if the user needs something that is not included in the menu, they can’t help and should refer the customer to a human representative.

2. Rules- and keyword-based chatbots. These chatbots let the customer ask questions in a somewhat free format, then match keywords in the question against their knowledge base and present text that contains those words. They are, in essence, interactive FAQ readers. The problem here is that if the question is complex, these chatbots can’t answer it and should forward the interaction to a human.

3. AI-powered chatbots. These chatbots have Natural Language Understanding (NLU) and Natural Language Processing (NLP) capabilities and can handle dialogs with multiple exchanges. They are based on AI engines acting on a knowledge base tailored to the specific domain they serve. This means that users can ask any question about information in the domain, and the chatbot will understand the question and answer according to its knowledge. Sometimes the chatbot will ask a clarifying question if the user’s initial question is ambiguous. However, if the user asks something outside the chatbot’s specific knowledge base, it will not be able to answer.

4. LLM-based, generative AI chatbots. Well, these chatbots have been all the rage in the past year or so. They are fluent and can answer almost any question, since their knowledge base is enormous; they can also create new answers by putting together information and text in a statistical way. They are the future, and the high-end present, of customer service, which is just one of their applications. But there are many others…

If you read about chatbots in magazines and on specialized websites, you’ll find that 90% of the articles are about the 4th type, maybe 10% about the 3rd type, and nothing about types 1 and 2. But many type 1 and type 2 chatbots are still serving customers – and will for years to come: they are already paid for, and type 3 and type 4 chatbot applications for customer service are expensive.

Where simple chatbots can do the job well

Sometimes you don’t need sophisticated AI chatbots to serve your customers well. If the customer service domain is simple, the questions asked are for the most part always the same, and you want users to feel at ease clicking buttons or following menus, a simple type 1 chatbot will do just fine.

For example, banks have a limited number of services that they can perform through chatbots. For this reason, the UI often consists of a series of buttons and menus where the user selects one item and continues to the desired service.

Another example may be purchasing a train ticket. You want the user to specify the date of the trip, the origin station, the destination, and the class of travel. Easy peasy. A chatbot with web widgets and keyword matching for cities will do the job.

How to enable dialogs and voice with simple chatbots (without excessive costs)

Even a simple chatbot like a type 1 has access to a valuable knowledge base that would be great to reach by voice. And indeed, Interactive Media offers a service, called PhoneMyBot, to add voice channels to chatbots easily and conveniently. But a type 1 chatbot needs precise inputs from users, which are not easily extracted from a voice conversation. Here we can use an add-on to PhoneMyBot, implemented on Interactive Media’s conversational AI platform, MIND. In essence, we add a very simple type 3 application in front of the main type 1 chatbot, able to figure out the intent of the call through a natural language dialog.

Once the intent of the call is determined, if it’s covered by the type 1 chatbot, MIND crafts the question in the format the type 1 chatbot expects and sends it forward. PhoneMyBot then retrieves the answer from the chatbot and speaks it to the user. If the intent of the call is not in the chatbot’s knowledge base, MIND can decide to send the call to a human agent instead. This works with one chatbot or several: sometimes companies deploy more than one specialized chatbot for different tasks, and MIND acts as an aggregator and selector for the most useful chatbot.
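As a minimal sketch of this routing idea, with a toy keyword classifier standing in for MIND’s real NLU (the intent names and menu commands are invented for illustration, not our actual API):

```python
from typing import Optional

# Map each recognized intent to the exact input the type 1 chatbot expects.
# These intents and commands are hypothetical examples.
MENU_COMMANDS = {
    "check_balance": "1",        # the chatbot's menu option 1
    "report_lost_card": "2",     # menu option 2
    "opening_hours": "HOURS",    # a keyword the chatbot matches
}

def classify_intent(utterance: str) -> Optional[str]:
    """Toy classifier: keyword spotting stands in for real NLU."""
    text = utterance.lower()
    if "balance" in text:
        return "check_balance"
    if "lost" in text and "card" in text:
        return "report_lost_card"
    if "open" in text or "hours" in text:
        return "opening_hours"
    return None

def route_utterance(utterance: str) -> str:
    """Turn a free-form spoken request into the precise input the chatbot expects."""
    intent = classify_intent(utterance)
    if intent is None:
        return "TRANSFER_TO_AGENT"   # not in the chatbot's knowledge base
    return MENU_COMMANDS[intent]

print(route_utterance("Hi, what's my account balance?"))     # -> "1"
print(route_utterance("I want to renegotiate my mortgage"))  # -> "TRANSFER_TO_AGENT"
```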

[Figure: the architecture. PhoneMyBot and MIND exchange information initially, before connecting the caller to the chatbot.]

For Interactive Media, this is a simple call steering application. We have developed many of these over the years, and we can implement one quickly and at contained cost. It’s a very cost-effective solution to add the voice channel to even a simple chatbot.

We would love to put more meat on the bone and talk to you about this solution: please contact Interactive Media at info@imnet.com or click the button below.


LLM-based chatbots and how to make them more reliable

Written by Livio Pugliese

ChatGPT and its siblings are all the rage in customer service chatbots. This is fascinating and terrifying. How do we take the terror out of the equation?

In the past year or so we have witnessed an explosion of chatbots based on Large Language Models (LLMs). The adoption of LLM technology in conversational AI is truly revolutionizing the field, with a user experience that is better by leaps and bounds than what came before. LLMs hold immense promise for applications as varied as customer service, simultaneous translation, and general information delivery… anything that has to do with connecting the public with data through natural language, whether text- or voice-based.

But not all is bright and beautiful in AI land. There are also significant challenges. In this article I give my take on some of the most common challenges and suggest possible ways to overcome them.

Large Language Model predictability

LLMs are an enormous collection of information fragments that the AI algorithm connects in a statistical way. To provide answers, the algorithm takes the most probable path connecting one fragment to another, starting from the question and considering the question’s context: who is asking it, what the setting for the question is… This process does not always lead to a predictable outcome, as always happens when statistics are involved.

For some use cases, it is not an insurmountable issue if answers from LLM chatbots vary to some degree. A conversation summary, for instance, can be rendered in many different ways, all reasonably accurate. Or a product can be recommended by an LLM-based algorithm with different words and sentences.

But there are many other applications, closer to core customer service capabilities, where precision is paramount and slip-ups could be costly: anytime there are legal implications, for instance, or when the chatbot is used as an initial screen for someone applying for a loan. In these cases, while the potential of LLM-based algorithms is clear, the risk must be mitigated.

Large Language Model sensitivity to input changes

Language is fluid, and there are typically many ways to say the same thing. English speakers, like all humans, rely on turns of phrase, metaphors, and synonyms to express themselves. Not everyone will use the same words to ask for the same thing, depending on the speaker’s education, frame of mind, age, location… As George Bernard Shaw said: “England and America are two countries separated by a common language” – well, even if the language is nominally the same, when the listener is an LLM-based chatbot the same request could take on vastly different meanings depending on the words used.

As an extreme example, suppose that one speaker says: “What would it take to cover all the bases and hit it out of the ballpark?”

Another speaker says: “How can we eliminate risk and be very successful?”

In American English, a human would understand these two sentences to mean the same thing. But when speaking to a chatbot the result can be very different, depending on whether the chatbot catches the metaphors.

A solution: sanitizing the input

Always sending an LLM-based chatbot the same words for each question would solve quite a bit of the precision problem. This is impossible in an open domain, where users can ask virtually any question. But it is possible, and even relatively easy, in a well-defined domain like customer service. What we propose is to front the LLM chatbot with a call steering system, which uses natural language to determine the user’s intent, possibly through a dialog of several exchanges. Once it determines the intent, the call steering system always sends the chatbot the same worded question for that particular intent, one that has been vetted and tested to produce the best result.

Interactive Media has long experience with conversational applications and a standard structure and process for creating applications that perform call steering. So we have created an all-in-one platform to help users interact with LLM chatbots in a customer service environment. We integrate PhoneMyBot, Interactive Media’s service that provides the voice channel to any chatbot, with our call steering platform, MIND, and with the LLM chatbot that actually does the heavy customer service lifting. When users call in they reach MIND, which asks them what they need, possibly over more than one exchange, and classifies their answers into one of the intents available in the domain. It then sends back the standard question for that intent, which PhoneMyBot forwards to the chatbot, receiving the answer and relaying it to the user.
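To make the idea concrete, here is a minimal sketch of this sanitizing step, with toy stand-ins for the classifier and the LLM (the intent labels, canned questions, and function names are invented for illustration, not MIND’s actual API):

```python
# Every utterance is reduced to a vetted, canonical question before it
# reaches the LLM, so the model always sees the same tested wording.
CANONICAL_QUESTIONS = {
    "card_lost": "How do I block my credit card and order a replacement?",
    "loan_status": "What is the status of my loan application?",
}

def sanitize_and_ask(utterance: str, classify, ask_llm) -> str:
    """Map a free-form utterance to a canned question, then query the LLM."""
    intent = classify(utterance)             # call steering / NLU step
    canned = CANONICAL_QUESTIONS.get(intent)
    if canned is None:
        return "Let me connect you with an agent."  # unknown intent
    return ask_llm(canned)                   # always the same wording

# Toy stand-ins so the sketch runs end to end:
toy_classify = lambda u: "card_lost" if "card" in u.lower() else "other"
toy_llm = lambda q: f"[LLM answer to: {q}]"
print(sanitize_and_ask("I can't find my credit card anywhere!", toy_classify, toy_llm))
```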

This technique increases the quality and precision of LLM chatbots, making them more suitable for work in a customer service environment.

We would love to put more meat on the bone and talk to you about this solution: please contact Interactive Media at info@imnet.com or click the button below.


Multimodal interactions: are they breaking through?

Written by Livio Pugliese

Last week I watched a webinar and demo by a company providing tools and solutions for conversational customer service. Interactive Media, where I work, is in the same sector, and I wanted to scope out a competitor: see what they have and how they present their solutions to the market. Everyone does this, of course; I don’t feel bad about it in the slightest.

This company was presenting, with great emphasis, a solution that allows a caller to synchronize a voice call (on a smartphone) with a visual IVR component. In essence, when users call, they are offered the option to receive a text message containing a link to a personalized web application. The web app provides information about what the call is about, and users can navigate it by clicking on the pages or by voice.

In the demo, this provides a great experience: all the information regarding the case is at the user’s fingertips, and it’s much easier to insert additional data. For instance, think about how hard it is to dictate an email address to an agent (let alone a virtual assistant!). With this type of visual IVR, the user can simply type it into a box, a much more efficient and error-free process.

This particular solution is not simple: you have to use conversational AI to understand what the caller says, identify the intent and navigate precisely by voice, populate the web pages that service the intent on the fly, create and send the link by text message, and, most difficult of all, keep the voice and web parts of the session synchronized. Well done!

But seeing this demo left me rather surprised: you see, I was doing exactly the same demo with Interactive Media software five years ago (and I have the videos to prove it). This made me realize two things. One is that the Interactive Media people and technology are kick-ass, well ahead of most of the competition. But the other, considering that I was not able to sell this solution, is that sometimes focusing on increasingly sophisticated, “frictionless” services does not pay.

That demo is fantastic, but how many similar applications have you seen in real life? And, based on your real-life experience, how often would you need something similar? In essence, it seems to me that we as an industry are targeting increasingly complex software solutions to an ever-decreasing number of users.

The vast majority of users hope to never have to contact customer service. But when they do, it is often for a simple question, one that does not in general need this type of infrastructure. Normally, users can search the company website, chat with an agent or a chatbot, or call in. A good percentage of people who call in do so because they are not comfortable with other channels, either because they are on the move and voice is the best way to interact, or because not everyone is familiar with web technology. Again, as an industry, we tech people tend to project our own experience onto everyone; folks, this is not the entire world!

For people who call in only with voice, there is PhoneMyBot, the Interactive Media service that provides voice channels to chatbots with a no-code, ready-to-roll approach. Companies that have deployed chatbots but have no conversational AI on the voice channel can use PhoneMyBot to enable telephone conversations with their existing self-service app. Conversational AI vendors who only support textual and web channels can use PhoneMyBot to offer voice channels to their customers. PhoneMyBot targets simpler self-service voice solutions for the vast majority of users.

But if you really need a synchronized voice and Visual IVR application for flashy service to your most tech-savvy customers, why don’t you also call Interactive Media? After all, we have a 5-year advantage.

Please go ahead and try PhoneMyBot for free: contact Interactive Media at info@imnet.com or click the button below.


PhoneMyBot and ChatGPT: giving voice to AI

Written by Livio Pugliese

Talking with ChatGPT over the phone is cool. But we can also make it useful.

In the past few months tech people worldwide have been talking almost exclusively about OpenAI’s ChatGPT. It’s the first large language model chatbot to make a splash, and what a splash! It landed with the energy of the Chicxulub asteroid – the one that killed the dinosaurs 65 million-odd years ago in the shallow sea off the Yucatán peninsula. That asteroid generated a mile-high tsunami, almost as high as ChatGPT’s. One should go slow with the metaphors, though: are ChatGPT and its peers going to kill …[gasp]… us? As an incorrigible optimist, I don’t think so, but the jury is out according to much of the press.

But as we all know, even if a device could cause the end of the world as we know it, if it’s new and shiny people will use it. So, after using ChatGPT to write poems about pickleball or essays on Tibetan literature, the tech community is trying to understand what it can do for real business.

For instance, we at Interactive Media have integrated ChatGPT with PhoneMyBot, our service that provides voice channels to chatbots with a no-code, ready-to-roll approach. Through PhoneMyBot it is now possible to make a phone call to ChatGPT, ask questions, and listen to its answers. This is still only a demo, but in the process we have developed some useful ideas on whether and how ChatGPT may work for what we do normally, which is providing companies with tools to serve their customers.

Let’s say it right away: without personalization, ChatGPT is not sufficient to implement a customer service voicebot. The domain is too wide: it is literally the whole Internet. This means that ChatGPT cannot use its general language model to answer pointed questions on – say – your bank balance today.

To be sure, in customer service there is sometimes a need for general-purpose conversation. In our experience, users sometimes go off on a tangent and ask bots all sorts of questions. For instance: where do you live? how old are you? can I see you? how much are you paid?… ChatGPT certainly has good answers for all these questions, and it would be useful in such side conversations. ChatGPT is also language-independent: in essence, it can tell what language a user is speaking and answer in the same language. This is a stunning capability, and it makes ChatGPT so much easier to use.

However, it is possible to “fine-tune” the models for specific domains, adding dozens, hundreds, or thousands of examples of specialized prompt-completion pairs that define a separate domain, identified by its own name and ID. This domain augments the general-purpose model and allows the chatbot to answer pointed questions. At Interactive Media we are experimenting with fine-tuning one of the available general-purpose models, and we can certify that it works: if the model has the necessary information, not only does it answer precise questions about the specific domain effectively, it does so in a pleasant and precise way.
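For a flavor of what this involves, here is a minimal sketch of fine-tuning data in the prompt-completion format of OpenAI’s legacy fine-tuning API; the example pairs, separators, and file name are illustrative only, and a real domain would need far more examples:

```python
import json

# Two toy prompt-completion pairs for a hypothetical banking domain.
# The " ->" and " END" markers are a common convention to delimit
# prompts and completions; they are a choice, not a requirement.
examples = [
    {"prompt": "What are your branch opening hours? ->",
     "completion": " Our branches are open 9am to 5pm, Monday to Friday. END"},
    {"prompt": "How do I block a lost credit card? ->",
     "completion": " Use the Block Card function in the app, or call us. END"},
]

# Fine-tuning data is uploaded as JSONL, one example per line.
with open("domain_training.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file would then be uploaded and a fine-tune started, for example
# with the legacy CLI:
#   openai api fine_tunes.create -t domain_training.jsonl -m curie
```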

Often, however, customers’ questions may be ambiguous and hard to characterize. In this case ChatGPT can’t be allowed to answer immediately, as what it says would be vague or incorrect. But Interactive Media’s conversational AI platform, MIND, placed in front of ChatGPT, can easily be configured to deal with these cases. MIND can identify the real intent of the caller through an initial dialog, and only then forward the “real” question to ChatGPT. This makes a huge difference in the conversation outcome.

There’s another snag, though: to be useful you also need access to actual data related to people’s requests. ChatGPT, of course, does not perform the appropriate queries into company databases or CRMs. To address this, at Interactive Media we have developed a generalized method to access database data and insert it into ChatGPT’s answers. Of course, depending on the data, this has to be done on a case-by-case basis, so call us to discuss!
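As a rough sketch of the general idea, under the assumption that the model supplies the phrasing while the figures come from the system of record (the table, query, and placeholder convention are invented for illustration, not Interactive Media’s actual method):

```python
import sqlite3

# A toy in-memory database standing in for the company's systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer_id INTEGER, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (42, 1234.56)")

def answer_with_data(customer_id: int) -> str:
    row = conn.execute(
        "SELECT balance FROM accounts WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return "I could not find your account."
    # The model produces (or is prompted to produce) an answer template
    # with a placeholder; the number is filled in from the database, so
    # it cannot be hallucinated.
    template = "Your current balance is {balance:.2f} euros. Anything else?"
    return template.format(balance=row[0])

print(answer_with_data(42))  # -> "Your current balance is 1234.56 euros. ..."
```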

Please go ahead and try PhoneMyBot’s connection with ChatGPT: contact Interactive Media at info@imnet.com or click the button below.


The future of intelligent voice

Written by Livio Pugliese

As the market for smart speakers falters, what are the Big Three (Amazon, Apple, Google) going to do?

Alexa, should I bring an umbrella out tomorrow? This is a question that owners of smart speakers have been asking since 2014, the year Amazon released its first Echo product. Google and Apple soon followed suit with their Google Assistant and Siri technologies.

While Siri is embedded into Apple hardware as a software feature, both Amazon and Google produced and actively sold the hardware to support their speech software: a line of smart speakers with sensitive microphones that listen for people uttering a key phrase before capturing what they say. The rise of these devices has been meteoric. They were cheap and convenient, and they largely supplanted both radios and stereo systems in the home by streaming content controlled by voice. They sold by the tens of millions, both in the US and around the world: according to a Comscore report, in 2021 almost half of US internet users owned at least one.

Most people in the US are familiar with Alexa: she listens to the sounds around her and, when she hears her name, springs into action. This means recording the sentence that comes after the keyword, sending the audio to the Amazon cloud for recognition, then receiving the answer and playing it back. (Supposedly, nothing is recorded outside the keyword-initiated transaction, of course.) The same is true for the Google version; “Hey Google” is both longer and less personal.

As an aside, I know someone whose name is Alexa – and it was her name well before Amazon released the first Echo. I wonder how she feels being called upon to do the bidding of countless people…

The problem with the status quo: lack of revenues

As often happens in the tech industry, with smart speakers the technology leapt ahead of the profitable use cases. Yes, people were and are using their smart speakers often, but mostly to ask general questions, check the weather, and stream music. The vendors figured that, with time and growing adoption, they would come up with a revenue model to support the business, but so far no one has managed it.

Of course, there are ads within music streaming if the owner does not subscribe to a music service, but they are kept few and far between so as not to degrade the experience too much. And a $10-a-month music subscription is not a panacea that can support providing and maintaining the infrastructure for the rest of the service.

The most profitable use case hoped for at the beginning, shopping by voice, never took off: people are understandably wary of providing personal information, credit card numbers, and the like to the cloud through yet another channel, and by definition any shopping done through a smart speaker is “sight unseen”.

So, in the past few months, with the changing economy and the realization of how difficult it is to really monetize smart speakers, there has been a definite retrenchment by both Amazon and Google. Amazon laid off a good portion of the Alexa development team, Google reportedly greatly reduced funding for the Assistant line, and – this is very recent news – Alphabet is laying off as many as 12,000 workers in January 2023. One can imagine that the worst-performing divisions will be most affected.

Smart speakers are in trouble.

Voice apps on smart speakers

However, many companies and organizations have developed apps that integrate with Alexa and Google Assistant through the respective APIs. In this case, the smart speakers act simply as a speech transcription and rendering interface: once the app is active, they transcribe what the user says and send the text to the external service, then take the text the service sends back and render it into voice for the user to hear.

Amazon calls these apps Skills; Google calls them Actions. Either way, there are hundreds of thousands of them. They can be launched with a special prompt: “Alexa, open [skill name]” or “Hey Google, talk to [action name]”. While many apps have not been successful and see minimal use from this channel, others are important or even essential.
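To show the shape of this request/response cycle, here is a bare-bones webhook sketch in the style of an Alexa custom skill (JSON in, JSON out); a real skill must also verify request signatures, manage sessions, and much more that this toy endpoint skips:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def speech_response(text: str, end_session: bool = False) -> dict:
    """Wrap plain text in the Alexa-style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

@app.route("/voice-app", methods=["POST"])
def voice_app():
    req = request.get_json()["request"]
    if req["type"] == "LaunchRequest":       # "Alexa, open [skill name]"
        return jsonify(speech_response("Welcome! How can I help?"))
    if req["type"] == "IntentRequest":       # a transcribed user turn
        intent = req["intent"]["name"]
        return jsonify(speech_response(f"You asked for {intent}."))
    return jsonify(speech_response("Goodbye.", end_session=True))

if __name__ == "__main__":
    app.run(port=5000)
```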

What happens to these apps if the smart speaker vendors limit and then terminate their offer? Some merely add a channel to a wider service and presumably would not be impacted too severely. But others were developed specifically to take advantage of the voice channel offered for free by smart speakers. For instance, I recently talked with the developer of a skill for blind people, who use their voice to access information that others get from screens.

Skills and Actions developers are seriously worried.

On the other hand, what other conduits are there for two-way, intelligent voice applications in the house? Well, the one we’ve always had: the telephone (no matter whether fixed or mobile). Granted, calling an app over the phone is a little more complex than simply saying “Hey Google”, but everyone knows how to use a phone and the technology could not be more tried-and-true. The problem then is connecting existing intelligent applications to the telephone network.

PhoneMyBot as the conduit for voice apps

Interactive Media offers PhoneMyBot, a service born to expand the channels available to chatbots to include voice. It performs the same functions that smart speakers perform for their apps, transcribing the user’s speech and sending it to the connected application, then receiving text in return and transforming it into speech, piping it into the voice network. PhoneMyBot is natively integrated into the telephone network and exposes to apps an API equivalent to the ones from Alexa and Google Assistant. In addition, PhoneMyBot integrates with a number of contact center suites to transfer the call to a human agent if necessary.
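From the app’s point of view, such an integration can be as simple as a text-in, text-out endpoint. The sketch below imagines one; the route, field names, and escalation flag are invented for illustration and are not PhoneMyBot’s actual API:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# A toy knowledge base for the connected chatbot.
FAQ = {
    "hours": "We are open from 9am to 5pm, Monday through Friday.",
    "address": "You can find us at 123 Example Street.",
}

@app.route("/message", methods=["POST"])
def message():
    # The voice gateway has already transcribed the caller's speech.
    text = request.get_json().get("text", "").lower()
    for keyword, answer in FAQ.items():
        if keyword in text:
            # The gateway renders this reply back into speech.
            return jsonify({"reply": answer})
    # Nothing matched: ask the gateway to escalate to a human agent.
    return jsonify({"reply": "Let me transfer you to an agent.",
                    "transfer": True})

if __name__ == "__main__":
    app.run(port=8080)
```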

What makes PhoneMyBot appealing to small organizations that may become stranded if intelligent speakers decline too much? It’s extremely easy to try: an initial trial period is free, and commercial traffic is billed at a (low) per-minute rate independent of traffic volume. This makes it ideal for low-budget, pay-as-you-go services. The administration is simple and powerful: a single portal provides access to all the traffic data and stats. And it’s robust, with an infrastructure built on telco-grade software that manages millions of calls per month.

Go ahead, try it! Click the button below.
