Giving chatbots the gift of voice

July 14, 2020

A battle of the bots

Chatbots are everywhere. According to Gartner, at the time of writing between 1,500 and 2,000 companies worldwide have developed, in the past couple of years, a chatbot platform that they offer to their customers as a base for building applications. Of course, not all of them are good, and one-shot question-and-answer bots abound. But many are able to sustain a real dialog: they use a well-designed knowledge base and an AI-based semantic and learning infrastructure to recognize and understand what people type, keep track of the context, and follow up with more questions when the initial meaning is unclear.

But even these “good chatbots” are mostly text-based. While chat usually means text (in a widget embedded in a website, on a dedicated messaging platform like Facebook Messenger or WhatsApp, or even over text messages), it’s important to recognize that voice, and in particular voice over the telephone network, is still a big part of how customers interact with businesses.

Voice-enabled bots are still the exception rather than the rule. But the time is fast approaching when omni-bots, which can handle voice and text conversations equally well, will emerge victorious from this “battle of the bots”. This, in turn, will help decide the winners of the inevitable shake-out that the conversational interaction industry will go through in the next couple of years.

Why voice?

People like to text and type short messages on their devices. It is a very convenient way to communicate on most occasions, but as a habit it’s fairly new, sparked by the spread of devices that make texting and typing easy. In customer service, until about 10 years ago contact happened only over voice. There are many more channels for contacting companies now, but telephone calls are still the most common way to do it, and certainly the most satisfying.

After all, if you get upset about a service or a product, it’s difficult to yell while typing: you could use ALL CAPS but, somehow, it’s not the same. Jokes aside, voice is what people use when they need real-time feedback, and it lets them convey information much faster than chat: if you type fast and well you can manage about 40 words per minute, while an average talker speaks about 150 words in the same time (even leaving the end of pharmaceutical commercials out of the stats). Finally, when all else fails, people pick up the phone and call, so while voice calls may be declining as a percentage of total interactions, their importance is actually going up.

Also, there are occasions when it’s OK to talk but not to type: don’t text and drive! There are also occasions, though, when texting is the only way to communicate, like at a rock concert…

So, voice has a big role, especially voice over the telephone network: dialing 1-800-SUPPORT is still the easiest way to reach a company. Adding voice support to bots is therefore a great way to extend conversational technology in customer service to the 50% or so of interactions it currently cannot reach.

The challenges of voice

For bots, voice is harder than text. While an ASR (automatic speech recognition) engine can transcribe voice into text rather easily, and the transcription can be fed to the bot’s AI, this is still an additional step that needs to be integrated into the system. There are also several TTS (text-to-speech) services that can convert the bot’s answers back to voice: yet another step.
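The two extra steps can be pictured as a thin wrapper around an existing text bot. The sketch below is a hypothetical illustration, not Interactive Media’s implementation: the interfaces `SpeechRecognizer`, `TextBot` and `SpeechSynthesizer`, and the stand-in engines `FakeASR`, `EchoBot` and `FakeTTS`, are invented names so the code runs without any vendor SDK. A real deployment would plug in actual ASR and TTS services and stream audio rather than pass whole buffers.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechRecognizer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextBot(Protocol):
    def reply(self, session_id: str, text: str) -> str: ...


class SpeechSynthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoiceBridge:
    """Wraps an unchanged text chatbot with an ASR in front and a TTS behind."""
    asr: SpeechRecognizer
    bot: TextBot
    tts: SpeechSynthesizer

    def handle_turn(self, session_id: str, caller_audio: bytes) -> bytes:
        text = self.asr.transcribe(caller_audio)   # extra step 1: speech -> text
        answer = self.bot.reply(session_id, text)  # the bot's existing text AI
        return self.tts.synthesize(answer)         # extra step 2: text -> speech


# Hypothetical stand-in engines so the sketch runs on its own.
class FakeASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class EchoBot:
    def reply(self, session_id: str, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend these bytes are audio


bridge = VoiceBridge(FakeASR(), EchoBot(), FakeTTS())
print(bridge.handle_turn("call-1", b"where is my order"))
# prints b'You said: where is my order'
```

Keeping the bot behind a plain text interface is the point of the design: the same bot can keep serving chat while the bridge adds voice around it.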

What’s more, the knowledge base and AI training for text and voice do not completely overlap: when speaking we say things and use turns of phrase that we wouldn’t use while typing; on the other hand, an ASR will not make the typos that are common in chat and that chatbot engines must account for. But these issues can be overcome with better AI training; we at Interactive Media know this because we support both voice and chat in our conversational Virtual Agents.

Even more challenging, the system must be very responsive on voice: while no one would object to a 10-second pause between typing a message and receiving a response, try that on a phone call! So the integration needs to be architecturally sound and fast. And not all ASR systems are created equal: while the recognition performance of the latest ASRs is uniformly quite good, some systems have an advantage on specific tasks; for instance, Google Speech APIs excel at recognizing addresses thanks to their integration with Google Maps. It makes sense to use different ASR vendors for different parts of an application.
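Using different ASR vendors for different parts of an application can be as simple as a lookup from the current dialog task to an engine, with a check on how long each recognition took. This is a minimal sketch under stated assumptions: `AsrRouter`, the task names, the engine functions and the 1-second budget are all hypothetical, and real vendor clients would be wrapped behind the same audio-in, text-out function shape.

```python
import time
from typing import Callable, Dict, List, Tuple

# An ASR engine is modeled as a plain function: audio in, transcript out.
Transcriber = Callable[[bytes], str]


class AsrRouter:
    """Send each dialog step to the ASR engine best suited to it."""

    def __init__(self, default: Transcriber,
                 by_task: Dict[str, Transcriber],
                 budget_s: float = 1.0) -> None:
        self.default = default
        self.by_task = by_task
        self.budget_s = budget_s  # assumed per-turn recognition budget
        self.slow_calls: List[Tuple[str, float]] = []

    def transcribe(self, task: str, audio: bytes) -> str:
        # Fall back to the default engine for tasks without a specialist.
        engine = self.by_task.get(task, self.default)
        start = time.monotonic()
        text = engine(audio)
        elapsed = time.monotonic() - start
        # Record calls that blew the budget; on a phone call the caller
        # hears this as dead air.
        if elapsed > self.budget_s:
            self.slow_calls.append((task, elapsed))
        return text


# Hypothetical engines standing in for two different vendors.
def general_asr(audio: bytes) -> str:
    return "general:" + audio.decode()


def address_asr(audio: bytes) -> str:
    return "address:" + audio.decode()


router = AsrRouter(general_asr, {"address": address_asr})
print(router.transcribe("address", b"10 Main St"))  # address:10 Main St
print(router.transcribe("yes_no", b"yes"))          # general:yes
```

Tracking slow calls per task makes it easier to see which vendor is hurting responsiveness on which step of the dialog.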

And then, there is the telephone network to deal with. There are certainly RESTful telephony APIs that integrate easily into a conversational system, but at volume they can be expensive. Also, the companies deploying the bot usually already have their own telephony infrastructure, and it doesn’t make sense to overhaul it just for the bot. Whether it is implemented through a local switch (PBX) or a SIP trunk from a carrier, telephony is more challenging to integrate with than a purely HTTP-based interface.
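To give a flavor of what a SIP trunk delivers instead of an HTTP request, the sketch below pulls the method, request URI and headers out of a SIP INVITE, so that the `Call-ID` header can serve as the key for the bot session. It is a minimal illustration, not a SIP stack: there is no header folding, no transaction handling and no responses, the sample message and hosts are made up, and a production bridge would rely on a proper SIP stack or the company’s existing PBX or session border controller. The compact header letters are a subset of those defined in RFC 3261.

```python
from typing import Dict, List, Tuple

# RFC 3261 compact header forms the parser understands (subset).
COMPACT = {"i": "call-id", "f": "from", "t": "to", "v": "via",
           "m": "contact", "l": "content-length", "c": "content-type"}


def parse_sip_request(raw: str) -> Tuple[str, str, Dict[str, List[str]], str]:
    """Return (method, request_uri, headers, body) for a SIP request."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 request: {lines[0]!r}")
    headers: Dict[str, List[str]] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        key = name.strip().lower()
        key = COMPACT.get(key, key)  # "f" and "From" end up under "from"
        # Repeated headers (e.g. several Via lines) keep every value in order.
        headers.setdefault(key, []).append(value.strip())
    return method, uri, headers, body


invite = (
    "INVITE sip:support@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pbx.example.com;branch=z9hG4bK776asdhds\r\n"
    "f: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:support@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pbx.example.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
method, uri, headers, body = parse_sip_request(invite)
print(method, headers["call-id"][0])
# prints INVITE a84b4c76e66710@pbx.example.com
```

Even this toy parser shows why telephony is more than another REST endpoint: the bot side has to understand call signaling, and the audio itself travels separately as RTP media.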

Finally, if the interaction cannot be completed within the self-service conversational domain, it will need to be forwarded to a human agent. This implies not only forwarding the call to a Contact Center suite (usually over SIP), but also passing along the context gathered so far, which requires an integration with the Contact Center’s CTI (computer telephony integration) interface.
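The context that travels with an escalated call can be modeled as a small record that the bot fills in during the dialog and flattens when it hands over. The sketch below is hypothetical: `HandoffContext`, its fields and the `slot_` prefix are invented for illustration, and the flat string-to-string shape is only an assumption about what a CTI “attach data” call might accept, since each Contact Center defines its own format and size limits.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HandoffContext:
    call_id: str   # ties the transferred SIP call to this record
    intent: str    # what the bot believes the caller wants
    slots: Dict[str, str] = field(default_factory=dict)   # data collected so far
    transcript: List[str] = field(default_factory=list)   # dialog turns so far
    reason: str = "unresolved"                            # why the bot escalated

    def to_attached_data(self, max_turns: int = 5) -> Dict[str, str]:
        """Flatten to string key/value pairs for a CTI attach-data request.

        The flat shape is an assumption; real CTI interfaces vary.
        """
        # Keep only the most recent turns so the agent sees the gist quickly.
        turns = self.transcript[-max_turns:] if max_turns > 0 else []
        data = {
            "call_id": self.call_id,
            "intent": self.intent,
            "reason": self.reason,
            "last_turns": json.dumps(turns),
        }
        # Prefix slot names so they cannot overwrite the fixed keys above.
        for name, value in self.slots.items():
            data[f"slot_{name}"] = value
        return data


ctx = HandoffContext(
    call_id="a84b4c76e66710@pbx.example.com",
    intent="billing_dispute",
    slots={"account": "12345"},
    transcript=["caller: I was charged twice", "bot: Let me get an agent."],
)
print(ctx.to_attached_data()["slot_account"])  # 12345
```

The call itself would still be transferred separately, for instance with a SIP REFER or the PBX’s own transfer feature; the shared `call_id` is what lets the agent desktop match the ringing call to the record the bot attached through CTI.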

So, there are several factors that contribute to making voice and telephony a complex proposition for bots.

An offer to help

Interactive Media knows a lot about voice and integration with other voice platforms. We started working on voice applications, telephony and customer experience in 1996, so we have both long experience of what it takes to integrate successfully with the telephone network and a super-solid platform that has evolved to incorporate the latest architectures and protocols into a proven foundation for all voice communication.

We also have a platform for conversational applications with several sizable deployments, both for voice and chat. This has helped us understand which features of the telephony platform matter most for bots, and optimize them accordingly.

So the idea is simple: Interactive Media is on a mission to help chatbots add voice to their repertoire. This starts with telephony integration, of course, but continues with speech transcription and generation, and with integration with Contact Center platforms: we integrate natively with several of the most common ones. Our software is already in the cloud, and you can try it at Phone my Bot.

We are looking forward to giving all deserving chatbots the gift of voice.
