Chatbots and recorded voice – a messaging era dilemma

August 13, 2021

Chatbots converse with people in natural language, and they have proliferated at an extraordinary rate over the past few years. They started as small windows on websites where users could type what they were looking for and get the information directly, instead of having to navigate the whole site to find it.

This is certainly a worthwhile mission, but chatbots have since expanded beyond it: they now mine databases to present personalized results, and they perform mission-critical activities such as booking and confirming appointments.

But the past few years have also seen the explosion of mobile messaging services, which are now an integral part of (almost) everyone’s life. These range from simple one-on-one text messages (SMS), to multimedia messages sent to multiple recipients, to platforms that straddle the divide between messaging and social networks, such as WhatsApp, Viber, Telegram, and Facebook Messenger. The advantages of these services are clear: they are software-only and reach users on a device that’s always with them, they are free or almost free, they offer multimedia capabilities, and writing a text is faster and more flexible than calling. Even though it is less common in the USA, WhatsApp (owned by Facebook) is currently the biggest mobile messaging app in the world, with about 2 billion users and about 100 billion messages sent per day.

And so, people send and enjoy messages at an ever-increasing rate. Of course, chatbots are also in the mix, following their audience to the channels it uses. This way, people can get services from chatbots on their favorite messaging app, just as if they were messaging with friends.

Chatbots work on text and all messaging applications are based on text. They all support pictures and videos, which are transferred as text-based links that the app follows to retrieve the content or attachments. Chatbots can connect to any messaging platform with an API that allows it, simulating a mobile device or implementing a business endpoint.

All good then? Not completely. Some messaging platforms let users record a voice message and send it instead of (or together with) a text message. This is becoming more and more common: people on the move may not want to stop and type, while recording a brief message is fast and easy. It is also more personal, since tone of voice conveys a lot more than text and emojis. People also enjoy hearing a friend’s voice more than just reading what they write.

But not chatbots. For them, a recorded voice message in a text exchange means the end of the conversation: they are not (in general) equipped to receive a voice file and transcribe it into text for the conversational AI engine that drives the conversation. The alternative, which can be used in high-value conversations such as sales or customer support, is to hand the interaction over to a human agent, who listens to the voice message, replies, and takes over the exchange with the user. But this is expensive, because the organization must staff enough people to pick up failed bot conversations in addition to conducting their normal business.

Even worse would be for human agents to simply listen to the message and transcribe it so that it can be passed back to the chatbot: this would be an impossibly dull, menial job, likely to lead to massive turnover.

What is needed is a service that transcribes voice recordings and returns the text to chatbots accurately and quickly.
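The flow described above (text goes straight to the bot, a voice recording is transcribed first, and a human takes over only when transcription fails) can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `IncomingMessage` fields and the `transcribe` callable are hypothetical names chosen for the example, and a real integration would plug in whatever transcription service it uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """A simplified inbound message; the field names are illustrative."""
    sender: str
    text: Optional[str] = None       # body of a text message, if any
    audio_url: Optional[str] = None  # link to a recorded voice message, if any


def to_engine_text(
    message: IncomingMessage,
    transcribe: Callable[[str], str],
) -> Optional[str]:
    """Return the text to feed the conversational engine.

    Returns None when nothing usable could be extracted, which the caller
    can treat as the signal to escalate to a human agent.
    """
    parts = []
    if message.text:
        parts.append(message.text)
    if message.audio_url:
        try:
            # `transcribe` is a stand-in for any speech-to-text service.
            transcript = transcribe(message.audio_url).strip()
        except Exception:
            return None  # transcription failed: escalate
        if not transcript:
            return None  # empty transcript: escalate
        parts.append(transcript)
    return " ".join(parts) if parts else None
```

Passing the transcriber in as a function keeps the chatbot logic independent of the speech-to-text provider, so the same routing works whether the recording comes with accompanying text or on its own.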

PhoneMyBot from Interactive Media provides such a service. PhoneMyBot is dedicated to expanding the chatbot realm to voice, be it from the telephone network or any other channel. For the telephone channel, PhoneMyBot must transform live voice from a user into text and text from the chatbot into voice. It does all of this in several languages, selecting the best speech-to-text service for the job. The same capability enables PhoneMyBot to spot-transcribe recorded messages.

A crucial point is to make it very easy for chatbots to submit a recorded voice message for transcription. PhoneMyBot exposes a RESTful API for this, supporting numerous encodings and formats for the voice file. Since most users are on WhatsApp, and chatbots use this channel too, PhoneMyBot also provides a WhatsApp-enabled number for access. Chatbots can send a message containing the voice file to PhoneMyBot and receive the transcription as the response.
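To give a feel for what submitting a recording to a REST transcription endpoint might look like from the chatbot's side, here is a hedged Python sketch using only the standard library. It does not reproduce the PhoneMyBot API: the URL, the `language` query parameter, the bearer-token header, and the `text` field in the JSON response are all assumptions made up for illustration, and the actual service documentation should be consulted for the real contract.

```python
import json
import urllib.parse
import urllib.request


def build_transcription_request(
    audio: bytes,
    content_type: str,
    language: str,
    api_url: str,
    api_key: str,
) -> urllib.request.Request:
    """Build a POST request carrying the raw audio bytes.

    The endpoint shape (query parameter, headers) is hypothetical.
    """
    query = urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        f"{api_url}?{query}",
        data=audio,
        headers={
            "Content-Type": content_type,          # e.g. "audio/ogg"
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def parse_transcription_response(body: bytes) -> str:
    """Extract the transcript from a JSON body; the "text" key is assumed."""
    return json.loads(body.decode("utf-8"))["text"]


def transcribe(audio: bytes, content_type: str, language: str,
               api_url: str, api_key: str) -> str:
    """Send the recording and return the transcript (performs network I/O)."""
    request = build_transcription_request(
        audio, content_type, language, api_url, api_key
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_transcription_response(response.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check in isolation, and only those two functions would need to change to match a real endpoint.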

With this feature, we at PhoneMyBot believe that we have given a definitive answer to the recorded voice message dilemma.
