Using PhoneMyBot to add voice to a chatbot


PhoneMyBot is Interactive Media’s service that lets chatbots (conversational AI applications that communicate with users over text or web channels) add a voice channel quickly and easily and hold spoken conversations with people, over the phone or other voice networks. PhoneMyBot converts voice into text and text into voice, and offers a number of features that make this process easier and more pleasant for users. Even so, chatbot designers should keep in mind how voice conversations differ from text-based ones and adjust the chatbot’s side of the dialog for a better voice experience.

This document provides a number of tips for refining a chatbot’s dialog so that it adapts better to voice.


One of the reasons people install voicebots is to lighten the load on the human agents who receive telephone calls from customers. There is a good chance that the voicebot will be able to handle the most common tasks, so a call should be forwarded to a human agent only if the voicebot cannot service it. So, don’t start by telling users that they can talk with a human at any time by saying “human agent”: offer that option only when it’s clear that the voicebot will not be able to help. Otherwise, most people will do exactly that and say “human agent” immediately.


The average reader can read about 300 written words per minute, but talking is slower. Podcast hosts normally speak at 150-160 words per minute, a pace that listeners find comfortable. This means that a text the chatbot would put out on a text-based channel takes about twice as long to be spoken as to be read. It therefore makes sense to be as brief as possible on the voice channel, avoiding preambles and, if necessary, splitting a text into two or more parts. This is good practice on text channels too; it’s just easier to get away with longer texts there.
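
The arithmetic above can be turned into a quick check on prompt length. The following is only a sketch: it uses the 300 and 150-160 words-per-minute figures quoted above, and the function names and the 10-second budget are illustrative assumptions, not PhoneMyBot settings.

```python
import re

READING_WPM = 300   # silent reading rate quoted above
SPEAKING_WPM = 155  # midpoint of the 150-160 range quoted above


def seconds_to_deliver(text: str, words_per_minute: float) -> float:
    """Estimate how long a text takes to read or speak at the given pace."""
    return len(text.split()) / words_per_minute * 60


def split_for_voice(text: str, max_seconds: float = 10.0) -> list[str]:
    """Group whole sentences into chunks that each fit a speaking-time budget.

    The 10-second default is an arbitrary example value. A single sentence
    longer than the budget is kept whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and seconds_to_deliver(candidate, SPEAKING_WPM) > max_seconds:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With these figures, a 310-word reply takes roughly one minute to read but about two minutes to speak, which is why splitting it into shorter turns tends to help on the voice channel.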


In a text interaction, a chatbot could say something like: “please see below for the list of options”. Of course, this does not work for voice, besides being a bit clunky for text too. It is much better to say something like “Please choose one of:”. Be on the lookout for this sort of thing.
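
One way to stay on the lookout is to scan the chatbot’s prompts for wording that assumes a screen. The sketch below is an illustration only: the phrase list is a hypothetical starting point that would need extending for a real chatbot, and the check runs on the chatbot side, independently of PhoneMyBot.

```python
import re

# Hypothetical starter list of phrases that assume the user can see a screen.
VISUAL_PHRASES = [
    r"see (?:below|above)",
    r"click(?: on)?",
    r"tap(?: on)?",
    r"the (?:button|link|form|image|picture) (?:below|above)",
]
_VISUAL_RE = re.compile(r"\b(?:" + "|".join(VISUAL_PHRASES) + r")\b", re.IGNORECASE)


def find_visual_phrases(prompt: str) -> list[str]:
    """Return the screen-oriented phrases found in a prompt, lowercased."""
    return [match.group(0).lower() for match in _VISUAL_RE.finditer(prompt)]


# The clunky prompt from the text is flagged, the suggested rewording is not.
flagged = find_visual_phrases("Please see below for the list of options")
clean = find_visual_phrases("Please choose one of:")
```

Running such a check over every prompt before enabling the voice channel gives a short list of texts to reword.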


Chatbots act on text channels, but also on web pages. An interaction on a web page can be richer than mere text: users can be presented with buttons to click, forms, pictures, even videos. Needless to say, these elements don’t translate well into voice. In this sense an interaction through PhoneMyBot is similar to one that uses a more text-oriented channel like WhatsApp, but even more constrained. So, the recommendation is to avoid all visual elements if possible. Note that PhoneMyBot has settings that allow substituting a string coming from the chatbot with another pre-defined string. For instance, these can be used to look for a particular URL string (in the form https://some-URL) and replace it with a sentence like: “please see our website”. This can be useful, but the best approach is to write prompts that need no substitution at all.
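
To illustrate the idea of swapping a URL for something speakable, here is a minimal sketch of the same kind of substitution done on the chatbot side before the text is sent. It is not PhoneMyBot’s own substitution setting, whose configuration is not shown here; the function name and the replacement wording are assumptions.

```python
import re

# Matches http:// or https:// followed by any run of non-space characters.
URL_RE = re.compile(r"https?://\S+")


def replace_urls(text: str, replacement: str = "our website") -> str:
    """Replace every URL in the text with a phrase that can be spoken."""

    def _swap(match: re.Match) -> str:
        url = match.group(0)
        # Keep sentence punctuation that was glued to the end of the URL.
        trailing = url[len(url.rstrip(".,;:!?)")):]
        return replacement + trailing

    return URL_RE.sub(_swap, text)


spoken = replace_urls("You can find the form at https://example.com/forms.")
```

A sentence without a URL passes through unchanged, so the function can be applied to every outgoing message.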


It is easier to guide users to type something than to say something. People use many more turns of phrase when speaking than when writing, so the conversational AI engine must be prepared for more variety. This can be done incrementally: review past interactions, focusing on sentences that were not understood, and see if the conversation can be improved by adding synonyms, turns of phrase, and idiomatic expressions to the chatbot’s knowledge base.


Text-to-speech services provide ways to enrich their voice output so that it mimics the way people speak, using emphasis, volume, pronunciation, etc. PhoneMyBot supports a simplified subset of SSML (Speech Synthesis Markup Language) and takes care of implementing these directions with the TTS services it uses. Chatbots can sound more natural to users by adding tags to their text, which are then interpreted and managed by PhoneMyBot.
Please see the PhoneMyBot documentation for details on the supported tags and how to use them.
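
As an illustration of what tagged text can look like, the sketch below builds a prompt with two elements from the W3C SSML standard, `<emphasis>` and `<break>`. Whether PhoneMyBot’s simplified subset accepts these exact tags and attributes is an assumption to verify against its documentation; the helper names are hypothetical.

```python
from xml.sax.saxutils import escape


def plain(text: str) -> str:
    """Escape characters such as & and < so they are safe inside markup."""
    return escape(text)


def emphasis(text: str, level: str = "moderate") -> str:
    """Wrap text in a standard SSML emphasis element."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def pause(milliseconds: int) -> str:
    """Insert a standard SSML break of the given length."""
    return f'<break time="{milliseconds}ms"/>'


# Assemble one prompt from plain and tagged parts.
prompt = " ".join([
    plain("Your appointment is on"),
    emphasis("Tuesday", "strong"),
    pause(300),
    plain("at ten o'clock."),
])
```

Keeping the tagging in small helpers like these makes it easy to drop the tags again when the same text goes out on a text channel.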


PhoneMyBot provides a set of powerful features to help chatbots transition from chat to voice. They are explained in the PhoneMyBot Wiki. They include features to make the conversation more fluid:

a stock message that PhoneMyBot will say on its own after some time of silence,

a message that PhoneMyBot will say if the speech-to-text engine does not recognize what the user says,

a message said when the chatbot is late sending a reply,

an offer to repeat a prompt if it’s long or confusing,

a message that PhoneMyBot will say if the user says “no” to an offer to repeat the prompt.

Please look at the PhoneMyBot Wiki to learn how to configure these messages.

In addition, PhoneMyBot can perform actions triggered by the chatbot through a regular expression. In essence, when the chatbot sends a string that matches a regular expression configured in PhoneMyBot, PhoneMyBot executes the associated action. The action can be to:

Speak a prompt and hang up the call.

Transfer the call to a telephone number associated with a queue or a person.

Use one of the context-aware speech-to-text recognition functions for the next utterance that the user says. This is very useful to improve the recognition rate of numeric or alphanumeric strings, like social security numbers, license plates, etc. The available contexts are listed in the PhoneMyBot Wiki.

Set up special functions during a prompt, like enabling or disabling barge-in (the ability to detect what the user says while the prompt is playing, stop the prompt and continue the conversation), or offering to repeat the prompt to the user.

Substitute a text with another one, for instance to quickly change a text-oriented prompt into a voice-oriented one.

See the PhoneMyBot Wiki to learn how to configure these functions.
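
The regular-expression triggers described above can be pictured with a small dispatch table. This is a conceptual sketch only, not PhoneMyBot’s implementation or configuration format: the `[[...]]` marker strings, the action names, and the choice to strip the marker from the spoken text are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern  # regular expression to look for in the chatbot message
    action: str          # hypothetical name of the action to run on a match


# Hypothetical markers a chatbot could embed in its replies.
RULES = [
    Rule(re.compile(r"\[\[HANGUP\]\]"), "hang_up"),
    Rule(re.compile(r"\[\[TRANSFER\]\]"), "transfer"),
    Rule(re.compile(r"\[\[DIGITS\]\]"), "use_digits_context"),
]


def route(message: str) -> tuple[str, str]:
    """Return (action, text_to_speak) for the first rule that matches.

    Messages that match no rule are simply spoken as they are.
    """
    for rule in RULES:
        if rule.pattern.search(message):
            spoken = rule.pattern.sub("", message).strip()
            return rule.action, spoken
    return "speak", message


action, spoken_text = route("Thank you for calling. Goodbye! [[HANGUP]]")
```

The point of the pattern is that the chatbot needs no voice-specific API: it only has to emit an agreed string, and the voice layer decides what to do with it.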
