Boosting the development of voice-enabled virtual assistants


PhoneMyBot by Interactive Media is a service that transforms chatbots that work only on text conversations into voice-enabled virtual assistants. To do this, PhoneMyBot terminates the voice channel (be it a telephone line, a recorded voice message, or another streaming voice channel), transforms the voice into text through a speech-to-text service, and sends the text over to the chatbot.

When PhoneMyBot receives the answer as a text message from the chatbot, it renders it into speech and pipes it back to the user. You can learn more about PhoneMyBot here.
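
The round trip described above can be pictured as three steps chained together. The following is only a minimal sketch of that idea, not PhoneMyBot's actual code: `VoiceBridge`, `handle_turn`, and the three callable types are hypothetical names, and the toy functions stand in for real speech-to-text, chatbot, and text-to-speech services.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical stand-ins, not PhoneMyBot's real interfaces.
SpeechToText = Callable[[bytes], str]   # audio in, transcript out
Chatbot = Callable[[str], str]          # user text in, reply text out
TextToSpeech = Callable[[str], bytes]   # reply text in, audio out


@dataclass
class VoiceBridge:
    stt: SpeechToText
    bot: Chatbot
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: voice -> text -> chatbot -> text -> voice."""
        user_text = self.stt(audio_in)
        reply_text = self.bot(user_text)
        return self.tts(reply_text)


# Toy stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_bot(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


bridge = VoiceBridge(stt=fake_stt, bot=echo_bot, tts=fake_tts)
reply_audio = bridge.handle_turn(b"what are your opening hours")
```

The chatbot in the middle only ever sees and returns text, which is why an existing text-only bot needs no changes to sit behind a voice channel.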

There are many nuances and details that are missing from the description above (some of them are patent-pending), but a key to PhoneMyBot’s success is the ability to integrate with many chatbot platforms. PhoneMyBot offers a standard cloud API that chatbots can use, but it also includes adaptors that use the chatbot platforms’ native API, simulating a simple web client. This way, PhoneMyBot can communicate with existing chatbot deployments without the need for new developments in the chatbot code. At the moment, PhoneMyBot deploys adaptors for about 10 chatbot platforms, but new ones are coming out all the time, depending on our customers’ needs. If you don’t see an adaptor for your platform, let us know and we can add it.

This service was designed to make it cheap and immediate to add voice to an existing chatbot deployment – and it does that, but as an interesting side effect it also lowers the cost of new voicebot developments, while speeding up their deployment time.

Why is that? It all comes down to the dynamics of the conversational AI market for enterprise customers.

A successful conversational AI project entails more than just software and communications. It needs to be tailored to the company’s workflow, products and services, and lingo. Often, the type of language that needs to be used is not the same as in a general-purpose conversation, and this requires conversational applications to be trained to better support it. Of course, this is a common requirement in this type of project, and conversational AI platforms support language customization. But it still means that project development, testing, refining, and deployment take substantial time and effort.

Now, there are only so many conversational AI vendors offering voice integration, and system integrators who can use their platform to implement projects. In addition to the conversational AI part, a voice-enabled project includes integration with the telephone network or the corporate PBX, insertion into the IVR flow, and integration with the voice path in the contact center – both to forward calls if the virtual assistant cannot service them completely, and to provide call-associated data to human agents to make their work easier and provide better service.

All this requires specialized expertise, which few vendors have. These companies and people are in high demand, so delays can be long and costs high. 

But PhoneMyBot provides a ready alternative, with its pre-integrated voice channels. It includes telephone network and WhatsApp connectivity, and APIs to transfer calls to other voice endpoints (for instance, a contact center queue). Interactive Media has tons of experience integrating with the most common contact center suites, both to insert the virtual assistant into the IVR flow and to send data attached to calls to the human agents who service them.

This means that the pool of vendors that can bid on a voice-enabled conversational AI project is suddenly much bigger. Even companies with little or no voice expertise can now deliver a high-quality omnichannel virtual assistant: they only need to test their PhoneMyBot integration and iron out any small wrinkle that the additional channel may create in their conversational application strategy.

There are many more text-only conversational AI offers than voice-enabled ones. PhoneMyBot opens the omnichannel market to them, which benefits vendors, their customers, and ultimately the customer experience that you and I receive when we call a customer service line.

Other Articles

PhoneMyBot and ChatGPT: giving voice to AI

Talking with ChatGPT over the phone is cool. But we can also make it useful. In the past few months tech people worldwide have been talking almost exclusively about Open AI’s ChatGPT. It’s the first large language model chatbot to make a splash, and what a splash! It...

The future of intelligent voice

As the market for smart speakers falters, what are the Big Three (Amazon, Apple, Google) going to do? Alexa, should I bring an umbrella out tomorrow? This is a question that owners of smart speakers have been asking since 2013, the year when Amazon released its first...


WhatsApp voice messages and how chatbot can use them


WhatsApp lets people record and send voice messages. What does it mean for the chatbot customer experience?

Like most Europeans – well, I should say most people in the world – I am a WhatsApp user. WhatsApp has more than 2 billion users worldwide, about a quarter of all humans. And although WhatsApp’s penetration in the United States is lower than in most places, if you are a foreign-born US resident who wants to keep in touch with friends and family back home, like me, WhatsApp is THE app to use.

WhatsApp offers chats, voice calls, video calls, one-on-one or among ad-hoc or organized groups. It also has a business offer, allowing companies to be messaged or called on WhatsApp to be where their customers are.

This feature was introduced in 2018 and is being used more and more: people appreciate using the same app to communicate with individuals and companies, and many telecommunications vendors resell WhatsApp business numbers and the services that come with them.

While I am a member of a couple of organized groups, I mostly use the app to message my friends or call them directly, rarely involving more than one person at a time. But I noticed a funny thing: some of my friends have stopped sending chat messages altogether. Instead, they use another feature of the app that lets you record a voice message and send it over in a conversation. I prefer to type and let the autocompletion feature on my smartphone work its magic, also considering that receiving a voice message is certainly less immediate than reading a short text. But I can see several reasons for preferring to send a voice recording.

For instance, you may be on the go, without the time and place to type. Or you may have trouble seeing the phone keyboard, either because of light conditions or because you can’t see very well (I certainly have problems typing without my reading glasses, I am at that stage of life). You may want to be more expressive using your tone of voice: spoken communication is much better than text at conveying feelings. Or you may not be comfortable writing in general, or the person on the other side may have problems reading. For all these reasons, and possibly others that I can’t think of, sending voice messages instead of typing is on the rise.

And this is fine, as long as you communicate with a human who speaks the same language as you. But there is a special use case that is completely destroyed by this habit: communicating with a chatbot. You see, businesses that use WhatsApp to communicate with their customers via text messages often employ chatbots, automatic “conversational AI” attendants that use natural language capabilities to converse with people, understand the reason for the interaction and help them in a more efficient and cheaper way than having a human customer representative on the line the whole time. Except that chatbots can understand WRITTEN communication, and not voice recordings.

Yet more and more chatbots that connect with WhatsApp receive recorded voice messages. In this case there are two possibilities: either the chatbot recognizes that it cannot process the message and drops the session, or it transfers the session to a human agent who listens to the message, researches the answer, and writes back. The first case of course leads to an awful customer experience, the second to a substantial increase in costs, as the human agent is doing the job that the chatbot could do, having to listen to sometimes long and rambling messages to extract meaning.

What is there to do? Interactive Media, the company where I work, has launched PhoneMyBot, a service that provides an alternative, cheaper and far more elegant solution to the problem. PhoneMyBot was born to expand the channels available to chatbots to include voice channels. It provides a telephone network interface, along with other voice integrations, transcribing the users’ utterances and sending them to the chatbot, and receiving text in return from the chatbot, transforming it into speech, and sending it back to the user over the voice network. PhoneMyBot is completely cloud-based, and also integrates with a number of contact center suites to transfer the call to a human agent if necessary.

In addition, PhoneMyBot integrates with WhatsApp to receive a recorded voice message in a set language from a chatbot, transcribe it, and send it back to the chatbot as text. All the chatbot has to do is communicate with PhoneMyBot’s WhatsApp number to set the language, send the voice file, and receive the transcription. PhoneMyBot also exposes a standard HTTPS-based API for that, which the chatbot can use with a small development effort.
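
From the chatbot's side, the exchange described above has three steps: set the language, send the voice file, and receive the transcription. Here is a minimal sketch of that sequence. It does not reproduce PhoneMyBot's real API: `TranscriptionSession`, its methods, and the `Transcriber` signature are hypothetical, and `fake_transcribe` replaces the actual service so the example runs offline.

```python
from typing import Callable, Optional

# Hypothetical transcriber signature: (audio bytes, language code) -> text.
Transcriber = Callable[[bytes, str], str]


class TranscriptionSession:
    """Chatbot-side view of the exchange: set a language, then send voice files."""

    def __init__(self, transcribe: Transcriber) -> None:
        self._transcribe = transcribe
        self._language: Optional[str] = None

    def set_language(self, language: str) -> None:
        self._language = language

    def send_voice(self, audio: bytes) -> str:
        # The language must be set first, mirroring the order described in the text.
        if self._language is None:
            raise RuntimeError("set the language before sending a voice message")
        return self._transcribe(audio, self._language)


# Stand-in for the real transcription service (an assumption for this sketch).
def fake_transcribe(audio: bytes, language: str) -> str:
    return f"[{language}] {len(audio)} bytes of audio"


session = TranscriptionSession(fake_transcribe)
session.set_language("en-US")
text = session.send_voice(b"\x00\x01\x02")
```

Once the transcription comes back, the chatbot can treat it exactly like a message the user typed.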

It may be that the primary reason some people use WhatsApp’s recorded voice messages feature is that they have difficulties reading and writing. You may think this is a problem of the past, overcome now everywhere. But not so fast. The latest figures for United States residents put the non-literacy rate at about 1%. The US is in the middle of the pack here: China (3%), Brazil (7%), India (25%) fare a lot worse (see https://www.macrotrends.net/countries/ranking/literacy-rate for a complete list). The figures for people who have basic literacy but are uncomfortable reading and writing are likely much higher. So, this is a real possibility.

In addition, PhoneMyBot can also convert the text received from the chatbot to speech (with a choice of voices) and send it back to the chatbot to attach to the WhatsApp response message. This way, users who would like to conduct the complete conversation with recorded messages can receive the chatbot’s answer on their preferred channel.

Sometimes useful features in products and services have unintended consequences. I am sure that when WhatsApp introduced their voice messages feature, they were thinking of human-to-human communications only and for this use case it is a great alternative. But it breaks other use cases, like human-to-machine interactions. Fortunately, PhoneMyBot is there to fix it.

You can try PhoneMyBot’s WhatsApp message transcription right now. To get started, scan the code below, fire up WhatsApp on your phone, and start the interaction with the word “start” as the first message. If you type “help”, PhoneMyBot sends you details on how to use the service.


Speech-to-Text results optimization with Interactive Media’s solutions


A historical perspective

Interactive Media has offered Conversational AI solutions for many years, focusing on voice-enabled Virtual Agents. We deployed our first conversational Virtual Agents well before Conversational AI became a buzzword and before the explosion of self-service conversational deployments.

Having focused on voice since the beginning, we are keenly aware of the challenges that come with converting the spoken utterances coming from users into text that conversational systems can use.

This is because conversational AI Virtual Agents can hold a spoken conversation, for instance on the phone, but their AI brain works on text. So, they need to convert the sentences spoken by humans into their text counterpart, and the text that the system uses to answer back into speech.

Ten years ago, the options available on the market to interpret speech and convert it into text (ASR, Automatic Speech Recognition, or Speech-to-Text) were limited. One company, Nuance, dominated the field, having developed their own technology, or acquired smaller competitors in different countries to offer Speech-to-Text in different languages. So, initially Interactive Media relied on Nuance’s technology for all its voice-enabled Virtual Agent deployments.

Today’s landscape

The state of the technology is vastly different now. The wide adoption of AI has substantially changed the way machines interpret human speech, making the task of developing Speech-to-Text systems much easier and their performance much better, meaning that transcription precision has improved significantly. Speech-to-Text offers have exploded in number and dozens of companies now provide the service, either directly from the public Cloud or integrated more tightly with speech applications.

However, speech is not the same for all people and applications. The variations are staggering. People speak in different ways depending on what they want, what is being asked of them, where they are in a conversation, and of course in dozens of different languages. Providing a Speech-to-Text service that effectively covers all the variations and parts of a conversation is exceedingly hard. So, inevitably some services are better than others for specific tasks and languages.

Interactive Media’s approach to Speech-to-Text

Since Speech-to-Text is still integral to Interactive Media’s offer, we are constantly monitoring its advances and testing different services on a day-to-day basis. We have developed metrics and standardized test suites to inform the decision of what service to use for the benefit of our customers, depending on the use case which dictates the task at hand, the settings, and the language.

What’s the benefit? We have found that the main general-purpose Speech-to-Text services have some weak points, for instance when the task is to fill in a form with numbers or alphanumeric strings. In this case the set of possible results is limited, but some services don’t seem to use this to their advantage and retain the same recognition accuracy as for general speech. While 95% recognition accuracy is usually enough to find out an intent (for instance), when you need to take in a string of 10 digits at 95% accuracy per digit, you’ll get at least one digit wrong roughly 40% of the time.

However, other Speech-to-Text engines are optimized for recognizing digits or allow the user to define tight grammars that can help with the task. Using these engines, you can get per-digit accuracy of up to 99%, which over 10 digits results in a 90% probability of getting the whole string right.
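
The two percentages follow from multiplying the per-digit accuracy by itself once for each digit. The short calculation below reproduces them, under the simplifying assumption that recognition errors on different digits are independent; the function name is only illustrative.

```python
def whole_string_accuracy(per_digit_accuracy: float, digits: int) -> float:
    """Probability that every digit is right, assuming independent errors."""
    return per_digit_accuracy ** digits


general = whole_string_accuracy(0.95, 10)    # roughly 0.60, so about 40% of strings fail
optimized = whole_string_accuracy(0.99, 10)  # roughly 0.90, so about 10% of strings fail
print(f"{general:.3f} {optimized:.3f}")
```

A four-point gain per digit thus turns into a thirty-point gain on the full 10-digit string, which is why engine choice matters so much for data collection tasks.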

Similarly, there are more common tasks that need optimization for the Virtual Agent to be effective. Maybe the most challenging one is transcribing an email address. Human agents have a hard time with it, and the percentage of errors is exceedingly high. Again, some Speech-to-Text services do better than others and even a 5% difference makes it worth it to switch to a better performing service in mid-call if the volume of traffic is high enough.

So, we engineered our platform to use several of the best Speech-to-Text services, constantly testing the connected services and adding new ones as they become available. It’s a big task, but (we think) we are being fairly smart about it: we model conversations by defining categories of tasks that Virtual Agents must accomplish, and continuously test each of the services we integrate with using sample atomic interactions belonging to each category. This way, we derive scores for the various services for each task, in several languages.
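
As a rough illustration of how per-task scores could be derived from such tests, the sketch below tallies correct recognitions per service for each task category and language. The record layout, the service names, and the sample data are all made up for the example and do not describe Interactive Media's actual test suites or metrics.

```python
from collections import defaultdict

# Each record: (service, category, language, recognized_correctly). Made-up data.
results = [
    ("engine_a", "digits", "en", True),
    ("engine_a", "digits", "en", False),
    ("engine_b", "digits", "en", True),
    ("engine_b", "digits", "en", True),
    ("engine_a", "email", "en", True),
    ("engine_b", "email", "en", False),
]


def score_services(records):
    """Return {(category, language): {service: fraction recognized correctly}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for service, category, language, ok in records:
        tally = counts[(category, language)][service]
        tally[0] += int(ok)
        tally[1] += 1
    return {
        key: {svc: c / t for svc, (c, t) in per_service.items()}
        for key, per_service in counts.items()
    }


scores = score_services(results)
```

With this toy data, `engine_b` comes out ahead on digits while `engine_a` comes out ahead on email addresses, which is the kind of split that makes per-task selection worthwhile.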

This would be academic without a way for the Virtual Agent application to tell us what to expect. So, we added this feature to all our services, provided by the PhoneMyBot and OMNIA platforms. The API lets the application specify the expected category of utterance coming from the user, based on the question being asked. So for instance, if the system prompts the user to provide a numerical code, the service knows that the next utterance is most likely composed of numbers, and will use the Speech-to-Text engine with the best performance recognizing them.
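
A minimal sketch of this kind of selection, assuming a table of per-category scores is already available: the application passes a hint about the expected utterance, and the best-scoring engine for that category is chosen. The category names, engine names, scores, and the `pick_engine` function are hypothetical and are not the real PhoneMyBot or OMNIA API.

```python
DEFAULT_CATEGORY = "general"

# Hypothetical per-category scores for one language.
SCORES = {
    "general": {"engine_a": 0.95, "engine_b": 0.93},
    "digits": {"engine_a": 0.95, "engine_b": 0.99},
    "email": {"engine_a": 0.80, "engine_b": 0.75},
}


def pick_engine(expected_category=None):
    """Choose the best-scoring engine for the hinted category, else the general one."""
    category = expected_category if expected_category in SCORES else DEFAULT_CATEGORY
    by_engine = SCORES[category]
    return max(by_engine, key=by_engine.get)


engine_for_code = pick_engine("digits")   # the prompt asked for a numerical code
engine_for_open_question = pick_engine()  # no hint, fall back to general speech
```

Falling back to the general category when no hint is given (or the hint is unknown) keeps the application working even if it never uses the feature.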

The difference in performance is substantial: if even 10% fewer calls have to be forwarded to human agents, especially when the task is simply collecting data from the customer, the customer experience is better and the ROI for our customers soars. That is the promise of Virtual Agents, delivered.


PhoneMyBot outbound service


When people think of chatbots, mostly they envision little helpers popping up on the lower right side of webpages. Maybe a bit annoying if you are not looking for anything particular, often helpful, they take away the guesswork of navigating to the right information within the site by interacting with users in natural language. Users write a question, the chatbot interprets its meaning and answers with the information. Or maybe not – depending on how well made the chatbot is.

Chatbots are supplanting the venerable website FAQ section: they provide services and answers for the most common needs, and even let users perform some self-service tasks like ordering products or making payments. This way, chatbots improve the customer experience and handle most interactions in self-service mode, while costing a fraction of live agents, who can concentrate on the interactions that chatbots cannot solve and that require creativity and a human touch.

But chatbots always need users to come to them and initiate the interaction.

The reason is obvious: how can a website reach out to users who are not “visiting” its pages? True, chatbots also use other channels: messaging services (WhatsApp, Facebook Messenger), text messages, email. These can be used to start conversations and sometimes they are. But it’s not common or immediate: people are not necessarily watching their messaging apps all the time and messages from companies can be ignored easily.

There are good reasons for companies to reach users proactively and immediately: for instance, to remind them of an appointment and give them the ability to reschedule. Or to confirm an order before it ships. Text messages can be used for that, but there’s no guarantee that the answer will be fast, or that there will be an answer at all. The main way to reach people quickly and with real-time feedback is a phone call: the phone rings, and if the user answers, the ensuing conversation makes it possible to go over the matter completely and with a high degree of certainty. So, this is now done with automatic dialers that are backed up by live agents, which is expensive and not pleasant for the agents themselves. Too bad that chatbots cannot use the phone.

Or can they?

PhoneMyBot by Interactive Media provides services that allow chatbots to seamlessly operate on voice channels, starting with the telephone. It is a Cloud-based environment with connectivity to the telephone network and APIs to connect with the chatbots, and it uses multiple speech-to-text and text-to-speech services to “translate” between the voice-based and text-based ends.

PhoneMyBot uses a layer of software adaptors to natively talk with several common conversational AI frameworks. Chatbots based on these frameworks don’t have to do anything to interact with voice users: they see the endpoint as just another website-based client. But of course, this is for incoming calls.

But PhoneMyBot also exposes a standard cloud API that the chatbots can use, and it supports placing calls to telephones. Once the call is established, the chatbot interacts with the user like in any other chat conversation, leaving to PhoneMyBot the task of converting between text and voice. If the call cannot be connected, or it goes to voicemail, the chatbot receives a message from PhoneMyBot and can continue to the next call.
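
The outbound flow described above can be sketched as a simple loop: place a call, hold the conversation if someone answers, and otherwise move on. This is an illustration only. `CallOutcome`, `run_campaign`, and the callables it takes are hypothetical names, and a fixed outcome table stands in for the real telephone network and API.

```python
from enum import Enum
from typing import Callable, Iterable, List


class CallOutcome(Enum):
    CONNECTED = "connected"
    NOT_CONNECTED = "not_connected"
    VOICEMAIL = "voicemail"


def run_campaign(
    numbers: Iterable[str],
    place_call: Callable[[str], CallOutcome],
    converse: Callable[[str], None],
) -> List[str]:
    """Call each number in turn; converse only when a person answers."""
    reached = []
    for number in numbers:
        outcome = place_call(number)
        if outcome is CallOutcome.CONNECTED:
            converse(number)
            reached.append(number)
        # On NOT_CONNECTED or VOICEMAIL, simply move on to the next number.
    return reached


# Toy stand-ins: a fixed outcome table instead of a real telephone network.
outcomes = {
    "+15550100": CallOutcome.CONNECTED,
    "+15550101": CallOutcome.VOICEMAIL,
    "+15550102": CallOutcome.NOT_CONNECTED,
}
conversations = []
reached = run_campaign(outcomes, outcomes.__getitem__, conversations.append)
```

In a real deployment the unanswered numbers would probably be queued for a retry or a follow-up message rather than dropped, but that policy is up to the chatbot application.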

The applications are numerous, all resulting in better customer experience and lower costs for the company:

  • Reservations and scheduling
  • Order confirmations, delivery alerts
  • Reminders or appointment confirmation
  • Service renewal
  • Upselling

With its outbound service, PhoneMyBot lets companies use chatbots in a completely new way, giving voice to their chat and opening new perspectives. To learn more please visit https://www.phonemybot.com or contact us at info@phonemybot.com.


Interactive Media’s Conversational Virtual Agents and how they interact with people in natural language


Natural Language Processing (NLP) is gaining ever more relevance as it applies to virtual agents. The technology is effective at automating interactions with customers without compromising the quality of service.

In this post we will elaborate on the subject. First, we will address the concept and the importance of virtual agents in corporate service processes. Next, we will detail some of the main challenges and major advantages of NLP when used with bots.

Finally, we will make the case of why Interactive Media, founded more than 20 years ago, is the best choice in service solutions with NLP support.

Happy reading!

Virtual agents: what are they and why are they important in service?

Increasingly common in corporate daily life, virtual agents are computer applications that use artificial intelligence (AI) and machine learning to optimize service processes. Virtual agents use Natural Language Processing (NLP) and Natural Language Understanding (NLU) as the basis for conducting a conversation with people.

By incorporating virtual agents into workflows, companies have a double gain: while leveraging service team productivity, speeding up problem solving through technology, they also optimize important resources – such as labor, time and money. The result of the equation can be extremely positive and, therefore, very attractive to high performance companies.

NLP, in turn, plays a decisive role in the effectiveness of virtual agents, especially when the user experience is a central theme. “Generally speaking, Natural Language Processing is the ability of a computer system to interact with people using speech, adapting to understand what they say and to respond to them in a natural way,” says Livio Pugliese, CEO of PhoneMyBot, an Interactive Media company.

Technically speaking, NLP is at the intersection of linguistics and information technology, benefiting from the advances of both. According to a Gartner report, it is estimated that by 2021, 15% of all customer service interactions will be fully handled by artificial intelligence mechanisms. In Brazil, the virtual agent market also continues to be heated: in 2019 alone, 60 thousand bots were launched, a number almost 353% higher than the previous year.

Natural language: what are the biggest challenges and main advantages?

The rapid advance of technology, especially of artificial intelligence, has in recent years led to a substantial increase in the quality of comprehension and language generation. As a result, virtual agents also improved and gained more and more space in business departments – from sales to technical support.

In practice, NLP needs written text to function. Therefore, it is necessary that machines accurately transcribe what people say, providing a coherent and precise interpretation – which undoubtedly emerges as one of the main challenges of Natural Language Processing.

“Even more challenging, however, is the mission of assigning a faithful meaning to the transcript, since people use many different phrases to say the same thing and, in some cases, the same word can mean different things in specific contexts”, comments Livio Pugliese.

However, overcoming this obstacle brings a substantial reward: adopting automated service solutions brings many advantages and can represent significant gains in the short, medium and long term.

“When we are talking about voice, NLP can help whenever people need to communicate with machines without using their hands,” explains the CEO of PhoneMyBot. The executive reinforces that these days it is not just questions and answers, but conversations. The biggest advantage lies in the ability to understand the user’s intention to, if possible, provide the services most appropriate to the question or complaint.

“Often, the virtual agent gets only some of the meaning during the first conversation exchange,” according to Livio Pugliese. But with AI and machine learning resources, this is no longer a problem: the technology can continue to ask complementary questions to single out the intention until it is completely understood, and the most relevant answer is sent to the customer.

The operational impact of virtual agents is important for the bottom line as well. When bots are in charge of leading the initial interaction with customers, often keeping the whole interaction in self-service, professionals in the field can dedicate themselves to more analytical and strategic tasks, which helps the overall company performance.

If you still have doubts about the efficiency of virtual agents in generating savings for corporations, it is worth remembering that a survey by Juniper Research predicts that, by 2022, companies will save 8 billion dollars a year with the application of conversational technologies. In other words: it is worth investing now.

Experience and technology: why is Interactive Media a specialist in NLP?

Interactive Media develops, deploys, and continuously improves conversational virtual customer service agents across multiple channels. With great expertise in artificial intelligence tools and machine learning, the company helps its customers to optimize their interaction flows with users.

With more than 20 years of experience in voice applications, Interactive Media has implemented many successful use cases in organizations of the most diverse sizes and segments. “Based on the history, we know that virtual agents can solve up to 80% of the problems in the call center, freeing human agents from countless telephone contacts”, points out Pugliese.

Interactive Media’s platform uses a carefully tailored approach to build technologies that foster high-performance understanding, allowing for more accurate interactions, which in turn improves both ends of the chain. For the company, it is about maximizing resources and reducing costs; for the user, it means that problem solving is more agile and efficient.

“At Interactive Media, we cover the complete lifecycle of virtual NLP agents and also of all integrations, which ensures that we can provide the most appropriate and intelligent solution to the problem presented by the customer”, concludes the PhoneMyBot CEO.

We want to end with two complementary conclusions. The first is that Natural Language Processing emerges as the most effective way to enable a more human-like automated service, capable of really understanding the user’s intention. The second solidifies Interactive Media’s position as a reference in the area; after all, even before the chatbot boom, the company already offered AI-based conversational services, and they have only become smarter and more focused with time.

To improve the service flow in your company you need an effective technological mechanism. Contact us and find out how we can help you implement more complete solutions, in line with the demands of an evolving market.


Chatbots and recorded voice – a messaging era dilemma


Chatbots converse with people in natural language and have had an extraordinary proliferation in the past few years. They started as little windows on websites, allowing users to write what they were looking for and providing information directly, instead of forcing people to navigate the complete site looking for their content. 

This is certainly a worthwhile mission, but chatbots have since expanded beyond it: they now mine databases, present personalized results, and perform mission-critical activities like booking and confirming appointments.

But the past few years have also seen the explosion of mobile messaging services, which are now an integral part of (almost) everyone’s life. They range from simple one-on-one text messages (SMS) to multimedia messages sent to multiple recipients, and to platforms that straddle the divide between messaging and social networks, like WhatsApp, Viber, Telegram, and Facebook Messenger. The advantages of these services are clear: they are software-only and reach users on a device that is always with them, they are free or almost free, they offer multimedia capabilities, and writing texts is faster and more flexible than calling. Even though it is less common in the USA, WhatsApp (owned by Facebook) is currently the biggest mobile messaging app in the world, with about 2 billion users and about 100 billion messages sent per day.

And so, people send and enjoy messages at an ever-increasing rate. Of course, chatbots are also in the mix, following their audience to the channels that it uses. This way, people can get services from chatbots on their favorite messaging app, just as if they were messaging friends.

Chatbots work on text and all messaging applications are based on text. They all support pictures and videos, which are transferred as text-based links that the app follows to retrieve the content or attachments. Chatbots can connect to any messaging platform with an API that allows it, simulating a mobile device or implementing a business endpoint.

All good then? Not completely. Some messaging platforms let users record a voice message and send it instead of (or together with) a text message. This is becoming more and more common – people on the move may not want to stop and type, while recording a brief message is fast and easy. It is also more personal: you can say a lot more with your tone of voice than with text and emojis. People also appreciate hearing a friend’s voice more than just reading what they write.

But not chatbots. For them, a recorded voice message in a text exchange means the end of the conversation: they are not (in general) equipped to receive a voice file and transcribe it into text to feed to the conversational AI engine that drives the conversation. The alternative, which can be used in high-value conversations like sales or customer support, is to hand the interaction over to a human agent who listens to the voice message, replies, and takes over the exchange with the user. But this is expensive, as it requires the organization to staff enough people to pick up failed bot conversations in addition to conducting their normal business.

Even worse would be for human agents to simply listen to the message and transcribe it, then pass it back to the chatbot: this would be an impossibly dull and menial job, likely to lead to massive turnover.

What is needed is a service to transcribe voice recordings and get them back to chatbots accurately and quickly.  

PhoneMyBot from Interactive Media provides such a service. PhoneMyBot is dedicated to expanding the chatbots’ realm to voice, be it from the telephone network or any other channel. For the telephone channel, PhoneMyBot must transform live voice from a user into text, and text from the chatbot into voice. It does all of this in several languages, selecting the best speech-to-text service for the job. The same capability also enables PhoneMyBot to spot-transcribe recorded messages.

A crucial point is to make it very easy for chatbots to submit a recorded voice message for transcription. PhoneMyBot exposes a RESTful API for this, supporting numerous encodings and formats for the voice file. Since most users are on WhatsApp, and chatbots therefore use this channel too, PhoneMyBot also provides a WhatsApp-enabled number for access. Chatbots can send a message containing the voice file to PhoneMyBot and receive the transcription back as the response.
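
As a rough illustration of what submitting a recording over a REST API might look like from the chatbot side, here is a minimal Python sketch. The endpoint URL, the form field names (language, file), the bearer-token header, and the {"text": ...} response shape are all assumptions made for this example; they are not taken from PhoneMyBot’s actual API documentation.

```python
import json
import uuid
from urllib import request

# Hypothetical endpoint: the real URL and parameters are not given in this article.
TRANSCRIBE_URL = "https://api.example.com/v1/transcriptions"


def build_transcription_request(audio: bytes, filename: str, mime_type: str,
                                language: str, api_key: str) -> request.Request:
    """Build (but do not send) a multipart/form-data POST carrying a voice file."""
    boundary = uuid.uuid4().hex
    # Text part with the spoken language, then the header of the file part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="language"\r\n\r\n'
        f"{language}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    # Closing delimiter that ends the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return request.Request(
        TRANSCRIBE_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_transcription_response(raw: bytes) -> str:
    """Extract the transcript from a JSON reply of the assumed form {"text": "..."}."""
    return json.loads(raw.decode("utf-8"))["text"]
```

A chatbot could pass the resulting request to urllib.request.urlopen, feed the response body to parse_transcription_response, and hand the transcript to its conversational engine as if the user had typed it.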

With this feature, we at PhoneMyBot believe that we have given a definitive answer to the recorded voice message dilemma.
