The future of intelligent voice

Written by Livio Pugliese

January 26, 2023

As the market for smart speakers falters, what are the Big Three (Amazon, Apple, Google) going to do?

“Alexa, should I bring an umbrella out tomorrow?” This is a question that owners of smart speakers have been asking since 2014, the year when Amazon released its first Echo product. Soon Google and Apple followed suit, with their Google Assistant and Siri technologies.

While Siri is embedded into Apple hardware as a software feature, both Amazon and Google produced and actively sold the hardware to support their speech software: a line of smart speakers with sensitive microphones that listen for a key phrase and only then start detecting what people say. The rise of these devices has been meteoric. They were cheap and convenient, and they largely supplanted both radio and stereo systems in the home by streaming content controlled by voice. They were sold by the tens of millions, both in the US and around the world: according to a Comscore report, in 2021 almost half of US internet users owned at least one of them.

Most people in the US are familiar with Alexa: she listens to the sounds around her, and when she hears her name she springs into action. This means recording the sentence that comes after the keyword, sending the audio to the Amazon Cloud for recognition, receiving the answer and playing it back. (Supposedly, nothing is recorded outside of the keyword-initiated transaction.) The same is true for the Google version, although “Hey Google” is both longer and less personal.

As an aside, I know someone whose name is Alexa, and it was her name well before Amazon released the first Echo. I wonder how she feels being called upon to do the bidding of countless people…

The problem with the status quo: lack of revenue

As often happens in the tech industry, for smart speakers the technology leapt ahead of the profitable use cases. Yes, people were and are using their smart speakers often, but mostly to ask general questions, check the weather and stream music. The vendors figured that, with time and as adoption increased, they could come up with a revenue model that would support the business, but so far no one has managed it.

Of course, there are ads within music streaming if the owner does not subscribe to a music service, but they are few and far between so as not to degrade the experience too much. And a $10 a month music subscription is not enough to pay for providing and maintaining the infrastructure for the rest of the service.

The most profitable use case that was hoped for at the beginning, shopping by voice, never took off: people are understandably wary of providing personal information, credit card numbers, etc. to the Cloud through yet another channel, and by definition any shopping done through a smart speaker is “sight unseen”.

So, in the past few months, with the changing economy and the realization of how difficult it is to really monetize smart speakers, there has been a definite retrenchment by both Amazon and Google. Amazon laid off a good portion of the Alexa development team, Google reportedly greatly reduced funding for the Assistant line and (this is very recent news) Alphabet is laying off as many as 12,000 workers in January 2023. One can imagine that the worst-performing divisions would be most affected.

Smart speakers are in trouble.

Voice apps on smart speakers

However, many companies and organizations developed apps to integrate with Alexa and Google Assistant, through the respective APIs. In this case, the smart speakers act simply as a speech transcription and rendering interface: once the app is active, they transcribe what the user says and send the text to the external service, take the text that service sends back and render it into voice for the user to hear.
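The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Alexa or Google Assistant API: `VoiceChannel`, `handle_turn` and `weather_app` are hypothetical names, and the speech-to-text and text-to-speech engines are faked by treating the audio bytes as UTF-8 text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceChannel:
    """Hypothetical stand-in for the smart speaker: speech in, speech out."""
    transcribe: Callable[[bytes], str]   # speech-to-text engine (assumed)
    synthesize: Callable[[str], bytes]   # text-to-speech engine (assumed)

    def handle_turn(self, audio: bytes, app: Callable[[str], str]) -> bytes:
        user_text = self.transcribe(audio)    # 1. transcribe what the user said
        reply_text = app(user_text)           # 2. send the text to the external app
        return self.synthesize(reply_text)    # 3. render the app's reply as voice


def weather_app(text: str) -> str:
    """Hypothetical external service: it only ever sees and returns text."""
    if "umbrella" in text.lower():
        return "Yes, rain is expected tomorrow."
    return "Sorry, I did not understand."


# Fake engines for the sketch: the "audio" is just UTF-8 encoded text.
channel = VoiceChannel(
    transcribe=lambda audio: audio.decode("utf-8"),
    synthesize=lambda text: text.encode("utf-8"),
)

reply_audio = channel.handle_turn(b"Should I bring an umbrella tomorrow?", weather_app)
print(reply_audio.decode("utf-8"))
```

The point of the split is that the app never touches audio: anything that can transcribe speech and speak text back could stand in for the speaker.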

Amazon calls these apps Skills; Google calls them Actions. Either way, there are hundreds of thousands of them. They can be launched with a special prompt: “Alexa, open [skill name]” or “Hey Google, talk to [action name]”. While many apps have not been successful and see minimal use from this channel, others are important or even essential.

What happens to these apps if the smart speaker vendors limit and then terminate their offering? Some are merely activating an additional channel to a wider service, and presumably would not be impacted too severely. But others were developed specifically to take advantage of the voice channel offered for free by smart speakers. For instance, I recently talked with the developer of a skill for blind people, who use their voice to access information that others get from screens.

Skills and Actions developers are seriously worried.

On the other hand, what other conduits are there for two-way, intelligent voice applications in the house? Well, the one we’ve always had: the telephone (no matter if fixed or mobile). Granted, calling an app over the phone is a little more complex than simply saying “Hey Google”, but everyone knows how to use a phone and the technology could not be more tried-and-true. The problem then is connecting existing intelligent applications to the telephone network.

PhoneMyBot as the conduit for voice apps

Interactive Media offers PhoneMyBot, a service built to extend the channels available to chatbots to include voice. It performs the same functions that smart speakers perform for their apps: it transcribes the user’s speech and sends the text to the connected application, then receives text in return and transforms it into speech, piping it into the voice network. PhoneMyBot is natively integrated into the telephone network and exposes to apps an API equivalent to the ones from Alexa and Google Assistant. In addition, PhoneMyBot integrates with a number of contact center suites to transfer the call to a human agent if necessary.
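To make the phone-call flow concrete, here is a minimal sketch of a call loop with a handoff to a human agent. It is not PhoneMyBot’s actual API: `run_call`, the `HANDOFF` marker and `opening_hours_bot` are hypothetical names, and transcription and speech synthesis are assumed to happen outside the sketch, so the caller’s utterances arrive already as text.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical marker a bot returns when it wants a human agent to take over.
HANDOFF = "HANDOFF"


def run_call(
    caller_utterances: Iterable[str],
    bot: Callable[[str], str],
) -> Tuple[List[str], bool]:
    """Relay each transcribed utterance to the bot and collect the replies to speak.

    Returns the replies in order and whether the call was transferred to an agent.
    """
    spoken: List[str] = []
    for utterance in caller_utterances:
        reply = bot(utterance)
        if reply == HANDOFF:
            # Stop relaying to the bot: the rest of the call goes to a human.
            spoken.append("Please hold while I transfer you to an agent.")
            return spoken, True
        spoken.append(reply)
    return spoken, False


def opening_hours_bot(text: str) -> str:
    """Hypothetical text-only chatbot connected to the voice channel."""
    text = text.lower()
    if "hours" in text:
        return "We are open from 9 to 5, Monday to Friday."
    if "agent" in text or "person" in text:
        return HANDOFF
    return "Could you rephrase that?"


replies, transferred = run_call(
    ["What are your opening hours?", "Can I speak to a person?", "Hello?"],
    opening_hours_bot,
)
print(replies, transferred)
```

In this sketch the third utterance never reaches the bot, because the call has already been handed over.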

What makes PhoneMyBot appealing to small organizations that may become stranded if smart speakers decline too much? It’s extremely easy to try: an initial trial period is free, and commercial traffic is billed at a (low) per-minute rate that does not depend on the traffic volume. This makes it ideal for low-budget, pay-as-you-go services. The administration is simple and powerful: a single portal provides access to all the traffic data and stats. And it’s robust, with an infrastructure built on telco-grade software, managing millions of calls per month.

Go ahead, try it!
