'Talk to Me' explains a paradigm shift in the way we consume information

And if you have started wondering why such innovations come only from American companies, Ajit Balakrishnan offers the answer.

Illustration: Dominic Xavier/Rediff.com

If you have ever experienced the half-British-half-desi-accented voice on Google Maps and wondered how she can speak to you so knowledgeably about every back-lane in the town in India in which you live, you are already on the path to reading this book -- Talk to Me: Amazon, Google, Apple and the Race for Voice-Controlled AI.

More so if you are tech-enthused and are already experimenting with gadgets like Google Home and Amazon Echo, to which you can speak and give directions.

James Vlahos, an American journalist and a frequent contributor to The New York Times Magazine, Wired, Popular Science, GQ and others, says these are not mere cute gadgets for you to play with.

They represent a paradigm shift from the world of computing that we have known so far, a world in which we used to type text on keyboards and get replies in the form of text or pictures on the glass screens of desktop computers and mobile phones.

We are entering the era of voice computing.

This is a shift from a paradigm that can be traced back to 1867, when the QWERTY layout for typewriter keyboards was first patented. It has since been our way of interacting with machines -- from manual and electric typewriters to computer keyboards and the keyboards on the smartphones to which we are all addicted today.

Impressed as we are with such an innovation, isn't the use of voice to speak to and hear back from a gadget just another of those gimmicks that tech companies throw at us and make us spend money on?

Can this use of voice cause anything 'revolutionary'?

As a first example, Vlahos points to the threat to our current practice of navigating from page to page on the Internet by pointing and clicking, or by entering a word or two and launching a search on the Web.

Or being presented with banner ads and other links attempting to persuade us to click and take us to some destination we had not imagined existed.

These attempts to lure us to click are the economic foundation of a booming Internet advertising and online media industry that threatens to weaken, if not demolish, print, television and cinema.

Click-bait advertising may itself be immolated in the emerging world in which we will no longer click or type, but merely talk to our gadgets, says the author.

The battle for this voice-based economy, he says, is already underway.

Photograph: Kind courtesy, Amazon.com

Sure, Google Assistant and Amazon's Alexa are early leaders, but Apple's Siri, Microsoft's Cortana and Facebook's Messenger bots are also contenders.

Then there is WeChat, which already dominates the billion-user Chinese market.

As in all early-stage innovation markets, there are still questions about how these companies will make money from being leaders in voice technology.

Some believe that selling devices like Echo and Home will be the way.

Will advertising have a role, as in voice ads (of the type we hear on the radio)? Or will it be shopping recommendations? No one is sure yet.

The author also describes the many different deep technologies that make all these voice miracles happen.

First, there is the challenge of recognising the meaning of the spoken word -- compounded manifold by the multiple languages in which we speak, by our accents and, last but not least, by the figures of speech and symbolism we use ('Find me a restaurant where I can get Goa fish curry but not too spicy?').

Then there is the challenge of 'speech synthesis', the act of composing a reply that is meaningful and appropriate to the context in both metaphor and tone.

There are heavy-duty technologies involved here: 'deep learning', 'neural networks', 'natural language understanding', 'speech synthesis' ... the list is long and challenging, and is usually summed up in the phrase 'Artificial Intelligence'.

To the author's credit, he does an exhaustive survey of all the hard work that is going on in this area without flooding the average reader with unintelligible tech words.

He also artfully mixes the recounting of what key technologies do with an account of the history of how they came to be and, even more important, adds stories about the personalities of the key innovators as well as descriptions of the social relevance of their innovations.

The author also walks readers through the downside of the voice revolution.

As throughout the Information Revolution, risks such as invasion of privacy and the use of these tools by authoritarian states for surveillance exist, and the author describes them extensively.

The issues at stake are many. As the author correctly points out, 'Using language is what truly sets us apart as a species...words define and connect us...we should not become so overawed that we forget to evaluate the many risks.'

The drive for innovation is not confined to areas like voice-based search or voice-based shopping recommendations.

Experiments are underway in cutting-edge media companies to deliver on-demand voice-based content, such as voice-based news summaries.

And if you have started wondering why such innovations come only from American companies, the author offers the answer.

In 2003, the US Defense Advanced Research Projects Agency launched a $200 million effort that funded more than 400 researchers in 22 universities and companies.

This is what kick-started the US lead in the voice-based AI technologies that underlie the voice computing revolution.

I hope our folks at the defence and IT ministries are paying attention to this book.