I didn’t personally do a speech technology prediction column at the beginning of 2009, but one of my colleagues at Speech Technology Magazine did. Eric Barkin wrote a feature article entitled “2009: What the New Year Will Bring,” in which he talks about the effects of the economy on speech technologies and about some of the predictions for speech from my analyst colleagues. It’s debatable whether the economy has gotten that much better in a year, but that didn’t stop the speech industry from moving ahead. Here are the predictions I harvested from Eric’s column, followed by what I think did happen in his categories, along with some others that didn’t make it onto his list but made headlines in 2009 nonetheless.
• Recession-hit businesses will continue to employ speech as a way to automate business processes and cut costs
• Continued rise of hosted speech solutions, particularly as a way of avoiding capital outlay or as a try-and-buy before investing in an on-premises solution
• Healthy growth in outbound speech solutions
• Decrease in growth in voice biometrics
• Slowdown in the development of statistical language models (SLMs)
• Healthy growth in mobile speech applications
• Growth in automated speech-based dictation and transcription in healthcare
• Further consolidation of industry players
Did we hit the mark on these predictions?
Recession
I think the recession did play a part in a bit of a slowdown, but it didn’t stop customers from buying. Instead it slowed the sales cycle down, with a lot more justification and layers of signatures required before deals were signed. Deals did go through, but add-on applications, such as voice verification, were shifted later in the sales cycle. What benefited from this caution? Hosted solutions, our next prediction, were a clear winner.
Hosted Applications
Hosting is hotter than ever. It was a topic gaining momentum that I referenced in my SpeechTek blog in August. We have seen a continued increase in the number of vendors that are offering hosted options, in whole or in part, for customers in self-service, contact centers and unified communications. As examples, Convergys will host a company’s entire customer support, portions of it, or just a single speech component such as voice verification. Interactive Intelligence provides hosting for everything from the contact center and unified communications to the PBX piece with their Communication as a Service (CaaS) offering. Everyone seems to be getting into the hosted space, and it is very much a smorgasbord of hosted variations, so a customer can decide how much control and up-front expenditure they want, which is a great thing for customers.
Outbound
Yes, yes, and yes. Outbound gained lots of momentum in 2009. Raise your hand if you have gotten more than 10 outbound notifications on your phone in the past year, and not just for political campaigns. There has been an increase in interactive outbound applications that allow you to do something with the call, not just listen to it, along with an increase in more usable outbound messages. I know this is the case with me, as I’ve been getting everything from prescription notifications, to calls that a package is going to be delivered, to flight delay notifications, and my friends and colleagues have mentioned this as well. In fact, one of those outbound calls saved my bacon on New Year’s Day when my flight got cancelled and I heard in time to get one of the last seats out on another carrier! But it’s not just a conversation starter. There has been a big increase this year in contact center and self-service vendors introducing or polishing their outbound notification offerings and tying them to actionable business outcomes. For businesses, those outcomes produced cost savings, both in reaching customers without people overhead and in getting customers to take action, such as refilling a prescription or paying down some credit card debt. As with hosted, everyone seems to be adding outbound capabilities to their platforms. See my Microsoft speech announcement blog in October.
Voice Biometrics
This one showed more promise than the prediction that it would decline, not so much in numbers, since I don’t have those, but in vendor emphasis and some notable wins. Convergys, for example, made a big push for verification applications this year, and continues to do so. Certain vertical markets, particularly finance, announced an increase in the use of verification to provide an added authentication layer for customers.
Slowdown in SLMs
I’m afraid I can’t speak to this as I don’t have a good feel for what happened with the development of SLMs.
Mobility
Hot. Through a combination of better technology, convenience, and fear, speech-driven mobility applications are hot. Yes, fear. This has been brewing for a long time, folks, but fear, much justified fear, around the dangers of using mobile phones while driving, particularly texting while driving, has driven (hee hee) a surge of innovation in the mobility space, not to mention a “Distracted Drivers Summit” in November. In fact, an article on MSNBC.com cited a driving simulator study that said that drivers’ median reaction time increases by 30% when texting and 9% when talking on a phone while driving.
Vendors took notice of not only this, but the productivity gains from using speech to accelerate the speed of tasks on the phone, even when the user isn’t driving. Take a look at the October announcement of the Sprint Samsung Intrepid mobile phone that provides users with not just voice-activated dialing, but Microsoft Bing-powered mobile search, and voice control of the phone from dialing to commands that allow a user to create and send a text. Powered by Microsoft’s Tellme platform, the Intrepid allows a user to press one button to get to speech access of features.
That was not all. 2009 saw a lot of companies adding speech-to-text applications and text-to-speech output to their products and product roadmaps. Once again voice search was hot as well. We had the second annual Voice Search conference in San Diego in March, covering all aspects of using voice to search, from contact centers and mobile devices to unified communications. There were lots of announcements around the concept of Voice Search. Google announced voice search a year ago, and recently added Mandarin Chinese for speakers using Nokia S60 smart phones, for example, and will be adding support for other phones in the near future.
Industry Consolidation
I don’t believe that this was a great year for consolidation, but instead business as usual. This year the acquisitions in speech were not many, bolstered only by the end-of-year acquisition of speech-to-text purveyor SpinVox by Nuance. SVOX acquired the speech assets of Siemens AG in January. Sakhr Software acquired Dial Directions (Arabic ASR – Network ASR) in June. Nuance bought voice-to-text vendor Jott in July. Raytheon acquired BBN in September. Nuance acquired eCopy in October. Voxeo acquired Motorola’s VoiceXML browser technology in October. None of these were huge headliners such as the Avaya-Nortel acquisition or HP acquiring 3Com, and even if you level the playing field by saying that we don’t have huge speech technology companies, it still isn’t up there with Nuance acquiring SpeechWorks either. Instead it was more about companies filling niches, such as needing voice-to-text or voice search, or gaining intellectual property, than about big consolidation plays. 2008 was a much more entertaining year for murders and acquisitions, as I sometimes call it.
What else happened?
Text-to-Speech Expansion
Typically taking a back seat to speech recognition, text-to-speech had a big year in 2009, not just because it’s a necessary piece of the user interfaces found in UC, self-service and mobility applications (such as the ubiquitous use of TTS in GPS devices), but also because it’s being used extensively in the assistive technology market. For example, a number of e-readers have hit the market that allow people to read books on a device rather than the old-fashioned way. Amazon’s Kindle was the original reader that boasted having TTS, but the Authors Guild famously objected to the Kindle shipping with TTS activated, claiming it was a copyright infringement because a separate device would now be reading books. Amazon backed down and shipped the Kindle with TTS being an option for publishers to provide. This is a truly sad step back in what could be a boon to sight-impaired readers. Other e-readers, such as Barnes & Noble’s Nook, the Sony Reader, or one coming out from Apple next year, have the potential for TTS to be employed. Certainly in the case of Apple it makes sense because they have a very good TTS product of their own.
Other handheld devices that assist the visually impaired were announced, including the Intel Reader, which uses Loquendo TTS to read text that has been converted from print to digital by the device. The text is captured by an integrated high-resolution camera in the device, and it can either be viewed on the device’s 4.3-inch LCD screen, as is or enlarged for better viewing, or listened to with the TTS.
LG Electronics, the world’s third largest mobile phone maker, announced a mobile phone for the blind that uses TTS to read the entire phone menu and text messages aloud, and ASR for voice dialing, search, and command and control functions. It also has raised points on the keypad to help visually impaired users find buttons.
Unified Communications
The growth in the addition of speech technologies to unified communications and collaboration wasn’t predicted in any big way other than to say that vendors are adding speech and text-to-speech to user interfaces, particularly as part of mobility. However, I was personally pleased and surprised by the amount that was added. Cisco, for example, wove speech technologies, along with video, into UC and collaboration in a big way near the end of 2009. Cisco has threaded speech across their entire UC and collaboration portfolio, from voice search to command and control, and also through some interesting new products such as Cisco Pulse and Show and Share. In the latter, for example, a user can search video to find the clips they want rather than having to watch the entire video.
That isn’t all, but just enough to say it was a good year for speech technologies. Tomorrow I’ll post what I think will happen in 2010.
