The Philadelphia Inquirer, March 18, 1999



A Commanding Voice

The day when you can speak to your computer, VCR or telephone to get them to carry out tasks may come sooner than you expect.

By Andrea Ahles
INQUIRER STAFF WRITER

 


   In a speech technology researcher's vision of the future, your voice could run just about anything.

   No computer keyboard, no remote controls, and no secretary to screen your calls.

   You could program your VCR without touching buttons. You could pick up the phone and say ``Call Mom'' without punching in 11 digits. You could ask your computerized personal assistant to read your e-mails and schedule your appointments.

   And, like Capt. Picard on Star Trek, you could command: ``Computer, find that file.'' With advances in speech recognition technology and natural language understanding, this type of future is closer than most consumers realize.

   ``Speech is such a natural form of communication between human beings, why not between human beings and machines?'' said Hans van der Linde, vice president of sales and marketing for voice control at Philips Speech Processing.

   But the closest that software programmers have come to letting you talk to your computer is speech dictation software. These programs - such as Dragon NaturallySpeaking by Dragon Systems and ViaVoice by IBM - have been on computer store shelves for the last few years.

   Early versions of dictation programs were cumbersome and not very accurate. But as processing chips got faster, the programs became more accurate, said David Nahamoo, senior manager in human languages technology at IBM.

   Faster chips allowed the programs' vocabularies to be expanded, Nahamoo said. And the training periods, during which a program learns the user's voice, shrank from an hour to about 20 minutes.

   Richard Ameisen of Havertown has been using the Dragon and IBM programs since January. He said he had tried older dictation programs but found he could type faster than they could understand his words. But the new programs, Ameisen said, learned his vocabulary faster, and although he still has to speak clearly, he can speak at a more natural pace.

   ``Most of us have the tendency to lapse into bad speaking habits, but once you get the hang of it, the program really is a great convenience,'' Ameisen said.

   And with the new Pentium III processing chip, the new version of Dragon NaturallySpeaking will have a training time of only two minutes, said Roger Matus, vice president of marketing at Dragon Systems.

   Dictation systems are only the beginning. Although speech companies still have several technical hurdles to overcome, they are finding more applications in which speech technology can be used.

   Matus said the next suitable candidates for speech technology are mobile phones and handheld computers such as the Palm Pilot.

   ``Computers are getting smaller and smaller. Handheld devices are really tiny and the keyboards are getting tinier, but fingers remain roughly the same size,'' Matus said.

   The challenges preventing researchers from applying speech technology widely are formidable, according to Chin Lee, head of dialogue systems research at Bell Labs.

   ``The systems we do have lack the human intelligence to help the user,'' Lee said.

   Computers are not able to have open-ended conversations with people and have difficulty understanding people with different dialects, Lee said. The quality of the microphone that hears the words is also a problem, he said.

   Nevertheless, consumers will encounter speech recognition technology more and more in their everyday lives. They are most likely to use it over the telephone, and many already have.

   In 1992, AT&T introduced a computer operator system that asked callers whether they wanted to make a collect or calling card call and recognized those specific words. Now, more complex systems have been introduced by the brokerage Charles Schwab, which lets its customers check on specific stocks and mutual funds, and by American Airlines, which provides flight information.

   Experts say this is just the beginning: speech recognition will increasingly be used to simplify transactions over the phone, signaling the end of touch-tone menus. The technology is also making more information available by phone.

   In Boston, people can call Foodline and ask ``Chef Bob'' for local restaurant recommendations. Introduced in January, the service prompts consumers for type of cuisine, location and price they would like before Chef Bob lists suggestions and provides detailed reviews, said Paul Lightfoot, president and chief executive officer of Foodline.

   Lightfoot said he hopes to add a real-time reservation system to the phone service and the Foodline Web site. Foodline will be launched in New York in July and may start in Philadelphia by the end of the year, he said. New York already has an experimental voice recognition version of MovieFone - the popular phone service used to find out which movies are playing at local theaters.

   Because telephones are already central to doing business and finding out limited types of information, experts say it is logical to implement speech recognition systems in telephones first. Eventually, everything that can be accessed over the Internet will be accessible over the telephone, according to Jay Wilpon, a speech processing division manager at AT&T Labs.

   ``I want to browse the Internet when I'm not in front of my PC and I can't do that right now,'' Wilpon said.

   AT&T, in partnership with Lucent Technologies and Motorola, announced two weeks ago an initiative to standardize a computer language that will make the World Wide Web accessible by phone. Voice eXtensible Markup Language, or VXML, is being developed by 20 companies and would function like HTML, the language used to construct Web sites.

   While many speech technology companies are focusing on telephone-based or computer dictation applications, Philips, the consumer electronics company, wants to put the technology to work in its products.

   Philips, which produces more than 2.5 million televisions a year, is eager to make them easier to use, van der Linde said. People may love their handheld remote controls, but they often lose the devices between their couch cushions, he said.

   The company has already tested voice-controlled televisions in Germany and Singapore. The company hopes to have voice-controlled TVs on the market by 2001, he said.

   Another application that researchers are exploring is the concept of a virtual personal assistant that can take voice messages, read e-mails and conduct research for you.

   For several years, AT&T Labs has been working on what it calls the Baldi family - a set of three-dimensional computer models of humans that can have conversations with people. One of them is a prototype named Katherine, who looks natural but speaks with a computerized voice, said Juergen Schroeter, a manager in AT&T's speech processing software and technology division.

   Researchers have been able to improve the program's intelligence and have it convey body language as it speaks, Schroeter said. Having a face to look at as you speak to a computer is important for the user and makes the interaction seem more natural, he said. AT&T may have commercially viable personal assistants ready by 2003, Schroeter said.

   Despite all the advances that speech recognition researchers have made in the last few years, there are still microphone, language-understanding and user-interface problems to be solved.

   ``Most people have visions of Star Trek in their head,'' Matus said. ``We are on that path, but we have a couple hundred years to get there.''

© 1998 Philadelphia Newspapers Inc.