Matthew Stibbe's Homepage Writer Everything I've Written Voice Recognition

Voice Recognition Software

Headlines:
This is a review of leading voice recognition software from mid-2000.

I have been sceptical of any claims of voice or handwriting recognition by computers since I spent a small fortune on two Apple Newtons several years ago.  These gadgets claimed to be able to read handwriting and turn it into text.  So it was with a certain reserve that I parted with £79.99 for Lernout and Hauspie's VoiceXpress Pro and £109.99 for Dragon Systems' NaturallySpeaking Preferred, both of which claim near-100% accuracy for voice recognition.  I didn’t expect them to work at all.  In the event, they did deliver remarkable voice recognition - it was almost like science fiction.  What stops them being useful business tools (yet) is something else entirely.

Installing both programs was easy - pop in the disk, run the usual install program and then sit through about thirty minutes of training.  Both programs have tutorials to help new users set up and use the system, and both have training routines that help the software get used to your voice and PC set-up.  Both systems require you to use a headset with an earphone and a mike - with the result that one looks a bit like a phone operator.  The Dragon Systems headset is better because it has a bendable mike boom, which makes it easier to position.  Positioning turns out to be vital, as I found out later.  Once the programs are installed you can begin dictation straight away.  At first it seems almost amazing - you say something and it comes up on the screen as text a few seconds later.

Disappointment sets in almost immediately afterwards, however.  At the first hesitation - a natural 'umm' in your speech - a spurious word appears.  Go a bit too fast or a bit too slow or use an unusual word (although they have vast dictionaries they don't include every word and you have to teach them new words individually) and they start to display gobbledegook.  Position the microphone a few millimetres away from the correct position and accuracy drops dramatically.  And heaven help you if you have to take the headset off to answer the phone and then put it back on again - adjusting it takes a few seconds each time.  It's a bit like giving dictation to a foreigner who has a perfect knowledge of half the dictionary but no understanding of grammar and who can only hear you if they stand exactly five inches to your left but nowhere else.  It's amazing when it works but not much use.

All of this means that you have to spend a lot of time checking and correcting work.  It is possible to do this by voice.  You say things like "go to the beginning of the line, select 'to whom it may concern', delete that", but these commands are subject to the same risks of misunderstanding as dictation.  When either program gets confused between commands and dictation the screen rapidly fills up with all kinds of nonsense.  This brings you quickly back to editing by eye and keyboard and undermines confidence in the whole system.

For me, the final nail in the coffin for these programs is the way in which they are integrated into the operating system and Microsoft Office or, more to the point, the way that they are not integrated.  Firstly, both programs take an age to load, even on a fast Pentium PC with lots of memory.  NaturallySpeaking provides a sort of dictation pad - a simple word processor - but if you want to dictate directly into Word or Outlook it uses a slightly different approach and the recognition accuracy seems to drop dramatically.  VoiceXpress is better in this regard but hogs a large chunk of screen real estate with its menu bar, which is the same size as the Windows 95 Taskbar.

Both programs seem more or less equal in terms of actual recognition ability.  Both offer an equally long list of back-of-the-box features, and both suffer from the same fundamental problems.  I think that, in its present form, dictation software would work for people who can't or won't type and I would imagine it is a boon for the dyslexic.  It may also work better if you're used to giving dictation - I am not.  For anyone else, I would advise waiting until the usability gets better.  It seems to me the difficult problem - the recognition - has been solved and it is only a question of integration, user interface design and putting a bit more smarts into the grammar side of things to make it all work.  I would also like a voice recognition system that didn't require a headset, but I suppose this might not work in an open plan office with everyone talking at once.

These programs are a remarkable triumph of computer science.  However, they only show us how much more we need to do in order to turn computer speech recognition into speech understanding.  Until this happens, I would recommend getting a secretary or learning to type!