Matthew Stibbe's Homepage | Voice Recognition Software
This is a review of leading voice recognition software from mid-2000.
I
have been sceptical of any claims of voice or handwriting recognition by
computers since I spent a small fortune on two Apple Newtons several years ago.
These gadgets claimed to be able to read handwriting and turn it into
text. So it was with a certain reserve that I parted with £79.99 for Lernout
& Hauspie's VoiceXpress Pro and £109.99 for Dragon Systems' NaturallySpeaking
Preferred, both of which claim near-100% accuracy for voice recognition. I didn’t expect them to work at all.
In the event, they did deliver remarkable voice recognition - it was
almost like science fiction. What stops them from being useful business tools (yet) is something else
entirely.
Installing
both programs was easy - pop in the disk, run the usual install program and then
sit through about thirty minutes of training. Both programs have tutorials to help new users set up and use the system
and they both have training programs that help the programs get used to your
voice and PC set-up. Both systems require you to use a headset with an earphone and a mike -
with the result that one looks a bit like a phone operator. The Dragon Systems one is better because it has a bendable mike boom, which
makes it easier to position. Positioning turns out to be vital, as I found out later.
Once the programs are installed you can begin dictation straight away. At first it seems almost amazing - you say something and it comes up on
the screen in text a few seconds after you said it.
Disappointment
sets in almost immediately afterwards, however. At the first hesitation - a natural 'umm' in your speech - a spurious
word appears. Go a bit too fast or a bit too slow or use an unusual word (although they
have vast dictionaries they don't include every word and you have to teach them
new words individually) and they start to display gobbledegook. Position the microphone a few millimetres away from the correct position
and accuracy drops dramatically. And heaven help you if you have to take the headset off to answer the
phone and then put it back on again - adjusting it takes a few seconds each
time. It's a bit like giving dictation to a foreigner who has a perfect
knowledge of half the dictionary but no understanding of grammar and who can
only hear you if they stand exactly five inches to your left but nowhere else.
It's amazing when it works but not much use.
All
of this means that you have to spend a lot of time checking and correcting work.
It is possible to do this by voice. You say things like "go to the beginning of the line, select 'to
whom it may concern', delete that", but these commands are subject to the
same risks of misunderstanding as dictation. When either program gets confused between commands and dictation, the
screen rapidly fills up with all kinds of nonsense. This
brings you quickly back to editing by eye and keyboard and undermines confidence
in the whole system.
For
me, the final nail in the coffin for these programs is the way in which they are
integrated into the operating system and Microsoft Office, or more to the point,
the way that they are not integrated. Firstly, both programs take an age to load, even on a fast Pentium PC
with lots of memory. NaturallySpeaking provides a sort of dictation pad - a simple word
processor - but if you want to dictate directly into Word or Outlook it uses a
slightly different approach and the recognition accuracy seems to drop
dramatically. VoiceXpress is better in this regard but hogs a large chunk of screen
real estate with its menu bar, which is the same size as the Windows 95 Taskbar.
Both
programs seem more or less equal in terms of actual recognition ability, offer
an equally long list of back-of-the-box features and suffer from the
same fundamental problems. I think that, in its present form, dictation software would work for
people who can't or won't type and I would imagine it is a boon for the
dyslexic. It may also work better if you're used to giving dictation - I am not.
For anyone else, I would advise waiting until the usability gets better. It seems to me the difficult problem - the recognition - has been solved
and it is only a question of integration, user interface design and putting a
bit more smarts into the grammar side of things to make it all work. I would also like a voice recognition system that didn't require a
headset but I suppose this might not work in an open plan office with everyone
talking at once.
These
programs are a remarkable triumph of computer science. However, they only show us how much more we need to do in order to turn
computer speech recognition into speech understanding. Until this happens, I would recommend getting a secretary or learning to
type!