Microphones and signal quality for optimal speech recognition
Microphones make a big difference in speech recognition accuracy.
You need a signal-to-noise ratio (SNR) of about 30 dB for optimum
accuracy, but many PC microphones manage little better than 20 dB
under the best conditions, and sometimes do much worse.
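To make the decibel figures concrete, here is a minimal sketch of one
common way to estimate SNR: compare the RMS level of a stretch of
speech with the RMS level of a stretch of silence from the same
recording. The text above does not say how the 30 dB figure is
measured, so the RMS-ratio method and the function names are
assumptions made for illustration. On this definition, 30 dB means the
speech RMS is roughly 32 times the noise RMS, against 10 times at
20 dB.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of sample values."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_samples, noise_samples):
    """Estimate SNR in dB from a speech stretch and a silent stretch.

    Assumption: SNR is taken as the ratio of RMS amplitudes, so the
    conversion to decibels uses 20 * log10(ratio).
    """
    signal = rms(speech_samples)
    noise = rms(noise_samples)
    if noise == 0:
        return math.inf   # perfectly silent noise floor
    if signal == 0:
        return -math.inf  # no signal at all
    return 20.0 * math.log10(signal / noise)
```

You would pass in samples cut from a loud vowel as speech_samples and
samples cut from a pause as noise_samples.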
How to check for good signal quality
A good way to check the signal quality is to record a bit of audio
using your setup and look at the waveform in a waveform editor such
as Entropic's ESPS/Waves system. The signal should be quite flat
during the silences, and the speech peaks should reach sample values
of at least ±5000; even better is ±10000 or ±15000 in a loud vowel
such as the one in the word "now". At the same time there should be
no clipping, where the sample values are maxed out at the 16-bit
limits of -32768 and +32767; that introduces very bad distortion.
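If no waveform editor is at hand, the same peak and clipping checks
can be approximated with a short script. The sketch below assumes the
recording is a 16-bit PCM WAV file; the function names and the
dictionary it returns are made up for illustration, and the 5000
threshold is simply the minimum peak value suggested above.

```python
import array
import sys
import wave

FULL_SCALE_MIN, FULL_SCALE_MAX = -32768, 32767  # 16-bit signed range


def read_16bit_wav(path):
    """Return the samples of a 16-bit PCM WAV file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def check_levels(samples, min_peak=5000):
    """Report the peak level and count samples stuck at full scale."""
    peak = max((abs(s) for s in samples), default=0)
    clipped = sum(
        1 for s in samples if s <= FULL_SCALE_MIN or s >= FULL_SCALE_MAX
    )
    return {
        "peak": peak,
        "peak_ok": peak >= min_peak,
        "clipped_samples": clipped,
        "clipping": clipped > 0,
    }
```

For example, check_levels(read_16bit_wav("test.wav")) would summarize
a recording saved as test.wav (a hypothetical file name). A single
sample at full scale is not always audible clipping, so treat the
count as a hint and confirm by looking at the waveform.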
How to fix poor signal quality
If you have clipping, turn the volume down. If the peaks are too low,
turn the volume up. If the noise floor is visible instead of quite
flat, then see if you can improve the SNR by moving the microphone
closer to the speaker's mouth, by removing noise sources in the
environment, or by increasing the signal volume. For example, a
preamp lets you raise the level of the signal before it is added to
the noise inside the sound card, so that the signal overpowers that
noise and the signal-to-noise ratio improves.
If you can't get close to a flat noise floor in a waveform display
under any circumstances, then you should definitely try a better mic
or mic-plus-preamp setup.
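The troubleshooting steps above can be sketched as a small decision
function. The name suggest_fixes and its parameters are hypothetical.
Using a measured SNR below 30 dB as a stand-in for "the noise floor
is visible" is an assumption, since the text judges the noise floor
by eye rather than by a number.

```python
def suggest_fixes(peak, clipped_samples, snr_db,
                  min_peak=5000, target_snr_db=30.0):
    """Map simple measurements of a test recording to the advice above.

    peak            -- largest absolute sample value in the recording
    clipped_samples -- number of samples stuck at full scale
    snr_db          -- estimated signal-to-noise ratio in dB
    """
    advice = []
    if clipped_samples > 0:
        advice.append("Clipping: turn the volume down.")
    elif peak < min_peak:
        advice.append("Peaks too low: turn the volume up.")
    if snr_db < target_snr_db:
        # Assumption: a low SNR stands in for a visibly noisy floor.
        advice.append(
            "Noisy floor: move the mic closer, remove noise sources, "
            "or raise the signal level with a preamp."
        )
    return advice
```

An empty list means none of the checks fired. If the noisy-floor
advice keeps coming back whatever you try, that is the case where a
better mic or mic-plus-preamp setup is worth trying.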
Some recommended microphones
The Shure SM10A noise-cancelling headset microphone plus the Shure
FP11 preamp. This mic/preamp setup is good for relatively noisy
environments: it has clean recording electronics, it improves the SNR
by keeping the mic 1-2 cm from the mouth so that the speech drowns
out everything else, and it does a certain amount of directional
noise cancellation. This is the standard data collection mic setup in
the speech R&D community, so most speech recognition systems have
been developed using data that went through one of these, but the
cost is on the order of $400 (US) including the preamp. Other preamps
I've seen used by speech people are available from ART and Rane. The
preamp and mic cost around $110-150 each, and I got them from a
professional music recording equipment store (Guitar Center, which
has stores in at least San Jose and Seattle).
The Knowles VR-3565 is a good-quality microphone that Entropic has
standardized on at its Cambridge, England site, where the company's
core speech recognition technology is developed. It costs in the
neighborhood of $75.
The Andrea ANC-200 handset with preamp. This works great at trade
shows because you can hand someone a handset and they talk into it
like a phone, and the sound is good. However, it seems to be going
out of production, and it seems to break easily. The cost was around
$70.
Copyright © 1998-2003
Sprex, Inc.
All rights reserved.
Modified: March 4, 2003