What's the general state of speech recognition right now?

I'm currently evaluating the current state of speech recognition (SR) technology, and there seem to be quite a few APIs and services springing up.

My own experience of SR is that keyword matching works well with multiple speakers, and dictation works OK with trained speakers in very controlled environments. Is this still true? Are there any good approaches for doing speech-to-text on arbitrary audio files? This could be keyword matching from audio streams for indexing, or it could be an attempt at full transcription.

Does anybody have any comments on how Nuance compares with other commercial engines and with the open source solutions?


While newer and friendlier applications designed around speech recognition will continue to be written, speech recognition itself has reached a brick wall. The accuracy of even the best engines drops quickly in the presence of noise, a big problem for smartphone users who often use the technology in noisy environments.

A bigger and related problem is that speech recognizers cannot pick out a single voice in a roomful of voices (the cocktail party problem), something that most humans handle with relative ease. Until someone solves this problem, I'm afraid that speech recognition technology will not advance much. It's a billion-dollar problem, because a solution would make every existing speech recognition engine obsolete almost overnight.
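
If you want to check the noise claim against your own recordings, or compare engines side by side, accuracy is usually reported as word error rate (WER): the word-level edit distance between a reference transcript and the engine's output, divided by the number of reference words. The following is only a minimal sketch; the function name and the simple lowercase/whitespace normalisation are my own assumptions, not part of any particular engine's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; real evaluations usually normalise punctuation and numbers too.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same clean and noisy recordings through each engine and comparing the resulting WER gives a rough but repeatable way to see how quickly accuracy falls off.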
