"Voice trigger" detection
I have a voice application that would be much improved if it could use a "trigger word" to start recording audio. I don't need a full speech-to-text engine, just the ability to detect the trigger word reliably and efficiently.
I am wondering whether there are any specialized speech engines that support this specific use case, or any libraries/methods for developing such a single-purpose detection engine. Ideally I'd like it to work in noisy environments, but it can be trained for a single user's voice.
Pointers to research papers / topics would also be appreciated so I know what to ask for.
A colleague of mine on the Red5 project created a similar demo using trigger words to cause a search to be run against an image repository. Saying "cat" caused an image of a cat to appear within about a second. The client application was written in Flash and the back-end ran on Red5 using the free Sphinx library. You could certainly do what you want with Sphinx without much effort.
Sphinx project: http://cmusphinx.sourceforge.net/sphinx4/
Okay, I could be completely off, but using a full-featured speech-recognition library may be overkill for your use case.
If you can live with something simpler but still audio-driven, consider this:
Detecting a hand-clap is very simple. A hand-clap has high energy across the whole audio band, so detecting it is straightforward and computationally much cheaper than full-blown speech recognition.
In a nutshell, you record the audio, do a (short-time) FFT on the data and detect the case where you have high energy in 80% of the available frequency bins. Requiring 80% rather than 100% takes care of any phasing issues due to a simple recording-room/microphone setup, where some bins may be partly cancelled out. Then adjust the threshold to taste and you're done.
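To make that concrete, here is a rough sketch of the idea in Python. It is only an illustration under some assumptions of mine: the function names, the frame size of 256 samples, the Hann window and the per-bin threshold of 1.0 are all made up for the example, and the small hand-written FFT is there just to keep it dependency-free (in practice you would use a library FFT). Only the 80% figure comes from the description above.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def broadband_fraction(frame, bin_threshold):
    """Fraction of positive-frequency bins whose magnitude exceeds bin_threshold."""
    n = len(frame)
    # Periodic Hann window (an assumption, not part of the original recipe)
    # to reduce leakage from the frame edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    bins = spectrum[1:n // 2]  # skip the DC bin, keep bins below Nyquist
    loud = sum(1 for c in bins if abs(c) > bin_threshold)
    return loud / len(bins)


def detect_claps(signal, frame_size=256, bin_threshold=1.0, min_fraction=0.8):
    """Return indices of non-overlapping frames that look broadband (clap-like).

    bin_threshold is a hypothetical value; it has to be tuned for the
    actual microphone gain and sample format.
    """
    hits = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        if broadband_fraction(frame, bin_threshold) >= min_fraction:
            hits.append(start // frame_size)
    return hits
```

A pure tone only lights up a handful of bins and is ignored, while a short impulse spreads energy over nearly all of them and is flagged. The threshold and frame size are the knobs to "adjust to taste"; real recordings will probably also want overlapping frames and some debouncing so one clap is not reported twice.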
Doing the same with speech-recognition is possible as well, but you will burn tons of CPU cycles.
Which OS are you on? I wonder, for example, whether the speech functionality in Windows Vista would help you. Recognising a single word seems like the simplest possible problem for any speech analyzer.