"Voice trigger" detection

2018-06-22 22:12:20

I have a voice application that would be much improved by the ability to use a "trigger word" to start recording audio. I don't need a full speech-to-text engine, just the ability to detect the trigger word reliably and efficiently.

I am wondering if there are any specialized speech engines that support this specific use case, or any libraries or methods for developing such a single-purpose detection engine. Ideally I'd like it to work in noisy environments, but it can be trained for a single user's voice.

Pointers to research papers / topics would also be appreciated so I know what to ask for.

A colleague of mine on the Red5 project created a similar demo using trigger words to cause a search to be run against an image repository. Saying "cat" caused an image of a cat to appear within about a second. The client application was written in Flash and the back-end ran on Red5 using the free Sphinx library. You could certainly do what you want with Sphinx without much effort.
Sphinx project: http://cmusphinx.sourceforge.net/sphinx4/

Okay, I could be completely off, but using a full-featured speech-recognition library may be overkill for your use case.

If you can live with something simpler but still audio driven consider this:

Detecting a hand-clap is very simple. A hand-clap has high energy across the whole audio band. Detecting it is much cheaper computationally than full-blown speech recognition.

In a nutshell, you record the audio, run a short-time FFT on the data, and detect the case where you have high energy in 80% of the available frequency bins. Requiring 80% rather than 100% takes care of any phasing issues due to a simple recording-room/microphone setup. Then adjust the threshold to taste and you're done.

Doing the same with speech-recognition is possible as well, but you will burn tons of CPU cycles.
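
The clap-detection steps described above could be sketched roughly as follows. This is only an illustration under several assumptions: the function names (`frame_bin_energies`, `is_clap`, `detect_claps`), the frame size of 64 samples, and the energy threshold of 0.25 are all made up for the example, and a naive DFT is used so the code needs nothing beyond the Python standard library. In a real application you would likely use an FFT routine from a numerical library and tune the threshold against your own microphone.

```python
import cmath
import math


def frame_bin_energies(frame):
    """Energy |X[k]|^2 of each positive-frequency bin (DC and Nyquist skipped).

    Uses a Hann window and a naive O(n^2) DFT; fine for short frames only.
    """
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    energies = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        energies.append(abs(acc) ** 2)
    return energies


def is_clap(frame, bin_threshold, min_fraction=0.8):
    """True if at least min_fraction of the bins exceed bin_threshold."""
    energies = frame_bin_energies(frame)
    hot = sum(1 for e in energies if e > bin_threshold)
    return hot / len(energies) >= min_fraction


def detect_claps(samples, frame_size=64, bin_threshold=0.25,
                 min_fraction=0.8, hop=None):
    """Return the start index of every frame that looks like a clap."""
    hop = hop or frame_size  # assumption: non-overlapping frames by default
    hits = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        if is_clap(frame, bin_threshold, min_fraction):
            hits.append(start)
    return hits


# Synthetic test signal: silence, a pure tone (narrowband), a click
# (broadband, standing in for a clap), then silence again.
samples = [0.0] * 256
for i in range(64):
    samples[64 + i] = 0.5 * math.sin(2 * math.pi * 4 * i / 64)
samples[160] = 1.0

hits = detect_claps(samples)
print(hits)  # prints [128]
```

The tone concentrates its energy in a handful of bins, so it stays well under the 80% mark, while the click spreads energy over every bin and is flagged. Note that with non-overlapping frames a clap that lands near a frame edge is attenuated by the window; passing a smaller `hop` (overlapping frames) is one way to reduce that risk, at the cost of more computation.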

Which OS are you targeting? I wonder, for example, whether the speech functionality in Windows Vista would help you. Recognising a single word seems like the simplest possible problem for any speech analyzer.
