How does Google Speech Recognition work?

I'm aware of audio fingerprinting as a way to recognize audio files, and it is awesome, but what I really want to know is how Google built its Speech Recognition API: how do they take audio and return words?

I wrote a gem to fingerprint WAV audio files and compare them, but if I use fingerprints to compare my voice against a database full of fingerprints, it will probably take forever. How does Google do it?

Purpose:

I'm really into Speech Recognition and I want to start coding it, but I don't have a clue where to begin. DragonVoice is another example of Speech Recognition software, and all the software out there is really fast.

I want to know the server-side flow, from receiving an audio recording to transforming it into text.


Use the source, Luke :-)

Best-of-breed open source speech recognition software (IMHO): CMUSphinx, http://cmusphinx.sourceforge.net/

The learning curve is a bit steep, but it should be worth it...
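
To give a feel for where the flow from audio to text usually starts, here is a minimal sketch of the very first stage: cutting the recording into short overlapping frames and computing one number per frame. This is not CMUSphinx's or Google's code. The function names, the 25 ms window, the 10 ms hop, and the log-energy feature are all assumptions chosen for illustration; real recognizers compute richer per-frame features and then feed them to acoustic and language models.

```python
import math

def frame_signal(samples, sample_rate, win_ms=25, hop_ms=10):
    """Split a list of samples into overlapping fixed-length frames.

    win_ms and hop_ms are illustrative defaults, not values taken
    from any particular recognizer.
    """
    win = int(sample_rate * win_ms / 1000)  # samples per frame
    hop = int(sample_rate * hop_ms / 1000)  # samples between frame starts
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        frames.append(samples[start:start + win])
    return frames

def log_energy(frame):
    """A toy per-frame feature: log of the summed squared samples."""
    energy = sum(s * s for s in frame)
    return math.log(energy + 1e-10)  # small offset avoids log(0) on silence

# One second of a 440 Hz tone at 16 kHz stands in for recorded speech.
sample_rate = 16000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate)
           for n in range(sample_rate)]

frames = frame_signal(samples, sample_rate)
features = [log_energy(f) for f in frames]
```

The point of the sketch is that a recognizer does not look up the whole utterance in a database the way a fingerprint matcher does. It turns the audio into a sequence of small feature vectors and decodes that sequence into words.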

FWIW, the description of the voice-recognition tag on Stack Overflow says: Voice Recognition means identification of the person talking and is frequently misapplied to mean "Speech Recognition" - identification of what is being said.

As quoted, it's a very common mistake :-)

Have fun!
