How does Google Speech Recognition work?
I'm aware of audio fingerprinting for recognizing audio files, and it is awesome, but what I really want to know is how Google built its Speech Recognition API: how do they take audio and return words?
I wrote a gem to fingerprint WAV audio files and compare them, but if I used fingerprinting to compare my voice against a database full of fingerprints, it would probably take forever. How does Google do it?
Purpose:
I'm really into speech recognition and I want a place to start coding it, but I don't have a clue where to start. DragonVoice is another example of speech recognition software, and all the software out there is really fast.
I want to know the server-side flow, from receiving an audio recording to transforming it into text.
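Most recognizers do not compare raw audio against stored recordings. A common first stage instead cuts the signal into short overlapping frames and turns each frame into a small feature vector. The sketch below assumes 16 kHz mono samples and the usual 25 ms frames with a 10 ms hop (400 and 160 samples). It computes only a windowed log energy per frame, standing in for the richer features (such as MFCCs) a real front end would produce. The function names are hypothetical and not taken from any particular engine.

```python
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split a mono signal into overlapping frames.

    Assumes 16 kHz audio: 400 samples = 25 ms frames, 160 samples = 10 ms hop.
    Trailing samples that do not fill a whole frame are dropped.
    """
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Hamming-window the frame, then return the log of its summed squared amplitude."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Small constant avoids log(0) on digital silence.
    return math.log(sum(v * v for v in windowed) + 1e-10)

# One second of a 440 Hz tone followed by one second of silence, at 16 kHz.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
silence = [0.0] * rate

tone_frames = frame_signal(tone)
features = [log_energy(f) for f in tone_frames]
print(len(tone_frames), round(features[0], 2))
```

Each second of audio becomes about 100 feature vectors. Those vectors, not the waveform, are what the decoding stage scores against acoustic and language models, which is one reason it can be fast.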
Use the source, Luke :-)
The best-of-breed open source speech recognition software (IMHO) is CMUSphinx: http://cmusphinx.sourceforge.net/
The learning curve is a bit steep, but it should be worth it...
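CMUSphinx is built on hidden Markov models (HMMs), and the heart of such a decoder is a search for the most likely sequence of hidden states (for example, phones) given the sequence of observed frames. Below is a minimal Viterbi sketch of that search. It is a toy, not CMUSphinx's code: the two "phone" states, the discrete observation symbols, and all probabilities are made up for illustration, whereas a real decoder scores continuous feature vectors with trained acoustic models and prunes the search heavily.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations.

    Works in log space to avoid underflow; all probabilities must be > 0.
    """
    # best[t][s] = (log prob of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            score, arg = max((prev[p][0] + math.log(trans_p[p][s]), p) for p in states)
            column[s] = (score + math.log(emit_p[s][obs]), arg)
        best.append(column)
    # Backtrack from the highest-scoring final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))

# Hypothetical two-phone model for the word "hi" ("HH" then "AY").
states = ("HH", "AY")
start_p = {"HH": 0.8, "AY": 0.2}
trans_p = {"HH": {"HH": 0.3, "AY": 0.7}, "AY": {"HH": 0.1, "AY": 0.9}}
emit_p = {"HH": {"fric": 0.9, "vowel": 0.1}, "AY": {"fric": 0.2, "vowel": 0.8}}

path = viterbi(["fric", "vowel", "vowel"], states, start_p, trans_p, emit_p)
print(path)  # ['HH', 'AY', 'AY']
```

The cost grows linearly with the number of frames (times the square of the number of states), rather than with the size of a database of stored recordings, which is why this approach scales where brute-force fingerprint comparison would not.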
FWIW, the description of the voice-recognition tag on Stack Overflow says: "Voice Recognition means identification of the person talking and is frequently misapplied to mean 'Speech Recognition' - identification of what is being said."
As quoted, it's a very common mistake :-)
Have fun!