How does Google Speech-to-Text work?

I would like to know how Google converts speech to text in its Speech Recognition API.

Has Google stored almost every sound and matched the input against them at a particular frequency level, or does it use some audio encoder and decoder algorithm that analyses the voice for different sound patterns such as "A", "The", "B", "V", "D", "Hello", etc.?

It would also be great if someone could explain how audio is encoded and how stored audio can be split into its different sounds. For example:

Given music that contains guitar, drums and voice, I would like to separate it into three outputs (guitar, drums and voice) and then decode the voice to text.

Any documentation link or university research paper would be great.

Thanks


Google's speech recognizer is described here. To understand it, you will probably need to read a textbook first, such as Automatic Speech Recognition: A Deep Learning Approach.

Separation of guitar and drums is usually implemented with Non-Negative Matrix Factorization.
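
To give a feel for what Non-Negative Matrix Factorization does here, below is a minimal sketch, not anyone's production method. It assumes you already have a non-negative magnitude spectrogram V (frequency bins x time frames); in practice that would come from a short-time Fourier transform of the recording, but a synthetic matrix stands in for it here. The function names nmf and separate are made up for illustration, and the update rule is the standard multiplicative update for the Frobenius-norm objective.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) as V ~ W @ H.

    W holds one spectral template per column, H holds the activation
    of each template over time. Uses multiplicative updates for the
    Frobenius-norm objective. (Hypothetical helper, for illustration.)
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative because it only
        # multiplies by ratios of non-negative quantities.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V, W, H, eps=1e-9):
    """Split V into one spectrogram per component using soft masks."""
    full = W @ H + eps
    return [V * np.outer(W[:, k], H[k]) / full for k in range(W.shape[1])]

# Toy stand-in for a spectrogram: two synthetic "instruments", each with
# a fixed spectral shape and its own activation pattern over time.
rng = np.random.default_rng(42)
true_W = rng.random((64, 2)) + 0.1   # 64 frequency bins, 2 sources
true_H = rng.random((2, 100)) + 0.1  # 100 time frames
V = true_W @ true_H

W, H = nmf(V, n_components=2)
sources = separate(V, W, H)

rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
print("number of separated spectrograms:", len(sources))
```

Each entry of sources is a magnitude spectrogram for one component. To get audio back you would combine it with the phase of the original mixture and invert the transform, and with real music you would typically need more than one component per instrument and some way of grouping components by instrument.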
