How does Google Speech API chunk the audio for transcription?
When the Google Speech API transcribes long audio, it returns the result as a series of short text chunks of varying length, each with an associated confidence value. I was wondering how the underlying algorithm decides where to place the boundaries between these transcribed chunks, since it seems to be more complicated than simply splitting the audio into fixed-duration pieces and transcribing each piece separately (although I could be wrong about this).
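For concreteness, here is roughly what I am seeing, a minimal sketch assuming the Python `google-cloud-speech` client and an audio file hosted at a hypothetical GCS URI; each element of `response.results` is one of the chunks I am asking about:

```python
from google.cloud import speech

client = speech.SpeechClient()

# Hypothetical bucket/path, just for illustration.
audio = speech.RecognitionAudio(uri="gs://my-bucket/long-recording.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Long audio must go through the asynchronous (long-running) endpoint.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

# The transcription comes back split into several results of varying length,
# each carrying its own confidence score.
for result in response.results:
    best = result.alternatives[0]
    print(f"{best.confidence:.2f}  {best.transcript}")
```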