How does Google Speech API chunk the audio for transcription?

When the Google Speech API transcribes long audio, it returns the result as a series of short text chunks of varying length, each with an associated confidence value. How does the underlying algorithm decide where to place the boundaries between these chunks? It seems to be more complicated than simply cutting the audio into fixed-duration pieces and transcribing each one separately, although I could be wrong about this.
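To make the contrast concrete, here is a minimal sketch of the two strategies I have in mind. It is purely illustrative: I do not know what the Google Speech API does internally, and the function names (`fixed_chunks`, `pause_chunks`) and parameters (`frame_len`, `threshold`, `min_pause_frames`) are hypothetical. The first function is the naive fixed-duration split; the second ends a chunk after a run of low-energy frames, which is one way variable-length chunks could arise.

```python
from typing import List, Tuple


def fixed_chunks(n_samples: int, chunk_len: int) -> List[Tuple[int, int]]:
    """Naive baseline: cut every chunk_len samples, regardless of content."""
    return [(start, min(start + chunk_len, n_samples))
            for start in range(0, n_samples, chunk_len)]


def pause_chunks(samples: List[float], frame_len: int,
                 threshold: float, min_pause_frames: int) -> List[Tuple[int, int]]:
    """Hypothetical alternative: close a chunk after a run of quiet frames.

    This is an assumption for illustration, not Google's algorithm.
    Returns (start, end) sample indices, end exclusive.
    """
    chunks = []
    start = None        # sample index where the current chunk began
    last_loud_end = 0   # end of the most recent frame above the threshold
    quiet_run = 0       # consecutive quiet frames seen inside a chunk
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy >= threshold:
            if start is None:
                start = i
            last_loud_end = i + len(frame)
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                # The pause is long enough: close the chunk at the last loud frame.
                chunks.append((start, last_loud_end))
                start = None
                quiet_run = 0
    if start is not None:
        chunks.append((start, last_loud_end))
    return chunks


# Toy signal: 30 loud samples, 20 silent samples, 50 loud samples.
samples = [1.0] * 30 + [0.0] * 20 + [1.0] * 50
print(fixed_chunks(len(samples), 40))       # [(0, 40), (40, 80), (80, 100)]
print(pause_chunks(samples, 10, 0.5, 2))    # [(0, 30), (50, 100)]
```

On the toy signal, the fixed split cuts through the middle of the "speech", while the pause-based split produces two chunks of different lengths that line up with the silence. The chunks I see from the API look more like the second kind, which is what prompted the question.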
