Google Speech Recognition API: timestamp for each word?

2018-06-11 23:58:18

It is possible to use Google's speech recognition API to get a transcription of an audio file (WAV, MP3, etc.) by sending a request to http://www.google.com/speech-api/v2/recognize?...

Example: I have said "one two three four five" in a WAV file. The Google API gives me this:

{
  u'alternative':
  [
    {u'transcript': u'12345'},
    {u'transcript': u'1 2 3 4 5'},
    {u'transcript': u'one two three four five'}
  ],
  u'final': True
}
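The response above is printed as a Python 2 dict (hence the u'' prefixes). A minimal sketch that reads it back, using only the field names that appear in that sample ("alternative", "transcript", "final"), shows why per-word timing cannot be recovered: in this sample each alternative carries nothing but a transcript string. `list_transcripts` is a hypothetical helper name.

```python
# Minimal sketch: read the candidate transcripts out of one result object
# shaped like the sample response. Only the keys seen in that sample are used.

def list_transcripts(result):
    """Return every candidate transcript in the order the API listed it."""
    return [alt["transcript"] for alt in result.get("alternative", [])]


result = {
    "alternative": [
        {"transcript": "12345"},
        {"transcript": "1 2 3 4 5"},
        {"transcript": "one two three four five"},
    ],
    "final": True,
}

print(list_transcripts(result))

# Collect every key used by any alternative other than "transcript".
# For the sample this is empty: there are no start/end fields to read.
extra_keys = {key for alt in result["alternative"] for key in alt} - {"transcript"}
print(extra_keys)
```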

Question: is it possible to get the time (in seconds) at which each word was spoken?

With my example:

['one', 0.23, 0.80], ['two', 1.03, 1.45], ['three', 1.79, 2.35], etc.

i.e. the word "one" was spoken between 00:00:00.23 and 00:00:00.80, and the word "two" was spoken between 00:00:01.03 and 00:00:01.45 (in seconds).
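Converting per-word times in seconds, as in the `['one', 0.23, 0.80]` lists, into the HH:MM:SS.ff notation used in the question only takes a small formatter. This is a minimal sketch: `format_timestamp` is a hypothetical helper, and rounding to hundredths of a second is an assumption chosen to match the two-decimal example.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.ff (hundredths of a second)."""
    # Work in integer hundredths so float noise (e.g. 2.35 * 100) cannot leak in.
    centis = round(seconds * 100)
    hours, rem = divmod(centis, 360_000)   # 100 * 60 * 60 hundredths per hour
    minutes, rem = divmod(rem, 6_000)      # 100 * 60 hundredths per minute
    secs, hundredths = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{hundredths:02d}"


# Word timings from the question's example: [word, start_seconds, end_seconds].
words = [["one", 0.23, 0.80], ["two", 1.03, 1.45], ["three", 1.79, 2.35]]

for word, start, end in words:
    print(f"{word}: {format_timestamp(start)} to {format_timestamp(end)}")
# first line printed: one: 00:00:00.23 to 00:00:00.80
```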

PS: I'm looking for an API that supports languages other than English, especially French.

This is not possible with the Google API.

If you want word timestamps, you can use other APIs, for example:

CMUSphinx - free, offline speech recognition API

SpeechMatics - SaaS speech recognition API

IBM's speech recognition API
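Of these, IBM's service documents a `timestamps` request option; as far as we know, its JSON response then lists each word of an alternative as a [word, start, end] triple, which is the same format the question asks for. The sketch below assumes that response shape and only parses it offline; it does not call IBM's SDK. The `SAMPLE` values are made up from the question's example, not real API output, and `word_times` is a hypothetical helper name.

```python
import json

# Hypothetical, trimmed response in the shape assumed for timestamps=true:
# each alternative carries a "timestamps" list of [word, start_s, end_s] triples.
SAMPLE = """
{"results": [{"final": true, "alternatives": [{
    "transcript": "one two ",
    "timestamps": [["one", 0.23, 0.8], ["two", 1.03, 1.45]]
}]}]}
"""


def word_times(response):
    """Collect [word, start, end] triples from the top alternative of each result."""
    rows = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # The first alternative is taken as the best hypothesis.
            rows.extend(list(triple) for triple in alternatives[0].get("timestamps", []))
    return rows


print(word_times(json.loads(SAMPLE)))
```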

I believe the other answers are now out of date. This is now possible with the Google Cloud Speech API: https://cloud.google.com/speech/docs/async-time-offset
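A minimal sketch of the approach on that page, assuming the `google-cloud-speech` Python client (2.x), a 16 kHz LINEAR16 WAV file already uploaded to Cloud Storage, and credentials already configured. `enable_word_time_offsets=True` is the request option that turns on per-word start and end times, and `language_code="fr-FR"` is set for French, per the PS in the question. The names `words_with_times` and `transcribe_with_word_times` are hypothetical.

```python
from datetime import timedelta


def _to_seconds(offset):
    """Convert a word offset to float seconds.

    Recent google-cloud-speech releases expose offsets as datetime.timedelta;
    older ones exposed protobuf Durations with .seconds and .nanos fields.
    """
    if isinstance(offset, timedelta):
        return offset.total_seconds()
    return offset.seconds + offset.nanos * 1e-9


def words_with_times(response):
    """Flatten a recognition response into [word, start_s, end_s] lists."""
    rows = []
    for result in response.results:
        best = result.alternatives[0]  # top-ranked hypothesis for this segment
        for info in best.words:
            rows.append([info.word, _to_seconds(info.start_time), _to_seconds(info.end_time)])
    return rows


def transcribe_with_word_times(gcs_uri, language_code="fr-FR"):
    """Run asynchronous recognition with word time offsets enabled.

    Assumes a 16 kHz LINEAR16 WAV file in Cloud Storage and configured credentials.
    """
    # Imported here so words_with_times can be used without the client installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=language_code,
        enable_word_time_offsets=True,
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=300)  # wait up to 5 minutes
    return words_with_times(response)
```

Calling `transcribe_with_word_times("gs://your-bucket/audio.wav")` (a placeholder URI) should return rows in the `[word, start, end]` form asked for in the question. Keeping the parsing in its own function makes it easy to check against a hand-built response without network access.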
