Transcribe MP3 audio file with Bing Speech API (speech to text)

I have a long recording (about an hour) in MP3 format. This is the information I managed to get from FFmpeg about the audio file:

[mp3 @ 000001fe666da320] Skipping 0 bytes of junk at 58650.
[mjpeg @ 000001fe666effe0] Changing bps to 8
[mp3 @ 000001fe666da320] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '1.mp3':
Duration: 00:57:18.52, start: 0.000000, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 192 kb/s
    Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1300x1370, 90k tbr, 90k tbn, 90k tbc

I would like to use Bing Speech API (Microsoft Oxford - Cognitive Services - Speech API) to transcribe this file (speech to text).

I believe that this is achievable by using something like the code below.

Option 1: Before sending any audio data, you must first send a SpeechAudioFormat descriptor that describes the layout and format of your raw audio data, via DataRecognitionClient's sendAudioFormat() method. Can you provide a code sample for this option?

Option 2: Convert the file to a format the service accepts. I have done that with FFmpeg, and this is what I got:

Duration: 00:57:23.67, bitrate: 256 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

As I understand from the documentation, this should be acceptable: The audio must be PCM, mono, 16-bit sample, with sample rate of 8000 Hz or 16000 Hz.
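
One way to double-check a converted file against that requirement is to read its WAV header directly. The following is only a minimal sketch, assuming the file is a standard RIFF/WAVE container with a seekable stream. The names WavFormatCheck and IsAcceptable are hypothetical helpers for illustration and are not part of the Speech SDK.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavFormatCheck
{
    // Hypothetical helper: returns true if the stream holds a RIFF/WAVE file whose
    // "fmt " chunk describes PCM, mono, 16-bit audio sampled at 8000 Hz or 16000 Hz.
    public static bool IsAcceptable(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            if (stream.Length < 12) return false;
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF") return false;
            reader.ReadUInt32(); // overall RIFF size, not needed here
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE") return false;

            // Walk the chunks until "fmt " is found (other chunks may come first).
            while (stream.Position + 8 <= stream.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint chunkSize = reader.ReadUInt32();
                if (chunkId == "fmt ")
                {
                    if (chunkSize < 16 || stream.Position + 16 > stream.Length) return false;
                    ushort formatTag = reader.ReadUInt16();     // 1 means uncompressed PCM
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    reader.ReadUInt32();                        // byte rate
                    reader.ReadUInt16();                        // block align
                    ushort bitsPerSample = reader.ReadUInt16();
                    return formatTag == 1 && channels == 1 && bitsPerSample == 16
                        && (sampleRate == 8000 || sampleRate == 16000);
                }
                // Chunks are word-aligned: skip the payload plus one pad byte if the size is odd.
                stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }
            return false;
        }
    }
}
```

It could be called on a stream opened with File.OpenRead on the converted file. A true result only says the header matches the documented format; it does not say anything about length limits on the service side.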

I tried to send the audio to the server but did not get any reply. Am I on the right track? What is the maximum buffer size?

Do you see another, perhaps easier, option for getting my audio file transcribed?

private void SendAudioHelper(string wavFileName)
{
    using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
    {
        int bytesRead;
        byte[] buffer = new byte[1024];

        try
        {
            // Read the file in 1 KB chunks and send each chunk to the service.
            // The loop ends when Read returns 0, so no empty chunk is sent.
            while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                this.dataClient.SendAudio(buffer, bytesRead);
            }
        }
        finally
        {
            // We are done sending audio. Final recognition results will arrive in the OnResponseReceived event call.
            this.dataClient.EndAudio();
        }
    }
}

There is a limit of 15 seconds when you use the REST implementation. The SDK has a limit of 2 minutes.

Bing Speech team
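
Given those limits, one possible approach is to cut the converted PCM data into segments that each stay under the limit and send them separately. The sketch below only computes the byte ranges. PcmSegmenter and its methods are hypothetical names for illustration, not SDK calls, and whether the service accepts each segment this way is an assumption that has not been verified here.

```csharp
using System;
using System.Collections.Generic;

public static class PcmSegmenter
{
    // Bytes needed for one second of uncompressed PCM audio.
    public static int BytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * channels * (bitsPerSample / 8);
    }

    // Splits a PCM payload of dataLength bytes into (offset, length) ranges,
    // each holding at most maxSeconds of audio. Offsets are relative to the
    // start of the audio data, not the start of the WAV file.
    public static List<(long Offset, long Length)> Split(long dataLength, int bytesPerSecond, int maxSeconds)
    {
        if (bytesPerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(bytesPerSecond));
        if (maxSeconds <= 0) throw new ArgumentOutOfRangeException(nameof(maxSeconds));

        var segments = new List<(long Offset, long Length)>();
        long segmentBytes = (long)bytesPerSecond * maxSeconds;
        for (long offset = 0; offset < dataLength; offset += segmentBytes)
        {
            // The last segment may be shorter than the others.
            segments.Add((offset, Math.Min(segmentBytes, dataLength - offset)));
        }
        return segments;
    }
}
```

For the converted file in the question (16000 Hz, mono, 16-bit), BytesPerSecond gives 32000 bytes per second, which matches the 256 kb/s that FFmpeg reports. Fixed-length cuts can fall in the middle of a word, so cutting at pauses would likely give better transcripts.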
