Detect multiple voices without speech recognition

2018-06-12 00:46:53

Is there a way to just detect in real time whether multiple people are speaking? Do I need a voice recognition API for that?

I don't want to separate the audio, and I don't want to transcribe it either. My approach would be to record frequently using one mic (mono) and then analyse those recordings. But how would I then detect and distinguish voices? I'd narrow it down by looking only at the relevant frequencies, but then...
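A minimal NumPy sketch of the preprocessing described above, assuming the recording is already a mono float array and assuming "relevant frequencies" means the traditional telephone speech band of roughly 300 to 3400 Hz. It splits the signal into 25 ms frames with a 10 ms hop and reports what fraction of each frame's energy falls inside that band. `band_energy_frames` is a hypothetical helper name, and this step only shows where speech-like energy is, not how many voices produced it.

```python
import numpy as np


def band_energy_frames(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
                       low_hz=300.0, high_hz=3400.0):
    """Return, per frame, the fraction of spectral energy in [low_hz, high_hz].

    Hypothetical helper; the 300-3400 Hz default is an assumed speech band.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        return np.empty(0)  # not enough samples for a single frame

    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)

    ratios = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        # Silent frames get 0.0 instead of a division by zero.
        ratios[i] = power[in_band].sum() / total if total > 0 else 0.0
    return ratios
```

For example, a pure 1 kHz tone sampled at 16 kHz yields ratios close to 1 for every frame, while a 6 kHz tone yields ratios close to 0. Frames with a high ratio are candidates for further analysis.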

I do understand that this is no trivial undertaking. That's why I hope there's an API out there capable of doing this out of the box, preferably a mobile/web-friendly one.

Now, this might sound like a Christmas shopping list, but as mentioned, I do not need to know anything about the content. So my guess is that full-fledged speech recognition would take a heavy toll on performance.

Most similar problems (adult/child classifier, speech/music classifier, single-voice/voice-mixture classifier) are standard machine learning problems. You can solve them with a classifier such as a GMM (Gaussian mixture model). You only need to construct training data for your task:

1. Take a set of clean single-speaker recordings; for example, you can download audiobooks.

2. Prepare mixed data by mixing the clean recordings together.

3. Train a GMM on each set: one on clean speech and one on mixed speech.

4. Score new audio with both the clean-speech GMM and the mixed-speech GMM, and decide whether a mixture is present from the ratio of the two likelihoods.
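The recipe above can be sketched with scikit-learn's `GaussianMixture`. This is a minimal sketch under several assumptions: clips are mono float arrays at 16 kHz; features are log energies in 24 equal-width bands below 4 kHz (MFCCs are the more usual choice, but would need an extra library); every clip is normalized to unit RMS so the mixed class is not simply louder; and each mixture pairs one recording with one other at equal level. The helper names (`frame_features`, `mix_pair`, `train_models`, `has_multiple_voices`) are hypothetical, and `threshold=0.0` is a placeholder you would tune on held-out data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def normalize(x):
    """Scale a clip to unit RMS so loudness does not separate the classes."""
    return x / (np.sqrt(np.mean(x ** 2)) + 1e-12)


def frame_features(signal, sample_rate, frame_len=400, hop=160, n_bands=24):
    """Log energies in n_bands equal-width bands below 4 kHz, one row per frame."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    edges = np.linspace(0.0, 4000.0, n_bands + 1)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = np.abs(np.fft.rfft(signal[start:start + frame_len] * window)) ** 2
        bands = [power[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(edges[:-1], edges[1:])]
        rows.append(np.log(np.asarray(bands) + 1e-10))
    return np.array(rows)


def mix_pair(a, b):
    """Step 2: overlay two clean recordings at equal level."""
    n = min(len(a), len(b))
    return normalize(normalize(a[:n]) + normalize(b[:n]))


def train_models(clean_clips, sample_rate, n_components=8, seed=0):
    """Steps 1-3: fit one GMM on clean frames and one on mixed frames."""
    rng = np.random.default_rng(seed)
    clean_feats = np.vstack([frame_features(normalize(c), sample_rate)
                             for c in clean_clips])
    mixed = []
    for i, a in enumerate(clean_clips):
        j = int(rng.integers(len(clean_clips) - 1))
        b = clean_clips[j if j < i else j + 1]  # always a different recording
        mixed.append(mix_pair(a, b))
    mixed_feats = np.vstack([frame_features(m, sample_rate) for m in mixed])
    clean_gmm = GaussianMixture(n_components, covariance_type="diag",
                                random_state=seed).fit(clean_feats)
    mixed_gmm = GaussianMixture(n_components, covariance_type="diag",
                                random_state=seed).fit(mixed_feats)
    return clean_gmm, mixed_gmm


def log_likelihood_ratio(signal, sample_rate, clean_gmm, mixed_gmm):
    """Step 4: log of the likelihood ratio, averaged over frames."""
    feats = frame_features(normalize(signal), sample_rate)
    # GaussianMixture.score returns the mean per-frame log-likelihood.
    return mixed_gmm.score(feats) - clean_gmm.score(feats)


def has_multiple_voices(signal, sample_rate, clean_gmm, mixed_gmm, threshold=0.0):
    """True when the mixed-speech model explains the chunk better."""
    return bool(log_likelihood_ratio(signal, sample_rate, clean_gmm, mixed_gmm)
                > threshold)
```

Usage: call `clean_gmm, mixed_gmm = train_models(clips, 16000)` once, then `has_multiple_voices(chunk, 16000, clean_gmm, mixed_gmm)` on each recorded chunk. Scoring a chunk needs only one FFT per frame and two GMM evaluations, which is far lighter than full speech recognition, but accuracy depends on how representative the training recordings are.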

You can find some code samples here:

https://github.com/littleowen/Conceptor

For example, you can try

https://github.com/littleowen/Conceptor/blob/master/Gender.ipynb
