-
The underlying model has indeed been specifically trained to recognize at most two simultaneous speakers. You'll need to train (or fine-tune) a new one with …
-
I have an audio file that includes parts where three or more speakers are speaking at the same time. However, the pre-trained `pyannote` models seem to have problems recognizing this.

What I hoped to try:
- `min_duration_on` for the pre-trained model: maybe the overlapping speech segments are too short, and setting this to 0 would make the model more sensitive to overlapping speakers. However, I was not able to achieve this; the potential solutions discussed here only link to the documentation.
- `max_speakers_per_chunk` or `max_speakers_per_frame`: however, these seem to be available only when fine-tuning the models.

Is there an easy way to use the `pipeline` while making the model more willing to recognize three or more speakers at the same timestamp? My current code recognizes only up to two speakers at any given timestamp.

The audio file is a test video compiled from the Mozilla Common Voice dataset; the last 15 seconds contain overlapping speech of three people. I haven't found an example where three or more speakers at the same timestamp are recognized.
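For what it's worth, here is a small stdlib-only sketch I use to check whether a diarization result ever contains three or more simultaneous speakers. The segment tuples below are hypothetical placeholders; real ones would come from iterating over the pipeline's output segments.

```python
def max_concurrent_speakers(segments):
    """Return the maximum number of overlapping segments.

    `segments` is an iterable of (start, end, speaker) tuples,
    e.g. collected from a diarization result.
    """
    # Sweep line: +1 at each segment start, -1 at each segment end.
    events = []
    for start, end, _speaker in segments:
        events.append((start, 1))
        events.append((end, -1))
    # Sort by time; at equal timestamps, process ends (-1) before
    # starts (+1) so touching segments don't count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

# Hypothetical segments: three speakers overlap around 8.7-9.0 s.
segments = [
    (0.0, 5.0, "SPEAKER_00"),
    (4.0, 9.0, "SPEAKER_01"),
    (8.5, 12.0, "SPEAKER_02"),
    (8.7, 10.0, "SPEAKER_00"),
]
print(max_concurrent_speakers(segments))  # prints 3
```

This makes it easy to verify on the test file whether the pipeline ever reports more than two concurrent speakers, independent of how the result is visualized.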