This is an experiment in spectrogram object detection with YOLO, using a clone of AlexeyAB/darknet.git.
Sound tagging (audio tagging) commonly uses the whole spectrogram. This project instead aims to detect local areas of a spectrogram that show a specific feature.
For now, the numbers of training and test images are very small: 84 and 20.
There are only two classes, voice and instrument. Here they stand for bent lines and flat lines,
and do not necessarily correspond to true voice and true instrument.
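
As an illustration of how a region of a spectrogram image can be labeled for darknet, the sketch below converts a pixel bounding box into a YOLO label line (`class_id x_center y_center width height`, all coordinates normalized to the image size). The class ids (0 for voice, 1 for instrument), the function name `to_yolo_label` and the example box are assumptions for illustration, not taken from this repository.

```python
# Assumed class ids: 0 = voice (bent lines), 1 = instrument (flat lines).
CLASS_IDS = {"voice": 0, "instrument": 1}


def to_yolo_label(class_name, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a darknet label line."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    # darknet expects the box center and size, each divided by the image size.
    x_center = (x_min + x_max) / 2.0 / width
    y_center = (y_min + y_max) / 2.0 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return "%d %.6f %.6f %.6f %.6f" % (
        CLASS_IDS[class_name], x_center, y_center, box_w, box_h)


# Hypothetical box inside a 400x200 spectrogram image.
label_line = to_yolo_label("voice", (100, 50, 300, 150), (400, 200))
print(label_line)
```

Each image would have one such line per labeled region in a text file next to it.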
mAP is still not good.
Please refer to and
And also refer to to add number of train and test.
Please refer to the making_spectrogram folder for how this dataset was made.
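
For readers who want a feel for why a steady tone shows up as a flat line, here is a minimal sketch of a magnitude spectrogram using only the Python standard library. It is not the code in the making_spectrogram folder; the frame length, hop size, Hann window and the 1 kHz test tone are all assumptions chosen for illustration, and a real pipeline would normally use an FFT library instead of this slow direct DFT.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len//2."""
    # Hann window to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test signal: a steady 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
spec = spectrogram(tone)

# The strongest bin is the same in every frame, i.e. a flat line over time.
peak_bins = [frame.index(max(frame)) for frame in spec]
print(sorted(set(peak_bins)))
```

A signal whose pitch changes over time would move its peak bin from frame to frame, which is what the bent lines correspond to.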
To be continued in the GitHub repository.