Releases: v-dvorak/omr-layout-analysis
OLA model v1.0 2024-10-10
The model file comes with a set of arguments that were used to create it.
- Model performance evaluation at release
- TODO demo inference
OLA model v0.9-2024-08-28
The model file comes with a set of arguments that were used to create it.
- TODO demo inference
Evaluation at release
We compare the YOLOv8m model with a Faster R-CNN model implemented in TensorFlow, previously used by A. Pacha for a measure detector.
We ran three out-of-domain tests and one in-domain test (using a 90/10 train/test split) to evaluate the models' performance, and measured the results with three metrics: recall, mAP50 and mAP50-95. We used the pycocotools Python library to calculate these metrics.
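To make the metrics concrete: mAP50 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.5 to 0.95. The sketch below is not the pycocotools implementation; it is a simplified illustration of IoU and of recall at a single threshold, with hypothetical function names, greedy matching, and boxes assumed to be given as (left, top, width, height).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def recall_at(gt: List[Box], pred: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by a distinct prediction with IoU >= thr."""
    unused = list(pred)
    hits = 0
    for g in gt:
        # Greedy choice: take the unused prediction that overlaps this box most.
        best = max(unused, key=lambda p: iou(g, p), default=None)
        if best is not None and iou(g, best) >= thr:
            hits += 1
            unused.remove(best)  # each prediction may match only one ground truth
    return hits / len(gt) if gt else 0.0
```

Raising `thr` towards 0.95 makes the same detections harder to count as hits, which is why mAP50-95 values are generally lower than mAP50 values.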
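Out-of-domain evaluation
In the tables below, ✅ marks a dataset used for training and ❌ marks the dataset held out for testing.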
id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
---|---|---|---|---|
1 | ✅ | ✅ | ❌ | ✅ |
2 | ❌ | ✅ | ✅ | ✅ |
3 | ✅ | ❌ | ✅ | ✅ |
In-domain evaluation
id | AudioLabs v2 | Muscima++ | OSLiC | MZKBlank |
---|---|---|---|---|
4 | ✅ | ✅ | ✅ | ✅ |
(90/10 train/test split)
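The four test configurations can be written down compactly. In the sketch below, the held-out dataset for each out-of-domain test is read off the instance counts in the result tables (for example, Test 1 reports the OSLiC counts). The dictionary layout, function names, and the fixed random seed are illustrative assumptions, not the project's training code.

```python
import random
from typing import Dict, List, Tuple

DATASETS = ["AudioLabs v2", "Muscima++", "OSLiC", "MZKBlank"]

# Dataset held out for testing in each out-of-domain test (inferred from the tables).
HELD_OUT: Dict[int, str] = {1: "OSLiC", 2: "AudioLabs v2", 3: "Muscima++"}


def out_of_domain_split(test_id: int) -> Tuple[List[str], List[str]]:
    """Return (training datasets, test datasets) for tests 1 to 3."""
    held_out = HELD_OUT[test_id]
    return [d for d in DATASETS if d != held_out], [held_out]


def in_domain_split(items: List[str], test_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and cut off a test share (test 4, 90/10)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```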
Results
Test 1
Class | Instances | Pacha's R-CNN Recall | Pacha's R-CNN mAP50 | Pacha's R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 72,028 | 0.620 | 0.727 | 0.507 | 0.679 | 0.554 | 0.571 |
Stave measures | 220,868 | 0.278 | 0.678 | 0.204 | 0.336 | 0.580 | 0.249 |
Staves | 55,038 | 0.355 | 0.921 | 0.295 | 0.430 | 0.829 | 0.334 |
Systems | 17,991 | 0.736 | 0.945 | 0.697 | 0.965 | 0.978 | 0.949 |
Grand staff | 17,959 | 0.744 | 0.982 | 0.701 | 0.815 | 0.901 | 0.792 |
All | 383,884 | 0.547 | 0.851 | 0.481 | 0.654 | 0.790 | 0.579 |
Test 2
Class | Instances | Pacha's R-CNN Recall | Pacha's R-CNN mAP50 | Pacha's R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 24,186 | 0.864 | 0.989 | 0.827 | 0.804 | 0.934 | 0.770 |
Stave measures | 50,064 | 0.565 | 0.976 | 0.494 | 0.596 | 0.921 | 0.535 |
Staves | 11,143 | 0.581 | 0.939 | 0.511 | 0.643 | 0.939 | 0.584 |
Systems | 5,376 | 0.873 | 0.989 | 0.832 | 0.892 | 0.960 | 0.860 |
Grand staff | 5,375 | 0.763 | 0.973 | 0.699 | 0.893 | 0.960 | 0.859 |
All | 96,144 | 0.729 | 0.973 | 0.673 | 0.766 | 0.943 | 0.722 |
Test 3
Class | Instances | Pacha's R-CNN Recall | Pacha's R-CNN mAP50 | Pacha's R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 2,888 | 0.217 | 0.256 | 0.140 | 0.166 | 0.153 | 0.123 |
Stave measures | 4,616 | 0.062 | 0.196 | 0.026 | 0.243 | 0.420 | 0.174 |
Staves | 883 | 0.045 | 0.061 | 0.008 | 0.416 | 0.723 | 0.329 |
Systems | 484 | 0.173 | 0.237 | 0.111 | 0.191 | 0.192 | 0.140 |
Grand staff | 94 | 0.369 | 0.393 | 0.164 | 0.889 | 0.758 | 0.747 |
All | 8,965 | 0.173 | 0.229 | 0.090 | 0.381 | 0.449 | 0.303 |
Test 4
Class | Instances | Pacha's R-CNN Recall | Pacha's R-CNN mAP50 | Pacha's R-CNN mAP50-95 | YOLOv8m Recall | YOLOv8m mAP50 | YOLOv8m mAP50-95 |
---|---|---|---|---|---|---|---|
System measures | 9,151 | 0.962 | 0.989 | 0.943 | 0.980 | 0.987 | 0.975 |
Stave measures | 27,294 | 0.876 | 0.979 | 0.831 | 0.946 | 0.989 | 0.930 |
Staves | 6,816 | 0.888 | 0.980 | 0.854 | 0.900 | 0.989 | 0.888 |
Systems | 2,326 | 0.963 | 0.990 | 0.947 | 0.993 | 0.990 | 0.986 |
Grand staff | 2,285 | 0.949 | 0.996 | 0.931 | 0.996 | 1.000 | 0.993 |
All | 47,872 | 0.927 | 0.987 | 0.901 | 0.960 | 0.991 | 0.954 |
Full Changelog: Datasets...evaluation-release
Datasets at release
The final dataset is split into four logical parts:
- AudioLabs v2
- Muscima++
- OSLiC
- MZKBlank
Due to GitHub's restrictions on file size, the OSLiC dataset is split into two parts. OSLiC in COCO format keeps the same folder structure as the original dataset.
Quick Start
To train a YOLO model on the datasets, download all archives that are not tagged with COCO and combine them into one dataset. When setting up the training, pass the config.yaml file as an argument to the script.
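Combining the extracted YOLO archives amounts to merging their images/ and labels/ folders. The sketch below assumes each archive has already been unzipped into its own directory with the layout described under YOLO format. The function name and directory names are hypothetical, and the function refuses to overwrite files with clashing names.

```python
import shutil
from pathlib import Path
from typing import Iterable


def merge_yolo_datasets(sources: Iterable[Path], target: Path) -> int:
    """Copy images/ and labels/ from every source directory into target.

    Returns the number of files copied.
    """
    copied = 0
    for source in sources:
        for sub in ("images", "labels"):
            src_dir = source / sub
            if not src_dir.is_dir():
                continue  # this source has no such subfolder
            dst_dir = target / sub
            dst_dir.mkdir(parents=True, exist_ok=True)
            for file in sorted(src_dir.iterdir()):
                if not file.is_file():
                    continue
                dst = dst_dir / file.name
                if dst.exists():
                    raise FileExistsError(f"name clash: {dst}")
                shutil.copy2(file, dst)
                copied += 1
    return copied
```

With the Ultralytics package, training on the merged folder would typically be started with something like `YOLO("yolov8m.pt").train(data="config.yaml")`; whether the project's own script wraps this call is an assumption.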
Dataset Overview
dataset | images | system measures | stave measures | staves | systems | grand staves |
---|---|---|---|---|---|---|
AudioLabs v2 | 940 | 24 186 | 50 064 | 11 143 | 5 376 | 5 375 |
Muscima++ | 140 | 2 888 | 4 616 | 883 | 484 | 94 |
OSLiC | 4 927 | 72 028 | 220 868 | 55 038 | 17 991 | 17 959 |
MZKBlank | 1 006 | 0 | 0 | 0 | 0 | 0 |
total | 7 013 | 99 102 | 275 548 | 67 064 | 23 851 | 23 428 |
COCO format
zip/
  img/ ... all images
  json/ ... corresponding labels in COCO format
{
  "width": 3483,
  "height": 1693,
  "system_measures": [
    {
      "left": 211,
      "top": 726,
      "width": 701,
      "height": 120
    },
    ...
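The label entries above store boxes as absolute pixel values (left, top, width, height) alongside the image size. A minimal sketch of converting one such entry to a normalized YOLO row follows. The function name is hypothetical, and the class index 0 is an assumed placeholder, since the class numbering is not stated here.

```python
from typing import Dict


def box_to_yolo_row(box: Dict[str, float], img_w: float, img_h: float,
                    class_id: int = 0) -> str:
    """Convert a left/top/width/height pixel box to a normalized YOLO row."""
    x_center = (box["left"] + box["width"] / 2) / img_w
    y_center = (box["top"] + box["height"] / 2) / img_h
    width = box["width"] / img_w
    height = box["height"] / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Values taken from the example label above.
row = box_to_yolo_row({"left": 211, "top": 726, "width": 701, "height": 120},
                      img_w=3483, img_h=1693)
print(row)
```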
YOLO format
zip/
  images/ ... all images
  labels/ ... corresponding labels in YOLO format
Each *.txt label file contains one row per object, in the format: class x_center y_center width height. Box coordinates must be in normalized xywh format (values from 0 to 1, relative to the image width and height).
0 0.163365 0.429003 0.205570 0.090634
0 0.328309 0.429003 0.112834 0.090634
0 0.462245 0.429003 0.138961 0.090634
0 0.598048 0.429003 0.124605 0.090634
0 0.741746 0.429003 0.150158 0.090634
0 0.889176 0.429003 0.136090 0.090634
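Reading such a label file back means scaling the normalized values by the image size. The sketch below parses YOLO rows into pixel boxes; the function name and the image size used in the example call are hypothetical.

```python
from typing import List, Tuple


def parse_yolo_labels(text: str, img_w: int, img_h: int
                      ) -> List[Tuple[int, float, float, float, float]]:
    """Parse YOLO rows into (class_id, left, top, width, height) in pixels."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cls, xc, yc, w, h = line.split()
        w_px, h_px = float(w) * img_w, float(h) * img_h
        left = float(xc) * img_w - w_px / 2
        top = float(yc) * img_h - h_px / 2
        boxes.append((int(cls), left, top, w_px, h_px))
    return boxes


# First row of the example above, on an assumed 1000 x 500 pixel image.
print(parse_yolo_labels("0 0.163365 0.429003 0.205570 0.090634", 1000, 500))
```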