diff --git a/README.md b/README.md
index 3931e50..4a16c0c 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
-
+
# MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
@@ -17,119 +17,119 @@
- [CommonVoice](https://commonvoice.mozilla.org/en/datasets) |
+ CommonVoice |
CC 0 |
6,732 |
bg, cs, da, nl, en, et, fi, fr, de, el, hu, ga, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv |
✅ |
- [CoVoST2](https://github.com/facebookresearch/covost) |
+ CoVoST2 |
CC 0 |
687 |
en, fr, it, es, pt, et, nl, sv, lv, sl |
✅ |
- [CSS10](https://github.com/Kyubyong/css10) |
+ CSS10 |
Public Domain |
99 |
nl, fi, fr, de, el, hu, es |
✅ |
- [EMU](https://ips-lmu.github.io/EMU.html) |
+ EMU |
CC BY 3.0 |
56 |
pl |
✅ |
- [EU Parliament](https://clarin-pl.eu/dspace/handle/11321/821) |
+ EU Parliament |
CC BY 4.0 |
32 |
pl |
✅ |
- [FLEURS](https://huggingface.co/datasets/google/fleurs) |
+ FLEURS |
CC BY 4.0 |
215 |
bg, cs, da, nl, en, et, fi, fr, de, el, hu, ga, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv |
✅ |
- [Large Corpus of Czech Parliament Plenary Hearings](https://lindat.cz/repository/xmlui/handle/11234/1-3126) |
+ Large Corpus of Czech Parliament Plenary Hearings |
CC BY 4.0 |
444 |
cs |
✅ |
- [LibriLight](https://github.com/facebookresearch/libri-light) |
+ LibriLight |
Public Domain |
57,706 |
en |
❌ |
- [LibriTTS](https://www.openslr.org/60/) |
+ LibriTTS |
CC BY 4.0 |
585 |
en |
✅ |
- [LibriSpeech](https://www.openslr.org/12) |
+ LibriSpeech |
CC BY 4.0 |
360 |
en |
✅ |
- [LibriVoxDeEn](https://www.cl.uni-heidelberg.de/statnlpgroup/librivoxdeen/) |
+ LibriVoxDeEn |
Public Domain |
547 |
de |
✅ |
- [MC Speech](https://github.com/czyzi0/the-mc-speech-dataset) |
+ MC Speech |
CC 0 |
22 |
pl |
✅ |
- [Multilingual LibriSpeech](https://www.openslr.org/94/) |
+ Multilingual LibriSpeech |
CC BY 4.0 |
50,687 |
nl, en, fr, de, it, pl, pt, es |
✅ |
- [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) |
+ SIWIS |
CC BY 4.0 |
11 |
fr |
✅ |
- [Speech Commands](http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz) |
+ Speech Commands |
CC BY 4.0 |
18 |
en |
✅ |
- [VCTK](https://datashare.ed.ac.uk/handle/10283/3443) |
+ VCTK |
CC BY 4.0 |
44 |
en |
✅ |
- [VoxPopuli](https://github.com/facebookresearch/voxpopuli) |
+ VoxPopuli |
CC 0 |
383,500 |
bg, hr, cs, da, nl, en, et, fi, fr, de, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv |
@@ -141,7 +141,7 @@
✅ |
- [YouTube-Commons](https://huggingface.co/datasets/PleIAs/YouTube-Commons) |
+ YouTube-Commons |
CC BY 4.0 |
3,261 |
bg, cs, nl, en, et, fr, de, el, hu, it, pl, pt, ro, es |
@@ -153,7 +153,7 @@
✅ |
- [MOSEL :grapes:](https://huggingface.co/datasets/FBK-MT/mosel) |
+ MOSEL :grapes: |
CC BY 4.0 |
441,206 |
bg, hr, cs, da, nl, en, et, fi, fr, de, el, hu, it, lv, lt, mt, pl, pt, ro, sk, sl, es, sv |