wav2vec 2.0 Recognize Implementation.
wav2vec 2.0 is part of fairseq.
This repository is the result of the issue submitted in the fairseq
repository here.
First, download one of the pre-trained models available from fairseq
(listed in the table below).
Model | Finetuning split | Dataset | Download |
---|---|---|---|
Wav2Vec 2.0 Base | No finetuning | Librispeech | download |
Wav2Vec 2.0 Base | 10 minutes | Librispeech | download |
Wav2Vec 2.0 Base | 100 hours | Librispeech | download |
Wav2Vec 2.0 Base | 960 hours | Librispeech | download |
Wav2Vec 2.0 Large | No finetuning | Librispeech | download |
Wav2Vec 2.0 Large | 10 minutes | Librispeech | download |
Wav2Vec 2.0 Large | 100 hours | Librispeech | download |
Wav2Vec 2.0 Large | 960 hours | Librispeech | download |
Wav2Vec 2.0 Large (LV-60) | No finetuning | Libri-Light | download |
Wav2Vec 2.0 Large (LV-60) | 10 minutes | Libri-Light + Librispeech | download |
Wav2Vec 2.0 Large (LV-60) | 100 hours | Libri-Light + Librispeech | download |
Wav2Vec 2.0 Large (LV-60) | 960 hours | Libri-Light + Librispeech | download |
We use python:3.8.6-slim-buster as the base image to give developers more flexibility in customizing this Dockerfile. For a simplified install, please refer to the Alternative Install section. If you choose this container, build it using the provided Dockerfile:
docker build -t wav2vec -f Dockerfile .
There are two versions of recognize.py:
recognize.py: For running legacy finetuned models (without Hydra).
recognize.hydra.py: For running models finetuned with a newer version of fairseq.
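As a rough way to tell the two cases apart, the sketch below inspects an already loaded checkpoint dictionary. It assumes that Hydra-era fairseq checkpoints store their configuration under a "cfg" key while legacy ones only carry an argparse namespace under "args". The helper name pick_recognize_script is hypothetical and not part of this repository.

```python
from typing import Any, Mapping


def pick_recognize_script(checkpoint: Mapping[str, Any]) -> str:
    """Guess which script fits a loaded fairseq checkpoint dictionary.

    Assumption: Hydra-era checkpoints carry a non-empty "cfg" entry, while
    legacy checkpoints only carry an argparse namespace under "args".
    """
    if checkpoint.get("cfg") is not None:
        return "recognize.hydra.py"
    if checkpoint.get("args") is not None:
        return "recognize.py"
    raise ValueError("checkpoint has neither 'cfg' nor 'args'; cannot tell its format")


if __name__ == "__main__":
    # Stand-in dictionaries; a real one would come from
    # torch.load(path, map_location="cpu") on the downloaded .pt file.
    print(pick_recognize_script({"args": object(), "model": {}}))
    print(pick_recognize_script({"cfg": {"generation": {}}, "args": None}))
```

If the guess turns out wrong for a given model, running the other script is the obvious fallback.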
Before running, copy the downloaded model (e.g. wav2vec_small_10m.pt) to the data/ folder. Also copy the wav file you want to test there, e.g. data/temp.wav as in the following examples. The data/ folder will then look like this:
.
├── dict.ltr.txt
├── temp.wav
└── wav2vec_small_10m.pt
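Before starting the container, it can help to confirm that the folder matches this layout. The following is a minimal sketch using only the standard library. The file names are taken from the example layout and should be adjusted to your own model and audio. The helper missing_data_files is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the example layout above; adjust to your own model and audio.
EXPECTED_FILES = ("dict.ltr.txt", "temp.wav", "wav2vec_small_10m.pt")


def missing_data_files(data_dir: str, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files("data")
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("data/ looks complete")
```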
We now run the container, then enter it and execute the recognition script (recognize.py or recognize.hydra.py):
docker run -d -it --rm -v $PWD/data:/app/data --name w2v wav2vec
docker exec -it w2v bash
python examples/wav2vec/recognize.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav
At the very least, we have tested with the fairseq master branch (> v0.10.1, commit ac11107). You may still run into issues like this:
omegaconf.errors.ValidationError: Invalid value 'False', expected one of [hard, soft]
full_key: generation.print_alignment
reference_type=GenerationConfig
object_type=GenerationConfig
This error probably means that your model was finetuned (or trained) with a different version of fairseq. Find out which version your model was trained with and edit the commit hash in the Dockerfile accordingly, but be aware that this might break src/recognize.py.
The workaround is to look for what changed in the parameters inside the fairseq source code. For the error above, I found the following change:
fairseq/dataclass/configs.py (72a25a4 -> 032a404)
- print_alignment: bool = field(
+ print_alignment: Optional[PRINT_ALIGNMENT_CHOICES] = field(
- default=False,
+ default=None,
metadata={
- "help": "if set, uses attention feedback to compute and print alignment to source tokens"
+ "help": "if set, uses attention feedback to compute and print alignment to source tokens "
+ "(valid options are: hard, soft, otherwise treated as hard alignment)",
+ "argparse_const": "hard",
},
)
The problem is that fairseq changed the type of generation.print_alignment, so the old value False stored in the checkpoint is no longer valid. I therefore modified recognize.hydra.py as below (you might want to modify the value instead):
OmegaConf.set_struct(w2v["cfg"], False)
+ del w2v["cfg"].generation["print_alignment"]
cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])
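For the alternative of modifying the value rather than deleting the key, here is a minimal sketch. It uses a plain dictionary as a stand-in for the checkpoint's generation config and has not been tested against an actual OmegaConf object. The mapping (False to None, True to "hard") is an assumption read off the new default and argparse_const in the diff above, and migrate_print_alignment is a hypothetical helper.

```python
from typing import Any, MutableMapping


def migrate_print_alignment(generation: MutableMapping[str, Any]) -> None:
    """Rewrite a legacy boolean print_alignment in place.

    Mapping used here (an assumption based on the fairseq diff):
    False -> None (the new default), True -> "hard" (the new argparse_const).
    Values that are already None, "hard" or "soft" are left untouched.
    """
    value = generation.get("print_alignment")
    if isinstance(value, bool):
        generation["print_alignment"] = "hard" if value else None


if __name__ == "__main__":
    # Stand-in for the generation section of an older checkpoint's config.
    legacy = {"beam": 5, "print_alignment": False}
    migrate_print_alignment(legacy)
    print(legacy)
```

Deleting the key, as in the patch above, lets the structured config fall back to its own default. Rewriting the value keeps an explicit True setting as "hard".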
We provide an alternative Dockerfile named wav2letter.Dockerfile that uses the wav2letter/wav2letter:cpu-latest Docker image as its FROM base.
Here are the commands to build, install and run in this case:
docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash
python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt
Thanks to all contributors to this repo.