This repository implements the Hybrid CTC/Attention Architecture for End-to-End Speech Recognition in PyTorch. The architecture is taken as-is from ESPnet, while the pre-processing steps have been modified and ported from C++ to Python.
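In the hybrid CTC/attention formulation, the model is trained with a multi-task objective that interpolates the two losses: L = λ·L_CTC + (1 − λ)·L_att, with 0 ≤ λ ≤ 1. The sketch below only illustrates that weighting. The function and the parameter name `ctc_weight` are hypothetical, and the value of λ used by this repository is not stated here.

```python
def hybrid_ctc_attention_loss(loss_ctc, loss_att, ctc_weight: float):
    """Interpolate the CTC and attention losses (hypothetical helper).

    ctc_weight plays the role of lambda and must lie in [0, 1]:
    1.0 trains with CTC only, 0.0 with attention only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    # Only * and + are used, so scalar PyTorch tensors work as well as floats.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att
```

For example, `hybrid_ctc_attention_loss(2.0, 1.0, 0.3)` gives approximately 1.3 (0.3·2.0 + 0.7·1.0).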
$ git clone https://github.com/mayank-git-hub/ETE-Speech-Recognition
$ cd ETE-Speech-Recognition
$ pip install -r requirements.txt
In specificConfig.py, set "path_to_download" to the directory where you want the dataset to be downloaded. The dataset is downloaded with axel, so axel must be installed. If you do not want to download it with axel, download the tar.gz files manually and place each one at config.path_to_download + '/' + list_i + '.tar.gz'.
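If you download the archives manually, a quick check like the one below can confirm they sit where the download step expects them before you run it. This is a sketch: `expected_archive_path` simply mirrors the path expression above, while the helper names and the subset names you pass in are hypothetical (the real names are the `list_i` values the download script iterates over, which are not shown here).

```python
import os
import shutil
from typing import Iterable, List


def expected_archive_path(path_to_download: str, list_i: str) -> str:
    # Mirrors config.path_to_download + '/' + list_i + '.tar.gz' from the README.
    return path_to_download + '/' + list_i + '.tar.gz'


def missing_archives(path_to_download: str, parts: Iterable[str]) -> List[str]:
    # Return the subsets whose .tar.gz archive is not yet in place.
    return [
        part for part in parts
        if not os.path.isfile(expected_archive_path(path_to_download, part))
    ]


def axel_available() -> bool:
    # True if an `axel` executable can be found on PATH.
    return shutil.which("axel") is not None
```

For example, `missing_archives(config.path_to_download, ["subset_a", "subset_b"])` lists the archives that still need to be fetched, and `axel_available()` tells you whether the automatic download can be used.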
$ python downloadDataset.py
In specificConfig.py, set "cache_dir" to the directory where you want the unigram model to be saved.
$ python main.py genUnigram
In specificConfig.py, set "base_model_path" to the directory where you want the intermediate models to be saved during training.
$ python main.py train
To resume training from an intermediate model, set "resume[model_path]" in specificConfig.py to the path of that model and set "resume[restart]" to True.
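A small helper like the one below can pick the newest intermediate model from "base_model_path" and build the two resume entries. It is a sketch under stated assumptions: the dictionary shape of `resume` is inferred from the "resume[model_path]" notation, both function names are hypothetical, and because the checkpoint file naming is not documented here it simply picks the most recently modified file.

```python
import os
from typing import Dict, Optional


def latest_checkpoint(base_model_path: str) -> Optional[str]:
    # Newest regular file by modification time; no filename filter is applied
    # because the checkpoint naming scheme is an unknown here.
    paths = [os.path.join(base_model_path, name) for name in os.listdir(base_model_path)]
    files = [p for p in paths if os.path.isfile(p)]
    if not files:
        return None
    return max(files, key=os.path.getmtime)


def make_resume_config(base_model_path: str) -> Dict[str, object]:
    # Hypothetical shape of the resume settings in specificConfig.py.
    model_path = latest_checkpoint(base_model_path)
    if model_path is None:
        raise FileNotFoundError(f"no intermediate model found in {base_model_path}")
    return {"model_path": model_path, "restart": True}
```

For example, `print(make_resume_config("/path/to/base_model_path"))` prints values you could copy into specificConfig.py before running the training command again.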
$ python main.py train
In specificConfig.py, set "test_model" to the path of the model you want to evaluate.
$ python main.py test