PyTorch implementation of "Neural Machine Translation by Jointly Learning to Align and Translate" for English-to-Hindi translation on the IITB En-Hi parallel corpus.
Requirements:
- numpy==1.17.2
- torch==1.2.0
- torchtext==0.5.0
- tqdm==4.44.1
- indic-nlp-library==0.6
- Python 3.6+
Install prerequisites with:
pip3 install -r requirements.txt
Then download and extract the IITB En-Hi parallel corpus into the Data folder.
To train the model:
python3 main.py
To train on different data, pass the source- and target-language data paths as a single argument joined by the separator `!`!`. For example:
python3 main.py --training_data './Data/dev_test/test.en`!`!`./Data/dev_test/test.hi' --validation_data './Data/dev_test/dev.en`!`!`./Data/dev_test/dev.hi'
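The paired-path convention can be illustrated with a short sketch. How `main.py` actually parses these arguments is not shown in this README, so the helper name `split_pair` and its error handling are assumptions; only the separator string is taken from the command above.

```python
# Separator string copied from the example command in this README.
SEPARATOR = "`!`!`"


def split_pair(arg, sep=SEPARATOR):
    """Split 'SOURCE<sep>TARGET' into a (source_path, target_path) tuple.

    Hypothetical helper: the real parsing in main.py may differ.
    """
    parts = arg.split(sep)
    # Expect exactly one separator with a non-empty path on each side.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'SOURCE{sep}TARGET', got {arg!r}")
    return parts[0], parts[1]


src, tgt = split_pair("./Data/dev_test/test.en`!`!`./Data/dev_test/test.hi")
print(src)
print(tgt)
```

Because the separator contains backticks, keep the whole argument inside single quotes on the command line, as in the example above; otherwise most shells would treat the backticks as command substitution.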
To run in inference mode, provide the paths to a trained model and its dictionary. For example:
python3 main.py --mode infer --load_model_path './trained_models/test_model.pt' --load_dic_path './trained_models/test_dic.pkl'
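For orientation, the flags used in the commands above could be wired up roughly as follows. This is a sketch, not the contents of `main.py`: the flag names come from this README, while the `train` mode name, the absence of defaults for the path flags, and the `check_args` helper are assumptions.

```python
import argparse


def build_parser():
    # Flag names are taken from the README commands; everything else
    # (mode choices, missing defaults) is an assumption for illustration.
    parser = argparse.ArgumentParser(description="NMT training / inference (sketch)")
    parser.add_argument("--mode", choices=["train", "infer"], default="train")
    parser.add_argument("--training_data")
    parser.add_argument("--validation_data")
    parser.add_argument("--load_model_path")
    parser.add_argument("--load_dic_path")
    return parser


def check_args(args):
    """Reject an inference run that lacks the model or dictionary path."""
    if args.mode == "infer" and not (args.load_model_path and args.load_dic_path):
        raise SystemExit("infer mode needs --load_model_path and --load_dic_path")
    return args


# Parse the same arguments as the inference command above.
args = check_args(build_parser().parse_args([
    "--mode", "infer",
    "--load_model_path", "./trained_models/test_model.pt",
    "--load_dic_path", "./trained_models/test_dic.pkl",
]))
print(args.mode, args.load_model_path, args.load_dic_path)
```

Checking the two paths together reflects the instruction above that inference needs both a trained model and its dictionary.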
More configurations can be found here.
References:
- Neural Machine Translation by Jointly Learning to Align and Translate
- IITB En-Hi parallel corpus
- Seq2Seq tutorial
Shikhar / @Shikhar