MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks

HawkAaron/RNN-Transducer

End-to-End Speech Recognition using RNN-Transducer

File description

  • eval.py: RNNT joint model decoding
  • model.py: RNNT model, containing the acoustic and phoneme (prediction) models
  • model2012.py: RNNT model following Graves 2012
  • seq2seq/*: seq2seq with attention
  • rnnt_np.py: RNNT loss function implemented in MXNet, supporting both the Symbol and Gluon APIs; refer to the PyTorch implementation
  • DataLoader.py: data processing
  • train.py: RNNT training script; can be initialized from CTC and PM models
  • train_ctc.py: CTC training script
  • train_att.py: attention training script
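In an RNN-Transducer, the acoustic and phoneme models listed above are combined by a joint network that scores every (frame, label-prefix) pair. A minimal NumPy sketch of an additive joint network follows; the function names, weight shapes, and `log_softmax` helper are illustrative assumptions, not the repo's exact API:

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def joint(f, g, W_f, W_g, b, W_out, b_out):
    """Additive joint network sketch (illustrative shapes).

    f: (T, H_f) transcription (acoustic) network outputs, one per frame
    g: (U+1, H_g) prediction (phoneme) network outputs, one per label prefix
    Returns (T, U+1, V) log-probabilities over the vocabulary incl. blank.
    """
    # Project both streams into a shared space, then broadcast-add
    # so every frame t is paired with every prefix u.
    h = np.tanh((f @ W_f)[:, None, :] + (g @ W_g)[None, :, :] + b)
    logits = h @ W_out + b_out          # (T, U+1, V)
    return log_softmax(logits)
```

The broadcast over the (T, U+1) grid is what makes the transducer loss and decoding operate over all alignments of frames to labels.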

Directory description

  • conf: Kaldi feature extraction config

Reference Papers

  • Sequence Transduction with Recurrent Neural Networks (Graves 2012)
  • Speech Recognition with Deep Recurrent Neural Networks (Graves et al. 2013)

Run

  • Compile RNNT loss: follow the instructions here to compile MXNet with the RNNT loss.

  • Extract features: link the Kaldi TIMIT example dirs (local, steps, utils), execute run.sh to extract 40-dim fbank features, then run feature_transform.sh to get the 123-dim features described in Graves 2013.

  • Train the RNNT model:

python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
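The compiled RNNT loss maximizes the total probability of the label sequence summed over all frame/label alignments, computed with a forward (alpha) recursion over a (T, U+1) lattice. A minimal NumPy sketch of that recursion (a didactic stand-in, not the compiled MXNet kernel):

```python
import numpy as np

def rnnt_loss(log_probs, labels, blank=0):
    """Forward (alpha) recursion for the RNNT loss (Graves 2012).

    log_probs: (T, U+1, V) log-probabilities from the joint network
    labels: length-U target label sequence
    Returns the negative log-likelihood of the label sequence.
    """
    T, Up1, _ = log_probs.shape
    U = Up1 - 1
    alpha = np.full((T, Up1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(Up1):
            if t > 0:  # arrive by emitting blank at (t-1, u)
                alpha[t, u] = alpha[t - 1, u] + log_probs[t - 1, u, blank]
            if u > 0:  # arrive by emitting label u-1 at (t, u-1)
                emit = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
                alpha[t, u] = np.logaddexp(alpha[t, u], emit)
    # A complete alignment ends with a final blank at (T-1, U).
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])
```

The production kernel computes the same quantity (plus the backward/beta pass for gradients) in parallel on GPU, which is why it must be compiled into MXNet rather than run in Python.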

Evaluation

By default, evaluation is supported only for RNNT.

  • Greedy decoding:
python eval.py <path to best model parameters> --bi
  • Beam search:
python eval.py <path to best model parameters> --bi --beam <beam size>
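Greedy transducer decoding works frame by frame: at each acoustic frame the joint network is queried repeatedly, emitting the argmax label until it outputs blank, at which point decoding advances to the next frame. A hedged sketch, where `predict` and `joint` are hypothetical stand-ins for the prediction-network step and joint network (not this repo's exact interfaces):

```python
import numpy as np

def greedy_decode(encoder_out, predict, joint, blank=0, max_symbols=30):
    """Greedy RNNT decoding sketch.

    encoder_out: (T, H) frame outputs of the transcription network
    predict: hypothetical fn, label history -> prediction-network state/feature
    joint: hypothetical fn, (frame feature, prediction feature) -> (V,) logits
    max_symbols caps emissions per frame so decoding cannot loop forever.
    """
    hyp = []
    for t in range(len(encoder_out)):
        emitted = 0
        while emitted < max_symbols:
            logits = joint(encoder_out[t], predict(hyp))
            k = int(np.argmax(logits))
            if k == blank:  # blank: move on to the next frame
                break
            hyp.append(k)   # non-blank: emit and query the same frame again
            emitted += 1
    return hyp
```

Beam search keeps several such hypotheses per frame and merges prefixes by total log-probability, which is what the --beam option enables.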

Results

  • CTC

    Decode     PER
    greedy     20.36
    beam 100   20.03

  • Transducer

    Decode     PER
    greedy     20.74
    beam 40    19.84

Requirements

  • Python 3.6
  • MXNet 1.1.0
  • NumPy 1.14

TODO

  • beam search acceleration
  • Seq2Seq with attention
