Adapting end-to-end (E2E) models to unseen domains remains a major challenge, since training E2E models requires large amounts of paired audio and text data. We propose a novel domain adaptation framework for E2E models that uses only text from the target domain. Moreover, the proposed method keeps performance on the source domain intact while greatly improving performance on the target domain. The framework consists of two parts, a discriminator and a transfer, which are optimized separately; the optimized discriminator and transfer are then combined and evaluated on two domain adaptation tasks. When adapting from English LibriSpeech to GigaSpeech, we obtained average relative word error rate (WER) reductions of 11.6% and 11.8% on the target-domain dev and test sets, respectively, with almost no WER degradation on the source domain. For the in-house Chinese corpora (aviation and TV), the character error rate (CER) on the source domain increased by no more than 5%, while the CER on the target domain improved by around 85% and 42% relative, respectively. In addition, our approach is also more effective in mixed-domain evaluation scenarios.
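The relative figures above are read here in the usual sense of relative error-rate reduction against the unadapted model (an assumption; the exact definition is not restated in this README):

```latex
\[
\Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{base}} - \mathrm{WER}_{\mathrm{adapt}}}{\mathrm{WER}_{\mathrm{base}}} \times 100\%
\quad\Longrightarrow\quad
\mathrm{WER}_{\mathrm{adapt}} = \left(1 - \frac{\Delta_{\mathrm{rel}}}{100\%}\right)\mathrm{WER}_{\mathrm{base}}
\]
```

Under this reading, an 11.6% relative WER reduction means the adapted WER is 0.884 times the baseline WER, and an 85% relative CER improvement means the adapted CER is 0.15 times the baseline CER.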
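Put the target-domain text under data/target_domain_text; it is used for NNLM training.
Put the target-domain test set under data/test.
If the target-domain text has fewer than 1,000,000 sentences (lines), repeat it until it reaches roughly 1,000,000 lines.
feats.scp contains placeholder (fake) features.
The corresponding script is scripts/data_process.sh.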
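The data preparation steps above are implemented by scripts/data_process.sh, which is not reproduced here. As a rough illustration only, the bash sketch below shows one way to repeat a small text corpus up to about 1,000,000 lines and to pair it with a placeholder feats.scp. The function names, the synthetic `uttNNNNNNNNN` utterance IDs and the `dummy.ark:0` entry are all hypothetical; the format the training scripts actually expect may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the data-preparation steps; not the repository's data_process.sh.
set -euo pipefail

# Repeat whole copies of $src into $dst until it has at least $target lines.
repeat_to_size() {
  local src="$1" dst="$2" target="${3:-1000000}"
  local n reps i
  if [ "$src" = "$dst" ]; then
    echo "repeat_to_size: source and destination must differ" >&2
    return 1
  fi
  n=$(awk 'END { print NR }' "$src")   # counts the last line even without a trailing newline
  if [ "$n" -eq 0 ]; then
    echo "repeat_to_size: $src is empty" >&2
    return 1
  fi
  reps=1
  if [ "$n" -lt "$target" ]; then
    reps=$(( (target + n - 1) / n ))   # ceil(target / n) whole copies
  fi
  : > "$dst"
  for (( i = 0; i < reps; i++ )); do
    awk 1 "$src" >> "$dst"             # awk 1 re-emits every line with a newline
  done
}

# Hypothetical: write a Kaldi-style `text` file with synthetic utterance IDs and a
# placeholder feats.scp that maps every ID to the same dummy entry.
make_fake_data_dir() {
  local src="$1" dir="$2" dummy="${3:-dummy.ark:0}"
  mkdir -p "$dir"
  awk '{ printf "utt%09d %s\n", NR, $0 }' "$src" > "$dir/text"
  awk -v d="$dummy" '{ print $1, d }' "$dir/text" > "$dir/feats.scp"
}
```

For example, `repeat_to_size data/target_domain_text/text /tmp/text_1m` followed by `make_fake_data_dir /tmp/text_1m /tmp/lm_data` (paths hypothetical). Repeating whole copies means the result can slightly exceed the target, but every sentence stays equally represented.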
Target-domain NNLM training
bash ./scripts/train.sh --expdir exp/grulm2L_target_domain_aviage2L128h --conf conf/grulm2L128h.yaml
./scripts/decode_e2e_ngram.sh --dir ${exp} --data ${dir} --sets ${dataset} --suffix "_${expname}" --nj ${job} --pp ${pp} --target_nnlm exp/lstmlm2L_target_domain_tv/checkpoint
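In this command, --dir is the experiment directory that holds the model checkpoint, --data is the data directory, --sets is the specific subset inside the data directory (e.g. test), and --target_nnlm is the path to the target-domain LM.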
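The decoding command reads several shell variables (${exp}, ${dir}, ${dataset}, ${expname}, ${job}, ${pp}) that are not defined in this README. As a hedged sketch, the hypothetical helper below takes those values as arguments and prints, without running, one decode_e2e_ngram.sh command per subset so the option wiring can be checked first. The meaning of --pp is not documented here, so its value is passed through unchanged; --nj is assumed to be the number of parallel jobs, following the usual Kaldi convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one decode command per subset; nothing is executed.
# Argument values must not contain spaces, since they are interpolated into a string.
set -euo pipefail

print_decode_cmds() {
  local exp="$1" data="$2" expname="$3" nj="$4" pp="$5" target_nnlm="$6"
  shift 6
  local subset
  for subset in "$@"; do
    # Same option order as the command in this README.
    printf '%s\n' "./scripts/decode_e2e_ngram.sh --dir $exp --data $data --sets $subset --suffix _$expname --nj $nj --pp $pp --target_nnlm $target_nnlm"
  done
}
```

For example, `print_decode_cmds exp/my_e2e data tv 8 <pp-value> exp/lstmlm2L_target_domain_tv/checkpoint test` prints the command for the test subset; after reviewing it, the output can be piped to `bash`.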




bash run.sh --stage -1 --target_domain_text target_domain_textdir/tv_text --stop_stage -1 --setname tv
bash run.sh --stage 0 --target_domain_text target_domain_textdir/tv_text --stop_stage 0 --setname tv
Decode the source domain when the target domain is tv
bash run.sh --setname aitrans --stage 1 --target_domain tv
Decode the target domain (tv)
bash run.sh --setname tv --stage 1 --target_domain tv
bash run.sh --target_domain_text target_domain_textdir/aviage_text --target_domain aviage
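The tv commands above follow a fixed pattern, so a small helper can print the same sequence for another target domain. This is a sketch under assumptions: it reuses the target_domain_textdir/<domain>_text layout (confirmed for tv and aviage above), assumes "aitrans" is the source-domain set name as in the tv example, and assumes a target-domain set named after the domain exists. The function name is hypothetical, and the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the run.sh sequence used for tv, for any target domain.
set -euo pipefail

print_domain_pipeline() {
  local domain="$1" source_set="${2:-aitrans}"
  local text="target_domain_textdir/${domain}_text"
  # Run stage -1 only, then stage 0 only, on the target-domain text.
  echo "bash run.sh --stage -1 --target_domain_text $text --stop_stage -1 --setname $domain"
  echo "bash run.sh --stage 0 --target_domain_text $text --stop_stage 0 --setname $domain"
  # Decode the source-domain set with this target domain selected.
  echo "bash run.sh --setname $source_set --stage 1 --target_domain $domain"
  # Decode the target-domain set itself.
  echo "bash run.sh --setname $domain --stage 1 --target_domain $domain"
}
```

For example, `print_domain_pipeline aviage` prints the four commands for aviage; check that a set named aviage exists before piping the output to `bash`.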
@INPROCEEDINGS{
author={Shao, Hang and Tan, Tian and Wang, Wei and Gong, Xun and Qian, Yanmin},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition},
year={2023},
pages={1-5},
keywords={Degradation;Training;Adaptation models;TV;Error analysis;Training data;Speech recognition;end-to-end speech recognition;domain adaptation;discriminator and transfer;log-likelihood ratio},
doi={10.1109/ICASSP49357.2023.10095910}
}