As a popular multilingual and multitask pre-trained speech model, Whisper suffers from the curse of multilinguality. To enhance the multilingual capabilities of small Whisper models, we propose DQ-Whisper, a novel joint distillation and quantization framework that compresses Whisper for efficient inference. First, we propose a novel dynamic matching distillation strategy. Then, a quantization-aware distillation framework is introduced to integrate quantization with distillation. Experimental results on various multilingual datasets show that the proposed distillation approach effectively enhances the multilingual capabilities of small Whisper models without increasing computational cost. Up to a 5.18x reduction in model size is achieved with only marginal performance degradation. In addition, quantization is compatible with distillation, which yields an even higher compression rate.
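At its core, the framework combines a logit-level distillation loss, which pulls a small student toward the teacher Whisper's output distribution, with quantization-aware training of the student. The sketch below illustrates that combination in a generic form; the temperature, the symmetric 8-bit fake quantization, the helper names, and the reading of alpha as the distillation weight (which may correspond to loss.alpha in the training command below) are illustrative assumptions, not the exact dynamic matching and quantization scheme of the paper.

import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Simulate low-bit student weights during training (straight-through estimator).
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # forward pass sees w_q, gradients flow to w

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.3, T=2.0):
    # Hard-label cross-entropy on ground-truth tokens plus soft-label KL
    # against the teacher's logits; alpha interpolates between the two terms.
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets, ignore_index=-100)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

The example commands below launch distillation training on the Japanese CSJ setup and run inference with a distilled checkpoint.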
bash scripts/train_ce_ctc.sh --nj 1 --expdir exp/jp_CSJ_KD_logits_v1_alpha0 --conf conf/jap_ts.yaml checkpoint_dir=exp/jp_CSJ_KD_logits_v1_alpha0 data.data_dir=data/csj_whisper optim.lr=3e-5 data.collector.minibatch_size=20 loss.alpha=0
python inference.py base exp/trans20L-lstm2L_jp_CSJ_KD_logits_v1_alpha0.3
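In the training command, the key=value arguments appear to override fields in conf/jap_ts.yaml (checkpoint/experiment directory, data path, learning rate, minibatch size, and the loss weight loss.alpha), while inference.py is invoked with the Whisper model size and the experiment directory of the distilled checkpoint; this reading is inferred from the argument names rather than documented behaviour.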



@INPROCEEDINGS{shao2024dqwhisper,
author={Shao, Hang and Liu, Bei and Wang, Wei and Gong, Xun and Qian, Yanmin},
booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
title={DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition},
year={2024},
pages={240-246},
keywords={Degradation;Quantization (signal);Computational modeling;Conferences;Merging;Speech recognition;Multilingual;Computational efficiency},
doi={10.1109/SLT61566.2024.10832149}
}