Speech enhancement using deep neural networks (Keras implementation)

by Yong Xu and Qiuqiang Kong

Modified Jeonghwa Yoo (Env: python 3.5 and windows OS)

This code uses deep neural network (DNN) to do speech enhancement. This code is a Keras implementation of The paper:

[1] Xu, Y., Du, J., Dai, L.R. and Lee, C.H., 2015. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(1), pp.7-19.

Original C++ implementation is here (https://github.com/yongxuUSTC/DNN-for-speech-enhancement) by Yong Xu (yong.xu@surrey.ac.uk).

Original Keras re-implementation(https://github.com/yongxuUSTC/sednn/tree/master/mixture2clean_dnn) is done by Qiuqiang Kong (q.kong@surrey.ac.uk)

Noise(0dB)   PESQ
----------------------
n64     1.36 +- 0.05
n71     1.35 +- 0.18
----------------------
Avg.    1.35 +- 0.12

Run on TIMIT and 115 noises

You may replace the mini data with your own data. We listed the data need to be prepared in meta_data/ to re-run the experiments in [1]. The data contains:

Training: Speech: TIMIT 4620 training sentences. Noise: 115 kinds of noises (http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/115noises.html)

Testing: Speech: TIMIT 168 testing sentences (selected 10% from 1680 testing sentences) Noise: Noise 92 (http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html)

Some of the dataset are not published. Instead, you could collect your own data.

Download and prepare data.
Set MINIDATA=0 in run.py. Modify WORKSPACE, TR_SPEECH_DIR, TR_NOISE_DIR, TE_SPEECH_DIR, TE_NOISE_DIR in run.py and some arguments (get_args() function)
Run run.py

Iteration: 0, tr_loss: 1.228049, te_loss: 1.252313
Iteration: 1000, tr_loss: 0.533825, te_loss: 0.677872
Iteration: 2000, tr_loss: 0.505751, te_loss: 0.678816
Iteration: 3000, tr_loss: 0.483631, te_loss: 0.666576
Iteration: 4000, tr_loss: 0.480287, te_loss: 0.675403
Iteration: 5000, tr_loss: 0.457020, te_loss: 0.676319
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_5000iters.h5
Iteration: 6000, tr_loss: 0.461330, te_loss: 0.673847
Iteration: 7000, tr_loss: 0.445159, te_loss: 0.668545
Iteration: 8000, tr_loss: 0.447244, te_loss: 0.680740
Iteration: 9000, tr_loss: 0.427652, te_loss: 0.678236
Iteration: 10000, tr_loss: 0.421219, te_loss: 0.663294
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_10000iters.h5
Training time: 202.551192045 s

The final PESQ looks like:

Noise(0dB)            PESQ
---------------------------------
pink             2.01 +- 0.23
buccaneer1       1.88 +- 0.25
factory2         2.21 +- 0.21
hfchannel        1.63 +- 0.24
factory1         1.93 +- 0.23
babble           1.81 +- 0.28
m109             2.13 +- 0.25
leopard          2.49 +- 0.23
volvo            2.83 +- 0.23
buccaneer2       2.03 +- 0.25
white            2.00 +- 0.21
f16              1.86 +- 0.24
destroyerops     1.99 +- 0.23
destroyerengine  1.86 +- 0.23
machinegun       2.55 +- 0.27
---------------------------------
Avg.             2.08 +- 0.24

Visualization

In the inference step, you may add --visualize to the arguments to plot the mixture, clean and enhanced speech log magnitude spectrogram.

PESQ (windows OS) from

https://uk.mathworks.com/matlabcentral/fileexchange/47333-pesq-matlab-driver

Bugs report:

PESQ dose not support long path/folder name, so please shorten your path/folder name. Or you will get a wrong/low PESQ score (or you can modify the PESQ source code to enlarge the size of the path name variable)
For larger dataset which can not be loaded into the momemory at one time, you can 1. prepare your training scp list ---> 2. random your training scp list ---> 3. split your triaining scp list into several parts ---> 4. read each part for training one by one

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
mini_data		mini_data
utils		utils
work_module		work_module
README.md		README.md
pesq2.exe		pesq2.exe
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speech enhancement using deep neural networks (Keras implementation)

by Yong Xu and Qiuqiang Kong

Modified Jeonghwa Yoo (Env: python 3.5 and windows OS)

Run on TIMIT and 115 noises

Visualization

PESQ (windows OS) from

Bugs report:

About

Releases

Packages

Languages

jeongHwarr/sednn_modify

Folders and files

Latest commit

History

Repository files navigation

Speech enhancement using deep neural networks (Keras implementation)

by Yong Xu and Qiuqiang Kong

Modified Jeonghwa Yoo (Env: python 3.5 and windows OS)

Run on TIMIT and 115 noises

Visualization

PESQ (windows OS) from

Bugs report:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages