initial commit
Signed-off-by: jeonghwaYoo <green2368@naver.com>
jeongHwarr committed Mar 12, 2019
0 parents commit c8e4338
Showing 20 changed files with 1,224 additions and 0 deletions.
92 changes: 92 additions & 0 deletions README.md
@@ -0,0 +1,92 @@
# Speech enhancement using deep neural networks (Keras implementation)
by Yong Xu and Qiuqiang Kong
Modified by Jeonghwa Yoo (environment: Python 3.5 on Windows)

This code uses a deep neural network (DNN) for speech enhancement. It is a Keras implementation of the paper:

[1] Xu, Y., Du, J., Dai, L.R. and Lee, C.H., 2015. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(1), pp.7-19.

The original C++ implementation by Yong Xu (yong.xu@surrey.ac.uk) is available at https://github.com/yongxuUSTC/DNN-for-speech-enhancement. This Keras re-implementation was written by Qiuqiang Kong (q.kong@surrey.ac.uk).

<pre>
Noise(0dB)  PESQ
----------------------
n64 1.36 +- 0.05
n71 1.35 +- 0.18
----------------------
Avg. 1.35 +- 0.12
</pre>

## Run on TIMIT and 115 noises
You may replace the mini data with your own data. The data needed to re-run the experiments in [1] is listed in meta_data/. It consists of:

Training:
Speech: TIMIT 4620 training sentences.
Noise: 115 kinds of noises (http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/115noises.html)

Testing:
Speech: TIMIT 168 testing sentences (10% selected from the 1680 testing sentences)
Noise: NOISEX-92 (http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html)

Some of the datasets are not publicly available; you can collect your own data instead.

1. Download and prepare data.

2. In run.py, set MINIDATA = False (or 0), then modify WORKSPACE, TR_SPEECH_DIR, TR_NOISE_DIR, TE_SPEECH_DIR and TE_NOISE_DIR, along with any arguments you need to change in the get_args() function.

3. Run run.py
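
Before step 3, it can help to confirm that the directories set in step 2 exist and contain audio. The following is a minimal sketch and not part of the repository: `check_data_dirs` is a hypothetical helper, and the `.wav` extension filter is an assumption based on the file names in mini_data/.

```python
import os


def check_data_dirs(directories, extensions=(".wav",)):
    """Return {label: number of audio files}; raise if a directory is missing or empty.

    `directories` maps a label such as 'TR_SPEECH_DIR' to a path.
    Hypothetical helper: run.py itself performs no such check.
    """
    counts = {}
    for name, path in directories.items():
        if not os.path.isdir(path):
            raise FileNotFoundError("%s does not exist: %s" % (name, path))
        # Compare extensions case-insensitively, since TIMIT files end in .WAV.
        n_files = sum(1 for fn in os.listdir(path)
                      if fn.lower().endswith(extensions))
        if n_files == 0:
            raise ValueError("%s contains no audio files: %s" % (name, path))
        counts[name] = n_files
    return counts
```

It could be called with the same dictionary that run.py builds, for example `check_data_dirs(DIRECTORY)` after removing the 'WORKSPACE' entry, which is an output location rather than a data directory.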

<pre>
Iteration: 0, tr_loss: 1.228049, te_loss: 1.252313
Iteration: 1000, tr_loss: 0.533825, te_loss: 0.677872
Iteration: 2000, tr_loss: 0.505751, te_loss: 0.678816
Iteration: 3000, tr_loss: 0.483631, te_loss: 0.666576
Iteration: 4000, tr_loss: 0.480287, te_loss: 0.675403
Iteration: 5000, tr_loss: 0.457020, te_loss: 0.676319
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_5000iters.h5
Iteration: 6000, tr_loss: 0.461330, te_loss: 0.673847
Iteration: 7000, tr_loss: 0.445159, te_loss: 0.668545
Iteration: 8000, tr_loss: 0.447244, te_loss: 0.680740
Iteration: 9000, tr_loss: 0.427652, te_loss: 0.678236
Iteration: 10000, tr_loss: 0.421219, te_loss: 0.663294
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_10000iters.h5
Training time: 202.551192045 s
</pre>

The final PESQ scores look like:

<pre>
Noise(0dB) PESQ
---------------------------------
pink 2.01 +- 0.23
buccaneer1 1.88 +- 0.25
factory2 2.21 +- 0.21
hfchannel 1.63 +- 0.24
factory1 1.93 +- 0.23
babble 1.81 +- 0.28
m109 2.13 +- 0.25
leopard 2.49 +- 0.23
volvo 2.83 +- 0.23
buccaneer2 2.03 +- 0.25
white 2.00 +- 0.21
f16 1.86 +- 0.24
destroyerops 1.99 +- 0.23
destroyerengine 1.86 +- 0.23
machinegun 2.55 +- 0.27
---------------------------------
Avg. 2.08 +- 0.24
</pre>
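
The `Avg.` row is consistent with how `get_stats()` in work_module/evaluate.py aggregates results: it averages the per-noise means and, separately, the per-noise standard deviations (it is not the standard deviation over all utterances). The sketch below recomputes that row from the rounded values in the table above, so it only approximates what the script computes from the raw scores; `summarize` is a name chosen for illustration.

```python
# Per-noise (mean, std) PESQ values copied from the table above.
pesq_table = {
    "pink": (2.01, 0.23),
    "buccaneer1": (1.88, 0.25),
    "factory2": (2.21, 0.21),
    "hfchannel": (1.63, 0.24),
    "factory1": (1.93, 0.23),
    "babble": (1.81, 0.28),
    "m109": (2.13, 0.25),
    "leopard": (2.49, 0.23),
    "volvo": (2.83, 0.23),
    "buccaneer2": (2.03, 0.25),
    "white": (2.00, 0.21),
    "f16": (1.86, 0.24),
    "destroyerops": (1.99, 0.23),
    "destroyerengine": (1.86, 0.23),
    "machinegun": (2.55, 0.27),
}


def summarize(table):
    """Average the per-noise means and the per-noise standard deviations."""
    means = [mean for mean, _ in table.values()]
    stds = [std for _, std in table.values()]
    return sum(means) / len(means), sum(stds) / len(stds)


avg_pesq, avg_std = summarize(pesq_table)
print("Avg. %.2f +- %.2f" % (avg_pesq, avg_std))  # Avg. 2.08 +- 0.24
```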


## Visualization
During inference, pass --visualize 1 (the default in run.py) to plot the log magnitude spectrograms of the mixture, clean and enhanced speech.

![alt text](https://github.com/yongxuUSTC/deep_learning_based_speech_enhancement_keras_python/blob/master/mixture2clean_dnn/appendix/enhanced_log_sp.png)
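
For readers unfamiliar with the quantity being plotted, here is a minimal NumPy sketch of a log magnitude spectrogram. It is not the repository's feature extraction code; the Hamming window and the small `eps` offset are assumptions, while the window length of 256 and the overlap of 192 mirror the defaults in get_args().

```python
import numpy as np


def log_magnitude_spectrogram(audio, n_window=256, n_overlap=192, eps=1e-8):
    """Return an array of shape (n_frames, n_window // 2 + 1).

    Assumes len(audio) >= n_window; trailing samples that do not fill a
    whole frame are dropped.
    """
    hop = n_window - n_overlap
    window = np.hamming(n_window)  # assumed window type
    n_frames = 1 + (len(audio) - n_window) // hop
    frames = np.stack([audio[i * hop: i * hop + n_window] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)  # eps avoids log(0) on silent frames


# One second of a 440 Hz tone at the default 8 kHz sample rate.
t = np.arange(8000) / 8000.0
log_sp = log_magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(log_sp.shape)  # (122, 129)
```

Each row is one frame and each column one frequency bin, so the transposed array can be passed to `matplotlib.pyplot.matshow` to get a picture similar to the one above.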

## PESQ (Windows)
The PESQ tool for Windows comes from https://uk.mathworks.com/matlabcentral/fileexchange/47333-pesq-matlab-driver

## Bug reports
1. PESQ does not support long path or folder names, so keep them short; otherwise you will get a wrong (low) PESQ score. Alternatively, modify the PESQ source code to enlarge the path name variable.

2. For a larger dataset that cannot be loaded into memory at once: (1) prepare your training scp list, (2) shuffle it, (3) split it into several parts, and (4) read the parts one by one for training.
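
The four steps in item 2 can be sketched as follows. This is an illustration only: `iter_scp_parts` is a hypothetical helper, the scp list is assumed to be a plain Python list of file paths, and loading the features and training on each part is left to the caller.

```python
import random


def iter_scp_parts(scp_list, n_parts, seed=None):
    """Shuffle a list of training entries and yield it in at most n_parts chunks."""
    entries = list(scp_list)  # copy so the caller's list is left untouched
    if not entries:
        return
    random.Random(seed).shuffle(entries)
    part_size = -(-len(entries) // n_parts)  # ceiling division
    for start in range(0, len(entries), part_size):
        yield entries[start:start + part_size]


scp = ["utt%03d.wav" % i for i in range(10)]  # placeholder file names
for part in iter_scp_parts(scp, n_parts=3, seed=0):
    # Load only this part's features into memory, train on them, then move on.
    print(len(part))
```

Reshuffling with a different seed on each pass over the data keeps the parts from always containing the same utterances.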
Binary file added mini_data/test_noise/n64.wav
Binary file not shown.
Binary file added mini_data/test_noise/n71.wav
Binary file not shown.
Binary file added mini_data/test_speech/TEST_DR4_FDMS0_SX48.WAV
Binary file not shown.
Binary file added mini_data/test_speech/TEST_DR5_MRJM3_SI1809.WAV
Binary file not shown.
Binary file added mini_data/train_noise/n1.wav
Binary file not shown.
Binary file added mini_data/train_noise/n49.wav
Binary file not shown.
Binary file added mini_data/train_noise/n95.wav
Binary file not shown.
Binary file added mini_data/train_speech/TRAIN_DR1_FCJF0_SA1.WAV
Binary file not shown.
Binary file added mini_data/train_speech/TRAIN_DR1_FKFB0_SX348.WAV
Binary file not shown.
Binary file added mini_data/train_speech/TRAIN_DR1_MRDD0_SI1680.WAV
Binary file not shown.
Binary file added pesq2.exe
Binary file not shown.
8 changes: 8 additions & 0 deletions requirements.txt
@@ -0,0 +1,8 @@
soundfile>=0.9.0.post1
numpy>=1.13.3
matplotlib>=2.1.1
scipy>=1.0.0
h5py>=2.7.1
scikit-learn>=0.19.1
keras>=2.1.2
tensorflow-gpu>=1.4.1
91 changes: 91 additions & 0 deletions run.py
@@ -0,0 +1,91 @@
import os
import argparse
from work_module import prepare_data, main_dnn, evaluate

CURRENT_PATH = os.path.split(os.getcwd())[-1]

MODEL_NAME='SE-SegCaps'

MINIDATA = True  # Set to False to run on the full dataset

if MINIDATA:
    WORKSPACE = os.path.join("D:/Python_output", CURRENT_PATH + "_MINIDATA")
    TR_SPEECH_DIR = "mini_data/train_speech"
    TR_NOISE_DIR = "mini_data/train_noise"
    TE_SPEECH_DIR = "mini_data/test_speech"
    TE_NOISE_DIR = "mini_data/test_noise"

else:
    WORKSPACE = "D:/Python_output/" + CURRENT_PATH
    TR_SPEECH_DIR = "D:/train/speech"
    TR_NOISE_DIR = "D:/noise"
    TE_SPEECH_DIR = "D:/test"
    TE_NOISE_DIR = "D:/noise"

def get_args():
    parser = argparse.ArgumentParser(description="Speech Enhancement using DNN.")
    parser.add_argument('-sr', '--sample_rate', default=8000, type=int,
                        help="target sampling rate of audio")
    parser.add_argument('--fft', default=256, type=int,
                        help="FFT size")
    parser.add_argument('--window', default=256, type=int,
                        help="window size")
    parser.add_argument('--overlap', default=192, type=int,
                        help="overlap size of spectrogram")
    parser.add_argument('--n_concat', default=7, type=int,
                        help="number of frames to concatenate")

    parser.add_argument('--tr_snr', default=0, type=int,
                        help="SNR of training data")
    parser.add_argument('--te_snr', default=0, type=int,
                        help="SNR of test data")

    parser.add_argument('--iter', default=10000, type=int)
    parser.add_argument('--debug_inter', default=1000, type=int,
                        help="Interval to debug model")
    parser.add_argument('--save_inter', default=5000, type=int,
                        help="Interval to save model")
    parser.add_argument('-b', '--batch_size', default=32, type=int)
    parser.add_argument('--lr', default=0.0001, type=float,
                        help="Initial learning rate")
    parser.add_argument('-visual', '--visualize', default=1, type=int, choices=[0, 1],
                        help="If the value is 1, visualize the inference result")

    parser.add_argument('--train', default=1, type=int, choices=[0, 1],
                        help="If the value is 1, run training")
    parser.add_argument('--test', default=1, type=int, choices=[0, 1],
                        help="If the value is 1, run test")

    args = parser.parse_args()
    return args

if __name__ == '__main__':

    args = get_args()

    DIRECTORY = {}
    DIRECTORY['WORKSPACE'] = WORKSPACE
    DIRECTORY['TR_SPEECH_DIR'] = TR_SPEECH_DIR
    DIRECTORY['TR_NOISE_DIR'] = TR_NOISE_DIR
    DIRECTORY['TE_SPEECH_DIR'] = TE_SPEECH_DIR
    DIRECTORY['TE_NOISE_DIR'] = TE_NOISE_DIR

    prepare_data.create_mixture_csv(DIRECTORY, args, mode='train')
    prepare_data.create_mixture_csv(DIRECTORY, args, mode='test')

    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='train')
    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='test')

    prepare_data.pack_features(DIRECTORY, args, mode='train')
    prepare_data.pack_features(DIRECTORY, args, mode='test')

    if args.train == 1:
        prepare_data.compute_scaler(DIRECTORY, args, mode='train')
        main_dnn.train(DIRECTORY, args)
        evaluate.plot_training_stat(DIRECTORY, args, bgn_iter=0, fin_iter=args.iter, interval_iter=args.debug_inter)

    if args.test == 1:
        main_dnn.inference(DIRECTORY, args)
        evaluate.calculate_pesq(DIRECTORY, args)
        evaluate.get_stats(DIRECTORY, args)

7 changes: 7 additions & 0 deletions utils/__init__.py
@@ -0,0 +1,7 @@
import os


def makedirs(path):
    if not os.path.exists(path):
        print(" [*] Make directories : {}".format(path))
        os.makedirs(path)
36 changes: 36 additions & 0 deletions work_module/data_generator.py
@@ -0,0 +1,36 @@
import numpy as np

class DataGenerator(object):
    def __init__(self, batch_size, type, te_max_iter=None):
        assert type in ['train', 'test']
        self._batch_size_ = batch_size
        self._type_ = type
        self._te_max_iter_ = te_max_iter

    def generate(self, xs, ys):
        x = xs[0]
        y = ys[0]
        batch_size = self._batch_size_
        n_samples = len(x)

        index = np.arange(n_samples)
        np.random.shuffle(index)

        iter = 0
        epoch = 0
        pointer = 0
        while True:
            # In test mode, stop after te_max_iter batches (if a limit is given).
            if (self._type_ == 'test') and (self._te_max_iter_ is not None):
                if iter == self._te_max_iter_:
                    break
            iter += 1
            # End of an epoch: stop in test mode, otherwise reshuffle and restart.
            if pointer >= n_samples:
                epoch += 1
                if (self._type_) == 'test' and (epoch == 1):
                    break
                pointer = 0
                np.random.shuffle(index)

            batch_idx = index[pointer : min(pointer + batch_size, n_samples)]
            pointer += batch_size
            yield x[batch_idx], y[batch_idx]
119 changes: 119 additions & 0 deletions work_module/evaluate.py
@@ -0,0 +1,119 @@
"""
Summary: Calculate PESQ and overall stats of enhanced speech.
Author: Qiuqiang Kong
Created: 2017.12.22
Modified: -
"""
import argparse
import os
import csv
import numpy as np
import pickle
import matplotlib.pyplot as plt
from utils import makedirs


def plot_training_stat(DIRECTORY, args, bgn_iter, fin_iter, interval_iter):
    """Plot training and testing loss.

    Args:
      DIRECTORY: dict, must contain 'WORKSPACE' (path of workspace).
      args: parsed arguments, must provide tr_snr (training SNR).
      bgn_iter: int, plot from bgn_iter.
      fin_iter: int, plot finishes at fin_iter.
      interval_iter: int, interval of files.
    """
    workspace = DIRECTORY['WORKSPACE']
    tr_snr = args.tr_snr
    tr_losses, te_losses, iters = [], [], []

    # Load stats.
    stats_dir = os.path.join(workspace, "training_stats", "%ddb" % int(tr_snr))
    for it in range(bgn_iter, fin_iter + 1, interval_iter):
        stats_path = os.path.join(stats_dir, "%diters.p" % it)
        with open(stats_path, 'rb') as f:
            stats = pickle.load(f)
        tr_losses.append(stats['tr_loss'])
        te_losses.append(stats['te_loss'])
        iters.append(stats['iter'])

    # Plot
    line_tr, = plt.plot(tr_losses, c='b', label="Train")
    line_te, = plt.plot(te_losses, c='r', label="Test")
    plt.axis([0, len(iters), 0, max(tr_losses)])
    plt.xlabel("Iterations")
    plt.ylabel("Loss")
    plt.legend(handles=[line_tr, line_te])
    plt.xticks(np.arange(len(iters)), iters)
    plt.show()


def calculate_pesq(DIRECTORY, args):
    """Calculate PESQ of all enhanced speech.

    Args:
      DIRECTORY: dict, must contain 'WORKSPACE' (path of workspace) and
        'TE_SPEECH_DIR' (path of clean speech).
      args: parsed arguments, must provide te_snr (testing SNR) and sample_rate.
    """
    workspace = DIRECTORY['WORKSPACE']
    speech_dir = DIRECTORY['TE_SPEECH_DIR']
    te_snr = args.te_snr

    # Remove any existing results file.
    if os.path.exists('pesq_results.txt'):
        os.remove('pesq_results.txt')

    # Calculate PESQ of all enhanced speech.
    enh_speech_dir = os.path.join(workspace, "enh_wavs", "test", "%ddb" % int(te_snr))
    names = os.listdir(enh_speech_dir)
    for (cnt, na) in enumerate(names):
        print(cnt, na)
        enh_path = os.path.join(enh_speech_dir, na)

        speech_na = na.split('.')[0]
        speech_path = os.path.join(speech_dir, "%s.WAV" % speech_na)

        # Call executable PESQ tool.
        cmd = ' '.join(["pesq2.exe", speech_path, enh_path, '+' + str(args.sample_rate)])
        # os.system(cmd)
        result = os.popen(cmd).read()
        print(result)

def get_stats(DIRECTORY, args):
    """Calculate stats of PESQ.
    """
    workspace = DIRECTORY['WORKSPACE']
    pesq_path = "pesq_results.txt"
    with open(pesq_path, 'rt') as f:
        reader = csv.reader(f, delimiter='\t')
        lis = list(reader)

    # Group PESQ scores by noise type, skipping the first and last rows.
    pesq_dict = {}
    for i1 in range(1, len(lis) - 1):
        li = lis[i1]
        na = li[1]
        pesq = float(li[2])
        noise_type = na.split('.')[1]
        if noise_type not in pesq_dict:
            pesq_dict[noise_type] = [pesq]
        else:
            pesq_dict[noise_type].append(pesq)

    avg_list, std_list = [], []
    result_path = os.path.join(workspace, "result")
    makedirs(result_path)
    result_path = os.path.join(result_path, "result.txt")
    fmt = "{0:<16} {1:<16}"
    with open(result_path, "w") as file:
        file.write(fmt.format("Noise", "PESQ") + "\n")
        file.write("---------------------------------\n")
        for noise_type in pesq_dict:
            pesqs = pesq_dict[noise_type]
            avg_pesq = np.mean(pesqs)
            std_pesq = np.std(pesqs)
            avg_list.append(avg_pesq)
            std_list.append(std_pesq)
            # Append the newline after padding so it does not end up inside the field.
            file.write(fmt.format(noise_type, "%.2f +- %.2f" % (avg_pesq, std_pesq)) + "\n")
        file.write("---------------------------------\n")
        file.write(fmt.format("Avg.", "%.2f +- %.2f" % (np.mean(avg_list), np.mean(std_list))) + "\n")
    print("Average PESQ score: %s" % np.mean(avg_list))