Signed-off-by: jeonghwaYoo <green2368@naver.com>
0 parents
commit c8e4338
Showing 20 changed files with 1,224 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
# Speech enhancement using deep neural networks (Keras implementation) | ||
by Yong Xu and Qiuqiang Kong | ||
Modified by Jeonghwa Yoo (Env: Python 3.5 on Windows) | ||
|
||
This code uses a deep neural network (DNN) for speech enhancement. It is a Keras implementation of the paper: | ||
|
||
[1] Xu, Y., Du, J., Dai, L.R. and Lee, C.H., 2015. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(1), pp.7-19. | ||
|
||
The original C++ implementation by Yong Xu (yong.xu@surrey.ac.uk) is here (https://github.com/yongxuUSTC/DNN-for-speech-enhancement). This Keras re-implementation is by Qiuqiang Kong (q.kong@surrey.ac.uk). | ||
|
||
<pre> | ||
Noise(0dB) PESQ | ||
---------------------- | ||
n64 1.36 +- 0.05 | ||
n71 1.35 +- 0.18 | ||
---------------------- | ||
Avg. 1.35 +- 0.12 | ||
</pre> | ||
|
||
## Run on TIMIT and 115 noises | ||
You may replace the mini data with your own data. The data that needs to be prepared to re-run the experiments in [1] is listed in meta_data/. It contains: | ||
|
||
Training: | ||
Speech: TIMIT 4620 training sentences. | ||
Noise: 115 kinds of noises (http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/115noises.html) | ||
|
||
Testing: | ||
Speech: TIMIT 168 testing sentences (10% selected from the 1680 testing sentences) | ||
Noise: NOISEX-92 (http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html) | ||
|
||
Some of the datasets are not publicly available; you can collect your own data instead. | ||
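
If you collect your own speech and noise, they still have to be mixed at the SNR set by --tr_snr and --te_snr in run.py. The mixing code in prepare_data is not shown here, so the following is only a minimal sketch of the usual approach, assuming equal-length signals. `mix_at_snr` is a hypothetical helper, not a function from this repository.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add.

    Hypothetical helper for illustration; both inputs are equal-length
    sequences of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Solve 10 * log10(p_speech / (scale**2 * p_noise)) = snr_db for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    scaled_noise = [scale * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=0)
```

At 0 dB the scaled noise has the same power as the speech, which is the condition used for the PESQ tables in this README.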
|
||
1. Download and prepare data. | ||
|
||
2. Set MINIDATA = False in run.py. Then modify WORKSPACE, TR_SPEECH_DIR, TR_NOISE_DIR, TE_SPEECH_DIR and TE_NOISE_DIR in run.py, and adjust the arguments in get_args() as needed. | ||
|
||
3. Run run.py | ||
|
||
<pre> | ||
Iteration: 0, tr_loss: 1.228049, te_loss: 1.252313 | ||
Iteration: 1000, tr_loss: 0.533825, te_loss: 0.677872 | ||
Iteration: 2000, tr_loss: 0.505751, te_loss: 0.678816 | ||
Iteration: 3000, tr_loss: 0.483631, te_loss: 0.666576 | ||
Iteration: 4000, tr_loss: 0.480287, te_loss: 0.675403 | ||
Iteration: 5000, tr_loss: 0.457020, te_loss: 0.676319 | ||
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_5000iters.h5 | ||
Iteration: 6000, tr_loss: 0.461330, te_loss: 0.673847 | ||
Iteration: 7000, tr_loss: 0.445159, te_loss: 0.668545 | ||
Iteration: 8000, tr_loss: 0.447244, te_loss: 0.680740 | ||
Iteration: 9000, tr_loss: 0.427652, te_loss: 0.678236 | ||
Iteration: 10000, tr_loss: 0.421219, te_loss: 0.663294 | ||
Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_10000iters.h5 | ||
Training time: 202.551192045 s | ||
</pre> | ||
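
To compare checkpoints, it can help to pull the losses out of a log like the one above. The following is a small sketch that assumes the log lines keep exactly the `Iteration: N, tr_loss: X, te_loss: Y` shape shown; `parse_log` and `best_iteration` are hypothetical helper names, and the "Saved model" line in the sample has its path deliberately omitted.

```python
import re

# Matches lines such as "Iteration: 1000, tr_loss: 0.533825, te_loss: 0.677872".
LOG_LINE = re.compile(
    r"Iteration:\s*(\d+),\s*tr_loss:\s*([\d.]+),\s*te_loss:\s*([\d.]+)")


def parse_log(text):
    """Return a list of (iteration, tr_loss, te_loss); other lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            records.append((int(match.group(1)),
                            float(match.group(2)),
                            float(match.group(3))))
    return records


def best_iteration(records):
    """Return the record with the lowest test loss."""
    return min(records, key=lambda record: record[2])


sample_log = """Iteration: 0, tr_loss: 1.228049, te_loss: 1.252313
Iteration: 3000, tr_loss: 0.483631, te_loss: 0.666576
Iteration: 5000, tr_loss: 0.457020, te_loss: 0.676319
Saved model to (path omitted)
Iteration: 10000, tr_loss: 0.421219, te_loss: 0.663294"""

records = parse_log(sample_log)
best = best_iteration(records)
```

Note that the training loss keeps falling while the test loss stays near 0.67 after the first 1000 iterations, so the test loss is the more useful column when picking a checkpoint.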
|
||
The final PESQ scores look like: | ||
|
||
<pre> | ||
Noise(0dB) PESQ | ||
--------------------------------- | ||
pink 2.01 +- 0.23 | ||
buccaneer1 1.88 +- 0.25 | ||
factory2 2.21 +- 0.21 | ||
hfchannel 1.63 +- 0.24 | ||
factory1 1.93 +- 0.23 | ||
babble 1.81 +- 0.28 | ||
m109 2.13 +- 0.25 | ||
leopard 2.49 +- 0.23 | ||
volvo 2.83 +- 0.23 | ||
buccaneer2 2.03 +- 0.25 | ||
white 2.00 +- 0.21 | ||
f16 1.86 +- 0.24 | ||
destroyerops 1.99 +- 0.23 | ||
destroyerengine 1.86 +- 0.23 | ||
machinegun 2.55 +- 0.27 | ||
--------------------------------- | ||
Avg. 2.08 +- 0.24 | ||
</pre> | ||
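
As a worked check, the "Avg." row can be reproduced from the per-noise rows. This sketch assumes the aggregation used by get_stats in the evaluation code of this commit: the mean of the per-noise means and the mean of the per-noise standard deviations. The values are copied from the table above.

```python
import statistics

# (noise type, mean PESQ, std PESQ) copied from the table above.
pesq_table = [
    ("pink", 2.01, 0.23), ("buccaneer1", 1.88, 0.25), ("factory2", 2.21, 0.21),
    ("hfchannel", 1.63, 0.24), ("factory1", 1.93, 0.23), ("babble", 1.81, 0.28),
    ("m109", 2.13, 0.25), ("leopard", 2.49, 0.23), ("volvo", 2.83, 0.23),
    ("buccaneer2", 2.03, 0.25), ("white", 2.00, 0.21), ("f16", 1.86, 0.24),
    ("destroyerops", 1.99, 0.23), ("destroyerengine", 1.86, 0.23),
    ("machinegun", 2.55, 0.27),
]

avg_mean = statistics.mean(mean for _, mean, _ in pesq_table)
avg_std = statistics.mean(std for _, _, std in pesq_table)
summary = "Avg. {:.2f} +- {:.2f}".format(avg_mean, avg_std)
print(summary)
```

The reported spread is therefore an average of per-noise spreads, not the standard deviation across noise types.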
|
||
|
||
## Visualization | ||
During inference, pass --visualize 1 (the default in run.py) to plot the log-magnitude spectrograms of the mixture, clean and enhanced speech; pass --visualize 0 to skip the plots. | ||
|
||
 | ||
|
||
## PESQ (Windows OS) | ||
The PESQ tool for Windows is from https://uk.mathworks.com/matlabcentral/fileexchange/47333-pesq-matlab-driver | ||
|
||
## Bug reports: | ||
1. PESQ does not support long path/folder names, so please shorten your path/folder names; otherwise you will get a wrong/low PESQ score. Alternatively, you can modify the PESQ source code to enlarge the path name variable. | ||
|
||
2. For larger datasets that cannot be loaded into memory at once: (1) prepare your training scp list, (2) shuffle it, (3) split it into several parts, and (4) read the parts one by one for training. |
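
The shuffle-and-split steps in item 2 can be sketched as follows. This is only an illustration: `split_into_parts` is a hypothetical helper, the entries stand in for lines of an scp list, and loading and training on each part are left as comments because that code depends on your setup.

```python
import random


def split_into_parts(paths, n_parts, seed=None):
    """Shuffle `paths` and split the result into at most `n_parts` chunks."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    if not shuffled:
        return []
    part_size = -(-len(shuffled) // n_parts)  # ceiling division
    return [shuffled[i:i + part_size]
            for i in range(0, len(shuffled), part_size)]


scp_list = ["utt%d" % i for i in range(10)]  # placeholder entries
parts = split_into_parts(scp_list, n_parts=3, seed=0)

for part in parts:
    # Load only the features for `part` here, train on them,
    # then release them before moving on to the next part.
    pass
```

Fixing the seed makes the split reproducible; reshuffling with a new seed on every pass over the data gives a different part composition each time.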
Binary files not shown (11 files).
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
soundfile>=0.9.0.post1 | ||
numpy>=1.13.3 | ||
matplotlib>=2.1.1 | ||
scipy>=1.0.0 | ||
h5py>=2.7.1 | ||
scikit-learn>=0.19.1 | ||
keras>=2.1.2 | ||
tensorflow-gpu>=1.4.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
import os | ||
import argparse | ||
from work_module import prepare_data, main_dnn, evaluate | ||
|
||
CURRENT_PATH = os.path.split(os.getcwd())[-1] | ||
|
||
MODEL_NAME='SE-SegCaps' | ||
|
||
MINIDATA = True  # Set to False to use the full dataset instead of mini_data/. | ||
|
||
if MINIDATA: | ||
    WORKSPACE = os.path.join("D:/Python_output", CURRENT_PATH + "_MINIDATA") | ||
    TR_SPEECH_DIR = "mini_data/train_speech" | ||
    TR_NOISE_DIR = "mini_data/train_noise" | ||
    TE_SPEECH_DIR = "mini_data/test_speech" | ||
    TE_NOISE_DIR = "mini_data/test_noise" | ||
|
||
else: | ||
    WORKSPACE = "D:/Python_output/" + CURRENT_PATH | ||
    TR_SPEECH_DIR = "D:/train/speech" | ||
    TR_NOISE_DIR = "D:/noise" | ||
    TE_SPEECH_DIR = "D:/test" | ||
    TE_NOISE_DIR = "D:/noise" | ||
|
||
def get_args(): | ||
    parser = argparse.ArgumentParser(description="Speech Enhancement using DNN.") | ||
    parser.add_argument('-sr', '--sample_rate', default=8000, type=int, | ||
                        help="target sampling rate of audio") | ||
    parser.add_argument('--fft', default=256, type=int, | ||
                        help="FFT size") | ||
    parser.add_argument('--window', default=256, type=int, | ||
                        help="window size") | ||
    parser.add_argument('--overlap', default=192, type=int, | ||
                        help="overlap size of spectrogram") | ||
    parser.add_argument('--n_concat', default=7, type=int, | ||
                        help="number of frames to concatenate") | ||
|
||
    parser.add_argument('--tr_snr', default=0, type=int, | ||
                        help="SNR of training data") | ||
    parser.add_argument('--te_snr', default=0, type=int, | ||
                        help="SNR of test data") | ||
|
||
    parser.add_argument('--iter', default=10000, type=int) | ||
    parser.add_argument('--debug_inter', default=1000, type=int, | ||
                        help="Interval to debug model") | ||
    parser.add_argument('--save_inter', default=5000, type=int, | ||
                        help="Interval to save model") | ||
    parser.add_argument('-b', '--batch_size', default=32, type=int) | ||
    parser.add_argument('--lr', default=0.0001, type=float, | ||
                        help="Initial learning rate") | ||
    parser.add_argument('-visual', '--visualize', default=1, type=int, choices=[0,1], | ||
                        help="If the value is 1, visualize the inference results") | ||
|
||
    parser.add_argument('--train', default=1, type=int, choices=[0,1], | ||
                        help="If the value is 1, run training") | ||
    parser.add_argument('--test', default=1, type=int, choices=[0,1], | ||
                        help="If the value is 1, run test") | ||
|
||
    args = parser.parse_args() | ||
    return args | ||
|
||
if __name__ == '__main__': | ||
|
||
    args = get_args() | ||
|
||
    DIRECTORY = {} | ||
    DIRECTORY['WORKSPACE'] = WORKSPACE | ||
    DIRECTORY['TR_SPEECH_DIR'] = TR_SPEECH_DIR | ||
    DIRECTORY['TR_NOISE_DIR'] = TR_NOISE_DIR | ||
    DIRECTORY['TE_SPEECH_DIR'] = TE_SPEECH_DIR | ||
    DIRECTORY['TE_NOISE_DIR'] = TE_NOISE_DIR | ||
|
||
    prepare_data.create_mixture_csv(DIRECTORY, args, mode='train') | ||
    prepare_data.create_mixture_csv(DIRECTORY, args, mode='test') | ||
|
||
    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='train') | ||
    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='test') | ||
|
||
    prepare_data.pack_features(DIRECTORY, args, mode='train') | ||
    prepare_data.pack_features(DIRECTORY, args, mode='test') | ||
|
||
    if args.train==1: | ||
        prepare_data.compute_scaler(DIRECTORY, args, mode='train') | ||
        main_dnn.train(DIRECTORY, args) | ||
        evaluate.plot_training_stat(DIRECTORY, args, bgn_iter=0, fin_iter=args.iter, interval_iter=args.debug_inter) | ||
|
||
    if args.test==1: | ||
        main_dnn.inference(DIRECTORY, args) | ||
        evaluate.calculate_pesq(DIRECTORY,args) | ||
        evaluate.get_stats(DIRECTORY,args) | ||
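
The default arguments in get_args() fix the time and frequency resolution of the features. The sketch below works through that arithmetic under stated assumptions: a one-sided spectrum (fft // 2 + 1 bins), no padding at the signal edges, and a model input formed by concatenating n_concat frames. The feature code itself is not shown in this file, so `feature_shapes` is a hypothetical helper and the real pipeline may pad or frame differently.

```python
def feature_shapes(n_samples, n_fft=256, window=256, overlap=192, n_concat=7):
    """Return (hop, n_freq, n_frames, input_dim) for the given settings.

    Assumes a one-sided spectrum and no edge padding (illustrative only).
    """
    hop = window - overlap                  # samples between frame starts
    n_freq = n_fft // 2 + 1                 # one-sided frequency bins
    if n_samples < window:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - window) // hop
    input_dim = n_concat * n_freq           # concatenated context frames
    return hop, n_freq, n_frames, input_dim


sample_rate = 8000
hop, n_freq, n_frames, input_dim = feature_shapes(n_samples=sample_rate)
hop_ms = 1000.0 * hop / sample_rate
```

With the defaults, one second of 8 kHz audio gives a hop of 64 samples (8 ms), 129 frequency bins, 122 frames and a 903-dimensional concatenated input under these assumptions.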
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
import os | ||
|
||
|
||
def makedirs(path): | ||
    if not os.path.exists(path): | ||
        print(" [*] Make directories : {}".format(path)) | ||
        os.makedirs(path) |
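
The check-then-create pattern above can fail if another process creates the directory between the two calls. A possible alternative, sketched here under the assumption that the log message is not needed, is `os.makedirs(path, exist_ok=True)`, which is available in the Python 3.5 environment this commit targets. `makedirs_quiet` is a hypothetical name, and the directory names are placeholders.

```python
import os
import tempfile


def makedirs_quiet(path):
    """Create `path` and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "result", "0db")
    makedirs_quiet(target)
    makedirs_quiet(target)  # second call is a no-op instead of an error
    created = os.path.isdir(target)
```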
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import numpy as np | ||
|
||
class DataGenerator(object): | ||
    def __init__(self, batch_size, type, te_max_iter=None): | ||
        assert type in ['train', 'test'] | ||
        self._batch_size_ = batch_size | ||
        self._type_ = type | ||
        self._te_max_iter_ = te_max_iter | ||
|
||
    def generate(self, xs, ys): | ||
        x = xs[0] | ||
        y = ys[0] | ||
        batch_size = self._batch_size_ | ||
        n_samples = len(x) | ||
|
||
        index = np.arange(n_samples) | ||
        np.random.shuffle(index) | ||
|
||
        iter = 0 | ||
        epoch = 0 | ||
        pointer = 0 | ||
        while True: | ||
            # In test mode, stop after te_max_iter batches if a limit is set. | ||
            if (self._type_ == 'test') and (self._te_max_iter_ is not None): | ||
                if iter == self._te_max_iter_: | ||
                    break | ||
            iter += 1 | ||
            # End of an epoch: stop in test mode, otherwise reshuffle and restart. | ||
            if pointer >= n_samples: | ||
                epoch += 1 | ||
                if (self._type_) == 'test' and (epoch == 1): | ||
                    break | ||
                pointer = 0 | ||
                np.random.shuffle(index) | ||
|
||
            # The last batch of an epoch may be smaller than batch_size. | ||
            batch_idx = index[pointer : min(pointer + batch_size, n_samples)] | ||
            pointer += batch_size | ||
            yield x[batch_idx], y[batch_idx] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
""" | ||
Summary: Calculate PESQ and overall stats of enhanced speech. | ||
Author: Qiuqiang Kong | ||
Created: 2017.12.22 | ||
Modified: - | ||
""" | ||
import argparse | ||
import os | ||
import csv | ||
import numpy as np | ||
import pickle | ||
import matplotlib.pyplot as plt | ||
from utils import makedirs | ||
|
||
|
||
def plot_training_stat(DIRECTORY, args, bgn_iter, fin_iter, interval_iter): | ||
    """Plot training and testing loss. | ||
    Args: | ||
      DIRECTORY: dict, must contain the 'WORKSPACE' path. | ||
      args: parsed arguments; args.tr_snr is the training SNR. | ||
      bgn_iter: int, plot from bgn_iter | ||
      fin_iter: int, plot finish at fin_iter | ||
      interval_iter: int, interval of files. | ||
    """ | ||
    workspace = DIRECTORY['WORKSPACE'] | ||
    tr_snr = args.tr_snr | ||
    tr_losses, te_losses, iters = [], [], [] | ||
|
||
    # Load stats. | ||
    stats_dir = os.path.join(workspace, "training_stats", "%ddb" % int(tr_snr)) | ||
    for n_iter in range(bgn_iter, fin_iter+1, interval_iter): | ||
        stats_path = os.path.join(stats_dir, "%diters.p" % n_iter) | ||
        # Close the file promptly and avoid shadowing the built-in dict. | ||
        with open(stats_path, 'rb') as f: | ||
            stats = pickle.load(f) | ||
        tr_losses.append(stats['tr_loss']) | ||
        te_losses.append(stats['te_loss']) | ||
        iters.append(stats['iter']) | ||
|
||
    # Plot | ||
    line_tr, = plt.plot(tr_losses, c='b', label="Train") | ||
    line_te, = plt.plot(te_losses, c='r', label="Test") | ||
    # Use both curves for the y-limit so the test loss is not clipped. | ||
    plt.axis([0, len(iters), 0, max(tr_losses + te_losses)]) | ||
    plt.xlabel("Iterations") | ||
    plt.ylabel("Loss") | ||
    plt.legend(handles=[line_tr, line_te]) | ||
    plt.xticks(np.arange(len(iters)), iters) | ||
    plt.show() | ||
|
||
|
||
def calculate_pesq(DIRECTORY, args): | ||
    """Calculate PESQ of all enhanced speech. | ||
    Args: | ||
      DIRECTORY: dict, must contain 'WORKSPACE' and 'TE_SPEECH_DIR' (clean speech). | ||
      args: parsed arguments; uses args.te_snr and args.sample_rate. | ||
    """ | ||
    workspace = DIRECTORY['WORKSPACE'] | ||
    speech_dir = DIRECTORY['TE_SPEECH_DIR'] | ||
    te_snr = args.te_snr | ||
|
||
    # Remove any existing results file (replaces the Windows-only 'del' command). | ||
    if os.path.exists('pesq_results.txt'): | ||
        os.remove('pesq_results.txt') | ||
|
||
    # Calculate PESQ of all enhanced speech. | ||
    enh_speech_dir = os.path.join(workspace, "enh_wavs", "test", "%ddb" % int(te_snr)) | ||
    names = os.listdir(enh_speech_dir) | ||
    for (cnt, na) in enumerate(names): | ||
        print(cnt, na) | ||
        enh_path = os.path.join(enh_speech_dir, na) | ||
|
||
        speech_na = na.split('.')[0] | ||
        speech_path = os.path.join(speech_dir, "%s.WAV" % speech_na) | ||
|
||
        # Call executable PESQ tool. | ||
        cmd = ' '.join(["pesq2.exe", speech_path, enh_path, '+'+str(args.sample_rate)]) | ||
        # os.system(cmd) | ||
        result = os.popen(cmd).read() | ||
        print(result) | ||
|
||
def get_stats(DIRECTORY, args): | ||
    """Calculate stats of PESQ. | ||
    """ | ||
    workspace = DIRECTORY['WORKSPACE'] | ||
    pesq_path = "pesq_results.txt" | ||
    with open(pesq_path, 'rt') as f: | ||
        reader = csv.reader(f, delimiter='\t') | ||
        lis = list(reader) | ||
|
||
    # Group the PESQ scores by noise type (the second dot-separated field of the name). | ||
    pesq_dict = {} | ||
    for i1 in range(1, len(lis) - 1): | ||
        li = lis[i1] | ||
        na = li[1] | ||
        pesq = float(li[2]) | ||
        noise_type = na.split('.')[1] | ||
        if noise_type not in pesq_dict: | ||
            pesq_dict[noise_type] = [pesq] | ||
        else: | ||
            pesq_dict[noise_type].append(pesq) | ||
|
||
    avg_list, std_list = [], [] | ||
    result_path = os.path.join(workspace, "result") | ||
    makedirs(result_path) | ||
    result_path = os.path.join(result_path,"result.txt") | ||
    fmt = "{0:<16} {1:<16}" | ||
    with open(result_path, "w") as out_file: | ||
        out_file.write(fmt.format("Noise", "PESQ")+"\n") | ||
        out_file.write("---------------------------------\n") | ||
        for noise_type in pesq_dict.keys(): | ||
            pesqs = pesq_dict[noise_type] | ||
            avg_pesq = np.mean(pesqs) | ||
            std_pesq = np.std(pesqs) | ||
            avg_list.append(avg_pesq) | ||
            std_list.append(std_pesq) | ||
            # Keep the newline outside the padded field so the next row is not indented. | ||
            out_file.write(fmt.format(noise_type, "%.2f +- %.2f" % (avg_pesq, std_pesq))+"\n") | ||
        out_file.write("---------------------------------\n") | ||
        out_file.write(fmt.format("Avg.", "%.2f +- %.2f" % (np.mean(avg_list), np.mean(std_list)))+"\n") | ||
    print("Average PESQ score: %s" %np.mean(avg_list)) |
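
calculate_pesq builds its command by joining the arguments with spaces, which breaks if a path contains a space. One possible alternative, sketched here as an assumption and not as the author's method, is to pass the arguments as a list to `subprocess.run`. `run_tool` is a hypothetical helper, and the Python interpreter stands in for pesq2.exe so the example runs anywhere.

```python
import subprocess
import sys


def run_tool(argv):
    """Run a command given as a list of arguments.

    Returns (return code, captured stdout as text). Passing a list avoids
    shell quoting problems with paths that contain spaces.
    """
    completed = subprocess.run(argv,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    return completed.returncode, completed.stdout


# Stand-in command; for PESQ the list would hold the executable name,
# the clean path, the enhanced path and the '+<sample rate>' flag.
returncode, output = run_tool([sys.executable, "-c", "print('ok')"])
```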