initial commit

Signed-off-by: jeonghwaYoo <green2368@naver.com>
jeongHwarr · Mar 12, 2019 · commit c8e4338
Showing 20 changed files with 1,224 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,92 @@
+# Speech enhancement using deep neural networks (Keras implementation)
+by Yong Xu and Qiuqiang Kong
+Modified by Jeonghwa Yoo (environment: Python 3.5 on Windows)
+
+This code uses a deep neural network (DNN) for speech enhancement. It is a Keras implementation of the paper:
+
+[1] Xu, Y., Du, J., Dai, L.R. and Lee, C.H., 2015. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(1), pp.7-19.
+
+The original C++ implementation by Yong Xu (yong.xu@surrey.ac.uk) is available at https://github.com/yongxuUSTC/DNN-for-speech-enhancement. This Keras re-implementation is by Qiuqiang Kong (q.kong@surrey.ac.uk).
+
+<pre>
+Noise(0dB)   PESQ
+----------------------
+n64     1.36 +- 0.05
+n71     1.35 +- 0.18
+----------------------
+Avg.    1.35 +- 0.12
+</pre>
+
+## Run on TIMIT and 115 noises
+You may replace the mini data with your own data. The data needed to re-run the experiments in [1] is listed in meta_data/. It consists of:
+
+Training:
+Speech: TIMIT 4620 training sentences. 
+Noise: 115 kinds of noises (http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/115noises.html)
+
+Testing:
+Speech: TIMIT 168 testing sentences (10% selected from the 1680 testing sentences)
+Noise: NOISEX-92 (http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html)
+
+Some of these datasets are not publicly available. You can collect your own data instead.
+
+1. Download and prepare data. 
+
+2. Set MINIDATA = False in run.py. Modify WORKSPACE, TR_SPEECH_DIR, TR_NOISE_DIR, TE_SPEECH_DIR and TE_NOISE_DIR in run.py, and adjust the arguments in the get_args() function as needed.
+
+3. Run run.py. The training log looks like:
+
+<pre>
+Iteration: 0, tr_loss: 1.228049, te_loss: 1.252313
+Iteration: 1000, tr_loss: 0.533825, te_loss: 0.677872
+Iteration: 2000, tr_loss: 0.505751, te_loss: 0.678816
+Iteration: 3000, tr_loss: 0.483631, te_loss: 0.666576
+Iteration: 4000, tr_loss: 0.480287, te_loss: 0.675403
+Iteration: 5000, tr_loss: 0.457020, te_loss: 0.676319
+Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_5000iters.h5
+Iteration: 6000, tr_loss: 0.461330, te_loss: 0.673847
+Iteration: 7000, tr_loss: 0.445159, te_loss: 0.668545
+Iteration: 8000, tr_loss: 0.447244, te_loss: 0.680740
+Iteration: 9000, tr_loss: 0.427652, te_loss: 0.678236
+Iteration: 10000, tr_loss: 0.421219, te_loss: 0.663294
+Saved model to /vol/vssp/msos/qk/workspaces/speech_enhancement/models/0db/md_10000iters.h5
+Training time: 202.551192045 s
+</pre>
+
+The final PESQ looks like:
+
+<pre>
+Noise(0dB)            PESQ
+---------------------------------
+pink             2.01 +- 0.23
+buccaneer1       1.88 +- 0.25
+factory2         2.21 +- 0.21
+hfchannel        1.63 +- 0.24
+factory1         1.93 +- 0.23
+babble           1.81 +- 0.28
+m109             2.13 +- 0.25
+leopard          2.49 +- 0.23
+volvo            2.83 +- 0.23
+buccaneer2       2.03 +- 0.25
+white            2.00 +- 0.21
+f16              1.86 +- 0.24
+destroyerops     1.99 +- 0.23
+destroyerengine  1.86 +- 0.23
+machinegun       2.55 +- 0.27
+---------------------------------
+Avg.             2.08 +- 0.24
+</pre>
+
+
+## Visualization
+In the inference step, pass --visualize 1 (the default in run.py) to plot the log-magnitude spectrograms of the mixture, clean and enhanced speech.
+
+![alt text](https://github.com/yongxuUSTC/deep_learning_based_speech_enhancement_keras_python/blob/master/mixture2clean_dnn/appendix/enhanced_log_sp.png)
+
+## PESQ (Windows)
+The PESQ executable comes from https://uk.mathworks.com/matlabcentral/fileexchange/47333-pesq-matlab-driver
+
+## Bug reports
+1. PESQ does not support long path or folder names, so keep them short; otherwise you will get a wrong (low) PESQ score. Alternatively, modify the PESQ source code to enlarge the path-name variable.
+
+2. For a larger dataset that cannot be loaded into memory at once: (1) prepare your training scp list, (2) shuffle it, (3) split it into several parts, and (4) read the parts one by one for training.
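Note on item 2 of the README's bug section: the shuffle-and-split steps can be sketched roughly as below. This is a minimal illustration and not code from this commit; the function name `split_into_parts` and the `feat_*.h5` file names are hypothetical, and loading and training on each part is left as a comment.

```python
import random


def split_into_parts(items, n_parts, seed=0):
    """Shuffle a list of file paths and split it into n_parts near-equal parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Striding keeps the part sizes within one item of each other.
    return [shuffled[i::n_parts] for i in range(n_parts)]


# Hypothetical scp list: one feature-file path per entry.
scp_list = ["feat_%03d.h5" % i for i in range(10)]
parts = split_into_parts(scp_list, n_parts=3)

for part in parts:
    # In real training, load only this part's features here and train on
    # them before moving on to the next part.
    print(len(part))
```

Reshuffling with a different seed on every pass over the data would keep the parts from always containing the same files.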
diff --git a/mini_data/test_noise/n64.wav b/mini_data/test_noise/n64.wav
diff --git a/mini_data/test_noise/n71.wav b/mini_data/test_noise/n71.wav
diff --git a/mini_data/test_speech/TEST_DR4_FDMS0_SX48.WAV b/mini_data/test_speech/TEST_DR4_FDMS0_SX48.WAV
diff --git a/mini_data/test_speech/TEST_DR5_MRJM3_SI1809.WAV b/mini_data/test_speech/TEST_DR5_MRJM3_SI1809.WAV
diff --git a/mini_data/train_noise/n1.wav b/mini_data/train_noise/n1.wav
diff --git a/mini_data/train_noise/n49.wav b/mini_data/train_noise/n49.wav
diff --git a/mini_data/train_noise/n95.wav b/mini_data/train_noise/n95.wav
diff --git a/mini_data/train_speech/TRAIN_DR1_FCJF0_SA1.WAV b/mini_data/train_speech/TRAIN_DR1_FCJF0_SA1.WAV
diff --git a/mini_data/train_speech/TRAIN_DR1_FKFB0_SX348.WAV b/mini_data/train_speech/TRAIN_DR1_FKFB0_SX348.WAV
diff --git a/mini_data/train_speech/TRAIN_DR1_MRDD0_SI1680.WAV b/mini_data/train_speech/TRAIN_DR1_MRDD0_SI1680.WAV
diff --git a/pesq2.exe b/pesq2.exe
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,8 @@
+soundfile>=0.9.0.post1
+numpy>=1.13.3
+matplotlib>=2.1.1
+scipy>=1.0.0
+h5py>=2.7.1
+scikit-learn>=0.19.1
+keras>=2.1.2
+tensorflow-gpu>=1.4.1
diff --git a/run.py b/run.py
@@ -0,0 +1,91 @@
+import os
+import argparse
+from work_module import prepare_data, main_dnn, evaluate 
+
+CURRENT_PATH = os.path.split(os.getcwd())[-1]
+
+MODEL_NAME='SE-SegCaps'
+
+MINIDATA = True  # True: use the bundled mini_data; False: use the full dataset paths below
+
+if MINIDATA:
+    WORKSPACE = os.path.join("D:/Python_output",CURRENT_PATH+"_MINIDATA")
+    TR_SPEECH_DIR="mini_data/train_speech"
+    TR_NOISE_DIR="mini_data/train_noise"
+    TE_SPEECH_DIR="mini_data/test_speech"
+    TE_NOISE_DIR="mini_data/test_noise"
+
+else:    
+    WORKSPACE = "D:/Python_output/"+CURRENT_PATH
+    TR_SPEECH_DIR="D:/train/speech"
+    TR_NOISE_DIR="D:/noise"
+    TE_SPEECH_DIR="D:/test"
+    TE_NOISE_DIR="D:/noise"   
+
+def get_args():
+    parser = argparse.ArgumentParser(description="Speech Enhancement using DNN.")
+    parser.add_argument('-sr', '--sample_rate', default=8000, type=int,
+                     help="target sampling rate of audio")
+    parser.add_argument('--fft', default=256, type=int,
+                         help="FFT size")
+    parser.add_argument('--window', default=256, type=int,
+                         help="window size")
+    parser.add_argument('--overlap', default=192, type=int,
+                     help="overlap size of spectrogram") 
+    parser.add_argument('--n_concat', default=7, type=int,
+                     help="number of frames to concatenate")
+
+    parser.add_argument('--tr_snr', default=0, type=int, 
+                        help="SNR of training data")
+    parser.add_argument('--te_snr', default=0, type=int, 
+                        help="SNR of test data") 
+
+    parser.add_argument('--iter', default=10000, type=int)
+    parser.add_argument('--debug_inter', default=1000, type=int, 
+                        help="Interval to debug model")
+    parser.add_argument('--save_inter', default=5000, type=int, 
+                        help="Interval to save model")
+    parser.add_argument('-b', '--batch_size', default=32, type=int)  
+    parser.add_argument('--lr', default=0.0001, type=float,
+                         help="Initial learning rate")
+    parser.add_argument('-visual', '--visualize', default=1, type=int, choices=[0,1],
+                        help="If the value is 1, plot the inference results") 
+
+    parser.add_argument('--train', default=1, type=int, choices=[0,1],
+                        help="If the value is 1, run training") 
+    parser.add_argument('--test', default=1, type=int, choices=[0,1],
+                         help="If the value is 1, run test")
+
+    args = parser.parse_args() 
+    return args
+
+if __name__ == '__main__':
+
+    args = get_args()
+
+    DIRECTORY = {}   
+    DIRECTORY['WORKSPACE'] = WORKSPACE
+    DIRECTORY['TR_SPEECH_DIR'] = TR_SPEECH_DIR
+    DIRECTORY['TR_NOISE_DIR'] = TR_NOISE_DIR
+    DIRECTORY['TE_SPEECH_DIR'] = TE_SPEECH_DIR
+    DIRECTORY['TE_NOISE_DIR'] = TE_NOISE_DIR
+
+    prepare_data.create_mixture_csv(DIRECTORY, args, mode='train')
+    prepare_data.create_mixture_csv(DIRECTORY, args, mode='test')
+
+    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='train')
+    prepare_data.calculate_mixture_features(DIRECTORY, args, mode='test')
+
+    prepare_data.pack_features(DIRECTORY, args, mode='train')
+    prepare_data.pack_features(DIRECTORY, args, mode='test')
+
+    if args.train==1:
+        prepare_data.compute_scaler(DIRECTORY, args, mode='train')
+        main_dnn.train(DIRECTORY, args)
+        evaluate.plot_training_stat(DIRECTORY, args, bgn_iter=0, fin_iter=args.iter, interval_iter=args.debug_inter)
+
+    if args.test==1:
+        main_dnn.inference(DIRECTORY, args)
+        evaluate.calculate_pesq(DIRECTORY,args)
+        evaluate.get_stats(DIRECTORY,args)
+
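Note on the defaults in get_args(): the framing they imply can be worked out with a few lines of arithmetic. This is only a sketch. The values are copied from run.py, but the final `input_dim` assumes the `n_concat` frames are flattened into one input vector, which this part of the commit does not show.

```python
# Defaults copied from get_args() in run.py.
sample_rate = 8000   # Hz
fft = 256            # FFT size
window = 256         # samples per analysis window
overlap = 192        # samples shared by consecutive windows
n_concat = 7         # frames concatenated per input

hop = window - overlap                       # samples between frame starts
window_ms = 1000.0 * window / sample_rate    # window length in milliseconds
hop_ms = 1000.0 * hop / sample_rate          # frame shift in milliseconds
n_freq_bins = fft // 2 + 1                   # one-sided spectrum of a real signal

# Assumption (not shown in this section): the n_concat frames are
# flattened into a single input vector for the network.
input_dim = n_concat * n_freq_bins

print(hop, window_ms, hop_ms, n_freq_bins, input_dim)
```

With these defaults each frame covers 32 ms of audio and consecutive frames start 8 ms apart.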
diff --git a/utils/__init__.py b/utils/__init__.py
@@ -0,0 +1,7 @@
+import os
+
+
+def makedirs(path):
+    if not os.path.exists(path):
+        print(" [*] Make directories : {}".format(path))
+        os.makedirs(path)
diff --git a/work_module/data_generator.py b/work_module/data_generator.py
@@ -0,0 +1,36 @@
+import numpy as np
+
+class DataGenerator(object):
+    def __init__(self, batch_size, type, te_max_iter=None):
+        assert type in ['train', 'test']
+        self._batch_size_ = batch_size
+        self._type_ = type
+        self._te_max_iter_ = te_max_iter
+
+    def generate(self, xs, ys):
+        x = xs[0]
+        y = ys[0]
+        batch_size = self._batch_size_
+        n_samples = len(x)
+
+        index = np.arange(n_samples)
+        np.random.shuffle(index)
+
+        iter = 0
+        epoch = 0
+        pointer = 0
+        while True:
+            if (self._type_ == 'test') and (self._te_max_iter_ is not None):
+                if iter == self._te_max_iter_:
+                    break
+            iter += 1
+            if pointer >= n_samples:
+                epoch += 1
+                if (self._type_ == 'test') and (epoch == 1):
+                    break
+                pointer = 0
+                np.random.shuffle(index)                
+
+            batch_idx = index[pointer : min(pointer + batch_size, n_samples)]
+            pointer += batch_size
+            yield x[batch_idx], y[batch_idx]
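Note on DataGenerator.generate(): in 'test' mode with no te_max_iter, the loop makes exactly one shuffled pass over the data, and the last batch may be smaller than batch_size. The condensed sketch below reproduces only that index logic in plain Python. The function name `one_epoch_index_batches` is hypothetical, and it yields index lists rather than NumPy slices of x and y.

```python
import random


def one_epoch_index_batches(n_samples, batch_size, seed=0):
    """Yield shuffled index batches covering every sample exactly once."""
    index = list(range(n_samples))
    random.Random(seed).shuffle(index)
    for pointer in range(0, n_samples, batch_size):
        # Slicing past the end simply returns a shorter final batch.
        yield index[pointer:pointer + batch_size]


batches = list(one_epoch_index_batches(n_samples=10, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

In 'train' mode the class instead reshuffles and starts over when the pointer passes the end, so it never stops on its own.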
diff --git a/work_module/evaluate.py b/work_module/evaluate.py
@@ -0,0 +1,119 @@
+"""
+Summary:  Calculate PESQ and overall stats of enhanced speech. 
+Author:   Qiuqiang Kong
+Created:  2017.12.22
+Modified: -
+"""
+import argparse
+import os
+import csv
+import numpy as np
+import pickle
+import matplotlib.pyplot as plt
+from utils import makedirs
+
+
+def plot_training_stat(DIRECTORY, args, bgn_iter, fin_iter, interval_iter):
+    """Plot training and testing loss. 
+    
+    Args: 
+      DIRECTORY: dict, must contain 'WORKSPACE' (path of workspace). 
+      args: parsed arguments; args.tr_snr is the training SNR. 
+      bgn_iter: int, plot from bgn_iter
+      fin_iter: int, plot finish at fin_iter
+      interval_iter: int, interval of files. 
+    """
+    workspace = DIRECTORY['WORKSPACE']
+    tr_snr = args.tr_snr 
+    tr_losses, te_losses, iters = [], [], []
+
+    # Load stats. 
+    stats_dir = os.path.join(workspace, "training_stats", "%ddb" % int(tr_snr))
+    for n_iter in range(bgn_iter, fin_iter+1, interval_iter):
+        stats_path = os.path.join(stats_dir, "%diters.p" % n_iter)
+        stats = pickle.load(open(stats_path, 'rb'))  # avoid shadowing the built-in dict
+        tr_losses.append(stats['tr_loss'])
+        te_losses.append(stats['te_loss'])
+        iters.append(stats['iter'])
+
+    # Plot
+    line_tr, = plt.plot(tr_losses, c='b', label="Train")
+    line_te, = plt.plot(te_losses, c='r', label="Test")
+    plt.axis([0, len(iters), 0, max(tr_losses + te_losses)])  # keep both curves inside the plot
+    plt.xlabel("Iterations")
+    plt.ylabel("Loss")
+    plt.legend(handles=[line_tr, line_te])
+    plt.xticks(np.arange(len(iters)), iters)
+    plt.show()
+
+
+def calculate_pesq(DIRECTORY, args):
+    """Calculate PESQ of all enhanced speech. 
+    
+    Args:
+      DIRECTORY: dict, must contain 'WORKSPACE' (path of workspace) and 
+        'TE_SPEECH_DIR' (path of clean speech). 
+      args: parsed arguments; uses args.te_snr (testing SNR) and args.sample_rate. 
+    """
+    workspace = DIRECTORY['WORKSPACE']
+    speech_dir = DIRECTORY['TE_SPEECH_DIR']
+    te_snr = args.te_snr 
+
+    # Remove the results file left by a previous run (Windows 'del'; prints an error if the file is missing). 
+    os.system('del pesq_results.txt')
+
+    # Calculate PESQ of all enhanced speech. 
+    enh_speech_dir = os.path.join(workspace, "enh_wavs", "test", "%ddb" % int(te_snr))
+    names = os.listdir(enh_speech_dir)
+    for (cnt, na) in enumerate(names):
+        print(cnt, na)
+        enh_path = os.path.join(enh_speech_dir, na)
+
+        speech_na = na.split('.')[0]
+        speech_path = os.path.join(speech_dir, "%s.WAV" % speech_na)
+
+        # Call executable PESQ tool. 
+        cmd = ' '.join(["pesq2.exe", speech_path, enh_path, '+'+str(args.sample_rate)])
+        # The PESQ tool is expected to append its score to pesq_results.txt in the current directory (read by get_stats). 
+        result = os.popen(cmd).read()
+        print(result)
+
+def get_stats(DIRECTORY, args):
+    """Calculate stats of PESQ. 
+    """
+    workspace = DIRECTORY['WORKSPACE']
+    pesq_path = "pesq_results.txt"
+    with open(pesq_path, 'rt') as f:
+        reader = csv.reader(f, delimiter='\t')
+        lis = list(reader)
+
+    pesq_dict = {}
+    for i1 in range(1, len(lis) - 1):
+        li = lis[i1]
+        na = li[1]
+        pesq = float(li[2])
+        noise_type = na.split('.')[1]
+        if noise_type not in pesq_dict.keys():
+            pesq_dict[noise_type] = [pesq]
+        else:
+            pesq_dict[noise_type].append(pesq)
+
+    avg_list, std_list = [], []
+    result_path = os.path.join(workspace, "result")
+    makedirs(result_path)
+    result_path = os.path.join(result_path,"result.txt")
+    file = open(result_path, "w")
+    f = "{0:<16} {1:<16}"
+    file.write(f.format("Noise", "PESQ")+"\n")
+    file.write("---------------------------------\n")
+    for noise_type in pesq_dict.keys():
+        pesqs = pesq_dict[noise_type]
+        avg_pesq = np.mean(pesqs)
+        std_pesq = np.std(pesqs)
+        avg_list.append(avg_pesq)
+        std_list.append(std_pesq)
+        file.write(f.format(noise_type, "%.2f +- %.2f\n" % (avg_pesq, std_pesq)))
+    file.write("---------------------------------\n")
+    file.write(f.format("Avg.", "%.2f +- %.2f\n" % (np.mean(avg_list), np.mean(std_list))))
+    file.close()
+    print("Average PESQ score: %s" %np.mean(avg_list))
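Note on get_stats(): the per-noise aggregation can be illustrated on a few made-up rows. This is a sketch only. The file names and scores below are hypothetical, not results from this repository, and `statistics.pstdev` stands in for `np.std` (both compute the population standard deviation). As in get_stats(), the noise type is taken as the second dot-separated field of the file name, and the overall spread is the mean of the per-noise standard deviations.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (file name, PESQ) rows; the noise type is the second
# dot-separated field of the name.
rows = [
    ("SX48.n64.wav", 1.30),
    ("SX48.n71.wav", 1.50),
    ("SI1809.n64.wav", 1.40),
    ("SI1809.n71.wav", 1.20),
]

by_noise = defaultdict(list)
for name, pesq in rows:
    noise_type = name.split(".")[1]
    by_noise[noise_type].append(pesq)

# Per-noise mean and population standard deviation.
summary = {k: (mean(v), pstdev(v)) for k, v in by_noise.items()}

# Overall figures: mean of the per-noise means and of the per-noise stds.
overall_avg = mean(m for m, _ in summary.values())
overall_std = mean(s for _, s in summary.values())

for noise_type, (avg, std) in sorted(summary.items()):
    print("%-16s %.2f +- %.2f" % (noise_type, avg, std))
print("%-16s %.2f +- %.2f" % ("Avg.", overall_avg, overall_std))
```

Averaging the standard deviations is not the same as the standard deviation over all utterances pooled together, which is worth keeping in mind when comparing the final row with other reports.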