Official torchaudio now supports MFCC (`torchaudio.transforms.MFCC`). This library will no longer be maintained.
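Based on python_speech_features, this project reimplements its MFCC function in PyTorch so that a backpropagation path is established through the MFCC computation, i.e. gradients can flow from the MFCC features back to the input waveform.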
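To illustrate what a backpropagation path through spectral features means, here is a minimal sketch that is not this library's code: it computes a log power spectrum of a batch of waveforms with `torch.fft.rfft` and backpropagates a scalar loss to the raw samples. It assumes PyTorch 1.8 or later (where `torch.fft.rfft` is available without extra imports), which is newer than the minimum version listed below.

```python
import torch

torch.manual_seed(0)

# Two random "waveforms" of 400 samples each; requires_grad makes autograd track them.
wave = torch.randn(2, 400, requires_grad=True)

# Split each waveform into two non-overlapping 200-sample frames: (batch, n_frames, frame_len).
frames = wave.reshape(2, 2, 200)

# Power spectrum of each frame, zero-padded to nfft points and scaled by 1/nfft.
nfft = 256
spec = torch.fft.rfft(frames, n=nfft)
power = (spec.real ** 2 + spec.imag ** 2) / nfft

# Log compression; the small offset avoids log(0).
log_power = torch.log(power + 1e-10)

# Any scalar loss on the features backpropagates to the raw samples.
loss = log_power.mean()
loss.backward()

print(wave.grad.shape)  # torch.Size([2, 400])
```

Every step is built from differentiable tensor operations, so no custom backward pass is needed.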
Requirements:

- Python >= 3.5
- PyTorch >= 1.0
- numpy
- librosa
Installation: clone this repository with `git clone`.
| Parameter | Description |
|---|---|
| samplerate | Sample rate of the signal (Hz) |
| winlen | Length of the analysis window in seconds. Default: 0.025 |
| winstep | Step between successive windows in seconds. Default: 0.01 |
| numcep | Number of cepstral coefficients to return. Default: 13 |
| nfilt | Number of filters in the filterbank. Default: 26 |
| nfft | FFT size. Default: 512 |
| lowfreq | Lowest band edge of the mel filters (Hz). Default: 0 |
| highfreq | Highest band edge of the mel filters (Hz). Default: samplerate/2 |
| preemph | Coefficient of the pre-emphasis filter; 0 means no filter. Default: 0.97 |
| ceplifter | Lifter applied to the final cepstral coefficients; 0 means no lifter. Default: 22 |
| appendEnergy | If true, the zeroth cepstral coefficient is replaced with the log of the total frame energy. |
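The parameters above correspond to the steps of the python_speech_features MFCC recipe this project is based on. The following is a minimal single-signal sketch of that recipe in PyTorch, written for illustration and not taken from this library: `mfcc`, `mel_filterbank`, `dct_ortho_matrix`, `hz2mel` and `mel2hz` are hypothetical names. It assumes PyTorch 1.8+ for `torch.fft`, uses a rectangular window (python_speech_features' default `winfunc`), rounds frame sizes with Python's `round` as a simplification, and defaults `appendEnergy` to True as python_speech_features does (the table above does not state this library's default).

```python
import math
import torch


def hz2mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel2hz(mel):
    # Works on floats and tensors.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(nfilt, nfft, samplerate, lowfreq, highfreq):
    """Triangular mel filters, shape (nfilt, nfft // 2 + 1)."""
    mels = torch.linspace(hz2mel(lowfreq), hz2mel(highfreq), nfilt + 2)
    bins = torch.floor((nfft + 1) * mel2hz(mels) / samplerate).long().tolist()
    fb = torch.zeros(nfilt, nfft // 2 + 1)
    for j in range(nfilt):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for i in range(left, center):  # rising edge
            fb[j, i] = (i - left) / (center - left)
        for i in range(center, right):  # falling edge
            fb[j, i] = (right - i) / (right - center)
    return fb


def dct_ortho_matrix(n):
    """Orthonormal DCT-II matrix D: D @ x is the DCT-II of x (scipy norm='ortho')."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    d = torch.cos(math.pi * k * (2 * i + 1) / (2 * n)) * math.sqrt(2.0 / n)
    d[0] /= math.sqrt(2.0)
    return d


def mfcc(signal, samplerate, winlen=0.025, winstep=0.01, numcep=13, nfilt=26,
         nfft=512, lowfreq=0, highfreq=None, preemph=0.97, ceplifter=22,
         appendEnergy=True):
    """Differentiable MFCC of a 1-D signal; returns (num_frames, numcep)."""
    highfreq = highfreq or samplerate / 2
    # Pre-emphasis: y[n] = x[n] - preemph * x[n-1]; the first sample is kept.
    signal = torch.cat([signal[:1], signal[1:] - preemph * signal[:-1]])
    # Framing (rectangular window); zero-pad so the last frame is complete.
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    num_frames = 1 + max(0, math.ceil((len(signal) - frame_len) / frame_step))
    pad = (num_frames - 1) * frame_step + frame_len - len(signal)
    signal = torch.cat([signal, signal.new_zeros((pad,))])
    frames = signal.unfold(0, frame_len, frame_step)
    # Power spectrum scaled by 1/nfft, and total energy per frame.
    spec = torch.fft.rfft(frames, n=nfft)
    pspec = (spec.real ** 2 + spec.imag ** 2) / nfft
    eps = torch.finfo(pspec.dtype).eps
    energy = pspec.sum(dim=1).clamp(min=eps)
    # Mel filterbank energies (floored at eps to avoid log(0)), log, then DCT-II.
    fb = mel_filterbank(nfilt, nfft, samplerate, lowfreq, highfreq).to(pspec)
    feat = torch.log((pspec @ fb.T).clamp(min=eps))
    feat = (feat @ dct_ortho_matrix(nfilt).to(feat).T)[:, :numcep]
    # Sinusoidal liftering: c[n] *= 1 + (L / 2) * sin(pi * n / L).
    if ceplifter > 0:
        n = torch.arange(numcep, dtype=feat.dtype, device=feat.device)
        feat = feat * (1 + ceplifter / 2 * torch.sin(math.pi * n / ceplifter))
    # Replace c0 with the log frame energy (out of place, so autograd is happy).
    if appendEnergy:
        feat = torch.cat([torch.log(energy).unsqueeze(1), feat[:, 1:]], dim=1)
    return feat


torch.manual_seed(0)
x = torch.randn(8000, requires_grad=True)  # one second of noise at 8 kHz
feats = mfcc(x, samplerate=8000)
feats.sum().backward()  # gradients reach the raw samples
print(feats.shape, x.grad.shape)  # torch.Size([99, 13]) torch.Size([8000])
```

Expressing the filterbank and the DCT as matrix products is a deliberate choice in this sketch: both are linear maps, so autograd differentiates them directly.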
Example:

```python
import librosa
import torch
import pytorch_mfcc

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')  # Device
files = ['english.wav', 'english_crop.wav']  # Files to load

# Read files
signals = []
wav_lengths = []
sample_rate = 8000  # 8000 for the example files; other recordings are often 22050 or 44100. Check it and be careful.
for f in files:
    # librosa.load resamples to `sr`; pass sr=None to keep the file's native rate.
    signal, rate = librosa.load(f, sr=sample_rate, mono=True)
    signals.append(signal)
    wav_lengths.append(len(signal))

# Pad signals with zeros to the longest length, and make a batch
max_length = max(wav_lengths)
signals_torch = []
for i in range(len(signals)):
    signal = torch.tensor(signals[i], dtype=torch.float32).to(device)
    zeros = torch.zeros(max_length - len(signal)).to(device)
    signal = torch.cat([signal, zeros])
    signals_torch.append(signal)
signal_batch = torch.stack(signals_torch)  # (batch, max_length)

# Now do MFCC
mfcc_layer = pytorch_mfcc.MFCC(samplerate=sample_rate).to(device)  # MFCC layer
val, mfcc_lengths = mfcc_layer(signal_batch, wav_lengths)  # Do MFCC
```
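The manual zero-padding loop above can also be written with `torch.nn.utils.rnn.pad_sequence`. The sketch below additionally computes how many frames each unpadded signal yields under the python_speech_features framing rule (with default `winlen`/`winstep`). Reading that count as what `mfcc_lengths` contains is an assumption, since this README does not say how the layer derives it; `num_frames` is a hypothetical helper, and random tensors stand in for the loaded `.wav` files.

```python
import math
import torch
from torch.nn.utils.rnn import pad_sequence


def num_frames(num_samples, samplerate, winlen=0.025, winstep=0.01):
    """Frame count for one signal, following python_speech_features' framing rule."""
    frame_len = int(round(winlen * samplerate))
    frame_step = int(round(winstep * samplerate))
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


# Two 1-D signals of different lengths (stand-ins for the loaded .wav files).
signals = [torch.randn(8000), torch.randn(5000)]
wav_lengths = [len(s) for s in signals]

# pad_sequence zero-pads to the longest signal and stacks: (batch, max_length).
signal_batch = pad_sequence(signals, batch_first=True)

frame_counts = [num_frames(n, samplerate=8000) for n in wav_lengths]
print(signal_batch.shape)  # torch.Size([2, 8000])
print(frame_counts)  # [99, 61]
```

Keeping the true lengths alongside the padded batch lets downstream code ignore the frames that come only from padding.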
- DCT for PyTorch by Ziyang Hu
- This project is based on python_speech_features by James Lyons
The sample files english.wav and english_crop.wav are from:
sox -e signed-integer english.wav
Contributions are welcome. Please don't hesitate to open a pull request.