- Heartbeat Sounds Classification and Segmentation
The dataset is available on Kaggle.
Set A:
dataset fname label sublabel
0 a set_a/artifact__201012172012.wav artifact NaN
1 a set_a/artifact__201105040918.wav artifact NaN
2 a set_a/artifact__201105041959.wav artifact NaN
3 a set_a/artifact__201105051017.wav artifact NaN
4 a set_a/artifact__201105060108.wav artifact NaN
... ... ... ... ...
170 a set_a/__201108222234.wav NaN NaN
171 a set_a/__201108222241.wav NaN NaN
172 a set_a/__201108222244.wav NaN NaN
173 a set_a/__201108222247.wav NaN NaN
174 a set_a/__201108222254.wav NaN NaN
175 rows × 4 columns
RangeIndex: 176 entries, 0 to 175
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 dataset 176 non-null object
1 fname 176 non-null object
2 label 124 non-null object
3 sublabel 0 non-null float64
dtypes: float64(1), object(3)
memory usage: 5.6+ KB
artifact 40
murmur 34
normal 31
extrahls 19
We remove all entries with a null label from the dataframe:
setA = setA.dropna(subset=['label'])
We also remove the entries labeled artifact, since they are anomalies:
setA = setA[setA.label != 'artifact']
Now let's look at the updated value counts for the 3 remaining classes:
murmur 34
normal 31
extrahls 19
Set B:
dataset fname label sublabel
0 b set_b/Btraining_extrastole_127_1306764300147_C... extrastole NaN
1 b set_b/Btraining_extrastole_128_1306344005749_A... extrastole NaN
2 b set_b/Btraining_extrastole_130_1306347376079_D... extrastole NaN
3 b set_b/Btraining_extrastole_134_1306428161797_C... extrastole NaN
4 b set_b/Btraining_extrastole_138_1306762146980_B... extrastole NaN
... ... ... ... ...
650 b set_b/Btraining_normal_Btraining_noisynormal_2... normal noisynormal
651 b set_b/Btraining_normal_Btraining_noisynormal_2... normal noisynormal
652 b set_b/Btraining_normal_Btraining_noisynormal_2... normal noisynormal
653 b set_b/Btraining_normal_Btraining_noisynormal_2... normal noisynormal
654 b set_b/Btraining_normal_Btraining_noisynormal_2... normal noisynormal
655 rows × 4 columns
RangeIndex: 656 entries, 0 to 655
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 dataset 656 non-null object
1 fname 656 non-null object
2 label 461 non-null object
3 sublabel 149 non-null object
dtypes: object(4)
memory usage: 20.6+ KB
dataset fname label sublabel
count 656 656 461 149
unique 1 656 3 2
top b set_b/Btraining_extrastole_127_1306764300147_C... normal noisynormal
freq 656 1 320 120
As with set A, we drop all entries with a null label. This leaves 3 classes in set B:
setB = setB.dropna(subset=['label'])
normal 320
murmur 95
extrastole 46
Classifying all 4 categories requires both sets, so we join sets A and B.
Combining Sets A and B:
setAB = [setA,setB]
ABdf = pd.concat(setAB)
dataset fname label
40 a set_a/extrahls__201101070953.wav extrahls
41 a set_a/extrahls__201101091153.wav extrahls
42 a set_a/extrahls__201101152255.wav extrahls
43 a set_a/extrahls__201101160804.wav extrahls
44 a set_a/extrahls__201101160808.wav extrahls
.. ... ... ...
650 b set_b/Btraining_normal_Btraining_noisynormal_2... normal
651 b set_b/Btraining_normal_Btraining_noisynormal_2... normal
652 b set_b/Btraining_normal_Btraining_noisynormal_2... normal
653 b set_b/Btraining_normal_Btraining_noisynormal_2... normal
654 b set_b/Btraining_normal_Btraining_noisynormal_2... normal
[544 rows x 4 columns]
The distribution of each class:
normal 351
murmur 129
extrastole 46
extrahls 19
The data is very imbalanced, so we upsample extrahls and extrastole and downsample normal.
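The resampling step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `rebalance` is hypothetical, and the per-class target of 129 (the murmur count) is an assumption, since the source does not state the target size. Classes smaller than the target are sampled with replacement and larger ones without.

```python
import pandas as pd

def rebalance(df, label_col="label", target=129, seed=0):
    """Resample every class to exactly `target` rows (hypothetical helper)."""
    parts = []
    for _, group in df.groupby(label_col):
        # Sample with replacement only when the class is smaller than the target
        parts.append(
            group.sample(n=target, replace=len(group) < target, random_state=seed)
        )
    # Shuffle so the classes are interleaved
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy frame that reproduces the class counts reported above
counts = {"normal": 351, "murmur": 129, "extrastole": 46, "extrahls": 19}
toy = pd.DataFrame({"label": [name for name, n in counts.items() for _ in range(n)]})

balanced = rebalance(toy)
print(balanced["label"].value_counts())
```

Upsampling by duplication is best applied after the train/validation/test split; otherwise copies of the same recording can appear on both sides of the split.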
Set A timing:
fname cycle sound location
0 set_a/normal__201102081321.wav 1 S1 10021
1 set_a/normal__201102081321.wav 1 S2 20759
2 set_a/normal__201102081321.wav 2 S1 35075
3 set_a/normal__201102081321.wav 2 S2 47244
4 set_a/normal__201102081321.wav 3 S1 62992
... ... ... ... ...
384 set_a/normal__201108011118.wav 10 S1 272527
385 set_a/normal__201108011118.wav 10 S2 284673
386 set_a/normal__201108011118.wav 11 S1 300863
387 set_a/normal__201108011118.wav 11 S2 314279
388 set_a/normal__201108011118.wav 12 S1 330980
389 rows × 4 columns
RangeIndex: 390 entries, 0 to 389
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fname 390 non-null object
1 cycle 390 non-null int64
2 sound 390 non-null object
3 location 390 non-null int64
dtypes: int64(2), object(2)
memory usage: 12.3+ KB
cycle location
count 390.000000 390.000000
mean 5.733333 164639.984615
std 3.732807 99310.875752
min 1.000000 2583.000000
25% 3.000000 82313.000000
50% 5.000000 155624.500000
75% 8.000000 239709.750000
max 19.000000 390873.000000
Check whether the lub (S1) and dub (S2) sounds occur the same number of times:
S1 195
S2 195
Plot the cycles against their locations
g = sns.histplot(data=setAtiming, x="cycle", y="location", cbar=True)
Analyzing the Audios:
The 'fname' values in the CSV files do not match the names of the audio files, so we renamed them to match. We then created a new dataframe that holds only the audio path and the label.
Audio Label
0 drive/MyDrive/PR_assignment3/set_a/murmur__201... murmur
1 drive/MyDrive/PR_assignment3/set_a/murmur__201... murmur
2 drive/MyDrive/PR_assignment3/set_a/murmur__201... murmur
3 drive/MyDrive/PR_assignment3/set_a/murmur__201... murmur
4 drive/MyDrive/PR_assignment3/set_a/murmur__201... murmur
... ... ...
510 drive/MyDrive/PR_assignment3/set_b/normal_nois... normal
511 drive/MyDrive/PR_assignment3/set_b/normal__159... normal
512 drive/MyDrive/PR_assignment3/set_b/normal_nois... normal
513 drive/MyDrive/PR_assignment3/set_a/normal__201... normal
514 drive/MyDrive/PR_assignment3/set_b/normal_nois... normal
515 rows × 2 columns
- Extrahls
Spectrogram: represents the sound intensity of the audio data as a function of frequency and time.
Feature extraction from audio:
Spectral centroid: the magnitude-weighted mean frequency of the audio, showing where the spectrum is centered.
MFCCs: a small set of features that describe the overall shape of the spectral envelope.
- Murmur
Spectrogram: represents the sound intensity of the audio data as a function of frequency and time.
Feature extraction from audio:
Spectral centroid: the magnitude-weighted mean frequency of the audio, showing where the spectrum is centered.
MFCCs: a small set of features that describe the overall shape of the spectral envelope.
- Normal
Spectrogram: represents the sound intensity of the audio data as a function of frequency and time.
Feature extraction from audio:
Spectral centroid: the magnitude-weighted mean frequency of the audio, showing where the spectrum is centered.
MFCCs: a small set of features that describe the overall shape of the spectral envelope.
- Extrastole
Spectrogram: represents the sound intensity of the audio data as a function of frequency and time.
Feature extraction from audio:
Spectral centroid: the magnitude-weighted mean frequency of the audio, showing where the spectrum is centered.
MFCCs: a small set of features that describe the overall shape of the spectral envelope.
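To make the spectrogram and centroid descriptions above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the pipeline used for the plots: the frame length of 2048, the hop of 512, and the helper names are hypothetical, and the input is a synthetic 440 Hz tone rather than a heartbeat recording.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=2048, hop=512):
    """Return |STFT| with shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_centroid(spec, sample_rate, frame_len=2048):
    """Magnitude-weighted mean frequency (Hz) of each frame."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate   # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)       # synthetic tone, not a heartbeat

spec = magnitude_spectrogram(tone)
centroid = spectral_centroid(spec, sample_rate)
print(spec.shape, centroid.mean())
```

For a pure tone the centroid sits close to the tone's frequency; for a heartbeat recording it moves with the energy distribution of each frame.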
Split the data into 70% training, 15% validation, and 15% test sets.
For task A (size):
x test = 177
x test shape = (59, 3)
y test = 59
x train = 816
x train shape = (272, 3)
y train = 272
x val = 177
x validation shape = (59, 3)
y val = 59
For task B (size):
x test = 78
y test = 78
x train = 360
y train = 360
x val = 78
y val = 78
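A 70/15/15 split like the one described above can be sketched with a shuffled index array. The helper `split_indices` and the seed are hypothetical, and the split here is not stratified; the exact sizes can differ by a sample or two from those listed, depending on how the rounding is done.

```python
import numpy as np

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    # Whatever remains after train and validation becomes the test set
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

# 516 is the total of the task B sizes listed above (360 + 78 + 78)
train_idx, val_idx, test_idx = split_indices(516)
print(len(train_idx), len(val_idx), len(test_idx))
```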
The data is prepared for classification using the following steps:
- The audio files are sampled at a constant rate of 22050 Hz.
- The shorter audio files are padded with zeros to match the length of the longest audio file at 12 seconds.
- The labels are one-hot encoded.
labels = {
    'murmur' : np.array([1,0,0,0]),
    'normal' : np.array([0,1,0,0]),
    'extrahls' : np.array([0,0,1,0]),
    'extrastole' : np.array([0,0,0,1]),
}
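The preparation steps listed above can be sketched as follows. The helper names are hypothetical, and two details are assumptions because the source does not state them: zeros are appended at the end of the clip, and clips longer than 12 seconds are truncated.

```python
import numpy as np

SAMPLE_RATE = 22050
MAX_SECONDS = 12
TARGET_LEN = SAMPLE_RATE * MAX_SECONDS  # 264600 samples

CLASSES = ["murmur", "normal", "extrahls", "extrastole"]

def pad_to_length(signal, target_len=TARGET_LEN):
    """Zero-pad at the end (or truncate) so every clip has target_len samples."""
    signal = np.asarray(signal, dtype=np.float32)[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))

def one_hot(label, classes=CLASSES):
    """Return a vector with a 1 at the position of `label`."""
    vec = np.zeros(len(classes), dtype=np.int64)
    vec[classes.index(label)] = 1
    return vec

clip = np.ones(3 * SAMPLE_RATE)  # stand-in for a 3-second recording
padded = pad_to_length(clip)
print(padded.shape, one_hot("extrahls"))
```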
| Hyperparameter | Value |
| --- | --- |
| Learning Rate Schedule | Step Decay |
| Learning Rate Factor | 2e-5 |
| Learning Rate Patience | 35 |
| Activation Function | ReLU, Softmax |
| Optimizer | Adam |
| Loss Function | Categorical Cross Entropy |
| Epochs | 500 |
| Early Stopping | True |
| Early Stopping Patience | 50 |
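The step decay can be illustrated without TensorFlow. This sketch assumes the classifiers use the same values and boundaries as the `PiecewiseConstantDecay` scheduler code that appears later in this document, which matches the factor of 2e-5 and the interval of 35 epochs in the table; the helper `lr_at` is hypothetical.

```python
import numpy as np

# Same values and boundaries as the scheduler code later in this document
values = np.arange(0.000001, 0.0003, 0.00002)[::-1]
boundaries = np.arange(10, 600, 35)[: values.shape[0] - 1]

def lr_at(epoch):
    """Learning rate at `epoch`: values[i] applies while epoch <= boundaries[i]."""
    return float(values[np.searchsorted(boundaries, epoch, side="left")])

for epoch in (0, 10, 11, 45, 46, 499):
    print(epoch, lr_at(epoch))
```

Under these assumptions the rate starts near 2.81e-4, drops by 2e-5 after epoch 10 and then after every further 35 epochs, and bottoms out at 1e-6.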
We try different architectures for the fully connected neural network.
Model: "ClassifierA"
Layer (type) Output Shape Param #
flatten_21 (Flatten) (None, 40) 0
dense_57 (Dense) (None, 2048) 83968
dense_58 (Dense) (None, 512) 1049088
dense_59 (Dense) (None, 4) 2052
Total params: 1,135,108
Trainable params: 1,135,108
Non-trainable params: 0
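The parameter counts in the summary above can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# 40 input features, followed by the three Dense layers of ClassifierA
layer_sizes = [40, 2048, 512, 4]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [83968, 1049088, 2052] 1135108
```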
| Information | Value |
| --- | --- |
| Number of epochs | 171 |
| Training Accuracy | 0.93 |
| Training Loss | 0.24 |
| Validation Accuracy | 0.78 |
| Validation Loss | 0.72 |
We add dropout regularization to the model to reduce overfitting.
Model: "ClassifierB"
Layer (type) Output Shape Param #
flatten_22 (Flatten) (None, 40) 0
dense_60 (Dense) (None, 2048) 83968
dense_61 (Dense) (None, 512) 1049088
dropout_13 (Dropout) (None, 512) 0
dense_62 (Dense) (None, 4) 2052
Total params: 1,135,108
Trainable params: 1,135,108
Non-trainable params: 0
Number of epochs : 308
Training Accuracy : 0.89
Training Loss : 0.30
Validation Accuracy : 0.74
Validation Loss : 0.61
Model: "ClassifierC"
Layer (type) Output Shape Param #
flatten_29 (Flatten) (None, 40) 0
dense_85 (Dense) (None, 2048) 83968
dense_86 (Dense) (None, 1024) 2098176
dense_87 (Dense) (None, 64) 65600
dense_88 (Dense) (None, 64) 4160
dense_89 (Dense) (None, 4) 260
Total params: 2,252,164
Trainable params: 2,252,164
Non-trainable params: 0
Number of epochs : 169
Training Accuracy : 0.86
Training Loss : 0.32
Validation Accuracy : 0.76
Validation Loss : 0.63
The best model is the first model (ClassifierA).
| Model | Accuracy | Loss | AUC |
| --- | --- | --- | --- |
| C | 0.76 | 0.63 | 0.97 |
| A | 0.83 | 0.71 | 0.92 |
precision recall f1-score support
murmur 0.86 0.71 0.77 17
normal 0.75 0.63 0.69 19
extrahls 0.95 1.00 0.98 20
extrastole 0.78 0.95 0.86 22
accuracy 0.83 78
macro avg 0.83 0.82 0.82 78
weighted avg 0.83 0.83 0.83 78
precision recall f1-score support
murmur 0.65 0.76 0.70 17
normal 0.80 0.42 0.55 19
extrahls 0.95 1.00 0.98 20
extrastole 0.78 0.95 0.86 22
accuracy 0.79 78
macro avg 0.80 0.79 0.77 78
weighted avg 0.80 0.79 0.78 78
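As a consistency check on the two reports above, overall accuracy can be recovered from the per-class recall and support, because recall times support gives the number of correctly classified samples in each class. The variable names are hypothetical; the numbers are copied from the reports, so the counts are rounded to the nearest integer.

```python
supports = {"murmur": 17, "normal": 19, "extrahls": 20, "extrastole": 22}
recall_first = {"murmur": 0.71, "normal": 0.63, "extrahls": 1.00, "extrastole": 0.95}
recall_second = {"murmur": 0.76, "normal": 0.42, "extrahls": 1.00, "extrastole": 0.95}

def accuracy_from_recall(recall, supports):
    """Return (correct predictions, accuracy) implied by recall and support."""
    correct = sum(round(recall[c] * supports[c]) for c in supports)
    return correct, correct / sum(supports.values())

first = accuracy_from_recall(recall_first, supports)
second = accuracy_from_recall(recall_second, supports)
print(first, second)
```

The implied accuracies round to 0.83 and 0.79, matching the accuracy rows of the two reports.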
We try different architectures for the convolutional neural network.
Model: "sequential_9"
Layer (type) Output Shape Param #
conv1d_38 (Conv1D) (None, 40, 64) 256
conv1d_39 (Conv1D) (None, 40, 64) 12352
max_pooling1d_19 (MaxPooling (None, 20, 64) 0
conv1d_40 (Conv1D) (None, 20, 32) 6176
conv1d_41 (Conv1D) (None, 20, 32) 3104
max_pooling1d_20 (MaxPooling (None, 10, 32) 0
flatten_23 (Flatten) (None, 320) 0
dense_63 (Dense) (None, 64) 20544
dense_64 (Dense) (None, 4) 260
Total params: 42,692
Trainable params: 42,692
Non-trainable params: 0
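The counts in this summary can be reproduced with the usual formulas. The kernel size of 3 is an assumption inferred from the first layer (256 = 3 × 1 × 64 + 64), and one input channel is assumed; the source does not state either value. The helper names are introduced only for this check.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """One kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

KERNEL = 3  # assumption inferred from the parameter counts

per_layer = [
    conv1d_params(KERNEL, 1, 64),    # first Conv1D, 64 filters
    conv1d_params(KERNEL, 64, 64),   # second Conv1D, 64 filters
    conv1d_params(KERNEL, 64, 32),   # third Conv1D, 32 filters
    conv1d_params(KERNEL, 32, 32),   # fourth Conv1D, 32 filters
    dense_params(320, 64),           # Dense after flattening 10 x 32
    dense_params(64, 4),             # output layer
]
total = sum(per_layer)
print(per_layer, total)  # [256, 12352, 6176, 3104, 20544, 260] 42692
```

The much smaller total compared with the fully connected models (about 43 thousand versus over 1 million parameters) comes from weight sharing in the convolutional layers.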
| Information | Value |
| --- | --- |
| Number of epochs | 145 |
| Training Accuracy | 0.9667 |
| Training Loss | 0.1258 |
| Validation Accuracy | 0.8333 |
| Validation Loss | 0.6647 |
We add batch normalization and dropout.
Model: "sequential_10"
Layer (type) Output Shape Param #
conv1d_42 (Conv1D) (None, 40, 64) 256
conv1d_43 (Conv1D) (None, 40, 64) 12352
max_pooling1d_21 (MaxPooling (None, 20, 64) 0
batch_normalization_8 (Batch (None, 20, 64) 256
conv1d_44 (Conv1D) (None, 20, 32) 6176
conv1d_45 (Conv1D) (None, 20, 32) 3104
max_pooling1d_22 (MaxPooling (None, 10, 32) 0
batch_normalization_9 (Batch (None, 10, 32) 128
flatten_24 (Flatten) (None, 320) 0
dropout_14 (Dropout) (None, 320) 0
dense_65 (Dense) (None, 64) 20544
dropout_15 (Dropout) (None, 64) 0
dense_66 (Dense) (None, 4) 260
Total params: 43,076
Trainable params: 42,884
Non-trainable params: 192
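The non-trainable parameters come from the batch normalization layers. Each one keeps four values per channel: a trainable scale and shift, and a non-trainable moving mean and moving variance. A short check against the summary, using a helper name introduced only here:

```python
def batchnorm_params(channels):
    """Return (trainable, non_trainable) parameter counts for one layer."""
    trainable = 2 * channels       # scale (gamma) and shift (beta)
    non_trainable = 2 * channels   # moving mean and moving variance
    return trainable, non_trainable

base_total = 42692  # total of the same network without batch normalization
bn_layers = [batchnorm_params(64), batchnorm_params(32)]

total = base_total + sum(t + n for t, n in bn_layers)
non_trainable = sum(n for _, n in bn_layers)
trainable = total - non_trainable
print(total, trainable, non_trainable)  # 43076 42884 192
```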
Note: The model is trained for 500 epochs because we do not use early stopping.
| Information | Value |
| --- | --- |
| Number of epochs | 500 |
| Training Accuracy | 0.9972 |
| Training Loss | 0.02 |
| Validation Accuracy | 0.7821 |
| Validation Loss | 0.7693 |
We add more layers to the model.
Model: "sequential_11"
Layer (type) Output Shape Param #
conv1d_46 (Conv1D) (None, 40, 64) 256
conv1d_47 (Conv1D) (None, 40, 64) 12352
max_pooling1d_23 (MaxPooling (None, 20, 64) 0
conv1d_48 (Conv1D) (None, 20, 32) 6176
conv1d_49 (Conv1D) (None, 20, 32) 3104
max_pooling1d_24 (MaxPooling (None, 10, 32) 0
conv1d_50 (Conv1D) (None, 10, 16) 1552
conv1d_51 (Conv1D) (None, 10, 16) 784
max_pooling1d_25 (MaxPooling (None, 5, 16) 0
flatten_25 (Flatten) (None, 80) 0
dense_67 (Dense) (None, 64) 5184
dense_68 (Dense) (None, 64) 4160
dense_69 (Dense) (None, 4) 260
Total params: 33,828
Trainable params: 33,828
Non-trainable params: 0
| Information | Value |
| --- | --- |
| Number of epochs | 144 |
| Training Accuracy | 0.9972 |
| Training Loss | 0.0565 |
| Validation Accuracy | 0.8077 |
| Validation Loss | 0.7947 |
The best model is the first model (model A in the table below).
| Model | Accuracy | Loss | AUC |
| --- | --- | --- | --- |
| A | 0.7436 | 0.5520 | 0.946 |
| C | 0.73 | 0.79 | 0.92 |
precision recall f1-score support
murmur 0.65 0.65 0.65 17
normal 0.53 0.42 0.47 19
extrahls 0.95 1.00 0.98 20
extrastole 0.76 0.86 0.81 22
accuracy 0.74 78
macro avg 0.72 0.73 0.73 78
weighted avg 0.73 0.74 0.73 78
precision recall f1-score support
murmur 0.65 0.65 0.65 17
normal 0.50 0.53 0.51 19
extrahls 0.95 1.00 0.98 20
extrastole 0.80 0.73 0.76 22
accuracy 0.73 78
macro avg 0.72 0.73 0.72 78
weighted avg 0.73 0.73 0.73 78
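- CNN models take much less time to train than feed-forward networks.
- The CNN models are less accurate on the test set than the feed-forward networks (best test accuracy 0.74 versus 0.83).
- Regularization by adding dropout does not always prevent overfitting: ClassifierB and the CNN with batch normalization and dropout still show a large gap between training and validation accuracy (0.89 versus 0.74 and 0.9972 versus 0.7821).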
Model architecture:
from keras.models import Sequential
from keras.layers import Dense

def create_mlp(dim, regress=False):
    # Define the MLP network: three fully connected ReLU layers
    model = Sequential()
    model.add(Dense(2048, input_dim=dim, activation="relu"))
    model.add(Dense(512, activation="relu"))
    model.add(Dense(256, activation="relu"))
    # Add a single linear output node when the model is used for regression
    if regress:
        model.add(Dense(1, activation="linear"))
    # Return the (uncompiled) model
    return model
Adam optimizer:
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.optimizers import Adam

# Learning rates from about 2.8e-4 down to 1e-6 in steps of 2e-5
values = np.arange(0.000001, 0.0003, 0.00002)[::-1]
# values = np.array([0.00003,0.00005,0.00007,0.00009,0.0001,0.0003])[::-1]
# Epochs after which the learning rate drops: 10, 45, 80, ...
boundaries = np.arange(10, 600, 35)[:values.shape[0]-1]
scheduler = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    list(boundaries), list(values))
lrscheduler = tf.keras.callbacks.LearningRateScheduler(scheduler, verbose=1)

# R-squared is tracked as a metric alongside the MSE loss
metric = tfa.metrics.r_square.RSquare()
opt = Adam(learning_rate=1e-3, decay=1e-3 / 200)
model.compile(loss="mse", optimizer=opt, metrics=[metric])
Training loss, R-squared = 0.06839843094348907, -0.07385599613189697
Mean absolute error = 0.22
Mean squared error = 0.07
Median absolute error = 0.22
Explained variance score = 0.0
R2 score = -0.03
mean -12.865519
std 160.42049
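One possible reading of an explained variance of 0.0 together with a slightly negative R2 score is that the predictions are close to constant. The sketch below shows how the two scores behave for a constant predictor; the targets are made up for illustration and the helper names are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); ignores a constant offset."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

y_true = np.array([0.1, 0.4, 0.2, 0.7, 0.6])   # made-up targets, mean 0.4
y_const = np.full_like(y_true, 0.3)            # model that always predicts 0.3

print(r2(y_true, y_const), explained_variance(y_true, y_const))
```

A constant prediction explains none of the variance, and any offset from the target mean pushes R2 below zero.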
Model architecture:
import tensorflow_addons as tfa
from keras.models import Sequential
from keras.layers import Dense, Conv1D, Flatten
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# 1D convolution over 20 input steps with one channel each
model = Sequential()
model.add(Conv1D(256, 2, activation="relu", input_shape=(20, 1)))
# Dense layers act on the last axis of the convolution output
model.add(Dense(128, activation="relu"))
model.add(Dense(64, activation="relu"))
# R-squared is tracked as a metric alongside the MSE loss
metric = tfa.metrics.r_square.RSquare()
model.compile(loss="mse", optimizer="adam", metrics=[metric])
The same learning rate and metrics are used as for the feed-forward network.
Training loss, R-squared = 0.06612320989370346, 0.0006309747695922852
Mean absolute error = 0.22
Mean squared error = 0.07
Median absolute error = 0.22
Explained variance score = 0.0
R2 score = -0.03
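The CNN achieved slightly better training values than the feed-forward network in the regression task (loss 0.066 versus 0.068, R-squared 0.0006 versus -0.074); the other reported metrics are identical for both models.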