Autoencoder for sound data in Keras - audio

I have a 2D array of log-scaled mel-spectrograms of sound samples for 5 different categories.
For training I have used a convolutional and dense neural network in Keras. Here is the code:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, BatchNormalization, Dropout
from keras.optimizers import Adam
import keras.initializers as initializers

model = Sequential()
model.add(Conv1D(80, 8, activation='relu', padding='same', input_shape=(60, 108)))
model.add(MaxPooling1D(2, padding='same', strides=None))
model.add(Flatten())
initializer = initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(5, activation='softmax', kernel_initializer=initializer, bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.01),
              metrics=['accuracy'])
What kind of autoencoder can I apply to this type of input data? What model? Any suggestions or code examples would be helpful. :)

Since I don't have answers to my questions about the nature of the data, I will assume that we have a set of 2-dimensional data with a shape like (NSamples, 60, 108). I also assume that the answer to my suggestion to use Conv2D instead of Conv1D is yes.
Here is a sample of a convolutional autoencoder, a model which can reuse the trained autoencoder, and how to transfer the weights from the autoencoder into the final model:
from keras.layers.core import Dense, Dropout, Flatten, Reshape
from keras.layers import Conv2D, Deconv2D, MaxPooling2D, UpSampling2D, BatchNormalization
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
import keras.models as models
import keras.initializers as initializers
from sklearn.model_selection import train_test_split

ae = models.Sequential()
#model.add(Conv1D(80, 8, activation='relu', padding='same',input_shape=(60,108)))
#encoder
c = Conv2D(80, 3, activation='relu', padding='same', input_shape=(60, 108, 1))
ae.add(c)
ae.add(MaxPooling2D(pool_size=(2, 2), padding='same', strides=None))
ae.add(Flatten())
initializer = initializers.TruncatedNormal()
d1 = Dense(200, activation='relu', kernel_initializer=initializer, bias_initializer=initializer)
ae.add(d1)
ae.add(BatchNormalization())
ae.add(Dropout(0.8))
d2 = Dense(50, activation='relu', kernel_initializer=initializer, bias_initializer=initializer)
ae.add(d2)
ae.add(Dropout(0.8))
#decoder
ae.add(Dense(d2.input_shape[1], activation='sigmoid'))
ae.add(Dense(d1.input_shape[1], activation='sigmoid'))
ae.add(Reshape((30, 54, 80)))
ae.add(UpSampling2D((2, 2)))
ae.add(Deconv2D(filters=c.filters, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
ae.add(Deconv2D(filters=1, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
ae.compile(loss='binary_crossentropy',
           optimizer=Adam(lr=0.001),
           metrics=['accuracy'])
ae.summary()
#Now train your convolutional autoencoder to reconstruct your input data.
#First reshape your data to (NSamples, 60, 108, 1).
#Then train the autoencoder; it can be something like this:
#X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=43)
#pre_mcp = ModelCheckpoint("CAE.hdf5", monitor='val_acc', verbose=2, save_best_only=True, mode='max')
#pre_history = ae.fit(X_train, X_train, epochs=100, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
#model
model = models.Sequential()
#model.add(Conv1D(80, 8, activation='relu', padding='same',input_shape=(60,108)))
model.add(Conv2D(80, 3, activation='relu', padding='same', input_shape=(60, 108, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same', strides=None))
model.add(Flatten())
initializer = initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(5, activation='softmax', kernel_initializer=initializer, bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              metrics=['accuracy'])
#Set weights
model.layers[0].set_weights(ae.layers[0].get_weights())
model.layers[3].set_weights(ae.layers[3].get_weights())
model.layers[4].set_weights(ae.layers[4].get_weights())
model.layers[6].set_weights(ae.layers[6].get_weights())
model.summary()
#Now you can train your model with pre-trained weights from autoencoder
A model like this was useful for me with the MNIST dataset and improved the accuracy of the model initialized with the autoencoder weights compared with a model initialized with random weights.
However, I would recommend using several convolutional/deconvolutional layers, probably 3 or more, since in my experience convolutional autoencoders with 3 or more convolutional layers are more efficient than those with a single convolutional layer. In fact, with one convolutional layer I sometimes can't even see any accuracy improvement (see the sketch below).
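For illustration, here is a minimal sketch of such a deeper, fully convolutional autoencoder for the same (60, 108, 1) input. It is not part of the original answer; the filter counts are assumptions, and binary cross-entropy assumes the spectrograms are scaled to [0, 1]:
#sketch: a deeper, fully convolutional autoencoder
deep_ae = models.Sequential()
#encoder: 3 convolutional layers, 2 pooling steps (60x108 -> 30x54 -> 15x27)
deep_ae.add(Conv2D(32, 3, activation='relu', padding='same', input_shape=(60, 108, 1)))
deep_ae.add(MaxPooling2D((2, 2), padding='same'))
deep_ae.add(Conv2D(64, 3, activation='relu', padding='same'))
deep_ae.add(MaxPooling2D((2, 2), padding='same'))
deep_ae.add(Conv2D(128, 3, activation='relu', padding='same'))
#decoder: mirror of the encoder (15x27 -> 30x54 -> 60x108)
deep_ae.add(Conv2D(128, 3, activation='relu', padding='same'))
deep_ae.add(UpSampling2D((2, 2)))
deep_ae.add(Conv2D(64, 3, activation='relu', padding='same'))
deep_ae.add(UpSampling2D((2, 2)))
deep_ae.add(Conv2D(32, 3, activation='relu', padding='same'))
deep_ae.add(Conv2D(1, 3, activation='sigmoid', padding='same'))
deep_ae.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.001))
#deep_ae.fit(X_train, X_train, epochs=100, validation_data=(X_val, X_val), batch_size=22)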
Update:
I checked the autoencoder with the data provided by Emanuela, and I also tried different autoencoder architectures, without any success.
My hypothesis is that the data doesn't contain any significant features that can be distinguished by an autoencoder, or even a CAE.
However, it looks like my assumption about the 2-dimensional nature of the data was confirmed by reaching almost 99.99% validation accuracy:
Nevertheless, at the same time, the 97.31% accuracy on the training data can indicate potential issues with the dataset, so it looks like a good idea to revise it.
In addition, I would suggest using ensembles of networks. You could train, for example, 10 networks with different validation splits and assign a category to each item by majority vote, as in the sketch below.
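A minimal sketch of such majority voting (an illustration, not part of the original answer, assuming ensemble is a list of trained Keras models and X is the data to label):
import numpy as np

def ensemble_predict(ensemble, X):
    #each model's predicted class per sample: shape (n_models, n_samples)
    votes = np.stack([np.argmax(m.predict(X), axis=1) for m in ensemble])
    #majority vote per sample
    return np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])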
Here is my code:
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers import Conv2D, BatchNormalization
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
import keras.models as models
import keras.initializers as initializers
import msgpack
import numpy as np
with open('SoundDataX.msg', "rb") as fx, open('SoundDataY.msg', "rb") as fy:
    dataX = msgpack.load(fx)
    dataY = msgpack.load(fy)
num_samples = len(dataX)
x = np.empty((num_samples, 60, 108, 1), dtype=np.float32)
y = np.empty((num_samples, 4), dtype=np.float32)
for i in range(0, num_samples):
    x[i] = np.asanyarray(dataX[i]).reshape(60, 108, 1)
    y[i] = np.asanyarray(dataY[i])
X_train, X_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=43)
#model
model = models.Sequential()
model.add(Conv2D(128, 3, activation='relu', padding='same',input_shape=(60, 108, 1)))
model.add(Conv2D(128, 5, activation='relu', padding='same',input_shape=(60, 108, 1)))
model.add(Conv2D(128, 7, activation='relu', padding='same',input_shape=(60, 108, 1)))
model.add(Flatten())
initializer=initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(4, activation='softmax', kernel_initializer=initializer,bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['accuracy'])
model.summary()
filepath="weights-{epoch:02d}-{val_acc:.7f}-{acc:.7f}.hdf5"
mcp = ModelCheckpoint(filepath, monitor='val_acc', verbose=2, save_best_only=True, mode='max')
history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), batch_size=64, verbose=2, callbacks=[mcp])

Related

ValueError: Input 0 of layer sequential_7 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (None, 1024)

New Python developer here. I looked at other similar posts but I'm not able to get it right. I would appreciate any help.
print('X_train:', X_train.shape)
print('y_train:', y_train1.shape)
print('X_test:', X_train.shape)
print('y_test:', y_train1.shape)
X_train: (42000, 32, 32)
y_train: (42000,)
X_test: (42000, 32, 32)
y_test: (42000,)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
def featuremodel():
    model = Sequential()
    model.add(Conv2D(32, kernel_size=4, activation='relu', input_shape=(X_train.shape[0], 32, 64)))
    model.add(MaxPooling2D(pool_size=3))
    model.add(Conv2D(64, kernel_size=4, activation='relu'))
    model.add(Flatten())
    model.add(Dense(len(y_train[0]), activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['acc'])
    model.summary()
    model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
    return model
ValueError: Input 0 of layer sequential_7 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (None, 1024)
The input shape you have specified should be changed. Your input has 42000 samples, and each one has shape (32, 32). You should not pass the number of samples (42000) to the input layer, and you should add a channel dimension, so the input shape should be (32, 32, 1).
The modified code should look like this:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import tensorflow as tf
# test data
X_train = tf.random.uniform((42000, 32, 32))
y_train1 = tf.random.uniform((42000,))
X_train = tf.expand_dims(X_train, axis=3)  # add channel axis: (42000,32,32) => (42000,32,32,1)
model = Sequential()
model.add(Conv2D(32, kernel_size=4, activation='relu', input_shape=(32, 32, 1)))  # changed input shape
model.add(MaxPooling2D(pool_size=3))
model.add(Conv2D(64, kernel_size=4, activation='relu'))
model.add(Flatten())
# the last layer should match your y data; here it has 1 unit, since there is 1 y value per sample
# (a sigmoid output with binary_crossentropy is used here; a softmax over a single unit would always output 1)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adadelta',
              metrics=['acc'])
model.summary()
model.fit(X_train, y_train1, epochs=10)  # , validation_data=(X_test, y_test))

Q: ValueError Keras expected conv2d_14_input to have shape (3, 12, 1) but got array with shape (3, 12, 6500)?

I am building a CNN for non-image data in Keras 2.1.0 on Windows 10.
My input feature is a 3x12 matrix of non-negative numbers and my output is a binary multi-label vector of length 6.
I was running into this error: expected conv2d_14_input to have shape (3, 12, 1) but got array with shape (3, 12, 6500).
Here is my code:
import tensorflow as tf
from scipy.io import loadmat
import numpy as np
from tensorflow.keras.layers import BatchNormalization
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten
reshape_channel_train = loadmat('reshape_channel_train')
reshape_channel_test = loadmat('reshape_channel_test.mat')
reshape_label_train = loadmat('reshape_label_train')
reshape_label_test = loadmat('reshape_label_test')
X_train = reshape_channel_train['store_train']
X_test = reshape_channel_test['store_test']
X_train = np.expand_dims(X_train,axis = 0)
X_test = np.expand_dims(X_test, axis = 0)
Y_train = reshape_label_train['label_train']
Y_test = reshape_label_test['label_test']
classifier = Sequential()
classifier.add(Conv2D(8, kernel_size=(3,3) , input_shape=(3, 12, 1), padding="same"))
classifier.add(BatchNormalization())
classifier.add(Activation('relu'))
classifier.add(Conv2D(8, kernel_size=(3,3), input_shape=(3, 12, 1), padding="same"))
classifier.add(BatchNormalization())
classifier.add(Activation('relu'))
classifier.add(Flatten())
classifier.add(Dense(8, activation='relu'))
classifier.add(Dense(6, activation='sigmoid'))
classifier.compile(optimizer='nadam', loss='binary_crossentropy', metrics=['accuracy'])
history = classifier.fit(X_train, Y_train, batch_size=32, epochs=100,
                         validation_data=(X_test, Y_test), verbose=2)
After some searching, I used the dimension-expanding trick, but it does not seem to work:
X_train = np.expand_dims(X_train, axis=0)
X_test = np.expand_dims(X_test, axis=0)
The X_train variable containing 6500 training instances is loaded from a Matlab .mat file with dimensions 3x12x6500, where each training instance is a 3x12 matrix.
Before using the expand_dims trick, the k-th training sample could be accessed with X_train[:,:,k], X_train[:,:,k].shape would return (3, 12), and X_train.shape would return (3, 12, 6500).
After using the expand_dims trick, X_train[:,:,k].shape returns (1, 3, 6500).
Please help me with this!
Thank you
You are handling your data incorrectly. A Conv2D layer accepts data in the format (n_samples, height, width, channels), which in your case (for your X_train) becomes (6500, 3, 12, 1). You simply need to bring your data into this format:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, Flatten, Dense

# create data shaped like your Matlab data
n_class = 6
n_sample = 6500
X_train = np.random.uniform(0, 1, (3, 12, n_sample))  # (3, 12, n_sample)
Y_train = tf.keras.utils.to_categorical(np.random.randint(0, n_class, n_sample))  # (n_sample, n_classes)
# reshape your data for Conv2D
X_train = X_train.transpose(2, 0, 1)   # (n_sample, 3, 12)
X_train = np.expand_dims(X_train, -1)  # (n_sample, 3, 12, 1)
classifier = Sequential()
classifier.add(Conv2D(8, kernel_size=(3, 3), input_shape=(3, 12, 1), padding="same"))
classifier.add(BatchNormalization())
classifier.add(Activation('relu'))
classifier.add(Conv2D(8, kernel_size=(3, 3), padding="same"))
classifier.add(BatchNormalization())
classifier.add(Activation('relu'))
classifier.add(Flatten())
classifier.add(Dense(8, activation='relu'))
classifier.add(Dense(n_class, activation='softmax'))
classifier.compile(optimizer='nadam', loss='categorical_crossentropy', metrics=['accuracy'])
history = classifier.fit(X_train, Y_train, batch_size=32, epochs=2, verbose=2)
# get predictions
pred = np.argmax(classifier.predict(X_train), 1)
I also use a softmax activation with categorical_crossentropy, which is better suited for a multiclass problem, but you can modify this. Remember to apply the same data manipulation to your test data as well.
You need to pass the data_format="channels_last" argument because your channels come last.
You can try this:
x_train = x_train.reshape((6500, 3, 12, 1))
x_test = x_test.reshape((-1, 3, 12, 1))
and in each Conv2D layer pass Conv2D(<other args>, data_format="channels_last").
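For instance, applied to the first Conv2D layer of the classifier above (a sketch; the other arguments are taken from the question):
classifier.add(Conv2D(8, kernel_size=(3, 3), input_shape=(3, 12, 1),
                      padding="same", data_format="channels_last"))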

ValueError: input tensor must have rank 4 TensorFlow

I am using the fashion_mnist image database (60,000 small square 28×28 pixel grayscale images) and I am trying to apply a CNN-LSTM cascade. This is the code I am using:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Dropout, Flatten, LSTM, Dense

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
print("Shape of x_train: {}".format(x_train.shape))
print("Shape of y_train: {}".format(y_train.shape))
print()
print("Shape of x_test: {}".format(x_test.shape))
print("Shape of y_test: {}".format(y_test.shape))
# define CNN model
model = Sequential()
model.add(TimeDistributed(Conv2D(32, kernel_size=(3, 3),
                                 activation='relu',
                                 input_shape=(60000, 28, 28))))
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
## LSTM
model.add(LSTM(200, activation='relu', return_sequences=True))
model.add(Dense(128, activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
## fitting the model
model.fit(x_train, y_train, epochs=5)
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Loss: {0} - Acc: {1}'.format(test_loss, test_acc))
I get the error after running the fitting line. Can anyone help me solve it?
Define an Input layer with a four-dimensional output instead of defining the input with the Conv2D layer.
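A minimal sketch of that suggestion (not part of the original answer), assuming each image is treated as a one-frame sequence so that TimeDistributed(Conv2D) receives the (time, height, width, channels) input it expects; the layer sizes follow the question:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, TimeDistributed, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, LSTM, Dense)

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
# reshape to (samples, time, height, width, channels); here time = 1 frame per "video"
x_train = x_train.reshape(-1, 1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 1, 28, 28, 1).astype('float32') / 255.0

model = Sequential()
model.add(Input(shape=(1, 28, 28, 1)))  # explicit Input layer; each time step is rank 4
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu')))
model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.25)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(200, activation='relu'))
model.add(Dense(10, activation='softmax'))  # 10 Fashion-MNIST classes
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128)
test_loss, test_acc = model.evaluate(x_test, y_test)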

Load models in Keras

I use this code to load a model in Keras with a custom metric (AUC), but it does not work. Could you help me solve this problem?
train_datagen = ImageDataGenerator(rescale=1/255)
val_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(32, 32),
    batch_size=10,
    class_mode='binary')
val_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(32, 32),
    batch_size=10,
    class_mode='binary')
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=[keras.metrics.AUC(name='auc')])
history = model.fit_generator(train_generator,
                              steps_per_epoch=1405,
                              epochs=1,
                              validation_data=val_generator,
                              validation_steps=10)
model.save('baseline.h5')
model1 = models.load_model('baseline.h5')
I got a ValueError
ValueError: Unknown metric function: {'class_name': 'AUC', 'config': {'name': 'auc', 'dtype': 'float32', 'num_thresholds': 200, 'curve': 'ROC', 'summation_method': 'interpolation', 'thresholds': [0.005025125628140704, 0.010050251256281407, 0.01507537688442211, 0.020100502512562814
EDIT: I added the imports. I have heard about the custom_objects argument of the load_model method, but I tried custom_objects={'auc': keras.metrics.AUC(name='auc')} without success.
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten
from keras import models
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf
import os
from sklearn import metrics
from tensorflow import keras
Just don't compile the model:
model1 = models.load_model('baseline.h5', compile=False)
model1.compile(loss='binary_crossentropy',
               optimizer='rmsprop',
               metrics=[keras.metrics.AUC()])
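Alternatively, depending on the Keras version, passing the metric class through custom_objects when loading may also resolve the deserialization error (an untested sketch, not part of the original answer):
model1 = models.load_model('baseline.h5',
                           custom_objects={'AUC': keras.metrics.AUC})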

Keras CNN-LSTM : Error while making y_train

This is my first time asking a question here (which means I really need help), and sorry for my bad English. I want to make a CNN-LSTM model for video classification in Keras, but I have a problem making my y_train. I will describe the problem below.
I have a video dataset (each video has 10 frames) and I converted the videos to images.
First I split the dataset into X_train, X_test, y_train, and y_test (20% test, 80% train), and I did it like this:
X_train, X_test = img_data[:trainco], img_data[trainco:]
y_train, y_test = y[:trainco], y[trainco:]
X_train shape: (2280, 64, 64, 1) -> I have 2280 images, 64x64 height x width, 1 channel
y_train shape: (2280, 26) -> 26 classes
Then I must reshape them before entering the CNN-LSTM process. *Note: I do the same thing with X_test and y_test.
time_steps = 10 (because I have 10 frames per video)
X_train = X_train.reshape(int(X_train.shape[0] / time_steps), time_steps, X_train.shape[1], X_train.shape[2], X_train.shape[3])
y_train = y_train.reshape(int(y_train.shape[0] / time_steps), time_steps, y_train.shape[1])
X_train shape : (228, 10, 64, 64, 1), y_train shape : (228, 10, 26)
And then this is my model:
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256, return_sequences=False, input_shape=(64, 64)))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
checkpoint = ModelCheckpoint(fname, monitor='acc', verbose=1, save_best_only=True, mode='max', save_weights_only=True)
hist = model.fit(X_train, y_train, batch_size=num_batch, nb_epoch=num_epoch, verbose=1, validation_data=(X_test, y_test), callbacks=[checkpoint])
But I got an error that says:
ValueError: Error when checking target: expected dense_3 to have 2 dimensions, but got array with shape (228, 10, 26)
It says it expected 2 dimensions, so I changed the code to:
y_train = y_train.reshape(int(y_train.shape[0] / time_steps), y_train.shape[1])
And I got an error again that says:
ValueError: cannot reshape array of size 59280 into shape (228,26)
And then I changed the code again to:
y_train = y_train.reshape(y_train.shape[0], y_train.shape[1])
And I still got an error:
ValueError: Input arrays should have the same number of samples as target arrays. Found 228 input samples and 2280 target samples.
What should I do? I know the problem but I don't know how to solve it. Please help me.
I recreated a slightly simplified version of your situation to reproduce the problem. Basically, it appears that the LSTM layer (with return_sequences=False) only puts out one result for the entire sequence of time steps, thereby reducing the output from 3 dimensions to 2. If you run my program below, I've added model.summary(), which provides details of the architecture.
from keras import Sequential
from keras.layers import TimeDistributed, Dense, Conv2D, MaxPooling2D, Flatten, LSTM
import numpy as np
X_train = np.random.random((228, 10, 64, 64, 1))
y_train = np.random.randint(2, size=(228, 10, 26))
num_classes = 26
# Create the model
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), strides=(2, 2), activation='relu', padding='same'), input_shape=X_train.shape[1:]))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten(),name='Flatten'))
model.add(LSTM(256, return_sequences=False, input_shape=(64, 64)))
model.add(Dense(128))
model.add(Dense(64))
model.add(Dense(num_classes, activation='softmax', name='FinalDense'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=["accuracy"])
#
model.summary()
# hist = model.fit(X_train, y_train, epochs=1)
I believe you'll need to decide whether you want to reduce the dimension of the y_train (target) data to be consistent with the model, or change the model; both options are sketched below. I hope this helps.
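A brief sketch of those two options (not part of the original answer), assuming each video has a single label repeated across its 10 frames, as the (228, 10, 26) shape suggests:
# Option 1: keep one prediction per video and collapse the targets accordingly,
# e.g. by taking the label of the last frame of each video.
y_train_per_video = y_train[:, -1, :]  # (228, 26)
# model.fit(X_train, y_train_per_video, epochs=1)

# Option 2: keep one prediction per frame by returning the full LSTM sequence and
# wrapping the final Dense layer in TimeDistributed, so the model output is (batch, 10, 26).
# model.add(LSTM(256, return_sequences=True))
# model.add(TimeDistributed(Dense(num_classes, activation='softmax')))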
