Hand-Signs Recognition using Deep Learning Convolutional Neural Networks - keras

I am developing a CNN model to recognize 24 hand-signs of American Sign Language. I have 2500 Images/hand-sign. The data split is:
Training = 1250 Images/hand-sign
Validation = 625 Images/hand-sign
Testing = 625 Images/hand-sign
How should I proceed with training the model?
1. Should I develop a model starting from fewer hand-signs (like 5) and then increase them gradually?
2. Should I start models from scratch or use transfer learning (VGG16 or other)
Applying data augmentation, I ran some tests with VGG16 plus a dense classifier on top and got these accuracies:
Train: 0.87610877
Validation: 0.8867307
Test: 0.96533334
[Figure: accuracy and loss graphs]
Test parameters:
NUM_CLASSES = 5
EPOCHS = 50
STEPS_PER_EPOCH = 125
VALIDATION_STEPS = 75
TEST_STEPS = 75
Framework = Keras, Tensorflow
OPTIMIZER = adam
Model:
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(512, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(NUM_CLASSES, activation='softmax')
])
If I try images with a slightly different background and predict the classes (with predict_classes()), I do not get accurate results. Any suggestions on how to make the model more robust?
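One option, offered as a sketch rather than a definitive answer: stronger data augmentation (ideally combined with training images that have varied backgrounds) often helps with background sensitivity. A minimal example with Keras' ImageDataGenerator, where the directory name and parameter values are assumptions of mine:
from keras.preprocessing.image import ImageDataGenerator

# augment colour and geometry so the network cannot rely on a fixed background
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.7, 1.3),
    horizontal_flip=False)  # flipping would change the meaning of many signs

train_generator = train_datagen.flow_from_directory(
    'data/train',                      # hypothetical directory layout: one folder per class
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=32,
    class_mode='categorical')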

Related

Adding CTC Loss and CTC decode to a Keras model

I am trying to solve a use case of handwritten text recognition. I have used a CNN and an LSTM to create a network. The output of this needs to be fed to a CTC layer. I could find some code to do this in native TensorFlow. Is there an easier option for this in Keras?
model = Sequential()
model.add(Conv2D(64, kernel_size=(5,5),activation = 'relu', input_shape=(128,32,1), padding='same', data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Lambda(lambda x: x[:, :, 0, :], output_shape=(None,31,512), mask=None, arguments=None))
#model.add(Bidirectional(LSTM(256, return_sequences=True), input_shape=(31, 256)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Dense(75, activation = 'softmax'))
Any help on how we can easily add CTC loss and decode layers to this would be great.
A CTC loss function requires four arguments to compute the loss: the predicted outputs, the ground-truth labels, the input sequence length to the LSTM, and the ground-truth label length. To get these we need to create a custom loss function and pass it to the model. To make it compatible with your defined model, we need to create a model which takes these four inputs and outputs the loss. This model will be used for training; for testing, the model you created earlier can be used.
Let's rebuild the Keras model you defined, but with the functional API, so that we can create two different versions of the model: one to be used at training time and one at testing time.
# input with shape of height=32 and width=128
inputs = Input(shape=(32, 128, 1))
# convolution layer with kernel size (3,3)
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
# poolig layer with kernel size (2,2)
pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)
conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)
conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)
conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
# poolig layer with kernel size (2,1)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)
conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)
conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)
conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)
squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)
# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)
outputs = Dense(len(char_list) + 1, activation='softmax')(blstm_2)
# model to be used at test time
test_model = Model(inputs, outputs)
We will use a CTC loss function during training. So, let's implement the CTC loss function (ctc_lambda_func below) and create a training model that uses it:
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels, input_length, label_length])
#model to be used at training time
training_model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
Train this model and save the weights to an .h5 file.
Now use the test model and load the saved weights of the training model, passing by_name=True so that weights are loaded only for matching layers.
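As a rough sketch of how the two models tie together (the file name, the test_images array, and the dummy-target trick are assumptions of mine, not part of the answer above):
import numpy as np
from keras import backend as K

# the real CTC loss is already computed inside the graph, so the Keras loss just passes it through;
# fit() is then called with dummy targets, e.g. np.zeros(num_samples)
training_model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adam')

# ... train, then save and reuse the weights in the plain prediction model
training_model.save_weights('ctc_training_weights.h5')            # hypothetical file name
test_model.load_weights('ctc_training_weights.h5', by_name=True)

# greedy CTC decoding of the test-time predictions (test_images is assumed to exist)
preds = test_model.predict(test_images)
decoded, _ = K.ctc_decode(preds,
                          input_length=np.ones(preds.shape[0]) * preds.shape[1],
                          greedy=True)
decoded = K.get_value(decoded[0])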

How should I fit the sample data in model combining CNN and LSTM

I have sample data of page visits of one page for 803 days. I have extracted features from the data, like the mean, median, etc., and the final shape of the data is (803, 25). I took a train set of 640 and a test set of 160. I am trying to use a CNN+LSTM model with Keras, but I am getting an error in the model.fit method.
I have tried a Permute layer and changed the input shapes, but I am still not able to fix it.
trainX.shape = (642, 1, 25)
trainY.shape = (642,)
testX.shape = (161, 1, 25)
testY.shape = (161,)
# Basic layer
model = Sequential()
model.add(TimeDistributed(Convolution2D(filters = 32, kernel_size = (3, 3), strides=1, padding='SAME', input_shape = (642, 25, 1), activation = 'relu')))
model.add(TimeDistributed(Convolution2D(filters = 32, kernel_size = (3, 3), activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size = (2, 2))))
model.add(TimeDistributed(Convolution2D(32, 3, 3, activation = 'relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size = (2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(Permute((2, 3), input_shape=(1, 25)))
model.add(LSTM(units=54, return_sequences=True))
# To avoid overfitting
model.add(Dropout(0.2))
# Adding 6 more layers
model.add(LSTM(units=25, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=54))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(units = 1, activation='relu', kernel_regularizer=regularizers.l1(0.0001))))
model.add(PReLU(weights=None, alpha_initializer="zero")) # add an advanced activation
model.compile(optimizer = 'adam', loss = customSmapeLoss, metrics=['mae'])
model.fit(trainX, trainY, epochs = 50, batch_size = 32)
predictions = model.predict(testX)
#Runtime Error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-218-86932db86d0b> in <module>()
42
43 model.compile(optimizer = 'adam', loss = customSmapeLoss, metrics=['mae'])
---> 44 model.fit(trainX, trainY, epochs = 50, batch_size = 32)
Error - IndexError: list index out of range
When using TimeDistributed with Conv2D, input_shape must be a tuple of length 4, i.e. (timesteps, height, width, channels), e.g. input_shape=(10, 299, 299, 3).
See https://keras.io/layers/wrappers/
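A minimal, self-contained sketch of the expected shapes (the concrete numbers are placeholders, not taken from the question):
import numpy as np
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, Flatten, LSTM, Dense

model = Sequential()
# TimeDistributed(Conv2D) sees one image per timestep, so input_shape has 4 entries:
# (timesteps, height, width, channels)
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'),
                          input_shape=(10, 28, 28, 1)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(64))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# the training array must then be 5D: (samples, timesteps, height, width, channels)
x = np.random.rand(4, 10, 28, 28, 1)
y = np.random.rand(4, 1)
model.fit(x, y, epochs=1, batch_size=2, verbose=0)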
Don't you think your dataset is too small? Usually, CNN+LSTM is meant for more complicated tasks with thousands of sequential images/videos.

How to apply GridSearchCV to an autoencoder model?

I want to apply GridSearchCV to an autoencoder model. The code of the autoencoder and of GridSearchCV is added below; please tell me how to change this code so that GridSearchCV runs successfully.
autoencoder = Sequential()
# Encoder Layers
autoencoder.add(Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
# Flatten encoding for visualization
autoencoder.add(Flatten())
autoencoder.add(Reshape((4, 4, 8)))
# Decoder Layers
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(16, (3, 3), activation='relu'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(1, (3, 3), activation='sigmoid', padding='same'))
autoencoder.summary()
I want to apply grid search to the autoencoder code above:
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
model_classifier = KerasClassifier(autoencoder, verbose=1, batch_size=10, epochs=10)
# define the grid search parameters
batch_size = [10]
loss = ['mean_squared_error', 'binary_crossentropy']
optimizer = [Adam, SGD, RMSprop]
learning_rate = [0.001]
epochs = [3, 5]
param_grid = dict(optimizer=optimizer, learning_rate=learning_rate)
grid = GridSearchCV(cv=[(slice(None), slice(None))], estimator=model_classifier, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(x_train, x_train)
print("training Successfully completed")
I solved this by hard-coding it: I applied a for loop over every parameter combination and collected the results.
For best-parameter selection, I picked the combination that gave the highest results.
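A rough sketch of that manual loop (build_autoencoder is a hypothetical helper that rebuilds the autoencoder above from scratch, and the fixed epoch count is my own choice; the parameter values mirror the ones listed in the question):
from itertools import product
from keras.optimizers import Adam, SGD, RMSprop

results = []
for opt_cls, lr, loss in product([Adam, SGD, RMSprop],
                                 [0.001],
                                 ['mean_squared_error', 'binary_crossentropy']):
    model = build_autoencoder()          # hypothetical helper returning a fresh, uncompiled autoencoder
    model.compile(optimizer=opt_cls(lr=lr), loss=loss)
    history = model.fit(x_train, x_train, epochs=5, batch_size=10,
                        validation_split=0.1, verbose=0)
    results.append((opt_cls.__name__, lr, loss, min(history.history['val_loss'])))

# pick the combination with the lowest validation loss
best = min(results, key=lambda r: r[-1])
print('best parameters:', best)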

How to copy weights from a 2D convnet in to a 3D Convnet on Keras?

I'm trying to implement a 3D convnet followed by an LSTM layer for sequence generation, using 3D images as input, in Keras with the TensorFlow backend.
I would like to start training with the weights of an existing pre-trained model in order to avoid the common issues with random initialization.
In order to start with a basic example, I took VGG-16 and I implemented a "3D" version of this network (without the FC layers):
img_input = Input((100,80,80,3))
x = Conv3D(64, (3, 3 ,3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv3D(64, (3, 3 ,3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling3D((1, 2, 2), strides=(1, 2, 2), name='block1_pool')(x)
x = Conv3D(128, (3, 3 ,3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv3D(128, (3, 3 ,3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1,2, 2), name='block2_pool')(x)
x = Conv3D(256, (3, 3 ,3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv3D(256, (3, 3 , 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv3D(256, (3, 3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1,2, 2), name='block3_pool')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1, 2, 2), name='block4_pool')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1, 2, 2), name='block5_pool')(x)
So I would like to know how I can load the weights of the pretrained VGG-16 into each one of the 100 slices (my 3D images are composed of 100 80x80 RGB slices).
Any advice you can give me would be useful.
Thanks
This depends on what you are looking to do in your application.
If you are just looking to process the 3D image in terms of slices, then defining a TimeDistributed VGG16 network (Conv2D instead of Conv3D) would be the way to go.
The model then becomes something like this for each layer you define above:
img_input = Input((100,80,80,3))
x = TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', trainable=False))(img_input)
x = TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', trainable=False))(x)
x = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', trainable=False))(x)
...
...
Note that I have included the option trainable=False here. This is pretty useful if you only want to train the deeper layers and freeze the lower layers with the well-trained weights of VGG.
To load the VGG weights for the model, you can then use the load_weights function of Keras.
model.load_weights(filepath, by_name=True)
If you set the layer names which you do not want to train to be the same as what is defined in the VGG16, then you can simply load those layers by name over here.
However, spatio-temporal feature learning is something that can potentially be done much better by using 3D ConvNets.
If this is the basis of your application, then you cannot directly import VGG16 weights into a Conv3D model, because the number of parameters in each layer increases; a filter goes from 3x3 to 3x3x3, for example.
You could still load the weights layer by layer by deciding how the 3x3 VGG16 kernel should be placed within (or repeated across) the 3x3x3 kernel used for initialization. The set_weights() function takes as input a list of numpy arrays (the kernel weights and the bias, respectively). You can extract each layer's weights from VGG16, construct an equivalent Conv3D weight array from them, and feed it to your Conv3D model.
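A minimal sketch of that idea, with one possible (but not the only) choice: repeat each 2D kernel along the new depth axis and rescale. Here conv3d_model stands for your 3D network above wrapped in a Model, with layer names kept identical to VGG16; both it and the inflation scheme are assumptions of mine.
import numpy as np
from keras.applications import VGG16

vgg = VGG16(weights='imagenet', include_top=False)
kernel_2d, bias = vgg.get_layer('block1_conv1').get_weights()   # kernel shape (3, 3, 3, 64)

depth = 3  # temporal extent of the 3D kernel
# repeat along a new leading depth axis -> shape (3, 3, 3, 3, 64), and rescale so the
# response over a constant-in-time input matches the original 2D filter
kernel_3d = np.repeat(kernel_2d[np.newaxis, ...], depth, axis=0) / depth

conv3d_model.get_layer('block1_conv1').set_weights([kernel_3d, bias])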
But I would encourage you to look at existing literature and models for processing 3D images to see if those can give you the better initialization using transfer learning.
For example, C3D is one such popular model. ShapeNet and Pascal3D are popular 3D datasets.
This discussion on how to process video data might also be useful to give you better insights on how to proceed.

How can I train on video data using Keras? "transfer learning"

I want to train my model on video data for gesture recognition, and I propose using LSTMs and TimeDistributed layers. Would this be an ideal way to tackle my problem?
# Convolution
pool_size = 4
# LSTM
lstm_output_size = 1
print('Build model...')
model = Sequential()
model.add(TimeDistributed(Dense(62), input_shape=(img_width, img_height,3)))
model.add(Conv2D(32, (3, 3)))
model.add(Dropout(0.25))
model.add(Conv2D(32, (3, 3)))
model.add(MaxPooling2D(pool_size=pool_size))
# model.add(Dense(1))
model.add(TimeDistributed(Flatten()))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(lstm_output_size))
model.add(Dense(units = 1, activation = 'sigmoid'))
print('Train...')
model.summary()
# Run epochs of sampling data then training
For temporal sequence data, LSTM networks are generally the right choice. If you want to analyze video, then a combination with 2D convolutions sounds reasonable to me. However, you have to apply TimeDistributed to all layers which don't expect sequence data; in your example that means all layers except the LSTMs.
# Convolution
pool_size = 4
# LSTM
lstm_output_size = 1
print('Build model...')
model = Sequential()
model.add(TimeDistributed(Dense(62), input_shape=(img_width, img_height,3)))
model.add(TimeDistributed(Conv2D(32, (3, 3))))
model.add(Dropout(0.25))
model.add(TimeDistributed(Conv2D(32, (3, 3))))
model.add(TimeDistributed(MaxPooling2D(pool_size=pool_size)))
# model.add(Dense(1))
model.add(TimeDistributed(Flatten()))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(256, return_sequences=True))
model.add(CuDNNLSTM(lstm_output_size))
model.add(Dense(units = 1, activation = 'sigmoid'))
print('Train...')
model.summary()
# run epochs of sampling data then training
The last Dense layer can stay this way because the final LSTM doesn't output a sequence.
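One detail worth adding: with TimeDistributed layers, the input must be a sequence of frames, so input_shape needs a leading time axis. A small self-contained sketch (num_frames and the image size are placeholders of mine, and a plain LSTM is used instead of CuDNNLSTM so it runs without a GPU):
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, LSTM, Dense

num_frames, img_width, img_height = 30, 64, 64   # placeholder values

model = Sequential()
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'),
                          input_shape=(num_frames, img_width, img_height, 3)))
model.add(TimeDistributed(MaxPooling2D(pool_size=4)))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(256))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
# training batches are then shaped (samples, num_frames, img_width, img_height, 3)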
