Adding CTC Loss and CTC decode to a Keras model - keras

I am trying to solve a handwritten text recognition use case. I have built a network from CNN and LSTM layers, and its output needs to be fed to a CTC layer. I found some code that does this in native TensorFlow. Is there an easier option for this in Keras?
model = Sequential()
model.add(Conv2D(64, kernel_size=(5,5),activation = 'relu', input_shape=(128,32,1), padding='same', data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Lambda(lambda x: x[:, :, 0, :], output_shape=(None,31,512), mask=None, arguments=None))
#model.add(Bidirectional(LSTM(256, return_sequences=True), input_shape=(31, 256)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Dense(75, activation = 'softmax'))
Any help on how to easily add CTC loss and decode layers to this model would be great.

A CTC loss function requires four arguments to compute the loss: the predicted outputs, the ground-truth labels, the length of the input sequence fed to the LSTM, and the length of each ground-truth label. A standard Keras loss only receives y_true and y_pred, so we need a custom loss function. To make it compatible with your model, we create a second model that takes these four inputs and outputs the loss. That model is used for training; for testing, you can use the model you created earlier.
Let's define your Keras model in a slightly different way, so that we can create two versions of it: one for training and one for testing.
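# imports assumed by this snippet (standalone Keras; adjust to tensorflow.keras if needed)
from keras import backend as K
from keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                          Lambda, Bidirectional, LSTM, Dense)
from keras.models import Model
# input with height=32 and width=128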
inputs = Input(shape=(32, 128, 1))
# convolution layer with kernel size (3,3)
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
# pooling layer with pool size (2,2)
pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)
conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)
conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)
conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
# pooling layer with pool size (2,1)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)
conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)
conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)
conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)
squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)
# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)
outputs = Dense(len(char_list) + 1, activation='softmax')(blstm_2)
# model to be used at test time
test_model = Model(inputs, outputs)
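The number of time steps that reaches the LSTM follows from the layer settings above. As a rough check, the sketch below traces only the height and width through the pooling layers and the last convolution; the helper names `pool_out` and `conv_valid_out` are made up for this illustration and are not part of Keras.

```python
def pool_out(size, pool, stride=None):
    """Output length of a 'valid' max-pooling layer along one axis."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

def conv_valid_out(size, kernel):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

# 'same' convolutions keep height and width, so only the pooling layers
# and the final 'valid' convolution change the spatial size.
height, width = 32, 128
height, width = pool_out(height, 2), pool_out(width, 2)  # pool_1: (16, 64)
height, width = pool_out(height, 2), pool_out(width, 2)  # pool_2: (8, 32)
height, width = pool_out(height, 2), pool_out(width, 1)  # pool_4: (4, 32)
height, width = pool_out(height, 2), pool_out(width, 1)  # pool_6: (2, 32)
height, width = conv_valid_out(height, 2), conv_valid_out(width, 2)  # conv_7: (1, 31)

# The height axis has size 1 and is squeezed away; the width becomes time.
time_steps = width
```

Under these assumptions the LSTM sees 31 time steps per image, which is the value the input length passed to the CTC loss would normally take.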
We will use a CTC loss function during training. So let's implement it as ctc_lambda_func and create a training model that uses it:
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [outputs, labels, input_length, label_length])
# model to be used at training time
training_model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
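The training model expects four arrays per batch, plus a target. Below is a minimal NumPy sketch of how the three non-image arrays might be prepared. The alphabet, the sample words, and the value of 31 time steps are assumptions made for illustration; they do not come from the original post.

```python
import numpy as np

# Hypothetical alphabet and labels, for illustration only.
char_list = "abcdefghijklmnopqrstuvwxyz"
words = ["cat", "horse", "ox"]
max_label_len = max(len(w) for w in words)
time_steps = 31  # assumed width of the CNN output fed to the LSTM

def encode(word):
    """Map each character to its index in char_list."""
    return [char_list.index(ch) for ch in word]

# Pad every label to max_label_len. The pad value used here is the blank
# index, len(char_list); entries past label_length are not read by the loss.
padded_labels = np.full((len(words), max_label_len), len(char_list), dtype="float32")
for i, w in enumerate(words):
    padded_labels[i, :len(w)] = encode(w)

input_length = np.full((len(words), 1), time_steps, dtype="int64")
label_length = np.array([[len(w)] for w in words], dtype="int64")

# The training model's output is already the loss, so the target passed to
# fit() is only a placeholder of matching batch size.
dummy_target = np.zeros((len(words), 1), dtype="float32")
```

A common pattern with this kind of model is to compile it with a loss that simply returns y_pred, for example loss={'ctc': lambda y_true, y_pred: y_pred}, and to pass the placeholder target to fit().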
Train this model and save the weights to an .h5 file.
Then load the saved weights into the test model, passing by_name=True so that weights are loaded only for the layers with matching names.
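For the decode step, the softmax output of the test model still has to be turned into label sequences. The Keras backend provides K.ctc_decode for this. The NumPy sketch below only illustrates what greedy (best-path) decoding does for a single sample; it assumes the blank is the last class index, and the function name `ctc_greedy_decode` is made up for this example.

```python
import numpy as np

def ctc_greedy_decode(probs, blank_index):
    """Best-path decoding: argmax per time step, merge repeats, drop blanks."""
    best_path = np.argmax(probs, axis=-1)
    decoded = []
    previous = None
    for idx in best_path:
        # Keep a class only when it differs from the previous step
        # and is not the blank symbol.
        if idx != previous and idx != blank_index:
            decoded.append(int(idx))
        previous = idx
    return decoded

# Toy input: 6 time steps over 3 character classes plus the blank (index 3).
toy_path = [0, 0, 3, 0, 1, 1]
toy_probs = np.eye(4)[toy_path]  # one-hot "probabilities" for each step
result = ctc_greedy_decode(toy_probs, blank_index=3)
```

In the toy path the two leading 0s merge into one, the blank separates them from the next 0, and the repeated 1 collapses, so the decoded sequence is [0, 0, 1]. The indices can then be mapped back to characters through char_list.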

Related

How to pass two images directly into siamese neural network?

Basically, I am taking two images as inputs and preprocessing them and passing them as input to the Siamese CNN model.
def create_base_network_signet(input_shape):
    seq = Sequential()
    seq.add(Conv2D(96, kernel_size=(11, 11), activation='relu', name='conv1_1', strides=4, input_shape=input_shape,
                   kernel_initializer='glorot_uniform'))
    seq.add(BatchNormalization(epsilon=1e-06, axis=1, momentum=0.9))
    seq.add(MaxPooling2D((3,3), strides=(2, 2)))
    seq.add(ZeroPadding2D((2, 2)))
    seq.add(Conv2D(256, kernel_size=(5, 5), activation='relu', name='conv2_1', strides=1, kernel_initializer='glorot_uniform'))
    seq.add(BatchNormalization(epsilon=1e-06, axis=1, momentum=0.9))
    seq.add(MaxPooling2D((3,3), strides=(2, 2)))
    seq.add(Dropout(0.3))
    seq.add(ZeroPadding2D((1, 1)))
    seq.add(Conv2D(384, kernel_size=(3, 3), activation='relu', name='conv3_1', strides=1, kernel_initializer='glorot_uniform'))
    seq.add(ZeroPadding2D((1, 1)))
    seq.add(Conv2D(256, kernel_size=(3, 3), activation='relu', name='conv3_2', strides=1, kernel_initializer='glorot_uniform'))
    seq.add(MaxPooling2D((3,3), strides=(2, 2)))
    seq.add(Dropout(0.3))  # added extra
    seq.add(Flatten(name='flatten'))
    seq.add(Dense(1024, kernel_regularizer=l2(0.0005), activation='relu', kernel_initializer='glorot_uniform'))
    seq.add(Dropout(0.5))
    seq.add(Dense(128, kernel_regularizer=l2(0.0005), activation='relu', kernel_initializer='glorot_uniform'))
    seq.add(Dense(1, activation='sigmoid'))
    return seq
I would like to pass the images in roughly like this:
result = model.fit([image1, image2], y = 1, epochs=10)
However, I am getting an error:
Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'numpy.ndarray'>"}), <class 'int'>
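The error message mentions a list of NumPy arrays paired with an int, which suggests that the target y=1 is the part Keras cannot handle. One possible way to shape the data is sketched below, assuming each input is a single preprocessed image; the 155x220 grayscale size is a placeholder, not a value from the question.

```python
import numpy as np

# Hypothetical preprocessed images; the size is an assumption.
image1 = np.random.rand(155, 220, 1).astype("float32")
image2 = np.random.rand(155, 220, 1).astype("float32")

# Add a leading batch axis so each input has shape (batch, H, W, C).
batch1 = np.expand_dims(image1, axis=0)
batch2 = np.expand_dims(image2, axis=0)

# One label per image pair, as an array rather than a Python int.
y = np.array([1], dtype="float32")

# A two-input Keras model could then be trained with something like:
#   model.fit([batch1, batch2], y, epochs=10)
```

With more than one pair, the pairs would be stacked along the first axis and y would hold one label per pair.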

ResNet autoencoder in Keras: I/O issues

I am trying to code a deep autoencoder in Keras. My images have shape (4575,32,32,3) and the targets have shape (4575,1).
Here's the function:
def build_deep_autoencoder(img_shape, code_size):
    H,W,C = img_shape
    # encoder
    encoder = Sequential()
    encoder.add(L.InputLayer(img_shape))
    encoder.add(ResNet50(include_top=False,pooling='avg'))
    encoder.add(Flatten())
    encoder.add(Dense(512, activation='relu'))
    encoder.add(Dropout(0.5))
    encoder.add(BatchNormalization())
    encoder.add(Dense(256, activation='relu'))
    encoder.add(Dropout(0.5))
    encoder.add(BatchNormalization())
    encoder.add(Dense(code_size))
    # decoder
    decoder = Sequential()
    decoder.add(L.InputLayer((code_size,)))
    encoder.add(Flatten())
    decoder.add(Dense(2*2*256))
    decoder.add(Reshape((2, 2, 256)))
    decoder.add(Conv2DTranspose(filters=128, kernel_size=(3, 3), strides=2, activation='elu', padding='same'))
    decoder.add(Conv2DTranspose(filters=64, kernel_size=(3, 3), strides=2, activation='elu', padding='same'))
    decoder.add(Conv2DTranspose(filters=32, kernel_size=(3, 3), strides=2, activation='elu', padding='same'))
    decoder.add(Conv2DTranspose(filters=3, kernel_size=(3, 3), strides=2, activation=None, padding='same'))
    return encoder, decoder
encoder,decoder = build_deep_autoencoder(img_shape,code_size=2)
inp = L.Input(img_shape)
code = encoder(inp)
reconstruction = decoder(code)
autoencoder = tensorflow.keras.models.Model(inp,reconstruction)
encoder.summary()
autoencoder.compile('nadam','mse')
autoencoder.fit(x=X,y=y,epochs=10)
I am getting an error:
InvalidArgumentError: Incompatible shapes: [31,32,32,3] vs. [31,1]
[[{{node training_18/Nadam/gradients/loss_12/sequential_28_loss/MeanSquaredError/sub_grad/BroadcastGradientArgs}}]]
I am using tensorflow.python.keras
Any help would be appreciated.
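The shapes in the error, [31,32,32,3] against [31,1], look like a batch of reconstructions being compared with a batch of class targets. The NumPy sketch below reproduces that mismatch outside Keras; the helper `mse_is_computable` is made up for this illustration. It assumes the goal is reconstruction, in which case the target of an autoencoder is normally the input itself, for example autoencoder.fit(x=X, y=X, epochs=10).

```python
import numpy as np

batch = 31
reconstruction = np.zeros((batch, 32, 32, 3))  # shape of the decoder output
class_targets = np.zeros((batch, 1))           # shape of y in the question
image_targets = np.zeros((batch, 32, 32, 3))   # shape of X

def mse_is_computable(pred, target):
    """Return True if pred - target broadcasts, as a mean squared error needs."""
    try:
        np.mean((pred - target) ** 2)
        return True
    except ValueError:
        return False

labels_ok = mse_is_computable(reconstruction, class_targets)
images_ok = mse_is_computable(reconstruction, image_targets)
```

Here labels_ok is False and images_ok is True: the (31, 1) targets cannot be broadcast against the (31, 32, 32, 3) reconstructions, while targets with the image shape can.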

Hand-Signs Recognition using Deep Learning Convolutional Neural Networks

I am developing a CNN model to recognize 24 hand-signs of American Sign Language. I have 2500 Images/hand-sign. The data split is:
Training = 1250 Images/hand-sign
Validation = 625 Images/hand-sign
Testing = 625 Images/hand-sign
How should I proceed with training the model?
1. Should I develop a model starting from fewer hand-signs (like 5) and then increase them gradually?
2. Should I train models from scratch or use transfer learning (VGG16 or other)?
Using data augmentation, I ran some tests with VGG16 plus a dense classifier at the end and got these accuracies:
Train: 0.87610877
Validation: 0.8867307
Test: 0.96533334
[Figure: Accuracy and Loss Graph]
Test parameters:
NUM_CLASSES = 5
EPOCHS = 50
STEPS_PER_EPOCH = 125
VALIDATION_STEPS = 75
TEST_STEPS = 75
Framework = Keras, Tensorflow
OPTIMIZER = adam
Model:
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(512, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(NUM_CLASSES, activation='softmax')
])
If I try images with a slightly different background and predict the classes with predict_classes(), the results are not accurate. Any suggestions on how to make the model more robust?

How to apply GridSearchCV to an autoencoder model?

I want to apply GridSearchCV to an autoencoder model. The code for the autoencoder and for GridSearchCV is given below. Please tell me how to change this code so that GridSearchCV runs successfully.
autoencoder = Sequential()
# Encoder Layers
autoencoder.add(Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=x_train.shape[1:]))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(MaxPooling2D((2, 2), padding='same'))
autoencoder.add(Conv2D(8, (3, 3), strides=(2,2), activation='relu', padding='same'))
# Flatten encoding for visualization
autoencoder.add(Flatten())
autoencoder.add(Reshape((4, 4, 8)))
# Decoder Layers
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(8, (3, 3), activation='relu', padding='same'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(16, (3, 3), activation='relu'))
autoencoder.add(UpSampling2D((2, 2)))
autoencoder.add(Conv2D(1, (3, 3), activation='sigmoid', padding='same'))
autoencoder.summary()
I want to apply GridSearchCV to the autoencoder above:
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
model_classifier = KerasClassifier(autoencoder, verbose=1, batch_size=10, epochs=10)
# define the grid search parameters
batch_size = [10]
loss = ['mean_squared_error', 'binary_crossentropy']
optimizer = [Adam, SGD, RMSprop]
learning_rate = [0.001]
epochs = [3, 5]
param_grid = dict(optimizer=optimizer, learning_rate=learning_rate)
grid = GridSearchCV(cv=[(slice(None), slice(None))], estimator=model_classifier, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(x_train, x_train)
print("training Successfully completed")
I solved this by hard-coding the search: I ran a for loop over every parameter and collected the results.
To select the best parameters, I picked the combination that gave the best results.
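The hand-written loop described above could look roughly like the sketch below. The parameter values are taken from the question, but `train_and_score` is a hypothetical stand-in that returns a fake score so the loop runs without Keras; in practice it would build, compile and fit the autoencoder and return a validation loss. The sketch keeps the lowest score on the assumption that the score is a loss; for a metric where higher is better, max would be used instead.

```python
from itertools import product

param_grid = {
    "optimizer": ["adam", "sgd", "rmsprop"],
    "loss": ["mean_squared_error", "binary_crossentropy"],
    "epochs": [3, 5],
}

def train_and_score(params):
    """Hypothetical stand-in for building, compiling and fitting the model.

    Returns a fake score so that the search loop can run on its own.
    """
    return len(params["optimizer"]) + 0.1 * params["epochs"]

results = []
# Try every combination of the values in param_grid.
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    results.append((train_and_score(params), params))

# Keep the combination with the lowest score.
best_score, best_params = min(results, key=lambda item: item[0])
```

Each call to the real training function should create a fresh model, otherwise later combinations would start from weights trained by earlier ones.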

Keras Error - expected activation_1 to have shape (2622,) but got array with shape (1,)

Hello guys, I am trying to build a pretrained VGG16 model in Keras,
but it keeps giving me this error:
ValueError: Error when checking target: expected activation_1 to have
shape (2622,) but got array with shape (1,)
I was trying to create the model based on this poster: Link
Also, I took the pre-trained weights from here. The weights can be read here
This is my code:
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense, ZeroPadding2D
from keras import backend as K
# dimensions of our images.
img_width, img_height = 224, 224
train_data_dir = 'database/train'
validation_data_dir = 'database/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
# build the VGG16 network
model = applications.VGG16(weights='imagenet', include_top=False)
print('VGG Pretrained Model loaded.')
model = Sequential()
model.add(ZeroPadding2D((1, 1), input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))
model.add(Conv2D(4096, (7, 7), activation='relu'))
model.add(Dropout(0.5))
model.add(Conv2D(4096, (1, 1), activation='relu'))
model.add(Dropout(0.5))
model.add(Conv2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation('softmax'))
# model.load_weights('./vgg16_face_weights.h5')
#
# vgg_face_descriptor = Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 224,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 224)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
model.save_weights('first_try.h5')
You probably have only one folder inside 'database/train' and 'database/validation'.
Please make sure each of these two directories contains 2622 subfolders, one per class, so that Keras can generate the labels correctly.
The following example shows that the labels should have shape (batch_size, 2622).
# the above remains the same
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
import numpy as np
classes = 2622
batch_size = 4
y = np.zeros((batch_size, classes))
for i in range(batch_size):
    y[i, np.random.choice(classes)] = 1
model.fit(x=np.random.random((batch_size,)+input_shape), y=y, batch_size=batch_size)
model.save_weights('first_try.h5')
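If the labels are available as integer class ids, a one-hot array of the expected shape can be built as in the sketch below. The class ids are made-up values. It may also be worth checking the generator settings in the question: with flow_from_directory, class_mode='binary' yields one scalar label per sample, whereas class_mode='categorical' yields one-hot labels.

```python
import numpy as np

num_classes = 2622
# Integer class ids, one per sample (hypothetical values).
class_ids = np.array([0, 17, 2621, 5])

# One row per sample, with a single 1 in the column of its class.
one_hot = np.zeros((len(class_ids), num_classes), dtype="float32")
one_hot[np.arange(len(class_ids)), class_ids] = 1.0
```

The resulting array has shape (4, 2622), which matches what categorical_crossentropy expects for a 2622-way softmax output.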
EDIT:
To change the last Conv2D layer from 2622 filters to 12 filters while maintaining the loaded weights, here is a workaround:
#define model and load_weights
#......
#build a new model based on the last model
conv = Conv2D(12, (1, 1))(model.layers[-4].output)
flatten = Flatten()(conv)
softmax = Activation('softmax')(flatten)
final_model = Model(inputs=model.input, outputs=softmax)
Ref: Cannot add layers to saved Keras Model. 'Model' object has no attribute 'add'
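To see why the workaround starts from model.layers[-4], it can help to count backwards through the tail of the Sequential model in the question. The strings below are descriptive labels chosen for this sketch, not actual Keras layer names.

```python
# The last five layers of the Sequential model in the question, in order.
tail_layers = [
    "Conv2D(4096, (1, 1))",
    "Dropout(0.5)",
    "Conv2D(2622, (1, 1))",
    "Flatten()",
    "Activation('softmax')",
]

# Negative indices count from the end of the list.
reused_output = tail_layers[-4]   # last layer kept from the loaded model
replaced = tail_layers[-3:]       # layers rebuilt with 12 filters instead
```

So index -4 is the Dropout layer just before the 2622-filter convolution, and the three layers after it are the ones the new model replaces.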
