Is it possible to train a CNN starting at an intermediate layer (in general and in Keras)? - keras

I'm using mobilenet v2 to train a model on my images. I've frozen all but a few layers and then added additional layers for training. I'd like to be able to train from an intermediate layer rather than from the beginning. My questions:
Is it possible to provide the output of the last frozen layer as the
input for training (it would be a tensor of (?, 7,7,1280))?
How does one specify training to start from that first trainable
(non-frozen) layer? In this case, mbnetv2_conv.layer[153].
What is y_train in this case? I don't quite understand how y_train
is being used during the training process- in general, when does the
CNN refer back to y_train?
Load mobilenet v2
image_size = 224
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
# Freeze all layers except the last 3 layers
for layer in mbnetv2_conv.layers[:-3]:
layer.trainable = False
# Create the model
model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))
model.summary()
# Build an array (?,224,224,3) from images
x_train = np.array(all_images)
# Get layer output
from keras import backend as K
get_last_frozen_layer_output = K.function([mbnetv2_conv.layers[0].input],
[mbnetv2_conv.layers[152].output])
last_frozen_layer_output = get_last_frozen_layer_output([x_train])[0]
# Compile the model
from keras.optimizers import SGD
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])
# how to train from a specific layer and what should y_train be?
model.fit(last_frozen_layer_output, y_train, batch_size=2, epochs=10)

Yes, you can. Two different ways.
First, the hard way makes you build two new models, one with all your frozen layers, one with all your trainable layers. Add a Flatten() layer to the frozen-layers-only model. And you will copy the weights from mobilenet v2 layer by layer to populate the weights of the frozen-layers-only model. Then you will run your input images through the frozen-layers-only model, saving the output to disk in CSV or pickle form. This is now the input for your trainable-layers model, which you train with the model.fit() command as you did above. Save the weights when you're done training. Then you will have to build the original model with both sets of layers, and load the weights into each layer, and save the whole thing. You're done!
However, the easier way is to save the weights of your model separately from the architecture with:
model.save_weights(filename)
then modify the layer.trainable property of the layers in MobileNetV2 before you add it into a new empty model:
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
layer.trainable = False
model = models.Sequential()
model.add(mbnetv2_conv)
then reload the weights with
newmodel.load_weights(filename)
This lets you adjust which layers in your mbnetv2_conv model you will be training on the fly, and then just call model.fit() to continue training.

Related

Why is the validation accuracy constant at 20%?

I am trying to implement a 5 class animal classifier using Keras. I am building the CNN from scratch and the weird thing is, the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders. And each of the folders contains 5 folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizer but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contigious array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the ouput layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
classes=['frog', 'giraffe', 'horse', 'tiger','dog'],
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
classes=['frog', 'giraffe', 'horse', 'tiger','dog'],
target_size=(img_width, img_height),
batch_size=batch_size,
class_mode='categorical',
shuffle=False)
hist=History()
model.fit_generator(train_generator,
steps_per_epoch=nb_train_samples // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps=nb_validation_samples // batch_size,
callbacks=[hist])
model.save('models/basic_cnn_from_scratch_model.h5') #Save the model weights #Load using: model = load_model('cnn_from_scratch_weights.h5') from keras.models import load_model
print("Time taken to train the baseline model from scratch: ",datetime.now()-global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere)
Properly normalize all data (I see you are doing rescale=1./255, maybe okay)
Proper train/val split (you seem to be doing that too)
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense. That's what works best for image classification problems. You can also look at popular architectures that are tried and tested; e.g. AlexNet
Can change the optimizer to Adam and try with different learning rates
Have a look at your training and validation loss graphs and see if they look as expected
Also, I guess you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
It looks as if your output is always the same animal, thus you have a 20% accuracy. I highly recommend you to check your testing outputs to see if they are all the same.
Also you said that you were building a CNN but in the code snipet you posted I see only dense layers, it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
The models seems to be working now. I have removed shuffle=False attribute. Corrected the input shape for the 2nd convolution layer. Changed the optimizer to adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting in the model. I will have to use some aggressive dropouts to reduce them. Thanks!

Retraining the Inception V3 Model for Machine Learning

I'm doing image classification with two classes using the Inception V3 model. Since I'm using two new classes(Normal and Abnormal) I'm freezing the top layers of the Inception V3 model and replacing it with my own.
base_model = keras.applications.InceptionV3(
weights ='imagenet',
include_top=False,
input_shape = (img_width,img_height,3))
#Classifier Model ontop of Convolutional Model
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None)),
model_top.add(keras.layers.Dense(400,activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1,activation = 'sigmoid'))
model = keras.models.Model(inputs = base_model.input, outputs = model_top(base_model.output))
Is freezing the convolutional layers this way in Inception V3 necessary for training?
#freeze the convolutional layers of InceptionV3
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer = keras.optimizers.Adam(
lr=0.00002,
beta_1=0.9,
beta_2=0.999,
epsilon=1e-08),
loss='binary_crossentropy',
metrics=['accuracy'])
No, it is not necessary to freeze the first layers of a CNN; you can just initialize the weights from a pre-trained model. However, in most cases it is recommended to freeze them as the features they can extract are generic enough to help in any image-related task and doing so can speed up the training process.
That being said, you should experiment a bit with the number of layers you want to freeze. Allowing the latter layers of your base_model to fine-tune on your task could improve performance. You can think of it like a hyper-parameter of your model. Say you want to freeze only the first 30 layers:
for layer in model.layers[:30]:
layer.trainable = False

Is there any method to plot vector(matrix) values after each NN layer

I modified the existing activation function and using it in the Convolutional layer of the Neural Network. I would like to know how does it perform compared to the existing activation function.Is there any method/function to plot in a graph the results(matrix values) after each Neural network layer,so that I could customise my activation function according to the values for better results?
model = Sequential()
e = Embedding(vocab_size, 100, weights=[embedding_matrix], input_length=max_length, trainable=False)
model.add(e)
model.add(Conv1D(64,kernel_size,padding='valid',activation=newactivation,strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Conv1D(256,kernel_size,padding='valid',activation=newactivation,strides=1))
model.add(MaxPooling1D(pool_size=pool_size))
model.add(Bidirectional(GRU(gru_output_size, dropout=0.2, recurrent_dropout=0.2)))
model.add(Bidirectional(LSTM(lstm_output_size)))
model.add(Dense(nclass, activation='softmax'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())
model.fit(padded_docs,y_train, epochs=epoch_size, verbose=0)
loss, accuracy = model.evaluate(tpadded_docs, y_test, verbose=0)
I cannot comment yet so I post this as an answer:
Refer to the Keras FAQ: "How can I obtain the output of an intermediate layer?"
It shows you how you can access the output of each layer. If you are using the version that uses the keras function, you can even access the output of the model in the learning phase (if your model contains layer that behave differently in training vs. testing).

ValueError when Fine-tuning Inception_v3 in Keras

I am trying to fine-tune pre-trained Inceptionv3 in Keras for a multi-label (17) prediction problem.
Here's the code:
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a new top layer
x = base_model.output
predictions = Dense(17, activation='sigmoid')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(loss='binary_crossentropy', # We NEED binary here, since categorical_crossentropy l1 norms the output before calculating loss.
optimizer=SGD(lr=0.0001, momentum=0.9))
# Fit the model (Add history so that the history may be saved)
history = model.fit(x_train, y_train,
batch_size=128,
epochs=1,
verbose=1,
callbacks=callbacks_list,
validation_data=(x_valid, y_valid))
But I got into the following error message and had trouble deciphering what it is saying:
ValueError: Error when checking target: expected dense_1 to have 4
dimensions, but got array with shape (1024, 17)
It seems to have something to do with that it doesn't like my one-hot encoding for the labels as target. But how do I get 4 dimensions target?
It turns out that the code copied from https://keras.io/applications/ would not run out-of-the-box.
The following post has helped me:
Keras VGG16 fine tuning
The changes I need to make are the following:
Add in the input shape to the model definition base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3)), and
Add a Flatten() layer to flatten the tensor output:
x = base_model.output
x = Flatten()(x)
predictions = Dense(17, activation='sigmoid')(x)
Then the model works for me!

Training only one output of a network in Keras

I have a network in Keras with many outputs, however, my training data only provides information for a single output at a time.
At the moment my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training and then doing a single batch update. If I'm right this is the same as setting the loss for all outputs to zero except the one that I'm trying to train.
Is there a better way? I've tried class weights where I set a zero weight for all but the output I'm training but it doesn't give me the results I expect?
I'm using the Theano backend.
Outputting multiple results and optimizing only one of them
Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:
Let's start with this model:
inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)
# final output. used for loss calculation and optimization
result = Dense(1, activation='softmax', name='result')(useful_info)
Compile with multiple outputs, set loss as None for extra outputs:
Give None for outputs that you don't want to use for loss calculation and optimization
model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
loss=['categorical_crossentropy', None],
metrics=['accuracy'])
Provide only target outputs when training. Skipping extra outputs:
model.fit(my_inputs, {'result': train_labels}, epochs=.., batch_size=...)
# this also works:
#model.fit(my_inputs, [train_labels], epochs=.., batch_size=...)
One predict to get them all
Having one model you can run predict only once to get all outputs you need:
predicted_labels, useful_info = model.predict(new_x)
In order to achieve this I ended up using the 'Functional API'. You basically create multiple models, using the same layers input and hidden layers but different output layers.
For example:
https://keras.io/getting-started/functional-api-guide/
from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions_A = Dense(1, activation='softmax')(x)
predictions_B = Dense(1, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])
modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
loss='categorical_crossentropy',
metrics=['accuracy'])

Resources