Retraining the Inception V3 Model for Machine Learning - python-3.x

I'm doing image classification with two classes using the Inception V3 model. Since I'm using two new classes (Normal and Abnormal), I'm removing the top of the Inception V3 model, freezing the convolutional layers, and adding my own classifier on top.
import keras

base_model = keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(img_width, img_height, 3))

# Classifier model on top of the convolutional base
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:]))
model_top.add(keras.layers.Dense(400, activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1, activation='sigmoid'))

model = keras.models.Model(inputs=base_model.input, outputs=model_top(base_model.output))
Is freezing the convolutional layers this way in Inception V3 necessary for training?
# Freeze the convolutional layers of InceptionV3
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer=keras.optimizers.Adam(
                  lr=0.00002,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08),
              loss='binary_crossentropy',
              metrics=['accuracy'])

No, it is not necessary to freeze the first layers of a CNN; you can simply initialize their weights from a pre-trained model. However, in most cases freezing them is recommended: the features they extract are generic enough to help in almost any image-related task, and freezing speeds up training.
That said, you should experiment with the number of layers you freeze. Allowing the later layers of your base_model to fine-tune on your task can improve performance; you can treat the cut-off point as a hyper-parameter of your model. Say you want to freeze only the first 30 layers:
for layer in model.layers[:30]:
    layer.trainable = False
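Note that Keras only applies trainable flags when the model is compiled, so if you change them on an already-compiled model you need to recompile. A minimal sketch, reusing the optimizer settings from the question:
# After adjusting layer.trainable, re-compile so the change takes effect
model.compile(optimizer=keras.optimizers.Adam(lr=0.00002),
              loss='binary_crossentropy',
              metrics=['accuracy'])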

Related

Question about understanding Weights of Keras LSTM model

I am implementing Federated Learning (FL) using a Keras LSTM. (The FL details are not necessary for this question.)
Starting with a simple example: multiple models are trained at different clients, each client shares its model weights with the server, and (in this simple example) the server averages the weights and sends a global model back to the clients. (Keeping a long story short.)
To keep things simple at this stage, I am using a single LSTM unit with input_shape = (1, 1).
Now, when I get the weights of the Keras LSTM, it is a list of 3 arrays. Weights[0] and Weights[1] contain floating-point values, whereas Weights[2] contains binary 0/1 values. Is my understanding correct that Weights[2] is the on/off gate associated with the tanh gate?
Is there any information about these weights?
from keras.models import Sequential
from keras.layers import LSTM

n_steps = 1
n_features = 1  # the number of past values used as input
model1 = Sequential()
model1.add(LSTM(1, activation='relu', input_shape=(n_steps, n_features)))
model1.compile(loss='mae', optimizer='adamax')
Weights = model1.get_weights()
print(model1.summary())
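For context (not part of the original post): Keras stores LSTM parameters as [kernel, recurrent_kernel, bias], and the 0/1 pattern in the third array comes from the default unit_forget_bias=True initializer, which sets the forget-gate bias to 1 and the rest to 0, rather than from an on/off gate. A quick sketch to inspect this:
kernel, recurrent_kernel, bias = model1.get_weights()
# kernel:           (n_features, 4 * units) input weights for the i, f, c, o gates
# recurrent_kernel: (units, 4 * units)      recurrent weights for the same gates
# bias:             (4 * units,)            forget-gate slice initialized to 1
print(kernel.shape, recurrent_kernel.shape, bias.shape)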

Transfer Learning on Resnet50 doesn't go beyond 40%

I'm using the Kaggle Animal-10 dataset to experiment with transfer learning using FastAI and Keras.
The base model is ResNet-50.
With FastAI I'm able to get 95% accuracy in 3 epochs:
learn.fine_tune(3, base_lr=1e-2, cbs=[ShowGraphCallback()])
I believe it only trains the top layers.
With Keras, I'm able to reach 96% accuracy only if I train the complete ResNet. If I use the code below for transfer learning, I can reach at most 40%:
from keras.models import Sequential
from keras import layers
from keras.applications import ResNet50

num_classes = 10
# Number of layers at the end of the base model to leave trainable (conv5 block)
fine_tune = 33

model = Sequential()
base_layer = ResNet50(include_top=False, pooling='avg', weights="imagenet")
# base_layer.trainable = False
# Make only the last `fine_tune` layers trainable; freeze everything before them
for layer in base_layer.layers[:-fine_tune]:
    layer.trainable = False
model.add(base_layer)
model.add(layers.Flatten())
# model.add(layers.BatchNormalization())
# model.add(layers.Dense(2048, activation='relu'))
# model.add(layers.Dropout(rate=0.2))
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(num_classes, activation='softmax'))
I assume the cause is the same as in "Transfer learning with Keras, validation accuracy does not improve from outset (beyond naive baseline) while train accuracy improves", and that's the reason I'm now re-training the complete conv5 block of ResNet; still, it doesn't add any value.
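One common culprit in this kind of setup, not visible in the snippet above, is feeding raw RGB arrays to ResNet-50 without its own input preprocessing; whether that applies here depends on the data pipeline, which the question does not show. A sketch of the fix, assuming x_train holds raw image arrays:
from keras.applications.resnet50 import preprocess_input

# ResNet-50 expects BGR channel order and ImageNet mean subtraction;
# preprocess_input applies both.
x_train = preprocess_input(x_train)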

Is it possible to train a CNN starting at an intermediate layer (in general and in Keras)?

I'm using MobileNetV2 to train a model on my images. I've frozen all but a few layers and then added additional layers for training. I'd like to be able to train from an intermediate layer rather than from the beginning. My questions:
- Is it possible to provide the output of the last frozen layer as the input for training (it would be a tensor of shape (?, 7, 7, 1280))?
- How does one specify that training should start from that first trainable (non-frozen) layer? In this case, mbnetv2_conv.layers[153].
- What is y_train in this case? I don't quite understand how y_train is used during the training process; in general, when does the CNN refer back to y_train?
Load MobileNetV2:
import numpy as np
from keras import models, layers
from keras import backend as K
from keras.applications import MobileNetV2
from keras.optimizers import SGD

image_size = 224
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False,
                           input_shape=(image_size, image_size, 3))

# Freeze all layers except the last 3
for layer in mbnetv2_conv.layers[:-3]:
    layer.trainable = False

# Create the model
model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))
model.summary()

# Build an array (?, 224, 224, 3) from images
x_train = np.array(all_images)

# Get the output of the last frozen layer
get_last_frozen_layer_output = K.function([mbnetv2_conv.layers[0].input],
                                          [mbnetv2_conv.layers[152].output])
last_frozen_layer_output = get_last_frozen_layer_output([x_train])[0]

# Compile the model
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])

# How to train from a specific layer, and what should y_train be?
model.fit(last_frozen_layer_output, y_train, batch_size=2, epochs=10)
Yes, you can, in two different ways.
The hard way makes you build two new models: one with all your frozen layers, one with all your trainable layers. Add a Flatten() layer to the frozen-layers-only model, and copy the weights from MobileNetV2 layer by layer to populate it. Then run your input images through the frozen-layers-only model, saving the output to disk in CSV or pickle form. That output is now the input to your trainable-layers model, which you train with model.fit() as you did above. Save the weights when you're done training. Finally, build the original model with both sets of layers, load the saved weights into each layer, and save the whole thing. You're done!
However, the easier way is to save the weights of your model separately from the architecture with:
model.save_weights(filename)
then modify the layer.trainable property of the layers in MobileNetV2 before you add it into a new empty model:
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False,
                           input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
    layer.trainable = False
model = models.Sequential()
model.add(mbnetv2_conv)
then reload the weights with
model.load_weights(filename)
This lets you adjust on the fly which layers of your mbnetv2_conv model you will train; after recompiling, just call model.fit() to continue training.
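Pulling those steps together, a minimal end-to-end sketch of the easier way (the filename and the model head are placeholders taken from the question's code):
# 1. Save the weights of the trained model
model.save_weights('mbnetv2_weights.h5')  # placeholder filename

# 2. Rebuild the same architecture with a different freeze point
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False,
                           input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
    layer.trainable = False

model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))

# 3. Reload the saved weights, recompile, and continue training
model.load_weights('mbnetv2_weights.h5')
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc'])
model.fit(x_train, y_train, batch_size=2, epochs=10)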

Emotion detection on text

I am a newbie in ML and was experimenting with emotion detection on text.
I have the ISEAR dataset, which contains tweets labeled with their emotion.
My current accuracy is 63% and I want to increase it to at least 70%, or maybe even more.
Here's the code:
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical

inputs = Input(shape=(MAX_LENGTH,))
embedding_layer = Embedding(vocab_size, 64, input_length=MAX_LENGTH)(inputs)
# x = Flatten()(embedding_layer)
x = LSTM(32)(embedding_layer)  # the input shape is inferred from the embedding output
x = Dense(10, activation='relu')(x)
predictions = Dense(num_class, activation='softmax')(x)

model = Model(inputs=[inputs], outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])
model.summary()

filepath = "weights-simple.hdf5"
checkpointer = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                               save_best_only=True, mode='max')
history = model.fit(X_train, to_categorical(y_train), batch_size=64, verbose=1,
                    validation_split=0.1, shuffle=True, epochs=10,
                    callbacks=[checkpointer])
That's a pretty general question; optimizing the performance of a neural network may require tuning many factors. For instance:
- The choice of optimizer: in NLP tasks, rmsprop is also a popular optimizer
- Tweaking the learning rate
- Regularization, e.g. dropout, recurrent_dropout, or batch normalization, which may help the model generalize better
- More units in the LSTM
- More dimensions in the embedding
You can try a grid search, e.g. training with different optimizers and evaluating on a validation set, as in the rough sketch below.
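Here, build_model is a hypothetical helper (not from the original answer) wrapping the model definition above with the optimizer and LSTM size as parameters:
best = None
for opt in ['adam', 'rmsprop']:
    for units in [32, 64, 128]:
        m = build_model(optimizer=opt, lstm_units=units)  # hypothetical helper
        h = m.fit(X_train, to_categorical(y_train), batch_size=64,
                  validation_split=0.1, epochs=10, verbose=0)
        score = max(h.history['val_acc'])
        if best is None or score > best[0]:
            best = (score, opt, units)

print(best)  # (best validation accuracy, optimizer, LSTM units)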
The data may also need some tweaking, such as:
- Text normalization: a better representation of the tweets, e.g. removing unnecessary tokens (#, @), as sketched below
- Shuffling the data before the fit: Keras's validation_split takes the last records as the validation set
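A minimal normalization sketch along those lines (the exact cleanup rules are an assumption, not from the original answer):
import re

def normalize_tweet(text):
    """Hypothetical minimal tweet cleanup before tokenization."""
    text = re.sub(r'https?://\S+', '', text)  # drop URLs
    text = re.sub(r'@\w+', '', text)          # drop @mentions
    text = text.replace('#', '')              # keep hashtag words, drop the symbol
    return text.lower().strip()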
There is no simple answer to your question.

shouldn't model.trainable=False freeze weights under the model?

I am trying to freeze the pre-trained VGG16 layers ('conv_base' below) and add new layers on top of them for feature extraction.
I expect to get the same prediction results from 'conv_base' before (ret1) and after (ret2) fitting the model, but that is not the case.
Is this the wrong way to check weight freezing?
Loading VGG16 and setting it to untrainable:
conv_base = applications.VGG16(weights='imagenet', include_top=False, input_shape=[150, 150, 3])
conv_base.trainable = False
Result before the model fit:
ret1 = conv_base.predict(np.ones([1, 150, 150, 3]))
Adding layers on top of VGG16 and compiling the model:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
Fitting the model:
model.fit_generator(train_generator, 100, validation_data=validation_generator, validation_steps=50)
Result after the model fit:
ret2 = conv_base.predict(np.ones([1, 150, 150, 3]))
I hope this is True, but it is not:
np.equal(ret1, ret2)
This is an interesting case, and it is caused by the following:
You cannot freeze a whole model after compilation, and it is not frozen if it is not compiled.
If you set the flag model.trainable = False, then Keras sets all layers to be non-trainable when the model is compiled. If you set this flag after compilation, it will not affect your model at all. Likewise, if you set this flag before compiling and then reuse part of the model in compiling another one, it will not affect the reused layers. So model.trainable = False works only when you apply it in the following order:
# model definition
model.trainable = False
model.compile()
In any other scenario it wouldn't work as expected.
You must freeze layers individually (before compilation):
for l in conv_base.layers:
    l.trainable = False
And if this doesn't work, you should probably freeze the layers through the new Sequential model instead.
If you have models nested in models, you should do this recursively:
def freezeLayer(layer):
    layer.trainable = False
    if hasattr(layer, 'layers'):
        for l in layer.layers:
            freezeLayer(l)

freezeLayer(model)
The top-rated answer does not work. As the official Keras documentation suggests (https://keras.io/getting-started/faq/), freezing should be performed per layer. Although a model has a "trainable" parameter, it is probably not implemented yet. The safest way is to do the following:
for layer in model.layers:
    layer.trainable = False
model.compile()
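As a quick sanity check (a sketch, not from the original answers), you can count the trainable weights after compiling; for a fully frozen model it should be zero:
model.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
print(len(model.trainable_weights))      # 0 when every layer is frozen
print(len(model.non_trainable_weights))  # all parameters end up here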
