shouldn't model.trainable=False freeze weights under the model? - keras

I am trying to freeze the free trained VGG16's layers ('conv_base' below) and add new layers on top of them for feature extracting.
I expect to get same prediction results from 'conv_base' before(ret1) / after(ret2) fit of model but it is not.
Is this wrong way to check weight freezing?
loading VGG16 and set to untrainable
conv_base = applications.VGG16(weights='imagenet', include_top=False, input_shape=[150, 150, 3]) 
conv_base.trainable = False
result before model fit
ret1 = conv_base.predict(np.ones([1, 150, 150, 3]))
add layers on top of the VGG16 and compile a model
model = models.Sequential()
model .add(conv_base)
model .add(layers.Flatten())
model .add(layers.Dense(10, activation='relu'))
model .add(layers.Dense(1, activation='sigmoid'))
m.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
fit the model
m.fit_generator(train_generator, 100, validation_data=validation_generator, validation_steps=50)
result after model fit
ret2 = conv_base.predict(np.ones([1, 150, 150, 3]))
hope this is True but it is not.
np.equal(ret1, ret2)

This is an interesting case. Why something like this happen is caused by the following thing:
You cannot freeze a whole model after compilation and it's not freezed if it's not compiled
If you set a flag model.trainable=False then while compiling keras sets all layers to be not trainable. If you set this flag after compilation - then it will not affect your model at all. The same - if you set this flag before compiling and then you'll reuse a part of a model for compiling another one - it will not affect your reused layers. So model.trainable=False works only when you'll apply it in a following order:
# model definition
model.trainable = False
model.compile()
In any other scenario it wouldn't work as expected.

You must freeze layers individually (before compilation):
for l in conv_base.layers:
l.trainable=False
And if this doesn't work, you should probably use the new sequential model to freeze the layers.
If you have models in models you should do this recursively:
def freezeLayer(layer):
layer.trainable = False
if hasattr(layer, 'layers'):
for l in layer.layers:
freezeLayer(l)
freezeLayer(model)

The top-rated answer does not work. As suggested by Keras official documentation (https://keras.io/getting-started/faq/), it should be performed per layer. Although there is a parameter "trainable" for a model, it is probably not implemented yet. The safest way is to do as follows:
for layer in model.layers:
layer.trainable = False
model.compile()

Related

CNN alternates between good performance and chance

I have a binary classification problem I am trying to solve with a CNN written in Keras. The input are very sparse 200X125X2 tensors (can be though of as two images stacked together), and its nonzero elements are only ones (representing neuron spike trains). The input is generated using a data generator that I have built, so the model is trained using the fit_generator function.
I have tried various architectures, and some show a decent performance (~88%), but the thing is that sometimes when I train new models, they don't seem to work at all, giving a chance (50%) result every epoch. The weird thing is that it happens sometimes to the same architectures that worked well before. I am running the code on Google Colab (GPU) with TensorFlow 2.0. I have check multiple times that I haven't changed anything in the code. I know that random initialization of the weights and biases may cause slight changes in the performance, but it looks like something else.
Any ideas will be very helpful. Thanks!
Here is the relevant code for one of the models that had this problem (I am using unusual kernels, I know):
# General settings
x_max = 10
x_size, t_size, N_features = parameters(x_max)
batch_size = 64
N_epochs = 10
N_final = 10*N_features
N_final = int(N_final - N_final%(batch_size))
N_val = 100*batch_size
N_test = N_final/5
# Setting up the architecture of the network and compiling
model = Sequential()
model.add(SeparableConv2D(50, (50,30), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10,6), data_format='channels_first', input_shape=(2,x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fitiing the model on generated data
filepath="......hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
start = time.time()
fit_history = model_delta.fit_generator(generator = data_generator(batch_size,x_max,'delta','_',100),
steps_per_epoch = N_final//batch_size,
validation_data = data_generator(batch_size,x_max,'delta','_',100),
validation_steps = N_val//batch_size,
callbacks = [checkpoint],
epochs = N_epochs)
end = time.time()
The most suspicious thing I see is a 'relu' near the end of the model. Depending on the initialization and on the learning rate, ReLUs can be unlucky and fall into an all-zeros case. When this happens, they completely stop gradients and don't train anymore.
By the looks of your problem (sometimes it works, sometimes it doesn't), it seems very plausible that it's the relu.
So, the first suggesion (this always solves it) is to add a batch normalization before the activation:
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid')
Hint, if you are going to use it with the 4D tensors before the flatten, remember to use the channels dimension: BatchNormalization(1).

Is it possible to train a CNN starting at an intermediate layer (in general and in Keras)?

I'm using mobilenet v2 to train a model on my images. I've frozen all but a few layers and then added additional layers for training. I'd like to be able to train from an intermediate layer rather than from the beginning. My questions:
Is it possible to provide the output of the last frozen layer as the
input for training (it would be a tensor of (?, 7,7,1280))?
How does one specify training to start from that first trainable
(non-frozen) layer? In this case, mbnetv2_conv.layer[153].
What is y_train in this case? I don't quite understand how y_train
is being used during the training process- in general, when does the
CNN refer back to y_train?
Load mobilenet v2
image_size = 224
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
# Freeze all layers except the last 3 layers
for layer in mbnetv2_conv.layers[:-3]:
layer.trainable = False
# Create the model
model = models.Sequential()
model.add(mbnetv2_conv)
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(3, activation='softmax'))
model.summary()
# Build an array (?,224,224,3) from images
x_train = np.array(all_images)
# Get layer output
from keras import backend as K
get_last_frozen_layer_output = K.function([mbnetv2_conv.layers[0].input],
[mbnetv2_conv.layers[152].output])
last_frozen_layer_output = get_last_frozen_layer_output([x_train])[0]
# Compile the model
from keras.optimizers import SGD
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['acc'])
# how to train from a specific layer and what should y_train be?
model.fit(last_frozen_layer_output, y_train, batch_size=2, epochs=10)
Yes, you can. Two different ways.
First, the hard way makes you build two new models, one with all your frozen layers, one with all your trainable layers. Add a Flatten() layer to the frozen-layers-only model. And you will copy the weights from mobilenet v2 layer by layer to populate the weights of the frozen-layers-only model. Then you will run your input images through the frozen-layers-only model, saving the output to disk in CSV or pickle form. This is now the input for your trainable-layers model, which you train with the model.fit() command as you did above. Save the weights when you're done training. Then you will have to build the original model with both sets of layers, and load the weights into each layer, and save the whole thing. You're done!
However, the easier way is to save the weights of your model separately from the architecture with:
model.save_weights(filename)
then modify the layer.trainable property of the layers in MobileNetV2 before you add it into a new empty model:
mbnetv2_conv = MobileNetV2(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))
for layer in mbnetv2_conv.layers[:153]:
layer.trainable = False
model = models.Sequential()
model.add(mbnetv2_conv)
then reload the weights with
newmodel.load_weights(filename)
This lets you adjust which layers in your mbnetv2_conv model you will be training on the fly, and then just call model.fit() to continue training.

Modify ResNet50 output layer for regression

I am trying to create a ResNet50 model for a regression problem, with an output value ranging from -1 to 1.
I omitted the classes argument, and in my preprocessing step I resize my images to 224,224,3.
I try to create the model with
def create_resnet(load_pretrained=False):
if load_pretrained:
weights = 'imagenet'
else:
weights = None
# Get base model
base_model = ResNet50(weights=weights)
optimizer = Adam(lr=1e-3)
base_model.compile(loss='mse', optimizer=optimizer)
return base_model
and then create the model, print the summary and use the fit_generator to train
history = model.fit_generator(batch_generator(X_train, y_train, 100, 1),
steps_per_epoch=300,
epochs=10,
validation_data=batch_generator(X_valid, y_valid, 100, 0),
validation_steps=200,
verbose=1,
shuffle = 1)
I get an error though that says
ValueError: Error when checking target: expected fc1000 to have shape (1000,) but got array with shape (1,)
Looking at the model summary, this makes sense, since the final Dense layer has an output shape of (None, 1000)
fc1000 (Dense) (None, 1000) 2049000 avg_pool[0][0]
But I can't figure out how to modify the model. I've read through the Keras documentation and looked at several examples, but pretty much everything I see is for a classification model.
How can I modify the model so it is formatted properly for regression?
Your code is throwing the error because you're using the original fully-connected top layer that was trained to classify images into one of 1000 classes. To make the network working, you need to replace this top layer with your own which should have the shape compatible with your dataset and task.
Here is a small snippet I was using to create an ImageNet pre-trained model for the regression task (face landmarks prediction) with Keras:
NUM_OF_LANDMARKS = 136
def create_model(input_shape, top='flatten'):
if top not in ('flatten', 'avg', 'max'):
raise ValueError('unexpected top layer type: %s' % top)
# connects base model with new "head"
BottleneckLayer = {
'flatten': Flatten(),
'avg': GlobalAvgPooling2D(),
'max': GlobalMaxPooling2D()
}[top]
base = InceptionResNetV2(input_shape=input_shape,
include_top=False,
weights='imagenet')
x = BottleneckLayer(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='linear')(x)
model = Model(inputs=base.inputs, outputs=x)
return model
In your case, I guess you only need to replace InceptionResNetV2 with ResNet50. Essentially, you are creating a pre-trained model without top layers:
base = ResNet50(input_shape=input_shape, include_top=False)
And then attaching your custom layer on top of it:
x = Flatten()(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='sigmoid')(x)
model = Model(inputs=base.inputs, outputs=x)
That's it.
You also can check this link from the Keras repository that shows how ResNet50 is constructed internally. I believe it will give you some insights about the functional API and layers replacement.
Also, I would say that both regression and classification tasks are not that different if we're talking about fine-tuning pre-trained ImageNet models. The type of task mostly depends on your loss function and the top layer's activation function. Otherwise, you still have a fully-connected layer with N outputs but they are interpreted in a different way.

Retraining the Inception V3 Model for Machine Learning

I'm doing image classification with two classes using the Inception V3 model. Since I'm using two new classes(Normal and Abnormal) I'm freezing the top layers of the Inception V3 model and replacing it with my own.
base_model = keras.applications.InceptionV3(
weights ='imagenet',
include_top=False,
input_shape = (img_width,img_height,3))
#Classifier Model ontop of Convolutional Model
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None)),
model_top.add(keras.layers.Dense(400,activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1,activation = 'sigmoid'))
model = keras.models.Model(inputs = base_model.input, outputs = model_top(base_model.output))
Is freezing the convolutional layers this way in Inception V3 necessary for training?
#freeze the convolutional layers of InceptionV3
for layer in base_model.layers:
layer.trainable = False
model.compile(optimizer = keras.optimizers.Adam(
lr=0.00002,
beta_1=0.9,
beta_2=0.999,
epsilon=1e-08),
loss='binary_crossentropy',
metrics=['accuracy'])
No, it is not necessary to freeze the first layers of a CNN; you can just initialize the weights from a pre-trained model. However, in most cases it is recommended to freeze them as the features they can extract are generic enough to help in any image-related task and doing so can speed up the training process.
That being said, you should experiment a bit with the number of layers you want to freeze. Allowing the latter layers of your base_model to fine-tune on your task could improve performance. You can think of it like a hyper-parameter of your model. Say you want to freeze only the first 30 layers:
for layer in model.layers[:30]:
layer.trainable = False

how to obtain the runtime batch size of a Keras model

Based on this post. I need some basic implementation help. Below you see my model using a Dropout layer. When using the noise_shape parameter, it happens that the last batch does not fit into the batch size creating an error (see other post).
Original model:
def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
model = Sequential()
model.add(Masking(mask_value=MaskWert, input_shape=(X_train.shape[1],X_train.shape[2]) ))
model.add(Dropout(dropout, noise_shape=(batchsize, 1, X_train.shape[2]) ))
model.add(Dense(hidden_units, activation='sigmoid', kernel_constraint=max_norm(max_value=4.) ))
model.add(LSTM(hidden_units, return_sequences=True, dropout=dropout, recurrent_dropout=dropout))
Now Alexandre Passos suggested to get the runtime batchsize with tf.shape. I tried to implement the runtime batchsize idea it into Keras in different ways but never working.
import Keras.backend as K
def backend_shape(x):
return K.shape(x)
def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
batchsize=backend_shape(X_train)
model = Sequential()
...
model.add(Dropout(dropout, noise_shape=(batchsize[0], 1, X_train.shape[2]) ))
...
But that did just give me the input tensor shape but not the runtime input tensor shape.
I also tried to use a Lambda Layer
def output_of_lambda(input_shape):
return (input_shape)
def LSTM_model_2(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
model = Sequential()
model.add(Lambda(output_of_lambda, outputshape=output_of_lambda))
...
model.add(Dropout(dropout, noise_shape=(outputshape[0], 1, X_train.shape[2]) ))
And different variants. But as you already guessed, that did not work at all.
Is the model definition actually the correct place?
Could you give me a tip or better just tell me how to obtain the running batch size of a Keras model? Thanks so much.
The current implementation does adjust the according to the runtime batch size. From the Dropout layer implementation code:
symbolic_shape = K.shape(inputs)
noise_shape = [symbolic_shape[axis] if shape is None else shape
for axis, shape in enumerate(self.noise_shape)]
So if you give noise_shape=(None, 1, features) the shape will be (runtime_batchsize, 1, features) following the code above.

Resources