Will pytorch model.train() influence frozen layers? - pytorch

I saw some examples of freezing some layers of a pytorch model, something like below.
for i, param in enumerate(Model.parameters()):
    if i < 3:
        param.requires_grad = False
    else:
        param.requires_grad = True
But I am wondering: when I put the model into training mode, i.e., by calling model.train(), will this set requires_grad=True for all layers?
How can I guarantee the first two layers stay frozen?
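For what it's worth, a minimal sketch (using a small nn.Sequential as a stand-in for the asker's model, which is not shown) to check whether train() changes the requires_grad flags:
import torch.nn as nn

# Hypothetical stand-in model; the real model is not shown in the question.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first few parameter tensors, as in the question's loop.
for i, param in enumerate(model.parameters()):
    param.requires_grad = i >= 3

flags_before = [p.requires_grad for p in model.parameters()]
model.train()  # switches layer behaviour (dropout, batch norm); it does not touch requires_grad
flags_after = [p.requires_grad for p in model.parameters()]

print(flags_before == flags_after)  # True: the frozen parameters stay frozen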

Related

How does SimCSE do dropout twice independently

How can I pass an input sentence into BERT with dropout applied twice independently?
Here is what I have tried so far; the outputs are identical.
bert = AutoModel.from_pretrained('bert-base-cased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
sent_dict = tokenizer('Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel', return_tensors='pt')
bert(**sent_dict).pooler_output == bert(**sent_dict).pooler_output
I forgot model.train() :(
Dropout only works in training mode
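A minimal sketch of that fix (assuming the transformers library and the same bert-base-cased checkpoint): putting the model into training mode activates dropout, so two forward passes over the same input should no longer match.
import torch
from transformers import AutoModel, AutoTokenizer

bert = AutoModel.from_pretrained('bert-base-cased')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
sent_dict = tokenizer('An example sentence.', return_tensors='pt')  # placeholder sentence

bert.train()  # from_pretrained() returns the model in eval mode, so dropout must be re-enabled
with torch.no_grad():
    out1 = bert(**sent_dict).pooler_output
    out2 = bert(**sent_dict).pooler_output

print(torch.allclose(out1, out2))  # expected: False, each pass samples different dropout masks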

Modify ResNet50 output layer for regression

I am trying to create a ResNet50 model for a regression problem, with an output value ranging from -1 to 1.
I omitted the classes argument, and in my preprocessing step I resize my images to 224,224,3.
I try to create the model with
def create_resnet(load_pretrained=False):
    if load_pretrained:
        weights = 'imagenet'
    else:
        weights = None

    # Get base model
    base_model = ResNet50(weights=weights)

    optimizer = Adam(lr=1e-3)
    base_model.compile(loss='mse', optimizer=optimizer)
    return base_model
and then create the model, print the summary, and use fit_generator to train:
history = model.fit_generator(batch_generator(X_train, y_train, 100, 1),
                              steps_per_epoch=300,
                              epochs=10,
                              validation_data=batch_generator(X_valid, y_valid, 100, 0),
                              validation_steps=200,
                              verbose=1,
                              shuffle=1)
I get an error though that says
ValueError: Error when checking target: expected fc1000 to have shape (1000,) but got array with shape (1,)
Looking at the model summary, this makes sense, since the final Dense layer has an output shape of (None, 1000)
fc1000 (Dense) (None, 1000) 2049000 avg_pool[0][0]
But I can't figure out how to modify the model. I've read through the Keras documentation and looked at several examples, but pretty much everything I see is for a classification model.
How can I modify the model so it is formatted properly for regression?
Your code throws the error because you're using the original fully-connected top layer, which was trained to classify images into one of 1000 classes. To make the network work, you need to replace this top layer with your own, with a shape compatible with your dataset and task.
Here is a small snippet I used to create an ImageNet pre-trained model for a regression task (face landmark prediction) with Keras:
from keras.applications import InceptionResNetV2
from keras.layers import Dense, Flatten, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.models import Model

NUM_OF_LANDMARKS = 136

def create_model(input_shape, top='flatten'):
    if top not in ('flatten', 'avg', 'max'):
        raise ValueError('unexpected top layer type: %s' % top)

    # connects the base model with the new "head"
    BottleneckLayer = {
        'flatten': Flatten(),
        'avg': GlobalAveragePooling2D(),
        'max': GlobalMaxPooling2D()
    }[top]

    base = InceptionResNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')

    x = BottleneckLayer(base.output)
    x = Dense(NUM_OF_LANDMARKS, activation='linear')(x)

    model = Model(inputs=base.inputs, outputs=x)
    return model
In your case, I guess you only need to replace InceptionResNetV2 with ResNet50. Essentially, you are creating a pre-trained model without top layers:
base = ResNet50(input_shape=input_shape, include_top=False)
And then attaching your custom layer on top of it:
x = Flatten()(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='sigmoid')(x)
model = Model(inputs=base.inputs, outputs=x)
That's it.
You can also check the ResNet50 source in the Keras repository to see how it is constructed internally. I believe it will give you some insight into the functional API and layer replacement.
Also, I would say that regression and classification tasks are not that different when it comes to fine-tuning pre-trained ImageNet models. The type of task mostly comes down to your loss function and the top layer's activation function. Otherwise, you still have a fully-connected layer with N outputs; they are just interpreted in a different way.
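Putting these pieces together for the asker's single-output regression in [-1, 1], a minimal sketch (the tanh activation and mse loss are my assumptions for that output range, not part of the original answer):
from keras.applications import ResNet50
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.optimizers import Adam

base = ResNet50(input_shape=(224, 224, 3), include_top=False, weights='imagenet')

x = Flatten()(base.output)
x = Dense(1, activation='tanh')(x)  # tanh keeps the single regression output in [-1, 1]

model = Model(inputs=base.inputs, outputs=x)
model.compile(loss='mse', optimizer=Adam(lr=1e-3))
model.summary()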

Retraining the Inception V3 Model for Machine Learning

I'm doing image classification with two classes using the Inception V3 model. Since I'm using two new classes (Normal and Abnormal), I'm dropping the top layers of the Inception V3 model and replacing them with my own.
base_model = keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(img_width, img_height, 3))

# Classifier model on top of the convolutional model
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(keras.layers.Dense(400, activation='relu'))
model_top.add(keras.layers.Dropout(0.5))
model_top.add(keras.layers.Dense(1, activation='sigmoid'))

model = keras.models.Model(inputs=base_model.input, outputs=model_top(base_model.output))
Is freezing the convolutional layers this way in Inception V3 necessary for training?
# freeze the convolutional layers of InceptionV3
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer=keras.optimizers.Adam(
                  lr=0.00002,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08),
              loss='binary_crossentropy',
              metrics=['accuracy'])
No, it is not necessary to freeze the first layers of a CNN; you can just initialize the weights from a pre-trained model. However, in most cases it is recommended to freeze them as the features they can extract are generic enough to help in any image-related task and doing so can speed up the training process.
That being said, you should experiment a bit with the number of layers you want to freeze. Allowing the later layers of your base_model to fine-tune on your task could improve performance. You can think of it as a hyperparameter of your model. Say you want to freeze only the first 30 layers:
for layer in model.layers[:30]:
    layer.trainable = False
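As a quick sanity check (a sketch of my own, not part of the answer), you can compare the number of trainable parameters before and after freezing; remember to recompile so the freeze actually takes effect for training, a point the next question digs into:
from keras import backend as K

def count_trainable_params(m):
    # Sum the element counts of all weights currently flagged as trainable
    return int(sum(K.count_params(w) for w in m.trainable_weights))

print('trainable params before freezing:', count_trainable_params(model))

for layer in model.layers[:30]:
    layer.trainable = False

# Recompile so the change is picked up by training
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print('trainable params after freezing:', count_trainable_params(model))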

shouldn't model.trainable=False freeze weights under the model?

I am trying to freeze the pre-trained VGG16 layers ('conv_base' below) and add new layers on top of them for feature extraction.
I expect to get the same prediction results from 'conv_base' before (ret1) and after (ret2) fitting the model, but they are not the same.
Is this the wrong way to check weight freezing?
Loading VGG16 and setting it to untrainable:
conv_base = applications.VGG16(weights='imagenet', include_top=False, input_shape=[150, 150, 3]) 
conv_base.trainable = False
Result before the model fit:
ret1 = conv_base.predict(np.ones([1, 150, 150, 3]))
Add layers on top of VGG16 and compile the model:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile('rmsprop', 'binary_crossentropy', ['accuracy'])
Fit the model:
model.fit_generator(train_generator, 100, validation_data=validation_generator, validation_steps=50)
Result after the model fit:
ret2 = conv_base.predict(np.ones([1, 150, 150, 3]))
I hope this is True, but it is not:
np.equal(ret1, ret2)
This is an interesting case. Something like this happens for the following reason:
You cannot freeze a whole model after compilation, and it is not frozen if it is not compiled.
If you set the flag model.trainable = False, then while compiling Keras sets all layers to be non-trainable. If you set this flag after compilation, it will not affect your model at all. Likewise, if you set this flag before compiling and then reuse part of the model when compiling another one, it will not affect the reused layers. So model.trainable = False only works when you apply it in the following order:
# model definition
model.trainable = False
model.compile()
In any other scenario it wouldn't work as expected.
You must freeze layers individually (before compilation):
for l in conv_base.layers:
    l.trainable = False
And if this doesn't work, you should probably freeze the layers through the new Sequential model instead.
If you have models nested inside models, you should do this recursively:
def freezeLayer(layer):
    layer.trainable = False
    if hasattr(layer, 'layers'):
        for l in layer.layers:
            freezeLayer(l)

freezeLayer(model)
The top-rated answer does not work. As suggested by the official Keras documentation (https://keras.io/getting-started/faq/), freezing should be performed per layer. Although a model has a "trainable" attribute, it is probably not implemented yet. The safest way is to do the following:
for layer in model.layers:
    layer.trainable = False
model.compile()
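To verify that the freeze actually held, here is a small sketch of my own that mirrors the asker's check by comparing the VGG16 weights before and after a short fit (train_generator and validation_generator are assumed to exist as in the question):
import numpy as np

# Snapshot the convolutional base's weights before training
weights_before = [w.copy() for w in conv_base.get_weights()]

model.fit_generator(train_generator, 100,
                    validation_data=validation_generator, validation_steps=50)

weights_after = conv_base.get_weights()

# If the freeze worked, every weight tensor should be unchanged
print(all(np.allclose(b, a) for b, a in zip(weights_before, weights_after)))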

What exactly does tf.contrib.rnn.DropoutWrapper in tensorflow do? (three critical questions)

As far as I know, DropoutWrapper is used as follows:
__init__(
    cell,
    input_keep_prob=1.0,
    output_keep_prob=1.0,
    state_keep_prob=1.0,
    variational_recurrent=False,
    input_size=None,
    dtype=None,
    seed=None
)
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
The only thing I know is that it is used for dropout during training.
Here are my three questions:
What are input_keep_prob, output_keep_prob and state_keep_prob, respectively? (I guess they define the dropout probability for each part of the RNN, but where exactly?)
Is dropout in this context applied to the RNN not only during training but also during prediction? If so, is there any way to decide whether or not dropout is used at prediction time?
According to the API documentation on the TensorFlow web page, if variational_recurrent=True, dropout works according to the method in the paper
"Y. Gal, Z. Ghahramani. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. https://arxiv.org/abs/1512.05287". I understood this paper roughly. When I train an RNN, I use a batch rather than a single time series. In this case, does tensorflow automatically assign a different dropout mask to each time series in the batch?
input_keep_prob is the keep probability for dropout applied to the cell's inputs, output_keep_prob is the keep probability applied to each RNN unit's output, and state_keep_prob applies to the hidden state that is fed to the next step/layer.
You can initialize each of the above-mentioned parameters as follows:
import tensorflow as tf

# Scalar placeholder that defaults to 1.0 (no dropout) unless another value is fed
dropout_placeholder = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=())

tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
                              input_keep_prob=dropout_placeholder,
                              output_keep_prob=dropout_placeholder,
                              state_keep_prob=dropout_placeholder)
The keep probability defaults to 1 (i.e., no dropout) during prediction, and we can feed any other value during training.
The masking is applied to the fitted weights rather than to the sequences included in the batch. As far as I know, the same mask is used for the entire batch.
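A rough sketch of my own (TF 1.x style, with a hypothetical single-layer RNN on dummy data) showing how such a placeholder would be fed differently at training and prediction time:
import numpy as np
import tensorflow as tf

n_steps, n_inputs, n_hidden_rnn = 5, 3, 8

x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
# Defaults to 1.0 (no dropout) unless another value is fed
keep_prob = tf.placeholder_with_default(tf.constant(1.0, tf.float32), shape=())

cell = tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
                                     input_keep_prob=keep_prob,
                                     output_keep_prob=keep_prob,
                                     state_keep_prob=keep_prob)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)

batch = np.random.rand(2, n_steps, n_inputs).astype(np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training step: feed keep_prob < 1 so dropout is active
    train_out = sess.run(outputs, feed_dict={x: batch, keep_prob: 0.5})
    # Prediction: rely on the default of 1.0, i.e. no dropout
    pred_out = sess.run(outputs, feed_dict={x: batch})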
Alternatively, you can switch the keep probability with tf.cond:
keep_prob = tf.cond(dropOut, lambda: tf.constant(0.9), lambda: tf.constant(1.0))
cells = rnn.DropoutWrapper(cells, output_keep_prob=keep_prob)
