Fine-tune a model from TensorFlow Hub

I have used this code from the TensorFlow website:
https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub
I was able to implement the entire code without any problem, but I am wondering how I can fine-tune it, since I cannot access:
base_model.trainable or base_model.layers

You could try setting trainable=True in the tutorial's feature-extractor layer, which is currently created as:

feature_extractor_layer = hub.KerasLayer(
    feature_extractor_model,
    input_shape=(224, 224, 3),
    trainable=False)
Though it is better to keep the base layers frozen at least during the initial rounds of training.
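For illustration, a minimal sketch of a fine-tuning setup, assuming the MobileNetV2 feature-vector handle used in the linked tutorial and a Dense classification head (num_classes is a placeholder for your number of classes):

import tensorflow as tf
import tensorflow_hub as hub

# Handle from the linked tutorial; swap in whatever feature extractor you used.
feature_extractor_model = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"

# trainable=True unfreezes the hub weights so they are updated during training.
feature_extractor_layer = hub.KerasLayer(
    feature_extractor_model,
    input_shape=(224, 224, 3),
    trainable=True)

model = tf.keras.Sequential([
    feature_extractor_layer,
    tf.keras.layers.Dense(num_classes)  # num_classes: placeholder
])

# A small learning rate keeps the fine-tuning updates from destroying
# the pretrained features.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['acc'])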

Related

How to fine-tune InceptionV3 in Keras

I am trying to train a classifier based on the InceptionV3 architecture in Keras.
For this I loaded the pre-trained InceptionV3 model, without top, and added a final fully connected layer for the classes of my classification problem. In the first training I froze the InceptionV3 base model and only trained the final fully connected layer.
In the second step I want to "fine tune" the network by unfreezing a part of the InceptionV3 model.
Now I know that the InceptionV3 model makes extensive use of BatchNorm layers. It is recommended (link to documentation) that when BatchNorm layers are "unfrozen" for fine-tuning during transfer learning, the means and variances computed by the BatchNorm layers be kept fixed. This is done by setting the BatchNorm layers to inference mode instead of training mode.
Please also see: What's the difference between the training argument in call() and the trainable attribute?
Now my main question is: how to set ONLY the BatchNorm layers of the InceptionV3 model to inference mode?
Currently I set the whole InceptionV3 base model to inference mode by setting the "training" argument when assembling the network:
inputs = keras.Input(shape=input_shape)
# Scale the 0-255 RGB values to 0.0-1.0 RGB values
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
# Set include_top to False so that the final fully connected (with pre-loaded weights) layer is not included.
# We will add our own fully connected layer for our own set of classes to the network.
base_model = keras.applications.InceptionV3(input_shape=input_shape, weights='imagenet', include_top=False)
x = base_model(x, training=False)
# Classification block
x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=x)
What I don't like about this is that it puts the whole base model in inference mode, which may affect layers that should not be.
Here is the part of the code that loads the weights from the initial training that I did and the code that freezes the first 150 layers and unfreezes the remaining layers of the InceptionV3 part:
model.load_weights(load_model_weight_file_name)
for layer in base_model.layers[:150]:
    layer.trainable = False
for layer in base_model.layers[150:]:
    layer.trainable = True
The rest of my code (not shown here) consists of the usual compile and fit calls.
Running this code seems to result in a network that doesn't really learn (loss and accuracy remain approximately the same). I tried different orders of magnitude for the optimization step size, but that doesn't seem to help.
Another thing that I observed is that when I make the whole InceptionV3 part trainable,
base_model.trainable = True
the training starts with an accuracy several orders of magnitude smaller than where my first training round finished (and of course a much higher loss). Can someone explain this to me? I would at least expect the training to continue where it left off in terms of accuracy and loss.
You could do something like:
for layer in base_model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False
This will iterate over each layer and check the type, setting to inference mode if the layer is BatchNorm.
As for the low starting accuracy during transfer learning: you're only loading the weights and not the optimiser state (as would occur with a full keras.models.load_model(), which restores the architecture, weights, optimiser state, etc.).
This doesn't mean there's an error. If you must load weights only, just let it train; the optimiser will settle eventually and you should see progress. Also, since you're potentially overwriting the pre-trained weights in your second run, make sure you use a lower learning rate so the updates are small in comparison, i.e. fine-tune the weights rather than blast them to pieces.
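As a sketch, the fine-tuning round could then look like this (train_ds and val_ds stand in for your datasets, and the learning rate is only an illustrative choice):

# Unfreeze the base model, but keep every BatchNorm layer in inference mode.
base_model.trainable = True
for layer in base_model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

# Recompile with a much lower learning rate than the first round, so the
# pretrained weights are only nudged rather than overwritten.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10)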

ResNet50 and VGG16 for data with 2 channels

Is there any way that I can modify ResNet50 and VGG16 when my data (spectrograms) has the shape (64, 256, 2)?
I understand that I can take some layers out and modify them (output, dense), but I am not really sure about the input channels.
Can anyone suggest a way to accommodate 2 channels in the models? Help is much appreciated!
You can use a different number of channels in the input (and a different height and width), but in this case, you cannot use the pretrained imagenet weights. You have to train from scratch. You can create them as follows:
from tensorflow import keras # or just import keras
vggnet = keras.applications.vgg16.VGG16(input_shape=(64,256,2), include_top=False, weights=None)
Note the weights=None argument: it means the weights are initialized randomly. With 3 input channels you could use weights='imagenet', but since you have 2 channels that won't work, so you have to set it to None. The include_top=False argument lets you add the final classification layers for your own categories yourself. You could create vgg19.VGG19 in the same way. For ResNet, you can create it similarly:
resnet = keras.applications.resnet50.ResNet50(input_shape=(64, 256, 2), weights=None, include_top=False)
For other models and versions of vgg and resnet, please check here.
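Since include_top=False leaves the network headless, here is a minimal sketch of adding your own classification layers on top of the vggnet created above (num_classes and the pooling choice are placeholders):

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # placeholder: set to your number of classes

inputs = keras.Input(shape=(64, 256, 2))
x = vggnet(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])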

How to use a NN in a keras generator?

I am setting up fit_generator to train a DNN with Keras, but I don't know how to use a CNN inside this generator.
Basically, I have a pre-trained image generator built from fully-convolutional networks (call it GEN-NET). Now I want to use this fully-convolutional network in my fit_generator to generate an unlimited number of images to train another classifier (called CLASS-NET) in Keras. But it always crashes my training, and the error message is:
ValueError: Tensor Tensor("decoder/transform_output/mul:0", shape=(?, 128, 128, 1), dtype=float32) is not an element of this graph.
This "decoder/transform_output/mul:0" is the output of my CNN GEN-NET.
So my question is: can I use the CNN-based GEN-NET in my fit_generator to train CLASS-NET, or is this not permitted in Keras?
Keras does not really like running two separate models in a single session. You could use K.clear_session() after using the model, but this would produce a lot of overhead!
The best way to do this, IMHO, is to pre-generate these images and then load them using a generator, basically splitting your program into two separate programs.
Otherwise, if you are using TensorFlow as the back-end, there might be a way to do it by switching the default graph on the tf.Session. You could Google that, but I would not recommend it! :)
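A rough sketch of that two-program split (gen_net, class_net, make_gen_input, label_fn and num_batches are all hypothetical placeholders, since the question doesn't show how GEN-NET is fed or how labels are produced):

import numpy as np

# Program 1: run GEN-NET once and dump its outputs to disk.
for i in range(num_batches):
    batch = make_gen_input(batch_size)      # hypothetical GEN-NET input
    images = gen_net.predict(batch)         # shape (batch_size, 128, 128, 1)
    np.save('generated/batch_%05d.npy' % i, images)

# Program 2: a plain Python generator that only reads from disk, so no
# second Keras model lives in the training session.
def image_generator(label_fn):
    while True:
        for i in range(num_batches):
            images = np.load('generated/batch_%05d.npy' % i)
            yield images, label_fn(images)  # hypothetical labelling function

class_net.fit_generator(image_generator(label_fn),
                        steps_per_epoch=num_batches, epochs=10)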
Seems like you might have things a bit mixed up! The CNN (convolutional neural network) needs to be trained on your data, unless you're using a pretrained network for predictions. If you're going to train the CNN, you can do that with either the fit() or the fit_generator() function. Use fit() if you're feeding data directly, and use fit_generator() if your data is handled by an ImageDataGenerator. If you've loaded a pre-trained model/weights only to make predictions, you don't need to use any fit function, since no training needs to be done.
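For completeness, the ImageDataGenerator route might look like this sketch (the directory path, sizes and class_net are placeholders):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory(
    'data/train',                # placeholder path with one subfolder per class
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical')

class_net.fit_generator(train_gen, steps_per_epoch=len(train_gen), epochs=10)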

Using the tf.Dataset API with Keras

I am trying to use the Dataset API along with Keras, specifically the third option in the action plan mentioned here. I also assumed that the third option was already done after seeing the second comment by @fchollet here.
But then when I tried to implement it, I got the following error:
When feeding symbolic tensors to a model, we expect the tensors to have
a static batch size. Got tensor with shape: (None, 32, 64, 64, 3)
I used the following strategy to fit the model:
training_filenames = [.....]
dataset = tf.data.TFRecordDataset(training_filenames)
dataset = dataset.map(_parse_function_all) # Parse the record into tensors.
dataset = dataset.batch(20)
iterator = dataset.make_initializable_iterator()
videos, labels= iterator.get_next()
model = create_base_network(input_shape=(32, 64, 64, 3))
# output dimension will be (None, 10) for the model above
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(videos, labels, epochs=10, steps_per_epoch=1000)
I can solve the problem by using fit_generator. I found the solution here and applied @Dat-Nguyen's solution. But then I wasn't able to access the validation dataset within a custom callback in order to compute, for example, the AUC metric. So I need to use fit instead of fit_generator, but first I need to get rid of this error.
So can anyone tell me why I got this error? Is the third step of fitting the model working now in Keras or does it still have issues?
So I figured out how to use Keras with the tf.data Dataset API, but without validation data. You can check out my question here: Keras model.fit() with tf.dataset API iterator initializers
I think I discovered the problem. I am using standalone Keras, not the one imported from TensorFlow. The new feature of feeding the iterator directly to model.fit() is valid only when you are using tf.keras, not standalone Keras.
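Under tf.keras the dataset can indeed be passed straight to fit(); a minimal sketch, reusing the question's filenames, parse function and model (the repeat() call is added so the dataset yields indefinitely across epochs):

import tensorflow as tf

dataset = tf.data.TFRecordDataset(training_filenames)
dataset = dataset.map(_parse_function_all)  # parse the records into tensors
dataset = dataset.batch(20)
dataset = dataset.repeat()

# No iterator/get_next() needed: tf.keras consumes the dataset directly.
model.fit(dataset, epochs=10, steps_per_epoch=1000)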

Tensorflow Keras Copy Weights From One Model to Another

Using Keras from Tensorflow 1.4.1, how does one copy weights from one model to another?
As some background, I'm trying to implement a deep Q-network (DQN) for Atari games, following the DQN publication by DeepMind. My understanding is that the implementation uses two networks, Q and Q'. The weights of Q are trained using gradient descent and then periodically copied to Q'.
Here's how I build Q and Q':
ACT_SIZE = 4
LEARN_RATE = 0.0025
OBS_SIZE = 128
def buildModel():
    model = tf.keras.models.Sequential()
    # input_shape must be a tuple, hence (OBS_SIZE,) rather than OBS_SIZE
    model.add(tf.keras.layers.Lambda(lambda x: x / 255.0, input_shape=(OBS_SIZE,)))
    model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(ACT_SIZE, activation="linear"))
    opt = tf.keras.optimizers.RMSprop(lr=LEARN_RATE)
    model.compile(loss="mean_squared_error", optimizer=opt)
    return model
I call that twice to get Q and Q'.
I have an updateTargetModel method below that is my attempt at copying weights. The code runs fine, but my overall DQN implementation is failing. I'm really just trying to verify if this is a valid way of copying weights from one network to another.
def updateTargetModel(model, targetModel):
    modelWeights = model.trainable_weights
    targetModelWeights = targetModel.trainable_weights
    for i in range(len(targetModelWeights)):
        targetModelWeights[i].assign(modelWeights[i])
There's another question here that discusses saving and loading weights to and from disk (Tensorflow Copy Weights Issue), but there's no accepted answer. There is also a question about loading weights from individual layers (Copying weights from one Conv2D layer to another), but I'm wanting to copy the entire model's weights.
Actually, what you've done is much more than simply copying weights: you have made the two models identical at all times. Every time you update one model, the second one is also updated, since both models use the same weight variables.
If you want to just copy the weights, the simplest way is this command:
target_model.set_weights(model.get_weights())
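In a DQN training loop this would then be called periodically; a minimal sketch (the sync interval and loop variables are illustrative placeholders, not from the question):

TARGET_UPDATE_EVERY = 1000  # illustrative: how often to sync Q' <- Q

for step in range(total_steps):
    # ... gradient-descent update of the online model (Q) goes here ...
    if step % TARGET_UPDATE_EVERY == 0:
        # copy the online network's weights into the target network (Q')
        target_model.set_weights(model.get_weights())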
