How to use a trained network as a branch in another network in Keras?

Assume I have a network similar to this:
ICNet_Keras (https://github.com/aitorzip/Keras-ICNet/blob/master/model.py)
The training procedure in this repo is wrong!
It has three branches:
The 1/4-resolution branch is a pretrained network with saved weights.
The 1/2-resolution branch is part of the 1/4 network and shares weights with it (I don't know how).
The full-resolution branch is my customization.
The training procedure is something like this:
The 1/4 branch is trained on Cityscapes (for relaxation), saved, and reloaded.
The first few layers of the 1/4 branch are used for feeding 1/2-resolution images.
The last branch is for the full-resolution image.
These branches are connected by CFF (Cascade Feature Fusion) modules.
How can I load the pretrained 1/4 weights and train the whole network?
How do I share weights between some layers of the 1/4 and 1/2 branches?
For simplicity, you can assume:
1/4 has 5 layers, trained separately and loaded for fine-tuning
1/2 has the first 2 layers of 1/4
1 has 2 independent layers
and the CFFs are just upsample + concat

Create your input tensor:
inputs = Input(size)
If you trained the model yourself, make sure you train it with a variable image size (it's convolutional, right?): input shape = (None, None, channels).
If not, you will need to rebuild the model with a variable image size. Make sure you don't use Flatten: it does not support variable image sizes, and you will not be able to transfer the weights of anything that comes after the Flatten.
1/4
Load your saved model (no need to compile, you are not training it directly):
lowRes = load_model(filename, compile=False, custom_objects=if_needed)
Pass the inputs through it (maybe do some rescaling first)
lowOut = lowRes(inputs)
1/2
Get the segment from lowRes:
midRes = Model(lowRes.input, lowRes.layers[1].output)
Pass the inputs through it (maybe do some rescaling first)
midOut = midRes(inputs)
1/1
Build whatever it is:
....
....
hiRes = Model(....)
Pass the inputs through it:
hiOut = hiRes(inputs)
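A minimal end-to-end sketch of the simplified setup from the question, assuming tf.keras 2.x or Keras 3. The five-layer `low_res` model is a hypothetical stand-in built in place (in practice you would get it from `load_model(..., compile=False)`); the average-pooling downscaling, the 8-filter convolutions, and the layer names are illustrative assumptions. The 1/2 branch is cut out of `low_res` with `get_layer(...)` before `low_res` is reused, so it shares the first two layers, and their weights, with the 1/4 branch.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

def build_low_res(channels=3):
    # Hypothetical stand-in for the pretrained 1/4 network: 5 conv layers,
    # fully convolutional so it accepts any image size.
    inp = Input(shape=(None, None, channels))
    x = inp
    for i in range(5):
        x = layers.Conv2D(8, 3, padding="same", activation="relu",
                          name=f"low_conv{i}")(x)
    return Model(inp, x, name="low_res")

# In practice: low_res = tf.keras.models.load_model(path, compile=False)
low_res = build_low_res()

# 1/2 branch: the first two layers of low_res (same layer objects, same weights).
# Build it before low_res is called on new inputs.
mid_res = Model(low_res.input, low_res.get_layer("low_conv1").output, name="mid_res")

# 1/1 branch: two independent conv layers.
hi_in = Input(shape=(None, None, 3))
h = layers.Conv2D(8, 3, padding="same", activation="relu")(hi_in)
h = layers.Conv2D(8, 3, padding="same", activation="relu")(h)
hi_res = Model(hi_in, h, name="hi_res")

inputs = Input(shape=(None, None, 3))
quarter = layers.AveragePooling2D(4)(inputs)  # 1/4 resolution
half = layers.AveragePooling2D(2)(inputs)     # 1/2 resolution

low_out = low_res(quarter)
mid_out = mid_res(half)
hi_out = hi_res(inputs)

# Simplified CFF: upsample the coarser map, then concatenate.
cff1 = layers.Concatenate()([layers.UpSampling2D(2)(low_out), mid_out])
cff2 = layers.Concatenate()([layers.UpSampling2D(2)(cff1), hi_out])
model = Model(inputs, cff2)
```

Because both branches reference the same layer objects, training `model` updates the shared weights from both paths. In this sketch the input height and width must be divisible by 4 so the upsampled maps line up for concatenation.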
Old answer
Layers and models can be used more than once, as many times as you need.
Shared layer:
Create the layer:
layer = Conv2D(....)
Use the layer:
out1 = layer(input1)
out2 = layer(input2)
out3 = layer(input3)
It's the same layer, so it has the same weights.
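A small sketch that checks this, assuming tf.keras (the shapes and the `shared_conv` name are hypothetical): one Conv2D object applied to two inputs contributes a single kernel and bias to the model, so identical inputs give identical outputs.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

layer = layers.Conv2D(4, 3, padding="same", name="shared_conv")  # one layer object
input1 = Input(shape=(None, None, 3))
input2 = Input(shape=(None, None, 3))
out1 = layer(input1)  # both calls use the same kernel and bias
out2 = layer(input2)
model = Model([input1, input2], [out1, out2])
```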
Shared model:
A Model is a Layer, so it works exactly the same:
model = load_some_model()
branch1_out = model(input_branch1)
branch2_out = model(input_branch2)
Final model:
At the end, just create a model defining the input tensors and output tensors:
final_model = Model(inputs=input_or_list_of_inputs,
                    outputs=output_or_list_of_outputs)
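The same check for a shared model, assuming tf.keras; `branch` is a hypothetical stand-in for `load_some_model()`. The final model contains only the branch's four weight tensors (two kernels, two biases), however many times the branch is called.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

# Hypothetical stand-in for load_some_model()
branch = tf.keras.Sequential([
    Input(shape=(3,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(2),
], name="branch")

input_branch1 = Input(shape=(3,))
input_branch2 = Input(shape=(3,))
branch1_out = branch(input_branch1)  # a Model is called like a layer
branch2_out = branch(input_branch2)

final_model = Model(inputs=[input_branch1, input_branch2],
                    outputs=[branch1_out, branch2_out])
```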

Related

How to fine tune InceptionV3 in Keras

I am trying to train a classifier based on the InceptionV3 architecture in Keras.
For this I loaded the pre-trained InceptionV3 model, without top, and added a final fully connected layer for the classes of my classification problem. In the first training I froze the InceptionV3 base model and only trained the final fully connected layer.
In the second step I want to "fine tune" the network by unfreezing a part of the InceptionV3 model.
Now I know that the InceptionV3 model makes extensive use of BatchNorm layers. When BatchNorm layers are "unfrozen" for fine-tuning during transfer learning, it is recommended (see the documentation) to keep the mean and variance computed by the BatchNorm layers fixed. This should be done by setting the BatchNorm layers to inference mode instead of training mode.
Please also see: What's the difference between the training argument in call() and the trainable attribute?
Now my main question is: how to set ONLY the BatchNorm layers of the InceptionV3 model to inference mode?
Currently I set the whole InceptionV3 base model to inference mode by setting the "training" argument when assembling the network:
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=input_shape)
# Scale the 0-255 RGB values to 0.0-1.0 RGB values
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
# Set include_top to False so that the final fully connected (with pre-loaded weights) layer is not included.
# We will add our own fully connected layer for our own set of classes to the network.
base_model = keras.applications.InceptionV3(input_shape=input_shape, weights='imagenet', include_top=False)
x = base_model(x, training=False)
# Classification block
x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=x)
What I don't like about this is that it sets the whole base model to inference mode, which may put some layers into inference mode that should not be.
Here is the part of the code that loads the weights from the initial training that I did and the code that freezes the first 150 layers and unfreezes the remaining layers of the InceptionV3 part:
model.load_weights(load_model_weight_file_name)
for layer in base_model.layers[:150]:
    layer.trainable = False
for layer in base_model.layers[150:]:
    layer.trainable = True
The rest of my code (not shown here) is the usual compile and fit calls.
Running this code seems to result in a network that doesn't really learn (loss and accuracy remain approximately the same). I tried different orders of magnitude for the optimization step size, but that doesn't seem to help.
Another thing I observed is that when I make the whole InceptionV3 part trainable
base_model.trainable = True
the training starts with an accuracy several orders of magnitude lower than where my first training round finished (and of course a much higher loss). Can someone explain this to me? I would at least expect the training to continue where it left off in terms of accuracy and loss.
You could do something like:
import tensorflow as tf
for layer in base_model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False
This will iterate over each layer, check its type, and freeze it if it is a BatchNormalization layer. In TensorFlow 2.x a BatchNormalization layer with trainable=False also runs in inference mode, so you no longer need to pass training=False to the whole base model.
As for the low starting accuracy during transfer learning: you're only loading the weights and not the optimiser state (as would happen with a full load_model(), which restores architecture, weights, optimiser state, etc.).
This doesn't mean there's an error; if you must load weights only, just let it train, the optimiser will adapt eventually and you should see progress. Also, as you're potentially overwriting the pre-trained weights in your second run, make sure you use a lower learning rate so the updates are small in comparison, i.e. fine-tune the weights rather than blast them to pieces.
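A sketch of this approach on a small hypothetical stand-in for InceptionV3 (the real weights are not needed to see the behaviour). It assumes TensorFlow 2.x, where a frozen `BatchNormalization` layer runs in inference mode even while the rest of the model trains. It recompiles after changing `trainable` and uses a small learning rate; the value `1e-5` is an arbitrary choice.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

# Hypothetical small stand-in for the pretrained base model
base_model = tf.keras.Sequential([
    Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3),
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.Conv2D(8, 3),
    layers.BatchNormalization(),
    layers.ReLU(),
], name="base_model")

# Unfreeze everything, then re-freeze only the BatchNormalization layers.
base_model.trainable = True
for layer in base_model.layers:
    if isinstance(layer, layers.BatchNormalization):
        layer.trainable = False

inputs = Input(shape=(32, 32, 3))
x = base_model(inputs)  # no training=False: only the frozen BN layers use inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(3, activation="softmax")(x)
model = Model(inputs, outputs)

# Recompile after changing `trainable`; small learning rate for fine-tuning.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy")

bn_layers = [l for l in base_model.layers if isinstance(l, layers.BatchNormalization)]
```

After `model.fit(...)`, the convolution and Dense weights change while the BatchNorm moving statistics stay as they were.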

What do units and layers do in neural network?

model = keras.Sequential([
# the hidden ReLU layers
layers.Dense(units=4, activation='relu', input_shape=[2]),
layers.Dense(units=3, activation='relu'),
# the linear output layer
layers.Dense(units=1),
])
The above is a Keras sequential model example from Kaggle. I'm having a problem understanding these two things.
Are the units the number of nodes in a hidden layer? I see some people use 250 or whatever. What does the number do when it is changed higher or lower?
Why would another hidden layer need to be added? What does adding more and more layers actually do to the data?
Answers in brief
units is the number of neurons in a particular layer. With a higher number, the model has more parameters to update during learning. The same goes for layers (more layers and more neurons take more time to train). How many neurons to choose depends on the use case, the dataset, and the model architecture.
When you have more hidden layers, you have more parameters to update. More parameters and layers mean the model is able to learn more complex relationships hidden in the data. For example, in (multi-class) image classification you need deeper layers with more neurons to extract the features in the image that the final layer uses for classification.
Play with TensorFlow Playground; it gives a great feel for what happens when you change the layers and neurons.
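To make the parameter count concrete: a `Dense` layer with `n_in` inputs and `units` neurons has `n_in * units` weights plus `units` biases. The sketch below (assuming tf.keras; the input is declared with `tf.keras.Input` rather than `input_shape=[2]`) counts them by hand for the Kaggle model and compares the result with Keras: 12 + 15 + 4 = 31 parameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    # the hidden ReLU layers
    layers.Dense(units=4, activation="relu"),
    layers.Dense(units=3, activation="relu"),
    # the linear output layer
    layers.Dense(units=1),
])

def dense_params(n_in, units):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_in * units + units

by_hand = dense_params(2, 4) + dense_params(4, 3) + dense_params(3, 1)  # 12 + 15 + 4
print(by_hand, model.count_params())  # 31 31
```

Raising `units` in a layer grows both its own parameter count and that of the next layer, which is why wider and deeper models take longer to train.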

How to add a new class to a saved Keras Sequential model

I have a 10-class dataset with which I got 85% accuracy, and I got the same accuracy on a saved model.
Now I want to add a new class. How do I add a new class to the saved model?
I tried deleting the last layer and training, but the model overfits, and in prediction every image gets the same result (the newly added class).
This is what I did:
model.pop()
base_model_layers = model.output
pred = Dense(11, activation='softmax')(base_model_layers)
model = Model(inputs=model.input, outputs=pred)
# compile and fit step
I have trained a model with 10 classes. I want to load the model, train it with class 11 data, and get predictions.
Using the model.pop() method and then the Keras Model() API will lead you to an error. A functional Model() does not have the .pop() method, so if you want to re-train your model more than once, you will run into this error.
The error only occurs, however, if after re-training you save the model and use the newly saved model in the next re-training.
Another very wrong but commonly used approach is model.layers.pop(). Here the problem is that model.layers returns a copy of the layer list, so pop() only removes the last layer from that copy. The model still has the layer; only the returned list lacks it.
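A quick check of both claims, assuming tf.keras 2.x or Keras 3 and a hypothetical small model: popping from `model.layers` only changes the returned list, `Sequential.pop()` actually removes the layer, and a functional `Model` has no `pop()` method.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    layers.Dense(4, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# model.layers returns a new list, so popping from it leaves the model untouched
layer_list = model.layers
layer_list.pop()
layers_after_list_pop = len(model.layers)  # still 2

# Sequential.pop() really removes the last layer from the model
model.pop()
layers_after_model_pop = len(model.layers)  # now 1

# A functional Model has no pop() method at all
inp = tf.keras.Input(shape=(8,))
functional = tf.keras.Model(inp, layers.Dense(10)(inp))
has_pop = hasattr(functional, "pop")
```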
I recommend the following solution:
Assuming you have your already trained model saved in the model variable, something like:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
model = load_my_trained_model_function()
# creating a new model
model_2 = Sequential()
# getting all the layers except the output one
for layer in model.layers[:-1]:  # just exclude last layer from copying
    model_2.add(layer)
# prevent the already trained layers from being trained again
# (you can use model_2.layers[:-n] to freeze all but the last n layers)
for layer in model_2.layers:
    layer.trainable = False
# adding the new output layer, the name parameter is important
# otherwise, Keras may give it a name like an existing Dense layer (e.g. dense_1), and duplicate names lead to an error
model_2.add(Dense(num_neurons_you_want, name='new_Dense', activation='softmax'))
Now you should specify the compile and fit methods to train your model and it's done:
model_2.compile(loss='categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])
# model.fit trains the model
model_history = model_2.fit(x_train, y_train,
                            batch_size=batch_size,
                            epochs=epochs,
                            verbose=1,
                            validation_split=0.1)
EDIT:
Note that a newly added output layer does not have the weights and biases learned in the previous training.
Thereby we lose pretty much everything the old output layer learned.
We need to save the weights and biases of the output layer from the previous training and then copy them into the new output layer.
We must also decide whether to let all the layers train, or only some interleaved layers.
To get the weights and biases from the output layer using Keras we can use the following method:
# weights_training[0] = layer weights
# weights_training[1] = layer biases
weights_training = model.layers[-1].get_weights()
Now you should specify the weights for the new output layer. For example, you can use the mean of the existing class weights as the weights of the new class. It's up to you.
To set the weights and biases of the new output layer using Keras, we can use the following method, where weights_re_training is the [weights, biases] list you built for the new layer:
model_2.layers[-1].set_weights(weights_re_training)
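A sketch of these steps, assuming tf.keras and a hypothetical small 10-class `old_model` in place of the real one. It copies the ten trained output columns into an 11-unit layer and initialises the new class's column with the mean of the old columns, which is just one of the options mentioned above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for the trained 10-class model
old_model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu", name="hidden"),
    layers.Dense(10, activation="softmax", name="old_output"),
])

# weights_training[0] = kernel (16, 10), weights_training[1] = bias (10,)
weights_training = old_model.layers[-1].get_weights()
old_kernel, old_bias = weights_training

# Reuse every layer except the old output layer, then add an 11-unit output.
model_2 = tf.keras.Sequential([tf.keras.Input(shape=(20,))])
for layer in old_model.layers[:-1]:
    model_2.add(layer)
model_2.add(layers.Dense(11, activation="softmax", name="new_Dense"))

# Keep the 10 learned columns; initialise class 11 with the mean column
# (one option among several).
new_kernel = np.concatenate([old_kernel, old_kernel.mean(axis=1, keepdims=True)], axis=1)
new_bias = np.append(old_bias, old_bias.mean())
weights_re_training = [new_kernel.astype("float32"), new_bias.astype("float32")]
model_2.layers[-1].set_weights(weights_re_training)
```

The hidden layer is the same object in both models, so freezing it in `model_2` also freezes it in `old_model`.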
model.pop()
base_model_layers = model.output
pred = Dense(11, activation='softmax')(base_model_layers)
model = Model(inputs=model.input, outputs=pred)
Freeze the first layers before training it:
for layer in model.layers[:-2]:
    layer.trainable = False
I am assuming that the problem is single-label multi-class classification, i.e. a sample belongs to only 1 of the 11 classes.
This answer is based entirely on applying the way humans learn to machines. Hence, it will not give you the exact code, but it will tell you what to do, and you will be able to implement it easily in Keras.
How does a human child learn when you teach him new things? At first, we ask him to forget the old and learn the new. This does not mean that the old learning is useless; it means that while he is learning the new, the old knowledge should not interfere, as it will confuse the brain. So, the child will only learn the new for some time.
But the problem here is, things are related. Suppose, the child learned C programming language and then learned compilers. There is a relation between compilers and programming language. The child cannot master computer science if he learns these subjects separately, right? At this point we introduce the term 'intelligence'.
The kid who understands that there is a relation between the things he learned before and the things he learned now is 'intelligent'. And the kid who finds the actual relation between the two things is 'smart'. (Going deep into this is off-topic)
What I am trying to say is:
Make the model learn the new class separately.
And then, make the model find a relation between the previously learned classes and the new class.
To do this, you need to train two different models:
The model which learns to classify on the new class: this model will be a binary classifier. It predicts 1 if the sample belongs to class 11 and 0 if it doesn't. You already have training data for samples belonging to class 11, but you might not have data for samples that don't belong to class 11. For this, you can randomly select samples which belong to classes 1 to 10. But note that the ratio of samples belonging to class 11 to those not belonging to class 11 must be 1:1 in order to train the model properly. That means 50% of the samples must belong to class 11.
Now you have two separate models: one which predicts classes 1-10 and one which predicts class 11. Concatenate the outputs of the second-to-last layers of these two models, feed them into a newly created Dense layer with 11 nodes, and let the whole model retrain, adjusting the weights of the two pretrained models and learning the new weights of the Dense layer. Keep the learning rate low.
The final model is the third model which is a combination of two models (without last Dense layer) + a new Dense layer.
Thank you.
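A minimal Keras sketch of this idea, assuming tf.keras and flat feature vectors. Both pretrained models are hypothetical stand-ins (`make_classifier`, `FEATURES`, and the layer sizes are illustrative), and the learning rate is an arbitrary small value. The second-to-last layer of each model feeds a concatenation and a new 11-way Dense layer, and the whole model is trained end to end.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

FEATURES = 20  # hypothetical input size

def make_classifier(n_out, activation, name):
    # Hypothetical stand-in for a trained classifier.
    inp = Input(shape=(FEATURES,))
    h = layers.Dense(16, activation="relu", name=f"{name}_hidden")(inp)
    out = layers.Dense(n_out, activation=activation, name=f"{name}_out")(h)
    return Model(inp, out, name=name)

old_model = make_classifier(10, "softmax", "old")      # classes 1-10
binary_model = make_classifier(1, "sigmoid", "cls11")  # class 11 vs. the rest

# Drop each model's last Dense layer: keep the second-to-last outputs.
old_features = Model(old_model.input, old_model.get_layer("old_hidden").output,
                     name="old_features")
bin_features = Model(binary_model.input, binary_model.get_layer("cls11_hidden").output,
                     name="cls11_features")

inputs = Input(shape=(FEATURES,))
merged = layers.Concatenate()([old_features(inputs), bin_features(inputs)])
outputs = layers.Dense(11, activation="softmax", name="combined_out")(merged)
combined = Model(inputs, outputs)

# Low learning rate so the pretrained weights are adjusted, not overwritten.
combined.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                 loss="sparse_categorical_crossentropy")
```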

Loading image with different input size than training in Keras

I am working on a CNN that deals with super-resolution. It is required that I extract patches from the image and then train on these small patches (i.e. 41x41).
However, when it comes to predicting the image, the image is of a larger size than the patches. But Keras doesn't allow me to predict an image of larger size than the training images.
I have read Can Keras deal with input images with different size?. I have tried putting None in my network's input shape and then loading the weights. However, when it comes to this line: c1 = PReLU()(c1), I get the error: int() argument must be a string, a bytes-like object or a number, not 'NoneType'. The code is attached below.
How can I fix this problem? I am using Keras with tensorflow backend. I have no fully connected layers, all are Conv2D with relu, except for the snippet below, is PReLU for c1.
Thanks.
import keras
from keras.layers import Conv2D, Input, PReLU
from keras.models import Model
input_shape = (None,None,1)
x = Input(shape = input_shape)
c1 = Conv2D(64, (3,3), kernel_initializer='he_normal', padding='same', name='Conv1')(x)
c1 = PReLU()(c1)
#............................
output_img = keras.layers.add([x, finalconv])
model = Model(x, output_img)
Keras doesn't allow me to predict an image of larger size than the training images
This is wrong: Keras allows you to do so when your network is designed properly.
However, when it comes to this line: c1 = PReLU()(c1), I get the error: int() argument must be a string, a bytes-like object or a number, not 'NoneType'.
This error is expected because your input shape contains None. If you had set shared_axes=[1,2] for PReLU (the default is shared_axes=None), you would not see this error.
Therefore, the real issue is that PReLU's parameters were learned for a 41x41 input only, but are now asked to work for an arbitrary input size.
The best solution is to train a new model with input shape = (None, None, 1) directly.
If you don't care about the possible degradation, you can load all layer weights of your pretrained model except for the PReLU layer. Then manually compute appropriate PReLU parameters that can be shared across shared_axes=[1,2], and use them as the new PReLU parameters.
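A sketch of the second option, assuming tf.keras and a cut-down version of the network in the question (8 filters instead of 64, one conv after the PReLU; `build` and the layer names are hypothetical). Averaging the per-pixel alphas over height and width is an assumed heuristic for "appropriate" shared parameters, not a prescribed method.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

FILTERS = 8  # the question uses 64; smaller here for speed

def build(input_shape, shared_axes):
    # Hypothetical cut-down version of the super-resolution network.
    x = Input(shape=input_shape)
    c1 = layers.Conv2D(FILTERS, (3, 3), kernel_initializer="he_normal",
                       padding="same", name="Conv1")(x)
    c1 = layers.PReLU(shared_axes=shared_axes, name="prelu1")(c1)
    finalconv = layers.Conv2D(1, (3, 3), padding="same", name="final")(c1)
    return Model(x, layers.add([x, finalconv]))

# Trained on 41x41 patches: one alpha per pixel and channel, shape (41, 41, FILTERS).
old = build((41, 41, 1), shared_axes=None)
# Variable-size model: alphas shared over height and width, shape (1, 1, FILTERS).
new = build((None, None, 1), shared_axes=[1, 2])

for layer in old.layers:
    if not layer.get_weights():
        continue  # Input and Add layers have no weights
    if layer.name == "prelu1":
        alpha = layer.get_weights()[0]
        # Assumed heuristic: average the per-pixel alphas over height and width.
        new.get_layer("prelu1").set_weights([alpha.mean(axis=(0, 1), keepdims=True)])
    else:
        new.get_layer(layer.name).set_weights(layer.get_weights())
```

The converted `new` model can then predict on images of any size, e.g. a full 64x80 image instead of a 41x41 patch.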

Tensorflow Keras Copy Weights From One Model to Another

Using Keras from Tensorflow 1.4.1, how does one copy weights from one model to another?
As some background, I'm trying to implement a deep-q network (DQN) for Atari games following the DQN publication by DeepMind. My understanding is that the implementation uses two networks, Q and Q'. The weights of Q are trained using gradient descent, and then the weights are copied periodically to Q'.
Here's how I build Q and Q':
ACT_SIZE = 4
LEARN_RATE = 0.0025
OBS_SIZE = 128
def buildModel():
    model = tf.keras.models.Sequential()
    # input_shape must be a tuple, e.g. (OBS_SIZE,), not a bare int
    model.add(tf.keras.layers.Lambda(lambda x: x / 255.0, input_shape=(OBS_SIZE,)))
    model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(128, activation="relu"))
    model.add(tf.keras.layers.Dense(ACT_SIZE, activation="linear"))
    opt = tf.keras.optimizers.RMSprop(lr=LEARN_RATE)
    model.compile(loss="mean_squared_error", optimizer=opt)
    return model
I call that twice to get Q and Q'.
I have an updateTargetModel method below that is my attempt at copying weights. The code runs fine, but my overall DQN implementation is failing. I'm really just trying to verify if this is a valid way of copying weights from one network to another.
def updateTargetModel(model, targetModel):
    modelWeights = model.trainable_weights
    targetModelWeights = targetModel.trainable_weights
    for i in range(len(targetModelWeights)):
        targetModelWeights[i].assign(modelWeights[i])
There's another question here that discusses saving and loading weights to and from disk (Tensorflow Copy Weights Issue), but there's no accepted answer. There is also a question about loading weights from individual layers (Copying weights from one Conv2D layer to another), but I'm wanting to copy the entire model's weights.
Actually what you've done is much more than simply copying weights. You made these two models identical all the time. Every time you update one model - the second one is also updated - as both models have the same weights variables.
If you want to just copy weights - the simplest way is by this command:
target_model.set_weights(model.get_weights())
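A sketch, assuming TensorFlow 2.x, with the Q network cut down and the `Lambda` scaling layer omitted, showing that `set_weights(get_weights())` copies values: after the copy the target network is independent, so a later update to Q does not touch Q' until the next copy.

```python
import numpy as np
import tensorflow as tf

ACT_SIZE = 4
OBS_SIZE = 128

def build_model():
    # Cut-down Q network (the Lambda scaling layer is omitted for brevity).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(OBS_SIZE,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(ACT_SIZE, activation="linear"),
    ])
    model.compile(loss="mean_squared_error", optimizer="rmsprop")
    return model

q = build_model()
q_target = build_model()

# Periodic update: copy Q's current weight values into Q'.
q_target.set_weights(q.get_weights())
q_before = q.get_weights()
copied = q_target.get_weights()

# Simulate a training step on Q by shifting all of its weights.
q.set_weights([w + 1.0 for w in q.get_weights()])
```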
