Memory issue when retraining dense layers of VGG 16 - keras

I wanted to retrain the fully connected layers of VGG 16 for big gray level images (1800x1800), using Keras with Then backend.
So I've:
created a new VGG with a single color channel and loaded the weights from of the original VGG.
add trainable=False to all the convolution layers (the pooling and padding are not trainable by definition)
delete the two first dense layers to keep only the output layer with two neurons
increase drastically the max pooling dimensions and strides because I work with inputs 1800x1800 (no choice). The dimensions drop very quickly to match the original VGG dimensions.
reduce the batch size in order to reduce the memory required.
But when I start the training, I face a CNMEM_STATUS_OUT_OF_MEMORY error. I use NVIDIA K40, so I have 12Go of memory.
Any idea how to fix it?

Related

How to fine tune InceptionV3 in Keras

I am trying to train a classifier based on the InceptionV3 architecture in Keras.
For this I loaded the pre-trained InceptionV3 model, without top, and added a final fully connected layer for the classes of my classification problem. In the first training I froze the InceptionV3 base model and only trained the final fully connected layer.
In the second step I want to "fine tune" the network by unfreezing a part of the InceptionV3 model.
Now I know that the InceptionV3 model makes extensive use of BatchNorm layers. It is recommended (link to documentation), when BatchNorm layers are "unfrozen" for fine tuning when transfer learning, to keep the mean and variances as computed by the BatchNorm layers fixed. This should be done by setting the BatchNorm layers to inference mode instead of training mode.
Please also see: What's the difference between the training argument in call() and the trainable attribute?
Now my main question is: how to set ONLY the BatchNorm layers of the InceptionV3 model to inference mode?
Currently I set the whole InceptionV3 base model to inference mode by setting the "training" argument when assembling the network:
inputs = keras.Input(shape=input_shape)
# Scale the 0-255 RGB values to 0.0-1.0 RGB values
x = layers.experimental.preprocessing.Rescaling(1./255)(inputs)
# Set include_top to False so that the final fully connected (with pre-loaded weights) layer is not included.
# We will add our own fully connected layer for our own set of classes to the network.
base_model = keras.applications.InceptionV3(input_shape=input_shape, weights='imagenet', include_top=False)
x = base_model(x, training=False)
# Classification block
x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
x = layers.Dense(num_classes, activation='softmax', name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=x)
What I don't like about this, is that in this way I set the whole model to inference mode which may set some layers to inference mode which should not be.
Here is the part of the code that loads the weights from the initial training that I did and the code that freezes the first 150 layers and unfreezes the remaining layers of the InceptionV3 part:
model.load_weights(load_model_weight_file_name)
for layer in base_model.layers[: 150]:
layer.trainable = False
for layer in base_model.layers[ 150:]:
layer.trainable = True
The rest of my code (not shown here) are the usual compile and fit calls.
Running this code seems to result a network that doesn't really learn (loss and accuracy remain approximately the same). I tried different orders of magnitude for the optimization step size, but that doesn't seem to help.
Another thing that I observed it that when I make the whole InceptionV3 part trainable
base_model.trainable = True
that the training starts with an accuracy server orders of magnitude smaller than were my first training round finished (and of course a much higher loss). Can someone explain this to me? I would at least expect the training to continue were it left off in terms of accuracy and loss.
You could do something like:
for layer in base_model.layers:
if isinstance(layer ,tf.keras.layers.BatchNormalization):
layer.trainable=False
This will iterate over each layer and check the type, setting to inference mode if the layer is BatchNorm.
As for the low starting accuracy during transfer learning, you're only loading the weights and not the optimiser state (as would occur with a full model.load() which loads architecture, weights, optimiser state etc).
This doesn't mean there's an error, but if you must load weights only just let it train, the optimiser will configure eventually and you should see progress. Also as you're potentially over-writing the pre-trained weights in your second run, make sure you use a lower learning rate so the updates are small in comparison i.e. fine-tune the weights rather than blast them to pieces.

What do units and layers do in neural network?

model = keras.Sequential([
# the hidden ReLU layers
layers.Dense(units=4, activation='relu', input_shape=[2]),
layers.Dense(units=3, activation='relu'),
# the linear output layer
layers.Dense(units=1),
])
The above is a Keras sequential model example from Kaggle. I'm having a problem understanding these two things.
Are the units the number of nodes in a hidden layer? I see some people put 250 or what ever. What does the number do when it gets changed higher or lower?
Why would another hidden layer need to be added? What does it actually do the data to add more and more layers?
Answers in brief
units is representing how many neurons in a particular layer.When you have higher number,model has higher parameters to update during learning.Same thing goes to layers as well.(more layers and more neurons take more time to train the model).selecting how many neurons is depend on the use case and dataset and model architecture.
When you have more hidden layers, you have more parameters to update.More parameters and layers meaning model is able to understand complex relationships hidden in the data. For example when you have a image classification(multiple), you need more deep layers with neurons to understand the features in the image, which use to classify in final layer.
play with tensorflow playground,it will give great idea when you change the layers and neurons.

How should I change the model if accuracy is very low?

I am new to deep learning and I want to build an image classifier using CNN(keras). I have built a model with 2 convolution layers (filters = 32 , kernel = 3x3) followed by a MaxPooling layer(2x2) and this repeated 2 times. Finally 2 fully connected layers. I am getting an accuracy of 50%. My question is how do we choose the model to begin with. Like how do we decide that there should be 2 convolution layers followed by a MaxPooling layer or 1 convolution and 1 MaxPooling layer. Also how do we choose the number of filters in each convolution layer and the kernel size.
If my model is not working then how to decide what changes to be made to the model .
model = Sequential()
model.add(Convolution2D(32,3,3,input_shape=
(280,280,3),activation='relu'))
model.add(Convolution2D(32,3,3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
#model.add(Dropout(0.25))
model.add(Convolution2D(64,3,3,activation='relu'))
model.add(Convolution2D(64,3,3,activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
#model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(output_dim=256 , activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(output_dim=5,activation='softmax'))
I am getting an accuracy of 50% after 5 epochs. What changes should i make in my model?
Let us first start with the more straightforward part. Knowing the number of input and output layers and the number of their neurons is the easiest part. Every network has a single input layer and a single output layer. The number of neurons in the input layer equals the number of input variables in the data being processed. The number of neurons in the output layer equals the number of outputs associated with each input.
But the challenge is knowing the number of hidden layers and their neurons.
The answer is you cannot analytically calculate the number of layers or the number of nodes to use per layer in an artificial neural network to address a specific real-world predictive modeling problem.
The number of layers and the number of nodes in each layer are model hyperparameters that you must specify and learn.
You must discover the answer using a robust test harness and controlled experiments. Regardless of the heuristics, you might encounter, all answers will come back to the need for careful experimentation to see what works best for your specific dataset.
Again the filter size is one such hyperparameter you should specify before training your network.
For an image recognition problem, if you think that a big amount of pixels are necessary for the network to recognize the object you will use large filters (as 11x11 or 9x9). If you think what differentiates objects are some small and local features you should use small filters (3x3 or 5x5).
These are some tips but do not exist any rules.
There are many tricks to increase the accuracy of your deep learning model. Kindly refer to this link Improve deep learning model performance.
Hope this will help you.

Replicating TF model in Keras from pbtxt and ckpts

I'm not sure if this is actually possible with the information I have, but please let me know if that's the case.
From a previously trained Tensorflow model I have the following files:
graph.pbtxt, checkpoint, model.ckpt-10000.data-00000-of-00001, model.ckpt-10000.index, and model.ckpt-10000.meta
I was told that the input size of this model was a Dense layer of size 5000 and the output was a Dense sigmoid binary classification, but I don't know how many/what size layers were in between. (I'm also not 100% positive that the input size is correct).
From this information and associated files, is there a way to replicate the TF model with trained weights into a Keras functional model?
(The idea was that this small dense network was added onto the last FC layer of VGG-16, so that'll be my end goal.)

Low gradient values while training convolutional neural network

I am training a neural network with 3 convolutional layers and 1 fully connected layer which further connected to a regression layer. My training data set if TID: for image quality estimation task. I am pre-processing images by doing local contrast normalization on a small 6 * 6 window, which results in pixel values ranging from -4 to +4 approximately. I am using RELU activation function in all the layers.
The problem is that the gradient values for different layers produced by my network are very small, somewhere around 2e-6 to 2e-7, which I guess are not ideal for good convergence as it is not producing expected results. I have already tried altering initialization of network weights and biases, changing learning rates, employing L1 and L2 regularization, etc., but nothing seems to be help elevate this problem.
So my first question is that is it actually a problem? or is a common thing to have such small gradient values in convolutional network? If it is a problem then what could an appropriate solution for it?

Resources