Fine-tuning Resnet using pre-convoluted features - pytorch

Since the convolutional layers are expensive to compute, I would like to compute their output once and reuse it to train only the fully connected layer of my ResNet, in order to speed up the process.
In the case of a VGG model, we can compute the output of the convolutional part as follows:
x = model_vgg.features(inputs)
But how can I extract those features from a ResNet?
Thanks in advance

I guess you can try hacking through the net. I'll use resnet18 as an example:
import torch
from torch import nn
from torchvision.models import resnet18
net = resnet18(pretrained=False)
print(net)
You'll see something like:
....
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
Let's store the Linear layer somewhere, and in its place, put a dummy layer. Then the output of the whole net is actually the output of the conv layers.
x = torch.randn(4,3,32,32) # dummy input of batch size 4 and 32x32 rgb images
out = net(x)
print(out.shape)
torch.Size([4, 1000])  # batch size 4, 1000 default class predictions
store_fc = net.fc # Save the actual Linear layer for later
net.fc = nn.Identity() # Add a layer which actually does nothing
out = net(x)
print(out.shape)
torch.Size([4, 512])  # batch size 4, 512 features that are the input to the actual fc layer
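Once the fc layer is replaced, you can precompute the features for your whole dataset and train only the classifier on them. Here is a minimal sketch (not from the original answer); it assumes a DataLoader named train_loader and reuses the Linear layer saved in store_fc:
feature_extractor = net                    # fc has been replaced by nn.Identity() above
feature_extractor.eval()

features, labels = [], []
with torch.no_grad():                      # no gradients needed for the frozen conv layers
    for imgs, targets in train_loader:     # train_loader is an assumed DataLoader
        features.append(feature_extractor(imgs))
        labels.append(targets)
features, labels = torch.cat(features), torch.cat(labels)

classifier = store_fc                      # the original Linear(512, 1000) saved earlier
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):                     # train only the classifier on the cached features
    optimizer.zero_grad()
    loss = criterion(classifier(features), labels)
    loss.backward()
    optimizer.step()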

Related

Autoencoder of CNN - decrease or increase filters?

In a CNN-based autoencoder, should you increase or decrease the number of filters between layers? Since we are compressing the information, I was thinking of decreasing it.
Here is an example of the encoder part, where the number of filters is decreased at each new layer, from 16 to 8 to 4:
x = Conv2D(filters = 16, kernel_size = 3, activation='relu', padding='same', name='encoder_1a')(inputs)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_1b')(x)
x = Conv2D(filters = 8, kernel_size = 3, activation='relu', padding='same', name='encoder_2a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_2b')(x)
x = Conv2D(filters = 4, kernel_size = 3, activation='relu', padding='same', name='encoder_3a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_3b')(x)
It is not always the case that the number of filters is reduced (or increased) as more layers are added to the encoder. In most convolutional autoencoder architectures I have seen, the height and width are decreased through strided convolutions or pooling, while the number of filters (the channel depth) is increased, kept similar to the previous layer, or otherwise varied from layer to layer. But there are also examples where the number of output channels decreases with depth.
Usually an autoencoder encodes the input into a latent representation (a vector or embedding) of lower dimension than the input, trained to minimize the reconstruction error. So both approaches can be used to build an undercomplete autoencoder, by varying the kernel size and the number of layers, or by adding an extra layer of a chosen dimension at the end of the encoder.
Filter increase example
In a typical MNIST-style example, the number of filters increases as more layers are added to the encoder. But since the input has 28*28*1 = 784 features while the flattened representation has 3*3*128 = 1152, which is larger, another layer is added before the final embedding layer: a fully connected layer that reduces the feature dimension to a predefined number of outputs. That final dense layer could also be replaced by varying the number of layers or the kernel size so that the encoder outputs a (1, 1, NUM_FILTERS) tensor.
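A minimal sketch of such a filter-increase encoder (the layer sizes here are illustrative assumptions, not taken from a specific source):
import keras
from keras import layers

# Filters grow with depth (32 -> 64 -> 128) while pooling halves the spatial size;
# a final Dense layer then produces a low-dimensional embedding.
input_img = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)   # 4 x 4 x 128 = 2048 features
x = layers.Flatten()(x)
encoded = layers.Dense(32)(x)                         # embedding much smaller than the 784-dim input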
Filter decrease example
An easy example where the number of filters decreases as more layers are added can be found in the Keras convolutional autoencoder tutorial, much like your code:
import keras
from keras import layers
input_img = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
References
https://www.deeplearningbook.org/contents/autoencoders.html
https://xifengguo.github.io/papers/ICONIP17-DCEC.pdf
https://blog.keras.io/building-autoencoders-in-keras.html

How to find the name of layers in preloaded torchvision models?

I'm trying to use GradCAM with a DeepLabv3 ResNet50 model preloaded from torchvision, but in Captum I need to specify the name of the layer (of type nn.Module). I can't find any documentation on how this is done; does anyone have an idea of how to get the name of the final ReLU layer?
Thanks in advance!
You can have a look at its representation and get an idea of where it's located by simply printing it:
>>> model = torchvision.models.segmentation.deeplabv3_resnet50()
>>> model
DeepLabV3(
  (backbone): IntermediateLayerGetter(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        ...
To get the exact name of the layer, you can loop over the modules with named_modules and only pick the nn.ReLU layers (assuming import torch.nn as nn):
>>> relus = [name for name, module in model.named_modules() if isinstance(module, nn.ReLU)]
>>> relus
['backbone.relu',
'backbone.layer1.0.relu',
'backbone.layer1.1.relu',
'backbone.layer1.2.relu',
'backbone.layer2.0.relu',
'backbone.layer2.1.relu',
'backbone.layer2.2.relu',
'backbone.layer2.3.relu',
'backbone.layer3.0.relu',
'backbone.layer3.1.relu',
'backbone.layer3.2.relu',
'backbone.layer3.3.relu',
'backbone.layer3.4.relu',
'backbone.layer3.5.relu',
'backbone.layer4.0.relu',
'backbone.layer4.1.relu',
'backbone.layer4.2.relu',
'classifier.0.convs.0.2',
'classifier.0.convs.1.2',
'classifier.0.convs.2.2',
'classifier.0.convs.3.2',
'classifier.0.convs.4.3',
'classifier.0.project.2',
'classifier.3']
Then pick the last one:
>>> relus[-1]
'classifier.3'
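Once you have the name, you still need the module object itself to hand to Captum. A minimal sketch, assuming Captum's LayerGradCam and a wrapper that sums the segmentation logits over the spatial dimensions so the attribution target is a per-class score (the wrapper and the class index are illustrative assumptions):
import torch
from torchvision.models.segmentation import deeplabv3_resnet50
from captum.attr import LayerGradCam

model = deeplabv3_resnet50().eval()

# Resolve the module object from its dotted name
target_layer = dict(model.named_modules())['classifier.3']

# DeepLabV3 returns a dict, so expose the 'out' tensor and aggregate the
# per-pixel logits into one score per class for attribution
forward = lambda x: model(x)['out'].sum(dim=(2, 3))

gradcam = LayerGradCam(forward, target_layer)
x = torch.randn(1, 3, 224, 224)                 # dummy input image
attribution = gradcam.attribute(x, target=12)   # 12 is just an example class index
print(attribution.shape)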

coreML model converted from pytorch model giving the wrong prediction probabilities

I have a PyTorch binary classification model that I converted to Core ML. I converted the model both directly and indirectly through ONNX, using the following tutorial/documentation respectively:
https://coremltools.readme.io/docs/pytorch-conversion, and
https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb .
The outputs prior to the softmax function, and the resulting probabilities, are similar for the original PyTorch model and the ONNX model converted from it. But the output of the Core ML model converted from PyTorch via the tutorial documentation is completely incorrect. I got no errors compiling the Core ML model with either method.
The weights of the last layer appear to be the same for Core ML and PyTorch. The output of the Core ML model prior to softmax gives me
{'classLabel': '_xx', 'classLabelProbs': {'_xx': 29.15625, 'xx': -22.53125}}
while the output from the PyTorch model gives me [-3.2185674 3.4477997]
The output of the conversion from onnx to coreML looks like...
58/69: Converting Node Type Add
59/69: Converting Node Type Relu
60/69: Converting Node Type Conv
61/69: Converting Node Type BatchNormalization
62/69: Converting Node Type Relu
63/69: Converting Node Type Conv
64/69: Converting Node Type BatchNormalization
65/69: Converting Node Type Add
66/69: Converting Node Type Relu
67/69: Converting Node Type GlobalAveragePool
68/69: Converting Node Type Flatten
69/69: Converting Node Type Gemm
Translation to CoreML spec completed. Now compiling the CoreML model.
Model Compilation done.
Printing the PyTorch model shows the following for the final layers:
(layer4): Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (downsample): Sequential(
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (1): BasicBlock(
    (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=512, out_features=2, bias=True)
How do I go about resolving the quantitative errors produced by my Core ML model converted from PyTorch?
It's probably an issue with your image preprocessing options: https://machinethink.net/blog/help-core-ml-gives-wrong-output/
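A common fix is to reproduce the PyTorch normalization as Core ML preprocessing at conversion time. A minimal sketch with the coremltools unified API, assuming the model was trained with the usual torchvision Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) transform and that traced_model is a torch.jit.trace of your network (adjust both to your own setup):
import coremltools as ct

# Core ML applies y = scale * x + bias per channel; these values approximate
# the torchvision mean/std normalization (an assumption about your training setup).
scale = 1 / (0.226 * 255.0)
bias = [-0.485 / 0.229, -0.456 / 0.224, -0.406 / 0.225]

mlmodel = ct.convert(
    traced_model,                                   # torch.jit.trace(model, example_input)
    inputs=[ct.ImageType(name="input",
                         shape=(1, 3, 224, 224),
                         scale=scale,
                         bias=bias)],
)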
Update:
Using the Core ML unified API I have added a scale layer. My outputs are not giving any probabilities for my classifier.
(Screenshot of the last couple of layers of the converted PyTorch model: https://i.stack.imgur.com/9bzd2.png)
The last layer prints out a tensor instead of probabilities, so I added a softmax layer via the network builder:
builder.add_softmax(name="softmax", input_name="305", output_name="307:labelProbabilityLayerName")
The previous last node's output was originally named "307:labelProbabilityLayerName"; I renamed it to "305" before adding the softmax, so that node's output becomes the input to my softmax, and the softmax output can then connect to the original string class labels and print the intended probabilities.
I am still getting an error saying:
"RuntimeError: Error compiling model: "Error reading protobuf spec. validator error: Layer 'softmax' consumes an input named '307' which is not present in this network."."
This doesn't make sense, because I defined my softmax to consume '305' and also updated that last layer (an inner-product layer) to output '305'.

Keras 2D Dense Layer for Output

I am playing with a model which should take an 8x8 chess board as input, encoded as a 224x224 greyscale image, and then output a 64x13 one-hot encoding, i.e. the probabilities of the 13 possible piece classes on each of the 64 squares.
Now, after the convolutional layers I don't quite know how to proceed to get a 2D dense layer as the result/target.
I tried adding Dense(64, 13) as a layer to my Sequential model, but I get the error "Dense can accept only 1 positional arguments ('units',)".
Is it even possible to train for 2D-targets?
EDIT1:
Here is the relevant part of my code, simplified:
# X.shape = (10000, 224, 224, 1)
# Y.shape = (10000, 64, 13)
model = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(224, 224, 1)),
    Conv2D(8, (3, 3), activation='relu'),
    # some more repetitive Conv + Pooling layers here
    Flatten(),
    Dense(64, 13)
])
TypeError: Dense can accept only 1 positional arguments ('units',), but you passed the following positional arguments: [64, 13]
EDIT2: As Anand V. Singh suggested, I changed Dense(64, 13) to Dense(832), which works fine. Loss = mse.
Wouldn't it be better to use "sparse_categorical_crossentropy" as the loss and a 64x1 encoding (instead of 64x13)?
In Dense you only pass the number of output units. If you want (64x13) as the output, set the layer size to Dense(832) (since 64x13 = 832) and reshape later. You will also need to reshape Y accordingly, so that the loss used for backpropagation is computed correctly.
# X.shape = (10000, 224, 224, 1)
# Y.shape = (10000, 64, 13)
Y = Y.reshape(10000, 64*13)
model = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(224, 224, 1)),
    Conv2D(8, (3, 3), activation='relu'),
    # some more repetitive Conv + Pooling layers here
    Flatten(),
    Dense(64*13)
])
That should get the job done, if it doesn't post where it fails and we can proceed further.
A Reshape layer allows you to control the output shape.
Flatten(),
Dense(64*13),
Reshape((64,13))#2D
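Putting the two suggestions together, a minimal sketch of the full model with a Reshape output layer, so Y can keep its original (10000, 64, 13) shape (the mse loss follows the questioner's choice in EDIT2; the optimizer is an assumption):
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense, Reshape

model = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(224, 224, 1)),
    Conv2D(8, (3, 3), activation='relu'),
    # some more repetitive Conv + Pooling layers here
    Flatten(),
    Dense(64 * 13),
    Reshape((64, 13)),   # back to the 2D (squares x piece classes) target shape
])
model.compile(loss='mse', optimizer='adam')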

Neural Networks: Why can't I go deeper?

I am using this model to get depth maps from images:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Dense, Reshape
from keras.models import Model
from keras.optimizers import Adam

def get_model(learning_rate=0.001, channels=2):
    h = 128  # height of the image
    w = 128  # width of the image
    c = channels  # no of channels
    encoding_size = 512
    # encoder
    image = Input(shape=(c, h, w))
    conv_1_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(image)
    conv_1_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv_1_1)
    pool_1_2 = MaxPooling2D((2, 2))(conv_1_2)
    conv_2_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool_1_2)
    conv_2_2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv_2_1)
    pool_2_2 = MaxPooling2D((2, 2))(conv_2_2)
    conv_3_1 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_2_2)
    conv_3_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv_3_1)
    # pool_3_2 = MaxPooling2D((2, 2))(conv_3_2)
    # conv_4_1 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_3_2)
    # conv_4_2 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_4_1)
    # pool_4_3 = MaxPooling2D((2, 2))(conv_4_2)
    # conv_5_1 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4_3)
    # conv_5_2 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv_5_1)
    flat_5_2 = Flatten()(conv_3_2)
    encoding = Dense(encoding_size, activation='tanh')(flat_5_2)
    # decoder
    reshaped_6_1 = Reshape((8, 8, 8))(encoding)
    conv_6_1 = Conv2D(128, (3, 3), activation='relu', padding='same')(reshaped_6_1)
    conv_6_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv_6_1)
    upsample_6_2 = UpSampling2D((2, 2))(conv_6_2)
    conv_7_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(upsample_6_2)
    conv_7_2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv_7_1)
    upsample_7_2 = UpSampling2D((2, 2))(conv_7_2)
    conv_8_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(upsample_7_2)
    conv_8_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv_8_1)
    upsample_8_2 = UpSampling2D((2, 2))(conv_8_2)
    conv_9_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(upsample_8_2)
    conv_9_2 = Conv2D(16, (3, 3), activation='relu', padding='same')(conv_9_1)
    upsample_9_2 = UpSampling2D((2, 2))(conv_9_2)
    conv_10_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(upsample_9_2)
    conv_10_2 = Conv2D(1, (3, 3), activation='relu', padding='same')(conv_10_1)
    output = Conv2D(1, (1, 1), activation=relu_normalized, padding='same')(conv_10_2)
    model = Model(inputs=image, outputs=output)
    model.compile(loss='mae', optimizer=Adam(learning_rate))
    return model
Input: 2x128x128 (two bw images) - Squished to [0,1] (preprocessing normalization)
Output: 1x128x128 (depth map) - Squished to [0,1] by relu-normalized
NOTE: relu_normalized is just relu followed by scaling the values to [0, 1] so as to produce a proper image. Sigmoid doesn't seem to fit this criterion.
When I add any more layers, the loss becomes constant and backpropagation stops working properly, because both the outputs and the gradients become zero (hence changing the learning rate doesn't change anything in the network).
So if I want to go deeper in order to generalize better, by uncommenting the lines above (and of course connecting conv_5_2 to flat_5_2), what am I missing?
My thoughts:
Using sigmoid would lead to the vanishing gradient problem, but I am using ReLUs; would that problem still exist?
Changing anything in the network, like the encoding size, or even changing the activations to elu or selu, doesn't show any progress.
Why are my outputs getting closer to zero when I try to add even one more conv layer followed by max_pooling?
UPDATE:
Here's relu_normalized:
from keras import backend as K
from keras.activations import relu

def relu_normalized(x):
    epsilon = 1e-6
    relu_x = relu(x)
    relu_scaled_x = relu_x / (K.max(relu_x) + epsilon)
    return relu_scaled_x
Later, after getting the output (which has range [0, 1]), we simply do output_image = 255 * output and can save this as a b/w image.
If you want to go deeper, in this case you have to add some batch normalization layers (in Keras: https://keras.io/layers/normalization/#batchnormalization); a sketch follows at the end of this answer.
From Ian Goodfellow's book, in the chapter on batch normalization:
"Very deep models involve the composition of several functions or layers. The gradient tells how to update each parameter, under the assumption that the other layers do not change. In practice, we update all of the layers simultaneously. When we make the update, unexpected results can happen because many functions composed together are changed simultaneously, using updates that were computed under the assumption that the other functions remain constant."
Also, tanh is easily saturated so use only if you need it :)
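Returning to the batch normalization suggestion: a minimal sketch for one of the commented-out encoder blocks, with the convolution, normalization, and activation split apart (where to place the normalization is an assumption of this sketch, not code from the question):
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Activation

# One deeper encoder block with batch normalization between each convolution and
# its ReLU; repeat the pattern for the other added layers. pool_3_2 refers to the
# (uncommented) pooling layer from the original model.
x = Conv2D(256, (3, 3), padding='same')(pool_3_2)
x = BatchNormalization()(x)
conv_4_1 = Activation('relu')(x)
x = Conv2D(256, (3, 3), padding='same')(conv_4_1)
x = BatchNormalization()(x)
conv_4_2 = Activation('relu')(x)
pool_4_3 = MaxPooling2D((2, 2))(conv_4_2)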
There is a problem that can happen with "relu" when the learning rate is too big.
There is a high chance of all activations going to 0 and getting stuck there never to change anymore. (When they're at 0, their gradient is also 0).
Since I'm not an expert at tuning the parameters for "relu", and my results with "relu" are always bad, I prefer using "sigmoid" or "tanh" (worth trying, although there might be some vanishing gradients there...). I keep my images in the range 0 to 1 and use "binary_crossentropy" as the loss, which is a lot faster than "mae"/"mse" in this case (a sketch follows below).
Another thing that happened to me was an "apparently" frozen loss function. It happened that the value was changing so little, that the displayed decimals weren't enough to see the variation, but after a lot of epochs, it found a reasonable way to go down properly. (Probably some kind of saturation indeed, but for me it's still better than getting freezes or nans)
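A minimal sketch of the sigmoid + binary_crossentropy alternative applied to the model above (a suggestion, not the original code; it replaces the last lines of get_model and reuses conv_10_2, image and learning_rate from there):
# Sigmoid already squashes the output to [0, 1], so relu_normalized is not needed,
# and binary_crossentropy acts as a per-pixel reconstruction loss.
output = Conv2D(1, (1, 1), activation='sigmoid', padding='same')(conv_10_2)
model = Model(inputs=image, outputs=output)
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate))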
You can introduce recurrent layers like LSTM which would "trap" the errors using gating, potentially improving the situation.

Resources