Weights are not updated in PyTorch nn.Embedding

I loaded a pre-trained embedding matrix into PyTorch's nn.Embedding module and set it to trainable as follows.
self.embedding_layer = nn.Embedding(self.vocab_size, self.embed_size, padding_idx=self.padding_idx)
# replace the randomly initialized weights with the pre-trained matrix
self.embedding_layer.weight = nn.Parameter(self.embedding)
self.embedding_layer.weight.requires_grad = True
I fed this layer's output through a bidirectional Gated Recurrent Unit (GRU) network.
After training the model, I checked whether the weights of nn.Embedding had been updated. They had not: model.embedding_layer.weight and self.embedding are identical. I also checked the gradients of model.embedding_layer; they are all zeros.
Could you please help me? Thank you.
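
For reference, here is a minimal sanity-check sketch (with a stand-in embedding matrix and optimizer, since the rest of the model isn't shown) that can surface the usual causes of zero embedding gradients: the weight never reaching the optimizer, or the inspected row being the padding_idx row, which by design never receives gradient.

import torch
import torch.nn as nn

# Stand-in for the real pre-trained matrix.
pretrained = torch.randn(100, 16)
embedding_layer = nn.Embedding(100, 16, padding_idx=0)
embedding_layer.weight = nn.Parameter(pretrained)
embedding_layer.weight.requires_grad = True

# Check 1: the weight must actually be handed to the optimizer.
optimizer = torch.optim.Adam(p for p in embedding_layer.parameters() if p.requires_grad)
assert any(embedding_layer.weight is p for g in optimizer.param_groups for p in g["params"])

# Check 2: a backward pass should give non-zero gradients for every
# looked-up row except the padding_idx row, which stays at zero.
ids = torch.tensor([[1, 2, 0]])
embedding_layer(ids).sum().backward()
print(embedding_layer.weight.grad[1].abs().sum())  # non-zero
print(embedding_layer.weight.grad[0].abs().sum())  # zero (padding row)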

Related

PyTorch Lightning: load model from pre-trained; values of weights remain the same

I have a pre-trained model; by changing the final fc layers, I created a model for a downstream task. Now I want to load the weights from the pre-trained checkpoint. I tried self.model.load_from_checkpoint(self.pretrained_model_path). But when I print the weight values from the model's layers, they are exactly the same, which indicates the weights were not loaded/updated. Note that it does not give me any warning or error.
Edit:
self.model.backbone = self.model.load_from_checkpoint(self.pretrained_model_path).backbone
updates the parameters with the pre-trained weights. There might be a more optimal way, but this fix worked for me.
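
For context, load_from_checkpoint is a classmethod that builds and returns a new model instance; it does not load weights into the instance it is called on, which is why the original call appeared to do nothing. A minimal sketch, assuming a hypothetical LightningModule subclass MyModel and a placeholder checkpoint path:

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    ...  # hypothetical downstream model

model = MyModel()
model.load_from_checkpoint("pretrained.ckpt")  # return value discarded: model is unchanged

model = MyModel.load_from_checkpoint("pretrained.ckpt")  # correct: keep the returned instance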

Pre-trained ResNet18 in PyTorch and deactivated bias

I would like to load my model, which is a modified ResNet18, with the pre-trained weights of a ResNet18. I noticed that there is no bias in the state_dict of this model and that bias=False in the conv2d layers. My questions are:
1 - Is it possible to access a version of these pre-trained models in PyTorch that has bias values too?
2 - Why are pre-trained bias values not available, and what is the effect of having informed weight values but randomly initialized bias vectors?
I appreciate your guidance and opinions.
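
For what it's worth, torchvision's ResNets follow every convolution with a BatchNorm layer, and BatchNorm's learned shift makes a convolution bias redundant, which is why the published checkpoints ship without conv biases. A short inspection sketch (the weights= argument is the newer torchvision API; older versions use pretrained=True):

import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

print(model.conv1.bias)      # None: conv bias is disabled
print(model.bn1.bias.shape)  # torch.Size([64]): BatchNorm carries the shift

# No conv layer in the state_dict has a bias entry.
print([k for k in model.state_dict() if "conv" in k and k.endswith("bias")])  # []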

How does (PyTorch) model.load_state_dict() work for a modified model?

I've modified the BasicBlock of the ResNet architecture by adding a few more FC layers at the end of the block.
I tried model.load_state_dict() on the new model and it worked perfectly.
I wonder how the weights of these layers are treated when I load pretrained weights.
Are the pre-trained weights assigned properly to the correct layers, with the weights of the new layers initialized randomly?
Or does model.load_state_dict() fail in this scenario, so that all weights of the model are initialized from scratch?
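
For reference, the answer hinges on the strict flag: with the default strict=True, load_state_dict raises a RuntimeError as soon as any key is missing or unexpected, while strict=False copies every matching parameter and leaves the new layers at whatever initialization they already had. A minimal sketch with hypothetical stand-in modules:

import torch.nn as nn

old = nn.Sequential(nn.Linear(4, 4))                   # stand-in pre-trained model
new = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))  # modified: one extra layer

# new.load_state_dict(old.state_dict())  # strict=True (default) raises RuntimeError

result = new.load_state_dict(old.state_dict(), strict=False)
print(result.missing_keys)     # ['1.weight', '1.bias'] -- extra layer keeps its init
print(result.unexpected_keys)  # []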

Initializing the weights of a model from the output of another model in Keras for transfer learning

I trained a LeNet architecture on a first dataset. I want to train a VGG architecture on another dataset, initializing the weights of the VGG with the weights obtained from the LeNet.
All initializer functions in Keras are predefined, and I cannot find how to customize them. For example:
keras.initializers.Zeros()
Any idea how I can set the weights?
https://keras.io/layers/about-keras-layers/
According to the Keras documentation above:
layer.set_weights(weights) sets the weights of the layer from a list of NumPy arrays
layer.get_weights() returns the weights of the layer as a list of NumPy arrays
So, you can do this as follows:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32))
# ... build the rest of the model's layers ...

# access the nth layer via model.layers[n]
model.layers[0].set_weights(your_weights_here)
Of course, you'll need to make sure the weights you set for each layer have the appropriate shapes.
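
Applied to the original question, one might copy weights layer by layer from the trained LeNet into the VGG model; this only works where the source and target layers have identical weight shapes, so the index mapping below is purely illustrative:

# Hypothetical: lenet is the trained source model, vgg the target.
layer_map = {0: 0, 2: 1}  # illustrative LeNet-index -> VGG-index pairs

for src_idx, dst_idx in layer_map.items():
    weights = lenet.layers[src_idx].get_weights()
    target_shapes = [w.shape for w in vgg.layers[dst_idx].get_weights()]
    if [w.shape for w in weights] == target_shapes:
        vgg.layers[dst_idx].set_weights(weights)
    else:
        print("Skipping %d -> %d: shape mismatch" % (src_idx, dst_idx))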

Replicating TF model in Keras from pbtxt and ckpts

I'm not sure if this is actually possible with the information I have, but please let me know if that's the case.
From a previously trained TensorFlow model, I have the following files:
graph.pbtxt, checkpoint, model.ckpt-10000.data-00000-of-00001, model.ckpt-10000.index, and model.ckpt-10000.meta
I was told that the input of this model was a Dense layer of size 5000 and the output was a Dense sigmoid binary classifier, but I don't know how many layers were in between or what size they were. (I'm also not 100% positive that the input size is correct.)
From this information and associated files, is there a way to replicate the TF model with trained weights into a Keras functional model?
(The idea was that this small dense network was added onto the last FC layer of VGG-16, so that'll be my end goal.)
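
As a starting point, the checkpoint itself can reveal the hidden architecture: tf.train.list_variables lists every stored variable name and shape (the Dense kernel shapes give the layer widths), and tf.train.load_variable pulls the values out as NumPy arrays, which can then be fed into set_weights on matching Keras layers. A minimal inspection sketch (the checkpoint prefix comes from the files above; the variable name at the end is a hypothetical example):

import tensorflow as tf

ckpt = "model.ckpt-10000"  # prefix shared by the .data/.index/.meta files

# Print every stored variable with its shape.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)

# Retrieve a variable of interest as a NumPy array.
kernel = tf.train.load_variable(ckpt, "dense/kernel")  # hypothetical variable name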
