I loaded PyTorch's nn.Embedding module with a pre-trained embedding matrix and set it to trainable as follows:
self.embedding_layer = nn.Embedding(self.vocab_size, self.embed_size, padding_idx=self.padding_idx)
self.embedding_layer.weight = nn.Parameter(self.embedding)
self.embedding_layer.weight.requires_grad = True
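For what it's worth, I believe the same setup can be written more compactly with nn.Embedding.from_pretrained; here is a minimal sketch with hypothetical sizes standing in for my actual model:

import torch
import torch.nn as nn

# Hypothetical sizes; "pretrained" stands in for the real embedding matrix.
vocab_size, embed_size, padding_idx = 10000, 300, 0
pretrained = torch.randn(vocab_size, embed_size)

# freeze=False keeps the copied weights trainable.
embedding_layer = nn.Embedding.from_pretrained(pretrained, freeze=False, padding_idx=padding_idx)
assert embedding_layer.weight.requires_grad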
I processed the embedding output with a bidirectional GRU (Gated Recurrent Unit) network.
After training, I checked whether the weights of nn.Embedding had been updated. They had not: model.embedding_layer.weight and self.embedding are still identical.
I also checked the gradients of model.embedding_layer; they are all zeros.
Could you please help me? Thank you.
Related
I have a pre-trained model; by changing the final fc layers, I have created a model for a downstream task. Now, I want to load the pretrained weights into it. I tried self.model.load_from_checkpoint(self.pretrained_model_path), but when I print the weight values from the model's layers, they are exactly the same, which indicates the weights were not loaded/updated. Note that this does not give me any warning or error.
Edit:
self.model.backbone = self.model.load_from_checkpoint(self.pretrained_model_path).backbone
updates the parameters with the pre-trained weights. There may be a more optimal way, but I found this fix works.
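To verify the fix, this is roughly the check I ran (a sketch; it just compares each parameter before and after loading to confirm the weights actually changed):

import torch

# Snapshot the parameters, apply the fix, then compare.
before = {k: v.clone() for k, v in self.model.state_dict().items()}
self.model.backbone = self.model.load_from_checkpoint(self.pretrained_model_path).backbone
after = self.model.state_dict()

changed = [k for k in before if k in after and not torch.equal(before[k], after[k])]
print(f"{len(changed)} parameters changed after loading")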
I would like to load my modified model with the pre-trained weights of a ResNet18. I noticed that there are no bias terms in the state_dict of this model and that bias=False in the Conv2d layers. My questions are:
1 - Is it possible to access a version of these pre-trained models in PyTorch that has bias values too?
2 - Why are the pre-trained bias values not available, and what is the effect of having informed weight values but randomly initialized bias vectors?
I appreciate your guidance and opinions.
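To show what I mean, this is roughly how I inspected the state_dict (a sketch using torchvision's resnet18; the key layout is the same as the pretrained checkpoint):

import torchvision.models as models

# A sketch: no "conv*.bias" entries appear in the state_dict, because the
# Conv2d layers of this architecture are created with bias=False.
state_dict = models.resnet18().state_dict()
for name, tensor in state_dict.items():
    if "conv" in name:
        print(name, tuple(tensor.shape))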
I've modified the BasicBlock of the ResNet architecture by adding a few more FC layers at the end of the block.
I tried model.load_state_dict() on the new model and it worked perfectly.
I wonder how the weights of these layers are treated when I load pretrained weights.
Are the pretrained weights assigned to the correct layers, with the weights of the new layers initialized randomly?
Or does model.load_state_dict() fail in this scenario, so that all weights of the model are initialized from scratch?
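As a concrete check, this is roughly how I've been looking at what load_state_dict does with the extra layers (a sketch; torchvision's stock resnet18 stands in for my modified model, and strict=False reports the unmatched keys):

import torchvision.models as models

# A sketch: load pretrained-style weights with strict=False; keys that have
# no counterpart in the loaded state_dict are reported in missing_keys and
# keep whatever initialization they already had.
pretrained_state = models.resnet18().state_dict()  # stand-in for the pretrained weights
modified_model = models.resnet18()                 # stand-in for the model with extra FC layers

result = modified_model.load_state_dict(pretrained_state, strict=False)
print("missing (kept at their current init):", result.missing_keys)
print("unexpected (ignored):", result.unexpected_keys)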
I trained a LeNet architecture on a first dataset. I want to train a VGG architecture on another dataset, initializing the weights of VGG with the weights obtained from LeNet.
All initialization functions in Keras are predefined, and I cannot find how to customize them. For example:
keras.initializers.Zeros()
Any idea how I can set the weights?
https://keras.io/layers/about-keras-layers/
According to the Keras documentation above:
layer.set_weights(weights) sets the weights of the layer from a list of Numpy arrays
layer.get_weights() returns the weights of the layer as a list of Numpy arrays
So, you can do this as follows:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, input_shape=(16,)))  # input_shape is a placeholder so the layer has weights to set
# ... build the rest of the model's layers ...

# access any nth layer by calling model.layers[n]
model.layers[0].set_weights(your_weights_here)  # your_weights_here: a list of NumPy arrays matching the layer's shapes
Of course, you'll need to make sure the weights you pass to each layer match the shapes that layer expects.
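For example, transferring the weights of one trained layer into a layer of identical shape in another model could look like this (a hypothetical sketch; the models and layer indices are placeholders):

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# Hypothetical source and target models that share a layer of the same shape.
source = Sequential([Dense(32, input_shape=(16,))])
target = Sequential([Dense(32, input_shape=(16,))])

weights = source.layers[0].get_weights()   # [kernel, bias] as NumPy arrays
target.layers[0].set_weights(weights)      # shapes must match exactly
assert np.allclose(target.layers[0].get_weights()[0], weights[0])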
I'm not sure if this is actually possible with the information I have, but please let me know if that's the case.
From a previously trained Tensorflow model I have the following files:
graph.pbtxt, checkpoint, model.ckpt-10000.data-00000-of-00001, model.ckpt-10000.index, and model.ckpt-10000.meta
I was told that the input of this model was a Dense layer of size 5000 and the output was a Dense sigmoid layer for binary classification, but I don't know how many layers, or of what size, were in between. (I'm also not 100% positive that the input size is correct.)
From this information and associated files, is there a way to replicate the TF model with trained weights into a Keras functional model?
(The idea was that this small dense network was added onto the last FC layer of VGG-16, so that'll be my end goal.)
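In case it helps, this is how far I've gotten inspecting the checkpoint myself (a sketch; the checkpoint prefix is a placeholder for my local files, and listing the stored variables should at least reveal the hidden layer sizes):

import tensorflow as tf

ckpt_prefix = "model.ckpt-10000"  # placeholder for the local checkpoint prefix

# List every variable name and shape stored in the checkpoint.
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)

# Individual tensors can then be read out and assigned to Keras layers
# with layer.set_weights() once the architecture is reconstructed.
reader = tf.train.load_checkpoint(ckpt_prefix)
# value = reader.get_tensor("dense/kernel")  # hypothetical variable name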