AllenNLP Multi-Task Model: Keep encoder weights for new heads - pytorch

I have trained a (AllenNLP) multi-task model. I would like to keep the encoder/backbone weights and continue training with new heads on new datasets. How can I do that with AllenNLP?
I have two basic ideas for how to do that:
I followed this AllenNLP tutorial to load the trained model and then instead of just making predictions I wanted to change the configuration and the model-heads to continue training on the new datasets...but I am kinda lost in how to do that.
I guess it should be possible to (a) save the state-dict of the previously trained encoder in a file and then (b) point to those weights in the configuration file for the new model (instead of pointing to "bert-base-cased"-weights for example). But looking at the PretrainedTransformerEmbedder-class I don't see how I could pass my own model-weights to that class.
As an additional question: Is it also possible to save the weights of the heads separately and initialize new heads with those weights?
Any help is appreciated :)

Your second idea is the preferred way, which you can accomplish by using a PretrainedModelInitializer. See the CopyNet model for an example of how to add this to your model.

Related

Using pretrained models in Pytorch for Semantic Segmentation, then training only the fully connected layers with our own dataset

I am learning Pytorch and trying to understand how the library works for semantic segmentation.
What I've understood so far is that we can use a pre-trained model in pytorch. I've found an article which was using this model in the .eval() mode but I have not been able to find any tutorial on using such a model for training on our own dataset. I have a very small dataset and I need transfer learning to get results. My goal is to only train the FC layers with my own data. How is that achievable in Pytorch without complicating the code with OOP or so many .py files. I have been having a hard time figuring out such repos in github as I am not the most proficient person when it comes to OOP. I have been using Keras for Deep Learning until recently and there everything is easy and straightforward. Do I have the same options in Pycharm?
I appreciate any guidance on this. I need to run a piece of code that does the semantic segmentation and I am really confused about many of the steps I need to take.
Assume you start with a pretrained model called model. All of this occurs before you pass the model any data.
You want to find the layers you want to train by looking at all of them and then indexing them using model.children(). Running this command will show you all of the blocks and layers.
list(model.children())
Suppose you have now found the layers that you want to finetune (your FC layers as you describe). If the layers you want to train are the last 5 you can grab all of the layers except for the last 5 in order to set their requires_grad params to False so they don't train when you run the training algorithm.
list(model.children())[-5:]
Remove those layers:
layer_list = list(model.children())[-5:]
Rebuild model using sequential:
model_small = nn.Sequential(*list(model.children())[:-5])
Set requires_grad params to False:
for param in model_small.parameters():
param.requires_grad = False
Now you have a model called model_small that has all of the layers except the layers you want to train. Now you can reattach the layers that your removed and they will intrinsically have the requires_grad param set to True. Now when you train the model it will only update the weights on those layers.
model_small.avgpool_1 = nn.AdaptiveAvgPool2d()
model_small.lin1 = nn.Linear()
model_small.logits = nn.Linear()
model_small.softmax = nn.Softmax()
model = model_small.to(device)

How to use a pre-trained object detection in tensorflow?

How can I use the weights of a pre-trained network in my tensorflow project?
I know some theory information about this but no information about coding in tensorflow.
As been pointed out by #Matias Valdenegro in the comments, your first question does not make sense. For your second question however, there are multiple ways to do so. The term that you're searching for is Transfer Learning (TL). TL means transferring the "knowledge" (basically it's just the weights) from a pre-trained model into your model. Now there are several types of TL.
1) You transfer the entire weights from a pre-trained model into your model and use that as a starting point to train your network.
This is done in a situation where you now have extra data to train your model but you don't want to start over the training again. Therefore you just load the weights from your previous model and resume the training.
2) You transfer only some of the weights from a pre-trained model into your new model.
This is done in a situation where you have a model trained to classify between, say, 5 classes of objects. Now, you want to add/remove a class. You don't have to re-train the whole network from the start if the new class that you're adding has somewhat similar features with (an) existing class(es). Therefore, you build another model with the same exact architecture as your previous model except the fully-connected layers where now you have different output size. In this case, you'll want to load the weights of the convolutional layers from the previous model and freeze them while only re-train the fully-connected layers.
To perform these in Tensorflow,
1) The first type of TL can be performed by creating a model with the same exact architecture as the previous model and simply loading the model using tf.train.Saver().restore() module and continue the training.
2) The second type of TL can be performed by creating a model with the same exact architecture for the parts where you want to retain the weights and then specify the name of the weights in which you want to load from the previous pre-trained weights. You can use the parameter "trainable=False" to prevent Tensorflow from updating them.
I hope this helps.

How do i retrain the model without losing the earlier model data with new set of data

for my current requirement, I'm having a dataset of 10k+ faces from 100 different people from which I have trained a model for recognizing the face(s). The model was trained by getting the 128 vectors from the facenet_keras.h5 model and feeding those vector value to the Dense layer for classifying the faces.
But the issue I'm facing currently is
if want to train one person face, I have to retrain the whole model once again.
How should I get on with this challenge? I have read about a concept called transfer learning but I have no clues about how to implement it. Please give your suggestion on this issue. What can be the possible solutions to it?
With transfer learning you would copy an existing pre-trained model and use it for a different, but similar, dataset from the original one. In your case this would be what you need to do if you want to train the model to recognize your specific 100 people.
If you already did this and you want to add another person to the database without having to retrain the complete model, then I would freeze all layers (set layer.trainable = False for all layers) except for the final fully-connected layer (or the final few layers). Then I would replace the last layer (which had 100 nodes) to a layer with 101 nodes. You could even copy the weights to the first 100 nodes and maybe freeze those too (I'm not sure if this is possible in Keras). In this case you would re-use all the trained convolutional layers etc. and teach the model to recognise this new face.
You can save your training results by saving your weights with:
model.save_weights('my_model_weights.h5')
And load them again later to resume your training after you added a new image to the dataset with:
model.load_weights('my_model_weights.h5')

extracting gradients for a model, reversing them and updating the weights in Keras

I'm trying to do domain adversarial training using gradient-reversal procedure. I have a deep-learning model architecture consisting of 5 dense layers. Now I want to extract the gradient, reverse them and then update the weights. I am not sure how to extract the gradients and finally how to add them to update the weights. I have gone through some example codes but I am still pretty doubtful regarding using Keras backend. Any help with some toy code or example with explanation is really appreciated.
Thank you,

Keras: better way to implement layer-wise training model?

I'm currently learning implementing layer-wise training model with Keras. My solution is complicated and time-costing, could someone give me some suggestions to do it in a easy way? Also could someone explain the topology of Keras especially the relations among nodes.outbound_layer, nodes.inbound_layer and how did they associated with tensors: input_tensors and output_tensors? From the topology source codes on github, I'm quite confused about:
input_tensors[i] == inbound_layers[i].inbound_nodes[node_indices[i]].output_tensors[tensor_indices[i]]
Why the inbound_nodes contain output_tensors, I'm not clear about the relations among them....If I wanna remove layers in certain positions of the API model, what should I firstly remove? Also, when adding layers to some certain places, what shall I do first?
Here is my solution to a layerwise training model. I can do it on Sequential model and now trying to implement in on the API model:
To do it, I'm simply add a new layer after finish previous training and re-compile (model.compile()) and re-fit (model.fit()).
Since Keras model requires output layer, I would always add an output layer. As a result, each time when I wanna add a new layer, I have to remove the output layer then add it back. This can be done using model.pop(), in this case model has to be a keras.Sequential() model.
The Sequential() model supports many useful functions including model.add(layer). But for customised model using model API: model=Model(input=...., output=....), those pop() or add() functions are not supported and implement them takes some time and maybe not convenient.

Resources