I have a script module like the one below
(it is actually a submodule of a pre-trained model saved as a RecursiveScriptModule in PyTorch):
RecursiveScriptModule(
  original_name=out_Conv
  (conv): RecursiveScriptModule(original_name=Conv2d)
  (bn): RecursiveScriptModule(original_name=BatchNorm2d)
  (act): RecursiveScriptModule(original_name=Sigmoid)
)
Can I delete the attribute 'act' from this module and then save it as a new model?
I've been trying to reset the weights of a pretrained sentence transformer model, but I can't find any information about this, and the code I've used before to reset the weights of a PyTorch model doesn't work either.
For example, the following code works neither for transformers.DistilBertModel nor when the model is replaced with a sentence_transformers.SentenceTransformer('distilbert-base-nli-mean-tokens') model.
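from transformers import DistilBertTokenizer, DistilBertModel
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
# `inputs` was not defined in the original snippet; a tokenized sample sentence is assumed here
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
# this loop raises the AttributeError shown below
for i, layer in enumerate(model.encoder.layer):
    model.encoder.layer[i].apply(model._init_weights)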
Output:
AttributeError: 'DistilBertModel' object has no attribute 'encoder'
When trying to iterate through the model's body like this:
from sentence_transformers import SentenceTransformer, SentencesDataset, losses
from sentence_transformers.readers import InputExample
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
for name, module in model.named_children():
    print('resetting ', model[int(name)])
    print(module._modules)
Output:
resetting Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
OrderedDict([('auto_model', DistilBertModel(
(embeddings): Embeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(transformer): Transformer(...........
.......etc.......
How do I access all of these layers and reset all of their weights?
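One way to reach every layer, however deeply the wrapper nests them, is `nn.Module.apply`, which visits all submodules recursively. The sketch below is a minimal version under stated assumptions: `reset_all_weights` is a hypothetical helper name, and it re-initializes each layer with its own PyTorch default (`reset_parameters()`), which is not necessarily the same scheme as the Hugging Face `_init_weights` method. It is shown on a tiny stand-in network so that it runs without downloading a model.

```python
import torch
from torch import nn


def reset_all_weights(model: nn.Module) -> None:
    """Re-initialize every submodule that defines reset_parameters()."""

    @torch.no_grad()
    def _reset(module: nn.Module) -> None:
        reset = getattr(module, "reset_parameters", None)
        if callable(reset):
            reset()

    # apply() walks the module tree recursively, so nesting depth does not matter
    model.apply(_reset)


# Tiny stand-in for a pretrained network (assumption: any nn.Module is handled the same way)
toy = nn.Sequential(nn.Embedding(10, 8), nn.Linear(8, 8), nn.LayerNorm(8))
before = toy[1].weight.detach().clone()
reset_all_weights(toy)
after = toy[1].weight.detach().clone()
```

With the models from the question the call would simply be `reset_all_weights(model)`. As for the AttributeError, the printed structure suggests that DistilBertModel keeps its blocks under `model.transformer` (most likely `model.transformer.layer`) rather than `model.encoder.layer`.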
Bare Problem Statement:
I have trained a model A that consists of a feature extractor FE and a classification head ACH.
I want to train a model B that uses A's feature extractor FE and trains its own classification head BCH.
So far it's easy. However, I don't want to save the entire model B, since its FE part is already saved in model A. I only want to dump BCH, and during inference:
Load model A and run its prediction.
Load B's classification head BCH.
Swap the classification head ACH with BCH.
Run prediction using this swapped state.
The PyTorch documentation I have read only talks about saving entire models. How can I achieve this?
End of problem statement
More details on the motivation of the problem:
I have a dataset of images that I want to classify, and each image can have several classes assigned to it. For example, the same image can have the class "Land Vehicle" (supercategory) and the class "Car" or "Truck" (category). Another image might have the class "Aerial Vehicle" and be a "Helicopter" or a "Plane".
Since the images, and therefore most of the features, should be the same, I wish to train one classifier for the supercategories, then freeze its feature extractor and transfer-learn the same model for the categories using the pretrained feature extractor.
Since the weights of the feature-extracting backbone are the same, I only want to save the weights of the classification head of the categories model, and thus save some precious computational resources.
In general, it is common to want access only to the backbone of a model in order to reuse it for other purposes, and there are several ways to do this. Keep in mind that saving a model checkpoint and loading it later means saving weights and biases and then loading them into the corresponding layers, so you first need to know which part of your model you want to save.
When you get the state of a model, you obtain a dictionary: the keys are the layer names and the values are the weights and biases. Let's see, with an EfficientNet classifier, how to save only the backbone of a model. An EfficientNet, as in your example, is basically a backbone with a fully connected layer as its head. If you only want the backbone, you want every layer except the head, which you will fine-tune later.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet
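model = EfficientNet.from_name("efficientnet-b0") #from_name builds the architecture only; EfficientNet.from_pretrained("efficientnet-b0") would also load pretrained weights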
print(model)
It will print the model layers and some features, basic stuff.
EfficientNet(
(_conv_stem): Conv2dStaticSamePadding(
3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
(static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
)
(_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_blocks): ModuleList(
(0): MBConvBlock(
(_depthwise_conv): Conv2dStaticSamePadding(
32, 32, kernel_size=(3, 3), stride=[1, 1], groups=32, bias=False
(static_padding): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
)
(_bn1): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_se_reduce): Conv2dStaticSamePadding(
32, 8, kernel_size=(1, 1), stride=(1, 1)
(static_padding): Identity()
)
(_se_expand): Conv2dStaticSamePadding(
8, 32, kernel_size=(1, 1), stride=(1, 1)
(static_padding): Identity()
)
...
What is interesting now are the final layers of this model:
...
(_bn1): BatchNorm2d(1280, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_avg_pooling): AdaptiveAvgPool2d(output_size=1)
(_dropout): Dropout(p=0.2, inplace=False)
(_fc): Linear(in_features=1280, out_features=1000, bias=True)
(_swish): MemoryEfficientSwish()
Let's say we want to reuse this model's backbone, that is, everything except _fc, because we want to use its weights in another model that has the same backbone but a different, non-pretrained head. In this example I'll take the same backbone and add 3 heads:
class ThreeHeadEfficientNet(torch.nn.Module):
    def __init__(self,nbClasses1,nbClasses2,nbClasses3,model="efficientnet-b0",dropout_p=0.2):
        super(ThreeHeadEfficientNet, self).__init__()
        self.NBC1 = nbClasses1
        self.NBC2 = nbClasses2
        self.NBC3 = nbClasses3
        self.dropout_p = dropout_p
        self._dropout_layer = torch.nn.Dropout(p=self.dropout_p)
        self._head1 = torch.nn.Linear(1280,self.NBC1)
        self._head2 = torch.nn.Linear(1280,self.NBC2)
        self._head3 = torch.nn.Linear(1280,self.NBC3)
        self.model = EfficientNet.from_name(model,include_top=False) #you can notice here, I'm not loading the head, only the backbone

    def forward(self,x):
        features = self.model(x)
        res = features.flatten(start_dim=1)
        res = self._dropout_layer(res)
        res1 = self._head1(res)
        res2 = self._head2(res)
        res3 = self._head3(res)
        return res1,res2,res3
If you print the layers of this ThreeHeadEfficientNet, you'll notice that the layer names have changed slightly, from _conv_stem.weight to model._conv_stem.weight, since the backbone is now stored in an attribute named model. We have to account for that, otherwise the keys will not match: we create a new state dictionary whose keys match those expected by the new model and whose values are the pretrained weights and biases:
new_model = ThreeHeadEfficientNet(10, 5, 3) #new_model was not defined above; the class counts here are placeholders
pretrained_dict = model.state_dict() #pretrained model keys
model_dict = new_model.state_dict() #new model keys
processed_dict = {}
for k in model_dict.keys():
    decomposed_key = k.split(".")
    if decomposed_key[0] == "model": #backbone parameter: strip the "model." prefix
        pretrained_key = ".".join(decomposed_key[1:])
        #build the new state dict so that the new model can load the pretrained parameters without the head
        processed_dict[k] = pretrained_dict[pretrained_key]
#strict=False is important: the head layers are missing from the state dict, and we want the keys that are present to load without raising an error
new_model.load_state_dict(processed_dict, strict=False)
And finally, in new_model you should have your new model with a pretrained backbone and heads to fine tune.
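To come back to the part of the question about dumping only the head: a submodule has its own `state_dict()`, so it can be saved and loaded on its own. The following is a minimal sketch under assumptions, not a definitive recipe: `Classifier`, `backbone` and `head` are hypothetical names, the layer sizes are arbitrary, and an in-memory buffer stands in for a file path.

```python
import io

import torch
from torch import nn


class Classifier(nn.Module):
    """Hypothetical model: a shared feature extractor plus a replaceable head."""

    def __init__(self, n_classes: int) -> None:
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
        self.head = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


model_a = Classifier(n_classes=2)  # supercategory model, saved in full elsewhere
model_b = Classifier(n_classes=5)  # category model sharing A's feature extractor
model_b.backbone.load_state_dict(model_a.backbone.state_dict())

# Save ONLY B's head; a BytesIO buffer stands in for a file path here
buffer = io.BytesIO()
torch.save(model_b.head.state_dict(), buffer)
buffer.seek(0)

# At inference time: start from A, swap in a head of B's shape, load B's head weights
model_a.head = nn.Linear(32, 5)
model_a.head.load_state_dict(torch.load(buffer))

x = torch.randn(4, 16)
model_a.eval()
model_b.eval()
with torch.no_grad():
    swapped_out = model_a(x)
    reference_out = model_b(x)
```

If A's own prediction is still needed afterwards, one option is to keep a reference to the original head (for example `ach = model_a.head`) before swapping and assign it back later.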
Now you should be able to fix your issues :)
For more pytorch information, please also check the forum.
I would like to further fine-tune an already fine-tuned BertForSequenceClassification model on a new dataset that contains just 1 additional label the model has not seen before.
That is, I would like to add 1 new label to the set of labels the model is currently able to classify properly.
Moreover, I don't want the classifier weights to be randomly initialized. I'd like to keep them intact and just update them according to the new dataset examples, while increasing the size of the classifier layer by 1.
The dataset used for further fine-tuning could look like this:
sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label
My model's current classifier layer looks like this:
Linear(in_features=768, out_features=135, bias=True)
How could I achieve it?
Is it even a good approach?
You can just extend the weights and bias of your model with new values. Please have a look at the commented example below:
#This is the section that loads your model
#I will just use a pretrained model for this example
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")
#we check the output of one sample to compare it later with the extended layer
#to verify that we kept the previous learnt "knowledge"
f = tokenizer.encode_plus("This is an example", return_tensors='pt')
print(model(**f).logits)
#Now we need to find out the name of the linear layer you want to extend
#The layers on top of distilroberta are wrapped inside a classifier section
#This name can differ for you because it can be chosen arbitrarily
#If it does, use model.named_parameters() to find the classification layer
print(model.classifier)
#The output shows us that the classification layer is called `out_proj`
#We can now extend the weights by creating a new tensor that consists of the
#old weights and a randomly initialized tensor for the new label
model.classifier.out_proj.weight = nn.Parameter(torch.cat((model.classifier.out_proj.weight, torch.randn(1,768)),0))
#We do the same for the bias:
model.classifier.out_proj.bias = nn.Parameter(torch.cat((model.classifier.out_proj.bias, torch.randn(1)),0))
#and be happy when we compare the output with our expectation
print(model(**f).logits)
Output:
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
grad_fn=<AddmmBackward>)
RobertaClassificationHead(
(dense): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895, 2.2124]],
grad_fn=<AddmmBackward>)
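Concatenating onto the existing parameters works, but it leaves the layer's `out_features` attribute at its old value. An alternative sketch, assuming a plain `nn.Linear` head, is to build a new layer of the larger size and copy the old rows into it. `extend_linear` is a hypothetical helper, and the 768/135 sizes are taken from the question only for illustration.

```python
import torch
from torch import nn


def extend_linear(layer: nn.Linear, n_new: int = 1) -> nn.Linear:
    """Return a copy of `layer` with `n_new` extra output units (hypothetical helper)."""
    extended = nn.Linear(layer.in_features, layer.out_features + n_new,
                         bias=layer.bias is not None)
    with torch.no_grad():
        # Rows for the existing labels are copied; the new rows keep their fresh init
        extended.weight[: layer.out_features] = layer.weight
        if layer.bias is not None:
            extended.bias[: layer.out_features] = layer.bias
    return extended


old_head = nn.Linear(768, 135)
new_head = extend_linear(old_head, n_new=1)

x = torch.randn(2, 768)
with torch.no_grad():
    old_logits = old_head(x)
    new_logits = new_head(x)
```

The first 135 logits stay the same up to floating-point noise, so the previously learnt behaviour is kept. With a Hugging Face model the number of labels stored on the model and its config presumably has to be updated as well before fine-tuning, and the new layer would need to be moved to the model's device.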
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.7)
learn.fit_one_cycle(1, 1e-2)
I have trained a fastai model as above. I can get predictions as below:
preds, targets = learn.get_preds()
But instead I want the penultimate-layer embeddings of the model in learn (this practice is common for CNN models). Could you help me with how to do it?
I'm not sure if you want a classifier but anyway...
learn.model gives you back the model architecture. learn.model[0] is then the encoder and learn.model[1] the other part of the model.
Example:
To access the first linear layer in SequentialEx (architecture below), use the following command:
learn.model[0].layers[0].ff.layers[0]
SequentialRNN(
(0): TransformerXL(
(encoder): Embedding(60004, 410)
(pos_enc): PositionalEncoding()
(drop_emb): Dropout(p=0.03)
(layers): ModuleList(
(0): DecoderLayer(
(mhra): MultiHeadRelativeAttention(
(attention): Linear(in_features=410, out_features=1230, bias=False)
(out): Linear(in_features=410, out_features=410, bias=False)
(drop_att): Dropout(p=0.03)
(drop_res): Dropout(p=0.03)
(ln): LayerNorm(torch.Size([410]), eps=1e-05, elementwise_affine=True)
(r_attn): Linear(in_features=410, out_features=410, bias=False)
)
(ff): SequentialEx(
(layers): ModuleList(
(0): Linear(in_features=410, out_features=2100, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.03)
(3): Linear(in_features=2100, out_features=410, bias=True)
(4): Dropout(p=0.03)
(5): MergeLayer()
(6): LayerNorm(torch.Size([410]), eps=1e-05, elementwise_affine=True)
)
)
)
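Once the layer of interest has been located by indexing as above, one common way to read its activations is a forward hook. This is a minimal sketch on a stand-in `nn.Sequential` model (an assumption, not the fastai architecture shown above); `save_output` and `captured` are hypothetical names.

```python
import torch
from torch import nn

captured = {}


def save_output(module, inputs, output):
    # Called by PyTorch after `module` runs; keep a detached copy of its output
    captured["penultimate"] = output.detach()


# Stand-in classifier (assumption): the ReLU output plays the role of the penultimate layer
toy = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 3))

handle = toy[1].register_forward_hook(save_output)
with torch.no_grad():
    logits = toy(torch.randn(4, 20))
handle.remove()  # remove the hook once the embeddings have been collected

embeddings = captured["penultimate"]
```

With a fastai learner the hook would be registered on whichever submodule of `learn.model` is the penultimate one. Some recurrent modules return tuples rather than a single tensor, in which case the hook would have to pick the relevant element before calling `detach()`.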
I want to save my CNN model configuration (kernel sizes, activations, filters, etc.) to a txt file. Using "summary" I only get the layers' inputs, outputs and params, but I need more information.
I tried the following functions:
# large dictionary with all the information, but with a lot of noise
config = model.get_config()
# same dictionary, but converted to string
summaryJson = str(model.to_json())
None of these solutions gives me the most important parameters in a readable form. I then found the following solution, which seems to give everything I need, but it doesn't work:
from keras_diagram import ascii
summary = ascii(model)
It gives me the following error:
AttributeError: 'Activation' object has no attribute 'inbound_nodes'
These are my last layers:
...
hid = Conv2D(128, kernel_size=5, strides=1, padding='same')(hid)
hid = BatchNormalization(momentum=0.9)(hid)
hid = LeakyReLU(alpha=0.1)(hid)
hid = Conv2D(3, kernel_size=5, strides=1, padding="same")(hid)
out = Activation("tanh")(hid)
model = Model(input_layer, out)
summary = ascii(model)
Do you know what to do?
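One option that avoids keras_diagram altogether is to filter the dictionary returned by `model.get_config()` down to the keys of interest and write one line per layer. The sketch below is only an illustration under assumptions: `describe_layers` and `KEYS` are hypothetical names, the selection of keys is a guess at what matters here, and a hand-written dictionary shaped like a `get_config()` result is used so that it runs without Keras.

```python
import json

# Keys worth keeping per layer; this selection is an assumption, adjust as needed
KEYS = ("filters", "kernel_size", "strides", "padding", "activation", "momentum", "alpha")


def describe_layers(config: dict) -> list:
    """Turn a dict shaped like `model.get_config()` into one text line per layer."""
    lines = []
    for layer in config.get("layers", []):
        layer_cfg = layer.get("config", {})
        kept = {k: layer_cfg[k] for k in KEYS if k in layer_cfg}
        lines.append(f"{layer['class_name']} ({layer_cfg.get('name', '?')}): {json.dumps(kept)}")
    return lines


# Hand-written stand-in for `model.get_config()` so the sketch runs without Keras
example_config = {
    "name": "model",
    "layers": [
        {"class_name": "Conv2D",
         "config": {"name": "conv2d", "filters": 128, "kernel_size": [5, 5],
                    "strides": [1, 1], "padding": "same", "activation": "linear"}},
        {"class_name": "Activation",
         "config": {"name": "activation", "activation": "tanh", "trainable": True}},
    ],
}

lines = describe_layers(example_config)
text = "\n".join(lines)
```

With a real model, `describe_layers(model.get_config())` would be the call, and `text` can then be written to a txt file with an ordinary `open(..., "w")`.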