Fine-tuning a model's classifier layer with a new label - PyTorch

I would like to further fine-tune an already fine-tuned BertForSequenceClassification model with a new dataset containing just 1 additional label, which the model hasn't seen before.
That way, I would like to add 1 new label to the set of labels the model is currently capable of classifying properly.
Moreover, I don't want the classifier weights to be randomly initialized; I'd like to keep them intact and just update them according to the dataset examples while increasing the size of the classifier layer by 1.
The dataset used for further fine-tuning could look like this:
sentence,label
intent example 1,new_label
intent example 2,new_label
...
intent example 10,new_label
My model's current classifier layer looks like this:
Linear(in_features=768, out_features=135, bias=True)
How could I achieve it?
Is it even a good approach?

You can just extend the weights and bias of your model with new values. Please have a look at the commented example below:
#This is the section that loads your model
#I will just use a pretrained model for this example
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("jpcorb20/toxic-detector-distilroberta")
model = AutoModelForSequenceClassification.from_pretrained("jpcorb20/toxic-detector-distilroberta")
#we check the output of one sample to compare it later with the extended layer
#to verify that we kept the previous learnt "knowledge"
f = tokenizer.encode_plus("This is an example", return_tensors='pt')
print(model(**f).logits)
#Now we need to find out the name of the linear layer you want to extend
#The layers on top of distilroberta are wrapped inside a classifier section
#This name can differ for you because it can be chosen arbitrarily
#use model.parameters instead to find the classification layer
print(model.classifier)
#The output shows us that the classification layer is called `out_proj`
#We can now extend the weights by creating a new tensor that consists of the
#old weights and a randomly initialized tensor for the new label
model.classifier.out_proj.weight = nn.Parameter(torch.cat((model.classifier.out_proj.weight, torch.randn(1,768)),0))
#We do the same for the bias:
model.classifier.out_proj.bias = nn.Parameter(torch.cat((model.classifier.out_proj.bias, torch.randn(1)),0))
#and be happy when we compare the output with our expectation
print(model(**f).logits)
Output:
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895]],
       grad_fn=<AddmmBackward>)
RobertaClassificationHead(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (out_proj): Linear(in_features=768, out_features=6, bias=True)
)
tensor([[-7.3604, -9.4899, -8.4170, -9.7688, -8.4067, -9.3895,  2.2124]],
       grad_fn=<AddmmBackward>)
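One caveat worth adding: the model's config still reports the old label set, so anything relying on it (for example a save_pretrained/from_pretrained round trip, or a pipeline mapping logits to label names) would be inconsistent with the extended layer. A minimal sketch of keeping it in sync, assuming a recent transformers version where num_labels is derived from id2label, and using the placeholder name "new_label" for the added class:
#Keep the config consistent with the extended classifier
#(the index 6 and the name "new_label" are placeholders for your added class)
model.config.id2label[6] = "new_label"
model.config.label2id["new_label"] = 6
print(model.config.num_labels) #now 7, derived from id2label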

Related

How do I reset sentence_transformers or DistilBERT model weights to train from scratch?

I've been trying to reset the weights in a pretrained sentence transformer model, but I can't seem to find any information about this, and the code I've used before to reset the weights of a PyTorch model also doesn't work.
For example, the following code works neither for transformers.DistilBertModel nor when replacing the model with a sentence_transformer.SentenceTransformer('distilbert-base-nli-mean-tokens') model.
from transformers import DistilBertTokenizer, DistilBertModel
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
inputs = tokenizer("Some example text", return_tensors='pt') # placeholder; inputs was not defined in the original snippet
outputs = model(**inputs)
for i, layer in enumerate(model.encoder.layer):
    model.encoder.layer[i].apply(model._init_weights)
Output:
AttributeError: 'DistilBertModel' object has no attribute 'encoder'
When trying to iterate through the model's body like this:
from sentence_transformers import SentenceTransformer, SentencesDataset, losses
from sentence_transformers.readers import InputExample
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
for name, module in model.named_children():
    print('resetting ', model[int(name)])
    print(module._modules)
Output:
resetting Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
OrderedDict([('auto_model', DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(...........
.......etc.......
How do I access all of these layers and reset all of their weights?
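A sketch of one way to do this: in the Hugging Face implementation, DistilBERT stores its encoder blocks under transformer.layer rather than encoder.layer, which is exactly what causes the AttributeError above. Assuming that layout:
from transformers import DistilBertModel

model = DistilBertModel.from_pretrained('distilbert-base-uncased')

# DistilBERT's blocks live under transformer.layer, not encoder.layer
for layer in model.transformer.layer:
    layer.apply(model._init_weights)

# or reinitialize every submodule, embeddings included:
model.apply(model._init_weights)
For the SentenceTransformer wrapper, the OrderedDict output above shows that the underlying Hugging Face model is stored as auto_model inside the first module, so the same reset can be applied to model[0].auto_model.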

How to save and load only particular layers of a neural network with PyTorch?

Bare Problem Statement:
I have trained a model A that consists of a feature extractor FE and a classification head ACH.
I want to train a model B that uses A's feature extractor FE and retrains its own classification head BCH.
So far it's easy. Now I don't want to save the entire model B, since the FE part of it is already saved in model A. I only want to dump the BCH, and during inference:
Load model A and run its prediction.
Load B's classification head BCH.
Swap the classification head ACH with BCH.
Run prediction using this swapped state.
Reading PyTorch's documentation, it only talks about saving entire models. How can I achieve this?
End of problem statement
More details on the motivation of the problem:
I have a dataset of images that I want to classify, and these images can have several classes assigned to them. For example, the same image can have the class "Land Vehicle" (supercategory) and a class of "Car" (category) or "Truck". Another image might have the class "Aerial Vehicle" and it can be a "Helicopter" or a "Plane".
Since the images, and therefore most of the features, should be the same, I wish to train one classifier for the supercategories, then freeze its feature extractor, and sort of transfer-learn the same model for the categories using the pretrained feature extractor.
Since the weights of the feature-extracting backbone are the same, I only want to save the weights of the classification head of the categories model, and thus save some precious computational resources.
In general, it is quite common to want access to the backbone of a model in order to reuse it for other purposes. There are several ways to do this. But keep in mind that saving a model checkpoint and loading it later means saving weights and biases and being able to load them back into the corresponding layers, so you first need to know which part of your model you want to save.
When you get the state of a model, you obtain a dictionary. The keys are the layer names and the values are the weights and biases. Let's see an example with an EfficientNet classifier on how to save only the backbone of a model. Basically, an EfficientNet, as in your example, is a backbone plus a fully connected layer as a head; if you only want the backbone, you want every single layer except the head, which you'll fine-tune later.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet
model = EfficientNet.from_name("efficientnet-b0")
print(model)
It will print the model layers and some features, basic stuff.
EfficientNet(
  (_conv_stem): Conv2dStaticSamePadding(
    3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
    (static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
  )
  (_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_blocks): ModuleList(
    (0): MBConvBlock(
      (_depthwise_conv): Conv2dStaticSamePadding(
        32, 32, kernel_size=(3, 3), stride=[1, 1], groups=32, bias=False
        (static_padding): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
      )
      (_bn1): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
      (_se_reduce): Conv2dStaticSamePadding(
        32, 8, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
      (_se_expand): Conv2dStaticSamePadding(
        8, 32, kernel_size=(1, 1), stride=(1, 1)
        (static_padding): Identity()
      )
...
Now, what is interesting are the final layers of this model:
...
  (_bn1): BatchNorm2d(1280, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
  (_avg_pooling): AdaptiveAvgPool2d(output_size=1)
  (_dropout): Dropout(p=0.2, inplace=False)
  (_fc): Linear(in_features=1280, out_features=1000, bias=True)
  (_swish): MemoryEfficientSwish()
)
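Given those layer names, one direct way to save only the backbone is to filter the head out of the state dictionary before saving. A small sketch (backbone.pth is a placeholder file name):
#Keep every parameter except the head (_fc); the filtered dict can be
#loaded later with load_state_dict(..., strict=False)
backbone_state = {k: v for k, v in model.state_dict().items() if not k.startswith("_fc")}
torch.save(backbone_state, "backbone.pth")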
Let's say we want to reuse this model's backbone, except _fc, since we would like to use the weights on another model having the same backbone but a different, not pre-trained head. In this example I'll take the same backbone and add 3 heads:
class ThreeHeadEfficientNet(torch.nn.Module):
    def __init__(self, nbClasses1, nbClasses2, nbClasses3, model="efficientnet-b0", dropout_p=0.2):
        super(ThreeHeadEfficientNet, self).__init__()
        self.NBC1 = nbClasses1
        self.NBC2 = nbClasses2
        self.NBC3 = nbClasses3
        self.dropout_p = dropout_p
        self._dropout_layer = torch.nn.Dropout(p=self.dropout_p)
        self._head1 = torch.nn.Linear(1280, self.NBC1)
        self._head2 = torch.nn.Linear(1280, self.NBC2)
        self._head3 = torch.nn.Linear(1280, self.NBC3)
        self.model = EfficientNet.from_name(model, include_top=False) #you can notice here, I'm not loading the head, only the backbone
    def forward(self, x):
        features = self.model(x)
        res = features.flatten(start_dim=1)
        res = self._dropout_layer(res)
        res1 = self._head1(res)
        res2 = self._head2(res)
        res3 = self._head3(res)
        return res1, res2, res3
You'll notice now, if you print this ThreeHeadEfficientNet's layers, that the layer names have slightly changed from _conv_stem.weight to model._conv_stem.weight, since the backbone is now stored in an attribute variable model. We'll thus have to process that, otherwise the keys will mismatch: create a new state dictionary that matches the expected keys of this new model and contains the pretrained weights and biases:
pretrained_dict = model.state_dict() #pretrained model keys
new_model = ThreeHeadEfficientNet(5, 10, 3) #hypothetical class counts, adapt to your task
model_dict = new_model.state_dict() #new model keys

processed_dict = {}
for k in model_dict.keys():
    decomposed_key = k.split(".")
    if "model" in decomposed_key:
        pretrained_key = ".".join(decomposed_key[1:])
        processed_dict[k] = pretrained_dict[pretrained_key] #build the new state dict so the new model can load the pretrained parameters without the head
new_model.load_state_dict(processed_dict, strict=False) #strict=False is important since the head layers are missing from the state dict; we don't want this line to raise an error but to load the present keys anyway
And finally, in new_model you should have your new model with a pretrained backbone and heads to fine-tune.
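And for the save/swap workflow from the problem statement, the same mechanism works per submodule, since a state dict can be taken from any nn.Module, not only the whole model. A minimal sketch (head1.pth and the class counts are placeholders):
#Save only one classification head; the backbone is already saved with model A
torch.save(new_model._head1.state_dict(), "head1.pth")

#Later: rebuild the model and restore just that head
restored = ThreeHeadEfficientNet(5, 10, 3) #placeholder class counts
restored._head1.load_state_dict(torch.load("head1.pth"))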
Now you should be able to fix your issues :)
For more PyTorch information, please also check the PyTorch forums.

Error: "ANN Visualizer: Layer not supported for visualizing" in the function ann_viz()

So I am using ann_visualizer to display my Keras neural network graphically. The model works properly, but it gives this error whenever I try to visualize it via ann_viz().
"ValueError: ANN Visualizer: Layer not supported for visualizing"
I searched the internet but couldn't find a valid solution.
This is the neural network model code:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28,28)))
model.add(keras.layers.Dense(128, activation=keras.activations.relu))
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))
model.compile(
    optimizer="adam",
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=["accuracy"]
)
model.fit(train_data, train_lables, epochs=10)
test_loss, test_acc = model.evaluate(test_data, test_lables)
And this is the ann_viz() function call
from ann_visualizer.visualize import ann_viz
ann_viz(model, title="Model")
Any idea how to make it work?
I also got the same error but was able to resolve it by removing the Flatten() layer.
#Flatten the input
X = X.reshape(X.shape[0], 28*28)

model = keras.Sequential()
#added flat input shape
model.add(keras.layers.Dense(128, activation=keras.activations.relu, input_shape=(28*28,)))
model.add(keras.layers.Dense(10, activation=keras.activations.softmax))
model.compile(
    optimizer="adam",
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=["accuracy"]
)

#now you can call ann_viz
from ann_visualizer.visualize import ann_viz
ann_viz(model, title="Model")
Basically, I flattened the input, removed the Flatten layer, and added the flat input shape to the next layer.
I don't know the exact reason it works, though.

model.predict in keras using universal sentence encoder giving shape error

I am using Keras model.predict to predict sentiments. I am using universal sentence embeddings. While predicting, I am getting the error described below.
Please provide your valuable insights.
Regards.
I have run the code for two sets of inputs. For, say, input1, the prediction is obtained, while it's not working for input2.
Input 1 is of the form: {(a1, [sents1]), ...}
Input 2: {((a1, a2), [sents11]), ...}
The input for predicting is the [sents1], [sents11] etc. extracted from these.
I could see a related question in (Keras model.predict function giving input shape error), but I don't know whether it's resolved. Further, input1 is working.
import tensorflow as tf
import keras.backend as K
from keras import layers
from keras.models import Model
import numpy as np

#embed (a tensorflow_hub Universal Sentence Encoder module), embed_size,
#category_counts and input2 are defined elsewhere in the full script
def UniversalEmbedding(x):
    return embed(tf.squeeze(tf.cast(x, tf.string)), signature="default", as_dict=True)["default"]

input_text = layers.Input(shape=(1,), dtype=tf.string)
embedding = layers.Lambda(UniversalEmbedding, output_shape=(embed_size,))(input_text)
dense = layers.Dense(256, activation='relu')(embedding)
pred = layers.Dense(category_counts, activation='softmax')(dense)
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

sents1 = list(input2.items())
with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    # model.load_weights(.//)
    for i, ch in enumerate(sents1):
        new_text = ch[1]
        if len(new_text) > 1:
            new_text = np.array(new_text, dtype=object)[:, np.newaxis]
            predicts = model.predict(new_text, batch_size=32)
InvalidArgumentError: input must be a vector, got shape: []
[[{{node lambda_2/module_1_apply_default/tokenize/StringSplit}} =
StringSplit[skip_empty=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lambda_2/module_1_apply_default/RegexReplace_1, lambda_2/module_1_apply_default/tokenize/Const)]]
Try removing leading and trailing blanks from the sentences, e.g. new_text = new_text.strip() (note that strip() returns a new string rather than modifying it in place).
USE preprocesses sentences by splitting them on spaces, so leading or trailing spaces create empty tokens, which cannot be embedded.
(Hope this answer is not too late.)
There could also be some missing values, i.e. sentences without text. You need to exclude these as well.
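A minimal sketch of that clean-up, assuming new_text is the list of raw sentences from the loop above:
#Strip whitespace and drop empty or non-string entries before predicting,
#since the Universal Sentence Encoder cannot embed empty strings
cleaned = [s.strip() for s in new_text if isinstance(s, str) and s.strip()]
if cleaned:
    batch = np.array(cleaned, dtype=object)[:, np.newaxis]
    predicts = model.predict(batch, batch_size=32)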

Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with Keras to forecast groups of time series, and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM

epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)

model = Sequential()
# Add the first LSTM layer (the input dimensions are only needed in the first layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, epochs=epoch_number, batch_size=batch_size, verbose=1) # epochs replaces the deprecated nb_epoch argument
This throws:
Exception: Error when checking model target: expected timedistributed_1
to have shape (None, 4, 24) but got array with shape (100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), i.e. (100, 3, 24), but the network produces an output of shape (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes, you may need padding, or a network node that removes said timestep.
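A quick way to confirm this, assuming the model from the question, is to inspect the output shape before fitting:
model.summary() # the last layer reports an output shape of (None, 4, 24)
print(model.output_shape) # (None, 4, 24), while trainY has shape (100, 3, 24)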
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, while that's not possible. This dimension is what Keras considers the time dimension: it is preserved.
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed; one possibility is sketched below.
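For instance, a sketch of the first option (one possible workaround, not the only one), inserted before model.compile in the code above: swap the series and time axes with Permute so that a per-step Dense layer can map the 4 input series down to output_dim = 3, then swap back.
from keras.layers import Permute

model.add(Permute((2, 1)))                    # (None, 4, 24) -> (None, 24, 4)
model.add(TimeDistributed(Dense(output_dim))) # (None, 24, 4) -> (None, 24, 3)
model.add(Permute((2, 1)))                    # (None, 24, 3) -> (None, 3, 24)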
PS: one hint towards finding this issue was that output_dim is never used when creating the model; it only appears in the target data. While this is only a code smell (there might not be anything wrong with it), it's something worth checking.
