Error(s) in loading state_dict for RobertaForSequenceClassification - python-3.x

I am using a fine-tuned RoBERTa model, unitary/unbiased-toxic-roberta, trained on the Jigsaw data:
https://huggingface.co/unitary/unbiased-toxic-roberta
It is fine-tuned on 16 classes.
I am writing my code for binary classification. The metric computes accuracy on the binary labels:
import numpy as np
import torch
import torch.nn as nn
import transformers as tr

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    acc = np.sum(predictions == labels) / predictions.shape[0]
    return {"accuracy": acc}

model = tr.RobertaForSequenceClassification.from_pretrained("/home/pc/unbiased_toxic_roberta", num_labels=2)
model.to(device)
training_args = tr.TrainingArguments(
    # report_to='wandb',
    output_dir='/home/pc/1_Proj_hate_speech/results_roberta',  # output directory
    overwrite_output_dir=True,
    num_train_epochs=20,              # total number of training epochs
    per_device_train_batch_size=16,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size for evaluation
    learning_rate=2e-5,
    warmup_steps=1000,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs3',            # directory for storing logs
    logging_steps=1000,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = tr.Trainer(
    model=model,                      # the instantiated 🤗 Transformers model to be trained
    args=training_args,               # training arguments, defined above
    train_dataset=train_data,         # training dataset
    eval_dataset=val_data,            # evaluation dataset
    compute_metrics=compute_metrics,
)
When I run this, I get an error:
loading weights file /home/pc/unbiased_toxic_roberta/pytorch_model.bin
RuntimeError: Error(s) in loading state_dict for RobertaForSequenceClassification:
size mismatch for classifier.out_proj.weight: copying a param with shape torch.Size([16, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.out_proj.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([2]).
How can I add a linear layer and solve this error?

Load with ignore_mismatched_sizes=True, which keeps the backbone weights from the checkpoint and leaves the mismatched 2-class classification head randomly initialized:

model = tr.RobertaForSequenceClassification.from_pretrained(
    "/home/pc/unbiased_toxic_roberta",
    num_labels=2,
    ignore_mismatched_sizes=True,
)

Then you can fine-tune the model.
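If you instead want to keep the checkpoint's 16-class head intact and explicitly swap in your own linear layer (as the question asks), here is a minimal sketch, assuming the standard classifier.out_proj attribute of RobertaForSequenceClassification:

import torch.nn as nn
import transformers as tr

# Load with the checkpoint's original 16-label head, then replace only the
# final projection with a freshly initialized 2-class layer.
model = tr.RobertaForSequenceClassification.from_pretrained("/home/pc/unbiased_toxic_roberta")
model.classifier.out_proj = nn.Linear(model.config.hidden_size, 2)
model.config.num_labels = 2  # so the Trainer's default loss expects 2 classes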

Related

Pytorch - skip calculating features of pretrained models for every epoch

I am used to working with TensorFlow/Keras, but now I am forced to start working with PyTorch for flexibility reasons. However, I can't seem to find PyTorch code that trains only the classification layer of a model. Is that not a common practice? Right now I have to wait for the feature extraction over the same data to be recomputed every epoch. Is there a way to avoid that?
# in tensorflow - keras:
import joblib
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.applications import vgg16, MobileNetV2, mobilenet_v2

# Load a pre-trained network without its classification head
pretrained_nn = MobileNetV2(weights='imagenet', include_top=False, input_shape=(Image_size, Image_size, 3))

# Extract features of the training data only once
X = mobilenet_v2.preprocess_input(X)
features_x = pretrained_nn.predict(X)

# Save features for later use
joblib.dump(features_x, "features_x.dat")

# Create a model and add layers
model = Sequential()
model.add(Flatten(input_shape=features_x.shape[1:]))
model.add(Dense(100, activation='relu', use_bias=True))
model.add(Dense(Y.shape[1], activation='softmax', use_bias=False))

# Compile & train only the fully connected model
model.compile(loss="categorical_crossentropy", optimizer=keras.optimizers.Adam(learning_rate=0.001))
history = model.fit(features_x, Y_train, batch_size=16, epochs=Epochs)
Assuming you already have the features in features_x, you can do something like this to create and train the model:
import torch

# create a loader for the (precomputed) features and labels
dataset = torch.utils.data.TensorDataset(features_x, Y_train)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

# define the classification model
in_features = features_x.flatten(1).size(1)
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(in_features=in_features, out_features=100, bias=True),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=100, out_features=Y.shape[1], bias=False)  # softmax is handled by CrossEntropyLoss below
)
model.train()

# define the optimizer and loss function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = torch.nn.CrossEntropyLoss()

# training loop
for e in range(Epochs):
    for batch_x, batch_y in loader:         # iterate the loader directly; it already yields (features, labels)
        optimizer.zero_grad()               # clear gradients from previous batch
        out = model(batch_x)                # forward pass
        loss = loss_function(out, batch_y)  # compute loss
        loss.backward()                     # backpropagate, get gradients
        optimizer.step()                    # update model weights
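If you still need to produce features_x on the PyTorch side, here is a minimal sketch, assuming X is an N x 3 x H x W float tensor and using a frozen torchvision MobileNetV2 as the feature extractor (the pretrained=True flag is the older torchvision API; newer versions use the weights= argument):

import torch
import torchvision

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.eval()                        # disable dropout / batch-norm updates
for p in backbone.parameters():
    p.requires_grad = False            # freeze the pretrained weights

with torch.no_grad():                  # compute the features only once
    features_x = backbone(X)           # shape: N x 1280 x h x w
torch.save(features_x, "features_x.pt")  # reuse across epochs and runs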

Pytorch Resnet model error if FC layer is changed in Colab

If I simply import the ResNet model from PyTorch in Colab and use it to train my dataset, there are no issues. However, when I try to change the last FC layer so that the output features go from 1000 to 9 (the number of classes in my dataset), I get the following error.
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
Working version:

import torchvision.models as models

# model = Net()
model = models.resnet18(pretrained=True)

# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()

# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()

Version with error:

import torchvision.models as models

# model = Net()
model = models.resnet18(pretrained=True)

# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.07)
# defining the loss function
criterion = CrossEntropyLoss()

# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()

model.fc = torch.nn.Linear(512, 9)
The error occurs during training, i.e. at
outputs = model(images)
How should I go about fixing this issue?
Simple fix: the new fc layer should be assigned before moving the model to CUDA, i.e.

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 9)
if torch.cuda.is_available():
    model = model.cuda()
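Alternatively (a sketch), if the model has already been moved to the GPU, you can move the replacement layer to the same device instead of reordering the calls:

import torch
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet18(pretrained=True).to(device)
model.fc = torch.nn.Linear(512, 9).to(device)  # new layer lands on the same device as the rest of the model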

How to specify the loss function when finetuning a model using the Huggingface TFTrainer Class?

I have followed the basic example as given below, from: https://huggingface.co/transformers/training.html
from transformers import TFBertForSequenceClassification, TFTrainer, TFTrainingArguments

model = TFBertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TFTrainingArguments(
    output_dir='./results',           # output directory
    num_train_epochs=3,               # total number of training epochs
    per_device_train_batch_size=16,   # batch size per device during training
    per_device_eval_batch_size=64,    # batch size for evaluation
    warmup_steps=500,                 # number of warmup steps for learning rate scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs',             # directory for storing logs
)

trainer = TFTrainer(
    model=model,                      # the instantiated 🤗 Transformers model to be trained
    args=training_args,               # training arguments, defined above
    train_dataset=tfds_train_dataset, # tensorflow_datasets training dataset
    eval_dataset=tfds_test_dataset,   # tensorflow_datasets evaluation dataset
)

trainer.train()
But there seems to be no way to specify the loss function for the classifier. For example, if I fine-tune on a binary classification problem, I would use
tf.keras.losses.BinaryCrossentropy(from_logits=True)
otherwise I would use
tf.keras.losses.CategoricalCrossentropy(from_logits=True)
My set up is as follows:
transformers==4.3.2
tensorflow==2.3.1
python==3.6.12
The Trainer class lets you override compute_loss.
For more details, see the documentation:
https://huggingface.co/docs/transformers/main_classes/trainer#:~:text=passed%20at%20init.-,compute_loss,-%2D%20Computes%20the%20loss
Here is an example of how to customize Trainer to use a weighted loss (useful when you have an unbalanced training set):
import torch
from torch import nn
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # compute custom loss (suppose one has 3 labels with different weights)
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 3.0]))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
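A usage sketch: pass the custom trainer in place of the stock Trainer (model, training_args, and the datasets are assumed to be defined as in your setup):

trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()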
Alternatively, create a class which inherits from PreTrainedModel and compute your respective loss inside its forward function.

Weird behavior when calling cuda() on different tensors in pytorch

I am trying to train a PyTorch neural network on a GPU. To do so, I load my inputs and the network onto the default CUDA-enabled GPU device. However, when I load my inputs, the model's weights do not stay CUDA tensors. Here is my train function:
def train(network: nn.Module, name: str, learning_cycles: dict, num_epochs):
    # check we have a working gpu to train on
    assert torch.cuda.is_available()

    # load model onto gpu
    network = network.cuda()

    # load train and test data with a transform
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                               shuffle=True, num_workers=2)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(network.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(num_epochs):
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            # load inputs and labels onto gpu
            inputs, labels = inputs.cuda(), labels.cuda()

            optimizer.zero_grad()
            outputs = network(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
When calling train, I get the following error.
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
Interestingly, when I delete the line inputs, labels = inputs.cuda(), labels.cuda(), I get the error RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same.
I would very much like to train my network, and I have searched the internet to no avail. Any good ideas?
Given that a device mismatch crops up regardless of the device the inputs are on, it's likely that some of your model's parameters are not being moved over to the GPU when you call network = network.cuda(). You have model parameters on both the CPU and the GPU.
Post your model code. It's likely you have a Pytorch module in an incorrect container.
Lists of modules should be in an nn.ModuleList. Modules in a plain Python list will not be transferred. Compare:
layers1 = [nn.Linear(256, 256), nn.Linear(256, 256), nn.Linear(256, 256)]
layers2 = nn.ModuleList([nn.Linear(256, 256), nn.Linear(256, 256), nn.Linear(256, 256)])
If you call model.cuda() on a model containing the above two lines, the layers in layers1 would remain on the CPU, while the layers in layers2 would be moved to the GPU.
Similarly, a list of nn.Parameter objects should be contained in an nn.ParameterList object.
There are also nn.ModuleDict and nn.ParameterDict for dictionary containers.
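A minimal sketch of the difference (Demo is a hypothetical module for illustration):

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.bad = [nn.Linear(8, 8)]                  # plain Python list: not registered as a submodule
        self.good = nn.ModuleList([nn.Linear(8, 8)])  # registered submodule, follows .cuda()/.to()

model = Demo().cuda()
print(next(model.bad[0].parameters()).device)   # cpu    -> stays behind
print(next(model.good[0].parameters()).device)  # cuda:0 -> moved with the model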

vgg pytorch is probability distribution supposed to add up to 1?

I've trained a VGG16 model to predict 102 classes of flowers.
It works, however now that I'm trying to understand one of its predictions, I feel it's not acting normally.
model layout:
# Imports here
import os
import numpy as np
import torch
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import json
from pprint import pprint
from scipy import misc
%matplotlib inline

data_dir = 'flower_data'
train_dir = data_dir + '/train'
test_dir = data_dir + '/valid'

json_data = open('cat_to_name.json').read()
main_classes = json.loads(json_data)
main_classes = {int(k): v for k, v in main_classes.items()}

train_transform_2 = transforms.Compose([transforms.RandomResizedCrop(224),
                                        transforms.RandomRotation(30),
                                        transforms.RandomHorizontalFlip(),
                                        transforms.ToTensor()])
test_transform_2 = transforms.Compose([transforms.RandomResizedCrop(224),
                                       transforms.ToTensor()])
# TODO: Load the datasets with ImageFolder
train_data = datasets.ImageFolder(train_dir, transform=train_transform_2)
test_data = datasets.ImageFolder(test_dir, transform=test_transform_2)

# define dataloader parameters
batch_size = 20
num_workers = 0

# TODO: Using the image datasets and the transforms, define the dataloaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers, shuffle=True)

vgg16 = models.vgg16(pretrained=True)

# Freeze training for all "features" layers
for param in vgg16.features.parameters():
    param.requires_grad = False

import torch.nn as nn

n_inputs = vgg16.classifier[6].in_features
# add last linear layer (n_inputs -> 102 flower classes)
# new layers automatically have requires_grad = True
last_layer = nn.Linear(n_inputs, len(main_classes))
vgg16.classifier[6] = last_layer

import torch.optim as optim

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()
# specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = optim.SGD(vgg16.classifier.parameters(), lr=0.001)

pre_trained_model = torch.load("model.pt")
new = list(pre_trained_model.items())
my_model_kvpair = vgg16.state_dict()
count = 0
for key, value in my_model_kvpair.items():
    layer_name, weights = new[count]
    my_model_kvpair[key] = weights
    count += 1
# number of epochs to train the model
n_epochs = 6

# initialize tracker for minimum validation loss
valid_loss_min = np.Inf  # set initial "min" to infinity

for epoch in range(1, n_epochs + 1):
    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    # model by default is set to train
    vgg16.train()
    for batch_i, (data, target) in enumerate(train_loader):
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()

        if batch_i % 20 == 19:  # print training loss every specified number of mini-batches
            print('Epoch %d, Batch %d loss: %.16f' %
                  (epoch, batch_i + 1, train_loss / 20))
            train_loss = 0.0

    ######################
    # validate the model #
    ######################
    vgg16.eval()  # prep model for evaluation
    for data, target in test_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = vgg16(data)
        # calculate the loss
        loss = criterion(output, target)
        # update running validation loss
        valid_loss += loss.item()

    # print training/validation statistics
    # calculate average loss over an epoch
    train_loss = train_loss / len(train_loader.dataset)
    valid_loss = valid_loss / len(test_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch + 1,
        train_loss,
        valid_loss
    ))

    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min,
            valid_loss))
        torch.save(vgg16.state_dict(), 'model.pt')
        valid_loss_min = valid_loss
testing on a single image
tensor = torch.from_numpy(test_image)
reshaped = tensor.permute(2, 0, 1).unsqueeze(0)
floatified = reshaped.to(torch.float32) / 255
vgg16(floatified)
>>>
tensor([[ 2.5686, -1.1964, -0.0872, -1.7010, -1.6669, -1.0638, 0.4515, 0.1124,
0.0166, 0.3156, 1.1699, 1.5374, 1.8720, 2.5184, 2.9046, -0.8241,
-1.1949, -0.5700, 0.8692, -1.0485, 0.0390, -1.3783, -3.4632, -0.0143,
1.0986, 0.2667, -1.1127, -0.8515, 0.7759, -0.7528, 1.6366, -0.1170,
-0.4983, -2.6970, 0.7545, 0.0188, 0.1094, 0.5002, 0.8838, -0.0006,
-1.7993, -1.3706, 0.4964, -0.3251, -1.7313, 1.8731, 2.4963, 1.1713,
-1.5726, 1.5476, 3.9576, 0.7388, 0.0228, 0.3947, -1.7237, -1.8350,
-2.0297, 1.4088, -1.3469, 1.6128, -1.0851, 2.0257, 0.5881, 0.7498,
0.0738, 2.0592, 1.8034, -0.5468, 1.9512, 0.4534, 0.7746, -1.0465,
-0.7254, 0.3333, -1.6506, -0.4242, 1.9529, -0.4542, 0.2396, -1.6804,
-2.7987, -0.6367, -0.3599, 1.0102, 2.6319, 0.8305, -1.4333, 3.3043,
-0.4021, -0.4877, 0.9125, 0.0607, -1.0326, 1.3186, -2.5861, 0.1211,
-2.3177, -1.5040, 1.0416, 1.4008, 1.4225, -2.7291]],
grad_fn=<ThAddmmBackward>)
sum([ 2.5686, -1.1964, -0.0872, -1.7010, -1.6669, -1.0638, 0.4515, 0.1124,
0.0166, 0.3156, 1.1699, 1.5374, 1.8720, 2.5184, 2.9046, -0.8241,
-1.1949, -0.5700, 0.8692, -1.0485, 0.0390, -1.3783, -3.4632, -0.0143,
1.0986, 0.2667, -1.1127, -0.8515, 0.7759, -0.7528, 1.6366, -0.1170,
-0.4983, -2.6970, 0.7545, 0.0188, 0.1094, 0.5002, 0.8838, -0.0006,
-1.7993, -1.3706, 0.4964, -0.3251, -1.7313, 1.8731, 2.4963, 1.1713,
-1.5726, 1.5476, 3.9576, 0.7388, 0.0228, 0.3947, -1.7237, -1.8350,
-2.0297, 1.4088, -1.3469, 1.6128, -1.0851, 2.0257, 0.5881, 0.7498,
0.0738, 2.0592, 1.8034, -0.5468, 1.9512, 0.4534, 0.7746, -1.0465,
-0.7254, 0.3333, -1.6506, -0.4242, 1.9529, -0.4542, 0.2396, -1.6804,
-2.7987, -0.6367, -0.3599, 1.0102, 2.6319, 0.8305, -1.4333, 3.3043,
-0.4021, -0.4877, 0.9125, 0.0607, -1.0326, 1.3186, -2.5861, 0.1211,
-2.3177, -1.5040, 1.0416, 1.4008, 1.4225, -2.7291])
>>>
5.325799999999998
Given that this is how I test it on a single image (while the model, as usual, was trained and tested in batches), the model returns a prediction vector that doesn't seem to be normalized and doesn't add up to 1.
Is this normal?
I cannot tell with certainty without seeing your training code, but it's most likely your model was trained with cross-entropy loss and as such it outputs logits rather than class probabilities. You can turn them into proper probabilities by applying the softmax function.
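For example (a sketch, reusing the floatified input from the question):

import torch

logits = vgg16(floatified)            # raw, unnormalized scores, shape 1 x 102
probs = torch.softmax(logits, dim=1)  # now each row sums to 1
print(probs.sum(dim=1))               # tensor([1.0000], ...)
print(probs.argmax(dim=1))            # index of the predicted class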
