I am using a modified ResNet18, with my own pooling function at the end of the ResNet.
Here is my code:
resnet = resnet18().cuda()  # a modified resnet

class Model():
    def __init__(self, model, pool):
        self.model = model
        self.pool = pool  # my own pool class which has trainable layers

    def forward(self, sample):
        output = self.model(sample)
        output = self.pool(output)
        output = F.normalize(output, p=2, dim=1)
        return output
Now, obviously I need to train not only the resnet part, but also the pool part.
But, when I check:
model = Model(model=resnet, pool=pool)
print(list(model.parameters()))
It gives:
AttributeError: 'Model' object has no attribute 'parameters'
Can anyone help?
You need your Model to inherit from torch.nn.Module:
class Model(torch.nn.Module):
    def __init__(self, model, pool):
        super(Model, self).__init__()
        ...
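For completeness, a minimal sketch of the full class with the inheritance fix applied (assuming pool is itself an nn.Module, so its trainable layers get registered):

import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
    def __init__(self, model, pool):
        super(Model, self).__init__()
        # Assigning nn.Module instances as attributes registers them as
        # submodules, so their parameters appear in model.parameters().
        self.model = model
        self.pool = pool

    def forward(self, sample):
        output = self.model(sample)
        output = self.pool(output)
        output = F.normalize(output, p=2, dim=1)
        return output

With that change, print(list(model.parameters())) returns the parameters of both the ResNet and the pooling layers, and you can pass them all to a single optimizer.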
I'm trying to access model parameters using the internal ._parameters attribute. When I define the model as below, I get the model parameters without any issue:
model = nn.Linear(10, 10)
print(model._parameters)
However, when I use this method to get parameters of a model defined as a class, I get an empty OrderedDict().
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

model = MyModel()
print(model._parameters)
Is there a solution to this using ._parameters?
NOTE: I understand that using internal methods is frowned upon.
There are three kinds of objects stored in an nn.Module: parameters inside _parameters, buffers inside _buffers, and submodules inside _modules. All three are private (indicated by the _ prefix), so they are not meant to be used by the end user.
The private nn.Module attribute _parameters is an OrderedDict containing only the parameters registered directly on that module ("parameters" as in nn.Parameter, not nn.Module); parameters belonging to submodules such as self.fc are not listed there. That is why it is empty in your example. Have a look at the following module instead:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.p = nn.Parameter(torch.rand(10))

    def forward(self, x):
        return x * self.p
>>> model = MyModel()
>>> print(model._parameters)
OrderedDict([('p', Parameter containing:
tensor([8.5576e-01, 1.4343e-01, 3.2866e-04, 9.4876e-01, 4.4837e-01, 9.7725e-02,
2.7249e-01, 6.7258e-01, 5.6823e-01, 4.0484e-01], requires_grad=True))])
I understand that using internal methods is frowned upon.
Do not use _parameters. You should instead use the appropriate API for this use case, which is nn.Module.parameters:
for p in model.parameters():
    print(p)
Each linear layer in the MyModel class is a module, and can be accessed using _modules. So you can access the parameters of your model using:
for module_key in model._modules.keys():
    for param_key in model._modules[module_key]._parameters:
        p = model._modules[module_key]._parameters[param_key]
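That said, the public API covers the same use case without touching private attributes; a short sketch using named_parameters, which recurses into submodules:

for name, p in model.named_parameters():
    print(name, p.shape)  # e.g. 'fc.weight' and 'fc.bias' for the MyModel above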
I implemented my custom BERT binary classification model class by adding a classifier layer on top of the BERT model (attached below). However, the accuracy/metrics are significantly different from what I get when I train with the official BertForSequenceClassification model, which makes me wonder if I am missing something in my class.
A few doubts I have:
When loading the official BertForSequenceClassification with from_pretrained, are the classifier's weights initialized from the pretrained model as well, or are they randomly initialized? Because in my custom class they are randomly initialized.
class MyCustomBertClassification(nn.Module):
    def __init__(self, encoder='bert-base-uncased',
                 num_labels,
                 hidden_dropout_prob):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        self.encoder = AutoModel.from_config(self.config)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
Each model tells you via a warning message which layers are randomly initialized when you use the method from_pretrained:
from transformers import BertForSequenceClassification
b = BertForSequenceClassification.from_pretrained('bert-base-uncased')
Output:
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
The difference between your implementation and the BertForSequenceClassification is that you do not use any pretrained weights at all. The method from_config does not load the pretrained weights from a state_dict:
import torch
from transformers import AutoModelForSequenceClassification, AutoConfig

b2 = AutoModelForSequenceClassification.from_config(AutoConfig.from_pretrained('bert-base-uncased'))
b3 = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')

print("Does from_config provide pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b2.base_model.embeddings.word_embeddings.weight)))
print("Does from_pretrained provide pretrained weights: {}".format(torch.equal(b.bert.embeddings.word_embeddings.weight, b3.base_model.embeddings.word_embeddings.weight)))
Output:
Does from_config provide pretrained weights: False
Does from_pretrained provide pretrained weights: True
Therefore you probably want to change your class to:
class MyCustomBertClassification(nn.Module):
    def __init__(self, encoder='bert-base-uncased',
                 num_labels=2,
                 hidden_dropout_prob=0.1):
        super(MyCustomBertClassification, self).__init__()
        self.config = AutoConfig.from_pretrained(encoder)
        self.encoder = AutoModel.from_pretrained(encoder)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)
        return logits
myB = MyCustomBertClassification()
print(torch.equal(b.bert.embeddings.word_embeddings.weight, myB.encoder.embeddings.word_embeddings.weight))
Output:
True
If my model contains only nn.Module layers such as nn.Linear, nn.DataParallel works fine.
x = torch.randn(100, 10)

class normal_model(torch.nn.Module):
    def __init__(self):
        super(normal_model, self).__init__()
        self.layer = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.layer(x)

model = normal_model()
model = nn.DataParallel(model.to('cuda:0'))
model(x)
However, when my model contains a tensor operation such as the following
class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        self.weight = torch.ones(5, 1, device='cuda:0')

    def forward(self, x):
        return self.layer(x) @ self.weight

model = custom_model()
model = torch.nn.DataParallel(model.to('cuda:0'))
model(x)
It gives me the following error
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 7, in forward
    return self.layer(x) @ self.weight
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:277
How to avoid this error when we have some tensor operations in our model?
I have no experience with DataParallel, but I think it might be because your tensor is not registered as a model parameter. You can register it by writing:
self.weight = torch.nn.Parameter(torch.ones(5, 1))
Note that you don't have to move it to the GPU when initializing, because when you call model.to('cuda:0') this is done automatically.
I imagine that DataParallel uses the model parameters to move them to the appropriate GPU.
See this answer for more on the difference between a torch tensor and torch.nn.Parameter.
If you don't want the tensor values to be updated by backpropagation during training, you can add requires_grad=False.
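Putting that together, a minimal sketch of the model with the tensor registered as a parameter (the matrix multiplication and requires_grad=False are assumptions based on the question and the note above):

class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        # Registered as a parameter, so model.to(...) and DataParallel
        # move/replicate it together with the Linear layer.
        self.weight = torch.nn.Parameter(torch.ones(5, 1), requires_grad=False)

    def forward(self, x):
        return self.layer(x) @ self.weight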
Another way that might work is to override the to method, and initialize the tensor in the forward pass:
class custom_model(torch.nn.Module):
    def __init__(self):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)

    def forward(self, x):
        return self.layer(x) @ torch.ones(5, 1, device=self.device)

    def to(self, device: str):
        new_self = super(custom_model, self).to(device)
        new_self.device = device
        return new_self
or something like this:
class custom_model(torch.nn.Module):
    def __init__(self, device: str):
        super(custom_model, self).__init__()
        self.layer = torch.nn.Linear(10, 5)
        self.weight = torch.ones(5, 1, device=device)

    def forward(self, x):
        return self.layer(x) @ self.weight

    def to(self, device: str):
        new_self = super(custom_model, self).to(device)
        new_self.device = device
        new_self.weight = torch.ones(5, 1, device=device)
        return new_self
Adding to the answer from @Elgar de Groot, since OP also wanted to freeze that layer. To do so you can still use torch.nn.Parameter, but then you explicitly set requires_grad to False, like this:
self.weight = torch.nn.Parameter(torch.ones(5, 1))
self.weight.requires_grad = False
I just want to implement some trainable parameters in my model with Keras. In PyTorch, we can do it using torch.nn.Parameter(), like below:
self.a = nn.Parameter(torch.ones(8))
self.b = nn.Parameter(torch.zeros(16,8))
I think that doing this in PyTorch adds trainable parameters to the model. Now I want to know how to achieve a similar thing in Keras.
Any suggestions or advice are welcome!
Thanks! :)
P.S. I just wrote a custom layer in Keras as below:
class Mylayer(Layer):
    def __init__(self, input_dim, output_dim, **kwargs):
        self.input_dim = input_dim
        self.output_dim = output_dim
        super(Mylayer, self).__init__(**kwargs)

    def build(self):
        self.kernel = self.add_weight(name='pi',
                                      shape=(self.input_dim, self.output_dim),
                                      initializer='zeros',
                                      trainable=True)
        self.kernel_2 = self.add_weight(name='mean',
                                        shape=(self.input_dim, self.output_dim),
                                        initializer='ones',
                                        trainable=True)
        super(Mylayer, self).build()

    def call(self, x):
        return x, self.kernel, self.kernel_2
and I want to know: if I don't change the tensor that passes through the layer, do I still need to write compute_output_shape()?
You need to create the trainable weights in a custom layer:
class MyLayer(Layer):
    def __init__(self, my_args, **kwargs):
        # do whatever you need with my_args
        super(MyLayer, self).__init__(**kwargs)

    # you create the weights in build:
    def build(self, input_shape):
        # use the input_shape to infer the necessary shapes for weights
        # use self.whatever_you_registered_in_init to help you, like units, etc.
        self.kernel = self.add_weight(name='kernel',
                                      shape=the_shape_you_calculated,
                                      initializer='uniform',
                                      trainable=True)
        # create as many weights as necessary for this layer

        # build the layer - equivalent to self.built = True
        super(MyLayer, self).build(input_shape)

    # create the layer operation here
    def call(self, inputs):
        # do whatever operations are needed
        # example:
        return inputs * self.kernel  # make sure the shapes are compatible

    # tell keras about the output shape of your layer
    def compute_output_shape(self, input_shape):
        # calculate the output shape based on the input shape and your layer's rules
        return calculated_output_shape
Now use your layer in the model.
If you are using eager execution with TensorFlow and creating a custom training loop, you can work pretty much the same way you do with PyTorch: create weights outside layers with tf.Variable and pass them as parameters to the gradient calculation methods.
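For illustration, here is a minimal concrete sketch in the same spirit (the layer name and the elementwise-scaling behaviour are my own choices, not from the question); since it does not change the shape of the tensor passing through, compute_output_shape simply returns the input shape:

from tensorflow.keras.layers import Layer

class ScaleLayer(Layer):
    def build(self, input_shape):
        # one trainable weight per feature in the last input dimension
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[-1],),
                                      initializer='ones',
                                      trainable=True)
        super(ScaleLayer, self).build(input_shape)

    def call(self, inputs):
        return inputs * self.kernel

    def compute_output_shape(self, input_shape):
        # shape is unchanged by the elementwise scaling
        return input_shape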
In PyTorch, we load a pretrained model as follows:
net.load_state_dict(torch.load(path)['model_state_dict'])
Then the network structure and the loaded model have to be exactly the same. However, is it possible to load the weights but then modify the network/add an extra parameter?
Note:
If we add an extra parameter to the model before loading the weights, e.g.
self.parameter = Parameter(torch.ones(5), requires_grad=True)
we will get a Missing key(s) in state_dict error when loading the weights.
Let's create a model and save its state.
class Model1(nn.Module):
    def __init__(self):
        super(Model1, self).__init__()
        self.encoder = nn.LSTM(100, 50)

    def forward(self):
        pass

model1 = Model1()
torch.save(model1.state_dict(), 'filename.pt')  # saving model
Then create a second model which has some layers in common with the first model. Load the first model's saved state and copy it into the matching layers of the second model.
class Model2(nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        self.encoder = nn.LSTM(100, 50)
        self.linear = nn.Linear(50, 200)

    def forward(self):
        pass

model1_dict = torch.load('filename.pt')
model2 = Model2()
model2_dict = model2.state_dict()

# 1. filter out unnecessary keys
filtered_dict = {k: v for k, v in model1_dict.items() if k in model2_dict}
# 2. overwrite entries in the existing state dict
model2_dict.update(filtered_dict)
# 3. load the new state dict
model2.load_state_dict(model2_dict)
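A shortcut worth knowing: load_state_dict also accepts strict=False, which loads the matching keys and reports (rather than raises on) missing or unexpected ones, so the filtering step can often be skipped:

model2 = Model2()
# Only the keys present in both the checkpoint and the model are loaded;
# the new nn.Linear keeps its freshly initialized weights.
model2.load_state_dict(torch.load('filename.pt'), strict=False)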