How to get penultimate layer output of fastai text model? - python-3.x

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.7)
learn.fit_one_cycle(1, 1e-2)
I have trained fastai model as above. I can get prediction as below
preds, targets = learn.get_preds()
But instead I want penultimate layer embeddings of model learn (This practise is common for CNN models). Could you help me how to do it?

I'm not sure if you want a classifier but anyway...
learn.model gives you back the model architecture. Then learn.model[0] would be an encoder learn.model[1] the other part of the model.
Example:
To access first linear layer in SequentialEx (architecture below) you would do it using following command
learn.model[0].layers[0].ff.layers[0]
SequentialRNN(
(0): TransformerXL(
(encoder): Embedding(60004, 410)
(pos_enc): PositionalEncoding()
(drop_emb): Dropout(p=0.03)
(layers): ModuleList(
(0): DecoderLayer(
(mhra): MultiHeadRelativeAttention(
(attention): Linear(in_features=410, out_features=1230, bias=False)
(out): Linear(in_features=410, out_features=410, bias=False)
(drop_att): Dropout(p=0.03)
(drop_res): Dropout(p=0.03)
(ln): LayerNorm(torch.Size([410]), eps=1e-05, elementwise_affine=True)
(r_attn): Linear(in_features=410, out_features=410, bias=False)
)
(ff): SequentialEx(
(layers): ModuleList(
(0): Linear(in_features=410, out_features=2100, bias=True)
(1): ReLU(inplace)
(2): Dropout(p=0.03)
(3): Linear(in_features=2100, out_features=410, bias=True)
(4): Dropout(p=0.03)
(5): MergeLayer()
(6): LayerNorm(torch.Size([410]), eps=1e-05, elementwise_affine=True)
)
)
)

Related

How do i reset sentence_transformers or distilbert model weights to train from scratch?

ive been trying to reset the weights in a pretrained sentence transformer model but i cant seem to find any information about this and the code ive used before to reset the weights of a pytorch model also doesnt work.
For example, the following code doesn't work for both the transformers.DistilBertModel as well as when replacing the model with a sentence_transformer.SentenceTransformer('distilbert-base-nli-mean-tokens') model.
from transformers import DistilBertTokenizer, DistilBertModel
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
outputs = model(**inputs)
for i, layer in enumerate(model.encoder.layer):
model.encoder.layer[i].apply(model._init_weights)
Output:
AttributeError: 'DistilBertModel' object has no attribute 'encoder'
When trying to iterate through the models body like this:
from sentence_transformers import SentenceTransformer, SentencesDataset, losses
from sentence_transformers.readers import InputExample
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
for name, module in model.named_children():
print('resetting ', model[int(name)])
print(module._modules)
Output:
resetting Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
OrderedDict([('auto_model', DistilBertModel(
(embeddings): Embeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(transformer): Transformer(...........
.......etc.......
How do i access all of these layers and reset all of their weights?

Can we delete an attribute in RecursiveScriptModule (Pytorch)?

I have a script module like below
(this actually is a module of a pre-trained model saved as a RecursiveScriptModule in Pytorch):
RecursiveScriptModule(
original_name=out_Conv
(conv): RecursiveScriptModule(original_name=Conv2d)
(bn): RecursiveScriptModule(original_name=BatchNorm2d)
(act): RecursiveScriptModule(original_name=Sigmoid)
)
Could I delete the attribute 'act' in this module then save it as a new model?

How to save and load only particular layers of a neural network with PyTorch?

Bare Problem Statement:
I have trained a Model A, that consists of a feature Extractor FE and a classification head ACH.
I want to train a model B, that uses A's feature extractor FE and retrains it's own classification head BCH.
So far it's easy. Now I don't want to save the entire model B since the FE part of it is already saved in the model A. I only want to dump the BCH, and during inference
Load model A - do it's prediction
Load B's classification head BCH.
Swap the classification head ACH with BCH
Run prediction using this swapped state.
Reading pyTorches documentation it only talks about saving entire models. How can I achieve this?
End of problem statement
More details on the motivation of the problem:
I have a dataset of images that I want to classify, these images have can have several classes given to them. For example the same image can have the class of "Land Vehicle" (supercategory) and a class of "Car" (category) or a "Truck". Another image might have the class "Aerial Vehicle" and it can be a "Helicopter" or a "Plane".
Since the images and therefore most of the features should be the same, I wish to train one classifier for the supercategories, then freeze it's feature-extractor, and sort of transfer learn the same model for the categories using the pretrained feature extractor.
Since the weights of the feature extracting backbone is the same, I only want to save the weights of the classification head of the categories model, and thus save some precious computational resources.
In general, it's something usual to only want an access to the backbone of a model in order to reuse it for others purposes. You have several ways to perform this. But mostly, having in mind that saving a model checkpoint and loading it later means saving weights and biases and being able to load them correctly to the corresponding layers, you first need to know, from your model, what part do you want to save.
When you get the state of a model, you will obtain a dictionary. The keys will be the layers names and the values will be the weights and the biases. Let's see an example with an efficientnet classifier on how to only save the backbone of a model. Basically, an efficientnet, as in your example, is a backbone and a fully connected layer as a head, if you only want the backbone, you want every single layers, except the head that you'll fine tune later.
import torch
import torch.nn as nn
from efficientnet_pytorch import EfficientNet
model = EfficientNet.from_name("efficientnet-b0")
print(model)
It will print the model layers and some features, basic stuff.
EfficientNet(
(_conv_stem): Conv2dStaticSamePadding(
3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
(static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
)
(_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_blocks): ModuleList(
(0): MBConvBlock(
(_depthwise_conv): Conv2dStaticSamePadding(
32, 32, kernel_size=(3, 3), stride=[1, 1], groups=32, bias=False
(static_padding): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
)
(_bn1): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_se_reduce): Conv2dStaticSamePadding(
32, 8, kernel_size=(1, 1), stride=(1, 1)
(static_padding): Identity()
)
(_se_expand): Conv2dStaticSamePadding(
8, 32, kernel_size=(1, 1), stride=(1, 1)
(static_padding): Identity()
)
...
Now what is interesting is the final layers of this model :
...
(_bn1): BatchNorm2d(1280, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
(_avg_pooling): AdaptiveAvgPool2d(output_size=1)
(_dropout): Dropout(p=0.2, inplace=False)
(_fc): Linear(in_features=1280, out_features=1000, bias=True)
(_swish): MemoryEfficientSwish()
Let's say we want to reuse this model backbone, except _fcsince we would like to use the weights on another model having the same backbone but a different head, not pre-trained. In this example I'll take the same backbone and add 3 heads :
class ThreeHeadEfficientNet(torch.nn.Module):
def __init__(self,nbClasses1,nbClasses2,nbClasses3,model="efficientnet-b0",dropout_p=0.2):
super(ThreeHeadEfficientNet, self).__init__()
self.NBC1 = nbClasses1
self.NBC2 = nbClasses2
self.NBC3 = nbClasses3
self.dropout_p = dropout_p
self._dropout_layer = torch.nn.Dropout(p=self.dropout_p)
self._head1 = torch.nn.Linear(1280,self.NBC1)
self._head2 = torch.nn.Linear(1280,self.NBC2)
self._head3 = torch.nn.Linear(1280,self.NBC3)
self.model = EfficientNet.from_name(model,include_top=False) #you can notice here, I'm not loading the head, only the backbone
def forward(self,x):
features = self.model(x)
res = features.flatten(start_dim=1)
res = self._dropout_layer(res)
res1 = self._head1(res)
res2 = self._head2(res)
res3 = self._head3(res)
return res1,res2,res3
You'll notice now, if you print this ThreeHeadsModel layers, the layers name have slightly changed from _conv_stem.weight to model._conv_stem.weight since the backbone is now stored in a attribute variable model. We'll thus have to process that otherwise the keys will mismatch, create a new state dictionary that matches the expected keys of this new model and containing the pretrained weights and biases :
pretrained_dict = model.state_dict() #pretrained model keys
model_dict = new_model.state_dict() #new model keys
processed_dict = {}
for k in model_dict.keys():
decomposed_key = k.split(".")
if("model" in decomposed_key):
pretrained_key = ".".join(decomposed_key[1:])
processed_dict[k] = pretrained_dict[pretrained_key] #Here we are creating the new state dict to make our new model able to load the pretrained parameters without the head.
new_model.load_state_dict(processed_dict, strict=False) #strict here is important since the heads layers are missing from the state, we don't want this line to raise an error but load the present keys anyway.
And finally, in new_model you should have your new model with a pretrained backbone and heads to fine tune.
Now you should be able to fix your issues :)
For more pytorch information, please also check the forum.

How to feed a image to Generator in GAN Pytorch

So, I'm training a DCGAN model in pytorch on celeba dataset (people). And here is the architecture of the generator:
Generator(
(main): Sequential(
(0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(11): ReLU(inplace=True)
(12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
(13): Tanh()
)
)
So after training, I want to check what generator outputs if I feed an occluded image like this:
(size: 64X64)
But as u might have guessed that the image has 3 channels and my generator accepts a latent vector of 100 channels at the starting, so what is the correct way to feed this image to the generator and check the output. (I'm expecting that the generator tries to generate only the occluded part of the image). If you want a reference code then try this demo file of pytorch. I have modified this file according to my own needs, so for referring, this will do the trick.
You just can't do that. As you said, your network expects 100 dimensional input which is normally sampled from standard normal distribution:
So the generator's job is to take this random vector and generate 3x64x64 image that is indistinguishable from real images. Input is a random 100 dimensional vector sampled from standard normal distribution. I don't see any way to input your image into the current network without modifying the architecture and retraining the new model. If you want to try a new model, you can change input to occluded images, apply some conv. / linear layers to reduce the dimensions to 100 then keep the rest of the network same. This way network will try to learn to generate images not from latent vector but from the feature vector extracted from occluded images. It may or may not work.
EDIT I've decided to give it a go and see if network can learn with this type of conditioned input vectors instead of latent vectors. I've used the tutorial example you've linked and added a couple of changes. First a new network for receiving input and reducing it to 100 dimensions:
class ImageTransformer(nn.Module):
def __init__(self):
super(ImageTransformer, self).__init__()
self.main = nn.Sequential(
nn.Conv2d(3, 1, 4, 2, 1, bias=False),
nn.LeakyReLU(0.2, inplace=True)
)
self.linear = nn.Linear(32*32, 100)
def forward(self, input):
out = self.main(input).view(input.shape[0], -1)
return self.linear(out).view(-1, 100, 1, 1)
Just a simple convolution layer + relu + linear layer to map to 100 dimensions at the output. Note that you can try a much better network here as a better feature extractor, I just wanted to make a simple test.
fixed_input = next(iter(dataloader))[0][0:64, :, : ,:]
fixed_input[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24,24), dtype = np.float32))
fixed_input = fixed_input.to(device)
This is how I modify the tensor to add a black patch over the input. Just sampled a batch to create a fixed input to track the process as it was done in the tutorial with a random vector.
# Create the generator
netG = Generator().to(device)
netD = Discriminator().to(device)
netT = ImageTransformer().to(device)
# Apply the weights_init function to randomly initialize all weights
# to mean=0, stdev=0.2.
netG.apply(weights_init)
netD.apply(weights_init)
netT.apply(weights_init)
# Print the model
print(netG)
print(netD)
print(netT)
Most of the steps are same, just created an instance of the new transformer network. Then finally, training loop is slightly modified where generator is not fed random vectors but it is given outputs of the new transformer network.
img_list = []
G_losses = []
D_losses = []
iters = 0
for epoch in range(num_epochs):
for i, data in enumerate(dataloader, 0):
############################
# (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
###########################
## Train with all-real batch
netD.zero_grad()
transformed = data[0].detach().clone()
transformed[:, :, 20:44, 20:44] = torch.tensor(np.zeros((24,24), dtype = np.float32))
transformed = transformed.to(device)
real_cpu = data[0].to(device)
b_size = real_cpu.size(0)
label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
output = netD(real_cpu).view(-1)
errD_real = criterion(output, label)
errD_real.backward()
D_x = output.mean().item()
## Train with all-fake batch
fake = netT(transformed)
fake = netG(fake)
label.fill_(fake_label)
output = netD(fake.detach()).view(-1)
errD_fake = criterion(output, label)
errD_fake.backward()
D_G_z1 = output.mean().item()
errD = errD_real + errD_fake
optimizerD.step()
############################
# (2) Update G network: maximize log(D(G(z)))
###########################
netG.zero_grad()
label.fill_(real_label)
output = netD(fake).view(-1)
errG = criterion(output, label)
errG.backward()
D_G_z2 = output.mean().item()
optimizerG.step()
# Output training stats
if i % 50 == 0:
print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
% (epoch, num_epochs, i, len(dataloader),
errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
# Save Losses for plotting later
G_losses.append(errG.item())
D_losses.append(errD.item())
# Check how the generator is doing by saving G's output on fixed_noise
if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
with torch.no_grad():
fake = netT(fixed_input)
fake = netG(fake).detach().cpu()
img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
iters += 1
Training was somewhat okay in terms of loss reductions etc. Finally this is what I got after 5 epochs training:
So what does this result tell us? Since the generator's inputs were not randomly taken from a normal distribution, generator wasn't able to learn the distribution of faces to create varying range of output faces. And since the input is a conditioned feature vector, output images' range is limited. So in summary, random inputs are required for the generator even though it learned to remove patches :)

Get some layers in a pytorch model that is not defined by nn.Sequential

I have a network defined below.
class model_dnn_2(nn.Module):
def __init__(self):
super(model_dnn_2, self).__init__()
self.flatten = Flatten()
self.fc1 = nn.Linear(784, 200)
self.fc2 = nn.Linear(200, 100)
self.fc3 = nn.Linear(100, 100)
self.fc4 = nn.Linear(100, 10)
def forward(self, x):
x = self.flatten(x)
x = self.fc1(x)
x = F.relu(x)
x = self.fc2(x)
x = F.relu(x)
x = self.fc3(x)
x = F.relu(x)
x = self.fc4(x)
I would like to take the last two layers along with the relu functions. Using children method I get the following
>>> new_model = nn.Sequential(*list(model.children())[-2:])
>>> new_model
Sequential(
(0): Linear(in_features=100, out_features=100, bias=True)
(1): Linear(in_features=100, out_features=10, bias=True)
)
But I would like to have the Relu function present in between the layers-just like the original model, i.e the new model should be like:
>>> new_model
Sequential(
(0): Linear(in_features=100, out_features=100, bias=True)
(1): Relu()
(2): Linear(in_features=100, out_features=10, bias=True)
)
I think the children method of the model is using the class initialization to create the model and thus the problem arises.
How can I obtain the model?
The way you implemented your model, the ReLU activations are not layers, but rather functions. When listing sub-layers (aka "children") of your module you do not see the ReLUs.
You can change your implementation:
class model_dnn_2(nn.Module):
def __init__(self):
super(model_dnn_2, self).__init__()
self.layers = nn.Sequential(
nn.Flatten(),
nn.Linear(784, 200),
nn.ReLU(), # now you are using a ReLU _layer_
nn.Linear(200, 100),
nn.ReLU(), # this is a different ReLU _layer_
nn.Linear(100, 100),
nn.ReLU(),
nn.Linear(100, 10)
)
def forward(self, x):
y = self.layers(x)
return y
More on the difference between layers and functions can be found here.

Resources