Convert tensor of size 768 to 128 - pytorch

I want to project a tensor of shape [197, 1, 768] to [197, 1, 128] in PyTorch using nn.Conv().

You could achieve this with a wide, flat kernel combined with a suitable stride. If you stick with a dilation of 1, the relation between the input and output spatial dimensions is given by:
out = [(2p + x - k)/s + 1]
where p is the padding, k is the kernel size, and s is the stride; [·] denotes the floor (integer part) of the quantity.
Applied here you have:
128 = [(2p + 768 - k)/s + 1]
So you would get:
k = 2p + 768 - (128 - 1)s
If you impose p = 0 and s = 6, you find k = 6:
>>> project = nn.Conv2d(197, 197, kernel_size=(1, 6), stride=6)
>>> project(torch.rand(1, 197, 1, 768)).shape
torch.Size([1, 197, 1, 128])
Alternatively, a more straightforward - but different - approach is to learn a mapping using a fully connected layer:
>>> project = nn.Linear(768, 128)
>>> project(torch.rand(1, 197, 1, 768)).shape
torch.Size([1, 197, 1, 128])

You could use a kernel size and stride of 6, as that is the factor between the input and output temporal sizes (768 / 128 = 6):
x = torch.randn(197, 1, 768)
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=6, stride=6)
out = conv(x)
print(out.shape)
> torch.Size([197, 1, 128])

Related

Keras ZeroPadding

I have an input layer and I am trying to zero-pad it to a specific shape of TensorShape([1, 1, 104, 24]).
import tensorflow as tf
import numpy as np
input_shape = (1, 1, 1, 24)
x = np.arange(np.prod(input_shape)).reshape(input_shape) # (1, 1, 1, 24)
y = tf.keras.layers.ZeroPadding2D(padding=(0, 51))(x)
y.shape # TensorShape([1, 1, 103, 24])
# how do I make the y.shape --> [1, 1, 104, 24]??
How do I change the padding argument so that y has shape [1, 1, 104, 24]?
You are using:
padding = (0, 51)
This means: (symmetric_height_pad, symmetric_width_pad).
I.e. you are adding 51 zeros to the left and 51 to the right of your single value, hence you get 51 + 1 + 51 = 103 items. To get 104, you can, for example, add 51 to the left and 52 to the right using:
padding=((0, 0), (51, 52))
Here the numbers mean: ((top_pad, bottom_pad), (left_pad, right_pad))
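Putting it together, a minimal sketch with the input shape from the question:
import tensorflow as tf
import numpy as np
input_shape = (1, 1, 1, 24)
x = np.arange(np.prod(input_shape)).reshape(input_shape)
# ((top_pad, bottom_pad), (left_pad, right_pad)): 0 rows, 51 + 52 columns of zeros
y = tf.keras.layers.ZeroPadding2D(padding=((0, 0), (51, 52)))(x)
print(y.shape)  # (1, 1, 104, 24)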

Pytorch - Selecting n indices without replacement from dimension x

Suppose I have the following embeddings emb_user = torch.randn(64, 128, 256). From the second dimension (of length 128), I wish to pick out 16 at random at each instance. I was wondering if there was a more efficient way of doing the following:
idx = torch.multinomial(torch.ones(64, 128), 16)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
What I also find curious is that the above multinomial approach does not work if the weight matrix (torch.ones(64, 128)) has more than 2 dimensions.
Since in your case you want a uniform distribution, you could speed it up with:
idx = torch.sort(
    torch.randint(0, 128 - 15, (64, 16), device=device), axis=1
).values + torch.arange(0, 16, device=device).reshape(1, -1)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
Instead of
idx = torch.multinomial(torch.ones(64, 128, device=device), 16)
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
The runtimes on my machine are 427 µs vs. 784 µs with device='cpu', and 135 µs, 260 µs, and 469 µs with device='cuda'.
How does it work?
The sorted randint draws indices with replacement and puts them in non-decreasing order; adding the arange term makes the sequence strictly increasing, which eliminates the repeats.
Illustrating with a small case
idx = torch.sort(torch.randint(0, 7, (4,))).values
print('Indices with replacement in the range from 0 to 6: ', idx)
print('Indices without replacement in the slice: ', idx + torch.arange(4))
Indices with replacement in the range from 0 to 6: tensor([0, 5, 5, 6])
Indices without replacement in the slice: tensor([0, 6, 7, 9])
A possibly faster solution, though not from exactly the same distribution, is the following:
idx = torch.cumsum(
    torch.diff(
        torch.sort(
            torch.randint(0, 128 - 16, (64, 17), device=device), axis=1
        ).values,
        axis=1,
    ) + 1,
    axis=1,
) - 1
sampled_emb_user = emb_user[torch.arange(len(emb_user)).unsqueeze(-1), idx]
One more way, which I expect to be closer to the exact method, though I have not analyzed it rigorously:
# 1 - rand() to include 1 and exclude 0
d = torch.cumsum(1 - torch.rand(64, 17, device=device), axis=1)
# this produces a sorted tensor with values in the range [0, 128 - 16]
d = (((128 - 15) * d[:, :-1]) / d[:, -1:]).to(torch.long)
idx = d + torch.arange(0, 16, device=device).reshape(1, -1)
But in the end it tends to be slower than the method using sort.
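If you want to reproduce the comparison, a minimal timing sketch could look like the following (the timeit harness and the call count are assumptions, not part of the original measurement; absolute numbers will depend on your hardware):
import timeit
import torch

device = 'cpu'  # or 'cuda'
emb_user = torch.randn(64, 128, 256, device=device)
rows = torch.arange(len(emb_user), device=device).unsqueeze(-1)

def with_multinomial():
    idx = torch.multinomial(torch.ones(64, 128, device=device), 16)
    return emb_user[rows, idx]

def with_sorted_randint():
    idx = torch.sort(
        torch.randint(0, 128 - 15, (64, 16), device=device), dim=1
    ).values + torch.arange(0, 16, device=device).reshape(1, -1)
    return emb_user[rows, idx]

# average time per call in seconds over 1000 runs
print(timeit.timeit(with_multinomial, number=1000) / 1000)
print(timeit.timeit(with_sorted_randint, number=1000) / 1000)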

Convolution and convolution transposed do not cancel each other

I'm trying to implement an autoencoder CNN. However, I have the following problem:
The last convolutional layer of my encoder is defined as follows:
Conv2d(128, 256, 3, padding=1, stride=2)
The input of this layer has shape (1, 128, 24, 24). Thus, the output has shape (1, 256, 12, 12).
After this layer, I have ReLU activation and BatchNorm. Neither of these changes the shape of the output.
Then I have a first ConvTranspose2d layer defined as:
ConvTranspose2d(256, 128, 3, padding=1, stride=2)
But the output of this layer has shape (1, 128, 23, 23).
As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this 2-layer block should have the same shape as its input.
So, my question is: what is wrong with my understanding? And how can I fix this issue?
I would first like to note that the nn.ConvTranspose2d layer is not the inverse of nn.Conv2d, as its documentation page explains:
it is not an actual deconvolution operation as it does not compute a true inverse of convolution
As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this 2-layer block must have the same shape as its input.
This is not always true! It depends on the input spatial dimensions.
In terms of spatial dimensions the 2D convolution will output:
out = [(x + 2p - d(k - 1) - 1)/s + 1]
where [·] denotes the floor (integer part) of its argument,
while the 2D transpose convolution will output:
out = (x - 1)s - 2p + d(k - 1) + op + 1
where x = input_dimension, out = output_dimension, k = kernel_size, s = stride, d = dilation, p = padding, and op = output_padding.
If you look at the composition convT ∘ conv (i.e. convT(conv(x))), then you have:
out = (out_conv - 1)s - 2p + d(k - 1) + op + 1
= ([(x + 2p - d(k - 1) - 1)/s + 1] - 1)s - 2p + d(k - 1) + op + 1
This equals x only if [(x + 2p - d(k - 1) - 1)/s + 1] = (x + 2p - d(k - 1) - 1)/s + 1, i.e. only if s divides x + 2p - d(k - 1) - 1. With the parameters used here (k = 3, s = 2, p = 1, d = 1), that means x must be odd, in which case:
out = ((x + 2p - d(k - 1) - 1)/s + 1 - 1)s - 2p + d(k - 1) + op + 1
= x + op
And out = x when op = 0.
Otherwise, if x is even, then:
out = x - 1 + op
And setting op = 1 gives out = x.
Here is an example:
>>> conv = nn.Conv2d(1, 1, 3, stride=2, padding=1)
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1)
>>> convT(conv(torch.rand(1, 1, 25, 25))).shape # x odd
torch.Size([1, 1, 25, 25]) #<- out = x
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=1)
>>> convT(conv(torch.rand(1, 1, 24, 24))).shape # x even
torch.Size([1, 1, 24, 24]) #<- out = x - 1 + op
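Applied to the question's numbers (x = 24, k = 3, s = 2, p = 1, d = 1): the convolution gives [(24 + 2 - 2 - 1)/2 + 1] = 12, and the transpose convolution with op = 0 gives (12 - 1)·2 - 2 + 2 + 0 + 1 = 23, which is exactly the 23×23 output observed. Since 24 is even, adding output_padding=1 to the transpose convolution restores the original size; a minimal sketch of the fix:
>>> enc = nn.Conv2d(128, 256, 3, padding=1, stride=2)
>>> dec = nn.ConvTranspose2d(256, 128, 3, padding=1, stride=2, output_padding=1)
>>> enc(torch.rand(1, 128, 24, 24)).shape
torch.Size([1, 256, 12, 12])
>>> dec(enc(torch.rand(1, 128, 24, 24))).shape
torch.Size([1, 128, 24, 24])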

How to change parameters of pre-trained longformer model from huggingface

I am using the Hugging Face pre-trained LongformerModel. I am using it to extract embeddings for a sentence. I want to change the token length / max sentence length parameters, but I am not able to do so. Here is the code.
model = LongformerModel.from_pretrained('allenai/longformer-base-4096',output_hidden_states = True)
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
model.eval()
text=[" I like to play cricket"]
input_ids = torch.tensor(tokenizer.encode(text,max_length=20,padding=True,add_special_tokens=True)).unsqueeze(0)
print(tokenizer.encode(text,max_length=20,padding=True,add_special_tokens=True))
# [0, 38, 101, 7, 310, 5630, 2]
I expected the encoder to give me a list of size 20 with padding, as I passed max_length=20. But it returned a list of size 7 only.
attention_mask = torch.ones(input_ids.shape, dtype=torch.long, device=input_ids.device)
attention_mask[:, [0,-1]] = 2
outputs = model(input_ids, attention_mask=attention_mask, return_dict=True)
hidden_states = outputs[2]
print ("Number of layers:", len(hidden_states), " (initial embeddings + 12 BERT layers)")
layer_i = 0
print ("Number of batches:", len(hidden_states[layer_i]))
batch_i = 0
print ("Number of tokens:", len(hidden_states[layer_i][batch_i]))
token_i = 0
print ("Number of hidden units:", len(hidden_states[layer_i][batch_i][token_i]))
Output:
Number of layers: 13 (initial embeddings + 12 BERT layers)
Number of batches: 1
Number of tokens: 512 # How can I change this parameter to pick up my sentence length during run-time
Number of hidden units: 768
How can I reduce the number of tokens to the sentence length instead of 512? Every time I input a new sentence, it should pick up that length.
Question regarding padding
padding=True pads your input to the longest sequence in the batch. padding='max_length' pads your input to the specified max_length (documentation):
from transformers import LongformerTokenizer
tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
text=[" I like to play cricket"]
print(tokenizer.encode(text[0],max_length=20,padding='max_length',add_special_tokens=True))
Output:
[0, 38, 101, 7, 310, 5630, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Question regarding the number of tokens of the hidden states
The Longformer implementation applies padding to your sequence to match the attention window sizes. You can see the size of the attention windows in your model config:
model.config.attention_window
Output:
[512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512]
This is the corresponding code line: link.
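If the goal is to reduce that padding, one option (a hedged sketch, not from the original answer) is to load the model with a smaller attention_window; note that the checkpoint was trained with a window of 512, so a smaller window may affect embedding quality, and the input will still be padded to a multiple of the window rather than to the exact sentence length:
from transformers import LongformerModel
# attention_window must be an even number; 16 is an arbitrary example value
model = LongformerModel.from_pretrained(
    'allenai/longformer-base-4096',
    attention_window=16,
    output_hidden_states=True,
)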

What's the usage for convolutional layer that output is the same as the input applied with MaxPool

What's the idea behind using the following convolutional layers?
Especially nn.Conv2d(16, 16, 3, padding = 1):
self.conv1 = nn.Conv2d(3, 16, 3, padding = 1 )
self.conv2 = nn.Conv2d(16, 16, 3, padding = 1)
self.conv3 = nn.Conv2d(16, 32, 3, padding = 1)
self.pool = nn.MaxPool2d(2, 2)
x = F.relu(self.conv1(x))
x = self.pool(F.relu(self.conv2(x)))
x = F.relu(self.conv3(x))
I thought Conv2d always increased the number of channels, e.g. from (16, 32) to (32, 64).
Is nn.Conv2d(16, 16, 3, padding = 1) merely for reducing the size?
The model architecture ultimately depends on what works best for your application, and it will always vary.
You are right that you usually want to make your tensors deeper (in the channel dimension) in order to extract richer features, but there is no hard and fast rule about that. Having said that, sometimes you don't want to make your tensors too big: the more channels you have, the more trainable parameters there are, which can make the model harder to train. This brings me back to the very first line: it all depends.
And as for the line:
nn.Conv2d(16, 16, 3, padding = 1) # stride = 1 by default.
This will keep the size of the tensor the same as the input in all 3 dimensions (height, width, and number of channels).
I will also add, for reference, the formula for the output size of a convolution:
output_size = ( (input_size - filter_size + 2*padding) / stride ) + 1
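For instance, a quick shape check with a hypothetical 32×32 input (the input size is an assumption for illustration):
import torch
import torch.nn as nn
x = torch.randn(1, 16, 32, 32)           # assumed example input
conv = nn.Conv2d(16, 16, 3, padding=1)   # (32 - 3 + 2*1)/1 + 1 = 32
pool = nn.MaxPool2d(2, 2)
print(conv(x).shape)        # torch.Size([1, 16, 32, 32])
print(pool(conv(x)).shape)  # torch.Size([1, 16, 16, 16])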
