How to lower the last dimension of a Tensor? - pytorch

I have a beginner question.
For example, I have a tensor of size torch.Size([2, 1, 80, 64]).
I need to turn it into another tensor of size torch.Size([2, 1, 80, 16]).
Is there a proper way to achieve that?

There are many ways to reduce the dimensionality; the following are some examples:
randomly select 16 out of the 64 features
take the mean of every four features (64/4=16)
use a dimensionality reduction technique like PCA
apply a linear transformation
apply a convolution function
To give a satisfying answer, more information about why and what you want to do is necessary.
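As a rough sketch of two of the options listed above (random feature selection and a linear transformation), assuming `x` is a random placeholder for the real data and that an untrained `nn.Linear` layer is acceptable for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 1, 80, 64)  # placeholder input with the shape from the question

# Randomly select 16 of the 64 features (same 16 indices for every row)
idx = torch.randperm(64)[:16]
y_rand = x[..., idx]

# Learned linear transformation applied to the last dimension (64 -> 16)
linear = nn.Linear(64, 16)
y_lin = linear(x)

print(y_rand.shape, y_lin.shape)
```

Both results have the target shape; which option makes sense depends on what the 64 features represent.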

Answered by: #ptrblck_de
slice the tensor
y = x[..., :16]
print(y.shape)
# torch.Size([2, 1, 80, 16])
index it with a stride of 4
y = x[..., ::4]
print(y.shape)
# torch.Size([2, 1, 80, 16])
use any pooling (max, avg, etc.) layer (the same would also work using adaptive pooling layers)
pool = nn.MaxPool2d((1, 2), (1, 4))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
pool = nn.AdaptiveAvgPool2d(output_size=(80, 16))
y = pool(x)
print(y.shape)
# torch.Size([2, 1, 80, 16])
or manually reduce the last dimension with any reduction op (sum, mean, max, etc.)
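The manual reduction mentioned last could look like the following minimal sketch, assuming the 64 features are grouped into 16 consecutive blocks of 4 (the grouping is an assumption, other groupings work the same way):

```python
import torch

x = torch.randn(2, 1, 80, 64)  # placeholder input

# Split the last dimension into 16 groups of 4 consecutive features
groups = x.reshape(*x.shape[:-1], 16, 4)

# Reduce over the group axis with any reduction op
y_mean = groups.mean(dim=-1)
y_sum = groups.sum(dim=-1)
y_max = groups.amax(dim=-1)

print(y_mean.shape)
```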

Related

How to use affine_grid for batch tensor in pytorch?

Following the official tutorial on affine_grid, this line (inside function stn()):
grid = F.affine_grid(theta, x.size())
gives me an error:
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [10, 3] but got: [4000, 3].
using the following input:
from torchinfo import summary
model = Net()
summary(model, input_size=(10, 1, 256, 256))
theta is of size (4000, 2, 3) and x is of size (10, 1, 256, 256). How do I properly manipulate theta so that its dimensions work with a batched tensor?
EDIT: I honestly don't see any mistake here, and the dimensions actually match the affine_grid docs. Has something changed in the function itself?
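For reference, F.affine_grid expects one 2x3 matrix per sample, so theta must have shape (N, 2, 3) with N equal to the batch size of x. The sketch below only demonstrates the expected shapes with an identity transform; it does not reproduce the Net from the question, whose definition is not shown. One possible cause of a 4000-row theta (an assumption) is a flattening step in the localization network whose feature size does not match the 256 x 256 input, so that the extra elements end up in the batch dimension.

```python
import torch
import torch.nn.functional as F

x = torch.randn(10, 1, 256, 256)  # placeholder batch with the shape from the question

# One identity affine matrix per sample: shape (N, 2, 3) with N == x.size(0)
identity = torch.tensor([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
theta = identity.unsqueeze(0).repeat(x.size(0), 1, 1)

grid = F.affine_grid(theta, x.size(), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)

print(theta.shape, grid.shape, out.shape)
```

Checking theta.shape[0] == x.size(0) right before the affine_grid call is a quick way to locate where the batch dimension goes wrong.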

(Torch tensor) Subtracting different dimension matrices

matrice1 = temp.unsqueeze(0)
print(matrice1.shape)
matrice2 = matrice1.permute(1, 0, 2, 3)
print(matrice2.shape)
print( torch.abs(matrice1 - matrice2).shape )
#torch.Size([1, 10, 3, 256])
#torch.Size([10, 1, 3, 256])
#torch.Size([10, 10, 3, 256])
I got the outcome above. I am wondering why subtracting two tensors of different shapes produces a tensor of shape [10, 10, 3, 256].
According to PyTorch's broadcasting semantics, the two tensors are "broadcastable" in your case, so each size-1 dimension is automatically expanded to match the other tensor, giving torch.Size([10, 10, 3, 256]).
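A small sketch of what the broadcast computes, using a random tensor `a` as a stand-in for the tensors in the question: entry [i, j] of the result is the element-wise difference between slice j and slice i.

```python
import torch

a = torch.randn(1, 10, 3, 256)   # stand-in for the first tensor
b = a.permute(1, 0, 2, 3)        # shape (10, 1, 3, 256)

diff = torch.abs(a - b)          # size-1 dims are expanded to 10 on both sides

# Equivalent computation with the expansion written out explicitly
explicit = torch.abs(a.expand(10, 10, 3, 256) - b.expand(10, 10, 3, 256))

print(diff.shape)
```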

Ignore padding class (0) during multi class classification

I have a problem where, given a set of tokens, the model must predict another token. For this task I use an embedding layer with Vocab-size + 1 as input_size. The +1 is because the sequences are padded with zeros. E.g., given a Vocab-size of 10 000 and max_sequence_len=6, x_train looks like:
array([[ 0, 0, 0, 11, 22, 4],
[ 29, 6, 12, 29, 1576, 29],
...,
[ 0, 0, 67, 8947, 7274, 7019],
[ 0, 0, 0, 15, 10000, 50]])
y_train consists of integers between 1 and 10000; in other words, this becomes a multi-class classification problem with 10000 classes.
My problem: When I specify the output size in the output layer, I would like to specify 10000, but the model will predict the classes 0-9999 if I do this. Another approach is to set output size to 10001, but then the model can predict the 0-class (padding), which is unwanted.
Since y_train is mapped from 1 to 10000, I could remap it to 0-9999, but since they share mapping with the input, this seems like an unnecessary workaround.
EDIT:
I realize, as #Andrey pointed out in the comments, that I could allow for 10001 classes and simply add padding to the vocabulary, although I am never interested in the network predicting 0's.
How can I tell the model to predict the labels 1-10000 while still having 10000 classes, not 10001?
I would use the following approach:
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=())
x = tf.keras.layers.Embedding(10001, 512)(inputs) # input shape of full vocab size [10001]
x = tf.keras.layers.Dense(10000, activation='softmax')(x) # training weights based on reduced vocab size [10000]
z = tf.zeros(tf.shape(x)[:-1])[..., tf.newaxis]
x = tf.concat([z, x], axis=-1) # add constant zero on the first position (to avoid predicting 0)
model = tf.keras.Model(inputs=inputs, outputs=x)
inputs = tf.random.uniform([10, 10], 0, 10001, dtype=tf.int32)
labels = tf.random.uniform([10, 10], 0, 10001, dtype=tf.int32)
model.compile(loss='sparse_categorical_crossentropy')
model.fit(inputs, labels)
pred = model.predict(inputs) # position 0 (padding) always holds 0, below every softmax output, so argmax never selects it
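For comparison, the remapping the question mentions (shifting labels from 1-10000 to 0-9999) can be sketched as below. The label values and the random `scores` array are hypothetical placeholders for real targets and real model outputs.

```python
import numpy as np

# Hypothetical labels in the range 1..10000
y_train = np.array([4, 29, 7019, 10000])

# Shift to 0..9999 so a 10000-unit output layer covers exactly these classes
y_shifted = y_train - 1

# Hypothetical model output: one row of 10000 class scores per example
rng = np.random.default_rng(0)
scores = rng.random((4, 10000))

# Shift predictions back to the shared token ids 1..10000
pred_tokens = scores.argmax(axis=-1) + 1
```

With this variant the padding id 0 can never be predicted, at the cost of keeping the +1/-1 shift consistent between training and inference.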

What's the usage for convolutional layer that output is the same as the input applied with MaxPool

What's the idea behind using the following convolutional layers,
especially nn.Conv2d(16, 16, 3, padding = 1)?
self.conv1 = nn.Conv2d(3, 16, 3, padding = 1 )
self.conv2 = nn.Conv2d(16, 16, 3, padding = 1)
self.conv3 = nn.Conv2d(16, 32, 3, padding = 1)
self.pool = nn.MaxPool2d(2, 2)
x = F.relu(self.conv1(x))
x = self.pool(F.relu(self.conv2(x)))
x = F.relu(self.conv3(x))
I thought Conv2d always uses a bigger size like
from (16,32) to (32,64) for example.
Is nn.Conv2d(16, 16, 3, padding = 1) merely for reducing the size?
The model architecture ultimately depends on what works best for your application, and it is always going to vary.
You are right that you usually want to make your tensors deeper (along the channel dimension) in order to extract richer features, but there is no hard and fast rule about that. Sometimes you don't want to make your tensors too big, since more channels mean more trainable parameters, which makes the model harder to train. This brings me back to the very first point: "It all depends".
And as for the line:
nn.Conv2d(16, 16, 3, padding = 1) # stride = 1 by default.
This will keep the size of the tensor the same as the input in all 3 dimensions (height, width, and number of channels).
For reference, here is the formula to calculate the size of the output tensor of a convolution (the division is rounded down):
output_size = floor( (input_size - filter_size + 2*padding) / stride ) + 1
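The formula above can be checked with a few lines of plain Python. The function name `conv_output_size` and the input size of 32 are hypothetical, chosen only for illustration; integer division is used because PyTorch rounds the quotient down.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# nn.Conv2d(16, 16, 3, padding=1): kernel 3, padding 1, stride 1 keeps the size
same = conv_output_size(32, 3, padding=1, stride=1)

# Without padding, the same kernel shrinks the input by 2
no_pad = conv_output_size(32, 3, padding=0, stride=1)

# nn.MaxPool2d(2, 2): kernel 2, stride 2 halves the size
pooled = conv_output_size(32, 2, padding=0, stride=2)

print(same, no_pad, pooled)
```

So in the network from the question, only the pooling layer changes height and width; the 3x3 convolutions with padding 1 change at most the channel count.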

MNIST Tensorflow example

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')
This is the code from the Deep MNIST for experts tutorial on the TensorFlow website.
I have three questions:
1) The documentation says ksize is a list of ints of length 4 or more that specifies the size of the max-pool window. Shouldn't that be just [2, 2], considering that it's a 2x2 window? Why is it [1, 2, 2, 1] instead of [2, 2]?
2) If we are taking a stride of size one, why do we need a vector of 4 values? Wouldn't one value suffice?
strides = [1]
3) If padding = 'SAME', why does the image size decrease by half (from 28 x 28 to 14 x 14 in the first convolutional process)?
I'm not sure which documentation you're referring to in this question. The max-pool window is indeed 2x2; ksize has four entries because the input is a 4-D tensor of shape [batch, height, width, channels], and the two 1s mean there is no pooling across the batch or channel dimensions.
The step size can differ per dimension. A vector of 4 values is the most general case: it would let you skip images in the batch, use different steps along height and width, and potentially even skip channels. This is rarely used but has been left in.
If you have a stride of 2 along each spatial direction, then you skip every other window position that you could potentially use for max pooling, which is why the image shrinks from 28 x 28 to 14 x 14. If you set the strides to [1, 1, 1, 1] with padding 'SAME', you would indeed get a result of the same size. The padding 'SAME' refers to zero-padding the image just enough that the output size is ceil(input_size / stride) in each dimension.
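The size arithmetic for 'SAME' padding can be sketched in plain Python. With 'SAME', TensorFlow produces an output of size ceil(input_size / stride) per spatial dimension and adds only as much zero padding as that requires. The helper names below are hypothetical, and the 5x5 kernel in the second example is an assumption about the tutorial's convolution.

```python
import math

def same_output_size(input_size, stride):
    """Output size along one dimension with padding='SAME'."""
    return math.ceil(input_size / stride)

def same_total_padding(input_size, kernel_size, stride):
    """Total zeros added along one dimension to reach the 'SAME' output size."""
    out = same_output_size(input_size, stride)
    return max((out - 1) * stride + kernel_size - input_size, 0)

# 2x2 max pooling with stride 2 on a 28-pixel side: halves the size, no padding needed
pool_out = same_output_size(28, 2)
pool_pad = same_total_padding(28, 2, 2)

# Stride-1 convolution (assumed 5x5 kernel) on a 28-pixel side: size is preserved
conv_out = same_output_size(28, 1)
conv_pad = same_total_padding(28, 5, 1)

print(pool_out, pool_pad, conv_out, conv_pad)
```

This shows that the halving comes from the stride of the pooling layer, not from the padding mode.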

Resources