How to use affine_grid for a batch tensor in PyTorch?

Following the official tutorial on affine_grid, this line (inside function stn()):
grid = F.affine_grid(theta, x.size())
gives me an error:
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [10, 3] but got: [4000, 3].
using the following input:
from torchinfo import summary
model = Net()
summary(model, input_size=(10, 1, 256, 256))
theta is of size (4000, 2, 3) and x is of size (10, 1, 256, 256). How do I properly reshape theta to the correct dimensions for use with a batch tensor?
EDIT: I honestly don't see any mistake here, and the dimensions actually follow the affine_grid doc. Has something changed in the function itself?
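For reference, F.affine_grid expects theta of shape (N, 2, 3) with N equal to the batch dimension of the size argument, so for x of size (10, 1, 256, 256) theta should come out as (10, 2, 3), not (4000, 2, 3). Since 4000 = 10 × 400, the flatten/view in the tutorial's localization network is most likely folding spatial positions into the batch dimension once the input is 256×256 instead of the tutorial's 28×28; adjusting that flatten size so fc_loc emits one 2×3 matrix per sample should resolve the error. A minimal shape sketch (identity transform used as a placeholder):

import torch
import torch.nn.functional as F

N, C, H, W = 10, 1, 256, 256
x = torch.randn(N, C, H, W)

# One 2x3 affine matrix per sample in the batch: shape (10, 2, 3).
theta = torch.eye(2, 3).repeat(N, 1, 1)

grid = F.affine_grid(theta, x.size(), align_corners=False)  # (10, 256, 256, 2)
out = F.grid_sample(x, grid, align_corners=False)           # (10, 1, 256, 256)
print(grid.shape, out.shape)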

Related

How do I solve a ValueError in TensorFlow?

I am running a CNN in Google Colab using TensorFlow/Keras, and I received this error:
Negative dimension size caused by subtracting 3 from 2 for '{{node conv2d_11/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], explicit_paddings=[], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](Placeholder, conv2d_11/Conv2D/ReadVariableOp)' with input shapes: [?,2,2,394], [3,3,394,394].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 2, 2, 394), dtype=float32)
Does this have to do with my input data or my parameters? Thanks.
Try specifying a concrete number instead of None for the shape. As said in the documentation here, you end up with not-fully-specified shapes when using None.
In this case your model contains a lot of MaxPool/AvgPool layers, so as the images pass through the layers their spatial size keeps shrinking until a 3x3 convolution no longer fits. I think it would help to set the padding parameter of the convolution layers to "same"; for more detail, read up on the strides and padding parameters of conv layers.
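A minimal sketch of that suggestion, using a toy model with placeholder layer sizes (not the asker's actual network): with padding="same" and stride 1, each Conv2D preserves its input's spatial size, so repeated pooling no longer shrinks the feature map below the 3x3 kernel.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    # With padding="valid" this layer would fail as soon as H/W drop below 3.
    layers.Conv2D(394, 3, padding="same", activation="relu"),
])
model.summary()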

Need help transforming a pytorch tensor

I'm getting this error when trying to run a version of U-Net. I can't just alter the shape of the tensor; will I need to change the actual model itself?
Expected 5-dimensional input for 5-dimensional weight [16, 1, 5, 5, 5], but got 4-dimensional input of size [4, 320, 320, 24] instead
I think your model was trained on 3-D data and expects a batch of 1-channel 3-D volumes. I think your input data has had that 1-channel dimension "squeezed" out of the 4-sample batch of shape 4x320x320x24. Try unsqueezing the missing dimension:
x = x.unsqueeze(dim=1) # for x of shape [4, 320, 320, 24]
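A quick shape check of that fix (toy tensor; the Conv3d here is a stand-in, assuming the model's first layer carries the weight [16, 1, 5, 5, 5] from the error):

import torch
import torch.nn as nn

x = torch.randn(4, 320, 320, 24)   # [batch, H, W, D] -- the channel dimension is missing
x = x.unsqueeze(dim=1)             # [4, 1, 320, 320, 24]

conv = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=5)  # weight [16, 1, 5, 5, 5]
print(conv(x).shape)               # torch.Size([4, 16, 316, 316, 20])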

pytorch modifying the input data to forward to make it suitable to my model

Here is what I want to do.
I have an individual data of shape (20,20,20) where 20 tensors of shape (1,20,20) will be used as an input for 20 separate CNN. Here's the code I have so far.
class MyModel(torch.nn.Module):
    def __init__(self, ...):
        ...
        self.features = nn.ModuleList([nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(10, 14, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(14, 18, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(28*28*18, 256)
        ) for _ in range(20)])
        self.fc_module = nn.Sequential(
            nn.Linear(256*n_selected, cnn_output_dim),
            nn.Softmax(dim=n_classes)
        )

    def forward(self, input_list):
        concat_fusion = cat([cnn(x) for x, cnn in zip(input_list, self.features)], dim=0)
        output = self.fc_module(concat_fusion)
        return output
The shape of the input_list in forward function is torch.Size([100, 20, 20, 20]), where 100 is the batch size.
However, there's an issue with
concat_fusion = cat([cnn(x) for x,cnn in zip(input_list,self.features)], dim = 0)
as it results in this error.
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [10, 1, 3, 3], but got 3-dimensional input of size [20, 20, 20] instead
First off, I wonder why it expects me to give a 4-dimensional weight [10, 1, 3, 3]. I've seen
"RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 3 3, but got 3-dimensional input of size [3, 224, 224] instead"?
but I'm not sure where those specific numbers are coming from.
I have an input_list which is a batch of 100 samples. I'm not sure how to handle an individual sample of shape (20, 20, 20) so that I can separate it into 20 pieces and use each as an independent input to one of the 20 CNNs.
why it expects me to give a 4-dimensional weight [10, 1, 3, 3].
The following log means that the nn.Conv2d layer, whose weight has shape (10, 1, 3, 3), requires a 4-dimensional input:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [10, 1, 3, 3]
How to separate the input into 20 pieces along channels.
Iterating over input_list of shape (100, 20, 20, 20) produces 100 tensors of shape (20, 20, 20). To split the input along the channel dimension instead, slice input_list along its second dimension:
concat_fusion = torch.cat([cnn(input_list[:, i:i+1]) for i, cnn in enumerate(self.features)], dim = 1)
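A quick check of the shapes involved (toy tensors, not the full model):

import torch
import torch.nn as nn

input_list = torch.randn(100, 20, 20, 20)   # [batch, 20 channels, H, W]

chunk = input_list[:, 0:1]                  # (100, 1, 20, 20): a 4-D, 1-channel batch
conv = nn.Conv2d(1, 10, kernel_size=3, padding=1)
print(conv(chunk).shape)                    # torch.Size([100, 10, 20, 20]) -- matches weight [10, 1, 3, 3]

# By contrast, iterating over input_list feeds 3-D tensors and triggers the error:
print(input_list[0].shape)                  # torch.Size([20, 20, 20])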

Understanding matrix dimensions when printing CNN heatmaps (Class Activation Map (CAM))

This question is related to computing the Class Activation Map (CAM) visualization.
Source code: see ln [24] onwards and the code snapshot pasted below.
Model.summary()
The last convolutional layer is block5_conv3 with output dimensions (14, 14, 512), and the model predicts 1000 classes.
My questions are about the lines of code in this screenshot; I have also included them separately below.
In this line of code:
african_elephant_output = model.output[:, 386]
This model was trained on 1000 classes (the last line in the output of model.summary()). To understand the gradient calculation at a later step, I first want to understand how to print the length of the vector african_elephant_output and also the actual values in this feature vector.
last_conv_layer = model.get_layer('block5_conv3')
grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]
The output dimensions of last_conv_layer are (14, 14, 512). But to understand the dot product calculated via K.gradients, I need to know the dimensions of african_elephant_output. Should they be (1, 1, 512), so that african_elephant_output can first be broadcast and the dot product then calculated over the corresponding channels? How can I print the dimensions of african_elephant_output, and the dimensions and values of grads?
What does axis = (0, 1, 2) refer to in this line of code:
pooled_grads = K.mean(grads, axis=(0, 1, 2))
I am assuming the grads tensor in #2 above is of shape (14, 14, 512), so the axis values 0, 1, 2 refer to the width (0), height (1) and channel (2) dimensions. The mean is then calculated over width and height, and we get a vector of shape (512,)?
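One way to answer the "how can I print these" part, assuming the TF1-style Keras backend used in the book's CAM example (K.gradients does not work with eager execution enabled) and a preprocessed image batch x of shape (1, 224, 224, 3):

from keras import backend as K

# Symbolic (static) shapes; the batch dimension shows up as None.
print(K.int_shape(african_elephant_output))  # (None,)             -- one scalar score per sample
print(K.int_shape(grads))                    # (None, 14, 14, 512)
print(K.int_shape(pooled_grads))             # (512,)

# Concrete values: wrap the tensors in a backend function and feed an image batch.
fetch = K.function([model.input], [african_elephant_output, grads])
score_val, grads_val = fetch([x])
print(score_val.shape, grads_val.shape)      # (1,) and (1, 14, 14, 512)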

Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions on here (1, 2, 3), but I am still struggling to define the right shape for input to PyTorch conv1D.
I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.
So my input tensor to conv1D is of shape [6, 512, 768].
input = torch.randn(6, 512, 768)
Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using the conv1D layer from PyTorch.
Understanding 1:
I assumed that "in_channels" is the embedding dimension of the Conv1d layer. If so, the layer would be defined in this way, where
in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)
But with this assumption, I get the following error:
RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead
Understanding 2:
Then I assumed that "in_channels" is the sequence length of the input. If so, the layer would be defined in this way, where
in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)
This works fine and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused. Shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?
I will be really grateful for your help.
In PyTorch, your input of shape [6, 512, 768] should actually be [6, 768, 512], where the feature (embedding) size is the channel dimension and the sequence length is the length dimension. Then you can define your Conv1d with in/out channels of 768 and 100 respectively to get an output of [6, 100, 511].
Given an input of shape [6, 512, 768] you can convert it to the correct shape with Tensor.transpose.
input = input.transpose(1, 2).contiguous()
The .contiguous() call makes sure the transposed tensor is stored in contiguous memory, which some downstream operations (e.g. .view()) require.
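A quick sanity check of that layout, using the question's sizes:

import torch
import torch.nn as nn

x = torch.randn(6, 512, 768)           # [batch, seq_len, emb_dim]
x = x.transpose(1, 2).contiguous()     # [6, 768, 512]: channels = embedding dimension

conv = nn.Conv1d(in_channels=768, out_channels=100, kernel_size=2)
print(conv(x).shape)                   # torch.Size([6, 100, 511])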
I found an answer to it (source).
So, usually, BERT outputs vectors of shape
[batch_size, sequence_length, embedding_dim].
where,
sequence_length = number of words or tokens in a sequence (max_length sequence BERT can handle is 512)
embedding_dim = the vector length of the vector describing each token (768 in case of BERT).
thus, input = torch.randn(batch_size, 512, 768)
Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.
So, we define a PyTorch conv1D layer as follows,
convolution_layer = nn.Conv1d(in_channels, out_channels, kernel_size)
where,
in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)
thus, convolution_layer = nn.Conv1d(768, 100, 2)
Now we need to bridge the gap between the input expected by convolution_layer and the actual input:
current input shape: [batch_size, 512, 768]
expected input shape: [batch_size, 768, 512]
To achieve this expected input shape, we need to use the transpose function from PyTorch.
input_transposed = input.transpose(1, 2)
I have a suggestion which may not be exactly what you asked for, but it might help. Because your input is (6, 512, 768), you can use Conv2d instead of Conv1d.
All you need to do is add a dimension of size 1 at index 1 with input.unsqueeze(1), which acts as your channel dimension (think of it as a grayscale image):
def forward(self, x):
    x = self.embedding(x)      # [batch, seq_length, embedding] = [5, 512, 768]
    x = torch.unsqueeze(x, 1)  # [5, 1, 512, 768], like a grayscale image
And your Conv2d layer can be defined like this:
window_size = 3        # for trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10       # or whatever you want
self.conv = nn.Conv2d(in_channels=1,
                      out_channels=NUM_FILTERS,
                      kernel_size=(window_size, EMBEDDING_SIZE),
                      padding=(window_size - 1, 0))
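For reference, a self-contained sketch of this Conv2d-as-n-gram idea with the sizes above; the embedding lookup is replaced by a random tensor:

import torch
import torch.nn as nn

window_size = 3        # trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10

conv = nn.Conv2d(in_channels=1,
                 out_channels=NUM_FILTERS,
                 kernel_size=(window_size, EMBEDDING_SIZE),
                 padding=(window_size - 1, 0))

x = torch.randn(5, 512, EMBEDDING_SIZE)   # [batch, seq_len, emb_dim], stand-in for self.embedding(x)
x = x.unsqueeze(1)                        # [5, 1, 512, 768], like a grayscale image
out = conv(x)                             # [5, 10, 514, 1]: one row per (padded) trigram position
print(out.shape)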
