I am using Python 3.8 and PyTorch 1.7.1. I saw code that defines a Conv2d layer as follows:
Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
The input 'X' being passed to it is a 4D tensor:
X.shape
# torch.Size([4, 3, 6, 6])
The output volume for this conv layer is:
c1(X).shape
# torch.Size([4, 6, 3, 3])
I am trying to use the formula to compute output spatial dimensions for any conv layer: O = ((W - K + 2P)/S) + 1, where W = spatial dimension of image, K = filter/kernel size, P = zero padding & S = stride.
For the 'c1' conv layer we have W = 6, K = 3, S = 2 & P = 1. Using the formula, O = ((6 - 3 + (2 x 1)) / 2) + 1 = 5/2 + 1 = 3.5.
The output volume is (4, 6, 3, 3), since the number of filters used = 6.
How is the spatial output from 'c1' then (3, 3)? What am I not getting?
Thanks!
How would you have half a pixel?
You're missing the floor function:
O = floor(((W - K + 2P)/S) + 1)
So the shape of the output maps is (3, 3).
Here's the complete formula (with dilation D) for nn.Conv2d:
O = floor((W + 2P - D(K - 1) - 1)/S + 1)
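As a quick sanity check, here is a minimal sketch applying the floor formula to the layer from the question and comparing it with what PyTorch actually returns:

import math
import torch
import torch.nn as nn

c1 = nn.Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
X = torch.rand(4, 3, 6, 6)

O = math.floor((6 - 3 + 2 * 1) / 2 + 1)  # floor(5/2 + 1) = floor(3.5) = 3
print(O)            # 3
print(c1(X).shape)  # torch.Size([4, 6, 3, 3])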
I want to project a tensor of shape [197, 1, 768] to [197, 1, 128] in PyTorch using nn.Conv().
You could achieve this with a wide, flat kernel combined with a specific stride. If you stick with a dilation of 1, the input/output spatial dimension relation is given by:
out = [(2p + x - k)/s + 1]
where p is the padding, k is the kernel size, and s is the stride. [.] denotes the integer (floor) part of the quantity.
Applied here you have:
128 = [(2p + 768 - k)/s + 1]
Rearranging for the kernel size, you get:
k = 2*p + 768 - (128 - 1)*s
If you impose p = 0 and s = 6, you find k = 6:
>>> import torch
>>> import torch.nn as nn
>>> project = nn.Conv2d(197, 197, kernel_size=(1, 6), stride=6)
>>> project(torch.rand(1, 197, 1, 768)).shape
torch.Size([1, 197, 1, 128])
Alternatively, a more straightforward - but different - approach is to learn a mapping using a fully connected layer:
>>> project = nn.Linear(768, 128)
>>> project(torch.rand(1, 197, 1, 768)).shape
torch.Size([1, 197, 1, 128])
You could use a kernel size and stride of 6, as that’s the factor between the input and output temporal size:
import torch
import torch.nn as nn

x = torch.randn(197, 1, 768)  # treated as (batch=197, channels=1, length=768) by Conv1d
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=6, stride=6)
out = conv(x)
print(out.shape)
# torch.Size([197, 1, 128])
I'm trying to implement an autoencoder CNN. However, I have the following problem:
The last convolutional layer of my encoder is defined as follows:
Conv2d(128, 256, 3, padding=1, stride=2)
The input of this layer has shape (1, 128, 24, 24). Thus, the output has shape (1, 256, 12, 12).
After this layer, I have ReLU activation and BatchNorm. Neither of these changes the shape of the output.
Then I have a first ConvTranspose2d layer defined as:
ConvTranspose2d(256, 128, 3, padding=1, stride=2)
But the output of this layer has shape (1, 128, 23, 23).
As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this two-layer block must have the same shape as its input.
So, my question is: what is wrong with my understanding? And how can I fix this issue?
I would first like to note that the nn.ConvTranspose2d layer is not the inverse of nn.Conv2d as explained in its documentation page:
it is not an actual deconvolution operation as it does not compute a true inverse of convolution
As far as I know, if we use the same kernel size, stride, and padding in ConvTranspose2d as in the preceding Conv2d layer, then the output of this 2 layers block must have the same shape as its input.
This is not always true! It depends on the input spatial dimensions.
In terms of spatial dimensions the 2D convolution will output:
out = [(x + 2p - d(k - 1) - 1)/s + 1]
where [x] denotes the integer (floor) part of x.
while the 2D transpose convolution will output:
out = (x - 1)s - 2p + d(k - 1) + op + 1
where x = input_dimension, out = output_dimension, k = kernel_size, s = stride, d = dilation, p = padding, and op = output_padding.
If you look at the convT o conv operator (i.e. convT(conv(x))) then you have:
out = (out_conv - 1)s - 2p + d(k - 1) + op + 1
= ([(x + 2p - d(k - 1) - 1)/s + 1] - 1)s - 2p + d(k - 1) + op + 1
This equals x only if [(x + 2p - d(k - 1) - 1)/s + 1] = (x + 2p - d(k - 1) - 1)/s + 1, i.e. when s divides (x + 2p - d(k - 1) - 1) exactly. With k = 3, p = 1, d = 1, and s = 2 as in your layers, that means x must be odd. In this case:
out = ((x + 2p - d(k - 1) - 1)/s + 1 - 1)s - 2p + d(k - 1) + op + 1
= x + op
And out = x when op = 0.
Otherwise if x is even then:
out = x - 1 + op
And setting op = 1 gives out = x.
Here is an example:
>>> import torch
>>> import torch.nn as nn
>>> conv = nn.Conv2d(1, 1, 3, stride=2, padding=1)
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1)
>>> convT(conv(torch.rand(1, 1, 25, 25))).shape  # x odd
torch.Size([1, 1, 25, 25])  # <- out = x (op = 0)
>>> convT = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=1)
>>> convT(conv(torch.rand(1, 1, 24, 24))).shape  # x even
torch.Size([1, 1, 24, 24])  # <- out = x - 1 + op = x (op = 1)
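As a quick check of the two formulas above, here is a small sketch (the helper names are just for illustration) that computes both output sizes and reproduces the odd/even behaviour:

from math import floor

def conv_out(x, k=3, s=2, p=1, d=1):
    # 2D convolution: out = floor((x + 2p - d(k - 1) - 1)/s + 1)
    return floor((x + 2 * p - d * (k - 1) - 1) / s + 1)

def convT_out(x, k=3, s=2, p=1, d=1, op=0):
    # 2D transposed convolution: out = (x - 1)s - 2p + d(k - 1) + op + 1
    return (x - 1) * s - 2 * p + d * (k - 1) + op + 1

print(convT_out(conv_out(25)))        # 25: x odd  -> out = x
print(convT_out(conv_out(24)))        # 23: x even -> out = x - 1 + op
print(convT_out(conv_out(24), op=1))  # 24: x even, op = 1 -> out = x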
I am using TF 2.5 & Python 3.8, where a conv layer is defined as:
Conv2D(
filters = 64, kernel_size = (3, 3),
activation='relu', kernel_initializer = tf.initializers.GlorotNormal(),
strides = (1, 1), padding = 'same',
)
Using a batch of 60 CIFAR-10 images as input:
x.shape
# TensorShape([60, 32, 32, 3])
The output volume of this layer preserves the spatial width and height (32, 32) and has 64 feature maps (one per filter) for each of the 60 images in the batch:
conv1(x).shape
# TensorShape([60, 32, 32, 64])
I understand this output.
Can you explain the output of:
conv1.trainable_weights[0].shape
# TensorShape([3, 3, 3, 64])
The shape [3, 3, 3, 64] is (m, n, d, k), matching the formula used to compute the number of trainable parameters in a conv layer = [{(m x n x d) + 1} x k]
where,
m -> width of filter; n -> height of filter; d -> number of channels in input volume; k -> number of filters applied in current layer.
The 1 is added as the bias for each filter. trainable_weights[0] holds only the kernel tensor, so the bias does not appear in its shape; when use_bias=True (the Keras default) it is stored separately as trainable_weights[1], and with use_bias=False it is absent altogether.
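A minimal TF2/Keras sketch (building the layer explicitly so its weights exist) that shows where the kernel and the bias live:

import tensorflow as tf

conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='same')
conv1.build(input_shape=(None, 32, 32, 3))

print(conv1.trainable_weights[0].shape)  # (3, 3, 3, 64) -> the (m, n, d, k) kernel
print(conv1.trainable_weights[1].shape)  # (64,) -> one bias per filter
print(conv1.count_params())              # (3*3*3 + 1) * 64 = 1792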
I am working on an inference model of a pytorch onnx model which is why this question is being asked.
Assume I have an image with dimensions 32 x 32 x 3 (CIFAR-10 dataset). I pass it through a Conv2d whose weight tensor has dimensions 3 x 192 x 5 x 5. The command I used is: Conv2d(3, 192, kernel_size=5, stride=1, padding=2)
Using the formula (stated on pg. 12 of https://arxiv.org/pdf/1603.07285.pdf for reference) I should be getting an output image with dimensions 28 x 28 x 192 (input - kernel + 1 = 32 - 5 + 1).
The question is: how has PyTorch implemented this 4D tensor 3 x 192 x 5 x 5 to get me an output of 28 x 28 x 192? The layer's weights form a 4D tensor, while the input image is a 2D (spatial) one.
How is the kernel (5 x 5) spread over the image matrix 32 x 32 x 3? What does the kernel convolve with first: the 3 x 192 or the 32 x 32?
Note: I have understood the 2D aspects of things. I am asking the above questions about 3 or more dimensions.
The input to Conv2d is a tensor of shape (N, C_in, H_in, W_in) and the output is of shape (N, C_out, H_out, W_out), where N is the batch size (number of images), C is the number of channels, H is the height and W is the width. The output height and width H_out, W_out are computed as follows (ignoring the dilation):
H_out = (H_in + 2*padding[0] - kernel_size[0]) / stride[0] + 1
W_out = (W_in + 2*padding[1] - kernel_size[1]) / stride[1] + 1
See cs231n for an explanation of how these formulas were obtained.
In your example N = 1, H_in = 32, W_in = 32, C_in = 3, kernel_size = (5, 5), strides = (1, 1), padding = (0, 0), giving H_out = 28, W_out = 28. (Note that the Conv2d call you quoted actually uses padding=2, which would give H_out = W_out = 32; the 28 x 28 figure assumes padding = 0.)
The C_out=192 means that there are 192 different filters, each of shape (C_in, kernel_size[0], kernel_size[1]) = (3, 5, 5). Each filter independently performs convolution with the input image resulting in a 2D tensor of shape (H_out, W_out) = (28, 28), and since there are C_out = 192 filters and N = 1 images, the final output is of shape (N, C_out, H_out, W_out) = (1, 192, 28, 28).
To understand how exactly the convolution is performed see the convolution demo.
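To make this concrete, here is a small PyTorch sketch (using padding=0 to match the 28 x 28 result above; the variable names are just for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=192, kernel_size=5, stride=1, padding=0)

# The layer's weight is the 4D tensor (C_out, C_in, kH, kW)
print(conv.weight.shape)  # torch.Size([192, 3, 5, 5])

x = torch.rand(1, 3, 32, 32)  # one 32 x 32 RGB image (N = 1)
print(conv(x).shape)          # torch.Size([1, 192, 28, 28])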
Consider that you are given n data points in the form of a list of tuples, like S = [(x1,y1),(x2,y2),(x3,y3),(x4,y4),(x5,y5),..,(xn,yn)], and a point P = (p,q).
Your task is to find the 5 closest points (based on cosine distance) in S to P.
Ex:
S= [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1)(6,0),(1,-1)]
P= (3,-4)
I have tried the below code:
import math
data = [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1)(6,0),(1,-1)]
data.sort(key=lambda x: math.sqrt((float(x.split(",")[0]) - 3)**2 +
(float(x.split(",")[1]) -(-4))**2))
print(data)
I should get 5 closest points in S from P.
You have a missing comma in the definition of data.
You have a list of tuples, but for some reason you used split as if it were a list of strings.
If you fix these two errors, it works. You just need to grab the first 5 elements from data:
import math
data = [(1, 2), (3, 4), (-1, 1), (6, -7), (0, 6), (-5, -8), (-1, -1), (6, 0), (1, -1)]
data.sort(key=lambda x: math.sqrt((float(x[0]) - 3) ** 2 +
(float(x[1]) - (-4)) ** 2))
print(data[:5])
Outputs
[(1, -1), (6, -7), (-1, -1), (6, 0), (1, 2)]
(Next time, if you get an error please explain it in your question)
import math

P = (3, -4)
S = [(1, 2), (3, 4), (-1, 1), (6, -7), (0, 6), (-5, -8), (-1, -1), (6, 0), (1, -1)]

# Angle between each point in S and P; the ordering is the same as for
# cosine distance (1 - cos), since acos is monotonically decreasing
cosine_dist = []
for a, b in S:
    num = a * P[0] + b * P[1]
    den = math.sqrt(a * a + b * b) * math.sqrt(P[0] * P[0] + P[1] * P[1])
    cosine_dist.append(math.acos(num / den))

# Sort the points by that distance and print the 5 closest
for point, dist in sorted(zip(S, cosine_dist), key=lambda i: i[1])[:5]:
    print(point)
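If SciPy is available, an equivalent and shorter sketch uses its built-in cosine distance (1 - cos) directly, which gives the same ordering:

from scipy.spatial import distance

P = (3, -4)
S = [(1, 2), (3, 4), (-1, 1), (6, -7), (0, 6), (-5, -8), (-1, -1), (6, 0), (1, -1)]

# Sort by cosine distance to P and keep the 5 closest points
closest = sorted(S, key=lambda pt: distance.cosine(pt, P))[:5]
print(closest)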