PyTorch grid_sample returns zero array

I want to sample a rectangular patch from my image with affine_grid/grid_sample.
I created an array which contains only 255 values:
canvas1 = np.zeros((128, 128), dtype=np.uint8)
canvas1[:] = 255
I also created a grid:
theta = torch.FloatTensor([[
[11/2, 0, 63],
[0, 11/2, 63],
]])
grid = F.affine_grid(theta, (1, 1, 11, 11))
The grid contains values like
[[57.5000, 57.5000],
[58.6000, 57.5000],
[59.7000, 57.5000],
[60.8000, 57.5000],
[61.9000, 57.5000],
[63.0000, 57.5000],
[64.1000, 57.5000],
[65.2000, 57.5000],
[66.3000, 57.5000],
[67.4000, 57.5000],
[68.5000, 57.5000]],
...............
After that I called grid_sample:
canvas1_torch = torch.FloatTensor(canvas1.astype(np.float32))
canvas1_torch = canvas1_torch.unsqueeze(0).unsqueeze(0)
sampled = F.grid_sample(canvas1_torch, grid, mode="bilinear")
Unfortunately sampled contains only zero values (but canvas1_torch[0, 0, 63, 65] is 255).
What am I doing wrong?

Your grid values are outside [-1, 1].
According to https://pytorch.org/docs/stable/nn.html#torch.nn.functional.grid_sample, such values are handled as defined by padding_mode.
The default padding_mode is 'zeros'; what you probably want is "border": F.grid_sample(canvas1_torch, grid, mode="bilinear", padding_mode="border") returns all values 255.
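Alternatively, you can write theta directly in the normalized [-1, 1] coordinates that affine_grid/grid_sample expect, so the grid stays in range and no padding is involved. A minimal sketch, assuming the align_corners=True convention (where -1 and 1 correspond to the centers of the corner pixels) and reusing the scale/center values from the question:
import numpy as np
import torch
import torch.nn.functional as F

H = W = 128
canvas1 = np.full((H, W), 255, dtype=np.uint8)

# Pixel-space patch parameters from the question: 11x11 window around pixel (63, 63).
scale, cx, cy = 11 / 2, 63, 63

# Convert pixel coordinates p to normalized coordinates n = 2 * p / (size - 1) - 1.
theta = torch.tensor([[
    [2 * scale / (W - 1), 0.0, 2 * cx / (W - 1) - 1],
    [0.0, 2 * scale / (H - 1), 2 * cy / (H - 1) - 1],
]], dtype=torch.float32)

grid = F.affine_grid(theta, (1, 1, 11, 11), align_corners=True)
canvas1_torch = torch.from_numpy(canvas1.astype(np.float32))[None, None]
sampled = F.grid_sample(canvas1_torch, grid, mode="bilinear", align_corners=True)
print(sampled.min(), sampled.max())  # both 255: the grid now lies inside [-1, 1]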

Related

Pytorch gridsample - Different Resolutions?

Does PyTorch grid_sample work for this particular case?
I have an image of size [B, 100, 200] that I want to map into a smaller [B, 50, 60] space. I have the pixel 1-1 mappings stored in a [B, 100, 200, 2] tensor where most of it, I guess, is 0.
Is this actually possible?
The answer is yes, it is possible! But not with grid_sample alone.
You can check the documentation here:
grid_sample: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
interpolate: https://pytorch.org/docs/stable/nn.functional.html#interpolate
You will need the following:
import torch.nn.functional as F
warped_img = F.grid_sample(input, grid)
small_img = F.interpolate(warped_img, size=(50, 60))
input is the image you have, of shape [B, 100, 200];
grid is the 1-1 pixel mapping of shape [B, 100, 200, 2]; the values should be normalized to the range [-1, 1], with (-1, -1) being the upper-left corner. Notice that if most of the values are zeros, most pixels will be mapped to the center of the image. That is because grid takes the end location of each pixel, not the displacement.
warped_img = F.grid_sample(input, grid)
Now, to downsize the image you have to use F.interpolate
small_img = F.interpolate(warped_img, size=(50, 60))
Notice that you can also downsize first (not sure how that will impact the end-result). That is because the grid is normalized!
import torch.nn.functional as F
warped_img = F.grid_sample(F.interpolate(input, size=(50, 60)),
                           F.interpolate(grid, size=(50, 60)))
Notice that grid is the end location of each pixel. And it is not the displacement. If you have a flow field (just the displacement of each pixel) you can turn that into a grid with the following:
def warp(img, flow, size):
    B, C, H, W = img.size()
    # mesh grid of pixel coordinates (x to the right, y downwards)
    grid_x = torch.arange(W, device=img.device)
    grid_y = torch.arange(H, device=img.device)
    yy, xx = torch.meshgrid(grid_y, grid_x)
    grid = torch.cat((xx.unsqueeze(0), yy.unsqueeze(0)), dim=0).float()
    # add the displacement to get the end location of every pixel
    vgrid = grid + flow.clamp(min=-1000., max=1000.)
    # scale grid to [-1, 1]
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0
    # grid_sample expects the grid as [B, H, W, 2]
    vgrid = vgrid.permute(0, 2, 3, 1)
    warped_img = F.grid_sample(img, vgrid, align_corners=False)
    small_img = F.interpolate(warped_img, size=size)
    return small_img
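A quick usage sketch (hypothetical shapes matching the question; a zero flow field is just an identity warp followed by downsizing):
import torch
import torch.nn.functional as F  # warp() above relies on this F

img = torch.rand(2, 3, 100, 200)    # [B, C, H, W]
flow = torch.zeros(2, 2, 100, 200)  # zero displacement: identity warp
out = warp(img, flow, size=(50, 60))
print(out.shape)                    # torch.Size([2, 3, 50, 60])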

Expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]

I read this question, but it doesn't seem to answer my question.
So basically I'm trying to vectorize the game snake so it can run faster.
Here is my code till now:
import torch
import torch.nn.functional as F
device = torch.device("cpu")
class SnakeBoard:
    def __init__(self, board=None):
        if board is not None:
            self.channels = board
        else:
            # 0 - Food, 1 - Head, 2 - Body
            self.channels = torch.zeros(1, 3, 15, 17,
                                        device=device)
            # Initialize game channels
            self.channels[:, 0, 7, 12] = 1
            self.channels[:, 1, 7, 5] = 1
            self.channels[:, 2, 7, 2:6] = torch.arange(1, 5)
        self.move()

    def move(self):
        self.channels[:, 2] -= 1
        F.relu(self.channels[:, 2], inplace=True)
        # Up movement test
        F.conv2d(self.channels[:, 1], torch.tensor([[[0,1,0],[0,0,0],[0,0,0]]]), padding=1)

SnakeBoard()
The first dimension in channels represents the batch size, the second dimension represents the 3 channels of the snake game (food, head, and body), and finally the third and fourth dimensions represent the height and width of the board.
Unfortunately, when running the code I get the error: Expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]
How can I fix that?
The dimensions of the inputs for the convolution are not correct for a 2D convolution. Let's have a look at the dimensions you're passing to F.conv2d:
self.channels[:, 1].size()
# => torch.Size([1, 15, 17])
torch.tensor([[[0,1,0],[0,0,0],[0,0,0]]]).size()
# => torch.Size([1, 3, 3])
The correct sizes should be
input: (batch_size, in_channels , height, width)
weight: (out_channels, in_channels , kernel_height, kernel_width)
Because your weight has only 3 dimensions, it is considered a 1D convolution, but since you called F.conv2d the stride and padding are pairs of values, and therefore it won't work.
For the input you indexed the second dimension, which selects that particular element across that dimension and eliminates it. To keep that dimension you can index it with a slice of just one element.
The weight is missing one dimension as well, which can simply be added directly. Also, your weight is of type torch.long, since you only used integers in the tensor creation, but the weight needs to be of type torch.float.
F.conv2d(self.channels[:, 1:2], torch.tensor([[[[0,1,0],[0,0,0],[0,0,0]]]], dtype=torch.float), padding=1)
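As a quick sanity check, here is a standalone sketch (mirroring the tensors from the question) showing that the corrected call runs and keeps the board shape:
import torch
import torch.nn.functional as F

channels = torch.zeros(1, 3, 15, 17)
kernel = torch.tensor([[[[0, 1, 0], [0, 0, 0], [0, 0, 0]]]], dtype=torch.float)
out = F.conv2d(channels[:, 1:2], kernel, padding=1)
print(out.shape)  # torch.Size([1, 1, 15, 17])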
On a different note, I don't think that convolutions are appropriate for this use case, because you're not using a key property of the convolution, which is to capture the surroundings. Those are just too many unnecessary computations to achieve what you want, most of them are multiplications with 0.
For example, a move up is much easier to achieve by removing the first row and adding a new row of zeros at the end, so everything is shifted up (assuming that the first row is the top and the last row is the bottom of the board).
head = self.channels[:, 1:2]
batch_size, channels, height, width = head.size()
# Take everything but the first row of the head
# Add a row of zeros to the end by concatenating them across the height (dimension 2)
new_head = torch.cat([head[:, :, 1:], torch.zeros(batch_size, channels, 1, width)], dim=2)
# Or if you want to wrap it around the board, it's even simpler.
# Move the first row to the end
wrap_around_head = torch.cat([head[:, :, 1:], head[:, :, 0:1]], dim=2)

Add extra dimension to an axis

I have a batch of segmentation masks of shape [5,1,100,100] (batch_size x dims x ht x wd) which I have to display in tensorboardX with an RGB image batch [5,3,100,100]. I want to add two dummy dimensions to the second axis of the segmentation mask to make it [5,3,100,100] so there will not be any dimension mismatch error when I pass it to torch.utils.make_grid. I have tried unsqueeze, expand and view but I am not able to do it. Any suggestions?
You can use expand, repeat, or repeat_interleave:
import torch
x = torch.randn((5, 1, 100, 100))
x1_3channels = x.expand(-1, 3, -1, -1)
x2_3channels = x.repeat(1, 3, 1, 1)
x3_3channels = x.repeat_interleave(3, dim=1)
print(x1_3channels.shape) # torch.Size([5, 3, 100, 100])
print(x2_3channels.shape) # torch.Size([5, 3, 100, 100])
print(x3_3channels.shape) # torch.Size([5, 3, 100, 100])
Note that, as stated in the docs:
torch.expand():
Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory.
torch.repeat():
Unlike expand(), this function copies the tensor’s data.
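A small sketch to make the memory behaviour concrete (the data_ptr() comparison is just one illustrative way to check whether storage is shared):
import torch

x = torch.randn(5, 1, 100, 100)
expanded = x.expand(-1, 3, -1, -1)   # view with stride 0, no copy
repeated = x.repeat(1, 3, 1, 1)      # real copy of the data

print(expanded.data_ptr() == x.data_ptr())  # True: shares x's storage
print(repeated.data_ptr() == x.data_ptr())  # False: new memory

# Because the expanded view shares storage, clone() it (or use repeat)
# before any in-place modification.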

Does tensorflow have shape variables?

Often, I'd like to work with variable-size data, e.g. the number of samples.
To get this data into tensorflow, I use python variables (e.g. "num_samples=2000") to define shapes.
This means I have to re-create a new graph for each number of samples.
Setting validate_shape=False is not an option to me.
Is there a Tensorflow-way of having dimension sizes as variables?
tf.placeholder() allows you to create tensors which will be filled only at runtime, and it allows you to define tensors with variable-size dimensions using None in their shape.
tf.shape() gives you the dynamic shape of a tensor, itself returned as a tensor, which you can use e.g. to dynamically generate other tensors. (The static shape is available as a tf.TensorShape via the tensor's shape attribute; see tf.TensorShape for more detailed explanations.)
An example to hopefully make things clearer:
import tensorflow as tf
import numpy as np

# Creating a placeholder for 3-channel images with undefined batch size, height and width:
images = tf.placeholder(tf.float32, shape=(None, None, None, 3))

# Dynamically obtaining the actual shape of the images:
images_shape = tf.shape(images)

# Demonstrating how this shape can be used to dynamically create other tensors:
ones = tf.ones(images_shape, dtype=images.dtype)
images_plus1 = images + ones

with tf.Session() as sess:
    for i in range(2):
        # Generating a random number of images with a random HxW:
        num_images = np.random.randint(1, 10)
        height, width = np.random.randint(10, 20), np.random.randint(10, 20)
        images_zero = np.zeros((num_images, height, width, 3), dtype=np.float32)

        # Running our TF operation, feeding the placeholder with the actual images:
        res = sess.run(images_plus1, feed_dict={images: images_zero})
        print("Shape: {} ; Pixel Val: {}".format(res.shape, res[0, 0, 0]))
        # > Shape: (6, 14, 13, 3) ; Pixel Val: [1. 1. 1.]
        # > Shape: (8, 11, 15, 3) ; Pixel Val: [1. 1. 1.]
        # ^ As you can see, we run the same graph each time with a different number of
        #   images / different shape

What is padding `valid` in Keras Conv2D and how to disable padding? [duplicate]

What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?
In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool.
According to A guide to convolution arithmetic for deep learning, it says that there will be no padding in pool operator, i.e. just use 'VALID' of tensorflow.
But what is 'SAME' padding of max pool in tensorflow?
If you like ascii art:
"VALID" = without padding:
inputs: 1 2 3 4 5 6 7 8 9 10 11 (12 13)
|________________| dropped
|_________________|
"SAME" = with zero padding:
pad| |pad
inputs: 0 |1 2 3 4 5 6 7 8 9 10 11 12 13|0 0
|________________|
|_________________|
|________________|
In this example:
Input width = 13
Filter width = 6
Stride = 5
Notes:
"VALID" only ever drops the right-most columns (or bottom-most rows).
"SAME" tries to pad evenly left and right, but if the amount of columns to be added is odd, it will add the extra column to the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).
Edit:
About the name:
With "SAME" padding, if you use a stride of 1, the layer's outputs will have the same spatial dimensions as its inputs.
With "VALID" padding, there's no "made-up" padding inputs. The layer only uses valid input data.
When stride is 1 (more typical with convolution than pooling), we can think of the following distinction:
"SAME": output size is the same as input size. This requires the filter window to slip outside input map, hence the need to pad.
"VALID": Filter window stays at valid position inside input map, so output size shrinks by filter_size - 1. No padding occurs.
I'll give an example to make it clearer:
x: input image of shape [2, 3], 1 channel
valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding.
same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (this is the classic way to go)
The output shapes are:
valid_pad: here, no padding so the output shape is [1, 1]
same_pad: here, we pad the image to the shape [2, 4] (with -inf) and then apply max pool, so the output shape is [1, 2]
x = tf.constant([[1., 2., 3.],
[4., 5., 6.]])
x = tf.reshape(x, [1, 2, 3, 1]) # give a shape accepted by tf.nn.max_pool
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
valid_pad.get_shape() == [1, 1, 1, 1] # valid_pad is [5.]
same_pad.get_shape() == [1, 1, 2, 1] # same_pad is [5., 6.]
The TensorFlow Convolution example gives an overview about the difference between SAME and VALID :
For the SAME padding, the output height and width are computed as:
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
And
For the VALID padding, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
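If it helps, here is a small throwaway helper (my own sketch of the two formulas above, not library code) that reproduces the shapes from the max-pool example:
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """TensorFlow's output-size rules for 'SAME' and 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError(padding)

# The 2x3 max-pool example above (2x2 kernel, stride 2):
print(conv_output_size(2, 2, 2, "VALID"), conv_output_size(3, 2, 2, "VALID"))  # 1 1
print(conv_output_size(2, 2, 2, "SAME"), conv_output_size(3, 2, 2, "SAME"))    # 1 2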
Complementing YvesgereY's great answer, I found this visualization extremely helpful:
Padding 'valid' is the first figure. The filter window stays inside the image.
Padding 'same' is the third figure. The output is the same size.
Found it on this article
Visualization credits: vdumoulin@GitHub
Padding is an operation to increase the size of the input data. In case of 1-dimensional data you just append/prepend the array with a constant, in 2-dim you surround matrix with these constants. In n-dim you surround your n-dim hypercube with the constant. In most of the cases this constant is zero and it is called zero-padding.
Here is an example of zero-padding with p=1 applied to 2-d tensor:
You can use arbitrary padding for your kernel, but some padding values are used more frequently than others:
VALID padding. The easiest case: no padding at all. Just leave your data as it is.
SAME padding, sometimes called HALF padding. It is called SAME because for a convolution with stride=1 (or for pooling) it should produce output of the same size as the input. It is called HALF because for a kernel of size k the padding is floor(k / 2) on each side (half of the kernel size, rounded down).
FULL padding is the maximum padding which does not result in a convolution over just padded elements. For a kernel of size k, this padding is equal to k - 1.
To use arbitrary padding in TF, you can use tf.pad()
Quick Explanation
VALID: Don't apply any padding, i.e., assume that all dimensions are valid so that the input image gets fully covered by the filter and stride you specified.
SAME: Apply padding to the input (if needed) so that the input image gets fully covered by the filter and stride you specified. For stride 1, this will ensure that the output image size is the same as the input.
Notes
This applies to conv layers as well as max pool layers in the same way.
The term "valid" is a bit of a misnomer, because things don't become "invalid" if you drop part of the image. Sometimes you might even want that. It should probably have been called NO_PADDING instead.
The term "same" is a misnomer too, because it only makes sense for a stride of 1, when the output dimension is the same as the input dimension. For a stride of 2, the output dimensions will be halved, for example. It should probably have been called AUTO_PADDING instead.
In SAME (i.e. auto-pad mode), Tensorflow will try to spread the padding evenly between left and right.
In VALID (i.e. no padding mode), Tensorflow will drop the right and/or bottom cells if your filter and stride don't fully cover the input image.
I am quoting this answer from the official TensorFlow docs: https://www.tensorflow.org/api_guides/python/nn#Convolution
For the 'SAME' padding, the output height and width are computed as:
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
and the padding on the top and left are computed as:
pad_along_height = max((out_height - 1) * strides[1] + filter_height - in_height, 0)
pad_along_width = max((out_width - 1) * strides[2] + filter_width - in_width, 0)
pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
For the 'VALID' padding, the output height and width are computed as:
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
and the padding values are always zero.
There are three choices of padding: valid (no padding), same (or half), full. You can find explanations (in Theano) here:
http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html
Valid or no padding:
The valid padding involves no zero padding, so it covers only the valid input, not including artificially generated zeros. The length of output is ((the length of input) - (k-1)) for the kernel size k if the stride s=1.
Same or half padding:
The same padding makes the size of the outputs the same as that of the inputs when s=1. If s=1, the number of zeros padded is (k-1).
Full padding:
The full padding means that the kernel runs over the whole input, so at the ends the kernel may meet only one input value and zeros elsewhere. The number of zeros padded is 2(k-1) if s=1. The length of output is ((the length of input) + (k-1)) if s=1.
Therefore, the number of paddings: (valid) <= (same) <= (full)
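A tiny stride-1 sketch (my own, not from the linked tutorial) that mirrors those three output lengths:
def out_len_stride1(n, k, mode):
    """1-D output length for stride 1 with 'valid', 'same', or 'full' padding."""
    return {"valid": n - (k - 1), "same": n, "full": n + (k - 1)}[mode]

for mode in ("valid", "same", "full"):
    print(mode, out_len_stride1(10, 3, mode))  # valid 8, same 10, full 12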
VALID padding: this means zero amount of padding, i.e. no padding at all. Hope there is no confusion.
x = tf.constant([[1., 2., 3.], [4., 5., 6.],[ 7., 8., 9.], [ 7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
print (valid_pad.get_shape()) # output-->(1, 2, 1, 1)
SAME padding: This is kind of tricky to understand in the first place because we have to consider two conditions separately as mentioned in the official docs.
Let's take the input size as i, the output size as o, the padding as p, the stride as s and the kernel size as k (only a single dimension is considered).
Case 01: i mod s = 0 : p = max(k - s, 0)
Case 02: i mod s != 0 : p = max(k - (i mod s), 0)
p is calculated such that it is the minimum total padding needed along that dimension. Since o = ceil(i / s) is known, the value of p can also be found using the formula o = (i - k + p) / s + 1.
Let's work out this example:
x = tf.constant([[1., 2., 3.], [4., 5., 6.],[ 7., 8., 9.], [ 7., 8., 9.]])
x = tf.reshape(x, [1, 4, 3, 1])
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
print (same_pad.get_shape()) # --> output (1, 2, 2, 1)
Here the dimension of x is (3, 4). Then if the horizontal direction (3) is taken: 3 mod 2 = 1, so p = max(2 - 1, 0) = 1 and o = (3 - 2 + 1) / 2 + 1 = 2.
If the vertical direction (4) is taken: 4 mod 2 = 0, so p = max(2 - 2, 0) = 0 and o = (4 - 2 + 0) / 2 + 1 = 2.
Hope this will help to understand how actually SAME padding works in TF.
To sum up, 'valid' padding means no padding. The output size of the convolutional layer shrinks depending on the input size and kernel size.
On the contrary, 'same' padding means using padding. When the stride is set to 1, the output size of the convolutional layer stays the same as the input size, because a certain number of '0' borders are appended around the input data when calculating the convolution.
Hope this intuitive description helps.
Based on the explanation here and following up on Tristan's answer, I usually use these quick functions for sanity checks.
import numpy as np

# a function to help us stay clean
def getPaddings(pad_along_height, pad_along_width):
    # if the height padding is even, split it equally
    if pad_along_height % 2 == 0:
        pad_top = pad_along_height / 2
        pad_bottom = pad_top
    # if odd, the bottom gets the extra pixel
    else:
        pad_top = np.floor(pad_along_height / 2)
        pad_bottom = np.floor(pad_along_height / 2) + 1
    # check if the width padding is odd or even
    if pad_along_width % 2 == 0:
        pad_left = pad_along_width / 2
        pad_right = pad_left
    # if odd, the right side gets the extra pixel
    else:
        pad_left = np.floor(pad_along_width / 2)
        pad_right = np.floor(pad_along_width / 2) + 1
    return pad_top, pad_bottom, pad_left, pad_right

# strides [image index, y, x, depth]
# padding 'SAME' or 'VALID'
# bottom and right sides always get the one additional padded pixel (if padding is odd)
def getOutputDim(inputWidth, inputHeight, filterWidth, filterHeight, strides, padding):
    if padding == 'SAME':
        out_height = np.ceil(float(inputHeight) / float(strides[1]))
        out_width = np.ceil(float(inputWidth) / float(strides[2]))

        pad_along_height = ((out_height - 1) * strides[1] + filterHeight - inputHeight)
        pad_along_width = ((out_width - 1) * strides[2] + filterWidth - inputWidth)

        # now get padding
        pad_top, pad_bottom, pad_left, pad_right = getPaddings(pad_along_height, pad_along_width)

        print('output height', out_height)
        print('output width', out_width)
        print('total pad along height', pad_along_height)
        print('total pad along width', pad_along_width)
        print('pad at top', pad_top)
        print('pad at bottom', pad_bottom)
        print('pad at left', pad_left)
        print('pad at right', pad_right)
    elif padding == 'VALID':
        out_height = np.ceil(float(inputHeight - filterHeight + 1) / float(strides[1]))
        out_width = np.ceil(float(inputWidth - filterWidth + 1) / float(strides[2]))

        print('output height', out_height)
        print('output width', out_width)
        print('no padding')

# use like so
getOutputDim(80, 80, 4, 4, [1, 1, 1, 1], 'SAME')
Padding on/off. Determines the effective size of your input.
VALID: No padding. Convolution etc. ops are only performed at locations that are "valid", i.e. not too close to the borders of your tensor. With a kernel of 3x3 and image of 10x10, you would be performing convolution on the 8x8 area inside the borders.
SAME: Padding is provided. Whenever your operation references a neighborhood (no matter how big), zero values are provided when that neighborhood extends outside the original tensor to allow that operation to work also on border values. With a kernel of 3x3 and image of 10x10, you would be performing convolution on the full 10x10 area.
Here, W and H are the width and height of the input, F is the filter size, P is the padding (i.e., the number of rows or columns to be padded) and S is the stride.
For SAME padding: out_width = ceil(W / S) and out_height = ceil(H / S); P is chosen just large enough for the filter and stride to cover the whole input.
For VALID padding: out_width = ceil((W - F + 1) / S) and out_height = ceil((H - F + 1) / S), with P = 0.
Tensorflow 2.0 Compatible Answer: Detailed Explanations have been provided above, about "Valid" and "Same" Padding.
However, I will specify different Pooling Functions and their respective Commands in Tensorflow 2.x (>= 2.0), for the benefit of the community.
Functions in 1.x:
tf.nn.max_pool
tf.keras.layers.MaxPool2D
Average Pooling => None in tf.nn, tf.keras.layers.AveragePooling2D
Functions in 2.x:
tf.nn.max_pool if used in 2.x and tf.compat.v1.nn.max_pool_v2 or tf.compat.v2.nn.max_pool, if migrated from 1.x to 2.x.
tf.keras.layers.MaxPool2D if used in 2.x and
tf.compat.v1.keras.layers.MaxPool2D or tf.compat.v1.keras.layers.MaxPooling2D or tf.compat.v2.keras.layers.MaxPool2D or tf.compat.v2.keras.layers.MaxPooling2D, if migrated from 1.x to 2.x.
Average Pooling => tf.nn.avg_pool2d or tf.keras.layers.AveragePooling2D if used in TF 2.x and
tf.compat.v1.nn.avg_pool_v2 or tf.compat.v2.nn.avg_pool or tf.compat.v1.keras.layers.AveragePooling2D or tf.compat.v1.keras.layers.AvgPool2D or tf.compat.v2.keras.layers.AveragePooling2D or tf.compat.v2.keras.layers.AvgPool2D , if migrated from 1.x to 2.x.
For more information about Migration from Tensorflow 1.x to 2.x, please refer to this Migration Guide.
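As a small illustration in TF 2.x (a sketch reusing the 4x3 max-pool example from earlier in this thread):
import tensorflow as tf  # TF 2.x

x = tf.random.normal((1, 4, 3, 1))
valid = tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="valid")(x)
same = tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding="same")(x)
print(valid.shape)  # (1, 2, 1, 1)
print(same.shape)   # (1, 2, 2, 1)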
valid padding is no padding.
same padding pads the input in such a way that the output has the same size as the input.
