Understanding the PyTorch implementation of Conv2DTranspose

I am trying to understand an example snippet that makes use of the PyTorch transposed convolution function, with documentation here, where in the docs the author writes:
"The padding argument effectively adds dilation * (kernel_size - 1) -
padding amount of zero padding to both sizes of the input."
Consider the snippet below, where a sample image of shape [1, 1, 4, 4] containing all ones is fed to an nn.ConvTranspose2d layer with stride=2 and padding=1 and a weight tensor of shape (1, 1, 4, 4) whose entries run from 1 to 16 (here dilation=1, so added_padding = 1*(4-1)-1 = 2):
import torch
import torch.nn as nn

sample_im = torch.ones(1, 1, 4, 4).cuda()
sample_deconv = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False).cuda()
sample_deconv.weight = torch.nn.Parameter(
    torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]]).cuda())
Which yields:
>>> sample_deconv(sample_im)
tensor([[[[ 6., 12., 14., 12., 14., 12., 14., 7.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[12., 24., 28., 24., 28., 24., 28., 14.],
[20., 40., 44., 40., 44., 40., 44., 22.],
[10., 20., 22., 20., 22., 20., 22., 11.]]]], device='cuda:0',
grad_fn=<CudnnConvolutionTransposeBackward>)
Now I have seen simple examples of transposed convolution without stride and padding. For instance, if the input is a 2x2 image [[2, 4], [0, 1]], and the convolutional filter with one output channel is [[3, 1], [1, 5]], then the resulting tensor of shape (1, 1, 3, 3) can be seen as the sum of the four colored matrices in the image below:
The problem is I can't seem to find examples that use strides and/or padding in the same visualization. As per my snippet, I am having a very difficult time understanding how the padding is applied to the sample image, or how the stride works to get this output. Any insights appreciated, even just understanding how the 6 in the (0,0) entry or the 12 in the (0,1) entry of the resulting matrix are computed would be very helpful.

The output spatial dimensions of nn.ConvTranspose2d are given by:
out = (x - 1)s - 2p + d(k - 1) + op + 1
where x is the input spatial dimension and out the corresponding output size, s is the stride, d the dilation, p the padding, k the kernel size, and op the output padding.
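As a quick sanity check, the formula above can be written as a small helper. This is only a sketch: the function name conv_transpose_out_size is made up for illustration, and it handles a single spatial dimension at a time.

```python
def conv_transpose_out_size(x, k, s=1, p=0, d=1, op=0):
    """Output size of a transposed convolution along one spatial dimension."""
    return (x - 1) * s - 2 * p + d * (k - 1) + op + 1


# The snippet from the question: 4x4 input, kernel_size=4, stride=2, padding=1.
question_size = conv_transpose_out_size(4, k=4, s=2, p=1)
print(question_size)  # 8

# The 2x2 input with a 2x2 kernel, for a few stride/padding combinations.
small_sizes = {(s, p): conv_transpose_out_size(2, k=2, s=s, p=p)
               for s, p in [(1, 0), (1, 1), (2, 0), (2, 1)]}
print(small_sizes)  # {(1, 0): 3, (1, 1): 1, (2, 0): 4, (2, 1): 2}
```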
If we keep the operands from the question (the 2x2 input [[2, 4], [0, 1]] and the 2x2 kernel [[3, 1], [1, 5]]):
For each value of the input, we compute a buffer (of the corresponding color) by calculating the product with each element of the kernel.
Here are the visualizations for (s=1, p=0), (s=1, p=1), (s=2, p=0), and (s=2, p=1):
s=1, p=0: output is 3x3
For the blue buffer, we have (1) 2*k_top-left = 2*3 = 6; (2) 2*k_top-right = 2*1 = 2; (3) 2*k_bottom-left = 2*1 = 2; (4) 2*k_bottom-right = 2*5 = 10.
s=1, p=1: output is 1x1
s=2, p=0: output is 4x4
s=2, p=1: output is 2x2
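The buffer picture can also be reproduced numerically. The sketch below is a deliberately naive pure-Python version, not how PyTorch implements it: each input value scales the whole kernel, the scaled kernel is added into the output at an offset of stride times the input position, and padding then crops that many rows and columns from every border. The function name conv_transpose2d_naive is hypothetical, and it assumes square inputs and kernels, dilation 1 and no output padding.

```python
def conv_transpose2d_naive(inp, kernel, stride=1, padding=0):
    """Transposed convolution of a square input with a square kernel."""
    n, k = len(inp), len(kernel)
    full = (n - 1) * stride + k          # output size before cropping
    out = [[0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            # Add the kernel scaled by inp[i][j] at offset (i*stride, j*stride).
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += inp[i][j] * kernel[a][b]
    end = full - padding                 # padding crops every border
    return [row[padding:end] for row in out[padding:end]]


inp = [[2, 4], [0, 1]]
kernel = [[3, 1], [1, 5]]
print(conv_transpose2d_naive(inp, kernel, 1, 0))  # [[6, 14, 4], [2, 17, 21], [0, 1, 5]]
print(conv_transpose2d_naive(inp, kernel, 1, 1))  # [[17]]
print(conv_transpose2d_naive(inp, kernel, 2, 0))  # [[6, 2, 12, 4], [2, 10, 4, 20], [0, 0, 3, 1], [0, 0, 1, 5]]
print(conv_transpose2d_naive(inp, kernel, 2, 1))  # [[10, 4], [0, 3]]

# The example from the question: all-ones 4x4 input, kernel 1..16, s=2, p=1.
ones = [[1] * 4 for _ in range(4)]
k16 = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
big = conv_transpose2d_naive(ones, k16, 2, 1)
print(big[0][:2])  # [6, 12]
```

In the last call, the 6 at (0, 0) comes from a single kernel entry (row 1, column 1 of the kernel, which is 6), and the 12 at (0, 1) is the sum of two kernel entries, 7 + 5, contributed by two neighbouring input pixels.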

I believe what makes things confusing is that the docs are not very careful about what they mean by "input" or "output", and that the terms "stride" and "padding" are overloaded.
I found it easier to understand transposed convolution in PyTorch by asking myself: What arguments would I give to a normal, forward convolution layer such that it would give the tensor at hand, that I'm feeding into a transposed conv layer?
For instance, "stride" should be understood as the "stride" in a forward conv, i.e. the moving step of the sliding kernel.
In a transposed conv, "stride" actually means something different: stride-1 is the number of the interleaving empty slots in between the input units into the transposed conv layer. That's because it is the greater-than-1 "strides" in a forward conv that create such holes. See image below for an illustration:
The illustration also shows that the kernel moving step in a transposed conv layer is always 1, regardless of the value of "stride". I found it very important to keep this in mind.
Similarly for the padding argument. It should be understood as the 0-padding applied to the forward conv. Because of this padding, we get some extra units in the output from the forward conv. So, if we then feed this output into a transposed conv, in order to get back to the original, non-padded length, those extra units should be removed, thus the -2p term in the equation.
See image below for an illustration.
In summary, these arguments are designed so that normal conv and transposed conv are "inverse" operations of each other, in the sense of tensor shape transformations. (But I do believe that the doc should be improved.)
With this principle in mind, one can also work out the dilation and output_padding arguments relatively easily. I've written a blog on this, in case anyone is interested.
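The stride and padding interpretation above can be checked against PyTorch directly. The following is a minimal sketch for the single-channel example from the question, run on the CPU and assuming dilation=1 and no output padding: insert stride-1 zeros between the input units, zero-pad by kernel_size-1-padding on every side, then run an ordinary stride-1 convolution with the spatially flipped kernel. The variable names are chosen here for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
w = torch.arange(1., 17.).reshape(1, 1, 4, 4)   # kernel entries 1..16
stride, padding, k = 2, 1, 4

# Reference result from the built-in transposed convolution.
ref = F.conv_transpose2d(x, w, stride=stride, padding=padding)

# Step 1: insert (stride - 1) zeros between neighbouring input units.
n = x.shape[-1]
size = (n - 1) * stride + 1
up = torch.zeros(1, 1, size, size)
up[..., ::stride, ::stride] = x

# Step 2: zero-pad by (k - 1 - padding) on every side.
pad = k - 1 - padding
up = F.pad(up, (pad, pad, pad, pad))

# Step 3: ordinary convolution with step 1 and the flipped kernel.
# transpose(0, 1) swaps the in/out channel axes (a no-op for 1x1 channels).
manual = F.conv2d(up, w.flip(-1, -2).transpose(0, 1))

print(torch.allclose(ref, manual))  # True
```

With these numbers the padded tensor is 11x11, and the 4x4 kernel sliding with step 1 gives the 8x8 output from the question.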

Related

Need help upscaling multi-dimensional Pytorch tensors

I have several PyTorch tensors ranging from 1-dimensional (e.g. torch.Size([128])) to 4-dimensional (e.g. torch.Size([256, 128, 3, 3])). Each tensor represents a weight in a neural network.
For each of these tensors I need to upscale 1 or 2 dimensions, for example
torch.Size([128]) to torch.Size([256]),
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]),
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]).
I've looked at torch.nn.Upsample or nn.functional.interpolate and similar functions but I can't find a good way to do this comprehensively for each of my problems other than hardcoding it.
In the case of the simple 1D example I'm looking for a scaled version of my original tensor, something like this:
t = torch.arange(0, 9, dtype=torch.float32)
# = tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])
t_up = upsample(t, factor=2)  # upsample is the hypothetical function I am looking for
# = tensor([0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6., 6.5, 7., 7.5, 8.])
Any help would be appreciated.

Your pattern is very irregular as:
torch.Size([128]) to torch.Size([256]) - 1D and interpolate everything
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]) - 4D and upscale first two dimensions
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]) - 4D and upscale only second dimension without the first
There is no clear way around "hard coding" in this case and "clever" approaches would probably only raise eyebrows when someone is going over your code.
Your 1D example uses linear mode with align_corners=False, not sure about 4D examples, but those would require bilinear mode at least.
size for torch.nn.functional.interpolate flattens 1 dimensions for some reason, hence only scale_factor is an option.
Some of the data has to be reshaped for interpolate
All in all, hardcoding and some comments are the best option in this case as there is no clear way to group different ways of expanding tensors you are given (and trying to be smart in this case is probably a dead end).
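To give an idea of what the hard-coded version might look like, here is a sketch for the three shapes from the question using torch.nn.functional.interpolate with scale_factor. The reshapes and permutes are choices made for this illustration (interpolate only scales the trailing "spatial" dimensions, so the dimensions to be upscaled are moved there first), and the exact interpolated values depend on mode and align_corners.

```python
import torch
import torch.nn.functional as F

# Case 1: [128] -> [256]. interpolate expects (batch, channels, length).
w1 = torch.rand(128)
w1_up = F.interpolate(w1.view(1, 1, -1), scale_factor=2,
                      mode="linear", align_corners=False).view(-1)

# Case 2: [256, 128, 3, 3] -> [512, 256, 3, 3].
# Move the first two dimensions to the end so they are treated as height/width.
w2 = torch.rand(256, 128, 3, 3)
w2_up = F.interpolate(w2.permute(2, 3, 0, 1), scale_factor=2,
                      mode="bilinear", align_corners=False).permute(2, 3, 0, 1)

# Case 3: [3, 256, 1, 1] -> [3, 512, 1, 1]. Only the second dimension grows,
# so treat the first dimension as the batch and the second as the length.
w3 = torch.rand(3, 256, 1, 1)
w3_up = F.interpolate(w3.view(3, 1, 256), scale_factor=2,
                      mode="linear", align_corners=False).view(3, 512, 1, 1)

print(w1_up.shape, w2_up.shape, w3_up.shape)
```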

How to use "LeakyRelu" and Parametric Leaky Relu "PReLU" in Keras Tuner

I am using Keras Tuner with RandomSearch() to tune the hyperparameters of my regression model. While I can tune over "relu" and "selu", I am unable to do the same for Leaky ReLU. I understand that the "relu" and "selu" strings work because string aliases are available for them, whereas no string alias is available for Leaky ReLU. I tried passing a callable Leaky ReLU object (see my example below) but it doesn't seem to work. Can you please advise me how to do that? I have the same issue with Parametric Leaky ReLU (PReLU).
Thank you in advance!
def build_model(hp):
    model = Sequential()
    model.add(
        Dense(
            units = 18,
            kernel_initializer = 'normal',
            activation = 'relu',
            input_shape = (18, )
        )
    )
    for i in range(hp.Int( name = "num_layers", min_value = 1, max_value = 5)):
        model.add(
            Dense(
                units = hp.Int(
                    name = "units_" + str(i),
                    min_value = 18,
                    max_value = 180,
                    step = 18),
                kernel_initializer = 'normal',
                activation = hp.Choice(
                    name = 'dense_activation',
                    values=['relu', 'selu', LeakyReLU(alpha=0.01) ],
                    default='relu'
                )
            )
        )
    model.add( Dense( units = 1 ) )
    model.compile(
        optimizer = tf.keras.optimizers.Adam(
            hp.Choice(
                name = "learning_rate", values = [1e-2, 1e-3, 1e-4]
            )
        ),
        loss = 'mse'
    )
    return model

As a work-around, you can add another activation function in the tf.keras.activations.* module by modifying the source file ( which you'll see is activations.py )
Here's the code for tf.keras.activations.relu which you'll see in activations.py,
@keras_export('keras.activations.relu')
@dispatch.add_dispatch_support
def relu(x, alpha=0., max_value=None, threshold=0):
    """Applies the rectified linear unit activation function.

    With default values, this returns the standard ReLU activation:
    `max(x, 0)`, the element-wise maximum of 0 and the input tensor.
    Modifying default parameters allows you to use non-zero thresholds,
    change the max value of the activation,
    and to use a non-zero multiple of the input for values below the threshold.

    For example:
    >>> foo = tf.constant([-10, -5, 0.0, 5, 10], dtype = tf.float32)
    >>> tf.keras.activations.relu(foo).numpy()
    array([ 0., 0., 0., 5., 10.], dtype=float32)
    >>> tf.keras.activations.relu(foo, alpha=0.5).numpy()
    array([-5. , -2.5, 0. , 5. , 10. ], dtype=float32)
    >>> tf.keras.activations.relu(foo, max_value=5).numpy()
    array([0., 0., 0., 5., 5.], dtype=float32)
    >>> tf.keras.activations.relu(foo, threshold=5).numpy()
    array([-0., -0., 0., 0., 10.], dtype=float32)

    Arguments:
        x: Input `tensor` or `variable`.
        alpha: A `float` that governs the slope for values lower than the
            threshold.
        max_value: A `float` that sets the saturation threshold (the largest value
            the function will return).
        threshold: A `float` giving the threshold value of the activation function
            below which values will be damped or set to zero.

    Returns:
        A `Tensor` representing the input tensor,
        transformed by the relu activation function.
        Tensor will be of the same shape and dtype of input `x`.
    """
    return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
Copy this code and paste it just below. Change @keras_export('keras.activations.relu') to @keras_export('keras.activations.leaky_relu'), rename the function to leaky_relu so that it does not overwrite relu, and change the default value of alpha to 0.2, like,
@keras_export('keras.activations.leaky_relu')
@dispatch.add_dispatch_support
def leaky_relu(x, alpha=0.2, max_value=None, threshold=0):
    """Applies the rectified linear unit activation function.

    With default values, this returns the standard ReLU activation:
    `max(x, 0)`, the element-wise maximum of 0 and the input tensor.
    Modifying default parameters allows you to use non-zero thresholds,
    change the max value of the activation,
    and to use a non-zero multiple of the input for values below the threshold.

    For example:
    >>> foo = tf.constant([-10, -5, 0.0, 5, 10], dtype = tf.float32)
    >>> tf.keras.activations.relu(foo).numpy()
    array([ 0., 0., 0., 5., 10.], dtype=float32)
    >>> tf.keras.activations.relu(foo, alpha=0.5).numpy()
    array([-5. , -2.5, 0. , 5. , 10. ], dtype=float32)
    >>> tf.keras.activations.relu(foo, max_value=5).numpy()
    array([0., 0., 0., 5., 5.], dtype=float32)
    >>> tf.keras.activations.relu(foo, threshold=5).numpy()
    array([-0., -0., 0., 0., 10.], dtype=float32)

    Arguments:
        x: Input `tensor` or `variable`.
        alpha: A `float` that governs the slope for values lower than the
            threshold.
        max_value: A `float` that sets the saturation threshold (the largest value
            the function will return).
        threshold: A `float` giving the threshold value of the activation function
            below which values will be damped or set to zero.

    Returns:
        A `Tensor` representing the input tensor,
        transformed by the relu activation function.
        Tensor will be of the same shape and dtype of input `x`.
    """
    return K.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)
You can then use the string alias 'leaky_relu' (that is, keras.activations.leaky_relu).

# Custom activation function
from keras.layers import Activation, LeakyReLU
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects

## Add leaky-relu so we can use it as a string
get_custom_objects().update({'leaky-relu': Activation(LeakyReLU(alpha=0.2))})

## Main activation functions available to use
activation_functions = ['sigmoid', 'relu', 'elu', 'leaky-relu', 'selu', 'gelu', 'swish']
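Another option, which avoids both editing the Keras source and registering custom objects, is to let the tuner choose among plain strings and translate the chosen string into a layer yourself. The sketch below is one possible way to do that, not something taken from the answers above: the helper name add_dense_block and the strings "leaky_relu" and "prelu" are made up for illustration, and the layers are created with their default slopes.

```python
from tensorflow.keras import layers, models


def add_dense_block(model, units, activation_name):
    """Append a Dense layer followed by the activation selected by name."""
    model.add(layers.Dense(units))
    if activation_name == "leaky_relu":
        model.add(layers.LeakyReLU())   # default negative slope
    elif activation_name == "prelu":
        model.add(layers.PReLU())       # slope is a learned parameter
    else:
        model.add(layers.Activation(activation_name))


# Inside build_model(hp) this would be used roughly as:
#   name = hp.Choice("dense_activation", ["relu", "selu", "leaky_relu", "prelu"])
#   add_dense_block(model, units, name)
model = models.Sequential()
add_dense_block(model, 18, "leaky_relu")
add_dense_block(model, 18, "relu")
print([type(layer).__name__ for layer in model.layers])
```

Because hp.Choice only ever sees strings here, the search space stays serialisable, and the Leaky ReLU or PReLU layer is added as a separate layer after a Dense layer that has no activation of its own.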

PyTorch: Differentiable operations to go from coordinate tensor to grid tensor

I have a tensor that looks like
coords = torch.Tensor([[0, 0, 1, 2],
[0, 2, 2, 2]])
The first row is the x-coordinates of objects on a grid and the second row is the corresponding y-coordinates.
I need a differentiable way (i.e. gradients can flow) to go from this tensor to the corresponding "grid" tensor, where a 1 represents the presence of an object in that location (row index, column index) and 0 represents no object:
grid = torch.Tensor([[1, 0, 1],
[0, 0, 1],
[0, 0, 1]])
In general, coords can be large (the grid size is 300x300). If coords was a sparse tensor I could simply call to_dense on it, but for various reasons specific to my application I cannot store coords as sparse. Additionally, I cannot create a new sparse tensor from coords and call to_dense on it because creating a new tensor is not differentiable.
Any help is appreciated!

I'm not sure what you mean by 'differentiable', but here's a simple way to do it using advanced indexing:
grid = torch.zeros(3, 3)
coords = coords.long()
grid[coords[0], coords[1]] = 1
tensor([[1., 0., 1.],
[0., 0., 1.],
[0., 0., 1.]])
I don't think PyTorch has detailed documentation about this, but NumPy does here (it is probably very similar for PyTorch).
This is also possible:
coords = coords.long()
grid[coords[0], coords[1]] = torch.Tensor([1, 2, 3, 4])
tensor([[1., 0., 2.],
[0., 0., 3.],
[0., 0., 4.]])
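On the question of gradients: with indexing like this, gradients can flow to the values written into the grid, but not to the integer coordinates themselves, since integer indices have no gradient. The sketch below illustrates the first part using the out-of-place Tensor.index_put; the names values and grid are chosen for this example, and the 3x3 size is taken from the question.

```python
import torch

coords = torch.tensor([[0, 0, 1, 2],
                       [0, 2, 2, 2]])
values = torch.tensor([1., 2., 3., 4.], requires_grad=True)

# Out-of-place scatter of the values into an empty grid.
grid = torch.zeros(3, 3).index_put((coords[0], coords[1]), values)
print(grid)

# Each value lands in exactly one cell, so d(sum of grid)/d(values) is all ones.
grid.sum().backward()
print(values.grad)  # tensor([1., 1., 1., 1.])
```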

Say
coords = [[0, 0, 1, 2],
[0, 2, 2, 2]]
Then:
torch.stack([torch.stack(x) for x in coords])

PyTorch: select values from the last tensor dimension with indices from another tensor with a smaller dimension

I have a tensor a with three dimensions. The first dimension corresponds to minibatch size, the second to the sequence length, and the third to the feature dimension. E.g.,
>>> a = torch.arange(1, 13, dtype=torch.float).view(2,2,3) # Consider the values of a to be random
>>> a
tensor([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[ 7., 8., 9.],
[10., 11., 12.]]])
I have a second, two-dimensional tensor b. Its first dimension corresponds to the minibatch size and its second dimension to the sequence length. It contains values in the range of the indices of the third dimension of a. Since the third dimension has size 3, b can contain the values 0, 1 or 2. E.g.,
>>> b = torch.LongTensor([[0, 2],[1,0]])
>>> b
tensor([[0, 2],
[1, 0]])
I want to obtain a tensor c that has the shape of b and contains all the values of a that are referenced by b.
In the scenario above I would like to have:
c = torch.empty(2,2)
c[0,0] = a[0, 0, b[0,0]]
c[1,0] = a[1, 0, b[1,0]]
c[0,1] = a[0, 1, b[0,1]]
c[1,1] = a[1, 1, b[1,1]]
>>> c
tensor([[ 1., 6.],
[ 8., 10.]])
How can I create the tensor c fast? Further, I also want c to be differentiable (to be able to use .backward()). I am not too familiar with PyTorch, so I am not sure if a differentiable version of this exists.
As an alternative, instead of c having the same shape as b I could also use a c with the same shape of a, having only zeros, but at the places referenced by b ones. Then I could multiply a and c to obtain a differentiable tensor.
Like follows:
c = torch.zeros(2,2,3, dtype=torch.float)
c[0,0,b[0,0]] = 1
c[1,0,b[1,0]] = 1
c[0,1,b[0,1]] = 1
c[1,1,b[1,1]] = 1
>>> a*c
tensor([[[ 1., 0., 0.],
[ 0., 0., 6.]],
[[ 0., 8., 0.],
[10., 0., 0.]]])

Let's declare the necessary variables first (notice requires_grad in a's initialization; we will use it to ensure differentiability):
a = torch.arange(1,13,dtype=torch.float32,requires_grad=True).reshape(2,2,3)
b = torch.LongTensor([[0, 2],[1,0]])
Let's reshape a and squash the minibatch and sequence dimensions:
temp = a.reshape(-1,3)
so temp now looks like:
tensor([[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.],
[10., 11., 12.]], grad_fn=<AsStridedBackward>)
Notice now each value of b can be used in each row of temp to get desired output. Now we do:
c = temp[range(len(temp )),b.view(-1)].view(b.size())
Notice how we index temp, range(len(temp )) to select each row and 1D b i.e b.view(-1) to get corresponding columns. Lastly .view(b.size()) brings this array to the same size as b.
If we print c now:
tensor([[ 1., 6.],
[ 8., 10.]], grad_fn=<ViewBackward>)
The presence of grad_fn=... shows that c requires gradient, i.e. it is differentiable.
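The same selection can also be written with torch.gather, which avoids the reshape. This is an alternative sketch rather than the method used above: b gets a trailing dimension of size 1 so that it has as many dimensions as a, gather picks one entry along the last dimension, and the extra dimension is squeezed away again.

```python
import torch

a = torch.arange(1., 13.).reshape(2, 2, 3).requires_grad_()
b = torch.tensor([[0, 2], [1, 0]])

# Pick a[i, j, b[i, j]] for every (i, j).
c = a.gather(2, b.unsqueeze(-1)).squeeze(-1)
print(c)

# Gradients flow back only to the selected entries of a.
c.sum().backward()
print(a.grad)
```

After the backward call, a.grad is the one-hot mask described at the end of the question: ones at the positions referenced by b and zeros elsewhere.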

PyTorch: new_ones vs ones

In PyTorch, what is the difference between new_ones() and ones()? For example,
x2.new_ones(3,2, dtype=torch.double)
vs
torch.ones(3,2, dtype=torch.double)

For the sake of this answer, I am assuming that your x2 is a previously defined torch.Tensor. If we then head over to the PyTorch documentation, we can read the following on new_ones():
Returns a Tensor of size size filled with 1. By default, the
returned Tensor has the same torch.dtype and torch.device as this
tensor.
Whereas ones()
Returns a tensor filled with the scalar value 1, with the shape
defined by the variable argument sizes.
So, essentially, new_ones allows you to quickly create a new torch.Tensor on the same device and data type as a previously existing tensor (with ones), whereas ones() serves the purpose of creating a torch.Tensor from scratch (filled with ones).

new_ones()
# defining the tensor along with device to run on. (Assuming CUDA hardware is available)
x = torch.rand(5, 3, device="cuda")
new_ones() works with an existing tensor: y inherits the datatype of x and is created on the same device as x.
y = x.new_ones(2, 2)
print(y)
Output:
tensor([[1., 1.],
[1., 1.]], device='cuda:0')
ones()
# defining tensor. By default it will run on CPU.
x = torch.ones(5, 3)
print(x)
Output:
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
ones() creates a tensor of the given size filled with 1 (as shown in the example) and does not depend on any existing tensor, whereas new_ones() is called on an existing tensor and creates a tensor of the given size that inherits properties such as datatype and device from it.
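The dtype part of this difference can be seen without a GPU. A minimal CPU-only sketch, with variable names chosen for illustration and assuming the default dtype has not been changed from float32:

```python
import torch

x = torch.zeros(5, 3, dtype=torch.float64)

y = x.new_ones(2, 2)    # inherits dtype and device from x
z = torch.ones(2, 2)    # uses the global default dtype and device

print(y.dtype, y.device)  # torch.float64 cpu
print(z.dtype, z.device)  # torch.float32 cpu
```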
