I have several PyTorch tensors ranging from 1-dimensional (e.g. torch.Size([128])) to 4-dimensional (e.g. torch.Size([256, 128, 3, 3])). Each tensor represents a weight in a neural network.
For each of these tensors I need to upscale one or two dimensions, for example:
torch.Size([128]) to torch.Size([256]),
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]),
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]).
I've looked at torch.nn.Upsample, nn.functional.interpolate and similar functions, but I can't find a good way to handle all of these cases comprehensively other than hardcoding each one.
In the case of the simple 1D example I'm looking for a scaled version of my original tensor, something like this:
t = torch.arange(0, 9, dtype=torch.float32)
# = tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])
t_up = upsample(t, factor=2)  # pseudocode for what I'm after
# = tensor([0., 0.5, 1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5., 5.5, 6., 6.5, 7., 7.5, 8.])
Any help would be appreciated.
Your pattern is quite irregular:
torch.Size([128]) to torch.Size([256]) - 1D, interpolate everything
torch.Size([256, 128, 3, 3]) to torch.Size([512, 256, 3, 3]) - 4D, upscale the first two dimensions
torch.Size([3, 256, 1, 1]) to torch.Size([3, 512, 1, 1]) - 4D, upscale only the second dimension and leave the first alone
There is no clear way around "hard coding" in this case and "clever" approaches would probably only raise eyebrows when someone is going over your code.
Your 1D example is linear interpolation, though to get exactly the output you show (endpoints kept, evenly spaced values) you need align_corners=True and an explicit size of 2*n - 1 rather than scale_factor=2; I'm not sure about the 4D examples, but those would require at least bilinear mode.
torch.nn.functional.interpolate only resizes the spatial dimensions after the leading batch and channel dimensions, so some of the data has to be reshaped before it can be passed to interpolate.
All in all, hardcoding with some comments is the best option in this case, as there is no clear way to group the different ways of expanding the tensors you are given (and trying to be clever here is probably a dead end).
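For completeness, a minimal sketch of the 1D case (this assumes the goal is exactly the 17-element output from the question, hence size=2*n - 1 and align_corners=True; interpolate needs a (batch, channel, length) view, so the tensor is reshaped around the call):
import torch
import torch.nn.functional as F

t = torch.arange(0, 9, dtype=torch.float32)
# interpolate works on (batch, channel, length) for 1D data, so add two leading dims
t_up = F.interpolate(t.view(1, 1, -1), size=2 * t.numel() - 1,
                     mode="linear", align_corners=True).view(-1)
# tensor([0.0000, 0.5000, 1.0000, ..., 7.5000, 8.0000])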
I'm working on a classification problem. The number of classes is 5. I have a ground truth vector of shape (3) instead of (1). The values in this target vector are the possible classes, and the predicted vector has shape (1x5), which holds the softmax scores for all the classes.
For example:
predicted_vector = tensor([0.0669, 0.1336, 0.3400, 0.3392, 0.1203])
ground_truth = tensor([3,2,5])
For the above illustration, a typical argmax operation would declare class 3 the predicted class (0.34), but I want the model to be rewarded even if the argmax class is any of 3, 2, or 5.
Which loss function is recommended for such a use case?
As jodag pointed out in the comments you can try to treat it as a multi-label classification problem.
So [[0, 1, 2], [0, 2, 4], [3, 3, 3]] will be transformed into:
tensor([[1., 1., 1., 0., 0.],
        [1., 0., 1., 0., 1.],
        [0., 0., 0., 1., 0.]])
Here is an example of how this can be implemented:
import torch
from torch.nn import BCELoss

# Toy predictions in [0, 1]; BCELoss expects probabilities, not raw logits
predicted_vector = torch.rand((3, 5))
# Each sample lists its acceptable classes
ground_truth = torch.LongTensor([[0, 1, 2], [0, 2, 4], [3, 3, 3]])

# Multi-hot encode the targets: put a 1 at every acceptable class index
labels_onehot = torch.zeros_like(predicted_vector)
labels_onehot.scatter_(1, ground_truth, 1)

loss_fn = BCELoss()
loss = loss_fn(predicted_vector, labels_onehot)
You can also assign different weights to different labels.
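A minimal sketch of how that could look, reusing predicted_vector and labels_onehot from above (the weight values are made up for illustration; BCELoss's weight rescales the element-wise loss, so the per-class weights are expanded to the prediction shape):
class_weights = torch.tensor([1.0, 1.0, 2.0, 0.5, 1.0])   # hypothetical per-class weights
weighted_loss_fn = BCELoss(weight=class_weights.expand_as(predicted_vector))
weighted_loss = weighted_loss_fn(predicted_vector, labels_onehot)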
For this problem, a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay so the model isn't penalised that heavily.
This is a typical single-label, multi-class problem, but with probabilistic ("soft") labels, and CrossEntropyLoss should be used (do not apply softmax() yourself, since CrossEntropyLoss expects raw logits).
In this example, the (soft) target might be a probability of 0.7 for class 3, a probability of 0.2 for class 2, and a probability of 0.1 for class 5 (and zero for everything else).
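A minimal sketch of what this could look like, assuming PyTorch >= 1.10 (where CrossEntropyLoss accepts class probabilities as targets); the 0.7/0.2/0.1 split and the 1-indexed mapping of classes 3, 2, 5 to columns 2, 1, 4 are just illustrative assumptions:
import torch
from torch import nn

logits = torch.randn(1, 5)   # raw, unnormalised scores; do not apply softmax yourself
# soft target: 0.7 on class 3, 0.2 on class 2, 0.1 on class 5 (columns 2, 1, 4)
soft_target = torch.tensor([[0.0, 0.2, 0.7, 0.0, 0.1]])

loss_fn = nn.CrossEntropyLoss()   # accepts probability targets since PyTorch 1.10
loss = loss_fn(logits, soft_target)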
I am trying to understand an example snippet that makes use of the PyTorch transposed convolution function, with documentation here, where in the docs the author writes:
"The padding argument effectively adds dilation * (kernel_size - 1) -
padding amount of zero padding to both sizes of the input."
Consider the snippet below, where a sample image of shape [1, 1, 4, 4] containing all ones is input to a ConvTranspose2d operation with arguments stride=2 and padding=1, and a weight matrix of shape (1, 1, 4, 4) whose entries range from 1 to 16 (in this case dilation=1, so the added padding is 1*(4-1)-1 = 2):
sample_im = torch.ones(1, 1, 4, 4).cuda()
sample_deconv = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False).cuda()  # in_channels=1, out_channels=1, kernel_size=4, stride=2, padding=1
sample_deconv.weight = torch.nn.Parameter(
    torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]]).cuda())
Which yields:
>>> sample_deconv(sample_im)
tensor([[[[ 6., 12., 14., 12., 14., 12., 14.,  7.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [12., 24., 28., 24., 28., 24., 28., 14.],
          [20., 40., 44., 40., 44., 40., 44., 22.],
          [10., 20., 22., 20., 22., 20., 22., 11.]]]], device='cuda:0',
       grad_fn=<CudnnConvolutionTransposeBackward>)
Now I have seen simple examples of transposed convolution without stride and padding. For instance, if the input is a 2x2 image [[2, 4], [0, 1]], and the convolutional filter with one output channel is [[3, 1], [1, 5]], then the resulting tensor of shape (1, 1, 3, 3) can be seen as the sum of the four colored matrices in the image below:
The problem is I can't seem to find examples that use strides and/or padding in the same visualization. As per my snippet, I am having a very difficult time understanding how the padding is applied to the sample image, or how the stride works to get this output. Any insights appreciated, even just understanding how the 6 in the (0,0) entry or the 12 in the (0,1) entry of the resulting matrix are computed would be very helpful.
The output spatial dimensions of nn.ConvTranspose2d are given by:
out = (x - 1)*s - 2*p + d*(k - 1) + op + 1
where x is the input spatial dimension and out the corresponding output size, s is the stride, d the dilation, p the padding, k the kernel size, and op the output padding.
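For the snippet in the question, x=4, s=2, p=1, d=1, k=4, op=0, so out = (4 - 1)*2 - 2*1 + 1*(4 - 1) + 0 + 1 = 8, which matches the 8x8 output shown above.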
If we keep the same operands as in your 2x2 example (input [[2, 4], [0, 1]] and kernel [[3, 1], [1, 5]]), then for each value of the input we compute a buffer (of the corresponding color) by multiplying that value with each element of the kernel.
Here are the visualizations for s=1, p=0, s=1, p=1, s=2, p=0, and s=2, p=1:
s=1, p=0: output is 3x3
For the blue buffer, we have (1) 2*k_top-left = 2*3 = 6; (2) 2*k_top-right = 2*1 = 2; (3) 2*k_bottom-left = 2*1 = 2; (4) 2*k_bottom-right = 2*5 = 10.
s=1, p=1: output is 1x1
s=2, p=0: output is 4x4
s=2, p=1: output is 2x2
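As a sanity check on the size formula (not part of the original visualizations), here is a small sketch that runs the 2x2 example through F.conv_transpose2d for each of those stride/padding combinations:
import torch
import torch.nn.functional as F

x = torch.tensor([[[[2., 4.],
                    [0., 1.]]]])   # the 2x2 input from the example, as (N, C, H, W)
w = torch.tensor([[[[3., 1.],
                    [1., 5.]]]])   # the 2x2 kernel from the example

for s, p in [(1, 0), (1, 1), (2, 0), (2, 1)]:
    out = F.conv_transpose2d(x, w, stride=s, padding=p)
    print(f"s={s}, p={p} -> {tuple(out.shape[-2:])}")
# s=1, p=0 -> (3, 3); s=1, p=1 -> (1, 1); s=2, p=0 -> (4, 4); s=2, p=1 -> (2, 2)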
I believe what makes things confusing is that the docs are not very careful about what they mean by "input" or "output", and that the terms "stride" and "padding" are overloaded.
I found it easier to understand transposed convolution in PyTorch by asking myself: What arguments would I give to a normal, forward convolution layer such that it would give the tensor at hand, that I'm feeding into a transposed conv layer?
For instance, "stride" should be understood as the "stride" in a forward conv, i.e. the moving step of the sliding kernel.
In a transposed conv, "stride" actually means something different: stride - 1 is the number of interleaved empty slots inserted between the input units fed into the transposed conv layer. That's because it is the greater-than-1 strides in a forward conv that create such holes. See the image below for an illustration:
The illustration also shows that the kernel moving step in a transposed conv layer is always 1, regardless of the value of "stride". I found it very important to keep this in mind.
Similarly for the padding argument: it should be understood as the zero-padding applied in the forward conv. Because of that padding, we get some extra units in the forward conv's output. So, if we then feed this output into a transposed conv, in order to get back to the original, non-padded length, those extra units should be removed, hence the -2p term in the equation.
See image below for an illustration.
In summary, these are designed as such that normal conv and transposed conv are "inverse" operations to each other, in the sense of tensor shape transformations. (But I do believe that the doc should be improved.)
With this principle in mind, one can also work out the dilation and output_padding arguments relatively easily. I've written a blog on this, in case anyone is interested.
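A small sketch of that shape-inversion principle (the sizes and hyperparameters here are arbitrary assumptions; with output_padding=1 compensating for the even input size, the transposed conv maps the forward conv's output shape back to the input shape, though not the values):
import torch
from torch import nn

x = torch.randn(1, 1, 8, 8)
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)

y = conv(x)     # torch.Size([1, 1, 4, 4])
z = tconv(y)    # torch.Size([1, 1, 8, 8]) -- same shape as x, but not the same values
print(y.shape, z.shape)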
Problem
I have a list of indices and a list of values like so:
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3])
I want to define a (3x3 for the example) matrix which contains the values v at the indices i (1 at position (2,2), 2 at position (2, 0) and 3 at position (1,2)):
tensor([[0, 0, 0],
        [0, 0, 3],
        [2, 0, 1]])
What I have tried
I can do it using a trick, with torch.sparse and .to_dense() but I feel that it's not the "pytorchic" way to do that nor the most efficient:
f = torch.sparse.FloatTensor(i, v.float(), torch.Size([3, 3]))  # values cast to float for the FloatTensor constructor
print(f.to_dense())
Any idea for a better solution?
Ideally I would appreciate a solution at least as fast as the one provided above.
Of course this was just an example; no particular structure in the tensors i and v is assumed (nor any particular number of dimensions).
There is an alternative, as below:
import torch
i = torch.tensor([[2, 2, 1], [2, 0, 2]])
v = torch.tensor([1, 2, 3], dtype=torch.float) # enforcing same data-type
target = torch.zeros([3,3], dtype=torch.float) # enforcing same data-type
# index_put_ takes a tuple of index tensors (one per dimension) and the values to write
target.index_put_(tuple(i), v)
print(target)
The target tensor will be as follows:
tensor([[0., 0., 0.],
        [0., 0., 3.],
        [2., 0., 1.]])
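For reference (not part of the original answer), the same write can be expressed with plain advanced indexing, which index_put_ mirrors; this reuses i, v and target from above:
target2 = torch.zeros([3, 3], dtype=torch.float)
target2[tuple(i)] = v                  # equivalent advanced-indexing assignment
print(torch.equal(target, target2))    # True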
This medium.com blog article provides a comprehensive list of all index functions for PyTorch Tensors.
I have a tensor that looks like
coords = torch.Tensor([[0, 0, 1, 2],
                       [0, 2, 2, 2]])
The first row is the x-coordinates of objects on a grid and the second row is the corresponding y-coordinates.
I need a differentiable way (i.e. gradients can flow) to go from this tensor to the corresponding "grid" tensor, where a 1 represents the presence of an object in that location (row index, column index) and 0 represents no object:
grid = torch.Tensor([[1, 0, 1],
                     [0, 0, 1],
                     [0, 0, 1]])
In general, coords can be large (the grid size is 300x300). If coords was a sparse tensor I could simply call to_dense on it, but for various reasons specific to my application I cannot store coords as sparse. Additionally, I cannot create a new sparse tensor from coords and call to_dense on it because creating a new tensor is not differentiable.
Any help is appreciated!
I'm not sure what you mean by 'differentiable', but here's a simple way to do it using advanced indexing.
grid = torch.zeros(3, 3)   # start from an all-zero grid
coords = coords.long()     # index tensors must be integer (long) tensors
grid[coords[0], coords[1]] = 1
tensor([[1., 0., 1.],
        [0., 0., 1.],
        [0., 0., 1.]])
I think Torch doesn't have detailed documentation on this, but NumPy does (and the behaviour is probably very similar for torch).
This is also possible:
grid = torch.zeros(3, 3)   # reset the grid first
coords = coords.long()
grid[coords[0], coords[1]] = torch.Tensor([1, 2, 3, 4])
tensor([[1., 0., 2.],
        [0., 0., 3.],
        [0., 0., 4.]])
Say coords is a plain nested Python list:
coords = [[0, 0, 1, 2],
          [0, 2, 2, 2]]
Then you can turn it into a tensor with:
torch.stack([torch.tensor(x) for x in coords])
(or simply torch.tensor(coords)).
I am familiarizing myself with the PyTorch unfold method from https://pytorch.org/docs/stable/tensors.html#torch.Tensor.unfold
I looked at their example which is
>>> x = torch.arange(1., 8)
>>> x
tensor([ 1., 2., 3., 4., 5., 6., 7.])
>>> x.unfold(0, 2, 1)
tensor([[ 1.,  2.],
        [ 2.,  3.],
        [ 3.,  4.],
        [ 4.,  5.],
        [ 5.,  6.],
        [ 6.,  7.]])
I understand from the above that when we unfold in dimension 0, we take chunks of size 2 at a time with stride 1; therefore, the result is an arrangement of the different chunks, [1., 2.], [2., 3.] and so on. As we end up with 6 chunks, they are stacked together and the final shape is (6, 2).
However, I have another example I ran as shown below.
In [115]: s = torch.arange(20).view(1,10,2)
In [116]: s
Out[116]:
tensor([[[ 0,  1],
         [ 2,  3],
         [ 4,  5],
         [ 6,  7],
         [ 8,  9],
         [10, 11],
         [12, 13],
         [14, 15],
         [16, 17],
         [18, 19]]])
In [117]: s.unfold(0,1,1)
Out[117]:
tensor([[[[ 0],
          [ 1]],
         [[ 2],
          [ 3]],
         [[ 4],
          [ 5]],
         [[ 6],
          [ 7]],
         [[ 8],
          [ 9]],
         [[10],
          [11]],
         [[12],
          [13]],
         [[14],
          [15]],
         [[16],
          [17]],
         [[18],
          [19]]]])
In [119]: s.unfold(0,1,1).shape
Out[119]: torch.Size([1, 10, 2, 1])
So you see my original tensor was of shape (1,10,2) and I asked for an unfolding operation with parameters s.unfold(0, 1, 1).
Going by my original understanding from the previous example, I assumed this means that in dimension 0 we take 1 chunk at a time with stride 1. Thus, as we go along dimension 0, we see that we have only one chunk of size (10, 2). So the output should have just taken this chunk and maybe added a dimension to wrap it, giving me an output of size (1, 10, 2).
However, it gives me an output of size (1, 10, 2, 1). Why does it have an extra dimension at the last? Can someone elaborate intuitively please?
The documentation states:
An additional dimension of size size is appended in the returned tensor.
where size is the size of the chunks you specified (the second argument). By definition, it always adds an additional dimension, which makes the behaviour consistent no matter what size you choose. Just because a dimension has size 1 doesn't mean it should be omitted automatically.
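A quick check of that point, as a minimal sketch reusing the s from the question:
import torch

s = torch.arange(20).view(1, 10, 2)
u = s.unfold(0, 1, 1)                   # a single window of size 1 along dim 0
print(u.shape)                          # torch.Size([1, 10, 2, 1])
print(torch.equal(u.squeeze(-1), s))    # True: dropping the size-1 chunk dim recovers s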
Regarding the intuition behind this, let's consider that instead of returning a tensor where the last dimension represents the chunks, we create a list of all chunks. For simplicity, we'll limit it to the first dimension with a step of 1.
import torch
from typing import List
def list_chunks(tensor: torch.Tensor, size: int) -> List[torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    return chunks
x = torch.arange(1.0, 8)
s = torch.arange(20).view(1, 10, 2)
# As expected, a list with 6 elements, as there are 6 chunks.
list_chunks(x, 2)
# => [tensor([1., 2.]),
#     tensor([2., 3.]),
#     tensor([3., 4.]),
#     tensor([4., 5.]),
#     tensor([5., 6.]),
#     tensor([6., 7.])]
# The list has only a single element, as there is only a single chunk.
# But it's still a list.
list_chunks(s, 1)
# => [tensor([[[ 0,  1],
#              [ 2,  3],
#              [ 4,  5],
#              [ 6,  7],
#              [ 8,  9],
#              [10, 11],
#              [12, 13],
#              [14, 15],
#              [16, 17],
#              [18, 19]]])]
I've deliberately included type annotations to make it clearer what we are expecting from the function. If there is only a single chunk, it will be a list with one element, as it is always a list of chunks.
You were expecting a different behaviour: namely, when there is a single chunk, you want just that chunk instead of a list. That would change the implementation as follows.
from typing import List, Union
def list_chunks(tensor: torch.Tensor, size: int) -> Union[List[torch.Tensor], torch.Tensor]:
    chunks = []
    for i in range(tensor.size(0) - size + 1):
        chunks.append(tensor[i : i + size])
    # If it's a single chunk, return just the chunk itself
    if len(chunks) == 1:
        return chunks[0]
    else:
        return chunks
With that change, anyone who uses this function now needs to take two cases into consideration. If you don't distinguish between a list and a single chunk (a tensor), you will get unexpected results, e.g. looping over the chunks would instead loop over the first dimension of the tensor.
The programmatically intuitive approach is to always return a list of chunks, and torch.Tensor.unfold does the same, except that instead of a list of chunks it returns a tensor where the last dimension can be seen as the listing of the chunks.
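As a final sanity check (a small sketch reusing list_chunks and x from above), stacking the list of chunks reproduces exactly what unfold returns for the 1-D case:
stacked = torch.stack(list_chunks(x, 2))
print(torch.equal(stacked, x.unfold(0, 2, 1)))   # True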