I have a torch tensor with 3 channels, and I want it to be 1 channel (all other dimensions should stay the same).
So if my current dimensions are torch.Size([6, 3, 512, 512]) I want it to be torch.Size([6, 1, 512, 512])
How can I do that?
Does this solve your problem?
a = torch.ones(6, 3, 512, 512)
b = a[:, 0:1, :, :]
print(b.size()) # torch.Size([6, 1, 512, 512])
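Note that the slice keeps only the first channel and discards the other two. If the single channel should instead combine all three (an assumption about the intent; the question does not say), averaging over the channel axis with keepdim=True yields the same shape. A minimal sketch, where the names `first` and `mean` are only illustrative:

```python
import torch

a = torch.ones(6, 3, 512, 512)

# Option 1: keep only channel 0. The 0:1 slice (rather than index 0)
# preserves the channel axis, so the result stays 4-dimensional.
first = a[:, 0:1, :, :]

# Option 2 (assumes a plain average of the channels is acceptable):
# collapse all three channels into one and keep the axis.
mean = a.mean(dim=1, keepdim=True)

print(first.shape)
print(mean.shape)
```

Both results have shape [6, 1, 512, 512]; which one is appropriate depends on what the single channel is meant to represent.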
I want to experiment with creating a modified Loss function for 4 channel image data.
What is the best way to split torch.Size([64, 4, 128, 128])
to
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
You can either slice the second axis and extract two tensors:
>>> a, b = x[:, :3], x[:, 3:]
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
Alternatively, you can apply torch.split along the second dimension (dim=1):
>>> a, b = x.split(3, dim=1)
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
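When the chunks have different sizes, torch.split also accepts a list of sizes, which makes the intended layout explicit. A small sketch with random stand-in data; the names `rgb` and `extra` are hypothetical:

```python
import torch

x = torch.rand(64, 4, 128, 128)

# A list of sizes names each chunk explicitly; the sizes must sum to
# x.shape[1], otherwise torch.split raises an error.
rgb, extra = torch.split(x, [3, 1], dim=1)

print(rgb.shape)
print(extra.shape)
```

With an integer argument such as x.split(3, dim=1), the last chunk is simply whatever remains, so the list form is safer if the channel count might change.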
I was able to resolve this myself by using the torch.split function.
Given an image tensor of shape torch.Size([64, 4, 128, 128]),
you can split along dim 1 with a fixed chunk size of 3:
print(self.E.shape)
self.E1 = torch.split(self.E, 3, 1)
print(self.E1[0].shape)
print(self.E1[1].shape)
Gives:
torch.Size([64, 4, 128, 128])
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
I have a tensor of shape (60, 3, 32, 32) and a boolean mask of shape (60, 32, 32). I want to apply this mask to the tensor. The output tensor should have shape (60, 3, 32, 32), with values kept where the mask is True and set to 0 elsewhere.
How can I do that fast?
Let t be the tensor and m be the mask. You can use:
t * m.unsqueeze(1)
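This works because unsqueeze(1) turns the mask into shape (60, 1, 32, 32), which broadcasts across the 3 channels. A quick sketch with random stand-in data, compared against torch.where as an equivalent formulation:

```python
import torch

t = torch.rand(60, 3, 32, 32)
m = torch.rand(60, 32, 32) > 0.5  # boolean mask (random stand-in data)

# (60, 32, 32) -> (60, 1, 32, 32); the size-1 axis broadcasts over channels.
masked = t * m.unsqueeze(1)

# Same result via torch.where: take t where the mask is True, else 0.
alt = torch.where(m.unsqueeze(1), t, torch.zeros_like(t))

print(masked.shape)
print(torch.equal(masked, alt))
```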
How does output_padding work in ConvTranspose2d? Please help me understand it.
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), when you apply a Conv2d operation with stride > 1, different input sizes can produce the same output size. For example, with kernel_size=3 and stride=2, both 7x7 and 8x8 inputs give a 3x3 output:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transposed convolution, it is therefore ambiguous which output shape to return for stride=2: 7x7 or 8x8. The output_padding parameter resolves this ambiguity. Note that it does not pad the output with zeros; it only determines the output shape, and the transposed convolution is applied accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
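The output size formula given in the ConvTranspose2d documentation makes the role of output_padding explicit: it is simply added to the computed size. A sketch of that formula, checked against the layer itself; the helper name `conv_transpose_out` is hypothetical:

```python
import torch

def conv_transpose_out(size, kernel_size, stride, padding=0,
                       output_padding=0, dilation=1):
    # Output size along one spatial dimension, per the formula in the
    # ConvTranspose2d documentation.
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

for op in (0, 1):
    layer = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                                     output_padding=op)
    actual = layer(torch.rand(1, 1, 3, 3)).shape[-1]
    expected = conv_transpose_out(3, kernel_size=3, stride=2,
                                  output_padding=op)
    print(op, expected, actual)
```

Plugging in the settings from the question (kernel_size=3, stride=2, padding=1, output_padding=1) gives an output of exactly twice the input size, which is presumably why that combination is used for upsampling.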
PyTorch code:
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(3, 128, 2, stride=2)
conv = nn.Conv2d(3, 128, 2)
# Variable is deprecated; plain tensors can be passed to modules directly.
inputs = torch.rand(1, 3, 64, 64)
print('up conv output size:', up(inputs).size())
print('conv output size:', conv(inputs).size())
print('up conv weight size:', up.weight.data.shape)
print('conv weight size:', conv.weight.data.shape)
Result:
up conv output size: torch.Size([1, 128, 128, 128])
conv output size: torch.Size([1, 128, 63, 63])
up conv weight size: torch.Size([3, 128, 2, 2])
conv weight size: torch.Size([128, 3, 2, 2])
Why is the order different between the ConvTranspose2d weight (3, 128) and the Conv2d weight (128, 3)?
Is it supposed to behave like this?
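For what it's worth, this matches the documented weight layouts: Conv2d stores its weight as (out_channels, in_channels / groups, kH, kW), while ConvTranspose2d stores it as (in_channels, out_channels / groups, kH, kW). One way to see why is that a transposed convolution from 3 to 128 channels has the same weight layout as an ordinary convolution going the other way, from 128 to 3. A small sketch; the `back` layer is a hypothetical addition for comparison:

```python
import torch.nn as nn

up = nn.ConvTranspose2d(3, 128, 2, stride=2)
conv = nn.Conv2d(3, 128, 2)
# Hypothetical forward convolution in the opposite direction (128 -> 3).
back = nn.Conv2d(128, 3, 2, stride=2)

print(up.weight.shape)    # layout: (in_channels, out_channels, kH, kW)
print(conv.weight.shape)  # layout: (out_channels, in_channels, kH, kW)
print(back.weight.shape == up.weight.shape)
```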
I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
It turns out this is actually a bug:
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112