Obtaining a specific shape using nn.Conv2d - pytorch

Starting with an input shape like (64, 1, 103, 8) how should I set the parameters of nn.Conv2d to arrive at a shape of (64, 32, 43, 8)?
Currently I'm using the following:
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=(2, 1), padding=(0, 1), dilation=(9, 1))
But I'm afraid the dilation parameter may hurt performance.

You can use padding=(13, 1), stride=(3, 1), and kernel_size=3:
nn.Conv2d(1, 32, 3, stride=(3, 1), padding=(13, 1))
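As a quick sanity check (a minimal sketch, assuming the (64, 1, 103, 8) input from the question), you can verify the output shape directly:
import torch
import torch.nn as nn

x = torch.randn(64, 1, 103, 8)  # (batch, channels, height, width) as in the question
conv = nn.Conv2d(1, 32, kernel_size=3, stride=(3, 1), padding=(13, 1))
print(conv(x).shape)  # torch.Size([64, 32, 43, 8])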


Fastest, best way to modify data in a PyTorch loss function?

I want to experiment with creating a modified loss function for 4-channel image data.
What is the best way to split torch.Size([64, 4, 128, 128])
into
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
You can either slice the channel axis (dim=1) and extract two tensors:
>>> a, b = x[:, :3], x[:, 3:]
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
Alternatively, you can apply torch.split along the channel dimension:
>>> a, b = x.split(3, dim=1)
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
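Either way, here is a sketch of how this might look inside a custom loss function (assuming a 4-channel prediction and target; the MSE terms and the alpha_weight parameter are purely illustrative, not from the question):
import torch
import torch.nn.functional as F

def four_channel_loss(pred, target, alpha_weight=0.5):
    # Split both tensors into RGB (3 channels) and alpha (1 channel) along dim=1.
    pred_rgb, pred_a = pred.split(3, dim=1)
    tgt_rgb, tgt_a = target.split(3, dim=1)
    # Illustrative weighting of the two terms.
    return F.mse_loss(pred_rgb, tgt_rgb) + alpha_weight * F.mse_loss(pred_a, tgt_a)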
I was able to resolve this myself by using the split function.
Given an image-based tensor like torch.Size([64, 4, 128, 128]),
you can split on dim 1 with a fixed chunk length:
self.E1 = torch.split(self.E, 3, 1)
print(self.E1[0].shape)
print(self.E1[1].shape)
Gives:
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])

What output_padding does in nn.ConvTranspose2d?

What does output_padding do in ConvTranspose2d? Please help me understand it.
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation here: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html, when applying a Conv2d operation with stride > 1 you can get the same output dimensions from different input sizes. For example, 7x7 and 8x8 inputs would both return a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transpose convolution, it is therefore ambiguous which output shape to return, 7x7 or 8x8, for a stride=2 transposed convolution. The output_padding parameter tells PyTorch whether to produce the 7x7 or the 8x8 output. Note that it doesn't pad the output with zeros or anything else; it is just a way to determine the output shape and apply the transpose convolution accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
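For reference, the output size follows the formula given in the ConvTranspose2d documentation; a minimal sketch (the helper function below is just for illustration, not a library API):
def conv_transpose_out(size_in, kernel_size=3, stride=2, padding=0, dilation=1, output_padding=0):
    # Output-size formula from the ConvTranspose2d docs.
    return (size_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(conv_transpose_out(3))                    # 7, matches transposed1
print(conv_transpose_out(3, output_padding=1))  # 8, matches transposed2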

Trying to perform transposed convolution but missing a pixel

def get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=True):
    # contracting path
    c1 = conv2d_block(input_img, n_filters=n_filters * 1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2))(c1)
    p1 = Dropout(dropout * 0.5)(p1)
    c2 = conv2d_block(p1, n_filters=n_filters * 2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling2D((2, 2))(c2)
    p2 = Dropout(dropout)(p2)
    c3 = conv2d_block(p2, n_filters=n_filters * 4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling2D((2, 2))(c3)
    p3 = Dropout(dropout)(p3)
    c4 = conv2d_block(p3, n_filters=n_filters * 8, kernel_size=3, batchnorm=batchnorm)
    p4 = MaxPooling2D(pool_size=(2, 2))(c4)
    p4 = Dropout(dropout)(p4)
    c5 = conv2d_block(p4, n_filters=n_filters * 16, kernel_size=3, batchnorm=batchnorm)
    # expansive path
    u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(dropout)(u6)
    c6 = conv2d_block(u6, n_filters=n_filters * 8, kernel_size=3, batchnorm=batchnorm)
    u7 = Conv2DTranspose(n_filters * 4, (3, 3), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(dropout)(u7)
    c7 = conv2d_block(u7, n_filters=n_filters * 4, kernel_size=3, batchnorm=batchnorm)
    u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(dropout)(u8)
    c8 = conv2d_block(u8, n_filters=n_filters * 2, kernel_size=3, batchnorm=batchnorm)
    u9 = Conv2DTranspose(n_filters * 1, (3, 3), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1], axis=3)
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters=n_filters * 1, kernel_size=3, batchnorm=batchnorm)
    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)
    model = Model(inputs=[input_img], outputs=[outputs])
    return model
I got this Keras model from here. I seem to be getting the error:
File "train.py", line 87, in get_unet
u8 = concatenate([u8, c2])
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 256, 184, 32), (None, 256, 185, 32)]
So I printed the values of each of these Tensors, and I got:
c1: Tensor("activation_2/Relu:0", shape=(?, 512, 370, 16), dtype=float32)
c2: Tensor("activation_4/Relu:0", shape=(?, 256, 185, 32), dtype=float32)
c3: Tensor("activation_6/Relu:0", shape=(?, 128, 92, 64), dtype=float32)
c4: Tensor("activation_8/Relu:0", shape=(?, 64, 46, 128), dtype=float32)
c5: Tensor("activation_10/Relu:0", shape=(?, 32, 23, 256), dtype=float32)
u6: Tensor("dropout_5/cond/Merge:0", shape=(?, 64, 46, 256), dtype=float32)
u7: Tensor("dropout_6/cond/Merge:0", shape=(?, 128, 92, 128), dtype=float32)
u8: Tensor("conv2d_transpose_3/BiasAdd:0", shape=(?, ?, ?, 32), dtype=float32)
What happened at c2? Why is the second spatial dimension of u8 184, while the second dimension of c2 is 185? Furthermore, c3's second dimension seems to have been maxpooled by a factor of 2 from 185 down to 92 (probably due to a floor function).
How would I combat this? Do I have to change the size of the images that are being inputted, or do I have to engineer something while doing the transpose convolution? Do I need to perform interpolation for the one extra pixel?
That's happening because the second spatial dimension of c2 (185) is odd, so it doesn't divide evenly by 2.
You are maxpooling 185 by a factor of 2, which gives 92.5, floored to 92.
But when you go the other way, upsampling 92 by a factor of 2 gives 184.
To avoid this you can simply zero-pad u8 so it is compatible with c2, like this:
u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
u8 = ZeroPadding2D(padding=((0, 0), (0, 1)))(u8)
u8 = concatenate([u8, c2])
If you don't want to zero-pad, you can resize your input images so that each dimension is a power of 2, or at least a dimension that can be divided by two several times without producing an odd number, like 224 (it can be halved 5 times before giving 7).
Hope that helps!
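If you go the resizing route, one simple option (a sketch, assuming NumPy image arrays of shape (H, W, C); pad_to_multiple is a hypothetical helper, not part of Keras) is to pad each image up to the next multiple of 16, since this network downsamples by 2 four times:
import numpy as np

def pad_to_multiple(img, multiple=16):
    # Pad height and width up to the next multiple so repeated 2x pooling never hits an odd size.
    h, w = img.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode='constant')

img = np.zeros((512, 370, 1), dtype=np.float32)
print(pad_to_multiple(img).shape)  # (512, 384, 1)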

Model parallelism not working? All GPUs not being used?

I have been stuck with a problem like this for a while now. I have an AWS setup with 500 GB of RAM and about 7 GPUs. Every time I try to run my Keras (TensorFlow backend) code, it runs out of memory. I have found the reason as well: each GPU has only 12 GB of memory, whereas my model needs more than that.
So how can I run the model such that it uses the memory of all the GPUs combined, rather than relying on the memory of a single GPU and running out? I have tried model parallelism with Keras, and it seems to be set up correctly: when printing the layers, each layer is assigned to the programmed GPU, but the model still tries to load into a single GPU's memory, i.e. just 11 GB, and soon runs out of memory.
Any idea what's going on?
with tf.device('/gpu:0'):
    x = conv2d_bn(img_input, 32, 3, 3, strides=(2, 2), padding='valid')
    x = conv2d_bn(x, 32, 3, 3, padding='valid')
    x = conv2d_bn(x, 64, 3, 3)
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = conv2d_bn(x, 80, 1, 1, padding='valid')
    x = conv2d_bn(x, 192, 3, 3, padding='valid')
    x = MaxPooling2D((3, 3), strides=(2, 2))(x)
    # mixed 0, 1, 2: 35 x 35 x 256
    branch1x1 = conv2d_bn(x, 64, 1, 1)
    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)
    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 32, 1, 1)
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=channel_axis,
        name='mixed0')
    print(x)

with tf.device('/gpu:1'):
    # mixed 1: 35 x 35 x 256
    branch1x1 = conv2d_bn(x, 64, 1, 1)
    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)
    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 64, 1, 1)
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=channel_axis,
        name='mixed1')
    # mixed 2: 35 x 35 x 256
    branch1x1 = conv2d_bn(x, 64, 1, 1)
    branch5x5 = conv2d_bn(x, 48, 1, 1)
    branch5x5 = conv2d_bn(branch5x5, 64, 5, 5)
    branch3x3dbl = conv2d_bn(x, 64, 1, 1)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3)
    branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x)
    branch_pool = conv2d_bn(branch_pool, 64, 1, 1)
    x = layers.concatenate(
        [branch1x1, branch5x5, branch3x3dbl, branch_pool],
        axis=channel_axis,
        name='mixed2')
    print(x)
Edit:
Here's the link to the code. Also, note that currently I am feeding just one image to the model, to test whether my GPU can handle it, so reducing the batch size is not a possible solution.

TensorFlow: Why does avg_pool ignore one stride dimension?

I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
Turns out this is really a bug.
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
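For reference, the conventional call that the quoted rule describes (a sketch, reusing x from the snippet above) pools only over the spatial dimensions and keeps the batch and channel strides at 1:
spatial_ksize = (1, 2, 2, 1)
spatial_strides = (1, 2, 2, 1)
pooled = tf.nn.max_pool(x, spatial_ksize, spatial_strides, padding='SAME')
print(pooled.shape)  # (100, 16, 16, 64)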
