How does output_padding work in ConvTranspose2d? Please help me understand it.
ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation here: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html, when applying a Conv2d operation with stride > 1, different input sizes can produce the same output size. For example, 7x7 and 8x8 inputs both return a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transpose convolution it is therefore ambiguous which output shape to return, 7x7 or 8x8, for a stride=2 transpose convolution. The output_padding parameter lets PyTorch resolve this ambiguity. Note that it doesn't actually pad zeros (or anything else) onto the output; it is just a way to determine the output shape and apply the transpose convolution accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
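For reference, the spatial output size of ConvTranspose2d follows the formula in the linked documentation, with output_padding simply added at the end; here is a small sketch of that arithmetic (the helper function below is just for illustration, not part of the original code):
# H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1
def conv_transpose2d_out_size(h_in, kernel_size=3, stride=2, padding=0, output_padding=0, dilation=1):
    return (h_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(conv_transpose2d_out_size(3))                    # 7 -> matches transposed1
print(conv_transpose2d_out_size(3, output_padding=1))  # 8 -> matches transposed2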
Here is what I want to do.
I have an individual data sample of shape (20, 20, 20), where each of the 20 tensors of shape (1, 20, 20) will be used as the input to one of 20 separate CNNs. Here's the code I have so far.
class MyModel(torch.nn.Module):
    def __init__(self, ...):
        ...
        self.features = nn.ModuleList([nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(10, 14, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(14, 18, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(28*28*18, 256)
        ) for _ in range(20)])
        self.fc_module = nn.Sequential(
            nn.Linear(256*n_selected, cnn_output_dim),
            nn.Softmax(dim=n_classes)
        )

    def forward(self, input_list):
        concat_fusion = cat([cnn(x) for x,cnn in zip(input_list,self.features)], dim = 0)
        output = self.fc_module(concat_fusion)
        return output
The shape of input_list in the forward function is torch.Size([100, 20, 20, 20]), where 100 is the batch size.
However, there's an issue with
concat_fusion = cat([cnn(x) for x,cnn in zip(input_list,self.features)], dim = 0)
as it results in this error.
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [10, 1, 3, 3], but got 3-dimensional input of size [20, 20, 20] instead
First off, I wonder why it expects a 4-dimensional weight [10, 1, 3, 3]. I've seen
"RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 3 3, but got 3-dimensional input of size [3, 224, 224] instead"?
but I'm not sure where those specific numbers are coming from.
I have an input_list which is a batch of 100 samples. I'm not sure how to handle an individual sample of shape (20, 20, 20) so that I can split it into 20 pieces and use each one as an independent input to one of the 20 CNNs.
why it expects me to give 4-dimensional weight [10,1,3,3].
The following error means that nn.Conv2d has a 4-dimensional weight (kernel) of shape (10, 1, 3, 3), i.e. (out_channels=10, in_channels=1, kernel_height=3, kernel_width=3), and therefore requires a 4-dimensional input of shape (batch, in_channels, H, W).
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [10, 1, 3, 3]
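You can verify this layout directly; for nn.Conv2d the weight shape is (out_channels, in_channels, kernel_height, kernel_width), so the first conv layer in your model produces exactly those numbers (a small check, just for illustration):
import torch.nn as nn

conv = nn.Conv2d(1, 10, kernel_size=3, padding=1)
print(conv.weight.shape)  # torch.Size([10, 1, 3, 3]) = (out_channels, in_channels, kH, kW)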
How to separate the input into 20 pieces along the channel dimension.
Iterating over input_list of shape (100, 20, 20, 20) produces 100 tensors of shape (20, 20, 20), not 20 of them.
If you want to split the input along the channel dimension, slice input_list along its second dimension instead:
concat_fusion = torch.cat([cnn(input_list[:, i:i+1]) for i, cnn in enumerate(self.features)], dim = 1)
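As a side note (my own observation, not part of the original answer): with padding=1 the three conv layers keep the 20x20 spatial size, so after nn.Flatten() each sub-network produces 18*20*20 features, and the linear layer would need in_features=20*20*18 rather than 28*28*18. A minimal shape check under that assumption:
import torch
import torch.nn as nn

input_list = torch.rand(100, 20, 20, 20)  # (batch, channels, H, W)

# one of the 20 per-channel sub-networks, with in_features adjusted for 20x20 inputs
cnn = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(10, 14, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(14, 18, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(20 * 20 * 18, 256),
)

x = input_list[:, 0:1]   # slicing keeps the channel axis -> (100, 1, 20, 20), a valid 4-D input
print(cnn(x).shape)      # torch.Size([100, 256])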
I built the convolutional autoencoder below and am trying to tune it so that the encoder output (x_encoder) has shape [N x H x W] = 1024 elements without increasing the loss. Currently my output shape is [4, 64, 64]. Any ideas?
import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layer (depth from in --> 16), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        # conv layer (depth from 16 --> 4), 3x3 kernels
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)
        ## decoder layers ##
        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)  # compressed representation
        x_encoder = x
        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        # output layer (with sigmoid for scaling from 0 to 1)
        x = F.sigmoid(self.t_conv2(x))
        return x, x_encoder
If you want to keep the number of parameters unchanged, adding an nn.AdaptiveAvgPool2d((H, W)) or nn.AdaptiveMaxPool2d((H, W)) layer in place of, or after, the pooling layer (self.pool) can force the encoder output to have spatial shape H x W (the channel count N is already fixed by the preceding conv layer).
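For instance, a minimal sketch of that idea (the 16x16 target is my own choice so that 4 channels give exactly 1024 elements; adjust it to your needs):
import torch
import torch.nn as nn

x = torch.rand(1, 4, 64, 64)                  # current shape of x_encoder
adaptive_pool = nn.AdaptiveAvgPool2d((16, 16))
x_encoder = adaptive_pool(x)                  # (1, 4, 16, 16)
print(x_encoder.shape, x_encoder[0].numel())  # 4 * 16 * 16 = 1024 elements per sample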
This should work, assuming the shape of x_encoder is torch.Size([1, 4, 64, 64]). You can add a conv layer with stride set to 2, or a conv layer followed by a pooling layer. Check the code below:
# conv layer (depth from in --> 16), 3x3 kernels
self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
# conv layer (depth from 16 --> 4), 3x3 kernels
self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
# pooling layer to reduce x-y dims by two; kernel and stride of 2
self.pool = nn.MaxPool2d(2, 2)
# The changes
self.conv3 = nn.Conv2d(4, 1, 1, 2)
# or
self.conv3 = nn.Conv2d(4, 1, 1)
self.maxpool2d = nn.MaxPool2d((2, 2))
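Both variants bring a (1, 4, 64, 64) encoding down to 1 x 32 x 32 = 1024 elements per sample; a quick shape check (assuming the input shape from the question):
import torch
import torch.nn as nn

x_encoder = torch.rand(1, 4, 64, 64)

# variant 1: 1x1 conv with stride 2
conv3 = nn.Conv2d(4, 1, 1, 2)
print(conv3(x_encoder).shape)              # torch.Size([1, 1, 32, 32])

# variant 2: 1x1 conv followed by 2x2 max pooling
conv3 = nn.Conv2d(4, 1, 1)
maxpool2d = nn.MaxPool2d((2, 2))
print(maxpool2d(conv3(x_encoder)).shape)   # torch.Size([1, 1, 32, 32])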
PyTorch code:
import torch
import torch.nn as nn
from torch.autograd import Variable

up = nn.ConvTranspose2d(3, 128, 2, stride=2)
conv = nn.Conv2d(3, 128, 2)
inputs = Variable(torch.rand(1, 3, 64, 64))
print('up conv output size:', up(inputs).size())
inputs = Variable(torch.rand(1, 3, 64, 64))
print('conv output size:', conv(inputs).size())
print('up conv weight size:', up.weight.data.shape)
print('conv weight size:', conv.weight.data.shape)
Result:
up conv output size: torch.Size([1, 128, 128, 128])
conv output size: torch.Size([1, 128, 63, 63])
up conv weight size: torch.Size([3, 128, 2, 2])
conv weight size: torch.Size([128, 3, 2, 2])
Why are the weight-shape orders different between ConvTranspose2d (3, 128, ...) and Conv2d (128, 3, ...)?
Is it supposed to behave like this?
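For what it's worth, the printed shapes match the layouts given in the PyTorch documentation: Conv2d stores its weight as (out_channels, in_channels, kH, kW), while ConvTranspose2d stores it as (in_channels, out_channels, kH, kW). A quick check:
import torch.nn as nn

print(nn.Conv2d(3, 128, 2).weight.shape)           # torch.Size([128, 3, 2, 2])
print(nn.ConvTranspose2d(3, 128, 2).weight.shape)  # torch.Size([3, 128, 2, 2])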
I am trying to build a classifier based on a capsule network. I am able to train the network, but I have a problem making a prediction. What should my code look like to make a prediction given an arbitrary image (I need an example)? One important thing is that I'm using EM routing. In dynamic routing it is enough to compute the length of each vector in the last capsule layer to get the predicted class, but how does it work with EM routing?
Here is my code:
poses, activations = m_capsules.nets.capsules_net(images, num_classes=10, iterations=3,
                                                  batch_size=batch_size, name='capsules_em')
global_step = tf.train.get_or_create_global_step()
loss = m_capsules.nets.spread_loss(
    labels, activations, iterations_per_epoch, global_step, name='spread_loss'
)
tf.summary.scalar('losses/spread_loss', loss)
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
train_tensor = slim.learning.create_train_op(
    loss, optimizer, global_step=global_step, clip_gradient_norm=4.0
)
slim.learning.train(
    train_tensor,
    logdir="./log/train",
    log_every_n_steps=1,
    save_summaries_secs=60,
    saver=tf.train.Saver(max_to_keep=2),
    save_interval_secs=600,
)
So far I have been trying to write an estimator, but I'm having trouble. Below is the code:
mnist_classifier = tf.estimator.Estimator(model_fn=m_capsules.nets.capsules_net, model_dir=dir)
prediction = mnist_classifier.predict(input_fn=words_input_fn)
And my model looks like:
def capsules_net(inputs, num_classes, iterations, batch_size, name='ocr-caps'):
    """Define the Capsule Network model
    """
    with tf.variable_scope(name) as scope:
        # ReLU Conv1
        # Images shape (24, 28, 28, 1) -> conv 5x5 filters, 32 output channels, strides 2 with padding, ReLU
        # nets -> (?, 14, 14, 32)
        nets = conv2d(
            inputs,
            kernel=5, out_channels=26, stride=2, padding='SAME',
            activation_fn=tf.nn.relu, name='relu_conv1'
        )
        # PrimaryCaps
        # (?, 14, 14, 32) -> capsule 1x1 filter, 32 output capsules, strides 1 without padding
        # nets -> (poses (?, 14, 14, 32, 4, 4), activations (?, 14, 14, 32))
        nets = primary_caps(
            nets,
            kernel_size=1, out_capsules=26, stride=1, padding='VALID',
            pose_shape=[4, 4], name='primary_caps'
        )
        # ConvCaps1
        # (poses, activations) -> conv capsule, 3x3 kernels, strides 2, no padding
        # nets -> (poses (24, 6, 6, 32, 4, 4), activations (24, 6, 6, 32))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 2, 2, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps1'
        )
        # ConvCaps2
        # (poses, activations) -> conv capsule, 3x3 kernels, strides 1, no padding
        # nets -> (poses (24, 4, 4, 32, 4, 4), activations (24, 4, 4, 32))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 1, 1, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps2'
        )
        # Class capsules
        # (poses, activations) -> 1x1 convolution, 10 output capsules
        # nets -> (poses (24, 10, 4, 4), activations (24, 10))
        nets = class_capsules(nets, num_classes, iterations=iterations,
                              batch_size=batch_size, name='class_capsules')
        # poses (24, 10, 4, 4), activations (24, 10)
        poses, activations = nets
        return poses, activations
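A hedged sketch of the prediction step (my own suggestion, not taken from the original code): with EM routing, each class capsule's activation is its presence probability, so the analogue of taking the vector length in dynamic routing is simply an argmax over the activations returned by the network above:
# activations has shape (batch_size, num_classes), e.g. (24, 10) here
poses, activations = m_capsules.nets.capsules_net(images, num_classes=10, iterations=3,
                                                  batch_size=batch_size, name='capsules_em')
predicted_class = tf.argmax(activations, axis=-1)   # shape (batch_size,)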
I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
Turns out this is really a bug.
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
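Until the issue is resolved, one possible workaround (my own suggestion, not from the linked issue) is to pool height and width with strides of 1 in the batch and channel dimensions, and then average adjacent channel pairs with a reshape; for average pooling this is equivalent to the intended 2x2x2 block average. Continuing from the snippet above:
pooled_hw = tf.nn.avg_pool(x, (1, 2, 2, 1), (1, 2, 2, 1), padding='SAME')   # (100, 16, 16, 64)
# average adjacent channel pairs to halve the channel dimension
n, h, w, c = pooled_hw.shape.as_list()
avg_pool_fixed = tf.reduce_mean(tf.reshape(pooled_hw, [n, h, w, c // 2, 2]), axis=-1)
print(avg_pool_fixed.shape)   # (100, 16, 16, 32)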