TensorFlow: Why does avg_pool ignore one stride dimension? - python-3.x

I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.

The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.

Turns out this is really a bug.
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
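If the goal really is to pool over the channel axis as well (i.e. to get the (100, 16, 16, 32) shape that max_pool produces here), one workaround, not from the thread, is to treat the channels as a third spatial axis and use tf.nn.avg_pool3d; a minimal sketch under that assumption:
import tensorflow as tf

x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)

# Move the channel axis into the spatial dimensions so it can be pooled legally.
x5d = tf.expand_dims(x, axis=-1)             # (100, 32, 32, 64, 1)
pooled = tf.nn.avg_pool3d(x5d,
                          ksize=(1, 2, 2, 2, 1),
                          strides=(1, 2, 2, 2, 1),
                          padding='SAME')    # (100, 16, 16, 32, 1)
avg_pool = tf.squeeze(pooled, axis=-1)       # (100, 16, 16, 32)
print(avg_pool.shape)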

Related

How to upsample a PyTorch tensor?

I have a PyTorch tensor of size (1, 4, 128, 128) (batch, channel, height, width), and I want to 'upsample' it to (1, 3, 256, 256).
I thought of using interpolate (a function in nn.functional).
However, after reading the documentation and applying this function, I get an output of shape (1, 4, 256, 256), so maybe it is not the function I am looking for. The code I used is the following:
import torch
import torch.nn as nn

x = torch.rand(1, 4, 128, 128)  # x.shape -> (1, 4, 128, 128)
x_0 = nn.functional.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
# x_0.shape -> (1, 4, 256, 256)
How can I do that (from (1, 4, 128, 128) to (1, 3, 256, 256))?
Below is the network I am trying to replicate, but I got stuck at the upsample layer.
What about PyTorch's nn.Upsample function:
upsample = nn.Upsample(scale_factor=2)
x = upsample(x)
Not sure if that's what you are looking for since you want the second dimension to change from 4 to 3.
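If the channel count also has to go from 4 to 3, one common option (not part of the original answer) is to follow the upsampling with a learned 1x1 convolution; a minimal sketch, assuming a plain Conv2d is acceptable for that mapping:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 4, 128, 128)

# Double the spatial resolution, then map 4 channels to 3 with a 1x1 convolution.
x_up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)  # (1, 4, 256, 256)
to_rgb = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=1)
out = to_rgb(x_up)
print(out.shape)  # torch.Size([1, 3, 256, 256])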

Obtaining a specific shape using nn.Conv2d

Starting with an input shape like (64, 1, 103, 8), how should I set the parameters of nn.Conv2d to arrive at a shape of (64, 32, 43, 8)?
Currently I'm using the following
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3), stride=(2, 1), padding=(0, 1), dilation=(9, 1))
But I'm afraid that the dilation parameter may hurt performance.
You can use padding = (13, 1), stride = (3, 1) and kernel_size = 3:
nn.Conv2d(1, 32, 3, stride = (3, 1), padding = (13, 1))
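For reference, a quick shape check (my own, not from the answer) confirming that both the dilated and the plain configuration produce the target (64, 32, 43, 8):
import torch
import torch.nn as nn

x = torch.rand(64, 1, 103, 8)

# Original configuration, using dilation along the height axis.
conv_dilated = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3, 3),
                         stride=(2, 1), padding=(0, 1), dilation=(9, 1))
# Suggested configuration, plain convolution with larger padding instead.
conv_plain = nn.Conv2d(1, 32, kernel_size=3, stride=(3, 1), padding=(13, 1))

print(conv_dilated(x).shape)  # torch.Size([64, 32, 43, 8])
print(conv_plain(x).shape)    # torch.Size([64, 32, 43, 8])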

Best (fastest) way to modify data in a PyTorch loss function?

I want to experiment with creating a modified loss function for 4-channel image data.
What is the best way to split torch.Size([64, 4, 128, 128])
to
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
You can either slice the second axis and extract two tensors:
>>> a, b = x[:, :3], x[:, 3:]
>>> a.shape, b.shape
(64, 3, 128, 128), (64, 1, 128, 128)
Alternatively, you can apply torch.split along the channel dimension (dim=1):
>>> a, b = x.split(3, dim=1)
>>> a.shape, b.shape
(64, 3, 128, 128), (64, 1, 128, 128)
I was able to resolve this myself by using the split function.
Given an image tensor like torch.Size([64, 4, 128, 128]), you can split on dim 1 with a fixed split size:
self.E1 = torch.split(self.E, 3, 1)
print(self.E1[0].shape)
print(self.E1[1].shape)
Gives:
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
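As a purely hypothetical illustration (the class name and weights are mine, not from the question), the split pieces could then be weighted separately inside a custom loss:
import torch
import torch.nn as nn

class FourChannelLoss(nn.Module):
    """Weight the RGB channels and the 4th channel separately (hypothetical example)."""
    def __init__(self, alpha=1.0, beta=0.5):
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        self.mse = nn.MSELoss()

    def forward(self, pred, target):
        pred_rgb, pred_extra = pred.split(3, dim=1)      # (64, 3, 128, 128), (64, 1, 128, 128)
        tgt_rgb, tgt_extra = target.split(3, dim=1)
        return self.alpha * self.mse(pred_rgb, tgt_rgb) + self.beta * self.mse(pred_extra, tgt_extra)

loss_fn = FourChannelLoss()
print(loss_fn(torch.rand(64, 4, 128, 128), torch.rand(64, 4, 128, 128)))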

What output_padding does in nn.ConvTranspose2d?

How does output_padding work in ConvTranspose2d? Please help me understand it.
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), when applying a Conv2d operation with stride > 1 you can get the same output dimensions from different input sizes. For example, 7x7 and 8x8 inputs both return a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transpose convolution, it is ambiguous which output shape to return, 7x7 or 8x8, for a stride=2 transpose convolution. The output_padding parameter lets PyTorch resolve that ambiguity. Note that it doesn't pad the output with zeros or anything else; it is just a way to determine the output shape and apply the transpose convolution accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
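The shapes above follow from the output-size formula in the ConvTranspose2d documentation; a small helper (my own, not from the answer) reproduces the 7 vs. 8 result:
# H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1
def conv_transpose_out(h_in, stride, padding, kernel_size, output_padding=0, dilation=1):
    return (h_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(conv_transpose_out(3, stride=2, padding=0, kernel_size=3))                    # 7
print(conv_transpose_out(3, stride=2, padding=0, kernel_size=3, output_padding=1))  # 8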

tensorflow capsule network em routing predict

I am trying to build a classifier based on a capsule network. I am able to train this network, but I have a problem with making predictions. What should my code look like to make a prediction given an arbitrary image (I need an example)? One important thing is that I'm using EM routing. With dynamic routing it is enough to compute the length of each vector in the last capsule layer to get the predicted class, but how does it work with EM routing?
Here is my code:
poses, activations = m_capsules.nets.capsules_net(images, num_classes=10, iterations=3,
                                                  batch_size=batch_size, name='capsules_em')
global_step = tf.train.get_or_create_global_step()
loss = m_capsules.nets.spread_loss(
    labels, activations, iterations_per_epoch, global_step, name='spread_loss'
)
tf.summary.scalar('losses/spread_loss', loss)
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
train_tensor = slim.learning.create_train_op(
    loss, optimizer, global_step=global_step, clip_gradient_norm=4.0
)
slim.learning.train(
    train_tensor,
    logdir="./log/train",
    log_every_n_steps=1,
    save_summaries_secs=60,
    saver=tf.train.Saver(max_to_keep=2),
    save_interval_secs=600,
)
So far I have been trying to write an estimator, but I am having trouble. Below is the code:
mnist_classifier = tf.estimator.Estimator(model_fn=m_capsules.nets.capsules_net, model_dir=dir)
prediction = mnist_classifier.predict(input_fn=words_input_fn)
And my model looks like:
def capsules_net(inputs, num_classes, iterations, batch_size, name='ocr-caps'):
    """Define the Capsule Network model
    """
    with tf.variable_scope(name) as scope:
        # ReLU Conv1
        # Images shape (24, 28, 28, 1) -> conv 5x5 filters, 32 output channels, strides 2 with padding, ReLU
        # nets -> (?, 14, 14, 32)
        nets = conv2d(
            inputs,
            kernel=5, out_channels=26, stride=2, padding='SAME',
            activation_fn=tf.nn.relu, name='relu_conv1'
        )
        # PrimaryCaps
        # (?, 14, 14, 32) -> capsule 1x1 filter, 32 output capsules, strides 1 without padding
        # nets -> (poses (?, 14, 14, 32, 4, 4), activations (?, 14, 14, 32))
        nets = primary_caps(
            nets,
            kernel_size=1, out_capsules=26, stride=1, padding='VALID',
            pose_shape=[4, 4], name='primary_caps'
        )
        # ConvCaps1
        # (poses, activations) -> conv capsule, 3x3 kernels, strides 2, no padding
        # nets -> (poses (24, 6, 6, 32, 4, 4), activations (24, 6, 6, 32))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 2, 2, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps1'
        )
        # ConvCaps2
        # (poses, activations) -> conv capsule, 3x3 kernels, strides 1, no padding
        # nets -> (poses (24, 4, 4, 32, 4, 4), activations (24, 4, 4, 32))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 1, 1, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps2'
        )
        # Class capsules
        # (poses, activations) -> 1x1 convolution, 10 output capsules
        # nets -> (poses (24, 10, 4, 4), activations (24, 10))
        nets = class_capsules(nets, num_classes, iterations=iterations,
                              batch_size=batch_size, name='class_capsules')
        # poses (24, 10, 4, 4), activations (24, 10)
        poses, activations = nets
    return poses, activations
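The thread has no accepted answer, but with EM routing the class-capsule activations returned above are themselves the per-class scores (the spread loss is computed on them), so a prediction can in principle be read off with an argmax. A sketch under those assumptions, where images is a placeholder for the input batch, my_image_batch is a hypothetical NumPy array shaped like the training images, and a checkpoint exists under ./log/train:
poses, activations = m_capsules.nets.capsules_net(images, num_classes=10, iterations=3,
                                                  batch_size=batch_size, name='capsules_em')
predicted_class = tf.argmax(activations, axis=-1)  # index of the most active class capsule

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('./log/train'))
    print(sess.run(predicted_class, feed_dict={images: my_image_batch}))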
