How do I change the number of input channels in the torchvision ConvNeXt model? I am working with grayscale images and want 1 input channel instead of 3.
import torch
from torchvision.models.convnext import ConvNeXt, CNBlockConfig
# this is the given configuration for the 'tiny' model
block_setting = [
CNBlockConfig(96, 192, 3),
CNBlockConfig(192, 384, 3),
CNBlockConfig(384, 768, 9),
CNBlockConfig(768, None, 3),
]
model = ConvNeXt(block_setting)
# my sample image (N, C, H, W) = (16, 1, 50, 50)
im = torch.randn(16, 1, 50, 50)
# forward pass
model(im)
output:
RuntimeError: Given groups=1, weight of size [96, 3, 4, 4], expected input[16, 1, 50, 50] to have 3 channels, but got 1 channels instead
However, if I change my input shape to (16, 3, 50, 50) it seems to work fine.
The torchvision source code seems to be based on the original GitHub implementation, but where do I specify in_chans with the torchvision interface?
You can replace the input layer entirely. model._modules["features"][0][0] is
nn.Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
so you only need to change its in_channels (with import torch.nn as nn):
>>> model._modules["features"][0][0] = nn.Conv2d(1, 96, kernel_size=(4, 4), stride=(4, 4))
>>> model(im)
tensor([[-0.4854, -0.1925, 0.1051, ..., -0.2310, -0.8830, -0.0251],
[ 0.3332, -0.4205, -0.3007, ..., 0.8530, 0.1429, -0.3819],
[ 0.1794, -0.7546, -0.7835, ..., -0.8072, -0.0972, 0.7413],
...,
[ 0.1356, 0.0868, 0.6135, ..., -0.1382, -0.2001, 0.2415],
[-0.1612, -0.4812, 0.1271, ..., -0.6594, 0.2706, 1.0833],
[ 0.0243, -0.5039, -0.4086, ..., 0.4233, 0.0389, 0.2787]],
grad_fn=<AddmmBackward0>)
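If you want to start from the pretrained convnext_tiny weights rather than a randomly initialized ConvNeXt, a common trick is to collapse the pretrained RGB stem filters into a single grayscale filter. Below is a minimal sketch of that idea, assuming torchvision >= 0.13 (for the weights argument); it is not the only way to do this:

import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

model = convnext_tiny(weights="IMAGENET1K_V1")   # pretrained RGB model

old_conv = model.features[0][0]                  # stem: Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
new_conv = nn.Conv2d(1, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride)

with torch.no_grad():
    # sum the RGB filters into one grayscale filter so the activation scale stays similar
    new_conv.weight.copy_(old_conv.weight.sum(dim=1, keepdim=True))
    new_conv.bias.copy_(old_conv.bias)

model.features[0][0] = new_conv
out = model(torch.randn(16, 1, 50, 50))          # now accepts 1-channel input

Summing over the channel dimension makes a grayscale input behave like the same image replicated to three channels, since w_r*x + w_g*x + w_b*x = (w_r + w_g + w_b)*x.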
I want to experiment with creating a modified loss function for 4-channel image data.
What is the best way to split torch.Size([64, 4, 128, 128])
into
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
You can either slice the second axis and extract two tensors:
>>> a, b = x[:, :3], x[:, 3:]
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
Alternatively, you can apply torch.split along the channel dimension (dim=1):
>>> a, b = x.split(3, dim=1)
>>> a.shape, b.shape
(torch.Size([64, 3, 128, 128]), torch.Size([64, 1, 128, 128]))
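Note that both approaches return views into the same storage rather than copies, so in-place edits propagate back to x; call .clone() if you need independent tensors. A small sketch to check this, assuming x is the (64, 4, 128, 128) tensor above:

import torch

x = torch.randn(64, 4, 128, 128)

a, b = x[:, :3], x[:, 3:]     # slicing returns views, no copy
c, d = x.split(3, dim=1)      # split also returns views

print(torch.equal(a, c), torch.equal(b, d))  # True True
b.zero_()                     # in-place edit through the view...
print(x[:, 3:].abs().sum())   # ...is visible in x: tensor(0.)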
I was able to resolve this myself by using the split function.
Given an image-based tensor like torch.Size([64, 4, 128, 128]),
you can split on dim 1 with a fixed split size:
self.E1 = torch.split(self.E, 3, 1)
print(self.E1[0].shape)
print(self.E1[1].shape)
Gives:
torch.Size([64, 3, 128, 128])
torch.Size([64, 1, 128, 128])
How does output_padding work in nn.ConvTranspose2d? Please help me understand this:
nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2, padding=1, output_padding=1)
According to the documentation (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html), when applying a Conv2d operation with stride > 1, several input sizes can map to the same output size. For example, 7x7 and 8x8 inputs both produce a 3x3 output with stride=2:
import torch
conv_inp1 = torch.rand(1,1,7,7)
conv_inp2 = torch.rand(1,1,8,8)
conv1 = torch.nn.Conv2d(1, 1, kernel_size = 3, stride = 2)
out1 = conv1(conv_inp1)
out2 = conv1(conv_inp2)
print(out1.shape) # torch.Size([1, 1, 3, 3])
print(out2.shape) # torch.Size([1, 1, 3, 3])
When applying the transpose convolution, it is therefore ambiguous which output shape to return, 7x7 or 8x8, for a stride-2 transpose convolution. output_padding resolves this ambiguity by telling PyTorch which of the possible output shapes to produce. Note that it doesn't actually pad zeros onto the output; it is just a way to determine the output shape and apply the transpose convolution accordingly.
conv_t1 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2)
conv_t2 = torch.nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, output_padding=1)
transposed1 = conv_t1(out1)
transposed2 = conv_t2(out2)
print(transposed1.shape) # torch.Size([1, 1, 7, 7])
print(transposed2.shape) # torch.Size([1, 1, 8, 8])
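The same behaviour follows from the output-size formula in the ConvTranspose2d docs, H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The small helper below (just an illustration, not part of PyTorch) checks the shapes above against that formula:

def convtranspose2d_out(h_in, kernel_size, stride, padding=0, output_padding=0, dilation=1):
    # output-size formula from the ConvTranspose2d documentation
    return (h_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

print(convtranspose2d_out(3, kernel_size=3, stride=2))                    # 7
print(convtranspose2d_out(3, kernel_size=3, stride=2, output_padding=1))  # 8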
I have 10,000 RGB images in an ndarray of shape (10000, 32, 32, 3).
I'd like to efficiently compress the images (take the means of the colors) down to 2x2, 4x4, etc. using numpy. The only idea I've got so far is to manually split the images, compress them, and put the pieces back together inside loops. Is there a more elegant solution?
You could do something like this, using scipy.ndimage.zoom:
import numpy as np
import scipy.ndimage as si
def resample(img, dims):
    orig = img.shape[1]
    new_imgs = []
    for dim in dims:
        factor = dim / orig
        new_img = si.zoom(img, zoom=[1, factor, factor, 1])
        new_imgs.append(new_img)
    return new_imgs
For example, with random data:
>>> img = np.random.random((100, 32, 32, 3))
>>> new_imgs = resample(img, dims=[2, 4, 8, 16, 32])
>>> [img.shape for img in new_imgs]
[(100, 2, 2, 3),
(100, 4, 4, 3),
(100, 8, 8, 3),
(100, 16, 16, 3),
(100, 32, 32, 3)]
Note that you might need to adjust the mode (and order) parameters of zoom; by default it uses spline interpolation rather than an exact block mean.
You can use scikit-image's view_as_blocks together with np.mean():
import numpy as np
import skimage.util
images = np.random.rand(10000, 32, 32, 3)
images_rescaled = skimage.util.view_as_blocks(images, (1, 4, 4, 1)).mean(axis=(-2, -3)).squeeze()
images_rescaled.shape
# (10000, 8, 8, 3)
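Since the question asks for plain numpy, the same block mean can also be done with a single reshape and no extra dependency. A minimal sketch, assuming the target size divides the 32-pixel edge evenly (block_mean is an illustrative helper):

import numpy as np

def block_mean(images, new_size):
    # downsample (N, H, W, C) images to (N, new_size, new_size, C) by averaging square blocks
    n, h, w, c = images.shape
    f = h // new_size  # block edge length; assumes h and w are divisible by new_size
    return images.reshape(n, new_size, f, new_size, f, c).mean(axis=(2, 4))

images = np.random.rand(10000, 32, 32, 3)
print(block_mean(images, 4).shape)  # (10000, 4, 4, 3)
print(block_mean(images, 2).shape)  # (10000, 2, 2, 3)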
I am trying to make a classifier based on a capsule network. I am able to train the network, but I have a problem making a prediction. What should my code look like to make a prediction given an arbitrary image (I need an example)? One important thing is that I'm using EM routing. In dynamic routing it's enough to compute the length of the vectors in the last capsule layer to get the predicted class, but how does this work in EM routing?
Here is my code:
poses, activations = m_capsules.nets.capsules_net(images, num_classes=10, iterations=3,
                                                  batch_size=batch_size, name='capsules_em')
global_step = tf.train.get_or_create_global_step()
loss = m_capsules.nets.spread_loss(
    labels, activations, iterations_per_epoch, global_step, name='spread_loss'
)
tf.summary.scalar('losses/spread_loss', loss)
optimizer = tf.train.AdamOptimizer(learning_rate=0.0005)
train_tensor = slim.learning.create_train_op(
    loss, optimizer, global_step=global_step, clip_gradient_norm=4.0
)
slim.learning.train(
    train_tensor,
    logdir="./log/train",
    log_every_n_steps=1,
    save_summaries_secs=60,
    saver=tf.train.Saver(max_to_keep=2),
    save_interval_secs=600,
)
So far I have been trying to write an estimator, but I'm having trouble. Below is the code:
mnist_classifier = tf.estimator.Estimator(model_fn=m_capsules.nets.capsules_net, model_dir=dir)
prediction = mnist_classifier.predict(input_fn=words_input_fn)
And my model looks like:
def capsules_net(inputs, num_classes, iterations, batch_size, name='ocr-caps'):
    """Define the Capsule Network model"""
    with tf.variable_scope(name) as scope:
        # ReLU Conv1
        # Images shape (24, 28, 28, 1) -> conv 5x5 filters, 26 output channels, stride 2 with padding, ReLU
        # nets -> (?, 14, 14, 26)
        nets = conv2d(
            inputs,
            kernel=5, out_channels=26, stride=2, padding='SAME',
            activation_fn=tf.nn.relu, name='relu_conv1'
        )
        # PrimaryCaps
        # (?, 14, 14, 26) -> capsule 1x1 filter, 26 output capsules, stride 1 without padding
        # nets -> (poses (?, 14, 14, 26, 4, 4), activations (?, 14, 14, 26))
        nets = primary_caps(
            nets,
            kernel_size=1, out_capsules=26, stride=1, padding='VALID',
            pose_shape=[4, 4], name='primary_caps'
        )
        # ConvCaps1
        # (poses, activations) -> conv capsule, 3x3 kernels, stride 2, no padding
        # nets -> (poses (24, 6, 6, 26, 4, 4), activations (24, 6, 6, 26))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 2, 2, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps1'
        )
        # ConvCaps2
        # (poses, activations) -> conv capsule, 3x3 kernels, stride 1, no padding
        # nets -> (poses (24, 4, 4, 26, 4, 4), activations (24, 4, 4, 26))
        nets = conv_capsule(
            nets, shape=[3, 3, 26, 26], strides=[1, 1, 1, 1], iterations=iterations,
            batch_size=batch_size, name='conv_caps2'
        )
        # Class capsules
        # (poses, activations) -> 1x1 convolution, 10 output capsules
        # nets -> (poses (24, 10, 4, 4), activations (24, 10))
        nets = class_capsules(nets, num_classes, iterations=iterations,
                              batch_size=batch_size, name='class_capsules')
        # poses (24, 10, 4, 4), activations (24, 10)
        poses, activations = nets
        return poses, activations
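In EM routing the class capsules already output an activation (a probability-like scalar) per class, so the predicted class is just the argmax over the activations; there is no vector length to compute as in dynamic routing. A hedged sketch in the same TF1 style as the code above, assuming the same capsules_net graph and the checkpoints written to ./log/train during training (eval_images and my_batch are names made up for illustration):

# rebuild the same graph for inference (same scope name so the checkpoint variables match)
eval_images = tf.placeholder(tf.float32, shape=(batch_size, 28, 28, 1))
poses, activations = m_capsules.nets.capsules_net(
    eval_images, num_classes=10, iterations=3,
    batch_size=batch_size, name='capsules_em'
)
# with EM routing, activations has shape (batch_size, 10): one activation per class
predictions = tf.argmax(activations, axis=1)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint("./log/train"))
    pred = sess.run(predictions, feed_dict={eval_images: my_batch})  # my_batch: your preprocessed images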
I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states: "The first and last stride must always be 1, because the first is for the image-number and the last is for the input-channel."
It turns out this really is a bug; see https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
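If the goal is to downsample both spatially and across channels, one workaround that stays on the supported path is to pool spatially first (channel stride 1) and then reduce over channel pairs with a reshape. A sketch of that idea in the same TF1 style (spatial_then_channel_pool is an illustrative helper, not a TensorFlow API):

import tensorflow as tf

def spatial_then_channel_pool(x, pool=2):
    # 2x2 spatial max pooling with the channel stride kept at 1 (the supported case)
    x = tf.nn.max_pool(x, ksize=(1, pool, pool, 1), strides=(1, pool, pool, 1), padding='SAME')
    n, h, w, c = x.shape.as_list()
    # group adjacent channels in pairs and take the max within each pair
    x = tf.reshape(x, (n, h, w, c // pool, pool))
    return tf.reduce_max(x, axis=-1)

x = tf.get_variable('x2', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
y = spatial_then_channel_pool(x)
print(y.shape)  # (100, 16, 16, 32)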