Pooling over channels in pytorch - conv-neural-network

In tensorflow, I can pool over the depth dimension which would reduce the channels and leave the spatial dimensions unchanged. I'm trying to do the same in pytorch but the documentation seems to say pooling can only be done over the height and width dimensions. Is there a way I can pool over channels in pytorch?
I have a tensor of shape [1, 512, 50, 50], and I'm trying to use pooling to bring the number of channels down to 3.
I saw this question but did not find the answer helpful.

The easiest way to reduce the number of channels is using a 1x1 kernel:
import torch
x = torch.rand(1, 512, 50, 50)
conv = torch.nn.Conv2d(512, 3, 1)
y = conv(x)
print(y.size())
# torch.Size([1, 3, 50, 50])
If you really need to perform pooling along the channel dimension for some reason, you may want to permute the dimensions so that the channel dimension is swapped with some other dimension (e.g. width), as sketched below.
This idea was referenced here.
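A minimal sketch of that permute trick, using max pooling (the kernel size and stride of 170 are just one hypothetical choice that maps 512 channels down to 3):
import torch
x = torch.rand(1, 512, 50, 50)
n, c, h, w = x.shape
# Move channels last and flatten the spatial dims, so MaxPool1d
# (which pools over the last dimension) acts on the channels.
x_perm = x.permute(0, 2, 3, 1).reshape(n, h * w, c)  # [1, 2500, 512]
pool = torch.nn.MaxPool1d(kernel_size=170, stride=170)
y = pool(x_perm)  # [1, 2500, 3]
# Restore the [N, C, H, W] layout
y = y.reshape(n, h, w, -1).permute(0, 3, 1, 2)
print(y.size())
# torch.Size([1, 3, 50, 50])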

Related

Multivariate time series prediction - Conv1D and loss function - pytorch

I have a couple of questions.
I have data of the following shape:
(32, 64, 11)
where 32 is the batch size, 64 is the sequence length, and 11 is the number of features. Each sample is 64x11 and has a label of 0 or 1.
I’d like to predict when a sequence has a label of “1”.
I’m trying to use a simple architecture with
conv1D → ReLU → flatten → linear → sigmoid.
For the Conv1D, since this is multivariate time series prediction and each row in my data is one second, I think the number of in channels should be the number of features, since that way it will process all of the features concurrently. (There is nothing spatial in my data; it doesn't matter whether a column is at index 0 or 9, the way it matters for pixels in an image.)
I can't decide how to “initialize” the Conv1D parameters. Currently I think the number of in channels should be the number of features and not 1, for the reason I just explained, but I'm unsure of it.
Secondly, should the loss function be BCELoss or something else, given that my labels are 0 or 1 and I want the model to output the probability of belonging to the class with label 1?
Thanks a lot.
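No answer is attached here, but a minimal sketch of the described architecture, under the asker's own assumption that in_channels equals the number of features (the kernel size and filter count below are arbitrary illustrative choices), might look like this:
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    def __init__(self, n_features=11, seq_len=64, n_filters=32, kernel_size=3):
        super().__init__()
        # Conv1d expects (batch, channels, length), so channels = features
        self.conv = nn.Conv1d(n_features, n_filters, kernel_size)
        conv_out_len = seq_len - kernel_size + 1
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(n_filters * conv_out_len, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x arrives as (batch, seq_len, features); Conv1d wants features first
        return self.head(self.conv(x.permute(0, 2, 1)))

model = SequenceClassifier()
x = torch.rand(32, 64, 11)
labels = torch.randint(0, 2, (32, 1)).float()
probs = model(x)                      # (32, 1) probability of label 1
loss = nn.BCELoss()(probs, labels)    # BCELoss pairs with a sigmoid output
An alternative is to drop the final Sigmoid and use nn.BCEWithLogitsLoss instead, which is numerically more stable.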

Which data format convention in Keras (channels_last or channels_first) should be used when?

I'm a newbie in deep learning and I'm confused about which data format convention should be used when. According to https://keras.io/backend/, there are two data format conventions:
channels_last for 2D data: (rows, cols, channels)
channels_first for 2D data: (channels, rows, cols)
Why is there a channels_first option in Keras? When should I use it? Is there a historical reason, like the BGR usage in OpenCV?
" BGR was a choice made for historical reasons and now we have to live with it. In other words, BGR is the horse’s ass in OpenCV."
https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/
I believe the reason that there are two data formats is that Keras also supports Theano as another backend. In Theano, the first axis represents the channels.
TensorFlow
data_format accepts two values: channels_last (the default) or channels_first.
It represents the ordering of the dimensions in the inputs.
channels_last corresponds to inputs with shape (batch_size, height, width, channels).
channels_first corresponds to inputs with shape (batch_size, channels, height, width).
Code: tf.keras.layers.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3), data_format='channels_last')
Note that input_shape does not include the batch size.
See the TensorFlow documentation.
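As a rough illustration of the difference between the two conventions (the array contents are arbitrary), converting from one to the other is just a transpose:
import numpy as np
x_last = np.zeros((8, 64, 64, 3))             # channels_last: (batch, height, width, channels)
x_first = np.transpose(x_last, (0, 3, 1, 2))  # channels_first: (batch, channels, height, width)
print(x_first.shape)
# (8, 3, 64, 64)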

CNN-LSTM Image Classification

Is it possible to reshape a 512x512 RGB image to (timestep, dim)? In other words, I am trying to convert this reshape layer, Reshape((23, 3887)), to work with 512 instead of 299. Also, is there any documentation explaining how to determine input_dim and timestep for Keras?
It seems like your problem is similar to one that I had earlier today. Look at it here: Keras functional API: Combine CNN model with a RNN to look at sequences of images
Now, to add to the answer from the question I linked to: let number_of_images be n. In your case the original data format would be (n, 512, 512, 3). All you then need to do is decide how many images you want per sequence. Say you want sequences of 5 images and have 5000 images in total. Then reshaping to (1000, 5, 512, 512, 3) should do it, as sketched below. This way the model sees 1000 sequences of 5 images.
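A minimal sketch of that reshape, assuming 5000 images of shape 512x512x3 grouped into sequences of 5:
import numpy as np
images = np.zeros((5000, 512, 512, 3))   # (n, height, width, channels)
sequences = images.reshape(-1, 5, 512, 512, 3)
print(sequences.shape)
# (1000, 5, 512, 512, 3)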

In 'ResNet50 model for Keras', why use 1x1 convolution with stride = 2?

ResNet50 is here: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
In a 'conv_block', the first layer is like this:
x = Conv2D(filters=64,          # number of filters
           kernel_size=(1, 1),  # height/width of filters
           strides=(2, 2)       # stride
           )(input_tensor)
My question is:
Isn't this layer going to miss some pixels?
These 1x1 convolutions only look at 1 pixel and then move 2 pixels (stride=2).
It was mentioned in the original ResNet paper:
"The convolutional layers mostly have 3×3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2."
So you may consider it a replacement for a pooling layer, and it also reduces the computational complexity of the whole model compared to computing the whole activation map and then pooling it.
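To make that concrete, here is a small sketch (a hypothetical toy example, not code from resnet50.py) showing that a 1x1 convolution with stride 2 is equivalent to running the same convolution with stride 1 and then keeping every other row and column, so the skipped pixels simply never contribute to the output:
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 8, 8, 4).astype("float32")

conv = tf.keras.layers.Conv2D(16, (1, 1), strides=(2, 2), use_bias=False)
y_strided = conv(x)  # shape (1, 4, 4, 16)

# Same kernel with stride 1, then subsample every other row/column
conv_dense = tf.keras.layers.Conv2D(16, (1, 1), strides=(1, 1), use_bias=False)
conv_dense.build(x.shape)
conv_dense.set_weights(conv.get_weights())
y_dense = conv_dense(x)[:, ::2, ::2, :]

print(np.allclose(y_strided, y_dense))
# True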

Tensorflow: Convolve each image with a different kernel

In TensorFlow, how do I convolve each image in a minibatch with a different 2D kernel? Each minibatch of images has size [10000, 32, 32] and the corresponding filter has size [10000, 2, 2]---10000 kernels, each 2 pixels x 2 pixels. I'd like to get output with size [10000, 31, 31]. (I plan to set the stride lengths all to 1 and to use the "VALID" option to turn off padding, so the output images would have size 31x31 while the input images have size 32x32.)
In a related question, the solution was to add a "depth" dimension to the minibatch of images, and then to use conv3d rather than conv2d. But in that problem, the op seemed content to get just one image back as output, rather than one image as output for each sample in the minibatch.
Ah, the tf.nn.depthwise_conv2d function does exactly what I wanted. I don't think there was any way to use conv2d or conv3d for the task.
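A rough sketch of how that can be done (the reshaping here is my own illustration, not from the question): fold the batch into the channel dimension, so that depthwise_conv2d applies one kernel per "channel", i.e. per image:
import tensorflow as tf

images = tf.random.normal([10000, 32, 32])   # one 32x32 image per sample
kernels = tf.random.normal([10000, 2, 2])    # one 2x2 kernel per sample

# Fold the batch into the channel dimension: [1, 32, 32, 10000]
x = tf.transpose(images, [1, 2, 0])[tf.newaxis, ...]

# depthwise_conv2d expects filters of shape [height, width, in_channels, channel_multiplier]
f = tf.transpose(kernels, [1, 2, 0])[..., tf.newaxis]  # [2, 2, 10000, 1]

y = tf.nn.depthwise_conv2d(x, f, strides=[1, 1, 1, 1], padding="VALID")
y = tf.transpose(y[0], [2, 0, 1])  # back to [10000, 31, 31]
print(y.shape)
# (10000, 31, 31)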
