In TensorFlow, how do I convolve each image in a minibatch with a different 2D kernel? Each minibatch of images has size [10000, 32, 32] and the corresponding filter has size [10000, 2, 2]---10000 kernels, each 2 pixels x 2 pixels. I'd like to get output with size [10000, 31, 31]. (I plan to set the stride lengths all to 1 and to use the "VALID" option to turn off padding, so the output images would have size 31x31 while the input images have size 32x32.)
In a related question, the solution was to add a "depth" dimension to the minibatch of images, and then to use conv3d rather than conv2d. But in that problem, the OP seemed content to get just one image back as output, rather than one output image per sample in the minibatch.
Ah, the tf.nn.depthwise_conv2d function does exactly what I wanted. I don't think there was any way to use conv2d or conv3d for the task.
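For reference, here is a minimal sketch of the depthwise_conv2d trick (the shapes come from the question; the transposes and variable names are my own illustration): each image is moved into the channel dimension so that depthwise_conv2d applies one filter per "channel".

```python
import tensorflow as tf

# Hypothetical data matching the question: 10000 images of 32x32 and
# 10000 kernels of 2x2, one kernel per image.
images = tf.random.normal([10000, 32, 32])
kernels = tf.random.normal([10000, 2, 2])

# Move the batch dimension into the channel dimension so each "channel"
# is one image, then let depthwise_conv2d apply one filter per channel.
x = tf.transpose(images, [1, 2, 0])[tf.newaxis, ...]   # [1, 32, 32, 10000]
w = tf.transpose(kernels, [1, 2, 0])[..., tf.newaxis]  # [2, 2, 10000, 1]

y = tf.nn.depthwise_conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID")
# y has shape [1, 31, 31, 10000]; move the batch dimension back to the front.
out = tf.transpose(y[0], [2, 0, 1])                     # [10000, 31, 31]
```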
Apologies if this has been answered already but I couldn't find it.
I have a binary classification problem for which I have been using CrossEntropyLoss, which expects the following input and target tensor dimensions:
I have switched to using BCEWithLogitsLoss with pre-calculated class weights to address a label imbalance problem. The issue is BCEWithLogitsLoss expects the following dimensions:
How would I go about reshaping my 1D tensor, i.e. tensor([[0, 0, 0, ..., 1, 0, 0]]), to the shape of my inputs, which is (no. of X samples, 2)? I have tried .unsqueeze(1), but this gives me (#X, 1). To clarify, my input shape in the prior setup was [32, 2] with target shape [32], as per the documentation above, and I am now looking for [32, 2] for both input and target dims.
I ended up figuring the problem out; I didn't realise there was built-in one-hot encoding functionality in PyTorch.
I used the line torch.nn.functional.one_hot(targets) which reshaped my target variable to torch.Size([32, 2]).
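For anyone else hitting this, here is a minimal sketch of that approach (the tensor contents below are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of 32 binary class indices, e.g. tensor([0, 0, 1, ...]).
targets = torch.randint(0, 2, (32,))

# One-hot encode to match the (batch, num_classes) shape expected by
# BCEWithLogitsLoss; cast to float since the loss expects float targets.
one_hot_targets = F.one_hot(targets, num_classes=2).float()  # torch.Size([32, 2])

logits = torch.randn(32, 2)  # model outputs, i.e. the "input" to the loss
loss = torch.nn.BCEWithLogitsLoss()(logits, one_hot_targets)
```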
I have a network where a vector of size [1,28] is generated for every input in a batch (of size 128). Now I want to access the values in these vectors for the whole batch, so I want to return all 28 values for the 128 inputs in the batch. In other words, I want to loop over a tensor of size [128, 1, 28]. I could use multiple for-loops, but is there a nice pythonic way to achieve this?
Edit:
I'm asking because I eventually want to compute the gradients of each of the z_i in the vector z with respect to one of my network layer's outputs. I'm working with a variational autoencoder, where the encoder network outputs a mean and variance which is reparameterized to a latent vector z of size [1, 28]. Now, given this vector z, I want to compute gradients for each element z_i with respect to the output of an earlier layer in the encoder. I hope this clarifies things!
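For what it's worth, one straightforward (if not maximally pythonic) way to get such per-element gradients is to loop over the latent dimension and call torch.autograd.grad once per element. The sketch below uses made-up stand-ins for the encoder output and latent vector, just to show the shape of the loop:

```python
import torch

# Hypothetical stand-ins: h plays the role of an earlier layer's output,
# z the reparameterized latent vector of shape [1, 28].
h = torch.randn(1, 64, requires_grad=True)
z = torch.tanh(h @ torch.randn(64, 28))  # [1, 28]

grads = []
for i in range(z.shape[1]):
    # Gradient of the scalar z[0, i] with respect to h; retain the graph
    # so the next iteration can reuse it.
    g, = torch.autograd.grad(z[0, i], h, retain_graph=True)
    grads.append(g)

jacobian = torch.stack(grads)  # [28, 1, 64]: one gradient per latent element
```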
I have a sequence of multi-band images; each sample is a tensor of size (50, 6, 30, 30), where 50 is the number of image frames in the sequence, 6 is the number of bands per pixel, and 30x30 is the spatial dimension of the image. The ground-truth map is 30x30, but it is one-hot encoded (to use cross-entropy loss) to 7 classes, so it is a tensor of size (1, 7, 30, 30). I want to use a combination of convolutional and LSTM layers (or an integrated ConvLSTM2D layer) for my classification task, but there are the following problems:
1- Not every point has a valid label in the output map (i.e. some one-hot vectors are all zeros).
2- Not every pixel has a valid value at every time stamp. So, at any given time stamp, some pixels may have a zero value (meaning invalid) for all of their bands.
I read many Q&As on how to handle this issue, and I think I should use the sample_weight option to mask the invalid points and classes, but I am really uncertain how to do it. Sample weights should be applied to every pixel and every time stamp independently. I think I could manage it if I didn't have the convolution part (a purely 2D approach), but I don't understand how it works when convolution is in place, because some pixel values in the convolution window are valid and some are invalid. If I mask those invalid pixels at a specific time (and I still don't know how to do that), what will happen to the chain of forward and backward propagation and the loss calculation? I think it will be ruined!
Looking for comments and help.
Possible solution:
Problem 1 - For pixels that do not have any class at all, you can introduce a new class, labelled for example "noise". That means your one-hot encoding has a value for that class as well, and weights will be generated accordingly for those pixels under the noise class. This is an indirect way of achieving the same thing you would do with sample weights, because with the sample_weight technique you tell Keras (or scikit-learn) what the weighting of each parameter or sample is.

Problem 2 - Consider the possible use cases: for these invalid values, will a class value be present in the one-hot vector, or will it be all zeros? Alternatively, you can preprocess the data and assign these pixels to the noise class as well; then problem 2 is handled by problem 1 automatically. A rough sketch of that preprocessing is shown below.
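Here is a rough sketch of the preprocessing suggested for problem 1 (the array shapes follow the question; everything else is illustrative):

```python
import numpy as np

# Hypothetical ground-truth tensor from the question: (1, 7, 30, 30) one-hot,
# where an all-zero vector along the class axis means "no valid label".
y = np.zeros((1, 7, 30, 30), dtype=np.float32)

# Add an 8th "noise" class and activate it wherever no real class is present,
# so that every pixel ends up with exactly one active class.
no_label = y.sum(axis=1, keepdims=True) == 0              # (1, 1, 30, 30) mask
noise_channel = no_label.astype(np.float32)
y_with_noise = np.concatenate([y, noise_channel], axis=1)  # (1, 8, 30, 30)
```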
I'm a newbie in deep learning and confused about which data format convention should be used when. According to https://keras.io/backend/, there are two data format conventions.
channels_last for 2D data: (rows, cols, channels)
channels_first for 2D data: (channels, rows, cols)
Why is there a channels_first option in Keras? When should I use it? Is there any historical reason, like the BGR convention in OpenCV?
" BGR was a choice made for historical reasons and now we have to live with it. In other words, BGR is the horse’s ass in OpenCV."
https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/
I believe the reason there are two data formats is that Keras also supports Theano as a backend. In Theano, the first axis represents the channels.
TensorFlow
data_format accepts two values: channels_last (default) or channels_first.
It represents the ordering of the dimensions in the inputs.
channels_last corresponds to inputs with shape (batch_size, height, width, channels)
channels_first corresponds to inputs with shape (batch_size, channels, height, width).
Code: tf.keras.layers.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3), data_format='channels_last')
Note that input_shape does not include the batch size.
See the TensorFlow documentation for details.
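A small illustrative comparison of the two conventions, using ZeroPadding2D as in the snippet above (the shapes are assumptions, not from the original post):

```python
import tensorflow as tf

# channels_last: (batch, height, width, channels); channels_first: (batch, channels, height, width).
x_last = tf.random.normal([8, 64, 64, 3])
x_first = tf.random.normal([8, 3, 64, 64])

pad_last = tf.keras.layers.ZeroPadding2D(padding=(3, 3), data_format="channels_last")
pad_first = tf.keras.layers.ZeroPadding2D(padding=(3, 3), data_format="channels_first")

print(pad_last(x_last).shape)    # (8, 70, 70, 3)  - spatial dims padded, channels last
print(pad_first(x_first).shape)  # (8, 3, 70, 70)  - spatial dims padded, channels first
```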
Is it possible to reshape a 512x512 RGB image to (timestep, dim)? In other words, I am trying to convert this reshape layer, Reshape((23, 3887)), to work with 512x512 inputs instead of 299x299. Also, is there any documentation explaining how to determine input_dim and timestep for Keras?
It seems like your problem is similar to one that I had earlier today. Look at it here: Keras functional API: Combine CNN model with a RNN to look at sequences of images
Now, to add to the answer from the question I linked to: let the number of images be n. In your case the original data format would be (n, 512, 512, 3). All you then need to do is decide how many images you want per sequence. Say you want sequences of 5 images and have 5000 images in total; then reshaping to (1000, 5, 512, 512, 3) should do. This way the model sees 1000 sequences of 5 images. A minimal reshape sketch is below.
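A minimal sketch of that reshape, using a smaller illustrative batch (20 images instead of 5000) so it runs cheaply:

```python
import numpy as np

# Stand-in for the image stack: (n, 512, 512, 3) with n = 20 here.
images = np.zeros((20, 512, 512, 3), dtype=np.float32)

sequence_length = 5
# Group consecutive frames into sequences of 5.
sequences = images.reshape(-1, sequence_length, 512, 512, 3)
print(sequences.shape)  # (4, 5, 512, 512, 3); with 5000 images this would be (1000, 5, 512, 512, 3)
```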