Apologies if this has been answered already but I couldn't find it.
I have a binary classification problem for which I have been using CrossEntropyLoss, which expects an input of shape (N, C) and a target of shape (N).
I have switched to using BCEWithLogitsLoss with pre-calculated class weights to address a label imbalance problem. The issue is that BCEWithLogitsLoss expects the input and target to have the same shape.
How would I go about reshaping my 1D target tensor, i.e. tensor([[0, 0, 0, ..., 1, 0, 0]]), to the shape of my inputs, which is (no. of X samples, 2)? I have tried .unsqueeze(1), but this gives me (#X, 1). To clarify, my input shape in the prior problem was [32, 2] with target shape [32], as per the shapes above, and I am now looking for [32, 2] for both the input and target dims.
I ended up figuring the problem out; I didn't realise there was built-in one-hot encoding functionality in PyTorch.
I used the line torch.nn.functional.one_hot(targets) which reshaped my target variable to torch.Size([32, 2]).
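For anyone else doing this, a minimal sketch of the full combination (the shapes match the question; the pos_weight values are made up for illustration, so compute them from your own class frequencies):

import torch
import torch.nn.functional as F

targets = torch.randint(0, 2, (32,))                    # target shape [32]
logits = torch.randn(32, 2)                             # input shape [32, 2]
targets_oh = F.one_hot(targets, num_classes=2).float()  # now [32, 2]

# pos_weight is illustrative only
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.0, 3.0]))
loss = criterion(logits, targets_oh)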
Related
I have a sequence labeling task.
So as input, I have a sequence of elements with shape [batch_size, sequence_length], where each element of the sequence should be assigned to some class.
And as the loss function during training of the neural net, I use cross-entropy.
How should I correctly use it?
My variable target_predictions has shape [batch_size, sequence_length, number_of_classes] and target has shape [batch_size, sequence_length].
The documentation says the input can have shape (N, C), or (N, C, d_1, d_2, ..., d_K) in the case of K-dimensional loss, with the target shaped accordingly.
I know that if I use CrossEntropyLoss(target_predictions.permute(0, 2, 1), target), everything will work fine. But I have concerns that torch is interpreting my sequence_length as the d_1 variable from the documentation and will think that it is a multidimensional loss, which is not the case.
How should I correctly do it?
Using CE Loss will give you a loss value rather than labels. By default the mean is taken, which is probably what you are after, and the snippet with permute will be fine (using this loss you can train your nn via backward).
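A minimal sketch with the shapes from the question (the sizes are made up), showing that the permuted call and an explicitly flattened one give the same mean loss:

import torch
import torch.nn as nn

batch_size, sequence_length, number_of_classes = 4, 10, 5
target_predictions = torch.randn(batch_size, sequence_length, number_of_classes)
target = torch.randint(0, number_of_classes, (batch_size, sequence_length))

criterion = nn.CrossEntropyLoss()  # reduction='mean' by default
loss_permuted = criterion(target_predictions.permute(0, 2, 1), target)
loss_flattened = criterion(target_predictions.reshape(-1, number_of_classes),
                           target.reshape(-1))
# Both values are identical: the K-dimensional form just averages over
# batch and sequence positions at once, so there is nothing to worry about.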
To get the predicted class, just take the argmax across the appropriate dimension; in the case without permutation it would be:
labels = torch.argmax(target_predictions, dim=-1)
This will give you a (batch, sequence_length) output containing the classes.
I'm a newbie in deep learning and confused about which data format convention should be used when. According to https://keras.io/backend/, there are two data format conventions:
channels_last for 2D data: (rows, cols, channels)
channels_first for 2D data: (channels, rows, cols)
Why is there a channels_first option in Keras? When should I use it? Is there any historical reason, like the BGR usage in OpenCV?
" BGR was a choice made for historical reasons and now we have to live with it. In other words, BGR is the horse’s ass in OpenCV."
https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/
I believe the reason there are two data formats is that Keras supports Theano as another backend too. In Theano, the first axis represents the channels.
TensorFlow
data_format accepts 2 values: channels_last (default) or channels_first.
It represents the ordering of the dimensions in the inputs.
channels_last corresponds to inputs with shape (batch_size, height, width, channels).
channels_first corresponds to inputs with shape (batch_size, channels, height, width).
Code: tf.keras.layers.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3), data_format='channels_last')
Note that input_shape does not include the batch size.
See the TensorFlow documentation for details.
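A small sketch contrasting the two conventions with the padding layer above (the shapes are illustrative):

import tensorflow as tf

x_last = tf.random.normal([8, 64, 64, 3])     # (batch, height, width, channels)
x_first = tf.transpose(x_last, [0, 3, 1, 2])  # (batch, channels, height, width)

pad_last = tf.keras.layers.ZeroPadding2D(padding=(3, 3), data_format='channels_last')
pad_first = tf.keras.layers.ZeroPadding2D(padding=(3, 3), data_format='channels_first')

print(pad_last(x_last).shape)    # (8, 70, 70, 3)
print(pad_first(x_first).shape)  # (8, 3, 70, 70)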
Is it possible to reshape a 512x512 RGB image to (timestep, dim)? In other words, I am trying to convert this reshape layer, Reshape((23, 3887)), to work with 512 instead of 299. Also, is there any documentation explaining how to determine input_dim and timestep for Keras?
It seems like your problem is similar to one that I had earlier today. Look at it here: Keras functional API: Combine CNN model with a RNN to look at sequences of images
Now, to add to the answer from the question I linked to: let number_of_images be n. In your case the original data format would be (n, 512, 512, 3). All you then need to do is decide how many images you want per sequence. Say you want sequences of 5 images and have 5000 images in total. Then reshaping to (1000, 5, 512, 512, 3) should do, as in the sketch below. This way the model sees 1000 sequences of 5 images.
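A quick sketch of that reshape, assuming NumPy arrays and the made-up numbers above:

import numpy as np

data = np.zeros((5000, 512, 512, 3), dtype=np.float32)  # n = 5000 images
sequences = data.reshape(1000, 5, 512, 512, 3)          # 1000 sequences of 5
print(sequences.shape)                                  # (1000, 5, 512, 512, 3)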
In TensorFlow, how do I convolve each image in a minibatch with a different 2D kernel? Each minibatch of images has size [10000, 32, 32] and the corresponding filter has size [10000, 2, 2], i.e. 10000 kernels, each 2 pixels by 2 pixels. I'd like to get output with size [10000, 31, 31]. (I plan to set the stride lengths all to 1 and to use the "VALID" option to turn off padding, so the output images would have size 31x31 while the input images have size 32x32.)
In a related question, the solution was to add a "depth" dimension to the minibatch of images and then to use conv3d rather than conv2d. But in that problem, the OP seemed content to get just one image back as output, rather than one image per sample in the minibatch.
Ah, the tf.nn.depthwise_conv2d function does exactly what I wanted. I don't think there was any way to use conv2d or conv3d for the task.
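For reference, a minimal sketch of that approach (shapes from the question; the trick is to move the batch axis into the channel axis, since depthwise_conv2d convolves each channel with its own filter):

import tensorflow as tf

images = tf.random.normal([10000, 32, 32])  # one 2D image per sample
kernels = tf.random.normal([10000, 2, 2])   # one 2x2 kernel per sample

x = tf.transpose(images, [1, 2, 0])[tf.newaxis, ...]   # [1, 32, 32, 10000]
f = tf.transpose(kernels, [1, 2, 0])[..., tf.newaxis]  # [2, 2, 10000, 1]

y = tf.nn.depthwise_conv2d(x, f, strides=[1, 1, 1, 1], padding='VALID')
out = tf.transpose(y[0], [2, 0, 1])         # back to [10000, 31, 31]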
This layer is not documented very well and I'm having a bit of trouble figuring out exactly how to use it.
I'm trying something like:
from keras.layers import Input, Convolution2D, Deconvolution2D

input_img = Input(shape=(1, h, w))
x = Convolution2D(16, 7, 7, activation='relu', border_mode='valid')(input_img)
d = Deconvolution2D(1, 7, 7, (None, 1, 2*h, 2*w))
x = d(x)
but when I inspect d.output_shape, I get the original shape of the image instead of twice that size (which is what I was expecting).
Any help will be greatly appreciated!
Short answer: you need to add subsample=(2,2) to Deconvolution2D if you wish the output to truly be twice as large as the input.
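In code, that would look something like this (Keras 1 API as in the question; h and w as defined there):

d = Deconvolution2D(1, 7, 7, output_shape=(None, 1, 2*h, 2*w),
                    subsample=(2, 2))  # stride 2 doubles the spatial size
x = d(x)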
Longer answer: Deconvolution2D is severely undocumented and you have to go through its code to understand how to use it.
First, you must understand how the deconvolution layer works (skip this if you already know all the details). Deconvolution, unlike what its name suggests, is simply applying the back-propagation (gradient calculation) step of a standard convolution layer to the input of the deconvolution layer. The "kernel size" of the deconvolution layer is actually the kernel size of the virtual convolution layer in the backprop step mentioned above.

Given the size of a convolution kernel and its stride, it is straightforward to compute the output shape of the convolution layer (assuming no padding, it's (input - kernel) // stride + 1), but the reverse is not true. In fact, there can be more than one possible input shape matching a given output shape of the convolution layer (because integer division isn't invertible). This means that for a deconvolution layer, the output shape cannot be determined directly from the input shape (which is implicitly known), the kernel size and the stride; this is why we need to know the output shape when we initialize the layer. Of course, because of the way the deconvolution layer is defined, some input shapes will produce holes in its output which are undefined, and if we forbid these cases then we actually can deduce the output shape.
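To make the ambiguity concrete, a quick check of the output-size formula with a kernel of 2 and a stride of 2 (the numbers are illustrative):

kernel, stride = 2, 2
for n in (4, 5):
    print(n, '->', (n - kernel) // stride + 1)  # 4 -> 2 and 5 -> 2
# Two different input sizes map to the same output size, so the
# corresponding deconvolution cannot infer its output shape uniquely.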
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used to determine the output shape of the layer; instead, Keras tries to deduce it from the input, the kernel size and the stride, assuming only valid output shapes are supplied (though this isn't checked in the code). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could have been determined by Keras from the given input shape, output shape and kernel size).