PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three dimensions correspond to the batch size, the maximum sequence length (with a 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one-dimensional. I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor is of the desired shape. So I think this works, but I would like to understand how PyTorch deals with this, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions (two 50s) along which the linear layer could be applied -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?

In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
To make this concrete: a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) can only be processed by a Linear module with in_features=m (in your case, m=128). So yes, the linear layer is always applied to the last dimension.
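As a quick check, a minimal sketch (shapes taken from the question) showing that nn.Linear only ever transforms the last dimension:

import torch
import torch.nn as nn

# nn.Linear transforms the last dimension only.
linear = nn.Linear(in_features=50, out_features=1)

x = torch.randn(32, 128, 50)   # (batch, seq_len, embedding_dim)
print(linear(x).shape)         # torch.Size([32, 128, 1])

# A (32, 50, 128) input fails: the last dim (128) != in_features (50).
# linear(torch.randn(32, 50, 128))  # RuntimeError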

Related

Pytorch semantic segmentation loss function

I'm new to segmentation models.
I would like to use the deeplabv3_resnet50 model.
My image has shape (256, 256, 3) and my label has shape (256, 256). Each pixel in my label has a class value (0-4), and the batch size set in the DataLoader is 32.
Therefore, the shape of my input batch is [32, 3, 256, 256] and the shape of corresponding target is [32, 256, 256]. I believe this is correct.
I was trying to use nn.BCEWithLogitsLoss().
Is this the correct loss function for my case? Or should I use CrossEntropy instead?
If this is the right one, the output of my model is [32, 5, 256, 256]. Each image prediction has the shape [5, 256, 256]; does layer 0 mean the unnormalized probabilities of class 0? In order to make a [32, 256, 256] tensor that matches the target to feed into BCEWithLogitsLoss, do I need to transform the unnormalized probabilities into classes?
If I should use CrossEntropy, what should the size of my output and label be?
Thank you everyone.
You are using the wrong loss function.
nn.BCEWithLogitsLoss() stands for Binary Cross-Entropy loss: that is, a loss for binary labels. In your case, you have 5 labels (0..4).
You should be using nn.CrossEntropyLoss: a loss designed for discrete labels, beyond the binary case.
Your model should output a tensor of shape [32, 5, 256, 256]: for each pixel in the 32 images of the batch, it should output a 5-dim vector of logits. The logits are the "raw" scores for each class, to be normalized later into class probabilities using the softmax function.
For numerical stability and computational efficiency, nn.CrossEntropyLoss does not require you to explicitly compute the softmax of the logits, but does it internally for you. As the documentation reads:
This criterion combines LogSoftmax and NLLLoss in one single class.
Given you are dealing with 5 classes, you should use CrossEntropyLoss. Binary cross-entropy, as the name suggests, is the loss function you use when you have a binary segmentation map.
The CrossEntropy function, in PyTorch, expects the output from your model to be of shape [batch, num_classes, H, W] (pass this directly to your loss function) and the ground truth to be of shape [batch, H, W], where H, W in your case is 256, 256. Also, please make sure the ground truth is of type long by calling .long() on the tensor.
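A minimal sketch of the expected shapes, with random tensors standing in for the deeplabv3_resnet50 output and the ground-truth labels:

import torch
import torch.nn as nn

batch, num_classes, H, W = 32, 5, 256, 256

logits = torch.randn(batch, num_classes, H, W)          # raw model scores
target = torch.randint(0, num_classes, (batch, H, W))   # class index per pixel

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target.long())  # ground truth must be long
print(loss.item())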

What exactly does tf.keras.layers.Dense do?

My question
I'm using Keras to build a convolutional neural network. I ran across the following:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(10*10*256, use_bias=False, input_shape=(100,)))
I'm curious - what exactly mathematically is going on here?
My best guess
My guess is that for input of size [100,N], the network will be evaluated N times, once for each training example. The Dense layer created by layers.Dense contains (10*10*256) * (100) parameters that will be updated during backpropagation.
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
Example:
# as first layer in a sequential model:
model = Sequential()
model.add(Dense(32, input_shape=(16,)))
# now the model will take as input arrays of shape (*, 16)
# and output arrays of shape (*, 32)
# after the first layer, you don't need to specify
# the size of the input anymore:
model.add(Dense(32))
Arguments:
> units: Positive integer, dimensionality of the output space.
> activation: Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
> use_bias: Boolean, whether the layer uses a bias vector.
> kernel_initializer: Initializer for the kernel weights matrix.
> bias_initializer: Initializer for the bias vector.
> kernel_regularizer: Regularizer function applied to the kernel weights matrix.
> bias_regularizer: Regularizer function applied to the bias vector.
> activity_regularizer: Regularizer function applied to the output of the layer (its "activation").
> kernel_constraint: Constraint function applied to the kernel weights matrix.
> bias_constraint: Constraint function applied to the bias vector.
Input shape:
N-D tensor with shape: (batch_size, ..., input_dim). The most common situation would be a 2D input with shape (batch_size, input_dim).
Output shape:
N-D tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units).
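To verify the parameter count guessed above, a minimal sketch (assuming a TF 2.x-style Keras):

import tensorflow as tf
from tensorflow.keras import layers

# With use_bias=False, Dense(10*10*256) on a 100-dim input has
# exactly 100 * 25600 trainable weights.
model = tf.keras.Sequential()
model.add(layers.Dense(10 * 10 * 256, use_bias=False, input_shape=(100,)))
model.summary()  # Total params: 2,560,000 == 100 * 10*10*256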

PyTorch: Convolving a single channel image using torch.nn.Conv2d

I am trying to use a convolution layer to convolve a grayscale (single layer) image (stored as a numpy array). Here is the code:
import numpy as np
import torch

conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=33)
tensor1 = torch.from_numpy(img_gray)
out_2d_np = conv1(tensor1)
out_2d_np = np.asarray(out_2d_np)
I want my kernel to be 33x33, and the number of output layers should be equal to the number of input layers, which is 1 since the image's RGB channels were summed into a single grayscale channel. When out_2d_np = conv1(tensor1) is run, it yields the following runtime error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 1 1 33 33, but got 2-dimensional input of size [246, 248] instead
Any idea on how I can solve this? I specifically want to use the torch.nn.Conv2d() class/function.
Thanks in advance for any help!
PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions: a mini-batch dim, a channel dim, and the two spatial dimensions.
Your input tensor has only the two spatial dimensions; it lacks the mini-batch and channel dimensions. In your case these two dimensions are actually singleton dimensions (dimensions with size=1).
try:
conv1(tensor1[None, None, ...])
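Putting it together, a minimal sketch (a random placeholder stands in for img_gray, with an explicit float32 conversion, since Conv2d needs floating-point input):

import numpy as np
import torch

img_gray = np.random.rand(246, 248).astype(np.float32)  # placeholder image

conv1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=33)

tensor1 = torch.from_numpy(img_gray)          # shape (246, 248)
tensor1 = tensor1[None, None, ...]            # shape (1, 1, 246, 248)
out_2d_np = conv1(tensor1).detach().numpy()   # shape (1, 1, 214, 216)
print(out_2d_np.shape)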

Does 1D Convolutional layer support variable sequence lengths?

I have a series of processed audio files I am using as input into a CNN using Keras. Does the Keras 1D Convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example, how should I input sequences of variable length into the network?
Let's say I have two examples (a and b) containing X one-dimensional vectors of length 100 that I want to feed into the Conv1D layer as input:
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Do I need to concatenate these tensors into c where
c.shape = (300, 100)
Then reshape it to be something like
c_reshape.shape = (3, 100, 100)
Where 3 is the batch size, the first 100 is the number of steps, and the second 100 is the input size? The documentation on the input shape is not very clear.
Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So, the 3D tensor with shape (batch_size, steps, input_dim) suits perfectly a model with input_shape=(steps, input_dim).
All you need to make this model accept variable lengths is use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation about variable lengths. You cannot create a numpy array with a shape that suits variable lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size, so you can put them into a numpy array of shape (batch_size, length, input_dim). Use Masking layers to ignore the dummy values (see the padding sketch after this list).
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your sequences by length into smaller arrays.
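A minimal numpy sketch of the padding option, using the shapes from the question:

import numpy as np

a = np.random.rand(100, 100)   # 100 steps, 100 features
b = np.random.rand(200, 100)   # 200 steps, 100 features

# Pad the shorter sequence with zeros so both examples fit
# in one array of shape (batch_size, length, input_dim).
max_len = max(len(a), len(b))
batch = np.zeros((2, max_len, 100), dtype=np.float32)
batch[0, :len(a)] = a
batch[1, :len(b)] = b
print(batch.shape)  # (2, 200, 100)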
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes you can't, for instance, use Flatten: if it were possible, the result of the flatten would have a variable size, and the Dense layers that follow would not be able to have a constant number of weights.
So, instead of Flatten, you should start using GlobalMaxPooling1D or GlobalAveragePooling1D layers.
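For instance, a minimal sketch (assuming 100-dim feature vectors as in the question) of a Conv1D model built this way:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Conv1D model that accepts a variable number of steps;
# GlobalMaxPooling1D collapses the variable dimension before Dense.
model = tf.keras.Sequential([
    layers.Conv1D(32, kernel_size=3, activation='relu',
                  input_shape=(None, 100)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Train on one sequence at a time, each with its own length (solution 2 above).
a = np.random.rand(1, 100, 100)   # 100 steps
b = np.random.rand(1, 200, 100)   # 200 steps
model.train_on_batch(a, np.array([[1.0]]))
model.train_on_batch(b, np.array([[0.0]]))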

How to handle variable shape bias in TensorFlow?

I was just modifying an LSTM network I had written to print out the test error. The issue, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input enters the LSTM cell, and I turn the output into a list of time_steps 2D tensors, with each 2D tensor having shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size], which has added to it a bias vector of shape [batch_size].
In words, I give the model N sequences, and I expect it to output a scalar for each time step for each sequence. That is, the output is a list of N vectors, one for each time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make its shape variable without raising an error.
I can add complete code if requested. (Code added below anyway.)
Thanks.
import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x API


def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
    weights = {
        'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
    }
    biases = {
        'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
    }

    lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    hidden_layer_outputs, states = tf.nn.dynamic_rnn(
        lstm_cell, inputs, initial_state=init_state, dtype=tf.float32)

    results = tf.squeeze(tf.stack(
        [tf.matmul(output, weights['out']) + biases['out']
         for output in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))],
        axis=1))

    return results
You want the biases to have a shape of (batch_size,).
For example (using tf.zeros instead of tf.constant, but with the same shape problem), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10,dtype=tf.float32))
print(biases.shape)
prints:
(10,)
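Note that a bias tied to a fixed batch_size will still fail when the batch size changes. One way to make it batch-independent, shown here as a hedged sketch in TF 2.x syntax (the question's code is TF 1.x), is to give the bias a single-element shape and let broadcasting handle the batch dimension:

import tensorflow as tf

# A bias of shape (1,) broadcasts against any batch size,
# so the model no longer depends on a fixed batch_size.
biases = tf.Variable(tf.zeros(1, dtype=tf.float32))

train_batch = tf.random.normal([13])   # training batch of 13
test_batch = tf.random.normal([400])   # full test set

print((train_batch + biases).shape)    # (13,)
print((test_batch + biases).shape)     # (400,)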
