PyTorch: Convolving a single channel image using torch.nn.Conv2d - pytorch

I am trying to use a convolution layer to convolve a grayscale (single layer) image (stored as a numpy array). Here is the code:
conv1 = torch.nn.Conv2d(in_channels = 1, out_channels = 1, kernel_size = 33)
tensor1 = torch.from_numpy(img_gray)
out_2d_np = conv1(tensor1)
out_2d_np = np.asarray(out_2d_np)
I want my kernel to be 33x33 and the number of output layers should be equal to the number of input layers, which is 1 as the image's RGB channels are summed. Whenout_2d_np = conv1(tensor1) is run it yields the following runtime error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 1 1 33 33, but got 2-dimensional input of size [246, 248] instead
Any idea on how I can solve this? I specifically want to use the torch.nn.Conv2d() class/function.
Thanks in advance for any help!

pytorch's Conv2d expects its 2D inputs to actually have 4 dimensions: mini-batch dim, channel dim, and the two spatial dimensions.
Your input tensor has only two spatial dimensions and it lacks the mini-batch and channel dimensions. In your case these two dimensions are actually singelton dimensions (dimensions with size=1).
try:
conv1(tensor1[None, None, ...])

Related

PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three indices in my size correspond to number of batches, maximum sequence length (with 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one dimensional. I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor is of the desired shape. So I think this works, but I would like to understand how PyTorch deals with this issue, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions along which the linear layer could be applied to (two 50s) -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?
In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
To understand this better, for a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) can be processed by a Linear module with in_features=m (in your case 128).

why do we reshape grayscale image to (x,y,1)?

I noticed that when training CNN with a grayscale image. The dimensions of the image is reshaped to (x,y,1). I thought that this shouldn't be necessary but when i try with shape (x,y). I get an error
ValueError: Input 0 of layer conv2d is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 28, 28]
As i understand the only reason we are doing this because keras implemented this way. Or is there any other reason for this?
The input shape of of Conv2D layer in keras is: batch_size + (rows, cols, channels). So, the layer expects number of channels as the final input shape which is 1 for grayscale image. For RGB images this would be 3.

Weird output for weights/filters in CNN

My task is to visualize the plotted weights in a cnn layer, now when I passed parameters, filters = 32 and kernel_size = (3, 3), I am expecting the output to be 32 matrices each of 3x3 size by using .get_weights() function(to extract weights and biases), but I am getting a very weird nested output,
the output is as follows:
a = model.layers[0].get_weights()
a[0][0][0]
array([[ 2.87332404e-02, -2.80513391e-02,
**... 32 values ...**,
-1.55516148e-01, -1.26494586e-01, -1.36454999e-01,
1.61165968e-02, 7.63138831e-02],
[-5.21791205e-02, 3.13560963e-02, **... 32 values ...**,
-7.63987377e-02, 7.28923678e-02, 8.98564830e-02,
-3.02852653e-02, 4.07049060e-02],
[-7.04478994e-02, 1.33816227e-02,
**... 32 values ...**, -1.99537817e-02,
-1.67200342e-01, 1.15980692e-02]], dtype=float32)
I want to know that why I am getting this type of weird output and how can I get the weights in the perfect shape. Thanks in advance.
Weights in neural network are values that represent connection strength between input nodes and output nodes(or nodes in next layer).
Conv2D layer's weights usually have shape of (H, W, I, O), where:-
H is kernel height
W is kernel width
I is number of input channels
O is number of output channels
Conv2D weights can be interpreted as connection strength between a patch of input channels and nodes in output filter/feature map. This way you would have weights of shape(H, W) between each Input channels and each Output Channels. It should be noted that the weights are shared among different patches of the same channel.
Consider the following convolution of (8, 8, 1) input with (2, 2) kernel and output with (8, 8, 1). The weights of this layer has shape (2, 2, 1, 1)
The same input can be used to produce 2 feature map using 2 (2, 2) filters as follows. Now the shape of the weights would be (2, 2,1, 2).
Hope this will clarify how to interpret the shape of convolutional layers.
The shape of the kernel weights from a Conv2D layer is (kernel_size[0], kernel_size[1], n_input_channels, filters). So in your case
a = model.layers[0].get_weights()
print(a[0].shape)
# should print (3,3,z,32) if your input has shape (x, y, z)
If you want to print the weights from one of the filters, you can do
a[0][:,:,:,0]

Does 1D Convolutional layer support variable sequence lengths?

I have a series of processed audio files I am using as input into a CNN using Keras. Does the Keras 1D Convolutional layer support variable sequence lengths? The Keras documentation makes this unclear.
https://keras.io/layers/convolutional/
At the top of the documentation it mentions you can use (None, 128) for variable-length sequences of 128-dimensional vectors. Yet at the bottom it declares that the input shape must be a
3D tensor with shape: (batch_size, steps, input_dim)
Given the following example how should I input sequences of variable length into the network
Lets say I have two examples (a and b) containing X 1 dimensional vectors of length 100 that I want to feed into the 1DConv layer as input
a.shape = (100, 100)
b.shape = (200, 100)
Can I use an input shape of (2, None, 100)? Do I need to concatenate these tensors into c where
c.shape = (300, 100)
Then reshape it to be something
c_reshape.shape = (3, 100, 100)
Where 3 is the batch size, 100, is the number of steps, and the second 100 is the input size? The documentation on the input vector is not very clear.
Keras supports variable lengths by using None in the respective dimension when defining the model.
Notice that often input_shape refers to the shape without the batch size.
So, the 3D tensor with shape (batch_size, steps, input_dim) suits perfectly a model with input_shape=(steps, input_dim).
All you need to make this model accept variable lengths is use None in the steps dimension:
input_shape=(None, input_dim)
Numpy limitation
Now, there is a numpy limitation about variable lengths. You cannot create a numpy array with a shape that suits variable lengths.
A few solutions are available:
Pad your sequences with dummy values until they all reach the same size so you can put them into a numpy array of shape (batch_size, length, input_dim). Use Masking layers to disconsider the dummy values.
Train with separate numpy arrays of shape (1, length, input_dim), each array having its own length.
Group your images by sizes into smaller arrays.
Be careful with layers that don't support variable sizes
In convolutional models using variable sizes, you can't for instance, use Flatten, the result of the flatten would have a variable size if this were possible. And the following Dense layers would not be able to have a constant number of weights. This is impossible.
So, instead of Flatten, you should start using GlobalMaxPooling1D or GlobalAveragePooling1D layers.

InvalidArgumentError: logits and labels must have the same first dimension seq2seq Tensorflow

I am getting this error in seq2seq.sequence_loss even though first dim of logits and labels has same dimension, i.e. batchSize
I have created a seq2seq model in TF 1.0 version. My loss function is as follows :
logits = self.decoder_logits_train
targets = self.decoder_train_targets
self.loss = seq2seq.sequence_loss(logits=logits, targets=targets, weights=self.loss_weights)
self.train_op = tf.train.AdamOptimizer().minimize(self.loss)
I am getting following error on running my network while training :
InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [1280,150000] and labels shape [1536]
[[Node: sequence_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](sequence_loss/Reshape, sequence_loss/Reshape_1)]]
I confirm the shapes of logits and targets tensors as follows :
a,b = sess.run([model.decoder_logits_train, model.decoder_train_targets], feed_dict)
print(np.shape(a)) # (128, 10, 150000) which is (BatchSize, MaxSeqSize, Vocabsize)
print(np.shape(b)) # (128, 12) which is (BatchSize, Max length of seq including padding)
So, since the first dimension of targets and logits are same then why I am getting this error ?
Interestingly, in error u can observe that the dimension of logits is mentioned as (1280, 150000), which is (128 * 10, 150000) [product of first two dimension, vocab_size], and same for targets i.e. (1536), which is (128*12), again product of first two dimension ?
Note : Tensorflow 1.0 CPU version
maybe your way of padding wrong. if you padded _EOS to the end of target seq, then the max_length(real length of target sentence) should add 1 to be [batch, max_len+1]. Since you padded _GO and _EOS, your target sentence length should add 2, which makes it equals 12.
I read some other people's implementation of NMT, they only padded _EOS for target sentence, while _GO for input of decoder. Tell me if I'm wrong.
I had the same error as you and I understood the problem:
The problem:
You run the decoder using this parameters:
targets are the decoder_inputs. They have length max_length because of padding. Shape: [batch_size, max_length]
sequence_length are the non-padded-lengths of all the targets of your current batch. Shape: [batch_size]
Your logits, that are the output tf.contrib.seq2seq.dynamic_decode has shape:
[batch_size, longer_sequence_in_this_batch, n_classes]
Where longer_sequence_in_this_batch is equal to tf.reduce_max(sequence_length)
So, you have a problem when computing the loss because you try to use both:
Your logits with 1st dimension shape longer_sequence_in_this_batch
Your targets with 1st dimension shape max_length
Note that longer_sequence_in_this_batch <= max_length
How to fix it:
You can simply apply some padding to your logits.
logits = self.decoder_logits_train
targets = self.decoder_train_targets
paddings = [[0, 0], [0, max_length-tf.shape(logits)[1]], [0, 0]]
padded_logits = tf.pad(logits, paddings, 'CONSTANT', constant_values=0)
self.loss = seq2seq.sequence_loss(logits=padded_logits, targets=targets,
weights=self.loss_weights)
Using this method,you ensure that your logits will be padded as the targets and will have dimension [batch_size, max_length, n_classes]
For more information about the pad function, visit
Tensorflow's documentation
The error message seems to be a bit misleading, as you actually need first and second dimensions to be the same. This is written here:
logits: A Tensor of shape [batch_size, sequence_length,
num_decoder_symbols] and dtype float. The logits correspond to the
prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype
int. The target represents the true class at each timestep.
This also makes sense, as logits are probability vectors, while targets represent the real output, so they need to be of the same length.

Resources