How to upsample a PyTorch tensor? - pytorch

I have a PyTorch tensor of size (1, 4, 128, 128) (batch, channel, height, width), and I want to 'upsample' it to (1, 3, 256, 256)
I thought to use interpolate (a function in nn.functional)
However, reading the documentation, and applying this function I am able to get in output a shape (1, 4, 256, 256), so maybe it is not the function that I am looking for. The code that I used is the following:
import torch.nn as nn
#x.shape -> (1,4,128,128)
x_0 = nn.functional.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
#x_0.shape -> (1,4,256,256)
How can I do that (from (1, 4, 128, 128) to (1, 3, 256, 256))?
To follow there is the network that I am trying to replicate, but I got stack in the upsample layer.

What about PyTorch's nn.Upsample function:
upsample = nn.Upsample(scale_factor=2)
x = upsample(x)
Not sure if that's what you are looking for since you want the second dimension to change from 4 to 3.

Related

Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions on here (1, 2, 3), but I am still struggling to define the right shape for input to PyTorch conv1D.
I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.
So my input tensor to conv1D is of shape [6, 512, 768].
input = torch.randn(6, 512, 768)
Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using the conv1D layer from PyTorch.
Understanding 1:
I assumed that "in_channels" are the embedding dimension of the conv1D layer. If so, then a conv1D layer will be defined in this way where
in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.conv1D(768, 100, 2)
feature_map = convolution_layer(input)
But with this assumption, I get the following error:
RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead
Understanding 2:
Then I assumed that "in_channels" is the sequence length of the input sequence. If so, then a conv1D layer will be defined in this way where
in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.conv1D(512, 100, 2)
feature_map = convolution_layer(input)
This works fine and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused. Shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?
I will be really grateful for your help.
In pytorch your input shape of [6, 512, 768] should actually be [6, 768, 512] where the feature length is represented by the channel dimension and sequence length is the length dimension. Then you can define your conv1d with in/out channels of 768 and 100 respectively to get an output of [6, 100, 511].
Given an input of shape [6, 512, 768] you can convert it to the correct shape with Tensor.transpose.
input = input.transpose(1, 2).contiguous()
The .contiguous() ensures the memory of the tensor is stored contiguously which helps avoid potential issues during processing.
I found an answer to it (source).
So, usually, BERT outputs vectors of shape
[batch_size, sequence_length, embedding_dim].
where,
sequence_length = number of words or tokens in a sequence (max_length sequence BERT can handle is 512)
embedding_dim = the vector length of the vector describing each token (768 in case of BERT).
thus, input = torch.randn(batch_size, 512, 768)
Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.
So, we define a PyTorch conv1D layer as follows,
convolution_layer = nn.conv1d(in_channels, out_channels, kernel_size)
where,
in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)
thus, convolution_layer = nn.conv1d(768, 100, 2)
Now we need a connecting link between the expected input by convolution_layer and the actual input.
For this, we require to
current input shape [batch_size, 512, 768]
expected input [batch_size, 768, 512]
To achieve this expected input shape, we need to use the transpose function from PyTorch.
input_transposed = input.transpose(1, 2)
I have a suggestion for you which may not be what you asked for but might help. Because your input is (6, 512, 768) you can use conv2d instead of 1d.
All you need to do is to add a dimension of 1 at index 1: input.unsqueeze(1) which works as your channel (consider it as a grayscale image)
def forward(self, x):
x = self.embedding(x) # [Batch, seq length, Embedding] = [5, 512, 768])
x = torch.unsqueeze(x, 1) # [5, 1, 512, 768]) # like a grayscale image
and also for your conv2d layer, you can define like this:
window_size=3 # for trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10 # or whatever you want
self.conv = nn.Conv2d(in_channels = 1,
out_channels = NUM_FILTERS,
kernel_size = [window_size, EMBEDDING_SIZE],
padding=(window_size - 1, 0))```

how to process a batch data when batch_size=None in Keras

suppose I have data with shape (None, 32, 24) in Keras, means batch_size=None, I have a shared weight W with shape (24, 12), for each example X with shape (32, 24) in a batch , I want do matrix multiply Y = X # W, shape of Y is (32, 12), the shape of final output is (None, 32, 12).
how can I get the final output in Keras?
I try to get the batch_size with inputs.get_shape()[0] and use it in range() function to do the matrix multiply, use K.stack to get the final output, but batch_size is NoneType, range() doesn't support it.

How to change shape of keras tensors during model building?

I have an autoencoder model that looks like:
input_img = Input(shape=(128, 128, 1))
x = Conv2D(...)(input_img)
x = MaxPooling2D(...)(x)
...
out = Conv2D(1, (3, 3), activation='sigmoid')(x)
my_model = Model(input_img, decoded)
my_model.compile(optimizer='adadelta', loss=my_custom_loss)
Currently, the shape of out is the same as input_img, let's say (None, 128, 128, 1). I would like to manipulate out to have shape (None, 128, 128, 2). The contents of that extra block are irrelevant.
In case anyone is wondering why I want to do this: it's because I intend to have some extra information in the ground truth values y_true so that I can define a specific cost function, my_custom_loss. It seems like in order to do this, the shape of out must match the shape of y_true when defining the cost function...
If the contents are truly irrelevant you can pad it with zeros. It could look something like:
zeros = Lambda(lambda x: K.zeros_like(x))(out) # (None, 128, 128, 1)
out = Concatenate([out, zeros]) # (None, 128, 128, 2)
Make sure you ignore the extra padding you are adding.

Concatenation of Keras parallel layers changes wanted target shape

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the first model (without the LSTMs) I get the following error:
"ValueError: Error when checking target: expected dense_3 to have shape (None, 120, 40) but got array with shape (8, 40, 1)"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim=T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
# the window sizes (seq_length?) are 40, 1075, 465, 72 and 1246 for the kettle, dish washer,
# fridge, microwave, oven and washing machine, respectively.
def ae_net(T):
input_layer = Input(shape= (T,))
branch_a = layers.Conv1D(32, 3, activation= 'linear', padding='same', strides=1)(input_layer)
branch_b = layers.Conv1D(32, 5, activation= 'linear', padding='same', strides=1)(input_layer)
branch_c = layers.Conv1D(32, 7, activation= 'linear', padding='same', strides=1)(input_layer)
merge_layer = layers.concatenate([branch_a, branch_b, branch_c], axis=1)
dense_1 = layers.Dense(128, activation='relu')(merge_layer)
dense_2 =layers.Dense(128, activation='relu')(dense_1)
output_dense = layers.Dense(T, activation='linear')(dense_2)
model = Model(input_layer, output_dense)
return model
model = ae_net(40)
model.compile(loss= 'mean_absolute_error', optimizer='rmsprop')
model.fit(X, y, batch_size= 8)
where X and y are numpy arrays of 8 sequences of a length of 40 values. So X.shape and y.shape are (8, 40, 1). It's actually one batch of data. The thing is I cannot understand how the output would be of shape (None, 120, 40) and what these sizes would mean.
As you noted, your shapes contain batch_size, length and channels: (8,40,1)
Your three convolutions are, each one, creating a tensor like (8,40,32).
Your concatenation in the axis=1 creates a tensor like (8,120,32), where 120 = 3*40.
Now, the dense layers only work on the last dimension (the channels in this case), leaving the length (now 120) untouched.
Solution
Now, it seems you do want to keep the length at the end. So you won't need any flatten or reshape layers. But you will need to keep the length 40, though.
You're probably doing the concatenation in the wrong axis. Instead of the length axis (1), you should concatenate in the channels axis (2 or -1).
So, this should be your concatenate layer:
merge_layer = layers.Concatenate()([branch_a, branch_b, branch_c])
#or layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
This will output (8, 40, 96), and the dense layers will transform the 96 in something else.

TensorFlow: Why does avg_pool ignore one stride dimension?

I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. It is my expectation that tf.nn.max_pool and tf.nn.avg_pool should produce tensors of identical shape when fed the exact same arguments. This is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
Turns out this is really a bug.
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112

Resources