Multi-dimensional tensor dot product in pytorch

I have two tensors, each of shape (8, 1, 128), as follows.
q_s.shape
Out[161]: torch.Size([8, 1, 128])
p_s.shape
Out[162]: torch.Size([8, 1, 128])
The above two tensors represent a batch of eight 128-dimensional vectors. I want the dot product of batch q_s with batch p_s. How can I do this? I tried the torch.tensordot function as follows. It works, but it also does extra work that I don't need. See the following example.
dt = torch.tensordot(q_s, p_s, dims=([1,2], [1,2]))
dt
Out[176]:
tensor([[0.9051, 0.9156, 0.7834, 0.8726, 0.8581, 0.7858, 0.7881, 0.8063],
[1.0235, 1.5533, 1.2155, 1.2048, 1.3963, 1.1310, 1.1724, 1.0639],
[0.8762, 1.3490, 1.2923, 1.0926, 1.4703, 0.9566, 0.9658, 0.8558],
[0.8136, 1.0611, 0.9131, 1.1636, 1.0969, 0.9443, 0.9587, 0.8521],
[0.6104, 0.9369, 0.9576, 0.8773, 1.3042, 0.7900, 0.8378, 0.6136],
[0.8623, 0.9678, 0.8163, 0.9727, 1.1161, 1.6464, 0.9765, 0.7441],
[0.6911, 0.8392, 0.6931, 0.7325, 0.8239, 0.7757, 1.0456, 0.6657],
[0.8493, 0.8174, 0.8041, 0.9013, 0.8003, 0.7451, 0.7408, 1.1771]],
grad_fn=<AsStridedBackward>)
dt.shape
Out[177]: torch.Size([8, 8])
As we can see, this produces a tensor of size (8, 8), with the dot products I want lying on the diagonal. Is there a different way to obtain a smaller tensor of shape (8, 1) that contains only those diagonal elements? To be clear, the diagonal elements are the correct dot products between the two batches: the element at index [0][0] is the dot product of q_s[0] and p_s[0], the element at [1][1] is the dot product of q_s[1] and p_s[1], and so on.
Is there a better way to obtain the desired dot product in pytorch?

You can do it directly:
a = torch.rand(8, 1, 128)
b = torch.rand(8, 1, 128)
torch.sum(a * b, dim=(1, 2))
# tensor([29.6896, 30.4994, 32.9577, 30.2220, 33.9913, 35.1095, 32.3631, 30.9153])
torch.diag(torch.tensordot(a, b, dims=([1,2], [1,2])))
# tensor([29.6896, 30.4994, 32.9577, 30.2220, 33.9913, 35.1095, 32.3631, 30.9153])
If you sum only over dim=2 you will get a tensor with shape (8, 1).
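For reference, here is a minimal sketch contrasting the two reductions; the einsum line is just an equivalent alternative formulation, not part of the original answer:
import torch

a = torch.rand(8, 1, 128)
b = torch.rand(8, 1, 128)

# Summing over both trailing dims gives a flat result of shape (8,)
flat = torch.sum(a * b, dim=(1, 2))
# Summing only over dim=2 keeps the singleton dim, giving shape (8, 1)
kept = torch.sum(a * b, dim=2)
print(flat.shape, kept.shape)   # torch.Size([8]) torch.Size([8, 1])

# Equivalent einsum formulation (illustrative alternative)
print(torch.allclose(flat, torch.einsum('bij,bij->b', a, b)))   # True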

Related

Interleaving FFT real & complex parts in a PyTorch tensor

I have a use case where I have to compute the FFT of a given tensor, as follows. Here, the real FFT is applied to each of the 10 rows (along the columns), which gives shape (10, 11) after the FFT.
# Random data-
x = torch.rand((10, 20))
# Compute RFFT of 'x'-
x_fft = torch.fft.rfft(x)
# Sanity check-
x.shape, x_fft.shape
# (torch.Size([10, 20]), torch.Size([10, 11]))
# FFT for the first 2 rows are-
x_fft[:2, :]
'''
tensor([[12.2561+0.0000j, 0.7551-1.2075j, 1.1119-0.0458j, -0.2814-1.5266j,
1.4083-0.7302j, 0.6648+0.3311j, 0.3969+0.0632j, -0.8031-0.1904j,
-0.4206+0.9066j, -0.2149+0.9160j, 0.4800+0.0000j],
[ 9.8967+0.0000j, -0.5100-0.2377j, -0.6344+2.2406j, 0.4584-1.0705j,
0.2235+0.4788j, -0.3923+0.8205j, -1.0372-0.0292j, -1.6368+0.5517j,
1.5093+0.0419j, 0.5755-1.2133j, 2.9269+0.0000j]])
'''
# The goal is to have, for each row, a 1-D vector of size 2 * 11 = 22 as follows.
# So, for the first row, the desired 1-D vector is:
'''
[12.2561, 0.0000, 0.7551, -1.2075, 1.1119, -0.0458, -0.2814, -1.5266,
 1.4083, -0.7302, 0.6648, 0.3311, 0.3969, 0.0632, -0.8031, -0.1904,
 -0.4206, 0.9066, -0.2149, 0.9160, 0.4800, 0.0000]
'''
Here, you are taking the real and imaginary components and placing them adjacent to each other.
Adjacent means:
[a_1_real, a_1_imag, a_2_real, a_2_imag, a_3_real, a_3_imag, ....., a_n_real, a_n_imag]
Since for each row, you get 11 FFT complex numbers, a_n = a_11.
How to go about it?
Your question comes down to: how do you interleave two tensors? Given the two tensors x and y, you can do so with a combination of stack, transpose, and reshape.
>>> torch.stack((x,y),1).transpose(1,2).reshape(2,-1)
tensor([[ 1.1547e+01, 0.0000e+00, 1.3786e+00, -8.1970e-01, -3.2118e-02,
-2.3900e-02, -3.2898e-01, -3.4610e-01, -1.7916e-01, 1.2308e+00,
-5.4203e-01, 1.2580e-01, 8.5273e-01, 8.9980e-01, -2.7096e+00,
-3.8060e-01, 3.0016e-01, -4.5240e-01, -7.7809e-02, 4.5630e-01,
-4.5805e-03, 0.0000e+00],
[ 1.1106e+01, 0.0000e+00, 1.3362e-01, 1.3830e-01, -7.4233e-01,
7.7570e-01, -9.9461e-01, 1.0834e+00, 1.6952e+00, 5.2920e-01,
-1.1884e+00, -2.5970e-01, -8.7958e-01, 4.3180e-01, -9.3039e-01,
8.8130e-01, -1.0048e+00, 1.2823e+00, 2.0595e-01, -6.5170e-01,
1.7209e+00, 0.0000e+00]])
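Applied to the full FFT output from the question (all 10 rows rather than two), here is a minimal sketch of the same stack/transpose/reshape interleaving; the torch.view_as_real line is an equivalent shortcut, not part of the original answer:
import torch

x = torch.rand(10, 20)
x_fft = torch.fft.rfft(x)   # complex, shape (10, 11)

# stack -> (10, 2, 11), transpose -> (10, 11, 2), reshape -> (10, 22)
interleaved = torch.stack((x_fft.real, x_fft.imag), dim=1).transpose(1, 2).reshape(x_fft.shape[0], -1)
print(interleaved.shape)    # torch.Size([10, 22])

# view_as_real exposes the same (..., 2) real/imag layout directly
alt = torch.view_as_real(x_fft).reshape(x_fft.shape[0], -1)
print(torch.equal(interleaved, alt))   # True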

Expand the tensor by several dimensions

In PyTorch, given a tensor of size [3], how can I expand it by several dimensions to size [3, 2, 5, 5] such that the added dimensions hold the corresponding values from the original tensor? For example, expanding the size-[3] vector [1, 2, 3] so that the first sub-tensor of size [2, 5, 5] has all values 1, the second has all values 2, and the third all values 3.
In addition, how can I expand a tensor of size [3, 2] to [3, 2, 5, 5]?
One way I can think of is to create a tensor of ones of the target size and then use einsum, but I think there should be an easier way.
You can first unsqueeze the appropriate number of singleton dimensions, then expand to a view at the target shape with torch.Tensor.expand:
>>> x = torch.rand(3)
>>> target = [3,2,5,5]
>>> x[:, None, None, None].expand(target)
A nice workaround is to use torch.Tensor.reshape or torch.Tensor.view to perform the multiple unsqueezes in one step:
>>> x.view(-1, 1, 1, 1).expand(target)
This allows for a more general approach to handle any arbitrary target shape:
>>> x.view(len(x), *(1,)*(len(target)-1)).expand(target)
For an even more general implementation, where x can be multi-dimensional:
>>> x = torch.rand(3, 2)
# just to make sure the target shape is valid w.r.t. x
>>> assert list(x.shape) == list(target[:x.ndim])
>>> x.view(*x.shape, *(1,)*(len(target)-x.ndim)).expand(target)
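As a quick sanity check of the general form (a small sketch with illustrative names):
import torch

x = torch.rand(3, 2)
target = [3, 2, 5, 5]

expanded = x.view(*x.shape, *(1,) * (len(target) - x.ndim)).expand(target)
print(expanded.shape)   # torch.Size([3, 2, 5, 5])

# Each trailing (5, 5) block holds the corresponding scalar from x
print(bool((expanded[1, 0] == x[1, 0]).all()))   # True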

Convolution with RGB images - what values does an RGB filter hold?

Convolution for a grayscale image is straightforward. You have a filter of shape n x n x 1 and you convolve the input image to extract whatever features you desire.
I also understand how convolution would work for an RGB image. The filter would have a shape of n x n x 3. However, would all 3 'layers' in the filter hold the same kernel? For example, if the 0th layer has a map as shown below, would layers 1 and 2 also hold the exact same values? I am asking in regard to convolutional neural networks, not conventional image processing. I understand that the weights of each filter are learned and are randomized initially; am I correct in thinking that each layer would have different randomized values?
Would all 3 'layers' in the filter hold the same kernel?
The short answer is no. The longer answer is that there isn't a kernel per layer, but instead just one kernel which handles all input and output layers at once.
The code below shows step by step how one would calculate each convolution manually, and from this we can see that at a high level the calculation goes like this:
take a patch from the batch of images (BatchSize x 3x3x3 in your case)
flatten it [BatchSize, 27]
matrix multiply it by the reshaped kernel [27, output_filters]
add in the bias of shape [output_filters]
All the colors are processed at once using matrix multiplication with the kernel matrix. If we think about the kernel matrix, we can see that the values in the kernel matrix that are used to generate the first filter are in the first column, and the values to generate the second filter are in the second column. So, indeed, the values are different and not reused, but they are not stored or applied separately.
The code walkthrough
import tensorflow as tf
import numpy as np
# Define a 3x3 kernel that after convolution will create an image with 2 filters (channels)
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=3)
# Lets create a random input image
starting_image = np.array( np.random.rand(1,4,4,3), dtype=np.float32)
# and process it
result = conv_layer(starting_image)
weight, bias = conv_layer.get_weights()
print('size of weight', weight.shape)
print('size of bias', bias.shape)
size of weight (3, 3, 3, 2)
size of bias (2,)
# The output of the convolution of the 4x4x3 image input
# is a 2x2x2 output (because we don't have padding)
result.numpy()
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
# Now let's see how we can recreate this using the weights
# The way convolution is done is to extract a patch
# the size of the kernel (3x3 in this case)
# We will use the first patch, the first three rows and columns and all the colors
patch = starting_image[0,:3,:3,:]
print('patch.shape' , patch.shape)
# Then we flatten the patch
flat_patch = np.reshape( patch, [1,-1] )
print('New shape is', flat_patch.shape)
patch.shape (3, 3, 3)
New shape is (1, 27)
# next we take the weight and reshape it to be [-1,filters]
flat_weight = np.reshape( weight, [-1,2] )
print('flat_weight shape is ',flat_weight.shape)
flat_weight shape is (27, 2)
# we have the patch of shape [1,27] and the weight of [27,2]
# doing a matrix multiplication of the two shapes [1,27]*[27,2] gives a shape of [1,2]
# which is the output we want, 2 filter outputs for this patch
output_for_patch = np.matmul(flat_patch,flat_weight)
# but we haven't added the bias yet, so lets do that
output_for_patch = output_for_patch + bias
# Finally, we can see that our manual calculation matches
# what Conv2D does exactly for the first patch
output_for_patch
array([[-0.34940773, -0.64269245]], dtype=float32)
If we compare this to the full convolution above, we can see that this is exactly the first patch
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
We would repeat this process for each patch. To optimize this code further, instead of passing only one image patch at a time (shape [1,27]), we can pass [batch_number,27] patches at once and the kernel will process them all simultaneously, returning [batch_number,filter_size], as sketched below.
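As a hedged sketch of that batched idea (not from the original answer), tf.image.extract_patches can pull out every 3x3 patch at once, so all patch positions go through a single matrix multiplication; the variable names reuse the ones defined in the walkthrough above, and the patch ordering is assumed to match the flattened kernel:
# Extract all 3x3 patches of the 4x4x3 image at once -> shape (1, 2, 2, 27)
patches = tf.image.extract_patches(
    images=starting_image,
    sizes=[1, 3, 3, 1],
    strides=[1, 1, 1, 1],
    rates=[1, 1, 1, 1],
    padding='VALID')
flat_patches = tf.reshape(patches, [-1, 27])               # shape (4, 27)
# One matmul handles every patch position, then add the bias
all_outputs = tf.matmul(flat_patches, flat_weight) + bias  # shape (4, 2)
# Reshape back to the (1, 2, 2, 2) layout that Conv2D produced
print(tf.reshape(all_outputs, [1, 2, 2, 2]).numpy())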

Wrap two tensors in pytorch to get size of new tensor as 2

I have two tensors say x and y:
x has shape: [21314, 3, 128, 128]
y has shape: [21314]
Can I get a new tensor of shape [[21314, 3, 128, 128], [21314]], i.e. basically of size 2?
I believe it's not possible if you need to store it as a single tensor object. Of course, you can use a list or a tuple for this case, but I guess that's not what you meant.
First, a tensor is simply a generalization of a matrix to n dimensions instead of two. Let's simplify this to a matrix for now, for example a 4x3 one. The first dimension is of size 4, which means 4 entries. A second dimension of 3 means that each of the 4 first-dimension entries has exactly (and not fewer than) 3 entries. That is, you must have a full list of 3 elements in each nested list. In this simple example, note that you cannot have a matrix like this one:
[[1,2,3]
[1,2]
[1] ]
While this is a nested list, it's not a matrix and also not a 2D tensor. What I'm trying to say is that the shape you requested, [[21314, 3, 128, 128], [21314]], is actually not a tensor.
You could think of it as a tensor of size two with a tensor in each entry (which is probably what you meant when asking the question). However, this is not possible, since tensors in PyTorch hold only numbers of these types: float32, float64, float16, uint8, int8, int16, int32, int64, bool.
Nevertheless, in most cases you can achieve what you need by assigning the two tensors to a list or tuple, as sketched below.
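If the two tensors are samples and per-sample labels (as the shapes suggest), one common pattern is to keep them in a tuple or pair them with torch.utils.data.TensorDataset; a small sketch with scaled-down stand-in shapes:
import torch
from torch.utils.data import TensorDataset

x = torch.rand(8, 3, 128, 128)   # stand-in for the [21314, 3, 128, 128] tensor
y = torch.rand(8)                # stand-in for the [21314] tensor

pair = (x, y)                    # a plain tuple keeps both without merging them

# TensorDataset pairs the tensors along their first dimension
dataset = TensorDataset(x, y)
sample, label = dataset[0]
print(sample.shape, label.shape) # torch.Size([3, 128, 128]) torch.Size([])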

Expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]

I read this question, but it doesn't seem to answer my question.
So basically I'm trying to vectorize the game snake so it can run faster.
Here is my code till now:
import torch
import torch.nn.functional as F
device = torch.device("cpu")
class SnakeBoard:

    def __init__(self, board=None):
        if board != None:
            self.channels = board
        else:
            # 0 - Food, 1 - Head, 2 - Body
            self.channels = torch.zeros(1, 3, 15, 17, device=device)
            # Initialize game channels
            self.channels[:, 0, 7, 12] = 1
            self.channels[:, 1, 7, 5] = 1
            self.channels[:, 2, 7, 2:6] = torch.arange(1, 5)
        self.move()

    def move(self):
        self.channels[:, 2] -= 1
        F.relu(self.channels[:, 2], inplace=True)
        # Up movement test
        F.conv2d(self.channels[:, 1], torch.tensor([[[0,1,0],[0,0,0],[0,0,0]]]), padding=1)

SnakeBoard()
The first dimension in channels represents the batch size, the second dimension represents the 3 channels of the snake game (food, head, and body), and the third and fourth dimensions represent the height and width of the board.
Unfortunately, when running the code I get the error: Expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]
How can I fix that?
The dimensions of the inputs for the convolution are not correct for a 2D convolution. Let's have a look at the dimensions you're passing to F.conv2d:
self.channels[:, 1].size()
# => torch.Size([1, 15, 17])
torch.tensor([[[0,1,0],[0,0,0],[0,0,0]]]).size()
# => torch.Size([1, 3, 3])
The correct sizes should be
input: (batch_size, in_channels , height, width)
weight: (out_channels, in_channels , kernel_height, kernel_width)
Because your weight has only 3 dimensions, it is considered to be a 1D convolution, but since you called F.conv2d the stride and padding will be tuples and therefore it won't work.
For the input, you indexed the second dimension, which selects that particular element along that dimension and eliminates the dimension. To keep the dimension, you can index it with a slice of just one element.
For the weight, you are missing one dimension as well, which can just be added directly. Also, your weight is of type torch.long, since you used only integers when creating the tensor, but the weight needs to be of type torch.float.
F.conv2d(self.channels[:, 1:2], torch.tensor([[[[0,1,0],[0,0,0],[0,0,0]]]], dtype=torch.float), padding=1)
On a different note, I don't think that convolutions are appropriate for this use case, because you're not using a key property of the convolution, which is to capture the surroundings. It amounts to far too many unnecessary computations for what you want; most of them are multiplications by 0.
For example, a move up is much easier to achieve by removing the first row and adding a new row of zeros at the end, so everything is shifted up (assuming that the first row is the top and the last row is the bottom of the board).
head = self.channels[:, 1:2]
batch_size, channels, height, width = head.size()
# Take everything but the first row of the head
# Add a row of zeros to the end by concatenating them across the height (dimension 2)
new_head = torch.cat([head[:, :, 1:], torch.zeros(batch_size, channels, 1, width)], dim=2)
# Or if you want to wrap it around the board, it's even simpler.
# Move the first row to the end
wrap_around_head = torch.cat([head[:, :, 1:], head[:, :, 0:1]], dim=2)
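If the wrap-around behaviour is what you want, torch.roll expresses the same shift more compactly (a small sketch, not part of the original answer):
import torch

head = torch.zeros(1, 1, 15, 17)
head[:, :, 7, 5] = 1

# Shift everything up by one row along the height (dim 2),
# wrapping the top row around to the bottom
wrap_around_head = torch.roll(head, shifts=-1, dims=2)
print(wrap_around_head[0, 0, 6, 5])   # tensor(1.)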
