unable to use `torch.gather` with 3D index and 3D input - pytorch

I have an idx tensor with shape torch.Size([2, 80000, 3]), which corresponds to batch, points, and indices of 3 elements (from 0 to 512) from the feat tensor with shape torch.Size([2, 513, 2]).
I can't seem to find a way to use torch.gather to index feat with idx so that the resulting tensor has shape [2, 80000, 3, 2].
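A minimal sketch of one way this can be done with torch.gather (assuming the goal is out[b, p, k, :] = feat[b, idx[b, p, k], :]): expand feat over the points dimension and idx over the feature dimension, then gather along the 513-sized dimension.
import torch
feat = torch.randn(2, 513, 2)               # [batch, 513, 2]
idx = torch.randint(0, 513, (2, 80000, 3))  # [batch, points, 3]
# broadcast feat over points and idx over features so both are 4D
feat_exp = feat.unsqueeze(1).expand(-1, idx.size(1), -1, -1)   # [2, 80000, 513, 2]
idx_exp = idx.unsqueeze(-1).expand(-1, -1, -1, feat.size(-1))  # [2, 80000, 3, 2]
out = torch.gather(feat_exp, 2, idx_exp)                       # [2, 80000, 3, 2]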

Related

Need clear concept of the dimensions of output and hidden from LSTM layers

I know that the output carries the hidden states from the last layer for all time steps, and that hidden holds the last time step's hidden states from all the layers.
In this context, each document has 850 tokens, and each token is embedded into 100 dimensions. I use a 2-layer LSTM with a 100-dim hidden state.
I thought it would take one token per time step and produce a 100-dim hidden state, so for the 850 tokens in a document it would produce output = [1, 850, 100], hidden = [1, 2, 100] and cell = [1, 2, 100]. But hidden and cell come out as [2, 850, 100].
import torch.nn as nn

# tok2indx, dropout and device are assumed to be defined earlier
input_dim = len(tok2indx)  # size of the vocabulary
emb_dim = 100              # embedding dimension of each word
hid_dim = 100              # dimension of the hidden state coming out of each time step
n_layers = 2               # number of LSTM layers

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout, device=device)
        self.dropout = nn.Dropout(dropout)

    def forward(self, X):
        embedded = self.embedding(X).to(device)
        outputs, (hidden, cell) = self.rnn(embedded)
        return outputs, hidden, cell
If the encoder is passed a single document
enc = Encoder()
encd = enc.forward(train_x[:1])
print(encd[0].shape, encd[1].shape, encd[2].shape)
Output:
torch.Size([1, 850, 100]) torch.Size([2, 850, 100]) torch.Size([2, 850, 100])
With ten documents
encd = enc.forward(train_x[:10])
print(encd[0].shape, encd[1].shape, encd[2].shape)
Output:
torch.Size([10, 850, 100]) torch.Size([2, 850, 100]) torch.Size([2, 850, 100])
What's tripping you up is the input format to the LSTM. The default input shape to an LSTM layer is sequence (L), batch (N), features (H), while in your code you are passing the input as NLH (batch, sequence, features). To use it that way, set batch_first=True on the LSTM layer; then the input and output will be shaped as you expect.
But there is a catch here too: only the output (the 1st of the returned values) will be NLH, while both hidden and cell (the 2nd and 3rd) will still be in (num_layers, batch, hidden) format.
The second thing to note is that hidden and cell have a leading dimension equal to the number of layers, i.e. 2 in your example (each layer keeps its own hidden state), hence the [2, 850, 100] shape instead of [1, 850, 100].
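A minimal sketch of the change this answer describes (assuming the rest of the Encoder stays the same):
self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout, batch_first=True, device=device)
# With batch_first=True and a single document of 850 tokens:
# outputs: [1, 850, 100]  (batch, seq, hidden)
# hidden:  [2, 1, 100]    (num_layers, batch, hidden)
# cell:    [2, 1, 100]    (num_layers, batch, hidden)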

How to use affine_grid for batch tensor in pytorch?

Following the official tutorial on affine_grid, this line (inside function stn()):
grid = F.affine_grid(theta, x.size())
gives me an error:
RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [10, 3] but got: [4000, 3].
using the input as follows:
from torchinfo import summary
model = Net()
summary(model, input_size=(10, 1, 256, 256))
theta is of size (4000, 2, 3) and x is of size (10, 1, 256, 256); how do I properly manipulate theta into the correct dimensions to use it with a batch tensor?
EDIT: I honestly don't see any mistake here, and the dimensions are actually according to the affine_grid doc, so has something changed in the function itself?
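For reference, a minimal sketch of the shapes F.affine_grid itself expects (theta needs one 2x3 matrix per batch element, i.e. the same leading dimension as x); this is not the tutorial's stn() code, just an illustration of the documented interface:
import torch
import torch.nn.functional as F
x = torch.randn(10, 1, 256, 256)       # batch of 10 single-channel images
theta = torch.zeros(10, 2, 3)          # one 2x3 affine matrix per image
theta[:, 0, 0] = 1.0                   # identity transform
theta[:, 1, 1] = 1.0
grid = F.affine_grid(theta, x.size())  # grid: [10, 256, 256, 2]
out = F.grid_sample(x, grid)           # out:  [10, 1, 256, 256]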

Understanding input shape to PyTorch conv1D?

This seems to be one of the common questions on here (1, 2, 3), but I am still struggling to define the right shape for input to PyTorch conv1D.
I have text sequences of length 512 (number of tokens per sequence) with each token being represented by a vector of length 768 (embedding). The batch size I am using is 6.
So my input tensor to conv1D is of shape [6, 512, 768].
input = torch.randn(6, 512, 768)
Now, I want to convolve over the length of my sequence (512) with a kernel size of 2 using the conv1D layer from PyTorch.
Understanding 1:
I assumed that "in_channels" is the embedding dimension of the conv1D layer. If so, then a conv1D layer would be defined in this way, where
in_channels = embedding dimension (768)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(768, 100, 2)
feature_map = convolution_layer(input)
But with this assumption, I get the following error:
RuntimeError: Given groups=1, weight of size 100 768 2, expected input `[4, 512, 768]` to have 768 channels, but got 512 channels instead
Understanding 2:
Then I assumed that "in_channels" is the sequence length of the input sequence. If so, then a conv1D layer will be defined in this way where
in_channels = sequence length (512)
out_channels = 100 (arbitrary number)
kernel = 2
convolution_layer = nn.Conv1d(512, 100, 2)
feature_map = convolution_layer(input)
This works fine and I get an output feature map of dimension [batch_size, 100, 767]. However, I am confused. Shouldn't the convolutional layer convolve over the sequence length of 512 and output a feature map of dimension [batch_size, 100, 511]?
I will be really grateful for your help.
In PyTorch your input shape of [6, 512, 768] should actually be [6, 768, 512], where the feature length is represented by the channel dimension and the sequence length by the length dimension. Then you can define your Conv1d with in/out channels of 768 and 100 respectively to get an output of [6, 100, 511].
Given an input of shape [6, 512, 768] you can convert it to the correct shape with Tensor.transpose.
input = input.transpose(1, 2).contiguous()
The .contiguous() call copies the tensor into contiguous memory, which some downstream operations (for example .view()) require on a transposed tensor.
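Completing that with the convolution itself (a minimal sketch):
conv = nn.Conv1d(768, 100, 2)
out = conv(input)  # input is now [6, 768, 512], so out is [6, 100, 511]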
I found an answer to it (source).
So, usually, BERT outputs vectors of shape
[batch_size, sequence_length, embedding_dim].
where,
sequence_length = number of words or tokens in a sequence (the maximum sequence length BERT can handle is 512)
embedding_dim = the vector length of the vector describing each token (768 in case of BERT).
thus, input = torch.randn(batch_size, 512, 768)
Now, we want to convolve over the text sequence of length 512 using a kernel size of 2.
So, we define a PyTorch conv1D layer as follows,
convolution_layer = nn.Conv1d(in_channels, out_channels, kernel_size)
where,
in_channels = embedding_dim
out_channels = arbitrary int
kernel_size = 2 (I want bigrams)
thus, convolution_layer = nn.Conv1d(768, 100, 2)
Now we need a connecting link between the expected input by convolution_layer and the actual input.
For this, we need to convert the
current input shape [batch_size, 512, 768]
to the expected input shape [batch_size, 768, 512].
To achieve this expected input shape, we need to use the transpose function from PyTorch.
input_transposed = input.transpose(1, 2)
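Putting those together (a minimal sketch of the final step):
feature_map = convolution_layer(input_transposed)
# feature_map shape: [batch_size, 100, 511]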
I have a suggestion for you which may not be what you asked for but might help. Because your input is (6, 512, 768) you can use conv2d instead of 1d.
All you need to do is to add a dimension of 1 at index 1: input.unsqueeze(1) which works as your channel (consider it as a grayscale image)
def forward(self, x):
    x = self.embedding(x)      # [batch, seq length, embedding] = [5, 512, 768]
    x = torch.unsqueeze(x, 1)  # [5, 1, 512, 768], like a grayscale image
and for your Conv2d layer, you can define it like this:
window_size = 3        # for trigrams
EMBEDDING_SIZE = 768
NUM_FILTERS = 10       # or whatever you want
self.conv = nn.Conv2d(in_channels=1,
                      out_channels=NUM_FILTERS,
                      kernel_size=[window_size, EMBEDDING_SIZE],
                      padding=(window_size - 1, 0))
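For reference, a minimal sketch of the shape this Conv2d produces on the [5, 1, 512, 768] input from above (the squeeze at the end is an extra step, not part of the original answer):
x = torch.randn(5, 1, 512, 768)
conv = nn.Conv2d(1, 10, kernel_size=[3, 768], padding=(2, 0))
out = conv(x)         # [5, 10, 514, 1]
out = out.squeeze(3)  # [5, 10, 514], one value per filter and window position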

How do I change a torch tensor to concat with another tensor

I'm trying to concatenate a tensor of numerical data with the output tensor of a ResNet-50 model. The output of that model is a tensor of shape torch.Size([10, 1000]) and the numerical data is a tensor of shape torch.Size([10, 110528, 8]), where 10 is the batch size, 110528 is the number of observations (rows, in a dataframe sense), and 8 is the number of columns (in a dataframe sense). I need to reshape the numerical tensor to torch.Size([10, 8]) so it will concatenate properly.
How would I reshape the tensor?
Starting tensors.
a = torch.randn(10, 1000)
b = torch.randn(10, 110528, 8)
New tensor to allow concatenation.
c = torch.zeros(10,1000,7)
Check shapes.
a[:,:,None].shape, c.shape
(torch.Size([10, 1000, 1]), torch.Size([10, 1000, 7]))
Alter tensor a to allow concatenation.
a = torch.cat([a[:,:,None],c], dim=2)
Concatenate in dimension 1.
torch.cat([a,b], dim=1).shape
torch.Size([10, 111528, 8])
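If the goal is literally the [10, 8] shape mentioned in the question, a minimal sketch assuming it is acceptable to average over the 110528 observations (an assumption, not something the question states):
a = torch.randn(10, 1000)                    # ResNet-50 output
b = torch.randn(10, 110528, 8)               # numerical data
b_reduced = b.mean(dim=1)                    # [10, 8]
combined = torch.cat([a, b_reduced], dim=1)  # [10, 1008]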

How can I select single indices over a dimension in pytorch?

Assume I have a tensor sequences of shape [8, 12, 2]. Now I would like to make a selection of that tensor for each first dimension which results in a tensor of shape [8, 2]. The selection over dimension 1 is specified by indices stored in a long tensor indices of shape [8].
I tried this, however it selects each index in indices for each first dimension in sequences instead of only one.
sequences[:, indices]
How can I make this query without a slow and ugly for loop?
sequences[torch.arange(sequences.size(0)), indices]
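A quick sketch of that one-liner with the shapes from the question (indices is assumed to hold one position in [0, 12) per batch element):
sequences = torch.randn(8, 12, 2)
indices = torch.randint(0, 12, (8,))
result = sequences[torch.arange(sequences.size(0)), indices]  # [8, 2]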
torch.index_select solves your problem more easily than torch.gather since you don't have to adapt the dimensions of the indices. The index must be a tensor. For your case
indices = [0, 2]
a = torch.rand(size=(3, 3, 3))
torch.index_select(a, dim=1, index=torch.tensor(indices, dtype=torch.long))
This should be doable by torch.gather, but you need to convert your index tensor first by
unsqueeze it to match the number of dimensions of your input tensor
repeat_interleave it to match the size of the last dimension
Here is an example based on your description:
# original indices dimension [8]
# after the first unsqueeze, the dimension is [8, 1]
indices = torch.unsqueeze(indices, 1)
# after the second unsqueeze, the dimension is [8, 1, 1]
indices = torch.unsqueeze(indices, 2)
# after repeat, dimension is [8, 1, 2]
indices = torch.repeat_interleave(indices, 2, dim=2)
# now you have the right dimension for torch.gather
# don't forget to squeeze the redundant dimension
# result has dimension [8, 2]
result = torch.gather(sequences, 1, indices).squeeze()
It can be done using torch.Tensor.gather
sequences = torch.randn(8,12,2)
# defining the indices as a 1D Tensor of random integers and reshaping it to use with Tensor.gather
indices = torch.randint(sequences.size(1),(sequences.size(0),)).unsqueeze(-1).unsqueeze(-1).repeat(1,1,sequences.size(-1))
# indices shape: (8, 1, 2)
output = sequences.gather(1,indices).squeeze(1)
# output shape: (8, 2)
