Broadcasting element-wise multiplication in PyTorch

I have a tensor in PyTorch with size torch.Size([1443747, 128]); let's call it A. In this tensor, 128 is the batch size. I have another 1D tensor with size torch.Size([1443747]); let's call it B. I want to multiply B element-wise with each of the 128 columns of A. In other words, I want to broadcast the element-wise multiplication along dimension 1.
How can I achieve this in PyTorch?
If I didn't have a batch dimension in A (batch size = 1), the normal * operator would do the multiplication easily: A*B would produce a tensor of size torch.Size([1443747]). However, I don't understand why PyTorch does not broadcast the multiplication along dimension 1. Is there any way to do this?
What I want is for B to be multiplied element-wise with all 128 columns of A, so the resulting tensor's size would be torch.Size([1443747, 128]).

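Broadcasting aligns shapes starting from the last dimension, so B's size (1443747) is compared with A's last dimension (128), and the sizes don't match. It works if you transpose A or unsqueeze B: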
C = A.transpose(1,0) * B # shape: [128, 1443747]
C = A * B.unsqueeze(dim=1) # shape: [1443747, 128]
Note that the shapes of the two solutions are different.
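A quick way to check this is a small sketch with stand-in sizes (1000 rows and 4 columns instead of 1443747 and 128, chosen only to keep it fast). It shows that A * B raises an error because B's size is compared with A's last dimension, and that the two fixes give the same numbers in transposed layouts.

```python
import torch

rows, batch = 1000, 4          # small stand-ins for 1443747 and 128
A = torch.randn(rows, batch)
B = torch.randn(rows)

# Shapes are aligned from the right: (1000, 4) vs (1000,) compares 4 with 1000.
try:
    A * B
    broadcast_failed = False
except RuntimeError:
    broadcast_failed = True

# Fix 1: give B a trailing singleton dim -> (1000, 1), which broadcasts to (1000, 4).
C1 = A * B.unsqueeze(1)        # equivalent to A * B[:, None]

# Fix 2: transpose A so its last dim matches B -> result has shape (4, 1000).
C2 = A.transpose(1, 0) * B

print(broadcast_failed, C1.shape, C2.shape)
```

The unsqueeze version keeps A's original layout, which is usually what downstream code expects; the transpose version is only a transposed view of the same result.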


Convolutional layer: does the filter also convolve through the nlayers_in dimension, or does it take all the channels at once?

In the leading deep learning libraries, does the filter (aka kernel or weight) in a convolutional layer also convolve across the "channel" dimension, or does it take all the channels at once?
For example, if the input dimension is (60,60,10) (where the last dimension is often referred to as "channels") and the desired number of output channels is 5, can the filter be (5,5,5,5), or should it be (5,5,10,5) instead?
It should be (5, 5, 10, 5). The Conv2d operation is just like Linear if you ignore the spatial dimensions.
From the TensorFlow documentation:
Given an input tensor of shape batch_shape + [in_height, in_width, in_channels] and a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels], this op performs the following:
1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].
2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].
3. For each patch, right-multiplies the filter matrix and the image patch vector.
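To see the same thing in PyTorch, the sketch below computes one convolution twice: once with F.conv2d and once by following the three steps above with F.unfold and a matrix multiply. Note that PyTorch stores inputs as (N, C, H, W) and conv weights as (out_channels, in_channels, kH, kW), so the question's (5, 5, 10, 5) filter appears here as (5, 10, 5, 5); the random tensors are placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 10, 60, 60)   # one 60x60 input with 10 channels (NCHW)
w = torch.randn(5, 10, 5, 5)     # 5 filters, each 5x5 spanning all 10 input channels
b = torch.randn(5)

ref = F.conv2d(x, w, b)          # shape (1, 5, 56, 56)

# Step 1: flatten each filter into a row of length in_channels * kH * kW = 250.
w_mat = w.reshape(5, -1)                 # (5, 250)
# Step 2: extract 5x5 patches across all channels; one column per output position.
patches = F.unfold(x, kernel_size=5)     # (1, 250, 56 * 56)
# Step 3: multiply every patch by the filter matrix and add the bias.
out = (w_mat @ patches + b[:, None]).reshape(1, 5, 56, 56)

print(ref.shape, torch.allclose(out, ref, atol=1e-3))
```

Each output value is a dot product over all 250 numbers in a patch (10 channels × 5 × 5), which is why the filter must cover every input channel.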
It takes all channels at once, so 5×5×10×5 should be right. For example, in Flux (Julia):
julia> using Flux
julia> c = Conv((5,5), 10 => 5); # make a layer, 10 channels to 5
julia> c.weight |> summary
"5×5×10×5 Array{Float32, 4}"
julia> c(randn(Float32, 60, 60, 10, 1)) |> summary # check it works
"56×56×5×1 Array{Float32, 4}"
julia> Conv(rand(Float32, (5,5,5,5))) # different weight size
Conv((5, 5), 5 => 5) # 630 parameters

Torch tensor filter by index but keep the shape

I have an input tensor data of shape (x,y,z).
I have a binary mask tensor mask of shape (x,y).
When I do data[mask > 0] I obtain a new tensor of shape (q,z), where q is the number of ones in the mask tensor.
Instead, I would like to get a tensor with the original shape (x,y,z) in which, along the second dimension, the entries where mask is zero are removed, the remaining entries are shifted forward, and the end is padded with 0 so the original shape is kept. (data[mask > 0] can't do this directly because the rows would have different lengths along the second dimension.)
Of course, this can easily be done in Python with basic matrix operations, but is there an efficient, tensor-based way to do it in PyTorch?
Example (imagine a, b, c... are 1D tensors):
Ideal output:
[d,f, junk]]
Here, the missing stuff is padded with some "junk" to keep the original shape.
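No answer is included for this question, so the following is only one possible sketch, assuming the kept entries should stay in their original order and that 0 is acceptable as the "junk" padding. compact_by_mask is a hypothetical helper name, and the small data/mask tensors are made up. It uses a stable torch.sort on the inverted mask to get, for each row, an index order that puts the kept positions first, reorders data with torch.gather, and then zeros out the tail.

```python
import torch

def compact_by_mask(data: torch.Tensor, mask: torch.Tensor, fill: float = 0.0) -> torch.Tensor:
    """Hypothetical helper. data: (x, y, z), mask: (x, y) -> result: (x, y, z).

    For each row i, entries data[i, j] with mask[i, j] > 0 are moved to the
    front (keeping their order) and the remaining slots are set to `fill`.
    """
    keep = mask > 0                                                  # (x, y) bool
    # Ascending stable sort of 0 (keep) / 1 (drop) puts kept positions first, in order.
    order = torch.sort((~keep).long(), dim=1, stable=True).indices  # (x, y)
    index = order.unsqueeze(-1).expand(-1, -1, data.size(2))         # (x, y, z)
    gathered = torch.gather(data, 1, index)                          # reordered rows
    # After reordering, slot j of row i is valid iff j < number of kept entries in row i.
    counts = keep.sum(dim=1, keepdim=True)                           # (x, 1)
    valid = torch.arange(data.size(1), device=data.device) < counts  # (x, y)
    return gathered.masked_fill(~valid.unsqueeze(-1), fill)


data = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
mask = torch.tensor([[1, 0, 1],
                     [0, 1, 0]])
out = compact_by_mask(data, mask)
print(out)
# row 0 -> [[0, 1], [4, 5], [0, 0]]; row 1 -> [[8, 9], [0, 0], [0, 0]]
```

Everything stays vectorized (one sort, one gather, one masked_fill), so it runs on GPU tensors without Python loops.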

PyTorch high-dimensional tensor through linear layer

I have a tensor of size (32, 128, 50) in PyTorch. These are 50-dim word embeddings with a batch size of 32. That is, the three indices in my size correspond to the batch size, the maximum sequence length (with 'pad' token), and the size of each embedding. Now, I want to pass this through a linear layer to get an output of size (32, 128, 1). That is, for every word embedding in every sequence, I want to make it one-dimensional. I tried adding a linear layer to my network going from 50 to 1 dimension, and my output tensor has the desired shape. So I think this works, but I would like to understand how PyTorch deals with this, since I did not explicitly tell it which dimension to apply the linear layer to. I played around with this and found that:
If I input a tensor of shape (32, 50, 50) -- thus creating ambiguity by having two dimensions along which the linear layer could be applied to (two 50s) -- it only applies it to the last dim and gives an output tensor of shape (32, 50, 1).
If I input a tensor of shape (32, 50, 128) it does NOT output a tensor of shape (32, 1, 128), but rather gives me an error.
This suggests that a linear layer in PyTorch applies the transformation to the last dimension of your tensor. Is that the case?
In the nn.Linear docs, it is specified that the input of this module can be any tensor of size (*, H_in) and the output will be a tensor of size (*, H_out), where:
* means any number of dimensions
H_in is the number of in_features
H_out is the number of out_features
Concretely, a tensor of size (n, m, 50) can be processed by a Linear module with in_features=50, while a tensor of size (n, 50, m) can be processed by a Linear module with in_features=m (128 in your case).
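A short sketch confirming this behaviour: nn.Linear(50, 1) computes x @ weight.T + bias on the last dimension only, so each of the 32 × 128 embeddings is mapped independently. To apply the same layer along another dimension, one common approach is to move that dimension to the end with transpose and move it back afterwards. The random inputs are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 128, 50)   # (batch, seq_len, embed_dim)
linear = nn.Linear(50, 1)

y = linear(x)                  # applied independently to every 50-dim vector
# The same computation written out: x @ W^T + b on the last dimension.
y_manual = x @ linear.weight.T + linear.bias

# To act on dimension 1 instead, move it to the end, apply, and move it back.
x2 = torch.randn(32, 50, 128)
y2 = linear(x2.transpose(1, 2)).transpose(1, 2)   # (32, 1, 128)

print(y.shape, y2.shape)
```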

Division in batches of a 3D tensor (Pytorch)

I have a 3D tensor of size, say, 100x5x2, and its mean across axis=1, which has shape 100x2.
100 here is the batch size. Without the batch dimension, dividing a tensor of shape 5x2 by one of shape 2 works perfectly, but with the batched 3D tensor I receive an error.
a = torch.rand(5,2)
b = torch.rand(2)
a / b
gives me the expected answer.
a = torch.rand(100,5,2)
b = torch.rand(100,2)
a / b
gives me the following error:
The size of tensor a (5) must match the size of tensor b (100) at non-singleton dimension 1.
How to divide these tensors such that my output is of shape 100x5x2 ? Something like bmm for division?
Simply do:
z = a / b.unsqueeze(1)
This inserts a singleton dimension into b, giving it shape (100, 1, 2), which broadcasts against a's shape (100, 5, 2).
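The sketch below reproduces the question's shapes with random data and checks the broadcast result against a per-sample loop. Since b is the mean over dimension 1, an equivalent option is to compute it with keepdim=True, which keeps the reduced dimension as size 1 so no unsqueeze is needed.

```python
import torch

a = torch.rand(100, 5, 2)
b = a.mean(dim=1)                 # (100, 2), as in the question

z = a / b.unsqueeze(1)            # b viewed as (100, 1, 2), broadcast over the 5 rows

# Equivalent: keep the reduced dimension when taking the mean -> (100, 1, 2).
z_keepdim = a / a.mean(dim=1, keepdim=True)

# Per-sample loop for comparison (slow; only for checking).
z_loop = torch.stack([a[i] / b[i] for i in range(a.size(0))])

print(z.shape)
```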

ValueError: Expected target size (128, 44), got torch.Size([128, 100]), LSTM Pytorch

I want to build a model that predicts the next character based on the previous characters.
I have split the text into sequences of integers of length 100 (using a Dataset and DataLoader).
Dimensions of my input and target variables are:
inputs dimension: (batch_size,sequence length). In my case (128,100)
targets dimension: (batch_size,sequence length). In my case (128,100)
After the forward pass, my predictions have dimensions (batch_size, sequence_length, vocabulary_size), which in my case is (128,100,44),
but when I calculate my loss using the nn.CrossEntropyLoss() function:
import torch
import torch.nn as nn
batch_size = 128
sequence_length = 100
number_of_classes = 44
# creates random tensor of your output shape
output = torch.rand(batch_size,sequence_length, number_of_classes)
# creates tensor with random targets
target = torch.randint(number_of_classes, (batch_size,sequence_length)).long()
# define loss function and calculate loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
I get an error:
ValueError: Expected target size (128, 44), got torch.Size([128, 100])
My question is: how should I handle the calculation of the loss for many-to-many LSTM prediction, especially the sequence dimension? According to the nn.CrossEntropyLoss documentation, the input must have shape (N, C, d1, d2, ..., dK), where N is batch_size and C is the number of classes. But what is D here? Is it related to the sequence length?
As a general comment, let me just say that you have asked many different questions, which makes it difficult for someone to answer. I suggest asking just one question per StackOverflow post, even if that means making several posts. I will answer just the main question that I think you are asking: "why is my code crashing and how to fix it?" and hopefully that will clear up your other questions.
Per your code, the output of your model has dimensions (128, 100, 44) = (N, D, C). Here N is the minibatch size, C is the number of classes, and D is the length of your input sequence. The cross entropy loss you are using expects the output to have shape (N, C, D) and the target to have shape (N, D). To clear up the documentation that says (N, C, D1, D2, ..., Dk), remember that your input can be an arbitrary tensor of any dimensionality. In your case inputs have length 100, but nothing stops someone from making a model with, say, a 100x100 image as input. (In that case the loss would expect the output to have shape (N, C, 100, 100).) But in your case, your input is one-dimensional, so you have just a single D=100 for the length of your input.
Now we see the error: the output should be (N, C, D), but yours is (N, D, C). Your targets have the correct shape of (N, D). You have two ways to fix the issue. The first is to change the structure of your network so that its output is (N, C, D); this may or may not be easy, or what you want, in the context of your model. The second is to transpose the axes at the time of loss computation using torch.transpose:
import torch
import torch.nn as nn
batch_size = 128
sequence_length = 100
number_of_classes = 44
# creates random tensor of your output shape (N, D, C)
output = torch.rand(batch_size, sequence_length, number_of_classes)
# transposes dimensionality to (N, C, D)
transposed_output = torch.transpose(output, 1, 2)
# creates tensor with random targets
target = torch.randint(number_of_classes, (batch_size, sequence_length)).long()
# define loss function and calculate loss
criterion = nn.CrossEntropyLoss()
loss = criterion(transposed_output, target)
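As a sanity check, the sketch below compares the transpose approach with another common option that is not part of the answer above: flattening the batch and sequence dimensions so every time step becomes its own sample. With the default reduction='mean', both average over all N × D positions and give the same value; with reduction='none', the per-position losses come back in the target's (N, D) shape. The random logits and targets are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D, C = 128, 100, 44            # batch, sequence length, vocabulary size
output = torch.randn(N, D, C)     # logits in the (N, D, C) layout an LSTM typically produces
target = torch.randint(C, (N, D))

criterion = nn.CrossEntropyLoss()

# Transpose approach: move classes to dim 1 -> (N, C, D).
loss_transposed = criterion(output.transpose(1, 2), target)

# Alternative: treat every time step as its own sample -> (N*D, C) vs (N*D,).
loss_flat = criterion(output.reshape(-1, C), target.reshape(-1))

# With reduction="none", the per-position losses keep the target's shape (N, D).
per_step = nn.CrossEntropyLoss(reduction="none")(output.transpose(1, 2), target)

print(loss_transposed.item(), loss_flat.item(), per_step.shape)
```

The per-position form is handy when padded time steps need to be masked out before averaging.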
