I need to construct a 4-dimensional PyTorch Tensor where one of the dimensions comes from multiplying a constant sparse matrix with a dense vector. The dense vector, and the resulting 4D Tensor, require gradients to be tracked. Since PyTorch only supports sparse matrices, I can't express the whole thing as a Tensor-Tensor multiplication, and I think I have to do the matrix multiplication part of the construction in a loop. In that case, I'd at least like to preallocate the resulting 4D Tensor and let the sparse mm fill in one dimension in a loop.
How do I in that case keep track of the resulting 4D Tensor's gradient requirements? Can I manually attach it into the gradient graph once it's been created?
My current approach is extremely inefficient: it essentially builds up one dimension at a time in a list and then concatenates the pieces.
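A minimal sketch of one way to avoid the loop, assuming the sparse matrix acts on the last dimension of a dense 4D input: flatten the other dimensions into a batch, call torch.sparse.mm once, and reshape. The sizes and names (S, x, a, b, c, m, k) are hypothetical. Because the result is produced by differentiable operations, autograd tracks it without any manual attachment.

```python
import torch

# Hypothetical sizes: S maps vectors of length k to vectors of length m.
a, b, c, k, m = 2, 3, 4, 5, 6

# Constant sparse matrix S of shape (m, k); it does not require gradients.
indices = torch.tensor([[0, 1, 2, 4, 5], [1, 0, 3, 2, 4]])
values = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
S = torch.sparse_coo_tensor(indices, values, (m, k))

# Dense input whose last dimension holds the vectors to be multiplied.
x = torch.randn(a, b, c, k, requires_grad=True)

# Treat the leading dimensions as one batch: (k, a*b*c), one sparse mm, reshape back.
flat = x.reshape(-1, k).t()
out = torch.sparse.mm(S, flat).t().reshape(a, b, c, m)

# Gradients flow back to x through the sparse product.
out.sum().backward()
```

Here out.requires_grad is True and x.grad has the same shape as x after the backward call.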
I am implementing a custom CNN with some custom modules in it. I have implemented only the forward pass for the custom modules and left their backward pass to autograd.
I have manually computed the correct formulae for backpropagation through the parameters of the custom modules, and I wished to see whether they match with the formulae used internally by autograd to compute the gradients.
Is there any way to see this?
Thanks
Edit (To add a test case) :-
I have a complex affine layer where the weights and inputs are complex-valued matrices, and the operation is a matrix multiplication of the weight and input matrices.
The multiplication of two complex numbers is given by -
(a+ib)(c+id) = (ac-bd)+i(ad+bc)
I computed the backpropagation formula for this layer given we have the incoming gradient from the higher layer.
It comes out to be dL/dI(n) = (hermitian(W(n))).matmul(dL/dI(n+1))
where I(n) and W(n) are the input and weight of nth layer and I(n+1) is input of (n+1)th layer.
So I wished to check whether autograd is also computing dL/dI(n) using the same formula that I derived.
(Since PyTorch doesn't support backpropagation through complex-valued tensors as of now, I have created my own representation of complex numbers using separate real and imaginary tensors.)
I don't believe there is such a feature in PyTorch, not least because the output would be quite unreadable. What you can do is implement a custom backward method for your layer with the formula you derived; then you know by design that the backpropagation is what you want.
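Short of reading autograd's internals, a hand-derived formula can be checked numerically against what autograd returns. The sketch below assumes the split real/imaginary representation from the question and the convention that the complex gradient is dL/dRe + i dL/dIm; all names (Wr, Wi, Xr, Xi, Gr, Gi) are hypothetical, and Gr, Gi stand in for the gradient arriving from the next layer.

```python
import torch

torch.manual_seed(0)
dt = torch.float64

# Weight W = Wr + i*Wi and input X = Xr + i*Xi, stored as separate real tensors.
Wr, Wi = torch.randn(3, 4, dtype=dt), torch.randn(3, 4, dtype=dt)
Xr = torch.randn(4, 2, dtype=dt, requires_grad=True)
Xi = torch.randn(4, 2, dtype=dt, requires_grad=True)

# Complex matmul Y = W X, using (a+ib)(c+id) = (ac-bd) + i(ad+bc).
Yr = Wr @ Xr - Wi @ Xi
Yi = Wr @ Xi + Wi @ Xr

# Stand-in upstream gradients dL/dYr and dL/dYi.
Gr, Gi = torch.randn_like(Yr), torch.randn_like(Yi)

# What autograd computes for dL/dXr and dL/dXi.
auto_r, auto_i = torch.autograd.grad([Yr, Yi], [Xr, Xi], grad_outputs=[Gr, Gi])

# Hand formula: hermitian(W) @ G, with hermitian(W) = Wr^T - i*Wi^T.
manual_r = Wr.t() @ Gr + Wi.t() @ Gi
manual_i = Wr.t() @ Gi - Wi.t() @ Gr

print(torch.allclose(auto_r, manual_r), torch.allclose(auto_i, manual_i))
```

If the two agree on random inputs, that is reasonable evidence that the derived formula matches autograd's result under this convention.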
I have a 3D torch tensor of size [Batch_size, n, n], which is the output of a layer of my network, and a constant 2D torch tensor of size [n, n]. How can I perform element-wise multiplication over the batch dimension, so that the result is a torch tensor of size [Batch_size, n, n]?
I know it is possible to implement this operation using an explicit loop, but I am interested in the most efficient way.
One option is that you can expand your weight matrix to have a matching batch dimension (without using any additional memory). E.g. twoDTensor.expand((batch_size, n, n)) returns the same underlying data, but representing a 3D tensor. You can see that the stride for the batch dim is zero.
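A small sketch of the expand approach, with hypothetical sizes and names. It also shows that plain broadcasting gives the same result, since PyTorch aligns the [n, n] tensor against the trailing dimensions automatically.

```python
import torch

batch_size, n = 4, 3

batched = torch.randn(batch_size, n, n, requires_grad=True)  # layer output
constant = torch.randn(n, n)                                 # constant 2D tensor

# expand returns a view: no data is copied, and the batch stride is zero.
expanded = constant.expand(batch_size, n, n)
print(expanded.stride())

# Element-wise product over the batch, via the expanded view...
out_expand = batched * expanded
# ...or by relying on broadcasting directly.
out_broadcast = batched * constant
```

Both products have size [batch_size, n, n] and stay attached to the autograd graph through batched.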
I have a sequence of multi-band images, say each sample is a tensor of size (50, 6, 30, 30), where 50 is the number of image frames in the sequence, 6 is the number of bands per pixel, and 30x30 is the spatial dimension of the image. The ground truth map is of size 30x30, but it is one-hot encoded (to use cross-entropy loss) over 7 classes, so it is a tensor of size (1, 7, 30, 30). I want to use a combination of convolutional and LSTM layers (or an integrated ConvLSTM2D layer) for my classification task, but there are the following problems:
1- Not every point has a valid label at the output map (i.e. some one-hot vectors are all-zero),
2- Not every pixel has a valid value in every time stamp. So, at every given time stamp, some of the pixels may have zero value (means invalid) for all of their band values.
I read many Q&As on how to handle this issue, and I think I should use the sample_weights option to mask the invalid points and classes, but I am really uncertain how to do it. Sample_weights should be applied to every pixel and each timestamp independently. I think I could manage it if I didn't have the convolution part (a 2D approach), but I don't understand how it works when convolution is in place, because some pixel values in the convolution window are valid and some are invalid. If I mask those invalid pixels at a specific time (which I still don't know how to do), what will happen to the chain of forward and backward propagation and the loss calculation? I think it will be ruined!
Looking for comments and help.
Possible solution:
Problem 1- For pixels that do not have a class at all, you can introduce a new class with its own label, for example "noise".
That means your one-hot encoding now has an entry for it as well, and weights will be learned accordingly for those pixels under the noise class.
This is an indirect way to achieve the same thing you would do with sample weights,
because with the sample_weight technique you tell Keras or sklearn how much weight each sample should carry.
Problem 2- To answer part 2, consider the possible cases: for pixels with invalid values, can the one-hot vector still contain a class, or will it be all zeros?
Alternatively, you can preprocess the data and assign these pixels to the noise class as well; then point 2 is handled by point 1 automatically.
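The extra-class idea can be sketched with NumPy as follows, assuming labels shaped (1, 7, H, W) as in the question and that unlabeled pixels have all-zero one-hot vectors. The tiny 2x2 map and the variable names are hypothetical.

```python
import numpy as np

# Toy label map: 7 classes on a 2x2 grid; two pixels are labeled, two are not.
labels = np.zeros((1, 7, 2, 2), dtype=np.float32)
labels[0, 3, 0, 0] = 1.0   # pixel (0, 0) belongs to class 3
labels[0, 5, 1, 1] = 1.0   # pixel (1, 1) belongs to class 5

# A pixel is unlabeled when its one-hot vector sums to zero over the class axis.
noise_channel = 1.0 - labels.sum(axis=1, keepdims=True)   # shape (1, 1, 2, 2)

# Append the noise class as an eighth channel so every pixel has exactly one label.
labels_with_noise = np.concatenate([labels, noise_channel], axis=1)
print(labels_with_noise.shape)
```

The network's output layer would then need 8 channels instead of 7 to match.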
I have a tensor of ground truth values of 3D points of G=[18000x3], and an output from my network of the same size O=[18000x3].
I need to compute a loss so that I basically have the square root of the distance between each 3D point, summed over all keypoints and normalized over 18000. How do I write this efficiently?
Just write the expression you propose using the vectorized operations provided by PyTorch. In this case
loss = (O - G).pow(2).sum(axis=1).sqrt().mean()
Check out pow, sum, sqrt and mean.
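A tiny worked instance of that expression, using made-up points instead of the 18000x3 tensors so the numbers can be checked by hand. torch.linalg.norm along dim=1 is shown as an equivalent spelling.

```python
import torch

# Two made-up ground-truth points and predictions (stand-ins for 18000x3 tensors).
G = torch.tensor([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
O = torch.tensor([[3.0, 4.0, 0.0],
                  [1.0, 1.0, 2.0]], requires_grad=True)

# Per-point Euclidean distance (5 and 1 here), averaged over all points.
loss = (O - G).pow(2).sum(dim=1).sqrt().mean()

# Same quantity via the vector norm along the coordinate axis.
loss_norm = torch.linalg.norm(O - G, dim=1).mean()

loss.backward()
print(loss.item())
```

One caveat worth keeping in mind: the derivative of sqrt is not finite at zero, so if a prediction coincides exactly with its target the gradient can come out as NaN. Adding a small epsilon under the square root is a common workaround.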
I am attempting to compute a linear combination of n tensors of the same dimension in Tensorflow. The scalar coefficients are Tensorflow Variables.
Since tf.scalar_mul does not generalise to multiplying a vector of tensors by a vector of scalars, I have thus far used tf.gather and performed each multiplication individually in a python for loop, and then converted the list of results to a tensor and summed them across the zeroth axis. Like so:
coefficients = tf.Variable(tf.constant(initial_value, shape=[n]))
components = []
for i in range(n):
    components.append(tf.scalar_mul(tf.gather(coefficients, i), tensors[i]))
combination = tf.reduce_sum(tf.convert_to_tensor(components), axis=0)
This works fine, but does not scale well at all. My application requires computing n linear combinations, meaning I have n^2 gather and multiply operations. With large values of n the computation time is poor and the memory usage of the program is unreasonably large.
Is there a more natural way of computing a linear combination like this in Tensorflow that would be faster and less resource intensive?
Use broadcasting. Assuming coefficients has shape (n,) and tensors shape (n,...) you can simply use
coefficients[:, tf.newaxis, ...] * tensors
Here, you would need to repeat tf.newaxis as many times as tensors has dimensions besides the one of size n. So e.g. if tensors has shape (n, a, b) you would use coefficients[:, tf.newaxis, tf.newaxis]
This will turn coefficients into a tensor with the same number of dimensions as tensors, but all dimensions except the first one are of size 1, so they can be broadcast to the shape of tensors. Summing the product over axis 0 with tf.reduce_sum then gives the linear combination.
Some alternatives:
Define coefficients as a variable with the correct number of dimensions in the first place (a little ugly in my opinion).
Use tf.reshape to reshape coefficients to (n, 1, ...) instead if you don't like the indexing syntax.
Use tf.transpose to shift the dimension of size n to the end of tensors. Then the dimensions align for broadcasting without needing to add dimensions to coefficients.
Also see the numpy docs on broadcasting -- it works essentially the same way in Tensorflow.
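Since the broadcasting rules are essentially shared, the idea can be sketched with NumPy arrays. The shapes and values below are hypothetical, chosen so the result is easy to verify; np.tensordot is included as an alternative that contracts the axis of size n in one call.

```python
import numpy as np

n, a, b = 3, 2, 2

coefficients = np.array([1.0, 2.0, 3.0])                             # shape (n,)
tensors = np.stack([np.full((a, b), float(i + 1)) for i in range(n)])  # shape (n, a, b)

# Add one axis of size 1 per extra dimension of tensors, then broadcast.
weighted = coefficients[:, np.newaxis, np.newaxis] * tensors         # shape (n, a, b)

# Summing over axis 0 yields the linear combination: 1*1 + 2*2 + 3*3 = 14 everywhere.
combination = weighted.sum(axis=0)                                   # shape (a, b)

# Alternative: contract the shared axis directly.
combination_alt = np.tensordot(coefficients, tensors, axes=1)
print(combination)
```

In TensorFlow the same pattern would use tf.newaxis and tf.reduce_sum in place of np.newaxis and sum.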
There is a new PyPI module called TWIT, Tensor Weighted Interpolative Transfer, that will do this fast. It is written in C for the core operations.