PyTorch tensor multiplication with float tensor giving wrong answer - pytorch

I am seeing some strange behavior when I multiply two PyTorch tensors.
x = torch.tensor([99397544.0])
y = torch.tensor([0.1])
x * y
This outputs
tensor([9939755.])
However, the answer should be 9939754.4.

By default, the tensor dtype in PyTorch is torch.float32. Changing it to torch.float64 gives the right result.
x = torch.tensor([99397544.0], dtype=torch.float64)
y = torch.tensor([0.1], dtype=torch.float64)
x * y
# tensor([9939754.4000])
The torch.float32 result differs because of rounding error: float32 carries only 24 significant bits (about 7 decimal digits), which is not enough precision to represent the operands and the product exactly.
See: What Every Computer Scientist Should Know About Floating-Point Arithmetic
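To see where the 9939755 comes from, here is a small sketch that emulates float32 rounding with the standard library `struct` module instead of PyTorch. The helper name `to_float32` is made up for this illustration. The point is that 0.1 is stored in float32 as a value slightly larger than 0.1, and near 9.9 million the spacing between adjacent float32 values is 1, so the product gets rounded to a whole number.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value.

    Hypothetical helper for illustration; it packs the value into
    4 bytes and unpacks it again.
    """
    return struct.unpack("f", struct.pack("f", value))[0]


x32 = to_float32(99397544.0)  # exactly representable in float32
y32 = to_float32(0.1)         # stored as a value slightly above 0.1

# Both operands have 24-bit significands, so the float64 product is exact.
exact_product = x32 * y32

# Rounding that product to float32 mimics a float32 multiplication.
product32 = to_float32(exact_product)

print(y32)            # the float32 value nearest to 0.1
print(exact_product)  # roughly 9939754.55, not 9939754.4
print(product32)      # rounded to the nearest float32
```

In other words, the float32 result is the correctly rounded product of the stored operands; the surprise comes from 0.1 not being exactly representable.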

Related

PyTorch gradient calculation of one tensor

I'm a beginner in PyTorch and I'm probably stuck on a relatively trivial problem, but I can't work it out at the moment.
When calculating the gradient of a tensor, I get a constant gradient of 1.
Shouldn't the gradient of a constant be 0, though?
Here is a minimal example:
import torch
x = torch.tensor(50., requires_grad=True)
y = x
y.backward()
print(x.grad)
# Output: tensor(1.)
So why is the output 1 and not 0?
You are not computing the gradient of a constant, but that of the variable x which has a constant value 50. The derivative of x with respect to x is 1.
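A short sketch of the same idea, with a second function added for contrast (the quadratic and the variable names are chosen only for illustration): the gradient depends on how y is built from x, not on the fact that x currently holds the value 50.

```python
import torch

x = torch.tensor(50.0, requires_grad=True)

# y = x, so dy/dx = 1 regardless of the value stored in x
y = x
y.backward()
grad_identity = x.grad.clone()

# y = 3 * x**2, so dy/dx = 6 * x, which is 300 at x = 50
x.grad = None  # clear the accumulated gradient before the next backward pass
y = 3 * x ** 2
y.backward()
grad_quadratic = x.grad.clone()

# A tensor created without requires_grad is treated as a constant by autograd
c = torch.tensor(50.0)

print(grad_identity)    # gradient of y = x
print(grad_quadratic)   # gradient of y = 3 * x**2
print(c.requires_grad)  # whether autograd tracks c
```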

PyTorch: Computing the norm of batched tensors

I have tensor t with shape (Batch_Size x Dims) and another tensor v with shape (Vocab_Size x Dims). I'd like to produce a tensor d with shape (Batch_Size x Vocab_Size), such that d[i,j] = norm(t[i] - v[j]).
Doing this for a single tensor (no batches) is trivial: d = torch.norm(v - t), since t would be broadcast. How can I do this when the tensors have batches?
Insert unitary dimensions into v and t to make them (1 x Vocab_Size x Dims) and (Batch_Size x 1 x Dims) respectively. Next, take the broadcasted difference to get a tensor of shape (Batch_Size x Vocab_Size x Dims). Pass that to torch.norm along with the optional dim=2 argument so that the norm is taken along the last dimension. This will result in the desired (Batch_Size x Vocab_Size) tensor of norms.
d = torch.norm(v.unsqueeze(0) - t.unsqueeze(1), dim=2)
Edit: As pointed out by @KonstantinosKokos in the comments, due to the broadcasting rules used by NumPy and PyTorch, the leading singleton dimension on v does not need to be explicit, i.e. you can use
d = torch.norm(v - t.unsqueeze(1), dim=2)
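As a quick sanity check, the sketch below builds small random tensors (the sizes 4, 7 and 3 are arbitrary choices for illustration) and confirms that both forms give the same (Batch_Size x Vocab_Size) result and that an individual entry matches the definition d[i,j] = norm(t[i] - v[j]).

```python
import torch

torch.manual_seed(0)
batch_size, vocab_size, dims = 4, 7, 3  # illustrative sizes only
t = torch.randn(batch_size, dims)
v = torch.randn(vocab_size, dims)

# (1, V, D) - (B, 1, D) broadcasts to (B, V, D); the norm over dim=2 leaves (B, V)
d = torch.norm(v.unsqueeze(0) - t.unsqueeze(1), dim=2)

# Same computation without the explicit leading dimension on v
d_short = torch.norm(v - t.unsqueeze(1), dim=2)

# Spot check one entry against the definition
i, j = 2, 5
single = torch.norm(t[i] - v[j])

print(d.shape)
print(torch.allclose(d, d_short))
print(torch.allclose(d[i, j], single))
```

Note that the broadcasted difference materialises a (Batch_Size x Vocab_Size x Dims) intermediate, which can get large. For the Euclidean case, torch.cdist(t, v) should compute the same pairwise distances and may be worth trying if memory becomes a concern.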

Division in batches of a 3D tensor (Pytorch)

I have a 3D tensor of size, say, 100x5x2 and the mean of that tensor across axis=1, which has shape 100x2.
100 here is the batch size. Without a batch dimension, dividing a tensor of shape 5x2 by one of shape 2 works perfectly, but with the batched 3D tensor I get an error.
a = torch.rand(5,2)
b = torch.rand(2)
z=a/b
This gives me the expected answer.
a = torch.rand(100,5,2)
b = torch.rand(100,2)
z=a/b
Gives me the following error.
The size of tensor a (5) must match the size of tensor b (100) at non-singleton dimension 1.
How can I divide these tensors so that my output has shape 100x5x2? Something like bmm for division?
Simply do:
z = a / b.unsqueeze(1)
This adds an extra dimension in b and makes it of shape (100, 1, 2) which is compatible for broadcasting with a.
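Since b in the question is a mean over axis 1, another option is to keep the reduced dimension when computing the mean, so no unsqueeze is needed afterwards. A minimal sketch, assuming b is computed as a.mean(dim=1) as the question describes:

```python
import torch

torch.manual_seed(0)
a = torch.rand(100, 5, 2)

# Mean over axis 1 drops that axis: shape (100, 2)
b = a.mean(dim=1)
# Re-insert it as a singleton so (100, 5, 2) / (100, 1, 2) broadcasts
z = a / b.unsqueeze(1)

# keepdim=True leaves the reduced axis in place: shape (100, 1, 2)
b_keep = a.mean(dim=1, keepdim=True)
z_keep = a / b_keep

print(z.shape)
print(b_keep.shape)
print(torch.allclose(z, z_keep))
```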

How to calculate gradients on a tensor in PyTorch?

I want to calculate the gradient of a tensor; however, it gives this error:
RuntimeError: grad can be implicitly created only for scalar outputs
and here is what I am trying to code:
x = torch.full((2, 3), 4.0, requires_grad=True)  # float fill value: only floating point tensors can require gradients
y = (2*x**2+3)
y.backward()
And at this point, it throws an error.
backward() can create the gradient implicitly only for a scalar output, and here y is never reduced to a scalar (for example with .sum()).
Hence the issue can be fixed by:
y.backward(torch.ones_like(x))
which computes a vector-Jacobian product with a tensor of all ones and stores the resulting gradient in x.grad.
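The two routes mentioned above, passing a tensor of ones to backward() or reducing with .sum() first, should give the same gradient. The sketch below compares them; retain_graph=True is used only so that the same graph can be traversed twice in this demonstration.

```python
import torch

x = torch.full((2, 3), 4.0, requires_grad=True)
y = 2 * x ** 2 + 3

# Option 1: pass the vector for the vector-Jacobian product explicitly
y.backward(torch.ones_like(y), retain_graph=True)
grad_explicit = x.grad.clone()

# Option 2: reduce to a scalar first, then call backward() without arguments
x.grad = None  # clear the accumulated gradient
y.sum().backward()
grad_sum = x.grad.clone()

# d/dx (2 * x**2 + 3) = 4 * x, which is 16 everywhere for x = 4
print(grad_explicit)
print(torch.equal(grad_explicit, grad_sum))
```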

PyTorch: Calculating the Hessian vector product with nn.parameters()

Using PyTorch, I would like to calculate the Hessian vector product, where the Hessian is the second-derivative matrix of the loss function of some neural net, and the vector will be the vector of gradients of that loss function.
I know how to calculate the Hessian vector product for a regular function thanks to this post. However, I am running into trouble when the function is the loss function of a neural network. This is because the parameters are packaged into a module, accessible via nn.parameters(), and not a torch tensor.
I want to do something like this (doesn't work):
### a simple neural network
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)
y = linear(x).sum()
### compute the gradient and make a copy that is detached from the graph
grad = torch.autograd.grad(y, linear.parameters(),create_graph=True)
v = grad.clone().detach()
### compute the Hessian vector product
z = grad @ v
z.backward()
By analogy with this (which does work):
x = torch.tensor([1.0, 1.0], requires_grad=True)
f = 3*x[0]**2 + 4*x[0]*x[1] + x[1]**2
grad, = torch.autograd.grad(f, x, create_graph=True)
v = grad.clone().detach()
z = grad @ v
z.backward()
This post addresses a similar (possibly the same?) issue, but I don't understand the solution.
You say it doesn't work but don't show the error you get, which is why you haven't received any answers.
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
outputs and inputs are expected to be sequences of tensors, but you pass a single tensor as outputs.
In other words, you should pass a sequence: use [y] instead of y.
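Separately from how outputs is passed, note that when inputs contains several tensors (as linear.parameters() does), torch.autograd.grad returns a tuple with one gradient per parameter, so .clone() and @ have to be applied per parameter rather than to the tuple. Below is a minimal sketch of a Hessian-vector product along those lines. It is one possible approach, not necessarily what the linked post does. As an assumption for illustration, the loss is the squared output rather than the plain sum from the question, because a loss that is linear in the parameters has a zero Hessian and gradients that do not depend on the parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)
linear = nn.Linear(10, 20)
x = torch.randn(1, 10)

# Assumed loss for illustration: squared outputs, so the Hessian is non-zero
loss = (linear(x) ** 2).sum()

params = list(linear.parameters())  # [weight, bias]

# One gradient per parameter; create_graph=True keeps them differentiable
grads = torch.autograd.grad(loss, params, create_graph=True)

# The vector v: a detached copy of the gradients, one piece per parameter
v = [g.detach().clone() for g in grads]

# Dot product of grads and v, summed over all parameters, gives a scalar
dot = sum((g * vi).sum() for g, vi in zip(grads, v))

# Differentiating that scalar with respect to the parameters yields H v
hvp = torch.autograd.grad(dot, params)

for p, h in zip(params, hvp):
    print(tuple(p.shape), tuple(h.shape))
```

The result hvp is again a tuple with one tensor per parameter, each with the same shape as that parameter. If the pieces are needed as a single flat vector, they could be flattened and concatenated with torch.cat.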
