Computing matrix derivatives with torch.autograd.grad (PyTorch)

I am trying to compute matrix derivatives in PyTorch using torch.autograd.grad, but I am running into a few issues. Here is a minimal example that reproduces the error:
theta = torch.tensor(np.random.uniform(low=-np.pi, high=np.pi), requires_grad=True)
rot_mat = torch.tensor([[torch.cos(theta), torch.sin(theta), 0],
                        [-torch.sin(theta), torch.cos(theta), 0]],
                       dtype=torch.float, requires_grad=True)
torch.autograd.grad(outputs=rot_mat,
                    inputs=theta, grad_outputs=torch.ones_like(rot_mat),
                    create_graph=True, retain_graph=True)
This code results in the error "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior."
I tried using allow_unused=True, but the gradient is returned as None. I am not sure what is causing the graph to be disconnected here.

A PyTorch autograd graph is only built when PyTorch functions are used.
I think the Python 2D list used while creating rot_mat disconnects the graph: torch.tensor(...) copies the values of the cos/sin entries into a brand-new leaf tensor, so rot_mat no longer depends on theta. So build the rotation matrix using torch functions, and then just use backward() to compute the gradients. Here's sample code:
import torch
import numpy as np
theta = torch.tensor(np.random.uniform(low=-np.pi, high=np.pi), requires_grad=True)
# create required values and convert it to torch 1d tensor
cos_t = torch.cos(theta).view(1)
sin_t = torch.sin(theta).view(1)
msin_t = -sin_t
zero = torch.zeros(1, dtype=theta.dtype)  # match theta's dtype so torch.cat doesn't mix float32/float64
# create rotation matrix using only pytorch functions
rot_1d = torch.cat((cos_t, sin_t, zero, msin_t, cos_t, zero))
rot_mat = rot_1d.view((2, 3))
# Autograd
rot_mat.backward(torch.ones_like(rot_mat))
# gradient
print(theta.grad)
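If you specifically want torch.autograd.grad rather than backward(), the same fix applies: assemble the matrix with torch ops so every entry stays connected to theta. A minimal sketch using torch.stack:
import torch
import numpy as np

theta = torch.tensor(np.random.uniform(low=-np.pi, high=np.pi), requires_grad=True)
zero = torch.zeros_like(theta)
# stack keeps each entry of the matrix in the autograd graph
rot_mat = torch.stack([
    torch.stack([torch.cos(theta), torch.sin(theta), zero]),
    torch.stack([-torch.sin(theta), torch.cos(theta), zero]),
])
(grad_theta,) = torch.autograd.grad(outputs=rot_mat, inputs=theta,
                                    grad_outputs=torch.ones_like(rot_mat),
                                    create_graph=True)
print(grad_theta)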

Related

Pytorch bincount with gradient

I am trying to get the gradient of a sum over selected indices of an array using bincount. However, PyTorch does not implement the gradient for it. This can be implemented with a loop and torch.sum, but it is too slow. Is it possible to do this efficiently in PyTorch (maybe with einsum or index_add)? Of course, we could loop over the indices and add the entries one by one, but that would grow the computational graph significantly and perform very poorly.
import torch
from torch import autograd
import numpy as np
tt = lambda x, grad=True: torch.tensor(x, requires_grad=grad)
inds = tt([1, 5, 7, 1], False).long()
y = tt(np.arange(4) + 0.1).float()
sum_y_section = torch.bincount(inds, y * y, minlength=8)
#sum_y_section = torch.sum(y * y)
grad = autograd.grad(sum_y_section, y, create_graph=True, allow_unused=False)
print("sum_y_section", sum_y_section)
print("grad", grad)
We can use a feature introduced in PyTorch 1.11 called scatter_reduce (note that its signature changed in later releases):
bincount = lambda inds, arr: torch.scatter_reduce(arr, 0, inds, reduce="sum")
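On versions without scatter_reduce, a differentiable bincount can also be sketched with index_add, which the question itself hints at; index_add supports autograd through its source argument (the zeros tensor plays the role of minlength):
import torch

inds = torch.tensor([1, 5, 7, 1])
y = torch.tensor([0.1, 1.1, 2.1, 3.1], requires_grad=True)
# accumulate y*y into an all-zero output; index_add is differentiable w.r.t. y
sum_y_section = torch.zeros(8, dtype=y.dtype).index_add(0, inds, y * y)
(grad,) = torch.autograd.grad(sum_y_section, y,
                              grad_outputs=torch.ones_like(sum_y_section),
                              create_graph=True)
print(sum_y_section)
print(grad)  # equals 2*y, matching the bincount semantics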
I'd try to use a hook to manipulate the gradient in a custom way; a related approach is sketched below.
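A minimal sketch of that idea, using a custom autograd.Function rather than a hook so the backward pass can be written by hand (bincount routes each weight's upstream gradient back from its bin):
import torch

class BincountWithGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inds, weights, minlength):
        ctx.save_for_backward(inds)
        return torch.bincount(inds, weights, minlength=minlength)

    @staticmethod
    def backward(ctx, grad_out):
        (inds,) = ctx.saved_tensors
        # d(out[b])/d(weights[i]) is 1 exactly when inds[i] == b
        return None, grad_out[inds], None

inds = torch.tensor([1, 5, 7, 1])
y = torch.tensor([0.1, 1.1, 2.1, 3.1], requires_grad=True)
out = BincountWithGrad.apply(inds, y * y, 8)
out.sum().backward()
print(y.grad)  # equals 2*y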

How can I matrix-multiply two PyTorch quantized Tensors?

I am new to tensor quantization, and tried doing something as simple as
import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)
x @ y.T
with PyTorch quantized tensors running on CPU. I thus tried
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)
qx @ qy.T  # I tried...
...and got the error
RuntimeError: Could not run 'aten::mm' with arguments from the
'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these
backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId,
CPUTensorId, SparseCUDATensorId].
Is matrix multiplication just not supported, or am I doing something wrong?
It is not straightforward to implement matrix multiplication for quantized matrices, so the "conventional" matrix multiplication operator (@) does not support it (as your error message suggests).
You should look at quantized operations, e.g., torch.nn.quantized.functional.linear:
torch.nn.quantized.functional.linear(qx[None,...], qy.T)
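If the quantized linear op does not fit your use case, a simple fallback (at the cost of losing the speed benefit of staying quantized) is to dequantize, multiply in float, and requantize the result:
import torch

x = torch.rand(10, 3)
y = torch.rand(10, 3)
scale, zero_point = 1e-4, 2
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.qint32)
qy = torch.quantize_per_tensor(y, scale, zero_point, torch.qint32)
# matrix-multiply in float, then requantize
z = torch.mm(qx.dequantize(), qy.dequantize().T)
qz = torch.quantize_per_tensor(z, scale, zero_point, torch.qint32)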

Keras data augmentation changes pixel values for masks (segmentation)

I am using runtime data augmentation with generators in Keras for a segmentation problem.
Here is my data generator:
from keras.preprocessing.image import ImageDataGenerator
import numpy as np

data_gen_args = dict(
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)
image_datagen = ImageDataGenerator(**data_gen_args)

def generate_data_generator(generator, Xi, Yi):
    genXi = generator.flow(Xi, seed=7, batch_size=32)
    genYi = generator.flow(Yi, seed=7, batch_size=32)
    while True:
        Xi = genXi.next()
        Yi = genYi.next()
        print(Yi.dtype)
        print(np.unique(Yi))
        yield (Xi, Yi)

train_generator = generate_data_generator(image_datagen,
                                          x_train,
                                          y_train)
My labels are in a numpy array with data type float32 and values 0.0 and 1.0.
# Output of np.unique(y_train)
array([0., 1.], dtype=float32)
However, the data generator seems to modify the pixel values, as shown below:
#Output of print(np.unique(Yi))
[0.00000000e+00 1.01742386e-04 1.74021334e-04 ... 9.99918878e-01
9.99988437e-01 1.00000000e+00]
It is supposed to have the same values (0.0 and 1.0) after data generation.
Also, the official documentation shows an example that uses the same augmentation arguments for generating masks and images together.
However, when I remove the shift and zoom augmentations, I do get only 0.0 and 1.0 as output.
Keras version 2.2.4, Python 3.6.8
UPDATE:
I saved those images as a numpy array and plotted them using matplotlib. It looks like the edges are smoothly interpolated (values between 0.0 and 1.0) once the shift and zoom augmentations are included. I can round these values in my custom generator as a hack, but I still don't understand the root cause (for normal images this is quite unnoticeable and has no adverse effects, but in masks we don't want the label values to change)!
Still wondering: is this a bug (nobody has mentioned it so far) or a problem with my custom code?
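In the meantime, a workaround consistent with the update above is to round the mask batches inside the custom generator (a sketch based on the generator shown earlier):
def generate_data_generator(generator, Xi, Yi):
    genXi = generator.flow(Xi, seed=7, batch_size=32)
    genYi = generator.flow(Yi, seed=7, batch_size=32)
    while True:
        Xi_batch = genXi.next()
        Yi_batch = genYi.next()
        # snap interpolated mask values back to hard 0/1 labels
        Yi_batch = np.round(Yi_batch)
        yield (Xi_batch, Yi_batch)
Newer versions of keras-preprocessing also expose an interpolation_order argument on ImageDataGenerator; setting it to 0 (nearest neighbour) should avoid the interpolation entirely, though I have not verified this on 2.2.4.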

How to normalize time series data with multiple features by using sklearn?

For data with the shape (num_samples,features), MinMaxScaler from sklearn.preprocessing can be used to normalize it easily.
However, when using the same method for time series data with the shape (num_samples, time_steps,features), sklearn will give an error.
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Making artificial time data
x1 = np.linspace(0,3,4).reshape(-1,1)
x2 = np.linspace(10,13,4).reshape(-1,1)
X1 = np.concatenate((x1*0.1,x2*0.1),axis=1)
X2 = np.concatenate((x1,x2),axis=1)
X = np.stack((X1,X2))
# Trying to normalize
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X)  # <--- error here
ValueError: Found array with dim 3. MinMaxScaler expected <= 2.
This post suggests something like
(timeseries-timeseries.min())/(timeseries.max()-timeseries.min())
Yet, it only works for data with only 1 feature. Since my data has more than 1 feature, this method doesn't work.
How to normalize time series data with multiple features?
To normalize a 3D array of shape (n_samples, timesteps, n_features), use the following:
(timeseries - timeseries.min(axis=(0, 1), keepdims=True)) / (timeseries.max(axis=(0, 1), keepdims=True) - timeseries.min(axis=(0, 1), keepdims=True))
Reducing over axes 0 and 1 computes one min and one max per feature, and keepdims=True keeps the result broadcastable against the original array, so each feature is normalized independently.
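Alternatively, if you want to keep using MinMaxScaler, the usual trick is to fold the time axis into the sample axis, fit on the 2-D view, and reshape back (a minimal sketch):
from sklearn.preprocessing import MinMaxScaler
import numpy as np

X = np.random.rand(2, 4, 2)  # (num_samples, time_steps, features)
scaler = MinMaxScaler()
# collapse samples and timesteps into rows so each column is one feature
X_2d = X.reshape(-1, X.shape[-1])
X_norm = scaler.fit_transform(X_2d).reshape(X.shape)
This gives the same per-feature scaling as the formula above and keeps the fitted scaler around for transforming test data.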

pytorch: how to directly find gradient w.r.t. loss

In Theano, it was very easy to get the gradient of some variable w.r.t. a given loss:
loss = f(x, w)
dl_dw = tt.grad(loss, wrt=w)
I get that PyTorch goes by a different paradigm, where you'd do something like:
loss = f(x, w)
loss.backward()
dl_dw = w.grad
The thing is, I might not want to do a full backward propagation through the graph - just along the path needed to get to w.
I know you can define Variables with requires_grad=False if you don't want to backpropagate through them. But then you have to decide that at variable-creation time (and the requires_grad=False property is attached to the variable rather than to the call that gets the gradient, which seems odd).
My question is: is there some way to backpropagate on demand (i.e. only backpropagate along the path needed to compute dl_dw, as you would in Theano)?
It turns out that this is really easy. Just use torch.autograd.grad.
Example:
import torch
import numpy as np
from torch.autograd import grad

# torch.autograd.Variable is deprecated; plain tensors carry autograd state now
x = torch.from_numpy(np.random.randn(5, 4))
w = torch.from_numpy(np.random.randn(4, 3)).requires_grad_(True)
y = torch.from_numpy(np.random.randn(5, 3))

loss = ((x.mm(w) - y) ** 2).sum()
# grad only traverses the path from loss back to w
(d_loss_d_w,) = grad(loss, w)

# check against the analytic gradient: 2 * x^T (xw - y)
assert np.allclose(d_loss_d_w.numpy(), (x.t().mm(x.mm(w) - y) * 2).detach().numpy())
Thanks to JerryLin for answering the question here.
