How can I matrix-multiply two PyTorch quantized Tensors?

I am new to tensor quantization, and tried doing something as simple as
import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)
x@y.T
with PyTorch quantized tensors running on CPU. I thus tried
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)
qx@qy.T # I tried...
...and got the following error:
RuntimeError: Could not run 'aten::mm' with arguments from the
'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these
backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId,
CPUTensorId, SparseCUDATensorId].
Is matrix multiplication just not supported, or am I doing something wrong?

It is not straightforward to implement matrix multiplication for quantized matrices. Therefore, the "conventional" matrix multiplication (@) does not support it (as your error message suggests).
You should look at the quantized operations instead, e.g., torch.nn.quantized.functional.linear:
torch.nn.quantized.functional.linear(qx[None,...], qy.T)
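If the quantized kernels do not fit your case (as far as I know, the documentation for the quantized linear function expects a torch.quint8 input and a torch.qint8 weight, not torch.qint32), one fallback is to dequantize first and multiply in floating point. This is only a sketch of that workaround, not a quantized matmul: the arithmetic runs in float32, and the seed and tolerance below are my own choices.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
x = torch.rand(10, 3)
y = torch.rand(10, 3)

scale, zero_point = 1e-4, 2
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.qint32)
qy = torch.quantize_per_tensor(y, scale, zero_point, torch.qint32)

# Fallback: convert back to float32, then use the ordinary matmul operator.
result = qx.dequantize() @ qy.dequantize().T

print(result.shape)  # torch.Size([10, 10])
```

The result differs from x@y.T only by the quantization error introduced by scale.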

Related

To calculate euclidean distance between vectors in a torch tensor with multiple dimensions

I have two randomly initialized torch tensors of the shape shown below.
Inputs
tensor1 = torch.rand((4,2,3,100))
tensor2 = torch.rand((4,2,3,100))
tensor1 and tensor2 each contain 24 100-dimensional vectors.
I want to get a tensor of shape torch.Size([4,2,3]) holding the Euclidean distance between the vectors at the same index in the two tensors.
I used dist = torch.nn.functional.pairwise_distance(tensor1, tensor2) to get the results I wanted.
However, the pairwise_distance function calculates the Euclidean distance along the second dimension of the tensor, so the shape of dist is torch.Size([4,3,100]).
I transposed the tensors several times to work around this. My code is as follows.
tensor1 = tensor1.transpose(1,3)
tensor2 = tensor2.transpose(1,3)
dist = torch.nn.functional.pairwise_distance(tensor1, tensor2)
dist = dist.transpose(1,2)
Is there a simpler or easier way to get the result I want?
Here ya go
dist = (tensor1 - tensor2).pow(2).sum(3).sqrt()
Basically that's what Euclidean distance is.
Subtract -> raise to the power of 2 -> sum along the unfortunate axis you want to eliminate -> square root
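As a quick sanity check, the one-liner above can be compared against torch.linalg.norm taken over the last axis. This is just a sketch; the names manual and dist are mine.

```python
import torch

tensor1 = torch.rand((4, 2, 3, 100))
tensor2 = torch.rand((4, 2, 3, 100))

# Subtract -> square -> sum over the last axis -> square root
manual = (tensor1 - tensor2).pow(2).sum(3).sqrt()

# Same computation expressed as a vector 2-norm over the last axis
dist = torch.linalg.norm(tensor1 - tensor2, dim=-1)

print(dist.shape)  # torch.Size([4, 2, 3])
```

Both give one distance per vector pair, so no transposes are needed.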

Computing matrix derivatives with torch.autograd.grad (PyTorch)

I am trying to compute matrix derivatives in PyTorch using torch.autograd.grad, but I am running into a few issues. Here is a minimal working example to reproduce the error.
theta = torch.tensor(np.random.uniform(low=-np.pi, high=np.pi), requires_grad=True)
rot_mat = torch.tensor([[torch.cos(theta), torch.sin(theta), 0],
[-torch.sin(theta), torch.cos(theta), 0]],
dtype=torch.float, requires_grad=True)
torch.autograd.grad(outputs=rot_mat,
inputs=theta, grad_outputs=torch.ones_like(rot_mat),
create_graph=True, retain_graph=True)
This code results in the error "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior."
I tried using allow_unused=True but the gradients are returned as None. I am not sure what is causing the graph to be disconnected here.
The PyTorch autograd graph is only built from PyTorch operations.
I think the Python 2D list used while creating rot_mat disconnects the graph: torch.tensor copies the values into a brand-new tensor that has no link back to theta. So build the rotation matrix with torch functions, and just use the backward() function to compute the gradients. Here's sample code:
import torch
import numpy as np
theta = torch.tensor(np.random.uniform(low=-np.pi, high=np.pi), requires_grad=True)
# create required values and convert it to torch 1d tensor
cos_t = torch.cos(theta).view(1)
sin_t = torch.sin(theta).view(1)
msin_t = -sin_t
zero = torch.zeros(1)
# create rotation matrix using only pytorch functions
rot_1d = torch.cat((cos_t, sin_t, zero, msin_t, cos_t, zero))
rot_mat = rot_1d.view((2, 3))
# Autograd
rot_mat.backward(torch.ones_like(rot_mat))
# gradient
print(theta.grad)
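If you would rather keep the torch.autograd.grad call from the question, a variant that builds the matrix with torch.stack might look like this. It is a sketch under my own assumptions: theta is fixed to 0.3 instead of a random draw so the result can be checked by hand.

```python
import math
import torch

# Assumption: a fixed angle instead of a random one, to make the check easy.
theta = torch.tensor(0.3, requires_grad=True)
zero = torch.zeros_like(theta)

# torch.stack keeps every entry connected to theta in the autograd graph.
rot_mat = torch.stack([
    torch.stack([torch.cos(theta), torch.sin(theta), zero]),
    torch.stack([-torch.sin(theta), torch.cos(theta), zero]),
])

(grad_theta,) = torch.autograd.grad(
    outputs=rot_mat,
    inputs=theta,
    grad_outputs=torch.ones_like(rot_mat),
)
print(grad_theta)
```

With grad_outputs set to all ones, this is the derivative of the sum of the entries, i.e. -2*sin(theta), since the two cosine terms cancel.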

Keras data augmentation changes pixel values for masks (segmentation)

I am using runtime data augmentation with generators in Keras for a segmentation problem.
Here is my data generator:
data_gen_args = dict(
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.2,
horizontal_flip=True,
validation_split=0.2
)
image_datagen = ImageDataGenerator(**data_gen_args)
def generate_data_generator(generator, Xi, Yi):
    genXi = generator.flow(Xi, seed=7, batch_size=32)
    genYi = generator.flow(Yi, seed=7, batch_size=32)
    while True:
        Xi = genXi.next()
        Yi = genYi.next()
        print(Yi.dtype)
        print(np.unique(Yi))
        yield (Xi, Yi)
train_generator = generate_data_generator(image_datagen,
x_train,
y_train)
My labels are in a numpy array with data type float 32 and value 0.0 and 1.0.
#Output of np.unique(y_train)
array([0., 1.], dtype=float32)
However, the data generator seems to modify the pixel values, as shown below:
#Output of print(np.unique(Yi))
[0.00000000e+00 1.01742386e-04 1.74021334e-04 ... 9.99918878e-01
9.99988437e-01 1.00000000e+00]
The masks are supposed to have the same values (0.0 and 1.0) after data generation.
Also, the official documentation shows an example that uses the same augmentation arguments to generate masks and images together.
However, when I remove shift and zoom I get (0.0 and 1.0) as output.
Keras version 2.2.4, Python 3.6.8
UPDATE:
I saved those images as numpy arrays and plotted them with matplotlib. It looks like the edges are smoothly interpolated (values between 0.0 and 1.0) once shift and zoom augmentation are included. I can round these values in my custom generator as a hack, but I still don't understand the root cause (for normal images this is barely noticeable and has no adverse effects, but for masks we don't want the label values to change).
Still wondering: is this a bug (nobody has mentioned it so far) or a problem with my custom code?
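For what it's worth, the rounding hack mentioned in the update could be sketched as below. This assumes the in-between values really do come from interpolation during the shift and zoom transforms. binarize_mask and the 0.5 threshold are hypothetical choices of mine, not part of Keras, and the sample values are made up, with a few borrowed from the output above.

```python
import numpy as np

def binarize_mask(mask_batch, threshold=0.5):
    """Map interpolated mask values back to the labels 0.0 and 1.0."""
    # Assumption: anything above the threshold belongs to the foreground.
    return (mask_batch > threshold).astype(np.float32)

# Illustrative values only, standing in for an augmented mask batch.
interpolated = np.array(
    [0.0, 1.01742386e-04, 0.49, 0.51, 9.99988437e-01, 1.0],
    dtype=np.float32,
)
restored = binarize_mask(interpolated)
print(np.unique(restored))
```

Inside the custom generator this would be applied to Yi just before the yield.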

Element wise calculation breaks autograd

I am using PyTorch to calculate the loss for a logistic regression (I know PyTorch can do this automatically, but I have to write it myself). My function is defined below, but the cast to torch.tensor breaks autograd and gives me w.grad = None. I'm new to PyTorch, so I'm sorry.
logistic_loss = lambda X,y,w: torch.tensor([torch.log(1 + torch.exp(-y[i] * torch.matmul(w, X[i,:]))) for i in range(X.shape[0])], requires_grad=True)
Your post isn't very clear on details and this is a monster of a one-liner. I first reworked it to make a minimal, complete, verifiable example. Please correct me if I misunderstood your intentions and please do it yourself next time.
import torch
# unroll the one-liner to have an easier time understanding what's going on
def logistic_loss(X, y, w):
    elementwise = []
    for i in range(X.shape[0]):
        mm = torch.matmul(w, X[i, :])
        exp = torch.exp(-y[i] * mm)
        elementwise.append(torch.log(1 + exp))
    return torch.tensor(elementwise, requires_grad=True)
# I assume those are the expected dimensions of your input
X = torch.randn(5, 30, requires_grad=True)
y = torch.randn(5)
w = torch.randn(30)
# I assume you backpropagate from a reduced version
# of your sum, because you can't call .backward on multi-dimensional
# tensors
loss = logistic_loss(X, y, w).mean()
loss.mean().backward()
print(X.grad)
The simplest solution to your problem is to replace torch.tensor(elementwise, requires_grad=True) with torch.stack(elementwise). You can think of torch.tensor as a constructor for entirely new tensors. If your tensor is the result of some mathematical expression, you should use operations like torch.stack or torch.cat instead.
That being said, this code is still wildly inefficient because you do manual looping over i. Instead, you could write simply
def logistic_loss_vectorized(X, y, w):
    mm = torch.matmul(X, w)
    exp = torch.exp(-y * mm)
    return torch.log(1 + exp)
which is mathematically equivalent, but will be much faster in practice, because it allows for better parallelization due to lack of explicit looping.
Note that there is still a numerical issue with this code - you're taking a logarithm of an exponential, but the intermediate result, called exp, is likely to attain very high values, causing loss of precision. There are workarounds for that, which is why the loss functions provided by PyTorch are preferable.
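One possible workaround, sketched here under my own assumptions, relies on the identity log(1 + exp(z)) = softplus(z) and lets torch.nn.functional.softplus handle large arguments. The name logistic_loss_stable is hypothetical, and w is marked with requires_grad here because the question asks about w.grad.

```python
import torch
import torch.nn.functional as F

def logistic_loss_stable(X, y, w):
    # log(1 + exp(-y * Xw)) written as softplus(-y * Xw)
    margins = y * torch.matmul(X, w)
    return F.softplus(-margins)

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
X = torch.randn(5, 30)
y = torch.randn(5)
w = torch.randn(30, requires_grad=True)

per_sample = logistic_loss_stable(X, y, w)
per_sample.mean().backward()
print(w.grad.shape)  # torch.Size([30])

# A large argument overflows exp() in the naive formula but not softplus.
big = torch.tensor([1000.0])
naive_big = torch.log(1 + torch.exp(big))
stable_big = F.softplus(big)
print(naive_big, stable_big)
```

For moderate inputs the two formulas agree; they only diverge where exp overflows.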

pytorch: how to directly find gradient w.r.t. loss

In theano, it was very easy to get the gradient of some variable w.r.t. a given loss:
loss = f(x, w)
dl_dw = tt.grad(loss, wrt=w)
I get that pytorch goes by a different paradigm, where you'd do something like:
loss = f(x, w)
loss.backward()
dl_dw = w.grad
The thing is I might not want to do a full backwards propagation through the graph - just along the path needed to get to w.
I know you can define Variables with requires_grad=False if you don't want to backpropagate through them. But then you have to decide that at the time of variable-creation (and the requires_grad=False property is attached to the variable, rather than the call which gets the gradient, which seems odd).
My question is: is there some way to backpropagate on demand (i.e. only backpropagate along the path needed to compute dl_dw, as you would in theano)?
It turns out that this is really easy. Just use torch.autograd.grad.
Example:
import torch
import numpy as np
from torch.autograd import grad
x = torch.autograd.Variable(torch.from_numpy(np.random.randn(5, 4)))
w = torch.autograd.Variable(torch.from_numpy(np.random.randn(4, 3)), requires_grad=True)
y = torch.autograd.Variable(torch.from_numpy(np.random.randn(5, 3)))
loss = ((x.mm(w) - y)**2).sum()
(d_loss_d_w, ) = grad(loss, w)
assert np.allclose(d_loss_d_w.data.numpy(), (x.transpose(0, 1).mm(x.mm(w)-y)*2).data.numpy())
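In recent PyTorch versions the Variable wrapper is no longer needed, so the same idea can be written with plain tensors. A sketch, with a fixed seed and float64 chosen by me to match the NumPy defaults above:

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
x = torch.randn(5, 4, dtype=torch.float64)
w = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
y = torch.randn(5, 3, dtype=torch.float64)

loss = ((x @ w - y) ** 2).sum()

# Returns the gradient directly instead of accumulating it into w.grad.
(d_loss_d_w,) = torch.autograd.grad(loss, w)

# Analytic gradient of the squared error: 2 * x^T (x w - y)
expected = 2 * x.T @ (x @ w - y)
print(torch.allclose(d_loss_d_w, expected))  # True
```

Since only w is passed as an input, w.grad stays None and nothing is accumulated for x or y.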
Thanks to JerryLin for answering the question here.
