Computing intermediate gradients using backward method in Pytorch - pytorch

I am having trouble understanding the backward method in PyTorch.
from torch import tensor
x1 = tensor(2.).requires_grad_()
x2 = tensor(3.).requires_grad_() # or x2 = tensor(3.)
x3 = x1 + x2
l = (x3**2).sum()
l.backward()
print(x1)
print(x3)
print(x1.grad)
print(x3.grad)
Results are
tensor(2., requires_grad=True)
tensor(5., grad_fn=<AddBackward0>)
tensor(10.)
None
Why is x3.grad still None? Shouldn't it be tensor(10.) ?
When I run the following lines of code, x3.grad is evaluated to tensor(10.)
x3 = tensor(5.).requires_grad_()
l = (x3**2).mean()
l.backward()
print(x3.grad)

If you print x3.grad on your first example you might notice torch outputs a warning:
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See here for more informations.
To save memory the gradients of the non-leaf tensors (non user-created tensors) are not buffered.
If you wish to see those gradients, you can retain the gradient on x3 by calling .retain_grad() on it before running the backward pass (i.e. before calling .backward()).
x3.retain_grad()
l.backward()
print(x3.grad)
will indeed output tensor(10.)
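For reference, here is a minimal self-contained sketch of the whole sequence, assuming a recent PyTorch version. It only restates the example from the question with the .retain_grad() call placed between building x3 and calling .backward():

```python
import torch

x1 = torch.tensor(2.0, requires_grad=True)  # leaf tensor (created by the user)
x2 = torch.tensor(3.0, requires_grad=True)  # leaf tensor (created by the user)
x3 = x1 + x2                                # non-leaf tensor (result of an operation)
x3.retain_grad()                            # ask autograd to keep dl/dx3

l = (x3 ** 2).sum()
l.backward()

print(x1.is_leaf, x3.is_leaf)
print(x1.grad)  # dl/dx1 = 2 * x3
print(x3.grad)  # dl/dx3 = 2 * x3, populated only because of retain_grad()
```

Without the x3.retain_grad() line, x3.grad stays None while x1.grad is still filled in.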

Related

How to get the partial derivative of probability to input in pyTorch?

I want to generate attack samples via the following steps:
Find a pre-trained CNN classification model, whose input is X and output is P(y|X), and the most possible result of X is y.
I want to input X' and get y_fool, where X' is not far away from X and y_fool is not equal to y
The steps for getting X' are given in an image (not reproduced here).
How can I get the partial derivative described in the image?
Here is my code but I got None: (The model is Vgg16)
x = torch.autograd.Variable(image, requires_grad=True)
output = model(image)
prob = nn.functional.softmax(output[0], dim=0)
prob.backward(torch.ones(prob.size()))
print(x.grad)
How should I modify my code? Could someone help me? I would be absolutely grateful.
Here, the point is to backpropagate a "false" example through the network, in other words you need to maximize one particular coordinate of your output which does not correspond to the actual label of x.
Let's say, for example, that your model outputs N-dimensional vectors, that the label of x should be [1, 0, 0, ...], and that we will try to make the model actually predict [0, 1, 0, 0, ...] (so y_fool has its second coordinate set to 1 instead of the first one).
A quick side note: Variable is deprecated, just set the requires_grad flag to True. So you get:
# assumes image is already a tensor; clone it so the original is left untouched
x = image.clone().detach().requires_grad_(True)
output = model(x)
# If the model is well trained, prob_vector[1] should be almost 0 at the beginning
# output[0] selects the single sample of the batch, as in the question
prob_vector = nn.functional.softmax(output[0], dim=0)
# We want to fool the model and maximize this coordinate instead of prob_vector[0]
fool_prob = prob_vector[1]
# fool_prob is a scalar tensor, so we can call backward on it directly
fool_prob.backward()
# and you should have your gradients:
print(x.grad)
After that, if you want to use an optimizer in your loop to modify x, remember that the optimizer.step method in PyTorch tries to minimize the loss, whereas you want to maximize it. The built-in optimizers generally reject a negative learning rate, so the simplest fix is to change the sign of what you backpropagate:
# Maximizing a scalar is minimizing its opposite
(-fool_prob).backward()
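To make the loop concrete, here is a rough sketch of gradient ascent on the input. Everything specific in it is an assumption for illustration: a small nn.Linear stands in for the pre-trained CNN, the input is random, and target, step_size and the number of iterations are arbitrary. It uses a manual update instead of an optimizer to keep the sign handling visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the pre-trained classifier (not VGG16).
model = nn.Linear(8, 3)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input is being modified

image = torch.randn(1, 8)                        # hypothetical input batch of size 1
x = image.clone().detach().requires_grad_(True)  # the tensor we differentiate against
target = 1                                       # index of the class we want to force
step_size = 0.1                                  # arbitrary step size

start_prob = torch.softmax(model(x)[0], dim=0)[target].item()

for _ in range(20):
    prob_vector = torch.softmax(model(x)[0], dim=0)
    fool_prob = prob_vector[target]
    (-fool_prob).backward()          # minimizing -p is maximizing p
    with torch.no_grad():
        x -= step_size * x.grad      # descent step on -fool_prob
    x.grad.zero_()                   # gradients accumulate, so reset them

end_prob = torch.softmax(model(x)[0], dim=0)[target].item()
print(start_prob, end_prob)
```

With a real network you would typically also constrain how far x may drift from the original image, which this sketch does not do.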

How can I matrix-multiply two PyTorch quantized Tensors?

I am new to tensor quantization, and tried doing something as simple as
import torch
x = torch.rand(10, 3)
y = torch.rand(10, 3)
x @ y.T
with PyTorch quantized tensors running on CPU. I thus tried
scale, zero_point = 1e-4, 2
dtype = torch.qint32
qx = torch.quantize_per_tensor(x, scale, zero_point, dtype)
qy = torch.quantize_per_tensor(y, scale, zero_point, dtype)
qx @ qy.T # I tried...
...and got this error:
RuntimeError: Could not run 'aten::mm' with arguments from the
'QuantizedCPUTensorId' backend. 'aten::mm' is only available for these
backends: [CUDATensorId, SparseCPUTensorId, VariableTensorId,
CPUTensorId, SparseCUDATensorId].
Is matrix multiplication just not supported, or am I doing something wrong?
It is not straightforward to implement matrix multiplication for quantized matrices. Therefore, the "conventional" matrix multiplication (@) does not support it (as your error message suggests).
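You should look at the quantized operations instead, e.g. torch.nn.quantized.functional.linear. Like the regular linear function it computes input @ weight.T, so the weight is passed untransposed. Note that its documentation asks for a quint8 input and a qint8 weight, so the qint32 tensors above would have to be re-quantized first:
torch.nn.quantized.functional.linear(qx[None,...], qy)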
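If you only need the numerical result and not a quantized kernel, one simple fallback is to dequantize both operands and multiply in floating point. This is a sketch of that workaround, not a replacement for the quantized operators; it reuses the scale, zero_point and dtype from the question.

```python
import torch

torch.manual_seed(0)
x = torch.rand(10, 3)
y = torch.rand(10, 3)

scale, zero_point = 1e-4, 2
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.qint32)
qy = torch.quantize_per_tensor(y, scale, zero_point, torch.qint32)

# Leave the quantized domain and do an ordinary float32 matrix product.
product = qx.dequantize() @ qy.dequantize().T

# The result differs from x @ y.T only by the quantization error.
max_err = (product - x @ y.T).abs().max().item()
print(product.shape, max_err)
```

The result is a regular float tensor, so it would need to be quantized again if later steps expect quantized input.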

error while printing the predicted value in multiple linear regression

import numpy as np
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(x, y)
# The coefficients
print('Coefficients: ', regr.coef_)
x1 = np.asanyarray(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y1 = np.asanyarray(test[['CO2EMISSIONS']])
xy = regr.predict(y1)  # a ValueError is raised here
print(xy)
This worked in simple linear regression, but it does not work in multiple linear regression.
regr.predict expects an input with the same number of columns (features) as the x that was passed to regr.fit.
Furthermore, when you want to predict something, it should be based on some input, not output.
So, xy = regr.predict(y1) is wrong.
You should try xy = regr.predict(x1) instead.
The reason it seems to work in simple regression (although it is not correct) is that the model was fitted on a single feature column, and y1 also has a single column. As mentioned, this should be regr.predict(x1) instead of regr.predict(y1), since you are trying to predict y1 from x1. In simple regression x1 and y1 have the same shape, so the algorithm cannot "distinguish" between them and does not raise an error.
However, in multiple regression you fit the model on an x array with several feature columns (three here). So when you run regr.predict(y1), it raises an error because y1 has only one column.
Just replace regr.predict(y1) with regr.predict(x1) and it will work for both simple and multiple regression.
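The shape rule can be checked with a small sketch. The data here is synthetic and hypothetical (random features and a noise-free linear target), since the train and test frames from the question are not available:

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
true_coef = np.array([[1.0], [2.0], [3.0]])  # hypothetical coefficients

# Synthetic stand-ins for the train/test arrays: 3 feature columns, 1 target column.
x = rng.normal(size=(80, 3))
y = x @ true_coef + 0.5
x1 = rng.normal(size=(20, 3))
y1 = x1 @ true_coef + 0.5

regr = linear_model.LinearRegression()
regr.fit(x, y)

pred = regr.predict(x1)  # correct: same number of columns as x
print(pred.shape)

try:
    regr.predict(y1)     # wrong: y1 has 1 column, the model expects 3
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

With a single feature column the second call would not raise, which is why the mistake goes unnoticed in simple regression.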

In tensorflow 2.0 keras, how can we get gradients for the regularisation parameters like l1 and l2?

For example if we build a network using
tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.L1L2(l1=0.1, l2=0.2))
You get entries appearing in model.losses; however, they do not appear to get recalculated when l1 and l2 change. Furthermore, it seems the loss function is not differentiable w.r.t. l1 and l2, even after replacing l1 and l2 with
l1 = tf.keras.backend.variable(name='l1', value=0.1)
l2 = tf.keras.backend.variable(name='l2', value=0.1)
The variables are trainable via some function that explicitly uses them.
UPDATE:
I have a feeling it is the _gather_children... function that has a non-differentiable boolean in it unfortunately.
OperatorNotAllowedInGraphError: in converted code:
<ipython-input-272-b7a1c0374804>:7 train_one *
d = loss(x, y)
<ipython-input-222-e2bf7d12dcd8>:10 loss *
layer_losses = tf.reduce_mean(agent.predictor.losses)
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:1015 losses
return collected_losses + self._gather_children_attribute('losses')
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:2336 _gather_children_attribute
getattr(layer, attribute) for layer in nested_layers))
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:2336 <genexpr>
getattr(layer, attribute) for layer in nested_layers))
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:1012 losses
loss_tensor = regularizer()
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:1087 _tag_unconditional
loss = loss()
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py:1916 _loss_for_variable
regularization = regularizer(v)
asdf/lib/python3.7/site-packages/tensorflow_core/python/keras/regularizers.py:57 __call__
if not self.l1 and not self.l2:
asdf/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:450 __bool__
return bool(self.read_value())
asdf/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:765 __bool__
self._disallow_bool_casting()
asdf/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:531 _disallow_bool_casting
"using a `tf.Tensor` as a Python `bool`")
asdf/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:518 _disallow_when_autograph_enabled
" decorating it directly with @tf.function.".format(task))
OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did not convert this function. Try decorating it directly with @tf.function.
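The traceback points at the `if not self.l1 and not self.l2` check inside the built-in regularizer, which casts the variable to a Python bool. One possible way around it, sketched here under assumptions, is to skip the built-in regularizer and write the penalty as an explicit function of the two variables. The kernel below is a hypothetical stand-in for a layer's weight matrix, and this is only an illustration, not a confirmed fix for the setup in the question.

```python
import tensorflow as tf

# Regularization strengths as trainable variables.
l1 = tf.Variable(0.1, name="l1")
l2 = tf.Variable(0.2, name="l2")

# Hypothetical stand-in for the kernel of a Dense(16) layer with 4 inputs.
kernel = tf.Variable(tf.random.normal((4, 16)), name="kernel")

with tf.GradientTape() as tape:
    # Same form as an L1 + L2 penalty, but written out explicitly,
    # so no Python bool() is ever taken of l1 or l2.
    penalty = l1 * tf.reduce_sum(tf.abs(kernel)) + l2 * tf.reduce_sum(tf.square(kernel))

grad_l1, grad_l2 = tape.gradient(penalty, [l1, l2])
print(grad_l1.numpy(), grad_l2.numpy())
```

The gradient with respect to l1 is the sum of absolute kernel values and the gradient with respect to l2 is the sum of squares, so both are defined and non-zero.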

Element wise calculation breaks autograd

I am using PyTorch to calculate the loss for a logistic regression (I know PyTorch can do this automatically, but I have to write it myself). My function is defined below, but the cast to torch.tensor breaks autograd and gives me w.grad = None. I'm new to PyTorch, so I'm sorry.
logistic_loss = lambda X,y,w: torch.tensor([torch.log(1 + torch.exp(-y[i] * torch.matmul(w, X[i,:]))) for i in range(X.shape[0])], requires_grad=True)
Your post isn't very clear on details and this is a monster of a one-liner. I first reworked it to make a minimal, complete, verifiable example. Please correct me if I misunderstood your intentions and please do it yourself next time.
import torch
# unroll the one-liner to have an easier time understanding what's going on
def logistic_loss(X, y, w):
    elementwise = []
    for i in range(X.shape[0]):
        mm = torch.matmul(w, X[i, :])
        exp = torch.exp(-y[i] * mm)
        elementwise.append(torch.log(1 + exp))
    return torch.tensor(elementwise, requires_grad=True)
# I assume those are the expected dimensions of your input
X = torch.randn(5, 30, requires_grad=True)
y = torch.randn(5)
w = torch.randn(30)
# I assume you backpropagate from a reduced version
# of your loss, because you can't call .backward on multi-dimensional
# tensors
loss = logistic_loss(X, y, w).mean()
loss.mean().backward()
# None, because torch.tensor created a new leaf that is detached from X
print(X.grad)
The simplest solution to your problem is to replace torch.tensor(elementwise, requires_grad=True) with torch.stack(elementwise). You can think of torch.tensor as a constructor for entirely new tensors. If your tensor is the result of some mathematical expression, you should use operations like torch.stack or torch.cat, which keep it connected to the graph.
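The difference is easy to see on a toy example. This sketch uses made-up values (a 3-element w and a list of scaled entries) purely for illustration:

```python
import torch

w = torch.randn(3, requires_grad=True)
pieces = [w[i] * 2 for i in range(3)]  # each piece is still connected to w

# torch.tensor only copies the values: the result is a brand-new leaf.
rebuilt = torch.tensor([p.item() for p in pieces], requires_grad=True)
# torch.stack is a differentiable operation: the result stays in the graph.
stacked = torch.stack(pieces)

print(rebuilt.grad_fn)  # None, no history
print(stacked.grad_fn)  # a backward node pointing back to the pieces

stacked.sum().backward()
print(w.grad)           # d(sum of 2 * w_i)/dw_i = 2 for every entry
```

Calling backward through rebuilt would only populate rebuilt.grad and never reach w.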
That being said, this code is still wildly inefficient because you do manual looping over i. Instead, you could write simply
def logistic_loss_vectorized(X, y, w):
    mm = torch.matmul(X, w)
    exp = torch.exp(-y * mm)
    return torch.log(1 + exp)
which is mathematically equivalent, but will be much faster in practice, because it allows for better parallelization due to lack of explicit looping.
Note that there is still a numerical issue with this code: you're taking the logarithm of an exponential, and the intermediate result, called exp, can become very large and overflow, which makes the loss inf. There are workarounds for that, which is why the loss functions provided by PyTorch are preferable.
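One such workaround, sketched here as an illustration, is torch.nn.functional.softplus, which evaluates log(1 + exp(z)) without forming exp(z) for large z. The function names and the test values below are made up for the example:

```python
import torch
import torch.nn.functional as F

def logistic_loss_naive(X, y, w):
    return torch.log(1 + torch.exp(-y * torch.matmul(X, w)))

def logistic_loss_stable(X, y, w):
    # softplus(z) = log(1 + exp(z)), computed in a numerically stable way
    return F.softplus(-y * torch.matmul(X, w))

torch.manual_seed(0)
X = torch.randn(5, 30)
y = torch.randn(5)
w = torch.randn(30, requires_grad=True)

# On moderate inputs both versions agree.
naive = logistic_loss_naive(X, y, w)
stable = logistic_loss_stable(X, y, w)

# The stable version is still differentiable with respect to w.
stable.mean().backward()

# With a large exponent, exp overflows in float32 and the naive loss becomes inf.
X_big = torch.full((1, 1), 200.0)
y_big = torch.tensor([-1.0])
w_big = torch.tensor([1.0])
naive_big = logistic_loss_naive(X_big, y_big, w_big)
stable_big = logistic_loss_stable(X_big, y_big, w_big)
print(naive_big, stable_big)
```

For labels in {0, 1} rather than {-1, +1}, a built-in loss such as torch.nn.functional.binary_cross_entropy_with_logits covers the same ground.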

Resources