The following is my toy example:
import torch
x = torch.tensor(3.0, requires_grad = True)
y = x**2
y.backward(retain_graph = True)
print(x.grad)
x = x + 4
y.backward(retain_graph = True)
print(x.grad)
The first print shows the gradient of x, while the second prints None. Why does the gradient of x disappear after x is updated by x = x + 4? Thanks.
Newly added questions:
The following code does what I want, which is to update x iteratively. However, I need to set x.requires_grad = True every time x is updated. Is there a better way that avoids x.requires_grad = True? Thanks.
x = torch.tensor(3.0, requires_grad = True)
y = x**2
y.backward(retain_graph = True)
with torch.no_grad():
    x = x + x.grad
x.requires_grad = True
y = x**2
y.backward(retain_graph = True)
print(x.grad)
Update: my solution
x = torch.tensor(3.0, requires_grad = True)
y = x**2
y.backward(retain_graph = True)
print(x.grad)
x.data = x.data + x.grad.data
x.grad.zero_()
y = x**2
y.backward(retain_graph = True)
print(x.grad)
The result of the code is
tensor(6.)
tensor(18.)
This is exactly what I want. Thanks.
That is mainly because x=x+4 doesn't update the tensor; it creates a new tensor and assigns it to the variable x.
I changed the code to print the data pointer of the tensor in x before and after x=x+4 as follows:
import torch
x = torch.tensor(3.0, requires_grad = True)
y = x**2
y.backward(retain_graph = True)
print(x.data_ptr())
print(x.grad)
z = x
x = x + 4
y.backward(retain_graph = True)
print(x.data_ptr())
print(x.grad)
print(x is z)
print(z.grad)
The output was:
3007755542592
tensor(6.)
3007755541184
None
False
tensor(12.)
First, notice that the data pointer of the tensor in x changed after x=x+4. This is because x+4 created a new tensor, which is what x now holds.
Second, I kept the original tensor in another variable z. As you can see, x is z returns False, and z.grad is twice the original gradient because y.backward(retain_graph = True) was called twice and gradients accumulate. z now holds the tensor that was in x before the line x=x+4.
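For the follow-up question about avoiding x.requires_grad = True, one common pattern is to update the tensor in place inside torch.no_grad(), so that x stays the same leaf tensor. The following is a minimal sketch of that idea using the step x += x.grad from the question; the names first_grad, second_grad and ptr_before are only illustrative.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

y = x ** 2
y.backward()
first_grad = x.grad.clone()   # dy/dx at x = 3
ptr_before = x.data_ptr()

# In-place update: no new tensor is created, so x remains a leaf
# tensor with requires_grad=True and keeps its .grad attribute.
with torch.no_grad():
    x += x.grad
x.grad.zero_()                # gradients accumulate, so reset them

y = x ** 2                    # rebuild the graph with the updated value
y.backward()
second_grad = x.grad.clone()  # dy/dx at the updated x

print(first_grad, second_grad)
print(x.requires_grad, x.is_leaf, x.data_ptr() == ptr_before)
```

Because x is never rebound to a new tensor, there is no need to set requires_grad again, and y is recomputed after each update so retain_graph is not required.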
I was trying to write a program that plots a level set for any given function.
rmin = -5.0
rmax = 5.0
c = 4.0
x = np.arange(rmin,rmax,0.1)
y = np.arange(rmin,rmax,0.1)
x,y = np.meshgrid(x,y)
f = lambda x,y: y**2.0 - 4*x
realplots = []
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if abs(f(x[i,j],y[i,j])-c)< 1e-4:
            realplots.append([x[i,j],y[i,j]])
But because it is a nested for loop, it takes a lot of time. Any help in vectorizing the above code, or a new method of plotting the level set, is highly appreciated. (Note: the function 'f' will be changed at run time, so the vectorization must not rely on the function's properties.)
I tried vectorizing through
ans = np.where(abs(f(x,y)-c)<1e-4,np.array([x,y]),[0,0])
but it gave me the error: operands could not be broadcast together with shapes (100,100) (2,100,100) (2,)
I was adding [0,0] as an escape for the else condition in np.where, which is indeed wrong.
Since you want the values rather than the indices, you don't really need np.where.
You can use the mask directly to index x and y; see the "Boolean array indexing" section of the NumPy documentation.
It is straightforward:
def vectorized(x, y, c, f, threshold):
    mask = np.abs(f(x, y) - c) < threshold
    x, y = x[mask], y[mask]
    return np.stack([x, y], axis=-1)
Your function for reference:
def op(x, y, c, f, threshold):
    res = []
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            if abs(f(x[i, j], y[i, j]) - c) < threshold:
                res.append([x[i, j], y[i, j]])
    return res
Tests:
rmin, rmax = -5.0, +5.0
c = 4.0
threshold = 1e-4
x = np.arange(rmin, rmax, 0.1)
y = np.arange(rmin, rmax, 0.1)
x, y = np.meshgrid(x, y)
f = lambda x, y: y**2 - 4 * x
res_op = op(x, y, c, f, threshold)
res_vec = vectorized(x, y, c, f, threshold)
assert np.allclose(res_op, res_vec)
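To see the boolean mask at work on concrete numbers, here is a small sketch on an integer grid, chosen as an assumption so that some grid points satisfy f(x, y) = c exactly. The name level_set_points is hypothetical; the function body is the same masking idea as vectorized above.

```python
import numpy as np

def level_set_points(x, y, c, f, threshold):
    # Boolean mask with the same shape as the grid: True where f is close to c.
    mask = np.abs(f(x, y) - c) < threshold
    # Indexing with the mask flattens the selected entries to 1-D arrays.
    return np.stack([x[mask], y[mask]], axis=-1)

# Integer grid from -5 to 5 (assumption: picked so exact hits exist).
axis = np.arange(-5.0, 6.0, 1.0)
x, y = np.meshgrid(axis, axis)
f = lambda x, y: y**2 - 4 * x

points = level_set_points(x, y, 4.0, f, 1e-4)
print(points)  # one [x, y] pair per row
```

On this grid the selected pairs are the integer solutions of y**2 - 4*x = 4, such as (0, 2) and (3, 4). On a finer floating-point grid like the one in the question, a threshold of 1e-4 may select very few points, so the threshold usually has to be tuned to the grid spacing.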
I know the torch.autograd.grad() returns None if the gradient is stopped somehow, however, I am wondering what is wrong with the following snippet?
x = torch.rand(6, requires_grad=True)
y = x.pow(2).sum()
z = torch.cat([x])
grad1 = torch.autograd.grad(y, x, allow_unused=True)
grad2 = torch.autograd.grad(y, z, allow_unused=True)
print(f'grad1 = {grad1}, grad = {grad2}')
The output is grad1 = (tensor([0.3705, 0.7468, 0.6102, 1.8640, 0.3518, 0.5397]),), grad = (None,).
I expected grad2 to be the same as grad1, because z is essentially x. May I know why it is not?
Update: After reading the post and with help from @Ivan, I conclude the following: x is a leaf node of both y and z in the computation graph, but there is no path from z to y, so torch.autograd.grad returns None for z.
Note: The returned value None does not necessarily guarantee the values are 0.
Tensor z was not used to compute the value of y, so it is not part of y's computation graph, and you won't get a gradient with respect to z.
On the other hand, the following will work:
>>> y = x.pow(2).sum()
>>> torch.autograd.grad(y, x, allow_unused=True)
(tensor([0.3134, 1.6802, 0.1989, 0.8495, 1.9203, 1.0905]),)
>>> z = torch.cat([x])
>>> y = z.pow(2).sum()
>>> torch.autograd.grad(y, z, allow_unused=True)
(tensor([0.3134, 1.6802, 0.1989, 0.8495, 1.9203, 1.0905]),)
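The two cases can be put side by side in one script. This is a minimal sketch; the names y_from_x, y_from_z, unused_grad, grad_x and grad_z are illustrative only.

```python
import torch

x = torch.rand(6, requires_grad=True)
z = torch.cat([x])             # new tensor computed from x

# Case 1: y depends only on x, so z is not in y's graph.
y_from_x = x.pow(2).sum()
(unused_grad,) = torch.autograd.grad(y_from_x, z, allow_unused=True)
print(unused_grad)             # None: no path from z to y_from_x

# Case 2: y is computed through z, so both x and z are in the graph.
y_from_z = z.pow(2).sum()
grad_x, grad_z = torch.autograd.grad(y_from_z, [x, z])
print(grad_x)
print(grad_z)
```

In the second case both gradients equal 2 * x, since z holds the same values as x and the loss reaches x only through z.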
I want to implement a Fourier Ring Correlation loss for two images to train a GAN. To do that, I'd like to loop a specific number of times and calculate the loss. This works fine with a normal Python loop. To speed up the process I want to use tf.while_loop, but unfortunately I am not able to track the gradients through the while loop. I constructed a dummy example that just calculates gradients during a while loop, but it doesn't work. First, the working Python loop:
x = tf.constant(3.0)
y = tf.constant(2.0)
for i in range(3):
    y = y * x
grad = tf.gradients(y, x)
with tf.Session() as ses:
    print("output : ", ses.run(grad))
This works and gives the output
[54]
If I do the same with a tf.while_loop, it doesn't work:
a = tf.constant(0, dtype = tf.int64)
b = tf.constant(3, dtype = tf.int64)
x = tf.constant(3.0)
y = tf.constant(2.0)
def cond(a,b,x,y):
    return tf.less(a,b)
def body(a,b,x,y):
    y = y * x
    with tf.control_dependencies([y]):
        a = a + 1
    return [a,b,x,y]
results = tf.while_loop(cond, body, [a,b,x,y], back_prop = True)
grad = tf.gradients(y, results[2])
with tf.Session() as ses:
    print("grad : ", ses.run(grad))
The output is :
TypeError: Fetch argument None has invalid type <class 'NoneType'>
So I guess TensorFlow is somehow unable to do the backpropagation.
The problem still occurs if you use tf.GradientTape() instead of tf.gradients().
I changed the code so that it now outputs the gradients:
import tensorflow as tf
a = tf.constant(0, dtype = tf.int64)
b = tf.constant(3, dtype = tf.int64)
x = tf.Variable(3.0, dtype=tf.float32)
y = tf.Variable(2.0, dtype=tf.float32)
dy = tf.Variable(0.0, dtype=tf.float32)
def cond(a,b,x,y,dy):
    return tf.less(a,b)
def body(a,b,x,y,dy):
    y = y * x
    dy = tf.gradients(y, x)[0]
    with tf.control_dependencies([y]):
        a = a + 1
    return [a,b,x,y,dy]
init = tf.global_variables_initializer()
with tf.Session() as ses:
    ses.run(init)
    results = ses.run(tf.while_loop(cond, body, [a,b,x,y,dy], back_prop = True))
    print("grad : ", results[-1])
The things I modified:
I made x and y into variables and added their initialisation init.
I added a variable called dy which will contain the gradient of y.
I moved the tf.while_loop inside the session.
I put the evaluation of the gradient inside the body function.
I think the problem before was that when you define grad = tf.gradients(y, results[2]) the loop has not run yet, so y is not a function of x. Therefore, there is no gradient.
Hope this helps.
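Another way to look at it, offered as a sketch rather than a definitive fix: in the question, tf.gradients(y, results[2]) differentiates the original constant y (the y reassigned inside body is local to that function) with respect to the loop's output copy of x. If the goal is the gradient of the final loop value with respect to the initial x, as in the Python-loop version, one option is to differentiate the loop output results[3] with respect to x. The example assumes TensorFlow 2.x with the tf.compat.v1 API so that the TF1-style code from the question can run; grad_value is an illustrative name.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # build a TF1-style graph, as in the question

a = tf.constant(0, dtype=tf.int64)
b = tf.constant(3, dtype=tf.int64)
x = tf.constant(3.0)
y = tf.constant(2.0)

def cond(a, b, x, y):
    return tf.less(a, b)

def body(a, b, x, y):
    # Each iteration multiplies the running value by x.
    return [a + 1, b, x, y * x]

results = tf.while_loop(cond, body, [a, b, x, y])

# Differentiate the loop *output* for y with respect to the initial x.
grad = tf.gradients(results[3], x)

with tf.Session() as ses:
    grad_value = ses.run(grad)
print("grad : ", grad_value)
```

After three iterations the loop output is 2 * x**3, so the gradient at x = 3 should agree with the value 54 reported for the plain Python loop.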
I'm reading the basic TensorFlow Serving tutorial. There is something in mnist_saved_model.py that I can't understand:
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {'x': tf.FixedLenFeature(shape=[784], dtype=tf.float32),}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
I don't understand why we use the name 'x' in feature_configs.
It follows the convention of a linear equation, where y is the output and x is the input.
y = x * w + b
x = input
w = weights
b = bias
y = output
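The convention can be written out with concrete shapes. This is only a NumPy sketch: the 784 input size comes from feature_configs in the question, while the 10 output classes and the random values are assumptions made for illustration. The string 'x' in feature_configs is simply the key under which the input feature is looked up in the parsed example.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))             # input: one flattened 28x28 image
w = rng.standard_normal((784, 10))   # weights (10 classes is an assumption)
b = np.zeros(10)                     # bias

y = x @ w + b                        # output: one score per class
print(y.shape)
```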
The problem is with delta1. I've checked the math a couple of times and it seems right to me. Everything should be correct with delta2, but it doesn't line up with W2 transposed. Here is the backpropagation:
def backward(self, X, Y):
    X = np.array(X)
    Y = np.array(Y)
    delta2 = -(Y - self.yHat) * self.deriv_sigmoid(self.a2)
    dJdW2 = np.dot(self.a2.T, delta2)
    delta1 = np.dot(delta2, self.W2.T)*self.deriv_sigmoid(self.a1)
    dJdW1 = np.dot(X.T, delta1)
    return dJdW1, dJdW2
Here is the forward propagation:
def forward(self, X):
    self.X = X
    self.a1 = np.dot(self.W1, X)
    self.Z1 = self.sigmoid(self.a1)
    self.a2 = np.dot(self.W2, self.Z1)
    self.yHat = self.sigmoid(self.a2)
    return self.yHat
And here is the file from which I call it:
NN = nn.Neural_Network(2, 3, 1)
X = [[1],[1],]
Y = [[1],]
yHat = NN.forward(X)
dJdW1, dJdW2 = NN.backward(X, Y)
I've tried checking the argument order in np.dot(), but it seems to be correct. Here is the full code: https://hastebin.com/ikijahecaz.py
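One shape-consistent way to write the backward pass is sketched below. It rests on several assumptions, because the full class is only available behind the link: inputs are column vectors, W1 has shape (hidden, input) and W2 has shape (output, hidden) as np.dot(self.W1, X) in forward suggests, deriv_sigmoid takes the pre-activation, and the cost is J = 0.5 * sum((Y - yHat)**2). TinyNetwork and its random initialisation are hypothetical stand-ins for nn.Neural_Network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deriv_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

class TinyNetwork:
    """Hypothetical stand-in for nn.Neural_Network (column-vector convention)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_hidden, n_in))   # (3, 2) here
        self.W2 = rng.standard_normal((n_out, n_hidden))  # (1, 3) here

    def forward(self, X):
        self.a1 = self.W1 @ X           # (hidden, 1)
        self.Z1 = sigmoid(self.a1)      # (hidden, 1)
        self.a2 = self.W2 @ self.Z1     # (out, 1)
        self.yHat = sigmoid(self.a2)    # (out, 1)
        return self.yHat

    def backward(self, X, Y):
        # Error at the output layer, shape (out, 1).
        delta2 = -(Y - self.yHat) * deriv_sigmoid(self.a2)
        # Uses the hidden activations Z1 (not a2), giving W2's shape.
        dJdW2 = delta2 @ self.Z1.T
        # Propagate back through W2: W2.T goes on the left for column vectors.
        delta1 = (self.W2.T @ delta2) * deriv_sigmoid(self.a1)
        dJdW1 = delta1 @ X.T
        return dJdW1, dJdW2

net = TinyNetwork(2, 3, 1)
X = np.array([[1.0], [1.0]])
Y = np.array([[1.0]])
net.forward(X)
dJdW1, dJdW2 = net.backward(X, Y)
print(dJdW1.shape, dJdW2.shape)
```

Under these assumptions each gradient has the same shape as the weight matrix it belongs to, which is a quick sanity check before comparing against a numerical gradient.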