I'm getting "can't optimize a non-leaf Tensor" on this bit of code:
self.W_ch1 = nn.Parameter(
torch.rand(encoder_feature_dim, encoder_feature_dim), requires_grad=True
).to(self.device)
self.W_ch1_optimizer = torch.optim.Adam([self.W_ch1], lr=encoder_lr)
I don't know why it's happening. That should be a leaf tensor, because it has no children connected to it: it's just a torch.rand inside an nn.Parameter. It throws the error when initializing self.W_ch1_optimizer.
The reason it throws an error is that .to(self.device), like torch.Tensor.cuda, does not move your parameter in place when the device changes. It returns a copy on the target device, and because the parameter requires grad, autograd records that copy as a new node in the graph. In other words, W_ch1 is no longer the nn.Parameter itself but the output of a copy operation, so it is not a leaf node. You already have this "computation" tree:
nn.Parameter -> .to(device) / .cuda() -> W_ch1
You can compare the following two results:
>>> p = nn.Parameter(torch.rand(1))
>>> p.is_leaf
True
>>> p = nn.Parameter(torch.rand(1)).cuda()
>>> p.is_leaf
False
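What you need to do instead is create (or move) the data on the desired device first, and only then wrap it in nn.Parameter, so that the parameter itself is the leaf. Define your optimizer(s) after that:
>>> p = nn.Parameter(torch.rand(1, device='cuda'))
>>> p.is_leaf
True
>>> optimizer = optim.Adam([p], lr=lr)
In your code that means:
self.W_ch1 = nn.Parameter(torch.rand(encoder_feature_dim, encoder_feature_dim, device=self.device))
For a whole module, call module.to(device) first (unlike Tensor.to, it converts the module's parameters in place), then create the optimizer:
>>> model = model.to(device)
>>> optimizer = optim.Adam(model.parameters(), lr=lr)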
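The sketch below is minimal and self-contained and shows both orders. It assumes a recent PyTorch. A same-device .to() returns the tensor unchanged, so the sketch forces a copy with a dtype change (float64) instead of .cuda(). That way the non-leaf case also appears on a CPU-only machine. The sizes and learning rate are arbitrary.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrong order: wrap first, convert afterwards. The dtype change forces a copy
# (even on CPU), and autograd records that copy, so the result is not a leaf.
bad = nn.Parameter(torch.rand(4, 4)).to(torch.float64)
print(bad.is_leaf, bad.grad_fn is not None)  # False True

# Right order: create the data with its final device/dtype, then wrap it.
good = nn.Parameter(torch.rand(4, 4, device=device, dtype=torch.float64))
print(good.is_leaf)  # True

optimizer = torch.optim.Adam([good], lr=1e-3)  # accepted

try:
    torch.optim.Adam([bad], lr=1e-3)  # rejected: bad is not a leaf
except ValueError as err:
    print("rejected:", err)
```

The same rule applies inside a module's __init__. Pass device= (and dtype=) to the tensor factory, and never reassign a Parameter attribute to the result of .to().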
I have to add nodes and edges to a graph from two different lists that I created. The nodes are added, but the edges aren't, and that causes the error below.
Here's a small snippet of my code:
# list3 is the list of the whole dataset
listOfRelations = []
listOfNodes = []
for index in range(0, len(list3)):
    # if list3[index].isalpha():
    if list3[index].isdigit():
        listOfNodes.append(list3[index])
    else:
        listOfRelations.append(list3[index])
# testing purposes
print(listOfNodes[3])
print(listOfRelations[2])
G.add_nodes_from((listOfNodes))
G.add_edges_from((TupleOfEdges))
I also tried to convert the list into a tuple, but that didn't work either :(
Error: NetworkXError: Edge tuple _hypernym must be a 2-tuple or 3-tuple.
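G.add_edges_from iterates over its argument and expects every item to be an edge. An edge is a 2-tuple (u, v) or a 3-tuple (u, v, attr_dict). The error message shows that a bare relation string such as "_hypernym" was passed as an item. The sketch below groups the tokens into proper edge tuples. It **assumes** that list3 is a flat sequence of (head, relation, tail) triples, as in WordNet-style datasets. Both the function name triples_to_edges and the sample data are hypothetical.

```python
def triples_to_edges(tokens):
    """Turn a flat list [head, relation, tail, head, relation, tail, ...] into
    NetworkX-style 3-tuples (u, v, attr_dict).

    ASSUMPTION: the data really is a flat sequence of (head, relation, tail)
    triples; adjust the grouping if list3 is laid out differently.
    """
    if len(tokens) % 3 != 0:
        raise ValueError("expected a multiple of 3 tokens (head, relation, tail)")
    edges = []
    for i in range(0, len(tokens), 3):
        head, relation, tail = tokens[i:i + 3]
        # The relation is kept as an edge attribute instead of being an "edge".
        edges.append((head, tail, {"relation": relation}))
    return edges


# Hypothetical sample in the same style as list3 (digit ids, '_relation' labels).
list3 = ["00001", "_hypernym", "00002", "00002", "_hyponym", "00001"]
edges = triples_to_edges(list3)
print(edges)
# [('00001', '00002', {'relation': '_hypernym'}), ('00002', '00001', {'relation': '_hyponym'})]
```

With a graph G (for example nx.DiGraph()), G.add_edges_from(edges) adds each edge and stores its relation under the edge attribute "relation". NetworkX creates missing nodes automatically, so the separate add_nodes_from call becomes optional.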
I'm working on creating a dictionary of constraints for a large SCED power problem for minimization. However, I get a ValueError saying an unknown type was passed, even though I'm only using optimize.LinearConstraint at present. When I change to NonlinearConstraint (shown below), I get an error saying that the 'NonlinearConstraint' object has no attribute 'A'.
I have a feeling it's due to recursive elements, as even using a single constraint as I've defined them returns the same error.
Any idea how I can create the recursive linear constraints?
##EDIT
I've been told to copy the code and provide a bit more context. "gen_supply_seg" is a three-dimensional array that has different constraints depending on the point in time:
def con2a():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2a = optimize.NonlinearConstraint(gen_supply_seg[t,g,1],lb=0,ub=P2Max[g])
    return(nlc2a)

def con2b():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2b = optimize.NonlinearConstraint(gen_supply_seg[t,g,2],lb=0,ub=P3Max[g])
    return (nlc2b)

def con2c():
    for t in range(len(LOAD)):
        for g in range(len(GEN)):
            nlc2c = optimize.NonlinearConstraint(gen_supply_seg[t,g,3],lb=0,ub=P4Max[g])
    return (nlc2c)

con2a = con2a()
con2b = con2b()
con2c = con2c()
These constraints are then collected in a tuple, as shown:
cons = (con2a,
con2b,
con2c)
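Two things in the posted functions work against you. First, NonlinearConstraint expects its first argument to be a callable of the whole decision vector, and gen_supply_seg[t,g,1] is an array element, not a function. Second, each loop overwrites nlc2a, so only the constraint for the last (t, g) is returned. Every constraint here is a bound on a single entry, so all T*G of them fit in one LinearConstraint whose matrix A selects those entries.

This is a sketch under assumptions. It treats the decision vector as gen_supply_seg with shape (T, G, S), flattened in C order. T, G, S, the P*Max values and the helper name segment_upper_bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import LinearConstraint


def segment_upper_bounds(T, G, S, seg, p_max):
    """Express 0 <= x[t, g, seg] <= p_max[g] for every t and g as ONE LinearConstraint.

    ASSUMPTION: the 1-D decision vector x is gen_supply_seg of shape (T, G, S)
    flattened in C (row-major) order.
    """
    A = np.zeros((T * G, T * G * S))
    ub = np.empty(T * G)
    for row, (t, g) in enumerate(np.ndindex(T, G)):
        col = np.ravel_multi_index((t, g, seg), (T, G, S))
        A[row, col] = 1.0   # this row picks out x[t, g, seg]
        ub[row] = p_max[g]  # the upper bound depends only on generator g
    return LinearConstraint(A, np.zeros(T * G), ub)


# Hypothetical sizes and limits, for illustration only.
T, G, S = 2, 3, 4
P2Max = np.array([10.0, 20.0, 30.0])
P3Max = np.array([5.0, 15.0, 25.0])
P4Max = np.array([1.0, 2.0, 3.0])

cons = [
    segment_upper_bounds(T, G, S, 1, P2Max),
    segment_upper_bounds(T, G, S, 2, P3Max),
    segment_upper_bounds(T, G, S, 3, P4Max),
]
print(cons[0].A.shape)  # (6, 24)
```

The list cons can then be passed as constraints=cons to scipy.optimize.minimize with method='trust-constr'. Because these particular constraints are plain variable bounds, a scipy.optimize.Bounds object passed via bounds= is an even simpler alternative.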
I am trying to compute a loss on the jacobian of the network (i.e. to perform double backprop), and I get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
I can't find the inplace operation in my code, so I don't know which line to fix.
*The error occurs in the last line:
loss3.backward()
inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)
num_classes = output.size()[1]
jacobian_list = []
grad_output = torch.zeros(*output_reg.size())
if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()
    jacobian_list = jacobian.cuda()
for i in range(10):
    zero_gradients(inputs_reg)
    grad_output.zero_()
    grad_output[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                             inputs=inputs_reg,
                                             grad_outputs=grad_output,
                                             only_inputs=True,
                                             retain_graph=True,
                                             create_graph=True)[0])
jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
You can use torch.autograd.set_detect_anomaly from the autograd package to find exactly which line is responsible for the error.
Here is the link which describes the same problem and a solution using the above-mentioned function.
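Here is a minimal sketch of anomaly mode. It uses the context-manager form torch.autograd.detect_anomaly(), which has the same effect as switching set_detect_anomaly(True) on and then off again. The tensor and the exp/add_ pair are made up purely to trigger the same RuntimeError. Anomaly mode slows everything down, so it is meant for debugging only.

```python
import torch

caught = False
with torch.autograd.detect_anomaly():
    x = torch.randn(3, requires_grad=True)
    y = x.exp()  # exp saves its *output* y for the backward pass
    y.add_(1)    # in-place edit of that saved output bumps its version counter
    try:
        y.sum().backward()
    except RuntimeError as err:
        caught = True
        # Anomaly mode also emits a warning with the forward traceback of the op
        # whose backward failed (here, the x.exp() line).
        print(type(err).__name__, "-", str(err).splitlines()[0])
print(caught)  # True
```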
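grad_output.zero_() is in-place, and so is grad_output[:, i] = 1. In-place means "modify a tensor instead of returning a new one with the modifications applied". Because grad_output is passed as grad_outputs with create_graph=True, the graph built for the double backward can save grad_output itself. Modifying it on a later iteration therefore invalidates a value that loss3.backward() still needs. A non-in-place alternative is torch.where. For example, to zero out the column at index 1 (the second column):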
>>> import torch
>>> t = torch.randn(3, 3)
>>> ixs = torch.arange(3, dtype=torch.int64)
>>> zeroed = torch.where(ixs[None, :] == 1, torch.tensor(0.), t)
>>> zeroed
tensor([[-0.6616,  0.0000,  0.7329],
        [ 0.8961,  0.0000, -0.1978],
        [ 0.0798,  0.0000, -1.2041]])
>>> t
tensor([[-0.6616, -1.6422,  0.7329],
        [ 0.8961, -0.9623, -0.1978],
        [ 0.0798, -0.7733, -1.2041]])
Notice how t retains the values it had before and zeroed has the values you want.
Thanks!
I replaced the in-place operations on grad_output with a fresh clone in every iteration:
inputs_reg = Variable(data, requires_grad=True)
output_reg = self.model.forward(inputs_reg)
num_classes = output_reg.size()[1]
jacobian_list = []
grad_output = torch.zeros(*output_reg.size())
if inputs_reg.is_cuda:
    grad_output = grad_output.cuda()
for i in range(5):
    # torch.autograd.grad returns the gradient instead of accumulating it in
    # inputs_reg.grad, so nothing needs to be zeroed here.
    grad_output_curr = grad_output.clone()  # fresh tensor, never modified after use
    grad_output_curr[:, i] = 1
    jacobian_list.append(torch.autograd.grad(outputs=output_reg,
                                             inputs=inputs_reg,
                                             grad_outputs=grad_output_curr,
                                             only_inputs=True,
                                             retain_graph=True,
                                             create_graph=True)[0])
jacobian = torch.stack(jacobian_list, dim=0)
loss3 = jacobian.norm()
loss3.backward()
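The same fresh-tensor-per-iteration idea can be tried end to end on a toy model. This sketch assumes a small hypothetical nn.Sequential with a Tanh, so the Jacobian depends on the weights and the double backward has real work to do. It uses torch.zeros_like instead of clone(). It illustrates the technique and is not the poster's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 2))  # hypothetical toy model
inputs = torch.randn(5, 3, requires_grad=True)
outputs = model(inputs)

rows = []
for i in range(outputs.size(1)):
    # A brand-new one-hot tensor per class: nothing the graph saved is modified later.
    grad_out = torch.zeros_like(outputs)
    grad_out[:, i] = 1
    (row,) = torch.autograd.grad(outputs, inputs, grad_outputs=grad_out,
                                 retain_graph=True, create_graph=True)
    rows.append(row)

jacobian = torch.stack(rows, dim=0)  # shape: (num_classes, batch, in_features)
loss3 = jacobian.norm()
loss3.backward()                     # double backprop into the model weights
print(jacobian.shape)                # torch.Size([2, 5, 3])
```

You can cross-check one slice against torch.autograd.functional.jacobian evaluated on a single sample. Because samples in a batch are independent, jacobian[:, 0, :] should match it.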
I hope your problem got solved. I had this problem too, and solutions like using clone() did not work for me. But when I installed PyTorch version 1.4, it was solved.
I think this problem is a kind of bug in the step() function. The weird thing is that it happens with PyTorch version 1.5 but not with v1.4.
You can see all released versions of pytorch in this link.
I met this error when I was implementing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the target network parameters are assigned to the main network every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
I'd like to build a TensorFlow graph in a separate function get_graph() and print a simple op a in the main function. It turns out that I can print the value of a if I return a from get_graph(). However, if I use get_operation_by_name() to retrieve a, it prints None. What am I doing wrong here? Any suggestion to fix it? Thank you!
import tensorflow as tf

def get_graph():
    graph = tf.Graph()
    with graph.as_default():
        a = tf.constant(5.0, name='a')
    return graph, a

if __name__ == '__main__':
    graph, a = get_graph()
    with tf.Session(graph=graph) as sess:
        print(sess.run(a))
        a = sess.graph.get_operation_by_name('a')
        print(sess.run(a))
it prints out
5.0
None
p.s. I'm using python 3.4 and tensorflow 1.2.
Naming conventions in TensorFlow are subtle and a bit confusing at first.
The thing is, when you write
a = tf.constant(5.0, name='a')
a is not the constant op but its output tensor. The name of an op's output is derived from the op name by appending a colon and the index of that output. Here, the constant op has only one output, so its name is:
print(a.name)
# `a:0`
When you call sess.graph.get_operation_by_name('a'), you do get the constant op, and sess.run on an op only executes it and returns None. What you actually wanted is 'a:0', the tensor that is the output of this operation and whose evaluation returns its value.
a = sess.graph.get_tensor_by_name('a:0')
print(sess.run(a))
# 5.0
absolute tensorflow beginner here. I am trying to construct two random tensors and subtract them for an assignment. However I seem to have some issues with understanding how exactly the subtraction process works.
x=tf.random_normal([5],seed=123456)
y=tf.random_normal([5],seed=987654)
print(sess.run(x),sess.run(y))
I get the following outputs:
[ 0.38614973 2.97522092 -0.85282576 -0.57114178 -0.43243945]
[-0.43865281 0.08617876 -2.17495966 -0.24574816 -1.94319296]
But when I try
print(sess.run(x-y))
I get
[-1.88653958 -0.03917438 0.87480474 0.40511152 0.52793759]
Now if I run
print(sess.run(tf.subtract(x,y)))
I also get other wrong values.
[-1.97681355 1.10086703 1.41172433 1.55840468 0.04344697]
I hope somebody can help me out here. Thanks in advance!
This happens because every sess.run call evaluates x and y again, and each evaluation draws new values. When you write x = tf.random_normal([5], seed=123456), no actual computation takes place: TensorFlow just adds an operation node to the static computation graph. The real computation happens only when you call sess.run().
So think of x = tf.random_normal([5], seed=123456) as a random number generator. The first time you call sess.run(), the generator starts from the seed 123456. The second time you call sess.run(), its state has already advanced, so the values are different.
You can verify this by running the following code:
import tensorflow as tf

x = tf.random_normal([5], seed=123456)
with tf.Session() as sess:
    print(sess.run(x))
    print(sess.run(x))
    print(sess.run(x))
The output will be
[ 0.38614973, 2.97522092, -0.85282576, -0.57114178, -0.43243945]
[-1.41140664, -0.50017339, 1.59816611, 0.07829454, -0.36143178]
[-1.10523391, -0.15264226, 1.79153454, 0.42320547, 0.26876169]
This behaviour actually has to do with how the seed of your random normal op works and how the session evaluates your nodes.
TensorFlow uses the seed of your random normal nodes when it creates them, not when it runs them:
>>> sess = tf.InteractiveSession()
>>> x = tf.random_normal([5], seed=123456)
>>> sess.run(x)
array([ 0.38614976, 2.97522116, -0.85282576, -0.57114178, -0.43243945], dtype=float32)
>>> sess.run(x)
array([-1.41140664, -0.50017333, 1.59816611, 0.07829454, -0.36143178], dtype=float32)
You can see that the values change when running x a second time.
Running sess.run(x-y) will actually run x (i.e. generate random numbers), then y (i.e. generate other random numbers), then x-y. Since you're not reinitializing the random generator with the seed before running tf.subtract(x,y), you get different results.
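The mechanism does not depend on TensorFlow internals, so any stateful seeded generator can illustrate it. The sketch below swaps in Python's random module; it is an analogy, not TensorFlow code. The functions x() and y() are hypothetical stand-ins for the two graph nodes, and every call draws new numbers, the same way every sess.run evaluation does.

```python
import random

# Analogy only: one seeded generator per "node", like op-level seeds in TF1.
gen_x = random.Random(123456)
gen_y = random.Random(987654)


def x():
    """Each evaluation advances gen_x, like each sess.run(x)."""
    return [gen_x.gauss(0.0, 1.0) for _ in range(5)]


def y():
    """Each evaluation advances gen_y, like each sess.run(y)."""
    return [gen_y.gauss(0.0, 1.0) for _ in range(5)]


x1 = x()  # like print(sess.run(x))
y1 = y()  # like print(sess.run(y))

# Like sess.run(x - y): x and y are evaluated AGAIN, so new numbers are drawn.
diff = [a - b for a, b in zip(x(), y())]
expected = [a - b for a, b in zip(x1, y1)]
print(diff == expected)  # False

# Re-creating the generator from the seed replays the first draw exactly.
replay = random.Random(123456)
print([replay.gauss(0.0, 1.0) for _ in range(5)] == x1)  # True
```

The fix works the same way in both cases: evaluate the random values once and reuse them. In TF1 graph mode, fetching everything in a single call such as sess.run([x, y, x - y]) evaluates each node only once for that call. The printed difference then matches the printed x and y.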