I was trying to calculate the gradient with respect to the input, using a combination of Keras and TensorFlow. The code (run inside a loop) looks like this:
import keras.backend as K
import tensorflow as tf

loss = K.categorical_crossentropy(target, model.output)  # K.categorical_crossentropy takes (target, output)
gradient = sess.run([tf.gradients(loss, model.input, colocate_gradients_with_ops=True)],
                    feed_dict={model.input: img})  # img is a numpy array that fits the dimension requirements
n_operations = len(tf.get_default_graph().get_operations())
I noticed that n_operations increases every iteration, and so does the time each iteration takes. Is that normal? Is there any way to prevent this?
Thank you!
No, this is not the desired behavior. The problem is that you are defining your gradient operation again and again, while you only need to define it once and then execute it. tf.gradients adds new operations to the graph and returns handles to those gradients, so you only have to run those handles to get the results. Every call to the function generates additional operations, and this will eventually ruin your performance. The solution is as follows:
# outside the loop
loss = K.categorical_crossentropy(target, model.output)
gradients = tf.gradients(loss, model.input, colocate_gradients_with_ops=True)

# inside the loop
gradient_np = sess.run([gradients], feed_dict={model.input: img})  # img is a numpy array that fits the dimension requirements
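For completeness, here is a self-contained sketch of the same build-once / run-many pattern (model and target are assumed to exist already, and images is a hypothetical list of input arrays; only the placement of tf.gradients matters):

import tensorflow as tf
import keras.backend as K

sess = K.get_session()

# Build the ops once, outside the loop: this is the step that adds nodes to the graph.
loss = K.categorical_crossentropy(target, model.output)
grad_op = tf.gradients(loss, model.input, colocate_gradients_with_ops=True)

n_before = len(tf.get_default_graph().get_operations())
for img in images:  # images: hypothetical list of numpy arrays
    grad_np = sess.run(grad_op, feed_dict={model.input: img})

# The operation count no longer grows between iterations.
assert len(tf.get_default_graph().get_operations()) == n_before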
I am trying to do forward passes with two different inputs through the same model, as shown below:
for epoch in range(num_epochs):
    dataloader.sampler.set_epoch(epoch)
    for batch_index, (real, _) in enumerate(dataloader):
        disc.zero_grad()
        real = real.to(rank)
        noise = torch.randn((batch_size, z_dim, 1, 1)).to(rank)
        fake_img = gen(noise)
        fake_img_clone = fake_img.detach().clone()
        disc_real = disc(real).reshape(-1)
        lossD_real = critereon(disc_real, torch.ones_like(disc_real))
        disc_fake = disc(fake_img.detach()).reshape(-1)
        lossD_fake = critereon(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_fake + lossD_real) / 2
        opt_disc.step()
However, I keep getting the error "one of the variables needed for gradient computation has been modified by an inplace operation"
Setting torch.autograd.set_detect_anomaly(True, check_nan=True) shows that the error occurs in disc_real=disc(real).reshape(-1), but when I manually debug it, the error occurs only when I add the second forward pass line disc_fake=disc(fake_img.detach()).reshape(-1)
I am currently using the latest version of PyTorch. Please help me solve this :(
The error message "one of the variables needed for gradient computation has been modified by an inplace operation" typically occurs when you modify a tensor in-place, which can break the computation graph and cause issues with backpropagation.
In your code, the suspect is the line fake_img_clone = fake_img.detach().clone(). The detach() function returns a new tensor with the same data as the original, but it shares the same storage, so an in-place modification of the detached tensor is also visible through the original tensor.
To avoid this issue, you can detach the fake_img tensor without cloning it, like this: fake_img.detach(). This returns a tensor that is cut off from the computation graph, so the discriminator's backward pass will not propagate gradients back into the generator.
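As a small side illustration (not part of the original code), the storage sharing of detach() can be seen directly:

import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()   # new tensor: no grad history, but same underlying storage as a
b[0] = 5.0       # an in-place change through the detached view...
print(a)         # ...shows up in a as well: tensor([5., 1., 1.], requires_grad=True)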
Here's the updated code:
for epoch in range(num_epochs):
    dataloader.sampler.set_epoch(epoch)
    for batch_index, (real, _) in enumerate(dataloader):
        disc.zero_grad()
        real = real.to(rank)
        noise = torch.randn((batch_size, z_dim, 1, 1)).to(rank)
        fake_img = gen(noise)
        disc_real = disc(real).reshape(-1)
        lossD_real = critereon(disc_real, torch.ones_like(disc_real))
        with torch.no_grad():
            fake_img_detached = fake_img.detach()
        disc_fake = disc(fake_img_detached).reshape(-1)
        lossD_fake = critereon(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_fake + lossD_real) / 2
        lossD.backward()
        opt_disc.step()
In the updated code, we detach the fake_img tensor without cloning it and store the result in fake_img_detached, wrapping that step in a torch.no_grad() context so that no gradients are tracked for the detached tensor. We then use fake_img_detached for the discriminator's forward pass, which should avoid the in-place modification issue.
I have an expensive filter that runs binary dilations in a loop. It grows a mask, applying some conditions along the way, and applies that mask to the output. I've got the dilation itself formulated as a TensorFlow convolution (the TFdilate function below), so that part runs reasonably fast, but I'd like to express the whole filter in TensorFlow and run it on the GPU. The issue in the code below is the conversion of the mask variable back into NumPy to apply mask[Pbool] = False, and then back into a TF tensor. I've tried tf.where, but that crashed my kernel (the mask array is very big, ~40000x80000). Ultimately what I need amounts to a TF equivalent of np.putmask. Any ideas?
def FixBigRiverTF(ClassRaster, bigwater, Prob50, thresh):
    mask = np.logical_and(bigwater, ClassRaster == 1)
    mask = tf.convert_to_tensor(mask, dtype=tf.bool)
    tfbool = tf.convert_to_tensor(ClassRaster == 2, dtype=tf.bool)
    Pbool = Prob50 < thresh
    for s in range(100):
        oldsum = tf.math.count_nonzero(mask)
        mask = TFdilate(mask, 2)
        mask = tf.math.logical_and(mask, tfbool)
        # mask = tf.where(tf.equal(Pbool, True), False, mask)  # crashed the kernel
        mask = mask.numpy()
        mask[Pbool] = False
        mask = tf.convert_to_tensor(mask, dtype=tf.bool)
        newsum = tf.math.count_nonzero(mask)
        if newsum == oldsum:
            # print('broke big river filter at ' + str(s))
            break
    cond = mask.numpy() == 1
    ClassRaster[cond] = 1
    return ClassRaster
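For reference, the mask[Pbool] = False step can be written as a pure boolean expression in TensorFlow, which would avoid the round trip through NumPy; a minimal sketch with tiny illustrative tensors (untested at the ~40000x80000 scale mentioned above):

import tensorflow as tf

# mask and Pbool are boolean tensors of the same shape (tiny illustrative values)
mask = tf.constant([[True, True], [False, True]])
Pbool = tf.constant([[True, False], [False, False]])

# np.putmask(mask, Pbool, False) / mask[Pbool] = False as a boolean expression:
mask = tf.math.logical_and(mask, tf.math.logical_not(Pbool))
print(mask.numpy())  # [[False  True] [False  True]]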
Using OpenAI's Gym, I've created my own environment in which the observation space is of Box type with shape (21, 21, 1).
The intention is to use a Keras Conv2D layer as the model's input. Ideally, the shape going into this model would be (None, 21, 21, 1), with None representing the batch size. Keras's documentation is here: https://keras.io/api/layers/convolution_layers/convolution2d/
The issue I'm having is that an extra dimension is required when the input shape is checked, so the model expects (None, 1, 21, 21, 1). This prevents me from using MaxPooling layers in the model. After investigating the keras-rl library, I found that this is due to two functions that add this extra dimension.
The first function is found in memory.py, where the current observation is put into a list and returned as such:
def get_recent_state(self, current_observation):
    """Return list of last observations

    # Argument
        current_observation (object): Last observation

    # Returns
        A list of the last observations
    """
    # This code is slightly complicated by the fact that subsequent observations might be
    # from different episodes. We ensure that an experience never spans multiple episodes.
    # This is probably not that important in practice but it seems cleaner.
    state = [current_observation]
    idx = len(self.recent_observations) - 1
    for offset in range(0, self.window_length - 1):
        current_idx = idx - offset
        current_terminal = self.recent_terminals[current_idx - 1] if current_idx - 1 >= 0 else False
        if current_idx < 0 or (not self.ignore_episode_boundaries and current_terminal):
            # The previously handled observation was terminal, don't add the current one.
            # Otherwise we would leak into a different episode.
            break
        state.insert(0, self.recent_observations[current_idx])
    while len(state) < self.window_length:
        state.insert(0, zeroed_observation(state[0]))
    return state
The second function is called just after that and computes the Q values based on the recent observation. It wraps the state in a list when passing it on to compute_batch_q_values.
def compute_q_values(self, state):
    q_values = self.compute_batch_q_values([state]).flatten()
    assert q_values.shape == (self.nb_actions,)
    return q_values
I understand that one extra dimension should be added to represent the batch size, but why is it added twice? Can anyone explain why this happens, or how to use Conv2D layers with OpenAI Gym?
Thanks.
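One commonly suggested workaround, assuming the agent's window_length is 1, is to fold the extra axis away with a Reshape layer at the front of the model. A hedged sketch (layer sizes and nb_actions are illustrative, not from the original post):

from keras.models import Sequential
from keras.layers import Reshape, Conv2D, MaxPooling2D, Flatten, Dense

window_length = 1          # assumed keras-rl memory window
obs_shape = (21, 21, 1)    # the Gym observation space shape
nb_actions = 4             # illustrative number of discrete actions

model = Sequential()
# keras-rl feeds observations shaped (batch, window_length, 21, 21, 1);
# the Reshape folds the window axis into the image so Conv2D/MaxPooling2D work.
model.add(Reshape(obs_shape, input_shape=(window_length,) + obs_shape))
model.add(Conv2D(16, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(nb_actions, activation='linear'))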
I want to copy the gradients of the loss with respect to the weights for different data samples using PyTorch. In the code below, I iterate over one sample at a time from the data loader (batch size = 1) and collect the gradients for the first fully connected (fc1) layer. Gradients should be different for different samples. The print statement shows the correct gradients, which differ between samples, but when I store them in a list, I get the same gradients repeatedly. Any suggestions would be much appreciated. Thanks in advance!
grad_list = []
for data in test_loader:
    inputs, labels = data[0], data[1]
    inputs = torch.autograd.Variable(inputs)
    labels = torch.autograd.Variable(labels)
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward
    output = target_model(inputs)
    loss = criterion(output, labels)
    loss.backward()
    grad_list.append(target_model.fc1.weight.grad.data)
    print(target_model.fc1.weight.grad.data)
Try using clone and detach instead:
grad_list.append(target_model.fc1.weight.grad.clone().detach())
The .data attribute you are appending to your list is a mutable reference to the parameter's gradient storage (i.e. the actual memory address and the values contained within), so every element of the list ends up pointing at the same tensor, which is overwritten on each backward pass. What you need to do is create a copy of the gradient tensor (with clone) and remove it from the computation graph (with detach) so that it does not interfere with gradient computation.
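A minimal sketch (hypothetical tensors, not the original model) of the difference between appending an alias and appending a copy:

import torch

w = torch.zeros(2, requires_grad=True)
loss = (2.0 * w).sum()
loss.backward()                     # w.grad is now tensor([2., 2.])

aliased = w.grad.data               # shares storage with w.grad
snapshot = w.grad.clone().detach()  # independent copy

w.grad.zero_()                      # in-place zeroing, as optimizer.zero_grad() does when it zeroes rather than sets grads to None
print(aliased)   # tensor([0., 0.])  -- changed together with w.grad
print(snapshot)  # tensor([2., 2.])  -- unchanged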
I would like to know if Keras can be used as an interface to TensorFlow for just doing computation on my GPU.
I tested TF directly on my GPU. But for ML purposes, I started using Keras, including the backend. I would find it more comfortable to do all my work in Keras instead of using two tools.
This is also a matter of curiosity.
I found some examples like this one:
http://christopher5106.github.io/deep/learning/2018/10/28/understand-batch-matrix-multiplication.html
However this example does not actually do the calculation.
It also does not get input data.
I duplicate the snippet here:
from keras import backend as K

a = K.ones((3, 4))
b = K.ones((4, 5))
c = K.dot(a, b)
print(c.shape)
I would simply like to know if I can get the result numbers from this snippet above, and how?
Thanks,
Michel
Keras doesn't have an eager mode like TensorFlow; it depends on models or functions with "placeholders" to receive and output data.
So it's a little more complicated than TensorFlow for basic calculations like this.
The most user-friendly solution is to create a dummy model with a single Lambda layer. (Be careful with the first dimension: Keras insists on treating it as the batch dimension and requires the input and output to have the same batch size.)
def your_function_here(inputs):
    # if you have more than one tensor for the inputs, it's a list:
    input1, input2, input3 = inputs

    # if you don't have a batch, you should probably have a first dimension = 1 and get
    input1 = input1[0]

    # do your calculations here

    # if you used the batch_size=1 workaround as above, add this dimension again:
    output = K.expand_dims(output, 0)
    return output
Create your model:
inputs = Input(input_shape)
#maybe inputs2 ....
outputs = Lambda(your_function_here)(list_of_inputs)
#maybe outputs2
model = Model(inputs, outputs)
And use it to predict the result:
print(model.predict(input_data))
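Putting the pieces together for the original K.dot snippet, a minimal sketch of the dummy-model approach (using the batch_size=1 workaround described above; the shapes are the ones from the question):

import numpy as np
from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

def matmul_fn(inputs):
    a, b = inputs
    a = a[0]                    # drop the batch dimension (batch_size = 1 workaround)
    b = b[0]
    c = K.dot(a, b)             # plain (3, 4) x (4, 5) matrix product
    return K.expand_dims(c, 0)  # add the batch dimension back

a_in = Input(shape=(3, 4))
b_in = Input(shape=(4, 5))
out = Lambda(matmul_fn)([a_in, b_in])
model = Model([a_in, b_in], out)

a = np.ones((1, 3, 4))  # batch of one
b = np.ones((1, 4, 5))
print(model.predict([a, b]))  # a (1, 3, 5) array filled with 4.0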