Compute gradients for GradCam in tf 2.0 - python-3.x

I have updated TensorFlow in Python from 1.14 to 2.0, and now I have a problem computing gradients in order to see the GradCam visualisation for a layer.
For example, take a model named my_cnn_model that is already fitted on data, for a classification problem with three classes. If I want to "compute the gradCam" for a given layer, say "conv2d_3", I would start with the following in 1.14:
layer_conv = my_cnn_model.get_layer( "conv2d_3" )
#I want it with respect to the first class (index 0), because for example it might have been the model's prediction for that image, so I check the probability for that class:
final_layer = my_cnn_model.output[:, 0]
#Then I computed the gradients like this:
grads = keras.backend.gradients( final_layer, layer_conv.output )[0]
print(grads)
The last statement (the print) would output something like this (the shape is specific to the CNN I used, but never mind that):
Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32)
Now, when I use tf 2.0, the gradient-computing part, i.e.:
grads = keras.backend.gradients( final_layer, layer_conv.output )[0]
no longer works, and raises the error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I already searched, and found things like
with tf.GradientTape() as tape:
...
But either way I get errors, or I can't get the same output Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32), so the rest of my GradCam function does not work.
How can I compute the grads, which of course should be equivalent to those from my 1.14 tf environment? Am I missing something trivial?
Edit: I used the functional API, either with my own CNN or with a "transfer learning" model already available in tf.keras, with modified/added layers at the top.
Thanks for any help.

If you are not interested in eager mode, e.g. because you are using old code all around, you can simply disable eager execution.
As mentioned here:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
If, on the other hand, you want to keep eager mode on, or if another thing is troubling your code, you can instead:
#you need a persistent tape if you're calling many gradients instead of just one
with tf.GradientTape(persistent=True) as tape:
    #must "watch" all variables that are not "trainable weights"
    #if you are using them for gradients
    tape.watch(layer_conv.output)

    #if the input data should be watched (you're getting the gradients related to the inputs)
    input_tensor = tf.constant(input_data)
    tape.watch(input_tensor)

    #must do the entire prediction inside this tape block
    #it would be better if you could make your model output all tensors of interest
    #not sure if you can do "some_layer.output" in eager mode for this purpose
    model_outputs = model(input_tensor)

#finally, outside the block you can get the gradients
g1 = tape.gradient(model_outputs, layer_conv.output)

#again, maybe you need this layer output to be "actually output"
#instead of gotten from the layer like this
g2 = tape.gradient(some_output, input_tensor)
g3...
g4...

#finally delete the persistent tape
del tape
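If the goal is specifically Grad-CAM with eager mode left on, one common pattern in TF 2.x is to build an auxiliary model that returns both the convolutional feature maps and the predictions, then take the gradient of the class score with respect to the feature maps inside the tape. Below is a minimal sketch assuming the my_cnn_model and "conv2d_3" layer from the question; img_batch is a hypothetical batch of preprocessed images:
import tensorflow as tf

layer_conv = my_cnn_model.get_layer("conv2d_3")

#auxiliary model that outputs both the feature maps and the class probabilities
grad_model = tf.keras.models.Model(inputs=my_cnn_model.inputs,
                                   outputs=[layer_conv.output, my_cnn_model.output])

with tf.GradientTape() as tape:
    conv_output, predictions = grad_model(img_batch)
    #score of class 0, matching my_cnn_model.output[:, 0] from the question
    class_score = predictions[:, 0]

#gradient of the class score w.r.t. the feature maps,
#analogous to the (batch, 76, 76, 64) tensor from tf 1.14
grads = tape.gradient(class_score, conv_output)
print(grads.shape)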

Related

Train two models iteratively with PyTorch

I want to train two cascaded networks, e.g. X->Z->Y, Z=net1(X), Y=net2(Z).
I want to optimize the parameters of these two networks iteratively, i.e., for fixed parameters of net1, first train the parameters of net2 using the MSE(predY, Y) loss until convergence; then use the converged MSE loss to train one iteration of net1, and so on.
So I define two optimizers, one for each network. My training code is below:
net1 = SimpleLinearF()
opt1 = torch.optim.Adam(net1.parameters(), lr=0.01)
loss_func = nn.MSELoss()
for itera1 in range(num_iters1 + 1):
    predZ = net1(X)
    net2 = SimpleLinearF()
    opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)
    for itera2 in range(num_iters2 + 1):
        predY = net2(predZ)
        loss = loss_func(predY, Y)
        if itera2 % (num_iters2 // 2) == 0:
            print('iteration: {:d}, loss: {:.7f}'.format(int(itera2), float(loss)))
        loss.backward(retain_graph=True)
        opt2.step()
        opt2.zero_grad()
    loss.backward()
    opt1.step()
    opt1.zero_grad()
However, I encounter the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an
inplace operation: [torch.FloatTensor [1, 1]], which is output 0 of AsStridedBackward0, is at
version 502; expected version 501 instead. Hint: enable anomaly detection to find the
operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Does anyone know why this error occurs? How should I solve this problem? Many thanks.
I found the answer to my question after some searching on the PyTorch computation graph.
Removing the retain_graph=True and adding a .detach() in net2(predZ) solves this error.
The detach operation cuts net1 away from the computation graph of net2/optimizer2.
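In code, the fix could look like the sketch below. Note that the outer update for net1 re-runs net2 on the non-detached predZ; that extra forward pass is my own assumption about how to keep gradients flowing back to net1 once the inner loop uses a detached predZ, and is not spelled out in the answer above:
for itera1 in range(num_iters1 + 1):
    predZ = net1(X)
    net2 = SimpleLinearF()
    opt2 = torch.optim.Adam(net2.parameters(), lr=0.01)

    #inner loop: train net2 on a detached predZ so its backward passes
    #never reach into net1's part of the graph
    for itera2 in range(num_iters2 + 1):
        predY = net2(predZ.detach())
        loss = loss_func(predY, Y)
        loss.backward()  # retain_graph=True is no longer needed
        opt2.step()
        opt2.zero_grad()

    #outer step (assumed): recompute the loss through the non-detached predZ
    #so that gradients reach net1
    loss = loss_func(net2(predZ), Y)
    loss.backward()
    opt1.step()
    opt1.zero_grad()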

Compute gradient between a scalar and vector in PyTorch

I am trying to port code that was written in Theano to PyTorch. In the code, the author computes the gradient using
import theano.tensor as T
gparams = T.grad(cost, params)
and the shape of gparams is (256, 240)
I have tried using backward() but it doesn't seem to return anything. Is there an equivalent to grad within PyTorch?
Assume this is my input,
import torch
from torch.autograd import Variable
cost = torch.tensor(1.6019)
params = Variable(torch.rand(1, 73, 240))
cost needs to be a result of an operation involving params. You can't compute a gradient just knowing the values of two tensors. You need to know the relationship as well. This is why pytorch builds a computation graph when you perform tensor operations. For example, say the relationship is
cost = torch.sum(params)
then we would expect the gradient of cost with respect to params to be a vector of ones regardless of the value of params.
That could be computed as follows. Notice that you need to add the requires_grad flag to indicate to pytorch that you want backward to update the gradient when called.
# Initialize independent variable. Make sure to set requires_grad=True.
# (Only floating-point tensors can require gradients, hence the float values.)
params = torch.tensor((1.0, 73.0, 240.0), requires_grad=True)
# Compute cost, this implicitly builds a computation graph which records
# how cost was computed with respect to params.
cost = torch.sum(params)
# Zero the gradient of params in case it already has something in it.
# This step is optional in this example but good to do in practice to
# ensure you're not adding gradients to existing gradients.
if params.grad is not None:
    params.grad.zero_()
# Perform back propagation. This is where the gradient is actually
# computed. It also resets the computation graph.
cost.backward()
# The gradient of params w.r.t to cost is now stored in params.grad.
print(params.grad)
Result:
tensor([1., 1., 1.])
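If you want something that reads even closer to Theano's T.grad, torch.autograd.grad returns the gradients directly instead of accumulating them into .grad. A minimal sketch with a params tensor of the shape mentioned in the question:
import torch

params = torch.rand(1, 73, 240, requires_grad=True)
cost = torch.sum(params)

# returns a tuple with one gradient per input tensor
(gparams,) = torch.autograd.grad(cost, params)
print(gparams.shape)  # torch.Size([1, 73, 240]), all ones in this example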

Obtaining hidden layer outputs in a denoising autoencoder using Keras

I have built a Sequential Keras model with three layers: a GaussianNoise layer, a hidden layer, and an output layer with the same dimension as the input layer. For this, I'm using the Keras package that comes with Tensorflow 2.0.0-beta1. I'd like to get the output of the hidden layer while bypassing the GaussianNoise layer, since it's only needed in the training phase.
To achieve my goal, I followed the instructions in https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer, which are pretty much described in Keras, How to get the output of each layer? too.
I have tried the following example from the official Keras documentation:
from tensorflow import keras
from tensorflow.keras import backend as K
dae = keras.Sequential([
    keras.layers.GaussianNoise( 0.001, input_shape=(10,) ),
    keras.layers.Dense( 80, name="hidden", activation="relu" ),
    keras.layers.Dense( 10 )
])
optimizer = keras.optimizers.Adam()
dae.compile( loss="mse", optimizer=optimizer, metrics=["mae"] )
# Here the fitting process...
# dae.fit( ยท )
# Attempting to retrieve a decoder functor.
encoder = K.function([dae.input, K.learning_phase()],
                     [dae.get_layer("hidden").output])
However, when K.learning_phase() is used to create the Keras backend functor, I get the error:
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 534, in _scratch_graph
yield graph
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3670, in __init__
base_graph=source_graph)
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/eager/lift_to_graph.py", line 249, in lift_to_graph
visited_ops = set([x.op for x in sources])
File "/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/eager/lift_to_graph.py", line 249, in <listcomp>
visited_ops = set([x.op for x in sources])
AttributeError: 'int' object has no attribute 'op'
The code works great if I don't include K.learning_phase(), but I need to make sure that the output from my hidden layer is evaluated over an input that is not polluted with noise (i.e. in "test" mode -- not "training" mode).
I know my other option is to create a model from the original denoising autoencoder, but can anyone point me into why my approach from the officially documented functor creation fails?
First, ensure your packages are up to date, as your script works fine for me. Second, encoder won't produce the outputs by itself - you have to call it. Continuing from your snippet after # Here the fitting process...:
import numpy as np

x = np.random.randn(32, 10) # toy data
y = np.random.randn(32, 10) # toy labels
dae.fit(x, y) # run one iteration
encoder = K.function([dae.input, K.learning_phase()], [dae.get_layer("hidden").output])
outputs = [encoder([x, int(False)])][0][0] # [0][0] to index into nested list of len 1
print(outputs.shape)
# (32, 80)
However, as of Tensorflow 2.0.0-rc2, this will not work with eager execution enabled - disable via:
tf.compat.v1.disable_eager_execution()
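Alternatively, the "create a model" option mentioned in the question works with eager execution left enabled and avoids K.learning_phase() entirely. A minimal sketch reusing dae and x from above; calling the model with training=False keeps the GaussianNoise layer inactive:
# second model that reuses the trained layers but stops at the hidden layer
encoder_model = keras.Model(inputs=dae.input,
                            outputs=dae.get_layer("hidden").output)

# training=False runs in inference mode, so the noise layer is a no-op
hidden_outputs = encoder_model(x, training=False)
print(hidden_outputs.shape)  # (32, 80)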

How to use the result of embedding with mask_zero=True in keras

In Keras, I want to calculate the mean of the nonzero embedding outputs.
I wonder what the difference is between mask_zero=True and mask_zero=False in the Embedding layer.
I tried the code below:
input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5, mask_zero=True, name='embedding')
out = embedding_layer(input_data)

def antirectifier(x):
    x = K.mean(x, axis=1, keepdims=True)
    return x

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    return tuple(shape)

out = Lambda(antirectifier, output_shape=antirectifier_output_shape, name='lambda')(out)
But it seems that the result is the mean of all the elements; how can I calculate the mean over only the nonzero inputs?
From the function's doc :
If this is True then all subsequent layers in the model need to
support masking
Your Lambda function doesn't support masking. Recurrent layers in Keras, for example, do support masking. If you set mask_zero=True in your embedding, then all the 0 indices that you feed to the embedding layer will be propagated as "masked", and the following layers that are able to understand the "masked" information will use it.
Basically, if you build a "mean" layer that grabs the mask and computes the average only for non-masked values, then you will get the desired results.
You can find here a way to build your lambda layers that support masking
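As an illustration, here is one way such a mask-aware mean could be written, using a small custom tf.keras layer instead of a Lambda (the class name MaskedMean is made up for this sketch):
from tensorflow import keras
from tensorflow.keras import backend as K

class MaskedMean(keras.layers.Layer):
    """Mean over the time axis, ignoring masked (zero-index) timesteps."""
    def __init__(self, **kwargs):
        super(MaskedMean, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        if mask is None:
            return K.mean(inputs, axis=1, keepdims=True)
        mask = K.cast(mask, K.floatx())      # (batch, timesteps)
        mask = K.expand_dims(mask, axis=-1)  # (batch, timesteps, 1)
        summed = K.sum(inputs * mask, axis=1, keepdims=True)
        counts = K.sum(mask, axis=1, keepdims=True)
        return summed / (counts + K.epsilon())

    def compute_mask(self, inputs, mask=None):
        # the mask is consumed here, so don't pass it on
        return None

# usage, continuing from the question's snippet:
# out = MaskedMean(name='masked_mean')(embedding_layer(input_data))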
I hope it helps.

Random results from pre-trained InceptionV3 CNN

I'm trying to create an InceptionV3 CNN that has previously been trained on ImageNet. While the creation and the loading of the checkpoint seem to be working correctly, the result seems to be random: every time I run the script I get a different result, even though I don't change anything. The network is recreated from scratch, the same unchanged checkpoint is loaded, and the same image is classified (which to my understanding should still lead to the same result, even if the network can't decide what the image actually is).
I just noticed that even if I try to classify the same image multiple times within the same execution of the script, I end up with a random result.
I create the CNN like this:
from tensorflow.contrib.slim.nets import inception as nn_architecture
from tensorflow.contrib import slim

with slim.arg_scope([slim.conv2d, slim.fully_connected], normalizer_fn=slim.batch_norm,
                    normalizer_params={'updates_collections': None}):  ## this is a fix for an issue where the model doesn't fit the checkpoint https://github.com/tensorflow/models/issues/2977
    logits, endpoints = nn_architecture.inception_v3(
        input,  # input
        1001,  # NUM_CLASSES
        # maybe set to 0 or None to omit the logit layer and return the input for the logit layer instead
        True,  # is_training (dropout = zero if False, for eval)
        0.8,  # dropout keep rate
        16,  # min depth
        1.0,  # depth multiplier
        layers_lib.softmax,  # prediction function
        True,  # spatial squeeze
        tf.AUTO_REUSE,  # reuse; use get_variable to get variables directly... probably
        'InceptionV3')  # scope
Afterwards, I load the ImageNet-trained checkpoint like this:
saver = tf.train.Saver()
saver.restore(sess, CHECKPOINT_PATH)
Then I verify that it is working by classifying this image, which I squish from its original resolution to 299x299, the input size required by the network:
import numpy as np
from skimage import io
from scipy.ndimage import zoom

car = io.imread("data/car.jpg")
car_scaled = zoom(car, [299 / car.shape[0], 299 / car.shape[1], 1])
car_cnnable = np.array([car_scaled])
Then I try to classify the image and print which class the image belongs to most likely and with what likelihood.
predictions = sess.run(logits, feed_dict={images: car_cnnable})
predictions = np.squeeze(predictions) #shape (1, 1001) to shape (1001)
print(np.argmax(predictions))
print(predictions[np.argmax(predictions)])
The class is (or seems to be) random and the likelihood varies as well.
My last few executions were:
Class - likelihood
899 - 0.98858
660 - 0.887204
734 - 0.904047
675 - 0.886952
Here is my full code: https://gist.github.com/Syzygy2048/ddb8602652b547a71316ee0febfddbef
Since I set is_training to True, the dropout was applied every time the network was used. I was under the impression that this only happens during training/backpropagation.
To get it to work correctly, the code should be
logits, endpoints = nn_architecture.inception_v3(input, # input
1001, #NUM_CLASSES, #num classes
# num classes #maybe set to 0 or none to ommit logit layer and return input for logit layer instead.
False, # is training (dropout = zero if false for eval
0.8, # dropout keep rate
16, # min depth
1.0, # depth multiplayer
layers_lib.softmax, # prediction function
True, # spatial squeeze
tf.AUTO_REUSE,
# reuse, use get variable to get variables directly... probably
'InceptionV3') # scope
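A quick way to confirm the fix (a sketch reusing sess, logits, images and car_cnnable from the question): with is_training set to False, running the same image through the network twice should now yield identical logits.
predictions_a = sess.run(logits, feed_dict={images: car_cnnable})
predictions_b = sess.run(logits, feed_dict={images: car_cnnable})
print(np.allclose(predictions_a, predictions_b))  # expected: True once dropout is off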
