Random results from pre-trained InceptionV3 CNN - python-3.x

I'm trying to create an InceptionV3 CNN which has previously been trained on Imagenet. While the creation and the loading of the checkpoint seems to be working correctly, the result seems to be random, as everytime I run the script, I get a different result, even though I don't change anything. The network is recreated from scratch, the same unchanged network is loaded and the same image is classified (which to my understanding should still lead to the same result, even if it can't decide what the image actually is).
I just noticed that even if I try to classify the same image multiple times within the same execution of the script, I end up with a random result.
I create the CNN using like this
from tensorflow.contrib.slim.nets import inception as nn_architecture
from tensorflow.contrib import slim
with slim.arg_scope([slim.conv2d, slim.fully_connected], normalizer_fn=slim.batch_norm,
normalizer_params={'updates_collections': None}): ## this is a fix for an issue where the model doesn't fit the checkpoint https://github.com/tensorflow/models/issues/2977
logits, endpoints = nn_architecture.inception_v3(input, # input
1001, #NUM_CLASSES, #num classes
# num classes #maybe set to 0 or none to ommit logit layer and return input for logit layer instead.
True, # is training (dropout = zero if false for eval
0.8, # dropout keep rate
16, # min depth
1.0, # depth multiplayer
layers_lib.softmax, # prediction function
True, # spatial squeeze
tf.AUTO_REUSE,
# reuse, use get variable to get variables directly... probably
'InceptionV3') # scope
afterwards I load the imagenet trained checkpoint like this
saver = tf.train.Saver()
saver.restore(sess, CHECKPOINT_PATH)
then I verify that it is workingby classifying this image
which I squish from it's original resolution to 299x299 which is required as input for the network
from skimage import io
car = io.imread("data/car.jpg")
car_scaled = zoom(car, [299 / car.shape[0], 299 / car.shape[1], 1])
car_cnnable = np.array([car_scaled])
Then I try to classify the image and print which class the image belongs to most likely and with what likelihood.
predictions = sess.run(logits, feed_dict={images: car_cnnable})
predictions = np.squeeze(predictions) #shape (1, 1001) to shape (1001)
print(np.argmax(predictions))
print(predictions[np.argmax(predictions)])
The class is (or seems to be) random and the likelihood varies as well.
My last few executions were:
Class - likelihood
899 - 0.98858
660 - 0.887204
734 - 0.904047
675 - 0.886952
Here is my full code: https://gist.github.com/Syzygy2048/ddb8602652b547a71316ee0febfddbef

Since I set isTraining to true, it applied the dropout rate every time the network was used. I was under the impression that this only happened during back propagation.
To get it to work correctly, the code should be
logits, endpoints = nn_architecture.inception_v3(input, # input
1001, #NUM_CLASSES, #num classes
# num classes #maybe set to 0 or none to ommit logit layer and return input for logit layer instead.
False, # is training (dropout = zero if false for eval
0.8, # dropout keep rate
16, # min depth
1.0, # depth multiplayer
layers_lib.softmax, # prediction function
True, # spatial squeeze
tf.AUTO_REUSE,
# reuse, use get variable to get variables directly... probably
'InceptionV3') # scope

Related

Truncated backpropagation in PyTorch (code check)

I am trying to implement truncated backpropagation through time in PyTorch, for the simple case where K1=K2. I have an implementation below that produces reasonable output, but I just want to make sure it is correct. When I look online for PyTorch examples of TBTT, they do inconsistent things around detaching the hidden state and zeroing out the gradient, and the ordering of these operations. Please let me know if I have made a mistake.
In the code below, H maintains the current hidden state, and model(weights, H, x) outputs the prediction and the new hidden state.
while i < NUM_STEPS:
# Grab x, y for ith datapoint
x = data[i]
target = true_output[i]
# Run model
output, new_hidden = model(weights, H, x)
H = new_hidden
# Update running error
error += (output - target)**2
if (i+1) % K == 0:
# Backpropagate
error.backward()
opt.step()
opt.zero_grad()
error = 0
H = H.detach()
i += 1
So the idea of your code is to isolate the last variables after each Kth step. Yes, your implementation is absolutely correct and this answer confirms that.
# truncated to the last K timesteps
while i < NUM_STEPS:
out = model(out)
if (i+1) % K == 0:
out.backward()
out.detach()
out.backward()
You can also follow this example for your reference.
import torch
from ignite.engine import Engine, EventEnum, _prepare_batch
from ignite.utils import apply_to_tensor
class Tbptt_Events(EventEnum):
"""Aditional tbptt events.
Additional events for truncated backpropagation throught time dedicated
trainer.
"""
TIME_ITERATION_STARTED = "time_iteration_started"
TIME_ITERATION_COMPLETED = "time_iteration_completed"
def _detach_hidden(hidden):
"""Cut backpropagation graph.
Auxillary function to cut the backpropagation graph by detaching the hidden
vector.
"""
return apply_to_tensor(hidden, torch.Tensor.detach)
def create_supervised_tbptt_trainer(
model, optimizer, loss_fn, tbtt_step, dim=0, device=None, non_blocking=False, prepare_batch=_prepare_batch
):
"""Create a trainer for truncated backprop through time supervised models.
Training recurrent model on long sequences is computationally intensive as
it requires to process the whole sequence before getting a gradient.
However, when the training loss is computed over many outputs
(`X to many <https://karpathy.github.io/2015/05/21/rnn-effectiveness/>`_),
there is an opportunity to compute a gradient over a subsequence. This is
known as
`truncated backpropagation through time <https://machinelearningmastery.com/
gentle-introduction-backpropagation-time/>`_.
This supervised trainer apply gradient optimization step every `tbtt_step`
time steps of the sequence, while backpropagating through the same
`tbtt_step` time steps.
Args:
model (`torch.nn.Module`): the model to train.
optimizer (`torch.optim.Optimizer`): the optimizer to use.
loss_fn (torch.nn loss function): the loss function to use.
tbtt_step (int): the length of time chunks (last one may be smaller).
dim (int): axis representing the time dimension.
device (str, optional): device type specification (default: None).
Applies to batches.
non_blocking (bool, optional): if True and this copy is between CPU and GPU,
the copy may occur asynchronously with respect to the host. For other cases,
this argument has no effect.
prepare_batch (callable, optional): function that receives `batch`, `device`,
`non_blocking` and outputs tuple of tensors `(batch_x, batch_y)`.
.. warning::
The internal use of `device` has changed.
`device` will now *only* be used to move the input data to the correct device.
The `model` should be moved by the user before creating an optimizer.
For more information see:
* `PyTorch Documentation <https://pytorch.org/docs/stable/optim.html#constructing-it>`_
* `PyTorch's Explanation <https://github.com/pytorch/pytorch/issues/7844#issuecomment-503713840>`_
Returns:
Engine: a trainer engine with supervised update function.
"""
def _update(engine, batch):
loss_list = []
hidden = None
x, y = batch
for batch_t in zip(x.split(tbtt_step, dim=dim), y.split(tbtt_step, dim=dim)):
x_t, y_t = prepare_batch(batch_t, device=device, non_blocking=non_blocking)
# Fire event for start of iteration
engine.fire_event(Tbptt_Events.TIME_ITERATION_STARTED)
# Forward, backward and
model.train()
optimizer.zero_grad()
if hidden is None:
y_pred_t, hidden = model(x_t)
else:
hidden = _detach_hidden(hidden)
y_pred_t, hidden = model(x_t, hidden)
loss_t = loss_fn(y_pred_t, y_t)
loss_t.backward()
optimizer.step()
# Setting state of engine for consistent behaviour
engine.state.output = loss_t.item()
loss_list.append(loss_t.item())
# Fire event for end of iteration
engine.fire_event(Tbptt_Events.TIME_ITERATION_COMPLETED)
# return average loss over the time splits
return sum(loss_list) / len(loss_list)
engine = Engine(_update)
engine.register_events(*Tbptt_Events)
return engine

compute Gradients fro GradCam in tf 2.0

I have updated my tensorflow in Python from 1.14 to 2.0 . Now I have a problem with gradient computing, in order to see the GradCam visualisation for a layer.
For example with a model named my_cnn_model, that is already fitted on data, for a classification problem with three classes. If I want to "compute the gradCam" for a given layer named "conv2d_3" for example, I would start with the following in 1.14 :
layer_conv = my_cnn_model.get_layer( "conv2d_3" )
#I want it with respect to the first class (so 0), because for example it might have been the model prediction for that image, so I check the proba for that class :
final_layer = my_cnn_model.output[:, 0]
#Then I computed the gradients like that :
grads = keras.backend.gradients( final_layer, layer_conv.output )[0]
print(grads)
The last statement (print) would say (the shape is specific for the cnn I used but nevermind):
Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32)
Now, when I use tf 2.0 : the grads computing part, so :
grads = keras.backend.gradients( final_layer, layer_conv.output )[0]
is not working any more, with the error :
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I already searched, and found things like
with tf.GradientTape() as tape:
...
But all the same I get errors, or I couldn't get the same output Tensor("gradients/max_pooling2d/MaxPool_grad/MaxPoolGrad:0", shape=(?, 76, 76, 64), dtype=float32), so the rest of my gradcam function does not work.
How could I compute the grads, which of course would be similar to my 1.14 tf env? Do I miss something trivial?
Edit : I used the functionnal API, with my own CNN, or with "Transfer Learning" model already here in tf.keras, with modified/added layers at the top.
Thanks for any help.
If you are not interested in eager mode, like using old code all around, you can simply disable eager execution.
As mentioned here:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
If, on the other hand, you want to keep eager mode on, of if another thing is troubling your code, you can instead:
#you need a persistent tape if you're calling many gradients instead of just one
with tf.GradientTape(persistent = True) as tape:
#must "watch" all variables that are not "trainable weights"
#if you are using them for gradients
tape.watch(layer_conv.output)
#if the input data should be watched (you're getting the gradients related to the inputs)
input_tensor = tf.constant(input_data)
tape.watch(input_tensor)
#must do the entire prediction inside this tape block.
#it would be better if you could make your model output all tensors of interest
#not sure if you can do "some_layer.output" in eager mode for this purpose
model_outputs = model(input_tensor)
#finally, outside the block you can get the gradients
g1 = tape.gradient(model_outputs, layer_conv.output)
#again, maybe you need this layer output to be "actually output"
#instead of gotten from the layer like this
g2 = tape.gradient(some_output, input_tensor)
g3...
g4...
#finally delete the persistent tape
del tape

Compute gradient between a scalar and vector in PyTorch

I am trying to replicate code which was written using Theano, to PyTorch. In the code, the author computes the gradient using
import theano.tensor as T
gparams = T.grad(cost, params)
and the shape of gparams is (256, 240)
I have tried using backward() but it doesn't seem to return anything. Is there an equivalent to grad within PyTorch?
Assume this is my input,
import torch
from torch.autograd import Variable
cost = torch.tensor(1.6019)
params = Variable(torch.rand(1, 73, 240))
cost needs to be a result of an operation involving params. You can't compute a gradient just knowing the values of two tensors. You need to know the relationship as well. This is why pytorch builds a computation graph when you perform tensor operations. For example, say the relationship is
cost = torch.sum(params)
then we would expect the gradient of cost with respect to params to be a vector of ones regardless of the value of params.
That could be computed as follows. Notice that you need to add the requires_grad flag to indicate to pytorch that you want backward to update the gradient when called.
# Initialize independent variable. Make sure to set requires_grad=true.
params = torch.tensor((1, 73, 240), requires_grad=True)
# Compute cost, this implicitly builds a computation graph which records
# how cost was computed with respect to params.
cost = torch.sum(params)
# Zero the gradient of params in case it already has something in it.
# This step is optional in this example but good to do in practice to
# ensure you're not adding gradients to existing gradients.
if params.grad is not None:
params.grad.zero_()
# Perform back propagation. This is where the gradient is actually
# computed. It also resets the computation graph.
cost.backward()
# The gradient of params w.r.t to cost is now stored in params.grad.
print(params.grad)
Result:
tensor([1., 1., 1.])

How to get Conv2D layer filter weights

How do I get the weights of all filters (like 32 ,64, etc.) of a Conv2D layer in Keras after each epoch? I mention that, because initial weights are random but after optimization they will change.
I checked this answer but did not understand. Please help me find a solution of getting the weights of all the filter and after every epoch.
And one more question is that in Keras documentation for the Conv2D layer input shape is (samples, channels, rows, cols). What exactly does samples mean? Is it the total number of inputs we have (like in MNIST data set it is 60.000 training images) or the batch size (like 128 or other)?
Samples = batch size = number of images in a batch
Keras will often use None for this dimension, meaning it can vary and you don't have to set it.
Although this dimension actually exists, when you create a layer, you pass input_shape without it:
Conv2D(64,(3,3), input_shape=(channels,rows,cols))
#the standard it (rows,cols,channels), depending on your data_format
To have actions done after each epoch (or batch), you can use a LambdaCallback, passing the on_epoch_end function:
#the function to call back
def get_weights(epoch,logs):
wsAndBs = model.layers[indexOfTheConvLayer].get_weights()
#or model.get_layer("layerName").get_weights()
weights = wsAndBs[0]
biases = wsAndBs[1]
#do what you need to do with them
#you can see the epoch and the logs too:
print("end of epoch: " + str(epoch)) for instance
#the callback
from keras.callbacks import LambdaCallback
myCallback = LambdaCallback(on_epoch_end=get_weights)
Pass this callback to the training function:
model.fit(...,...,... , callbacks=[myCallback])

Reporting accuracy and loss issues with MonitoredTrainingSession

I am performing transfer learning on InceptionV3 for a dataset of 5 types of flowers. All layers are frozen except the output layer. My implementation is heavily based off of the Cifar10 tutorial from Tensorflow and the input dataset is formated in the same way as Cifar10.
I have added a MonitoredTrainingSession (like in the tutorial) to report the accuracy and loss after a certain number of steps. Below is the section of the code for the MonitoredTrainingSession (almost identical to the tutorial):
class _LoggerHook(tf.train.SessionRunHook):
def begin(self):
self._step = -1
self._start_time = time.time()
def before_run(self,run_context):
self._step+=1
return tf.train.SessionRunArgs([loss,accuracy])
def after_run(self,run_context,run_values):
if self._step % LOG_FREQUENCY ==0:
current_time = time.time()
duration = current_time - self._start_time
self._start_time = current_time
loss_value = run_values.results[0]
acc = run_values.results[1]
examples_per_sec = LOG_FREQUENCY/duration
sec_per_batch = duration / LOG_FREQUENCY
format_str = ('%s: step %d, loss = %.2f, acc = %.2f (%.1f examples/sec; %.3f sec/batch)')
print(format_str %(datetime.now(),self._step,loss_value,acc,
examples_per_sec,sec_per_batch))
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
if MODE == 'train':
file_writer = tf.summary.FileWriter(LOGDIR,tf.get_default_graph())
with tf.train.MonitoredTrainingSession(
save_checkpoint_secs=70,
checkpoint_dir=LOGDIR,
hooks=[tf.train.StopAtStepHook(last_step=NUM_EPOCHS*NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN),
tf.train.NanTensorHook(loss),
_LoggerHook()],
config=config) as mon_sess:
original_saver.restore(mon_sess,INCEPTION_V3_CHECKPOINT)
print("Proceeding to training stage")
while not mon_sess.should_stop():
mon_sess.run(train_op,feed_dict={training:True})
print('acc: %f' %mon_sess.run(accuracy,feed_dict={training:False}))
print('loss: %f' %mon_sess.run(loss,feed_dict={training:False}))
When the two lines printing the accuracy and loss under mon_sess.run(train_op... are removed, the loss and accuracy printed from after_run, after it trains for surprisingly only 20 min, report that the model is performing very well on the training set and the loss is decreasing. Even the moving average loss was reporting great results. It eventually approaches greater than 90% accuracy for multiple random batches.
After, the training session was reporting high accuracy for a while,I stopped the training session, restored the model, and ran it on random batches from the same training set. It performed poorly, only achieving between 50% and 85% accuracy. I confirmed it was restored properly because it did perform better than a model with an untrained output layer.
I then went back to training again from the last checkpoint. The accuracy was initially low but after about 10 mini batch runs the accuracy went back above 90%. I then repeated the process but this time added the two lines for evaluating the loss and accuracy after the training operation. Those two evaluations reported that the model was having issues converging and performing poorly. While the evaluations via before_run and after_run, now only occasionally showed high accuracy and low loss (the results jumped around). But still after_run sometimes reported 100% accuracy (the fact that it is no longer consistent I think is because after_run is getting called also for mon_sess.run(accuracy...) and mon_sess.run(loss...)).
Why would the results reported from MonitoredTrainingSession be indicating the model is performing well when it really isn't? Aren't the two operations in SessionRunArgs being fed with the same mini batch as train_op, indicating model performance on the batch before gradient update?
Here is the code I used for restoring and testing the model(based of the cifar10 tutorial):
elif MODE == 'test':
init = tf.global_variables_initializer()
ckpt = tf.train.get_checkpoint_state(LOGDIR)
if ckpt and ckpt.model_checkpoint_path:
with tf.Session(config=config) as sess:
init.run()
saver = tf.train.Saver()
print(ckpt.model_checkpoint_path)
saver.restore(sess,ckpt.model_checkpoint_path)
global_step = tf.contrib.framework.get_or_create_global_step()
coord = tf.train.Coordinator()
threads =[]
try:
for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
threads.extend(qr.create_threads(sess, coord=coord, daemon=True,start=True))
print('model restored')
i =0
num_iter = 4*NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN/BATCH_SIZE
print(num_iter)
while not coord.should_stop() and i < num_iter:
print("loss: %.2f," %loss.eval(feed_dict={training:False}),end="")
print("acc: %.2f" %accuracy.eval(feed_dict={training:False}))
i+=1
except Exception as e:
print(e)
coord.request_stop(e)
coord.request_stop()
coord.join(threads,stop_grace_period_secs=10)
Update :
So I was able to fix the issue. However, i am not sure why it worked. In the arg_scope for the inception model i was passing in an is_training Boolean placeholder for Batch Norm and dropout used by inception. However, when I removed the placeholder and just set the is_training keyword to true, the accuracy on the training set when the model was restored was extremely high. This was the same model checkpoint that previously performed poorly. When i trained it i always had the is_training placeholder set to true. Having the is_training set to true while testing would mean batch Norm is now using th sample mean and variance.
Why would telling Batch Norm to now use the sample average and sample standard deviation like it does during training increase the accuracy?
This would also mean that the dropout layer is dropping units and that the model's accuracy during testing on both the training set and test set is higher with the dropout layer enabled.
Update 2
I went through the tensorflow slim inceptionv3 model code that the arg_scope in the code above is referencing. I removed the final dropout layer after the Avg pool 8x8 and the accuracy remained at around 99%. However, when I set is_training to False only for the batch norm layers, the accuracy dropped back to around 70%. Here is the arg_scope from slim\nets\inception_v3.py and my modification.
with variable_scope.variable_scope(
scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
with arg_scope(
[layers_lib.batch_norm],is_training=False): #layers_lib.dropout], is_training=is_training):
net, end_points = inception_v3_base(
inputs,
scope=scope,
min_depth=min_depth,
depth_multiplier=depth_multiplier)
I tried this with both the dropout layer removed and the dropout layer kept with passing in is_training=True to the dropout layer.
(Summarizing from dylan7's debugging in the question's comments)
Batch norm relies on variables to save the summary statistics it normalizes with. These are only updated when is_training is True through an UPDATE_OPS collection (see the batch_norm documentation). If these update ops don't get run (or the variables are overwritten), there may be transient "reasonable" statistics based on each batch which get lost when is_training is False (testing data is not, and should not be, used to inform batch_norm summary statistics).

Resources