Retraining the last layer of inception-v3 significantly slows down classification - python-3.x

In an attempt at transfer learning over Inception-v3 with TF and Python 3.5, I've tested two approaches:
1- Retraining the last layer, as shown here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/image_retraining
2- Applying a linear SVM on top of the Inception-v3 bottlenecks, as demonstrated here: https://www.kernix.com/blog/image-classification-with-a-pre-trained-deep-neural-network_p11
I expected them to have similar runtimes for the classification phase, since the critical part - the bottleneck extraction - is identical. In practice though, the retrained network is about 8x slower when running classification.
My question is whether anyone has an idea what the reason for this could be.
Some code snippets:
SVM on top (the faster):
def getTensors():
    graph_def = tf.GraphDef()
    with open('classify_image_graph_def.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tensorBottleneck, tensorsResizedImage = tf.import_graph_def(
        graph_def, name='', return_elements=['pool_3/_reshape:0', 'Mul:0'])
    return tensorBottleneck, tensorsResizedImage

def calc_bottlenecks(imgFile, tensorBottleneck, tensorsResizedImage):
    """ - read, decode and resize to get <resizedImage> - """
    bottleneckValues = sess.run(tensorBottleneck, {tensorsResizedImage: resizedImage})
    return np.squeeze(bottleneckValues)
This takes about 0.5 sec on my (Windows) laptop while the SVM part takes no time.
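(For context, the SVM part is roughly along these lines - a rough sketch assuming scikit-learn's LinearSVC, with train_files/train_labels standing in for whatever training data is actually used:)

import numpy as np
from sklearn.svm import LinearSVC

# extract one bottleneck vector per training image using the functions above
features = np.array([calc_bottlenecks(f, tensorBottleneck, tensorsResizedImage)
                     for f in train_files])
clf = LinearSVC()
clf.fit(features, train_labels)

# at classification time, the bottleneck extraction dominates the runtime;
# the SVM prediction itself is essentially free
newFeature = calc_bottlenecks(imgFile, tensorBottleneck, tensorsResizedImage)
prediction = clf.predict(newFeature.reshape(1, -1))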
Retraining the last layer (this is harder to summarize since the code is longer):
def loadGraph(pbFile):
    with tf.gfile.FastGFile(pbFile, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    with tf.Session() as sess:
        softmaxTensor = sess.graph.get_tensor_by_name('final_result:0')

def labelImage(imageFile, softmaxTensor):
    with tf.Session() as sess:
        """ - read <image_data> from <imageFile> - """
        input_layer_name = 'DecodeJpeg/contents:0'
        predictions, = sess.run(softmaxTensor, {input_layer_name: image_data})
'pbFile' is the file saved by the retrainer, which is supposed to have identical topology and weights (excluding the classification layer) to 'classify_image_graph_def.pb'. This takes about 4 sec to run (on the same laptop, not counting loading).
Any idea what causes the performance gap?
Thanks!

Solved. The problem was in creating a new tf.Session() for every image. Storing the session when reading the graph and reusing it brought the runtime back to what I expected.
def loadGraph(pbFile):
    ...
    sessToStore = tf.Session()  # keep the session open; a `with` block would close it on return
    softmaxTensor = sessToStore.graph.get_tensor_by_name('final_result:0')
    return softmaxTensor, sessToStore

def labelImage(imageFile, softmaxTensor, sessToStore):
    """ - read <image_data> from <imageFile> - """
    input_layer_name = 'DecodeJpeg/contents:0'
    predictions, = sessToStore.run(softmaxTensor, {input_layer_name: image_data})
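For completeness, a usage sketch (the file names are made up): the graph and session are created once and reused for every image, which is what brings the per-image time back down:

# hypothetical usage: load once, classify many images with the same session
softmaxTensor, sess = loadGraph('retrained_graph.pb')
for imageFile in ['img1.jpg', 'img2.jpg']:
    labelImage(imageFile, softmaxTensor, sess)
sess.close()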

Related

PyTorch: Speed up data loading

I am using densenet121 to do cat/dog detection on a Kaggle dataset. I enabled CUDA and training appears to be very fast. However, the data loading (or perhaps the processing) appears to be very slow. Are there some ways to speed it up? I tried to play with batch size, but that didn't provide much help. I also changed num_workers from 0 to some positive numbers. Going from 0 to 2 reduces loading time by perhaps 1/3; increasing it further doesn't have any additional effect. Are there other ways I can speed up data loading?
This is my rough code (I am focused on learning, so it's not very organized):
import matplotlib.pyplot as plt
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

data_dir = 'Cat_Dog_data'

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.5, 0.5, 0.5],
                                                            [0.5, 0.5, 0.5])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor()])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train',
                                  transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64,
                                          num_workers=16, shuffle=True,
                                          pin_memory=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64,
                                         num_workers=16)

model = models.densenet121(pretrained=True)

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))

model.classifier = classifier
model.cuda()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)
epochs = 30
steps = 0

import time
device = torch.device('cuda:0')

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    count = 0
    total_start = time.time()
    for images, labels in trainloader:
        start = time.time()
        images = images.cuda()
        labels = labels.cuda()

        optimizer.zero_grad()
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        elapsed = time.time() - start
        if count % 20 == 0:
            print("Optimized elapsed: ", elapsed, "count:", count)
            print("Total elapsed ", time.time() - total_start)
            total_start = time.time()
        count += 1
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0
        for images, labels in testloader:
            images = images.cuda()
            labels = labels.cuda()
            with torch.no_grad():
                model.eval()
                log_ps = model(images)
                test_loss += criterion(log_ps, labels)
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                compare = top_class == labels.view(*top_class.shape)
                accuracy += compare.type(torch.FloatTensor).mean()
        model.train()
        train_losses.append(running_loss / len(trainloader))
        test_losses.append(test_loss / len(testloader))
        print("Epoch: {}/{}.. ".format(e + 1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss / len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss / len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))
torchvision version 0.8.0 or greater
torchvision now supports batched and GPU transformations (performed on torch.Tensors instead of PIL images), so you should use that as an initial improvement.
See here for more info about this release. Those transforms also act as torch.nn.Module, hence they can be used inside a model, for example:
import torch
import torchvision.transforms as T

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
)
Furthermore, those operations can be JIT-compiled, possibly improving performance even further.
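A minimal sketch of what that could look like (assuming the transforms Sequential defined above and a torchvision/PyTorch version recent enough to support scriptable transforms):

import torch

scripted_transforms = torch.jit.script(transforms)  # `transforms` is the nn.Sequential above

# the scripted module accepts (batched) uint8 image tensors, e.g. directly on the GPU:
# images = scripted_transforms(images.to('cuda'))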
torchvision < 0.8.0 (original answer)
Increasing batch_size won't help, as torchvision performs the transforms on a single image while it's loaded from disk.
There are a couple of ways one could speed up data loading, with increasing levels of difficulty:
Improve image loading times
Load & normalize images and cache in RAM (or on disk)
Produce transformations and save them to disk
Apply non-cacheable transforms (rotations, flips, crops) in a batched manner
Prefetching
1. Improve image loading
Easy improvements can be gained by installing Pillow-SIMD instead of the original pillow. It is a drop-in replacement and could be faster (or so it is claimed, at least for Resize, which you are using).
Alternatively, you could create your own data loading and processing with OpenCV, as some say it's faster, or check albumentations (though I can't tell you whether those will improve performance; it might be a lot of time wasted for no gain except the learning experience).
2. Load & normalize images & cache
You can use Python's LRU cache functionality to cache some outputs.
You can also use torchdata, which acts almost exactly like PyTorch's torch.utils.data.Dataset but allows caching to disk or in RAM (or mixed modes) with a simple cache() call on torchdata.Dataset (see the github repository; disclaimer: I'm the author).
Remember: you have to load and normalize the images, cache them, and only after that apply RandomRotation, RandomResizedCrop and RandomHorizontalFlip (as those change each time they are run).
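A rough sketch of the LRU-cache approach (my own example, assuming a torchvision version where the random transforms accept tensors; note that with num_workers > 0 each worker process keeps its own separate cache):

from functools import lru_cache

import torch
from torchvision import datasets, transforms

# expensive, deterministic part: decode, resize, normalize -> cached in RAM
base_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
# cheap, random part: applied on every access, never cached
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
])

class CachedImageFolder(torch.utils.data.Dataset):
    def __init__(self, root):
        self.inner = datasets.ImageFolder(root)  # returns (PIL image, label)

    @lru_cache(maxsize=None)  # keep every decoded & normalized image in RAM
    def _load(self, index):
        img, label = self.inner[index]
        return base_transform(img), label

    def __getitem__(self, index):
        img, label = self._load(index)
        return augment(img), label

    def __len__(self):
        return len(self.inner)

train_data = CachedImageFolder('Cat_Dog_data/train')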
3. Produce transformations and save them to disk
You would perform a lot of transformations on the images up front, save them to disk and use this enlarged dataset afterwards. Once again that could be done with torchdata, but it's really wasteful when it comes to I/O and hard drive space, and a rather inelegant solution. Furthermore it's "static", so the data would only last you for X epochs; it wouldn't be an "infinite" generator with augmentations.
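A sketch of this approach (the output path and the number of variants per image are made up):

import os

import torch
from torchvision import datasets, transforms

augment = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder('Cat_Dog_data/train')  # no transform: yields PIL images
out_dir = 'Cat_Dog_data/train_augmented'
os.makedirs(out_dir, exist_ok=True)

K = 5  # augmented variants saved per source image
for idx in range(len(dataset)):
    img, label = dataset[idx]
    for k in range(K):
        torch.save((augment(img), label), os.path.join(out_dir, f'{idx}_{k}.pt'))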
4. Batched transformations
torchvision does not support it, so you would have to write such functions on your own. See this issue for justification. AFAIK no 3rd party library provides it either. For large batches it should speed things up, but the implementation is an open question I think (correct me if I'm wrong).
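For example, a per-sample random horizontal flip applied to a whole batch at once might look roughly like this (my own sketch, not an existing torchvision function):

import torch

def batched_random_hflip(images: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # images: (N, C, H, W); flip each sample independently with probability p
    flip_mask = torch.rand(images.shape[0], device=images.device) < p
    images = images.clone()
    images[flip_mask] = images[flip_mask].flip(-1)  # flip along the width dimension
    return images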
5. Prefetch
IMO this would be the hardest to implement (though it's a really good project idea, come to think of it). Basically you load the data for the next iteration while your model trains. torch.utils.data.DataLoader does provide it, though there are some concerns (like workers pausing after their data has been loaded). You can read the PyTorch thread about it (I'm not sure about it, as I didn't verify it on my own). Also, a lot of valuable insight is provided by this comment and this blog post (though I'm not sure how up to date those are).
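For what it's worth, a sketch of leaning on the DataLoader's built-in prefetching (prefetch_factor and persistent_workers assume a reasonably recent PyTorch, roughly 1.7+; train_data is the dataset from the question):

import torch

trainloader = torch.utils.data.DataLoader(
    train_data,
    batch_size=64,
    shuffle=True,
    num_workers=4,            # workers load batches in the background
    pin_memory=True,
    prefetch_factor=4,        # batches kept ready per worker
    persistent_workers=True,  # don't respawn workers every epoch
)

for images, labels in trainloader:
    # non_blocking copies let the host-to-GPU transfer overlap with compute
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
    # ... forward / backward ...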
All in all, to substantially improve data loading you would need to get your hands quite dirty (or maybe there are libraries already doing some of this for PyTorch; if so, I would love to know about them).
Also remember to profile your changes, see torch.utils.bottleneck.
EDIT: The DALI project might be worth checking out, though AFAIK it has some problems with RAM usage growing linearly with the number of epochs.

Restore best checkpoint to an estimator tensorflow 2.x

Briefly, I put in place a data input pipeline using the TensorFlow Dataset API. Then, I implemented a CNN model for classification using Keras, which I converted to an estimator. I fed my estimator Train and Eval Specs, with my input_fn providing the input data for training and evaluation. And as a final step I launched the model training with tf.estimator.train_and_evaluate:
def my_input_fn(tfrecords_path):
    dataset = (...)
    return batch_fbanks, batch_labels

def build_model():
    model = tf.keras.models.Sequential()
    model.add(...)
    model.compile(...)
    return model

model = build_model()

run_config = tf.estimator.RunConfig(model_dir, save_summary_steps=100, save_checkpoints_steps=1000)
estimator = tf.keras.estimator.model_to_estimator(model, config=run_config)

def serving_input_receiver_fn():
    inputs = {'Conv1_input': tf.compat.v1.placeholder(shape=[None, 11, 120, 1], dtype=tf.float32)}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

exporter = tf.estimator.BestExporter(serving_input_receiver_fn, name="best_exporter", exports_to_keep=5)

train_spec_dnn = tf.estimator.TrainSpec(input_fn=lambda: my_input_fn(train_data_path), hooks=[hook])
eval_spec_dnn = tf.estimator.EvalSpec(input_fn=lambda: my_eval_input_fn(eval_data_path), exporters=exporter, start_delay_secs=0, throttle_secs=15)

tf.estimator.train_and_evaluate(estimator, train_spec_dnn, eval_spec_dnn)
I save the 5 best checkpoints using tf.estimator.BestExporter as shown above. Once I've finished training, I want to reload the best model and convert it to an estimator to re-evaluate the model and predict on a new dataset. However, my issue is in restoring the checkpoint to an estimator. I tried several solutions, but each time I don't get the estimator object I need in order to run its evaluate and predict methods.
To be more specific, each of the best checkpoint directories is organised as follows:
./
    variables/
        variables.data-00000-of-00002
        variables.data-00001-of-00002
        variables.index
    saved_model.pb
So the question is: how can I get an estimator object from the best checkpoint, so that I can use it to evaluate my model and predict on new data?
Note: I found some proposed solutions relying on TensorFlow v1 features, which cannot solve my problem because I work with TF v2.
Thanks a lot, any help is appreciated.
You can use the class below, derived from tf.estimator.BestExporter.
What it does is, besides saving the best model (the .pb files etc.), it also saves
the best-exported model's checkpoint files to a different folder.
Below is the class:
import shutil, glob, os
import tensorflow as tf
# import tensorflow.logging as logging

## the path where all the checkpoints reside
BEST_CHECKPOINTS_PATH_FROM = 'PATH TO ALL CHECKPOINT FILES'
## the path where the best exporter checkpoint files will be saved
BEST_CHECKPOINTS_PATH_TO = 'PATH TO BEST EXPORTER CHECKPOINT FILES TO BE SAVED'

class BestCheckpointsExporter(tf.estimator.BestExporter):
    def export(self, estimator, export_path, checkpoint_path, eval_result, is_the_final_export):
        if self._best_eval_result is None or \
                self._compare_fn(self._best_eval_result, eval_result):
            # print('Exporting a better model ({} instead of {})...'.format(eval_result, self._best_eval_result))
            for name in glob.glob(checkpoint_path + '.*'):
                print(name)
                print(os.path.join(BEST_CHECKPOINTS_PATH_TO, os.path.basename(name)))
                shutil.copy(name, os.path.join(BEST_CHECKPOINTS_PATH_TO, os.path.basename(name)))
            # also save the text file used by the estimator api to find the best checkpoint
            with open(os.path.join(BEST_CHECKPOINTS_PATH_TO, "checkpoint"), 'w') as f:
                f.write("model_checkpoint_path: \"{}\"".format(os.path.basename(checkpoint_path)))
            self._best_eval_result = eval_result
        else:
            print('Keeping the current best model ({} instead of {}).'.format(self._best_eval_result, eval_result))
Example usage of the class
You just replace the exporter by instantiating the class and passing it the serving_input_receiver_fn.
def serving_input_receiver_fn():
    inputs = {'my_dense_input': tf.compat.v1.placeholder(shape=[None, 4], dtype=tf.float32)}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

exporter = BestCheckpointsExporter(serving_input_receiver_fn=serving_input_receiver_fn)

train_spec_dnn = tf.estimator.TrainSpec(input_fn=input_fn, max_steps=5)
eval_spec_dnn = tf.estimator.EvalSpec(input_fn=input_fn, exporters=exporter, start_delay_secs=0, throttle_secs=15)

(x, y) = tf.estimator.train_and_evaluate(keras_estimator, train_spec_dnn, eval_spec_dnn)
At this point, it will save the best-exported model checkpoint files in the folder you have specified.
For loading the checkpoint files you need to do the following steps:
Step 1: Rebuild your model instance
def build_model():
    model = tf.keras.models.Sequential()
    model.add(...)
    model.compile(...)
    return model

model = build_model()
Step 2: Use the model's load_weights API.
Reference: https://www.tensorflow.org/tutorials/keras/save_and_load
ck_path = tf.train.latest_checkpoint('PATH TO BEST EXPORTER CHECKPOINT FILES')
model.load_weights(ck_path)

## From there you will be able to call predict & evaluate on the trained model
## PREDICT
prediction = model.predict(x)

## EVALUATE
for features_batch, labels_batch in input_fn().take(1):
    model.evaluate(features_batch, labels_batch)
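(If an actual estimator object is needed again, rather than the Keras model, the reloaded model can presumably be wrapped the same way as during training - a small sketch reusing model_to_estimator and the eval input function from the question:)

## optional: wrap the reloaded Keras model back into an estimator
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
results = estimator.evaluate(input_fn=lambda: my_eval_input_fn(eval_data_path))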
Note: All of these have been simulated on google colab.

how to calculate the accuracy on the whole train dataset and val dataset respectively

Hello, I'm a newbie with TensorBoard and tf.metrics.accuracy().
(I am Chinese, so my English may not be very good; I will try to describe my question.)
For convenience, I'll just show the key code.
My problem is about saving the train and val accuracy to TensorBoard every epoch; my data amount is big, so I use batches of data.
What I have done so far:
1) Get Dataset
Now, I use
train_iterator = train_dataset.make_initializable_iterator()
val_iterator = val_dataset.make_initializable_iterator()
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, train_iterator.output_types,
                                               train_iterator.output_shapes)
image_batch, label_batch = iterator.get_next()
to get the dataset, and I can switch between the two datasets within a tf.Session() using
sess.run([train_iterator.initializer, accuracy_vars_initializer])
and
sess.run([val_iterator.initializer, accuracy_vars_initializer])
2) calculate accuracy
with tf.name_scope("meters"):
accuracy, accuracy_op = tf.metrics.accuracy(labels=label_batch,
predictions=tf.argmax(tf.nn.softmax(logits_batch), -1),
name="accuracy")
accuracy_value_ = tf.placeholder(tf.float32, shape=())
accuracy_summary = tf.summary.scalar('accuracy', accuracy_value_)
accuracy_vars = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope="meters/accuracy")
accuracy_vars_initializer = tf.variables_initializer(var_list=accuracy_vars)
I use
sess.run(accuracy_vars_initializer)
before each train/val pass over the whole train/val dataset to reset the internal counting variables within tf.metrics.accuracy().
Then I use
for epoch_i in range(FLAGS.epoch):
    sess.run([train_iterator.initializer, accuracy_vars_initializer])
    train_loss_avg, train_acc_avg = [], []
    while True:
        try:
            _, loss_value, step, acc_value, acc_op_value, summary = sess.run(
                [train_op, loss, global_step, accuracy, accuracy_op, merged],
                feed_dict={handle: train_iterator_handle,
                           accuracy_value_: np.average(train_acc_avg),
                           loss_value_: np.average(train_loss_avg)})
            train_acc_avg.append(acc_value)
            train_loss_avg.append(loss_value)
        except tf.errors.OutOfRangeError:
            train_writer.add_summary(summary, global_step=step)
            saver.save(sess, os.path.join(FLAGS.model_dir, "fcn8.ckpt"), global_step)
            print("train dataset finished")
            break

    sess.run([val_iterator.initializer, accuracy_vars_initializer])
    val_loss_avg = []
    while True:
        try:
            loss_value, acc_value, acc_op_value, summary = sess.run(
                [loss, accuracy, accuracy_op, merged],
                feed_dict={handle: val_iterator_handle,
                           accuracy_value_: acc_op_value,
                           loss_value_: np.average(val_loss_avg)})
            print("Epoch[%d],val batch loss = %g,acc = %g." % (epoch_i, loss_value, acc_value))
            val_loss_avg.append(loss_value)
        except tf.errors.OutOfRangeError:
            val_writer.add_summary(summary, global_step=step)
            print("val dataset finished")
            break

train_writer.close()
val_writer.close()
to achieve my goal.
The accuracy calculation method I used before was simply:
feed_dict={handle: train_iterator_handle,
           accuracy_value_: accuracy_op,
           loss_value_: np.average(train_loss_avg)})
But both the old and the new method result in a horizontal accuracy line in TensorBoard, and although I have reworked my code many times the problem still exists.
Can anyone help me find the reason? And is there a better, more standardized way to structure my code? It's too complicated right now.
Many thanks for any help.

Running Tensorflow model test data after closing session

I have a ConvNet I am trying to replicate (not my original code) that was able to run the test dataset through the trained model only when I trained and tested in the same sitting. I tweaked only a few lines of the code to make it run the test data after that sitting, so I am not sure what might be going on. I noticed that "logits_out" was a dataflow edge rather than a node in TensorBoard. Is it that, because edges aren't saved in checkpoints automatically (and it is not saved as a node or in any other form intentionally in the original code), it can't be called after the first sitting closes?
This is the general structure of the training phase:
tf.reset_default_graph()
graph = tf.Graph()

with graph.as_default():
    with tf.name_scope('1st_pool'):
        # first layer
        # subsequent layers

with graph.as_default():
    # flattening, dropout, optimization, etc...
    # some summary.scalar for loss analyses
    logits_out = tf.layers.dense(flat, 1)  # flat is the flattened array
    saved_1 = tf.train.Saver()
    trained_event = tf.summary.FileWriter('./CNN/train', graph=graph)
    test_event = tf.summary.FileWriter('./CNN/test', graph=graph)
    merged = tf.summary.merge_all()

with tf.Session(graph=graph) as sess:
    # training and "validating"
    sess.run(tf.global_variables_initializer())
    # running train summaries
    if step == test_round:
        # running test summaries
    saved_1.save(sess, './CNN/model_1.ckpt')
(EDITED:code pasted incorrectly)
This code ran successfully during the continuous sitting, with the graph still open:
with tf.Session(graph=graph) as sess:
    saved_1.restore(sess, tf.train.latest_checkpoint('./CNN'))
    #
    pred = sess.run(logits_out, feed_dict={some inputs for placeholders})
    #
I pretty much only tweaked 2 lines (shown below) to load the meta files in a new graph on the next day, but it gave the error "name 'logits_out' is not defined" when I tried to run it in a separate sitting (in fact, other variables I tried to sess.run gave the same error):
with tf.Session(graph=tf.get_default_graph()) as sess:
    saved_1 = tf.train.import_meta_graph('./CNN/model_1.ckpt.meta')
    saved_1.restore(sess, tf.train.latest_checkpoint('./CNN'))
    pred = sess.run(logits_out, feed_dict={some inputs for placeholders})
    #
EDITED: I'm thinking it might be because I am missing a scope - or misunderstanding how TensorFlow names things - after restoring the session/graph the next day, but I can't see how; the only things that had been named were the pools.
I was able to run data through the model today by just re-creating the graph, running this section of code:
tf.reset_default_graph()
graph = tf.Graph()

with graph.as_default():
    with tf.name_scope('1st_pool'):
        # first layer
        # subsequent layers

with graph.as_default():
    # flattening, dropout, optimization, etc...
    # some summary.scalar for loss analyses
    logits_out = tf.layers.dense(flat, 1)  # flat is the flattened array
    saved_1 = tf.train.Saver()
    trained_event = tf.summary.FileWriter('./CNN/train', graph=graph)
    test_event = tf.summary.FileWriter('./CNN/test', graph=graph)
    merged = tf.summary.merge_all()

with tf.Session(graph=graph) as sess:
    # training and "validating"
    sess.run(tf.global_variables_initializer())
    # running train summaries
    if step == test_round:
        # running test summaries
    saved_1.save(sess, './CNN/model_1.ckpt')
and then running the code without the 2 edited lines:
with tf.Session(graph=graph) as sess:
    saved_1.restore(sess, tf.train.latest_checkpoint('./CNN'))
    #
    pred = sess.run(logits_out, feed_dict={some inputs for placeholders})
    #
So the gist of this entire SO post is that I did not have to use tf.train.import_meta_graph. But what I don't understand is: what is tf.train.import_meta_graph actually for? I thought it imports the graph and its metadata saved in the ".meta" file, so I could avoid having to rebuild the graph from the source code?
(note: I will remove this postscript question once I figure it out)
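(For reference, a minimal sketch of how import_meta_graph is typically combined with get_tensor_by_name. The tensor name 'dense/BiasAdd:0' is only a guess; the actual name depends on how the layer was built, since tf.layers.dense assigns one automatically unless a name is passed:)

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('./CNN/model_1.ckpt.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./CNN'))
    # the Python variable `logits_out` no longer exists in a new process,
    # so the tensor has to be looked up by its graph name instead:
    logits_out = sess.graph.get_tensor_by_name('dense/BiasAdd:0')
    pred = sess.run(logits_out, feed_dict={some inputs for placeholders})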

What's the best way to evaluate test error during training?

I have a neural network I'm training with TensorFlow. At each iteration, I compute the training cost and pass it to the optimizer. Pseudo code of my implementation:
def defineNetworkStructure():  # layers
    ...

def feedForward():
    ...

def defineCost():
    ...

def defineOptimizer():
    opt = ...

def train(train_X, train_Y, ...):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(N):
            _, ith_cost = sess.run([opt, cost], feed_dict={X: train_X, Y: train_Y})
            print("Cost at {} is {}".format(i, ith_cost))
Now, inside the loop, I'd like to insert something like:
ith_cost = sess.run([opt, cost], feed_dict={X:test_X, Y:test_Y})
Note: test_X and test_Y instead of train_X and train_Y.
However, if I do so, I'll modify the value of the TensorFlow variable cost and consequently (though I'm not sure), I'll influence the optimization process.
What is the best way to achieve this task in tensorflow?
The thing you've missed here is that you shouldn't run opt on test_X and test_Y.
Just doing sess.run(cost, feed_dict={X: test_X, Y: test_Y}) will output your test loss and in no way affects the training or optimization process.
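A minimal sketch of how that could look inside the training loop (names reused from the pseudo code above; the evaluation interval of 100 steps is just an example):

for i in range(N):
    # training step: fetches the optimizer, so variables are updated
    _, train_cost = sess.run([opt, cost], feed_dict={X: train_X, Y: train_Y})

    # evaluation step: only the cost tensor is fetched, so no gradients are
    # computed and no variables are updated
    if i % 100 == 0:
        test_cost = sess.run(cost, feed_dict={X: test_X, Y: test_Y})
        print("Step {}: train cost {:.4f}, test cost {:.4f}".format(i, train_cost, test_cost))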
