Is there any way to speed up the predicting process for tensorflow lattice? - python-3.x

I build my own model with Keras Premade Models in tensorflow lattice using python3.7 and save the trained model. However, when I use the trained model for predicting, the speed of predicting each data point is at millisecond level, which seems very slow. Is there any way to speed up the predicting process for tfl?

There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subsets of the features and then average the predictions of the different models (like described here)

As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Tensorflow model to IR, a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

Related

Porting pre-trained keras models and run them on IPU

I am trying to port two pre-trained keras models into the IPU machine. I managed to load and run them using IPUstrategy.scope but I dont know if i am doing it the right way. I have my pre-trained models in .h5 file format.
I load them this way:
def first_model():
model = tf.keras.models.load_model("./model1.h5")
return model
After searching your ipu.keras.models.py file I couldn't find any load methods to load my pre-trained models, and this is why i used tf.keras.models.load_model().
Then i use this code to run:
cfg=ipu.utils.create_ipu_config()
cfg=ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
model = first_model()
print('compile attempt\n')
model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
print('compilation completed\n')
print('running attempt\n')
res = model.predict(input_img)[0]
print('run completed\n')
you can see the output here:link
So i have some difficulties to understand how and if the system is working properly.
Basically the model.compile wont compile my model but when i use model.predict then the system first compiles and then is running. Why is that happening? Is there another way to run pre-trained keras models on an IPU chip?
Another question I have is if its possible to load a pre-trained keras model inside an ipu.keras.model and then use model.fit/evaluate to further train and evaluate it and then save it for future use?
One last question I have is about the compilation part of the graph. Is there a way to avoid recompilation of the graph every time i use the model.predict() in a different strategy.scope()?
I use tensorflow2.1.2 wheel
Thank you for your time
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific optimised replacement for TensorFlow Keras classes Model and Sequential, plus more high-performance, multi-IPU classes e.g. PipelineModel and PipelineSequential.
As per your specific issue, you are right when you mention that there are no IPU-specific ways to load pre-trained Keras models at present. I would encourage you, as you appear to have access to IPUs, to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching topic to the recompilation question: using an executable cache prevents recompilation, you can set that up with environmental variable TF_POPLAR_FLAGS='--executable_cache_path=./cache'. I'd also recommend to take a look into the following resources:
this tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow2 on the IPU.
Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.

Why is accuracy higher with Caffe than with tf.keras?

I converted a model from tf.keras to caffe. When I evaluate the model with Caffe on the test set, I find that the accuracy is higher with caffe than with tf.keras. I can't think of a way to get a hand on the source of the problem (if there's a problem in the first place...)
Is this difference due to the lower-level libraries used for accelerating the computations (I am thinking of cudnn and the caffe engine)? Is there a well-known accuracy problem with the keras module of tensorflow?
By the way, there are other people that have a similar issue:
https://github.com/keras-team/keras/issues/4444
This can happen.
Once you convert your keras .h5 model to .caffemodel, the weights are numerically copied. But, internally you'll load your model to Caffe and not Keras.
As, caffe and keras are two different libraries, their internal algorithms can vary slightly. Also if you change your pre-processing scheme that can change the result too. Usually, if you use pruning (to optimize the size) the performance can go low, in the weird case this can be thought of as an extreme regularization and act as a performance booster in test.

Running keras LSTM models with multiple GPU (Tensorflow backend)

I have a keras LSTM model and want to run it under multiple GPUs for speed improvement. But I have some ambiguities:
1- I found that to really get the great speed on GPU I should define my network using CuDNNLSTM layer and not normal LSTM layer. To use multiple GPUs, I looked at Keras documentation and wanted to use multi_gpu_model() function to make distributed model. However, in the sample scripts they recommend to define the model on CPU for easy weight sharing, but my CuDNNLSTM model is not deployable on CPU and LSTM model will not benefit from the enhancements provided by GPU. What is the correct approach?
2- So I tried many configurations, including:
Group 1(using the normal (non-fast) LSTM layers): placing model on CPU and no copying to GPU; placing model on CPU and then use multi_gpu_model to create GPU copies; place model on default GPU and no copying to other GPU; placing model on default GPU and then use multi_gpu_model to create two GPU copies.
Group2 (using CuDNNLSTM layer and therefore no possibility to place model on CPU): defining a single model (which Tensorflow places it on the default GPU); using multi_gpu_model to create two GPU copies.
In all cases, data parallelism (using multi_gpu_model) resulted in lower speed of execution. I didn't change anything else in my code and input data pipeline or batch sizes. What is wrong with me?
3- In general, should I only use CuDNN-type layers to get high speed computation with GPUs when I am programming at high level of keras API?

Extract CNN features using Caffe and train using SVM

I want to extract features using caffe and train those features using SVM. I have gone through this link: http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html. This links provides how we can extract features using caffenet. But I want to use Lenet architecture here. I am unable to change this line of command for Lenet:
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10 leveldb
And also, after extracting the features, how to train these features using SVM? I want to use python for this. For eg: If I get features from this code:
features = net.blobs['pool2'].data.copy()
Then, how can I train these features using SVM by defining my own classes?
You have two questions here:
Extracting features using LeNet
Training an SVM
Extracting features using LeNet
To extract the features from LeNet using the extract_features.bin script you need to have the model file (.caffemodel) and the model definition for testing (.prototxt).
The signature of extract_features.bin is here:
Usage: extract_features pretrained_net_param feature_extraction_proto_file extract_feature_blob_name1[,name2,...] save_feature_dataset_name1[,name2,...] num_mini_batches db_type [CPU/GPU] [DEVICE_ID=0]
So if you take as an example val prototxt file this one (https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt), you can change it to the LeNet architecture and point it to your LMDB / LevelDB. That should get you most of the way there. Once you did that and get stuck, you can re-update your question or post a comment here so we can help.
Training SVM on top of features
I highly recommend using Python's scikit-learn for training an SVM from the features. It is super easy to get started, including reading in features saved from Caffe's format.
Very lagged reply, but should help.
Not 100% what you want, but I have used the VGG-16 net to extract face features using caffe and perform a accuracy test on a small subset of the LFW dataset. Exactly what you needed is in the code. The code creates classes for training and testing and pushes them into the SVM for classification.
https://github.com/wajihullahbaig/VGGFaceMatching

Caffe vs Theano MNIST example

I'm trying to learn (and compare) different deep learning frameworks, by the time they are Caffe and Theano.
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
and
http://deeplearning.net/tutorial/lenet.html
I follow the tutorial to run those frameworks on MNIST dataset. However, I notice a quite difference in term of accuracy and performance.
For Caffe, it's extremely fast for the accuracy to build up to ~97%. In fact, it only takes 5 mins to finish the program (using GPU) which the final accuracy on test set of over 99%. How impressive!
However, on Theano, it is much poorer. It took me more than 46 minutes (using same GPU), just to achieve 92% test performance.
I'm confused as it should not have so much difference between the frameworks running relatively same architectures on same dataset.
So my question is. Is the accuracy number reported by Caffe is the percentage of correct prediction on test set? If so, is there any explanation for the discrepancy?
Thanks.
The examples for Theano and Caffe are not exactly the same network. Two key differences which I can think of are that the Theano example uses sigmoid/tanh activation functions, while the Caffe tutorial uses the ReLU activation function, and that the Theano code uses normal minibatch gradient descent while Caffe uses a momentum optimiser. Both differences will significantly affect the training time of your network. And using the ReLU unit will likely also affect the accuracy.
Note that Caffe is a deep learning framework which already has ready-to-use functions for many commonly used things like the momentum optimiser. Theano, on the other hand, is a symbolic maths library which can be used to build neural networks. However, it is not a deep learning framework.
The Theano tutorial you mentioned is an excellent resource to understand how exactly convolutional and other neural networks work on a basic level. However, it will be cumbersome to implement all the state-of-the-art tweaks. If you want to get state-of-the-art results quickly you are better off using one of the existing deep learning frameworks. Apart from Caffe, there are a number of frameworks based on Theano. I know of keras, blocks, pylearn2, and my personal favourite lasagne.

Resources