Porting pre-trained keras models and run them on IPU - keras

I am trying to port two pre-trained keras models into the IPU machine. I managed to load and run them using IPUstrategy.scope but I dont know if i am doing it the right way. I have my pre-trained models in .h5 file format.
I load them this way:
def first_model():
model = tf.keras.models.load_model("./model1.h5")
return model
After searching your ipu.keras.models.py file I couldn't find any load methods to load my pre-trained models, and this is why i used tf.keras.models.load_model().
Then i use this code to run:
cfg=ipu.utils.create_ipu_config()
cfg=ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()
strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
model = first_model()
print('compile attempt\n')
model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
print('compilation completed\n')
print('running attempt\n')
res = model.predict(input_img)[0]
print('run completed\n')
you can see the output here:link
So i have some difficulties to understand how and if the system is working properly.
Basically the model.compile wont compile my model but when i use model.predict then the system first compiles and then is running. Why is that happening? Is there another way to run pre-trained keras models on an IPU chip?
Another question I have is if its possible to load a pre-trained keras model inside an ipu.keras.model and then use model.fit/evaluate to further train and evaluate it and then save it for future use?
One last question I have is about the compilation part of the graph. Is there a way to avoid recompilation of the graph every time i use the model.predict() in a different strategy.scope()?
I use tensorflow2.1.2 wheel
Thank you for your time

To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific optimised replacement for TensorFlow Keras classes Model and Sequential, plus more high-performance, multi-IPU classes e.g. PipelineModel and PipelineSequential.
As per your specific issue, you are right when you mention that there are no IPU-specific ways to load pre-trained Keras models at present. I would encourage you, as you appear to have access to IPUs, to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching topic to the recompilation question: using an executable cache prevents recompilation, you can set that up with environmental variable TF_POPLAR_FLAGS='--executable_cache_path=./cache'. I'd also recommend to take a look into the following resources:
this tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow2 on the IPU.
Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.

Related

Is there any way to speed up the predicting process for tensorflow lattice?

I build my own model with Keras Premade Models in tensorflow lattice using python3.7 and save the trained model. However, when I use the trained model for predicting, the speed of predicting each data point is at millisecond level, which seems very slow. Is there any way to speed up the predicting process for tfl?
There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subsets of the features and then average the predictions of the different models (like described here)
As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Tensorflow model to IR, a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

model.to(device) for Pytorch Lighting

I currently train my model using GPUs using Pytorch Lightning
trainer = pl.Trainer( gpus=[0,1],
distributed_backend='ddp',
resume_from_checkpoint=hparams["resume_from_checkpoint"])
trainer.fit(model, train_dataloader=train_loader, val_dataloaders=val_loader)
Instructions are also clear for how to run test samples with trainer defined to use GPU
trainer.test(test_dataloader=test_dataloader)
and also how to load a model and use it interactively
model = transformer.Model.load_from_checkpoint('/checkpoints/run_300_epoch_217.ckpt')
results = model(in_data,
I use the later to interface with an interactive system via sockets in a docker container.
Is there a proper way to make this Pytorch Lightning model run on GPU?
Lightning instructions say not to use model.to(device), but it appears to work just like Pytorch. Reason for instructions to avoid a side effect?
I started reading about ONNX, but would rather just have an easy way to specify GPU since the interactive setup works perfectly with cpu.
My understanding is that "Remove any .cuda() or to.device() calls" is only for using with the Lightning trainer, because the trainer handles that itself.
If you don't use the trainer, a LightningModule module is basically just a regular PyTorch model with some naming conventions. So using model.to(device) is how to run on GPU.

How to save model architecture in PyTorch?

I know I can save a model by torch.save(model.state_dict(), FILE) or torch.save(model, FILE). But both of them don't save the architecture of model.
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ? I want to apply different tweaks to my model. Do I have any better way than copying the whole class definition every time and creating a new class if I can't save the architecture of a model?
You can refer to this article to understand how to save the classifier. To make a tweaks to a model, what you can do is create a new model which is a child of the existing model.
class newModel( oldModelClass):
def __init__(self):
super(newModel, self).__init__()
With this setup, newModel has all the layers as well as the forward function of oldModelClass. If you need to make tweaks, you can define new layers in the __init__ function and then write a new forward function to define it.
Saving all the parameters (state_dict) and all the Modules is not enough, since there are operations that manipulates the tensors, but are only reflected in the actual code of the specific implementation (e.g., reshapeing in ResNet).
Furthermore, the network might not have a fixed and pre-determined compute graph: You can think of a network that has branching or a loop (recurrence).
Therefore, you must save the actual code.
Alternatively, if there are no branches/loops in the net, you may save the computation graph, see, e.g., this post.
You should also consider exporting your model using onnx and have a representation that captures both the trained weights as well as the computation graph.
Regarding the actual question:
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ?
The answer is: You cannot
Is there any way to load a trained model without declaring the class definition before ?
I want the model architecture as well as parameters to be loaded.
no, you have to load the class definition before, this is a python pickling limitation.
https://discuss.pytorch.org/t/how-to-save-load-torch-models/718/11
Though, there are other options (probably you have already seen most of those) that are listed at this PyTorch post:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch's way of serializing a model for inference is to use torch.jit to compile the model to TorchScript.
PyTorch's TorchScript supports more advanced control flows than TensorFlow, and thus the serialization can happen either through tracing (torch.jit.trace) or compiling the Python model code (torch.jit.script).
Great references:
Video which explains this: https://www.youtube.com/watch?app=desktop&v=2awmrMRf0dA
Documentation: https://pytorch.org/docs/stable/jit.html

Where can one configure the CNN used by the Spacy TextCategorizer?

According to the comment at the top of the TextCategorizer,
Train a convolutional neural network text classifier on the IMDB
dataset, using the TextCategorizer component. The dataset will be
loaded automatically via Thinc's built-in dataset loader. The model is
added to spacy.pipeline, and predictions are available via doc.cats.
For more details, see the documentation:
* Training: https://spacy.io/usage/training
Where is the code for the CNN? Can the CNN be configured? Is there a research paper the implementation is based on?
The network architecture is defined in the _.ml module specifically within the build_text_classifier function.
The code related with the training is within the pipeline module specifically within the TextCategorizer class.
Some parameters like drop_out, batch_size and the number of epochs can be configured as showed in the example, you can also modify the architecture of the network but for that you have to know about the framework behind spaCy which is called Thinc https://github.com/explosion/thinc and some Cython.
I don't know about any paper describing the model but this video provide a great description of it https://www.youtube.com/watch?v=sqDHBH9IjRU

Using a pytorch model for inference

I am using the fastai library (fast.ai) to train an image classifier. The model created by fastai is actually a pytorch model.
type(model)
<class 'torch.nn.modules.container.Sequential'>
Now, I want to use this model from pytorch for inference. Here is my code so far:
torch.save(model,"./torch_model_v1")
the_model = torch.load("./torch_model_v1")
the_model.eval() # shows the entire network architecture
Based on the example shown here: http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#sphx-glr-beginner-data-loading-tutorial-py, I understand that I need to write my own data loading class which will override some of the functions in the Dataset class. But what is not clear to me is the transformations that I need to apply at test time? In particular, how do I normalize the images at test time?
Another question: is my approach of saving and loading the model in pytorch fine? I read in the tutorial here: http://pytorch.org/docs/master/notes/serialization.html that the approach that I have used is not recommended. The reason is not clear though.
Just to clarify: the_model.eval() not only prints the architecture, but sets the model to evaluation mode.
In particular, how do I normalize the images at test time?
It depends on the model you have. For instance, for torchvision modules, you have to normalize the inputs this way.
Regarding on how to save / load models, torch.save/torch.load "saves/loads an object to a disk file."
So, if you save the_model, it will save the entire model object, including its architecture definition and some other internal aspects. If you save the_model.state_dict(), it will save a dictionary containing the model state (i.e. parameters and buffers) only. Saving the model can break the code in various ways, so the preferred method is to save and load only the model state. However, I'm not sure if fast.ai "model file" is actually a full model or the state of a model. You have to check this so you can correctly load it.

Resources