How to use Keras effectively, agnostic to the backend

I am trying out some examples using Keras models that are already available. Most of the examples use Keras with TensorFlow (or PyTorch or Theano).
Due to limited resources and cost constraints, I am using PlaidML to work with an AMD GPU. Since Keras supports pluggable backends, I think this should not be an issue.
Please share your thoughts about using the Keras API and later plugging in the desired backend.
I have this concern because the samples I am following use Keras from TensorFlow (import tensorflow.keras), while I am using plain Keras (import keras) with a pluggable backend.
What is the equivalent statement for:
img = tf.io.decode_png(img, channels=1)
# 3. Convert to float32 in [0, 1] range
img = tf.image.convert_image_dtype(img, tf.float32)
Is there any limitation to going with the plain Keras API?

I just used PIL's Image to read and convert an image; it works the same as the TensorFlow API (see the sketch at the end of this answer). Most of the Keras API can be used irrespective of the backend. There are some caveats with PlaidML, though: some functions, such as the CTC loss ctc_batch_cost, cannot be found. I got an error like:
The Keras backend function 'ctc_batch_cost' is not yet implemented in
Plaid. You can help us prioritize by letting us know if this function
is important to you, and as always, contributions are welcome!
There are some posts that provide sample implementations, but it is not straightforward. The response from PlaidML was that it may not be available soon.
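For reference, here is a minimal backend-agnostic sketch of the decode-and-convert step from the question, using PIL and NumPy (the file name is illustrative, and channels=1 is mimicked with grayscale mode):

import numpy as np
from PIL import Image

# Read a PNG as single-channel grayscale ("L" mode ~ channels=1)
img = Image.open("sample.png").convert("L")

# Convert to float32 in the [0, 1] range, mirroring
# tf.image.convert_image_dtype(img, tf.float32)
img = np.asarray(img, dtype=np.float32) / 255.0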

Related

How to get the inference compute graph of the pytorch model?

I want to hand-write a framework to perform inference of a given neural network. The network is quite complicated, so to make sure my implementation is correct, I need to know exactly how the inference process is performed on the device.
I tried to use torchviz to visualize the network, but what I got seems to be the backpropagation compute graph, which is really hard to understand.
Then I tried to convert the PyTorch model to ONNX format, following the export instructions, but when I tried to visualize it, the original layers of the model seem to have been separated into very small operators.
I just want to get a result like this.
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed looks like a view from the Netron app.
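If you go this route, a minimal sketch might look like this (the tiny network below is a hypothetical stand-in for your model):

import torch
import torch.nn as nn

# Placeholder for the complicated network in the question
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())

# Save the full model object (not just the state_dict), then open
# model.pt in the Netron app to inspect the architecture.
torch.save(model, "model.pt")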
You can also try the torchview package, which provides several features (especially useful for large models). For instance, you can set the display depth (the depth in the nested hierarchy of modules).
It is also based on the forward pass.
GitHub repo: github repo
Disclaimer: I am the author of the package.
Note: the tool accepts PyTorch models as its input format.
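For illustration, a small sketch of torchview usage (the model and sizes below are placeholders; check the torchview README for the exact API):

import torch.nn as nn
from torchview import draw_graph

# Hypothetical model standing in for the complicated network
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())

# depth limits how far nested modules are expanded in the diagram
graph = draw_graph(model, input_size=(1, 3, 224, 224), depth=2)
graph.visual_graph.render("model_graph")  # renders to file via graphviz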

Porting pre-trained Keras models and running them on the IPU

I am trying to port two pre-trained Keras models to an IPU machine. I managed to load and run them using IPUStrategy.scope, but I don't know if I am doing it the right way. I have my pre-trained models in .h5 file format.
I load them this way:
def first_model():
    model = tf.keras.models.load_model("./model1.h5")
    return model
After searching your ipu.keras.models.py file I couldn't find any methods for loading pre-trained models, which is why I used tf.keras.models.load_model().
Then I use this code to run them:
cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = first_model()
    print('compile attempt\n')
    model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
    print('compilation completed\n')
    print('running attempt\n')
    res = model.predict(input_img)[0]
    print('run completed\n')
You can see the output here: link
So I have some difficulty understanding whether the system is working properly.
Basically, model.compile won't compile my model, but when I call model.predict the system first compiles and then runs. Why is that happening? Is there another way to run pre-trained Keras models on an IPU chip?
Another question: is it possible to load a pre-trained Keras model inside an ipu.keras model, use model.fit/evaluate to further train and evaluate it, and then save it for future use?
One last question is about the compilation of the graph. Is there a way to avoid recompiling the graph every time I call model.predict() in a different strategy.scope()?
I am using the tensorflow 2.1.2 wheel.
Thank you for your time.
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific optimised replacements for the TensorFlow Keras classes Model and Sequential, plus more high-performance, multi-IPU classes, e.g. PipelineModel and PipelineSequential.
As for your specific issue, you are right that there are at present no IPU-specific ways to load pre-trained Keras models. Since you appear to have access to IPUs, I would encourage you to reach out to Graphcore Support, attaching your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching to the recompilation question: using an executable cache prevents recompilation; you can set it up with the environment variable TF_POPLAR_FLAGS='--executable_cache_path=./cache'. I'd also recommend taking a look at the following resources:
This tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow 2 on the IPU.
The Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.
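For example, a minimal sketch of enabling the executable cache from Python (the cache directory is arbitrary; set the variable before the IPU system is configured):

import os

# Cache compiled Poplar executables on disk so subsequent runs of the
# same graph skip recompilation.
os.environ["TF_POPLAR_FLAGS"] = "--executable_cache_path=./cache"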

Keras preprocessing logic

Background:
In a vision application on GCP, we are using TF Serving. The application that talks to TF Serving is written in Go; it converts the image to a tensor and sends it to TF Serving over gRPC.
Problem:
The preprocessing logic in Go does not work as well as it does in Python using the Keras image library (inference accuracy suffers). Part of the reason could be that the Python libraries were used during training.
What we tried:
TensorFlow Serving provides a way to introduce a pre-processor that runs in the serving container, but it seems to have limited functionality (we can't package the Keras library with the model). We tried the following two options.
What works is Keras preprocessing (Python) on the client side, as follows:
img = tf.keras.preprocessing.image.load_img(file_name, target_size=(HEIGHT, WIDTH))
img_array = tf.keras.preprocessing.image.img_to_array(img)
# ... gRPC call to TensorFlow Serving ...
Our goal is to use serving_input_receiver_fn and preprocess the image in the TF Serving space, as described in this blog post: https://medium.com/devseed/technical-walkthrough-packaging-ml-models-for-inference-with-tf-serving-2a50f73ce6f8
But the following code, executed as the serving_input_receiver_fn, does not yield correct inferences:
image = tf.image.decode_image(image_str_tensor, channels=CHANNELS, dtype=tf.uint8)
image = tf.reshape(image, [HEIGHT, WIDTH, CHANNELS])
Our goal is to run the following Keras code (or an equivalent) inside serving_input_receiver_fn, assuming we can load the image from the gRPC stream:
img = tf.keras.preprocessing.image.load_img(file_name, target_size=(HEIGHT, WIDTH))
img_array = tf.keras.preprocessing.image.img_to_array(img)
Is this possible? This is a massive deployment (70 GPUs and 2,300 CPUs), so every bit of performance counts. In our case, image preprocessing on the TF Serving machine is the optimal place.
I don't actually have an answer, but maybe I can point you to some resources. Firstly, keras.preprocessing is supposed to be quite slow; check out https://www.tensorflow.org/tutorials/load_data/images, which recommends building the pre-processing pipeline as a tf.data.Dataset (see the sketch below the quote):
The above keras.preprocessing method is convenient, but has downsides: it's slow (see the performance section below); it lacks fine-grained control; and it is not well integrated with the rest of TensorFlow. Instead, load the files as a tf.data.Dataset.
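As a rough illustration of that recommendation, a tf.data pipeline might look like this (the glob pattern and sizes below are placeholders, not from the question):

import tensorflow as tf

HEIGHT, WIDTH, CHANNELS = 224, 224, 3  # illustrative sizes

def load_and_preprocess(path):
    # Decode, resize and rescale entirely inside the TF graph
    image = tf.io.read_file(path)
    image = tf.io.decode_image(image, channels=CHANNELS, expand_animations=False)
    image = tf.image.resize(image, [HEIGHT, WIDTH])
    return image / 255.0

ds = tf.data.Dataset.list_files("images/*.png").map(
    load_and_preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)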
Why not include the pre-processing layer as part of the model graph itself, so that it runs within TensorFlow Serving?
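A minimal sketch of that idea, assuming a TF 2.x export (the sizes and the tiny head below are placeholders for the real trained model):

import tensorflow as tf

# Bake preprocessing (cast, rescale, resize) into the exported graph
# so TF Serving runs it server-side instead of the Go client.
inputs = tf.keras.Input(shape=(None, None, 3), dtype=tf.uint8)
x = tf.keras.layers.Lambda(
    lambda img: tf.image.resize(tf.cast(img, tf.float32) / 255.0, (224, 224))
)(inputs)
outputs = tf.keras.layers.GlobalAveragePooling2D()(x)  # placeholder head
model = tf.keras.Model(inputs, outputs)
model.save("serving_model")  # SavedModel directory for TF Serving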

Image segmentation with edgeTPU

I'm new here, so please be kind and let me know if I did not provide all the information you need :)
I would like to compare the Edge TPU with other edge devices such as the Myriad. I would like to select one object detection model and one image segmentation model. Looking at the following link, which lists the supported operations, I noticed that YOLOv3 cannot be compiled for the Edge TPU because it includes LeakyRelu.
https://coral.withgoogle.com/docs/edgetpu/models-intro/
For image segmentation, I'd like to use DeepLab, but I still don't know whether the operations included in DeepLab v3+, such as atrous convolution or the feature pyramid network, are supported.
I'd appreciate it if someone could tell me which models are usable on the Edge TPU. Are there any image segmentation models?
Have you already found the following?
https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/quantize.md
The "mobilenetv2_coco_voc_trainaug_8bit" model (deeplabv3_mnv2_pascal_train_aug_8bit/frozen_inference_graph.pb) can be converted to a TFLite FlatBuffer, and can then be compiled for the Edge TPU with edgetpu_compiler (see the sketch below).
Note: the edgetpu_api environment has been updated; you can find the details here: https://coral.withgoogle.com/news/updates-07-2019/
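A rough TF1-style sketch of the conversion step (the input/output node names are assumptions, so verify them against your graph, e.g. in Netron; a fully 8-bit-quantized graph may also need extra converter flags such as inference_type and quantized_input_stats):

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    "deeplabv3_mnv2_pascal_train_aug_8bit/frozen_inference_graph.pb",
    input_arrays=["ImageTensor"],           # assumed input node name
    output_arrays=["SemanticPredictions"],  # assumed output node name
)
tflite_model = converter.convert()
with open("deeplab_quant.tflite", "wb") as f:
    f.write(tflite_model)
# Shell step afterwards: edgetpu_compiler deeplab_quant.tflite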
Yes, there are prepackaged segmentation models and code examples showing how to use them: https://coral.ai/models/
Please share if you know where to find something similar for Movidius-based VPU devices.
Here you can find all the supported layers for the Edge TPU: https://coral.ai/docs/edgetpu/models-intro/#supported-operations. For Conv2D it says "Must use the same dilation in x and y dimensions," so implementing a version of DeepLab v3+ is possible on the Edge TPU.

Memory-saving gradients or memory checkpointing in Keras

I recently found this GitHub repo: https://github.com/openai/gradient-checkpointing
Its main purpose is to reduce GPU memory consumption, and the usage seems pretty straightforward:
from tensorflow.python.keras._impl.keras import backend as K
K.__dict__["gradients"] = memory_saving_gradients.gradients_memory
How can I do the same thing with Keras installed separately, not as part of TensorFlow? This didn't work:
from keras import backend as K
K.__dict__["gradients"] = memory_saving_gradients.gradients_memory
Thank you in advance
I know I am a bit late, but I recently ran into the same problem, and I was able to solve it.
The problem (I think) is that memory_saving_gradients.gradients_memory uses a heuristic approach that does not work well in many scenarios. Fortunately, there is an alternative function, memory_saving_gradients.gradients_collection, which works perfectly fine but requires you to specify the points in the network at which the gradients must be checkpointed.
As an example of how this can be accomplished, suppose we want to checkpoint all the Keras layers whose name contains the word 'add' (for instance, to make a ResNet memory efficient). Then you could include something like this after building your model, but before training it:
import tensorflow as tf
import memory_saving_gradients
from keras import backend as K

# Checkpoint the output of every layer whose name contains 'add'
for layer in model.layers:
    if 'add' in layer.name:
        tf.add_to_collection("checkpoints", layer.get_output_at(0))

K.__dict__["gradients"] = memory_saving_gradients.gradients_collection
I hope it helps!
