Memory Leakage tflite models - python-3.x

I have a tensorsorflow lite model (keras) in a flutter application that segments objects in an image (model is trained only on one specific object). With this model I can make 50-60 predictions without problems. Now i used the exact same model to train for another classification, when i use these two in a sequence it crashes after 3 images because of a memory leakage. For this task i have to have separate models, because in the future i might want to add/remove/switch just some of the models.
Does anyone have experience with keras.tenorsflow and memory leakage when tflite models are run sequentially?
I have done:
both models to tflite
include optimizers as false
compile as false
What i can not figure out:
Why one model can make 60 predictions in a row but not more than 3 when 2 models are run
How i can clear memory in flutter application
Where in model.predict() memory leakage occur
And i tried to minimize the size of the models through code below but that did not solve the issue.
import tensorflow as tf
model = tf.keras.models.load_model("model")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
converter.target_spec.supported_types = [tf.float16]
tflite_quant_model = converter.convert()
#save converted quantization model to tflite format
open("model.tflite", "wb").write(tflite_quant_model)

Related

Loading a GPU trained BERTopic model on CPU?

I trained a BERTopic model on a GPU, and now for visualization purposes I want to load it on a CPU.
But when I tried to do that I got:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
When I tried to use the suggested fix I got the same problem?
Saw some fix that suggests to save the model without its embeddings model, but don't want to retrain an resave unless its the last option, and would also love if someone could explain what's this embedding model and what's going on under the hood.
topic_model = torch.load(args.model, map_location=torch.device('cpu'))
When you want to save the BERTopic model without the embedding model, you can run the following:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train the model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(docs)
# Save the model without the embedding model
topic_model.save("my_model", save_embedding_model=False)
This should prevent any issues with GPU/CPU if you are not using any of the cuML sub-models in BERTopic.
Saw some fix that suggests to save the model without its embeddings model, but don't want to retrain an resave unless its the last option, and would also love if someone could explain what's this embedding model and what's going on under the hood.
The embedding model is typically a pre-trained model that actually is not learning from the input data. There are options to make it learn during training but that requires a custom component in BERTopic. In other words, when you use a pre-trained model, it is no problem removing that pre-trained model when saving the topic model as there would be no need to re-train the model.
In other words, we would first save our topic model in our GPU environment without the embedding model:
topic_model.save("my_model", save_embedding_model=False)
Then, we load in our saved BERTopic model in our CPU environment and then pass the pre-trained embedding model:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
topic_model = BERTopic.load("my_model", embedding_model=embedding_model )
You can learn more about the role of the embedding model here.

Is there any way to speed up the predicting process for tensorflow lattice?

I build my own model with Keras Premade Models in tensorflow lattice using python3.7 and save the trained model. However, when I use the trained model for predicting, the speed of predicting each data point is at millisecond level, which seems very slow. Is there any way to speed up the predicting process for tfl?
There are multiple ways to improve speed, but they may involve a tradeoff with prediction accuracy. I think the three most promising options are:
Reduce the number of features
Reduce the number of lattices per feature
Use an ensemble of lattice models where every lattice model only gets a subsets of the features and then average the predictions of the different models (like described here)
As the lattice model is a standard Keras model, I recommend trying OpenVINO. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime. OpenVINO is optimized for Intel hardware, but it should work with any CPU.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert the HDF5 model, so you have to save it as SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Tensorflow model to IR, a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. If you care about latency, I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement. If you care about throughput, change the value to THROUGHPUT or CUMULATIVE_THROUGHPUT.
# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.

Using pretrained models in Pytorch for Semantic Segmentation, then training only the fully connected layers with our own dataset

I am learning Pytorch and trying to understand how the library works for semantic segmentation.
What I've understood so far is that we can use a pre-trained model in pytorch. I've found an article which was using this model in the .eval() mode but I have not been able to find any tutorial on using such a model for training on our own dataset. I have a very small dataset and I need transfer learning to get results. My goal is to only train the FC layers with my own data. How is that achievable in Pytorch without complicating the code with OOP or so many .py files. I have been having a hard time figuring out such repos in github as I am not the most proficient person when it comes to OOP. I have been using Keras for Deep Learning until recently and there everything is easy and straightforward. Do I have the same options in Pycharm?
I appreciate any guidance on this. I need to run a piece of code that does the semantic segmentation and I am really confused about many of the steps I need to take.
Assume you start with a pretrained model called model. All of this occurs before you pass the model any data.
You want to find the layers you want to train by looking at all of them and then indexing them using model.children(). Running this command will show you all of the blocks and layers.
list(model.children())
Suppose you have now found the layers that you want to finetune (your FC layers as you describe). If the layers you want to train are the last 5 you can grab all of the layers except for the last 5 in order to set their requires_grad params to False so they don't train when you run the training algorithm.
list(model.children())[-5:]
Remove those layers:
layer_list = list(model.children())[-5:]
Rebuild model using sequential:
model_small = nn.Sequential(*list(model.children())[:-5])
Set requires_grad params to False:
for param in model_small.parameters():
param.requires_grad = False
Now you have a model called model_small that has all of the layers except the layers you want to train. Now you can reattach the layers that your removed and they will intrinsically have the requires_grad param set to True. Now when you train the model it will only update the weights on those layers.
model_small.avgpool_1 = nn.AdaptiveAvgPool2d()
model_small.lin1 = nn.Linear()
model_small.logits = nn.Linear()
model_small.softmax = nn.Softmax()
model = model_small.to(device)

Keras predict returns same results despite loading model and loading weights

I trained and saved a keras model in json format and saved the weights in h5 format. The only issue is that each time I run predict, I get different results. There is no randomness when loading the data, I simply load the data (without shuffling), load the keras model and weights, and then run predict. I don't run the compile command, since it's not needed. Something I was wondering is if the dropout layers I added to the model contribute to this, but I read that they are automatically disabled for prediction/evaluation stages. Any ideas on why this might be happening?

Train, Save and Load a Tensorflow Model

Refer to this to train a GAN model for MNIST dataset, I want to save a model and restore it for further prediction. After having some understanding of Saving and Importing a Tensorflow Model I am able to save and restore some variables of inputs and outputs but for this network I am able to save the model only after some specific iterations and not able to predict some output.
Did you refer to this guide? It explains very clearly how to load and save tensorflow models in all possible formats.
If you are new to ML, I'd recommend you give Keras a try first, which is much easier to use. See https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model, pretty much you can use:
model.save('my_model.h5')
to save your model to disk.
model = load_model('my_model.h5')
to load your model and make prediction

Resources