How to use a trained deep learning model at different resolutions? - python-3.x

I have trained a model for an image segmentation task on 320x240x3 images using TensorFlow 2.x. Is there a way to use the same model, or tweak it, so that it works at different resolutions?
I have to use the model on Full HD (1920x1080) and HD (1280x720) images, but since GPU memory is not sufficient to train at those resolutions with my architecture, I trained it on 320x240 images.
I am looking for a scalable solution that works at all resolutions. Any suggestions?

The answer to your question is no: you cannot take a model trained at one resolution and use it at a different resolution; in essence, this is why we train models at different resolutions, to check the performance and possibly improve it.
The suggestion below omits one crucial aspect: depending on the task at hand, increasing the resolution can considerably improve results in object detection and image segmentation, particularly if you have small objects.
The only solution for your problem, given the GPU memory constraint, is to split the initial image into smaller parts (tiles), train per tile (say 320x240), and then reconstruct the full image; otherwise, there is no alternative to increasing GPU memory in order to train at higher resolutions.
PS: I understood your question only after reading it a couple of times; I suggest you clarify the details about the resolutions.
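The tiling idea above can be sketched as follows. This is only an illustration of how tile coordinates might be computed, not a full training pipeline; the function name `tile_boxes` and the choice to shift the last row or column back inside the image (so that edge tiles overlap rather than needing padding) are assumptions, not something the answer specifies.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive
Box = Tuple[int, int, int, int]


def tile_boxes(height: int, width: int, tile_h: int, tile_w: int) -> List[Box]:
    """Return fixed-size boxes that together cover a height x width image.

    Tiles sit on a regular grid. If the image size is not a multiple of
    the tile size, one extra tile is added flush with the far edge, so
    edge tiles may overlap their neighbours.
    """
    if tile_h > height or tile_w > width:
        raise ValueError("tile must not be larger than the image")

    def starts(size: int, tile: int) -> List[int]:
        positions = list(range(0, size - tile + 1, tile))
        if positions[-1] + tile < size:    # a strip is still uncovered
            positions.append(size - tile)  # final tile, flush with the edge
        return positions

    return [(top, left, top + tile_h, left + tile_w)
            for top in starts(height, tile_h)
            for left in starts(width, tile_w)]


# Full HD image split into tiles of 320x240 (width x height)
boxes = tile_boxes(1080, 1920, tile_h=240, tile_w=320)
print(len(boxes))  # 30
```

Each box can be used to slice an image array (`image[top:bottom, left:right]`), and the per-tile predictions can be written back into a full-size mask at the same coordinates. How to resolve the overlapping strips (last write wins, averaging, and so on) is a separate design choice.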

Yes, you can run it on high-resolution images. But low-resolution images are easier to train on, and it is easier for the model to find the features of the image. Training at low resolution saves time and makes your model faster, since it has fewer parameters. HD images contain a large number of pixels, so training on higher-resolution images makes both training and the model slower, since the larger number of pixels leads to a larger number of parameters, and it makes it harder for the model to find features. So it is mostly advisable to use a lower resolution rather than a higher one.

Related

How to optimize memory footprint of Stanza models

I'm using Stanza to get tokens, lemmas and tags from documents in multiple languages for the purposes of a language learning app. This means that I need to store and load many Stanza (default) models for different languages.
My main problem right now is that if I want to load all those models the memory requirement is too much for my resources. I currently deploy a web API running Stanza NLP on AWS. I want to keep my infrastructure costs at a minimum.
One possible solution is to load one model at a time when I need to run my script. I guess that means there will be some extra overhead each time in order to load the model in memory.
Another thing I tried is just to use the processors that I really need which decreases the memory footprint but not by that much.
I tried looking at open and closed issues on Github and Google but didn't find much.
What other possible solutions are out there?
The bottom line is that the model for a language has to be in memory during execution, so one way or another you need to either make the models smaller or tolerate keeping models on disk. I can offer some suggestions for making the models smaller, though be warned that a smaller model will probably be less accurate.
You could examine the percentage breakdown of language requests, and store commonly requested languages in memory and only go to disk for rarer language requests.
The strategy with the most immediate impact on model size is to shrink the vocabulary. It is possible you could cut the vocabulary further and still get similar accuracy. We have done some optimization on this front, but there may be more opportunity to cut model size.
You could experiment with a smaller model size and smaller word embeddings and may only see a small accuracy drop; we haven't really experimented aggressively with different model sizes to see how much accuracy you lose. This would mean retraining the model with the embedding size and model size parameters set smaller.
I don't know a lot about this, but there is a strategy of tagging a bunch of data with your big accurate model, and then training a smaller model to mimic the big model. I believe this is called "knowledge distillation".
In a similar direction, you could tag a bunch of data with Stanza, and then train a CoreNLP model (which I think would have a smaller memory footprint).
In summary, I think the easiest thing to do would be to retrain a model with a smaller vocabulary size. I think it currently has 250,000 words, and cutting to 10,000 or 50,000 will reduce model size but may not affect accuracy too badly.
Unfortunately, I don't think there is a magical option you can select that will just solve this issue; you will have to retrain models and see what kind of accuracy you are willing to sacrifice for a lower memory footprint.
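The suggestion above about keeping commonly requested languages in memory and going to disk only for rarer ones could be approximated with a small least-recently-used cache. The sketch below is a generic illustration: `PipelineCache` and its `loader` argument are hypothetical names, and nothing here is part of Stanza itself.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class PipelineCache:
    """Keep at most `capacity` loaded pipelines; evict the least recently used."""

    def __init__(self, loader: Callable[[str], Any], capacity: int = 3):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader          # called with a language code on a miss
        self._capacity = capacity
        self._loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, lang: str) -> Any:
        if lang in self._loaded:
            self._loaded.move_to_end(lang)     # mark as most recently used
            return self._loaded[lang]
        if len(self._loaded) >= self._capacity:
            self._loaded.popitem(last=False)   # evict before loading the new one
        pipeline = self._loader(lang)
        self._loaded[lang] = pipeline
        return pipeline

    def loaded_languages(self) -> List[str]:
        """Languages currently in memory, least recently used first."""
        return list(self._loaded)
```

With Stanza, the loader might be something like `lambda lang: stanza.Pipeline(lang, processors="tokenize,pos,lemma")`, with `capacity` chosen from the memory budget. Evicting before loading keeps peak memory lower, at the cost of a reload delay whenever a rarely used language is requested.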

How many images (minimum) should there be in each class for training YOLO?

I am trying to implement YOLOv2 on my custom dataset. Is there any minimum number of images required for each class?
There is no minimum number of images per class for training. Of course, the fewer you have, the more slowly the model will converge and the lower the accuracy will be.
What matters, according to Alexey (maintainer of the popular darknet fork and creator of YOLO v4) in his notes on how to improve object detection, is:
For each object which you want to detect - there must be at least 1
similar object in the Training dataset with about the same: shape,
side of object, relative size, angle of rotation, tilt, illumination.
So desirable that your training dataset include images with objects at
different: scales, rotations, lightings, from different sides, on
different backgrounds - you should preferably have 2000 different
images for each class or more, and you should train 2000*classes
iterations or more
https://github.com/AlexeyAB/darknet
So I think you should have a minimum of 2000 images per class if you want to get optimum accuracy. But 1000 per class is not bad either. Even with hundreds of images per class you can still get a decent (not optimum) result. Just collect as many images as you can.
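As a quick worked example of the guideline quoted above (2000 images per class, 2000*classes iterations), the arithmetic can be written out as below. The helper name `rule_of_thumb` is hypothetical, and the numbers are the quoted preference, not a hard requirement.

```python
def rule_of_thumb(num_classes: int, per_class: int = 2000) -> dict:
    """Suggested dataset size and iteration count from the quoted guideline."""
    return {
        "images": num_classes * per_class,      # 2000 images for each class
        "iterations": num_classes * per_class,  # 2000 * classes iterations
    }


print(rule_of_thumb(3))  # {'images': 6000, 'iterations': 6000}
```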
It depends.
There is an objective minimum of one image per class. That may work with some accuracy, in principle, if using data-augmentation strategies and fine-tuning a pretrained YOLO network.
The objective reality, however, is that you may need as many as 1000 images per class, depending on your problem.

What type of CNN will be suitable for underwater image processing?

The primary objective (my assigned work) is to do an image segmentation for the underwater images using a convolutional neural network. The camera shots taken from the underwater structure will have poor image quality due to severe noise and bad light exposure. In order to achieve higher classification accuracy, I want to do an automatic image enhancement for the images (see the attached file). So, I want to know, which CNN architecture will be best to do both tasks. Please kindly suggest any possible solutions to achieve the objective.
What do you need to segment? It'd be nice to see some labels for the segmentation.
You may not need to enhance the images: if your whole dataset has the same amount of noise, the network will generalize properly.
Regarding CNN architectures, it depends on your constraints on processing power and accuracy. If those are not a constraint, go with something like Mask R-CNN; check that repo as a good starting point.
Be mindful that it's a fairly complex architecture, so inference times might be a bit too high (though real time is doable depending on your GPU).
Other, simpler architectures are FCNs (Fully Convolutional Networks), which are basically your CNN, but with the fully connected layers replaced by convolutional layers.
The advantage of FCNs is that they are really easy to implement and modify, since you can go from simple architectures (FCN-AlexNet) to more complex and more accurate ones (FCN-VGG, FCN-ResNet).
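To illustrate the replacement of fully connected layers by convolutional ones, here is a minimal NumPy sketch (no deep learning framework involved). It applies one dense layer independently at every pixel, which is what a 1x1 convolution does. The function name, shapes, and random weights are all made up for illustration.

```python
import numpy as np


def dense_head_as_1x1_conv(features: np.ndarray, weights: np.ndarray,
                           bias: np.ndarray) -> np.ndarray:
    """Apply the same dense layer at every pixel (i.e. a 1x1 convolution).

    features: (H, W, C_in) feature map
    weights:  (C_in, n_classes)
    bias:     (n_classes,)
    returns:  (H, W, n_classes) per-pixel class scores
    """
    return features @ weights + bias


rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 3))   # 16 input channels, 3 classes
bias = np.zeros(3)

# The same weights work for feature maps of any spatial size.
for h, w in [(24, 32), (72, 128)]:
    scores = dense_head_as_1x1_conv(rng.normal(size=(h, w, 16)), weights, bias)
    print(scores.shape)
```

Because the weights depend only on the channel count and not on the height or width, a head written this way accepts inputs of different spatial sizes, whereas a dense layer on a flattened feature map is tied to one fixed size.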
Also, you don't mention a framework. There are many to choose from, and the choice depends on your familiarity with languages; most of them can be used from Python:
TensorFlow
PyTorch
MXNet
But if you are a beginner, try starting with a GUI-based one. NVIDIA DIGITS is a great starting point and really easy to configure; it's based on Caffe, so it's fairly fast when deploying and can easily be integrated with accelerators like TensorRT.

Pruning in Keras

I'm trying to design a neural network using Keras with priority on prediction performance, and I cannot get sufficiently high accuracy by further reducing the number of layers and nodes per layer. I have noticed that very large portion of my weights are effectively zero (>95%). Is there a way to prune dense layers in hope of reducing prediction time?
Not a dedicated way :(
There's currently no easy (dedicated) way of doing this with Keras.
A discussion is ongoing at https://groups.google.com/forum/#!topic/keras-users/oEecCWayJrM.
You may also be interested in this paper: https://arxiv.org/pdf/1608.04493v1.pdf.
Take a look at Keras Surgeon:
https://github.com/BenWhetton/keras-surgeon
I have not tried it myself, but the documentation claims that it has functions to remove or insert nodes.
Also, after looking at some papers on pruning, it seems that many researchers create a new model with fewer channels (or fewer layers), and then copy the weights from the original model to the new model.
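The "build a smaller model and copy the weights" approach can be sketched for one hidden dense layer with plain NumPy. This is an assumption-laden illustration rather than the Keras Surgeon API: `drop_dead_units` is a hypothetical helper, and it removes only hidden units whose outgoing weights are all (near) zero, since those units cannot influence the next layer and dropping them leaves the output unchanged.

```python
import numpy as np


def drop_dead_units(w_in, b_in, w_out, tol=1e-8):
    """Remove hidden units whose outgoing weights are all (near) zero.

    w_in:  (n_inputs, n_hidden)   weights into the hidden layer
    b_in:  (n_hidden,)            hidden-layer biases
    w_out: (n_hidden, n_outputs)  weights out of the hidden layer
    Returns the three arrays restricted to the units that are kept.
    """
    keep = np.abs(w_out).max(axis=1) > tol
    return w_in[:, keep], b_in[keep], w_out[keep, :]


def forward(x, w_in, b_in, w_out):
    """Two dense layers with a ReLU in between (no output bias, for brevity)."""
    return np.maximum(x @ w_in + b_in, 0.0) @ w_out


rng = np.random.default_rng(0)
w_in, b_in = rng.normal(size=(8, 20)), rng.normal(size=20)
w_out = rng.normal(size=(20, 4))
w_out[5:] = 0.0                      # pretend 15 of 20 units ended up unused

small = drop_dead_units(w_in, b_in, w_out)
x = rng.normal(size=(2, 8))
print(small[0].shape, np.allclose(forward(x, w_in, b_in, w_out),
                                  forward(x, *small)))
```

In Keras, the arrays for a `Dense` layer can be read with `layer.get_weights()` (kernel of shape `(input_dim, units)`, then bias) and copied into a narrower layer with `set_weights()`. Individual near-zero weights scattered through a matrix do not shrink it this way; only whole rows or columns can be removed.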
See this dedicated tooling for tf.keras. https://www.tensorflow.org/model_optimization/guide/pruning
As the overview suggests, support for latency improvements is a work in progress
Edit: Keras -> tf.keras based on LucG's suggestion.
If you set an individual weight to zero, won't that prevent it from being updated during backpropagation? Shouldn't that weight remain zero from one epoch to the next? That's why you set the initial weights to nonzero values before training. If you want to "remove" an entire node, just set all of the weights on that node's output to zero, and that will prevent the node from having any effect on the output throughout training.

Training Methodology of CNN in theano with large scale data

I am training a CNN with 1M images with theano. Now I am puzzled on how to prepare the training data.
My questions are:
When the images are resized to 64*64*3, the size of the whole dataset is about 100G. Should I save the data into a single npy file or into several smaller files? Which is more efficient?
How to decide the number of parameters of the CNN? How about 1M/10 = 100K?
Should I keep the memory cost of a training block plus the CNN parameters below the GPU memory?
My computer has 16G of memory and a Titan GPU.
Thank you very much.
If you're using an NN framework like pylearn2, Lasagne, Keras, etc., check the docs to see if there are guidelines for iterating over batches from disk, from an HDF5 store or similar.
If there's nothing and you don't want to roll your own, the fuel package provides lots of helpful data iteration schemes that can be adapted to models in Theano (and probably most of the frameworks; there's a good tutorial in the fuel repository).
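For anyone who does want to roll their own, here is a minimal sketch of iterating minibatches straight from disk. It uses NumPy's memory-mapped loading of a single `.npy` file rather than HDF5 or fuel, so it is a stand-in for the tools named above, not a description of them; `iterate_minibatches` is a hypothetical helper.

```python
import numpy as np


def iterate_minibatches(npy_path, batch_size, rng=None):
    """Yield minibatches from a .npy file without loading the whole file.

    If `rng` (a numpy Generator) is given, samples are visited in random
    order; otherwise they are visited in file order.
    """
    data = np.load(npy_path, mmap_mode="r")   # memory-mapped, read-only
    n = data.shape[0]
    order = np.arange(n) if rng is None else rng.permutation(n)
    for start in range(0, n, batch_size):
        # Sort indices within the batch so reads move forward through the file.
        idx = np.sort(order[start:start + batch_size])
        yield np.asarray(data[idx])           # copies only this batch into RAM
```

Each batch can then be cast and normalised right before it is sent to the GPU. On size: 1M images of 64*64*3 values come to about 12.3 GB as 8-bit integers, about 49 GB as 32-bit floats, and about 98 GB as 64-bit floats, so the roughly 100G figure in the question is consistent with 8-byte floats. Storing `uint8` on disk would cut it considerably.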
As for the parameters, you'll have to cross validate to figure out the best parameters for your data.
And yes, the model size + minibatch size + dropout mask for the batch has to be under the available vram.
