Running a classification model with the object detection demo - Keras

Since Intel OpenVINO does not directly support Keras, I saved my Keras model as saved_model.pb using the method described at https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html (using OpenVINO version 2021.4).
After that, I converted it to IR (xml and bin) and tested it with hello_classification.py from the development tools. It worked fine; the result for one image was class id 0 = 100 percent, class id 1 = 0 percent.
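For reference, a minimal sketch of that conversion path, assuming a TF 2.x Keras model and OpenVINO 2021.4 (the model file name and input shape below are placeholders):

import tensorflow as tf

# Load the trained Keras model and export it in SavedModel format,
# which produces the saved_model.pb that Model Optimizer expects.
model = tf.keras.models.load_model("my_keras_model.h5")  # placeholder file name
model.save("saved_model")

# Then run Model Optimizer from the OpenVINO 2021.4 install to get the IR:
#   python mo.py --saved_model_dir saved_model --input_shape "[1,224,224,3]"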
Finally, I want to use my classification model together with one of the open models in OpenVINO (e.g. person detection). I ran my code like below.
I got an unsupported model output. I think I know why: my classification model has label 0 (person wearing a hat) and label 1 (not wearing a hat), while at the same time person-detection-0201 (https://docs.openvino.ai/2021.4/omz_models_model_person_detection_0201.html) has output id 0 = person. If I want my classification model to classify whether a person is wearing a hat or not on top of this person detection, what should I do? I am lost on this. Do I have to make my own custom object detection script (or modify the original one, which amounts to the same thing)?
Desired final result (steps):
The object detection demo first detects a person with person-detection-0201.
My classification model then classifies whether that detected person is wearing a hat or not.

The Object Detection Demo only supports one model in a single inference. To get multiple outputs from a single network, you would have to combine the image data and re-train the network. Alternatively, you can chain two models, where the result of the first model is fed as the input of the second model for inference. The Security Barrier Camera C++ Demo and the Action Recognition Python Demo are examples that use multiple models in a single pipeline.
The person-detection-0201 network can only detect persons; it is based on a MobileNetV2 backbone with two SSD heads from 1/16 and 1/8 scale feature maps and clustered prior boxes for 384x384 resolution. To have it also recognize a person wearing a hat, you would need the weight file and a person-wearing-hat dataset added to the training set; this can be done by fine-tuning from the pre-trained weights.
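To make the chaining concrete, here is a minimal sketch of such a two-stage pipeline with the OpenVINO 2021.4 Python API (IECore). The file names, the 0.5 confidence threshold, and the assumption that class id 0 of the classifier means "wearing hat" are placeholders; the 1x1xNx7 detection layout ([image_id, label, conf, x_min, y_min, x_max, y_max], with normalized box coordinates) follows the person-detection-0201 documentation:

import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()

# Detector: person-detection-0201 IR files (paths are placeholders)
det_net = ie.read_network("person-detection-0201.xml", "person-detection-0201.bin")
det_exec = ie.load_network(det_net, "CPU")
det_in = next(iter(det_net.input_info))
det_out = next(iter(det_net.outputs))

# Classifier: the converted Keras hat model (paths are placeholders)
cls_net = ie.read_network("hat_classifier.xml", "hat_classifier.bin")
cls_exec = ie.load_network(cls_net, "CPU")
cls_in = next(iter(cls_net.input_info))
cls_out = next(iter(cls_net.outputs))

def preprocess(img, shape):
    # Resize to the network's expected size and convert HWC -> NCHW
    n, c, h, w = shape
    return cv2.resize(img, (w, h)).transpose(2, 0, 1)[None]

frame = cv2.imread("input.jpg")
ih, iw = frame.shape[:2]

det_shape = det_net.input_info[det_in].input_data.shape
dets = det_exec.infer({det_in: preprocess(frame, det_shape)})[det_out]

for det in dets.reshape(-1, 7):  # [image_id, label, conf, x_min, y_min, x_max, y_max]
    if det[2] < 0.5:  # confidence threshold (assumed)
        continue
    x0, y0, x1, y1 = (det[3:7] * [iw, ih, iw, ih]).astype(int)
    crop = frame[max(y0, 0):y1, max(x0, 0):x1]
    if crop.size == 0:
        continue
    cls_shape = cls_net.input_info[cls_in].input_data.shape
    probs = cls_exec.infer({cls_in: preprocess(crop, cls_shape)})[cls_out]
    label = "wearing hat" if np.argmax(probs) == 0 else "no hat"
    print("person at (%d,%d)-(%d,%d): %s" % (x0, y0, x1, y1, label))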

Related

Darknet YOLOv3 - custom training on a pre-trained model

The darknet YOLOv3 model has a coco.names file for labels, which includes 80 classes.
Now I want to train a custom model with two labels only, where one label is already in coco.names and the other is not.
For example, I want to train a model to detect cell phones and DSLR cameras: the cell phone class already exists in coco.names, whereas DSLR camera is not in its labels file.
Can I train a custom model on the two classes cell phone and DSLR camera while providing training data only for DSLR cameras, and have it predict both? Or should I train with data for both cell phones and DSLR images? Or is there another way out?
I am a bit new to ML, so any help would be great.
Thanks
So you want to fine-tune a pre-trained model.
Think of classes as just a set of end nodes of a network; the labels (phone, camera) are merely a naming convention for them, to give us visual guidance.
These nodes are fully connected (with associated weights) to the previous layer of the network, and the total number of these connections varies with the number of end nodes (classes) you have.
With the fully trained model, you can't just keep the nodes you want, remove the rest, and add a few more, because the previous layers (and the full network) were trained to produce estimates/predictions for a specific number of final nodes.
So basically you need to fully reset the last layer (the head) and re-initialize it with the desired number of classes. The idea is that you take advantage of the previous training effort on a broader dataset and fine-tune the model on your own data.
Short answer: you need data for both classes, and you need to change the model to accept only 2 classes.
To configure that specific model for the new number of classes and data, I believe you can find some guidance and instructions here
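For illustration, in darknet's standard yolov3.cfg that head reset amounts to editing each of the three [yolo] blocks and the [convolutional] block directly above it; with 3 anchors per scale, filters = (classes + 5) * 3, so 21 for 2 classes:

[convolutional]
size=1
stride=1
pad=1
filters=21       # (2 classes + 5) * 3 anchors per scale
activation=linear

[yolo]
classes=2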

YOLOv3 object detection not detecting the objects and bounding boxes are not bounding the objects

I am implementing YOLOv3 and have trained the model on my custom class (tomato). I used the darknet53 weights (https://pjreddie.com/media/files/darknet53.conv.74) to start my training, following the instructions provided by many sites on training and object detection with YOLOv3. I thought it was not necessary to list the steps.
One of my object images used for training is shown below (with bounding boxes drawn using LabelImg):
The txt file for the above image contains the following bounding-box coordinates, as created by LabelImg:
0 0.152807 0.696655 0.300640 0.557093
0 0.468728 0.705306 0.341862 0.539792
0 0.819652 0.695213 0.337242 0.543829
0 0.317164 0.271626 0.324449 0.501730
Now when I use the same image for testing to check the detection accuracy, it fails to detect all the tomatoes, and moreover the bounding boxes are way off from the objects, as shown below:
I am not sure what is going on.
I cloned https://github.com/AlexeyAB/darknet, did a local make, and trained the model on the custom object. Nothing fancy.
The pictures above were taken with my phone. I trained darknet using a combination of downloaded images and custom tomato pictures I took with my phone. I have 290 images for training.
Maybe your model can't generalize well. Maybe you are training for too long, which can cause over-fitting, or your dataset is too small.
You can try testing on never-seen data (a new tomato picture) and see if it does well.
Double-check your config files; something may be incorrect there, like using a yolov4 cfg with a yolov3 model.
I also recommend reading this article, which can help you better understand how neural networks work:
https://towardsdatascience.com/understand-neural-networks-model-generalization-7baddf1c48ca
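One extra sanity check worth doing here: draw the label-file boxes back onto the training image to confirm the annotations themselves line up. YOLO txt lines are: class x_center y_center width height, all normalized to 0-1 (the file names below are placeholders):

import cv2

img = cv2.imread("tomato.jpg")   # placeholder file names
h, w = img.shape[:2]

with open("tomato.txt") as f:
    for line in f:
        cls, xc, yc, bw, bh = map(float, line.split())
        # Convert normalized center/size to pixel corner coordinates
        x0 = int((xc - bw / 2) * w)
        y0 = int((yc - bh / 2) * h)
        x1 = int((xc + bw / 2) * w)
        y1 = int((yc + bh / 2) * h)
        cv2.rectangle(img, (x0, y0), (x1, y1), (0, 255, 0), 2)

cv2.imwrite("labels_check.jpg", img)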

can we extract VGG16/19 features for classes they were not trained on

I have a query regarding the extraction of VGG16/VGG19 features for my experiments.
The pre-trained VGG16 and VGG19 models were trained on the ImageNet dataset, which has 1000 classes (say c1, c2, ..., c1000). Normally we extract features from the first and second fully connected layers ('FC1' and 'FC2'); these 4096-dimensional feature vectors are then used for computer vision tasks.
My question is: can we use these networks to extract features of an image that does not belong to any of the above 1000 classes? In other words, can we use them to extract features of an image with label c1001? Remember that c1001 does not belong to the ImageNet classes these networks were originally trained on.
From the article available at https://www.pyimagesearch.com/2019/05/20/transfer-learning-with-keras-and-deep-learning/, I quote the following:
When performing feature extraction, we treat the pre-trained network as an arbitrary feature extractor, allowing the input image to propagate forward, stopping at pre-specified layer, and taking the outputs of that layer as our features
From the above text, there is no restriction that the image must belong to one of the ImageNet classes.
Kindly spare some time to uncover this mystery.
In research papers, the authors simply state that they used features extracted from a VGG16/VGG19 network pre-trained on the ImageNet dataset, without giving further details.
I am giving a case study for reference:
The Animals with Attributes dataset (see https://cvml.ist.ac.at/AwA2/) is a very popular dataset with 50 animal classes for image recognition tasks. The authors extracted ILSVRC-pretrained ResNet101 features for the dataset images. This ResNet101 network was pre-trained on the 1000 ImageNet classes (the class list is available at https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a#file-imagenet1000_clsidx_to_labels-txt).
Also, the AwA classes are as follows:
antelope, grizzly+bear, killer+whale, beaver, dalmatian, persian+cat, horse, german+shepherd, blue+whale, siamese+cat, skunk, mole, tiger, hippopotamus, leopard, moose, spider+monkey, humpback+whale, elephant, gorilla, ox, fox, sheep, seal, chimpanzee, hamster, squirrel, rhinoceros, rabbit, bat, giraffe, wolf, chihuahua, rat, weasel, otter, buffalo, zebra, giant+panda, deer, bobcat, pig, lion, mouse, polar+bear, collie, walrus, raccoon, cow, dolphin
Now, if we compare the classes in this dataset with the 1000 ImageNet classes, we find that classes like dolphin, cow, raccoon, bobcat, bat, seal, sheep, horse, grizzly bear, giraffe, etc. are not in ImageNet, and still the authors went on to extract ResNet101 features. I believe the extracted features are generalizable, which is why the authors consider them meaningful representations for the AwA images.
What is your take on this?
The idea is to get representations for images not belonging to the ImageNet classes and use them, along with their labels, in some other classifier.
Yes, you can, but.
Features in the first fully-connected layers are supposed to encode fairly general patterns, like angles, lines, and simple shapes, so you can assume they generalize outside the class set the network was trained on.
There is one caveat, however: those features were found by minimizing the error on that particular 1000-class classification task. This means there is no guarantee that they are helpful for classifying an arbitrary class.
For only extracting features, you can input any image you want into your pre-trained VGG or other CNN. However, for the purpose of training, you have to implement the additional steps stated below.
The features that are extracted were determined by training exclusively on those 1000 classes. You can use the network to predict on images that do not belong to those 1000 classes, but in the paragraphs below I explain why this is not the desired approach.
The key point is that the set of extracted features can be used to detect/determine the presence of other objects in a photo, but not "ready"/"out of the box".
For example, edges and lines are features that are not related exclusively to those 1000 classes; they also apply to other ones, hence they are useful, general features.
Therefore, you can employ transfer learning to train on your own images (dataset), for example c1001, c1002, c1003.
Notice, however, that you need to train on your own set before you can use the network to predict on your new images (new classes). Transfer learning means reusing the set of already-learned features, which may be suitable for another problem, but you still need to train on your "new problem", say c1001, c1002, c1003.
For image classification you may need to fine-tune the model using relevant data for the c1001 class label.
But if you are planning to use it for unsupervised learning, or for the feature-extraction part only, then there is no need to retrain the model. You can use the existing pre-trained ImageNet weights and extract features with them, since VGG16/19 learns general lower-level features in its initial layers and only the last few layers are specialized for classification.
So basically, the pre-trained model can be used for unsupervised learning and feature extraction without retraining.
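As a minimal sketch of that feature-extraction-only route, assuming TF 2.x Keras, whose built-in VGG16 names the first 4096-dimensional fully connected layer "fc1" (the image path is a placeholder):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

# Full ImageNet-pretrained VGG16, then cut the graph at FC1
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

# Any image works here, including one from a class VGG16 never saw
img = image.load_img("c1001_example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

features = extractor.predict(x)  # shape (1, 4096)
# 'features' can now be fed to any downstream classifier for the new labels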

How to use a CNN model on objects detected by YOLO

Let me start by saying that I have 2 pre-trained models (in hdf5 files):
The first model is a YOLO-based model, trained on dataset A, which is used to locate humans in images (note that a training image for this model may contain many people).
The second model is a CNN model which is used to detect the gender of a person (male or female) from an image that contains only one person.
Suppose that I only want to use these 2 models and do not want to re-train or modify anything in the dataset. How could I locate female persons in a picture from dataset A?
A possible solution that I think could work:
First use the first model to detect, that is, to create bounding boxes around the persons in the image.
Crop the bounding boxes into individual images. Feed those images to the second model to see whether each person is female or male.
However, this solution is slow. Is there any way to speed it up, or to perform this task differently?
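Sketching that crop-and-classify idea with the usual speed-up, batching all crops into a single forward pass of the classifier (the file names, the 224x224 classifier input size, the detection decoding, and the "index 1 = female" convention are all assumptions):

import cv2
import numpy as np
from tensorflow.keras.models import load_model

yolo = load_model("yolo_person.hdf5")       # placeholder file names
gender_cnn = load_model("gender_cnn.hdf5")

def detect_people(model, img):
    # Placeholder: run the YOLO model and decode its raw output into a
    # list of pixel-coordinate boxes [(x0, y0, x1, y1), ...]
    raise NotImplementedError

img = cv2.imread("crowd.jpg")
boxes = detect_people(yolo, img)

# Batch every crop into one tensor so the CNN runs a single forward pass
# instead of one per person - the main win over a naive per-crop loop
crops = np.stack([
    cv2.resize(img[y0:y1, x0:x1], (224, 224)) / 255.0
    for (x0, y0, x1, y1) in boxes
])
probs = gender_cnn.predict(crops)

for (x0, y0, x1, y1), p in zip(boxes, probs):
    if np.argmax(p) == 1:  # assumed: index 1 = female
        print("female at (%d,%d)-(%d,%d)" % (x0, y0, x1, y1))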

CNN multi-class network

What approach should I take when I want my CNN multi-class network to output something like [0.1, 0.1] when an image doesn't belong to any class? Using softmax and categorical_crossentropy for multi-class gives me an output that sums to 1, so that is still not what I want.
I'm new to neural networks, so sorry for the silly question, and thanks in advance for any help.
I think you should look into Bayesian learning. First, let's talk about uncertainty.
For example, given several pictures of dog breeds as training data—when a user uploads a photo of his dog—the hypothetical website should return a prediction with rather high confidence. But what should happen if a user uploads a photo of a cat and asks the website to decide on a dog breed?
The above is an example of out of distribution test data. The model has been trained on photos of dogs of different breeds, and has (hopefully) learnt to distinguish between them well. But the model has never seen a cat before, and a photo of a cat would lie outside of the data distribution the model was trained on. This illustrative example can be extended to more serious settings, such as MRI scans with structures a diagnostics system has never observed before, or scenes an autonomous car steering system has never been trained on.
A possible desired behaviour of a model in such cases would be to return a prediction (attempting to extrapolate far away from our observed data), but return an answer with the added information that the point lies outside of the data distribution. We want our model to possess some quantity conveying a high level of uncertainty with such inputs (alternatively, conveying low confidence).
Then, I think you could briefly read this paper, where they also apply the approach to a classification task and generate uncertainty estimates for the classes (dog, cat, ...). From this paper, you can extend the findings to your application using this paper, and I think you will find what you want.
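One practical approximation from that line of work is Monte Carlo dropout: keep dropout active at test time and average several stochastic forward passes, using the spread between passes as an uncertainty signal for out-of-distribution inputs. A minimal sketch (the tiny architecture and the 0.2 threshold are arbitrary placeholders, and the model would of course be trained first):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),
])

def mc_predict(model, x, n_samples=30):
    # training=True keeps dropout on, so each pass samples a different sub-network
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

x = np.random.rand(1, 32, 32, 3).astype("float32")  # stand-in input
mean, std = mc_predict(model, x)
if std.max() > 0.2:  # placeholder uncertainty threshold
    print("high uncertainty - input may belong to none of the known classes")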
