How to get ground truth for salient object detection (SOD)?

I am new to salient object detection (SOD) and have been working on it recently. I see that many CNN and transformer models, such as PoolNet and ViT, are applied to SOD. However, these models are all trained on open-source datasets such as MSRA. I want to train these models on my own dataset, but first, how can I get the ground truth (saliency maps) for SOD? Do I need to segment the salient objects in the images with a tool like labelme, and what should I do after that? I have been confused for several days. Could anyone please give me any clues?
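For orientation only, not as a definitive answer: public SOD datasets typically store the ground truth as one single-channel mask per image, with salient pixels set to 255 and background set to 0. Below is a minimal sketch of turning labelme-style polygon annotations into such a mask. It assumes the annotation has already been loaded from JSON into a dict with the usual labelme keys (`imageHeight`, `imageWidth`, `shapes`, `points`); the helper names `polygon_to_mask` and `labelme_to_saliency_mask` are hypothetical, and every labeled polygon is assumed to belong to the salient object.

```python
import numpy as np

def polygon_to_mask(points, height, width):
    """Rasterize one polygon with the even-odd rule, sampling pixel centers."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        # Rows whose pixel center lies between the two edge endpoints.
        crosses = (y1 > py) != (y2 > py)
        # x position of the edge at each row's center height.
        x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at)
    return inside

def labelme_to_saliency_mask(annotation):
    """Merge all polygon shapes into one uint8 mask (255 = salient, 0 = background)."""
    h, w = annotation["imageHeight"], annotation["imageWidth"]
    mask = np.zeros((h, w), dtype=bool)
    for shape in annotation["shapes"]:
        if shape.get("shape_type", "polygon") == "polygon":
            mask |= polygon_to_mask(shape["points"], h, w)
    return mask.astype(np.uint8) * 255

# Hypothetical annotation: one square polygon in an 8x8 image.
example = {
    "imageHeight": 8,
    "imageWidth": 8,
    "shapes": [{"label": "object", "shape_type": "polygon",
                "points": [[2, 2], [6, 2], [6, 6], [2, 6]]}],
}
saliency_mask = labelme_to_saliency_mask(example)
```

The resulting array can be saved as a grayscale PNG next to each image, which is the layout most SOD training scripts expect, although the exact folder structure depends on the model's data loader.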

Related

How to get the inference compute graph of a PyTorch model?

I want to hand-write a framework that performs inference for a given neural network. The network is quite complicated, so to make sure my implementation is correct, I need to know exactly how inference is carried out on the device.
I tried to use torchviz to visualize the network, but what I got seems to be the backpropagation compute graph, which is really hard to understand.
Then I tried converting the PyTorch model to ONNX format following the instructions, but when I visualized it, the original layers of the model appeared to have been separated into very small operators.
I just want to get the result like this
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed is a view of the Netron app.
You can also try the torchview package, which provides several features that are especially useful for large models. For instance, you can set the display depth (the depth in the nested hierarchy of modules).
It is also based on forward propagation.
github repo
Disclaimer: I am the author of the package
Note: the tool accepts a PyTorch model as input.
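If a picture is not strictly required, the forward-pass order can also be listed with plain PyTorch forward hooks. This is a swapped-in technique rather than what Netron or torchview do internally, and the helper name `trace_forward` and the toy model are hypothetical. A minimal sketch:

```python
import torch
import torch.nn as nn

def trace_forward(model, example_input):
    """Record (name, module type, output shape) for each leaf module, in call order."""
    records, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            shape = tuple(output.shape) if isinstance(output, torch.Tensor) else None
            records.append((name, type(module).__name__, shape))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # hook leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        model(example_input)
    for handle in handles:
        handle.remove()  # leave the model unmodified afterwards
    return records

# Hypothetical toy model, used only to demonstrate the helper.
toy = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),
)
records = trace_forward(toy, torch.randn(1, 3, 4, 4))
for row in records:
    print(row)
```

Note that hooks only see `nn.Module` calls. Functional operations written directly inside `forward` (for example a bare `torch.relu`) will not appear in the list, so this complements a graph viewer rather than replacing it.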

YOLOV3 object detection not detecting the object and bounding boxes are not bounding the objects

I am implementing YOLOv3 and have trained the model on my custom class (tomato). I used the Darknet-53 weights (https://pjreddie.com/media/files/darknet53.conv.74) to start training, following the instructions that many sites provide on training and object detection with YOLOv3. I did not think it was necessary to list the steps.
One of the images I used for training is shown below (with bounding boxes drawn in LabelImg):
The txt file for the above image contains the following bounding-box coordinates, as created by LabelImg:
0 0.152807 0.696655 0.300640 0.557093
0 0.468728 0.705306 0.341862 0.539792
0 0.819652 0.695213 0.337242 0.543829
0 0.317164 0.271626 0.324449 0.501730
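For reference, each label line above follows the YOLO text format: class index, then box center x, center y, width, and height, all normalized to the image size. Converting the labels back to pixel corners is a quick way to check them by drawing them on the image. The sketch below assumes a hypothetical 1000x1000 image, since the real image size is not given, and `yolo_to_pixel_box` is a hypothetical helper name.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert 'class xc yc w h' (normalized) to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    x_min = (xc - w / 2) * image_width
    y_min = (yc - h / 2) * image_height
    x_max = (xc + w / 2) * image_width
    y_max = (yc + h / 2) * image_height
    return int(class_id), round(x_min), round(y_min), round(x_max), round(y_max)

# Label lines copied from the question.
label_lines = [
    "0 0.152807 0.696655 0.300640 0.557093",
    "0 0.468728 0.705306 0.341862 0.539792",
    "0 0.819652 0.695213 0.337242 0.543829",
    "0 0.317164 0.271626 0.324449 0.501730",
]

# ASSUMPTION: the image size is not stated in the question; 1000x1000 is a placeholder.
IMAGE_WIDTH, IMAGE_HEIGHT = 1000, 1000
pixel_boxes = [yolo_to_pixel_box(line, IMAGE_WIDTH, IMAGE_HEIGHT) for line in label_lines]
for box in pixel_boxes:
    print(box)
```

If the boxes drawn from these values sit on the tomatoes while the predicted boxes do not, the labels themselves are probably fine and the problem lies in training or in the config.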
Now, when I test on the same image to check the detection accuracy, the model fails to detect all the tomatoes, and the bounding boxes are far off from the objects, as shown below:
I am not sure what is going on.
I cloned the repository https://github.com/AlexeyAB/darknet, ran a local make, and trained the model on the custom object. Nothing fancy.
The pictures above were taken with my phone. I trained Darknet on a combination of downloaded images and tomato pictures I took with my phone. I have 290 training images.
Maybe your model does not generalize well. You may be training for too long, which can cause overfitting, or your dataset may be too small.
Try testing on data the model has never seen (a new tomato picture) and see whether it does well.
Double-check your config files in case something is wrong there, for example a YOLOv4 cfg used with a YOLOv3 model.
I also recommend reading this article, which can help you better understand how neural networks work:
https://towardsdatascience.com/understand-neural-networks-model-generalization-7baddf1c48ca

How to train CNN on LFW dataset?

I want to train a facial recognition CNN from scratch. I can write a Keras Sequential() model following popular architectures and copying their networks.
I wish to use the LFW dataset, but I am confused about the technical methodology. Do I have to crop each face to a tight-fitting box? That seems impractical, as the dataset has 13000+ faces.
Lastly, I know it's a stupid question, but is all I have to do preprocess the images (of course) and then fit the model to them? What's the exact procedure?
Your question is very open-ended. Before preprocessing and fitting the model, you need to understand object detection. Once you understand what object detection is, you will have the answer to your first question, whether you need to manually crop all 13000 images. The answer is no. However, you will have to draw bounding boxes around the faces and assign labels to the images if these are not available in the training data.
Your second question is very vague. What do you mean by exact procedure? Do you mean the steps you need to follow, or how to do the preprocessing and model fitting in Python or another language? There are many references available on the internet on how to do preprocessing and model training for each specific problem. There are no universal steps that can be applied to every problem.

Model unable to identify distant objects

I have built an object recognition and detection model using TensorFlow. It identifies objects that are clearly visible, but it is unable to identify the same object at a large distance. I am using a Faster R-CNN model. The model detects the object when it is close but not when it is far away, even though it has already been trained on that object. How can I make the model identify objects at a distance?
You can use data augmentation: resize images in which the objects are clearly visible and add padding so that the objects look like they are far away, then train your model further on those images.
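A minimal sketch of that resize-and-pad augmentation, assuming images are NumPy arrays of shape (H, W, C) and boxes are pixel tuples (x_min, y_min, x_max, y_max). The helper name `shrink_and_pad` is hypothetical, and the plain subsampling used here stands in for a proper resize (a real pipeline would more likely use an anti-aliased resize such as `cv2.resize` with `INTER_AREA`).

```python
import numpy as np

def shrink_and_pad(image, boxes, factor, fill=0):
    """Downscale `image` by an integer `factor` and paste it centered on a canvas
    of the original size, so objects look farther away. Boxes are remapped to match."""
    height, width = image.shape[:2]
    small = image[::factor, ::factor]  # crude subsampling, no anti-aliasing
    small_h, small_w = small.shape[:2]
    top = (height - small_h) // 2
    left = (width - small_w) // 2

    canvas = np.full_like(image, fill)
    canvas[top:top + small_h, left:left + small_w] = small

    # Scale each box by the same factor, then shift it by the padding offset.
    new_boxes = [
        (x0 / factor + left, y0 / factor + top, x1 / factor + left, y1 / factor + top)
        for x0, y0, x1, y1 in boxes
    ]
    return canvas, new_boxes

# Hypothetical 8x8 RGB image with one box, shrunk to half size.
image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
augmented, augmented_boxes = shrink_and_pad(image, [(2, 2, 6, 6)], factor=2)
```

The important detail is that the bounding boxes are transformed together with the image. Training on shrunken images with the original box coordinates would teach the detector the wrong locations.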

Train multiple models with various measures and accumulate predictions

So I have been playing around with Azure ML lately, and I have a dataset with multiple values I want to predict. Each of them uses a different algorithm, and when I try to train multiple models within one experiment, it says the “train model can only predict one value”. There are also not enough input ports on the train-model to take in multiple values, even if I were to use the same algorithm for each measure. I tried launching the column selector and making rules, but I get the same error. How do I predict multiple values and later put the predicted columns together for the web service output, so that I don’t need multiple APIs?
What you want to do is train each model and save it as a trained model.
So create a new experiment, train your models, and save them by right-clicking on each model; they will then show up in the left nav bar in the Studio. You can now drag your models onto the canvas and have them score predictions, and eventually combine them into the same output, as I have done in my example, through the “Add columns” module. I made this example for Ronaldo (Real Madrid CF player) to predict how he will perform in a match after a training day. You can see my demo at http://ronaldoinform.azurewebsites.net
For a more detailed explanation of how to save the models and train for multiple values, check out Raymond Langaeian's (MSFT) answer in the comment section at this link:
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-convert-training-experiment-to-scoring-experiment/
You have to train a separate model for each variable that you are going to predict. Then add all the predicted columns together and return them as a single output for the web service.
The algorithms available in Azure ML can only predict a single variable at a time, based on the inputs they are given.
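The same pattern, sketched outside the Studio in plain Python purely for illustration: one independent model per target, all scored on the same input, with the predictions merged into a single record. The one-feature least-squares fit stands in for whichever algorithm each target actually uses, and all names and data here (`fit_line`, `train_per_target`, `predict_all`, `target_a`, `target_b`) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def train_per_target(xs, targets):
    """Train one independent model per target column."""
    return {name: fit_line(xs, ys) for name, ys in targets.items()}

def predict_all(models, x):
    """Score every model on the same input and merge the results into one record."""
    return {name: slope * x + intercept for name, (slope, intercept) in models.items()}

# Hypothetical training data: one input feature and two values to predict.
feature = [1, 2, 3, 4]
targets = {"target_a": [2, 4, 6, 8], "target_b": [1, 1, 1, 1]}

models = train_per_target(feature, targets)
combined = predict_all(models, 5)  # a single response containing both predictions
print(combined)
```

The merged dictionary plays the role of the combined output columns: one call, one response, several predicted values.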
