Model unable to identify distant objects - python-3.x

I have made a object recognition and detection model using tensorflow. It identifies the images which are clearly visible but its unable to identify if the same object is at a large distance. I am using Faster RCNN model. the model is able to identify the same object when it is closer but not when it is at a far distance. It has been trained already for the same object. How can i make the model identify objects at a distance?

You can resize and add padding using data augmentation to images with objects that are clearly visible so that they look like they are in a big distance and train your model further with those images

Related

fast-cnn model - images with no labels in the training data

How can I modify the example in the "Object Detection FasterRCNN Tutorial" (https://www.kaggle.com/code/pdochannel/object-detection-fasterrcnn-tutorial) to include empty labeled images in the training data? I want to be able to handle cases where there are no classes of interest in certain images.
Additionally, I have noticed that even when I include images of underwater scenes (where there are no objects of interest) the model still tends to detect something.

Can HSV images be used for CNN training

I am currently working on fingers-count deep learning problem. When you look at the dataset, images in the training and validation set are very basic and are almost the same. The network can achieve high training and validation accuracies. But when it comes to prediction in real-life images, it performs very badly(this is because the model has been trained on very basic images).
To overcome this, I converted the training and validation images to HSV(Hue-Saturation-Value) and trained the model on new HSV images. Example of 1 such image from new training set is:
I then convert my image from real life to HSV and pass it to model for prediction. But still, the model is not able to predict correctly. I assumed that since the training images and predicting image are almost same after applying HSV, the model should be predicting good. Is there something which I am thinking incorrectly here? Can HSV images be actually used for training CNN?
It seems you have the overfitting issue, and your model only memorize the simple samples of the training set and in contrast it can not generalize to more complex and diverse data.
In the context of Deep Learning there are various methods to avoid overfitting and I think you don't need to transform your input to HSV necessarily. First of all you can apply various data augmentation methods like random crop or rotation to create various versions of your data. If this method does not work, you can use a smaller model or applying techniques such as Drop Out or Regularization.
Here is a good tutorial from TensorFlow.

YOLOV3 object detection not detecting the object and bounding boxes are not bounding the objects

I am implementing YOLOv3 and have trained the model on my custom class ( which is tomato). I have used the darknet model 53 weights ( https://pjreddie.com/media/files/darknet53.conv.74) to start my training as per the instructions provided by many sites on training and object detection using YOLOv3 . I thought it was not necessary to list down the steps.
One of my object images used for training is shown below ( with bounding boxes using LabelImg):
The txt file for the above image for the bounding boxes contains the following coordinates , as created using labellmg:
0 0.152807 0.696655 0.300640 0.557093
0 0.468728 0.705306 0.341862 0.539792
0 0.819652 0.695213 0.337242 0.543829
0 0.317164 0.271626 0.324449 0.501730
Now when I use the same image for testing to determine the accuracy of detection, it is unable to detect all the tomatoes and moreover the bounding boxes are way off from the objects as shown below:
I am not sure what is going on.
I have cloned the git
https://github.com/AlexeyAB/darknet and did a local make and trained the model on the custom object. Nothing fancy.
The pictures above were taken from my phone. I have trained the darknet using a combination of downloaded images and custom tomato pictures I had taken from my phone. I have 290 images for training.
Maybe your model can't generalize well. Maybe your are training too much, which can cause over-fitting or even your dataset is small.
You can try testing on a never seen data (a new tomato picture) and sees if it does well.
Double-check your config files, if something is incorrect there, like you are using a yolov4 cfg in a yolov3 model.
And I recommend that you read this article in which can help you understand better how neural networks works:
https://towardsdatascience.com/understand-neural-networks-model-generalization-7baddf1c48ca

How to use CNN model to detect object recognized by YOLO

Let start by saying that i have 2 pre-trained models (in hdf5 files):
The first model is a YOLO-based model, trained on dataset A, which is used to locate human in any images (note that: a trained images o this model may contain many people inside)
The second model is a CNN model which is used to detect gender of a person (male or female) based on the image which only contains 1 person.
Suppose that i only want to use these 2 models and do not want to re-train or modify anything on the dataset. How could i locate female person in a picture of Dataset A?
A possible solution that i think could work:
First use the first model to detect, that is to create bounding boxes around persons in the images.
Crop the bounding boxes into unique images. Feed those images to the second model to see if that person is Female/Male
However, this solution is slow in performance. So is there anyway that can festen this solution or perform this task in different ways?

How can I combine two fast R-CNN models for object detection?

I am working on object detection in video, and I want to train two fast R-CNN models on two different types of data (for example RGB and optical flow) then combine these models in one network.
I trained the model in one type of data, but I have no idea how can I combine the both type of data.
Any help would be great!
What you are looking for are two-stream CNN architectures. As for example described by Peng and Schmid in Multi-region two-stream R-CNN for action detection. There is a large variaty of implementations but you will need to retrain your two-stream model with optical flow and image data.

Resources