YOLO - are the anchor boxes used only in training? - conv-neural-network

Another question about YOLO. I've read about how YOLO adjusts anchor boxes by offsets to create the final bounding boxes.
What I do not understand is when YOLO does this. Is it done only during the training process, or also when the already trained model is used for detection?
My guess is that it is done ONLY in the training stage, where anchor boxes are compared to the ground-truth box using IoU, and the offsets are fitted by the loss function until they reach an IoU close to 1. Am I right?

Please ignore my previous two posts. The anchor boxes are used in both training and testing with the trained model. In my first post, when I said the results are the same, that only applies to the detected classes; the bounding boxes are not the same if the anchor box values are changed in the config file.

The anchor boxes are only used in training; they are not used during detection with the trained model. You can test this by changing the anchor box values to random values on a test set: the results are the same.

Following up on my previous comment, I think the answer is no and yes. No -- the anchor values (initial values) defined in the config file are only used during training, not when you do detection with the trained model. Yes -- during training those values will be adjusted and saved with the model, and those adjusted values will be used when you do detection with the trained model.
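For reference, a minimal sketch of the YOLOv2/YOLOv3-style decoding step described in the papers, where the predicted offsets are applied to the anchor dimensions to produce a final box at detection time; the function and variable names here are illustrative, not darknet's actual code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolo_box(t_x, t_y, t_w, t_h, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Apply predicted offsets to one anchor box (YOLOv2/v3-style decoding).

    t_* are raw network outputs for one anchor in one grid cell;
    anchor_w / anchor_h are the anchor dimensions in pixels;
    stride is the network input size divided by the grid size.
    """
    # The box center is constrained to fall inside the responsible grid cell.
    b_x = (sigmoid(t_x) + cell_x) * stride
    b_y = (sigmoid(t_y) + cell_y) * stride
    # Width and height scale the anchor, which is why the anchors matter at detection time too.
    b_w = anchor_w * np.exp(t_w)
    b_h = anchor_h * np.exp(t_h)
    return b_x, b_y, b_w, b_h

# Example: grid cell (7, 5), a 116x90 px anchor (one of the default YOLOv3 anchors), stride 32
print(decode_yolo_box(0.2, -0.1, 0.3, 0.1, 7, 5, 116, 90, 32))
```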

Related

I have 5 folders (each containing about 200 RGB images) and I want to use "Principal Component Analysis" for image classification

I have 5 folders (which represent 5 classes, each containing about 200 colored images), and I want to use "Principal Component Analysis" for image classification.
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
I am trying to implement this in code; any help please?
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
PCA is not a method for classification. It is a dimensionality-reduction method that is sometimes used as a preprocessing step.
Take a look at this CrossValidated post for some more explanation. It has an example.
(FYI, saw this because you pinged me via the MATLAB Answers forum.)
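As an illustration of that point (not code from the post above), a minimal scikit-learn sketch that uses PCA purely as a dimensionality-reduction step in front of a separate classifier; the image shape and the placeholder arrays are assumptions standing in for the real folders of images:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# X: images flattened to vectors (n_samples, height * width * channels); y: class labels 0..4
X = np.random.rand(1000, 64 * 64 * 3)          # placeholder for the real image data
y = np.random.randint(0, 5, size=1000)          # placeholder labels for the 5 folders

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# PCA only compresses each image to 100 components; the classifier does the classification.
clf = make_pipeline(PCA(n_components=100), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```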

YOLOV3 object detection not detecting the object and bounding boxes are not bounding the objects

I am implementing YOLOv3 and have trained the model on my custom class (which is tomato). I used the darknet53 weights (https://pjreddie.com/media/files/darknet53.conv.74) to start my training, as per the instructions provided by many sites on training and object detection using YOLOv3, so I thought it was not necessary to list the steps.
One of my object images used for training is shown below (with bounding boxes drawn using LabelImg):
The txt file with the bounding boxes for the above image, as created using labelImg, contains the following coordinates:
0 0.152807 0.696655 0.300640 0.557093
0 0.468728 0.705306 0.341862 0.539792
0 0.819652 0.695213 0.337242 0.543829
0 0.317164 0.271626 0.324449 0.501730
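Each row above is class, x_center, y_center, width, height, all normalized to the image size. As a quick sanity check, a small sketch (the image size here is an assumption) to convert one row back to pixel corners:

```python
# Convert one YOLO-format label row (normalized) to pixel corner coordinates.
def yolo_to_pixels(x_c, y_c, w, h, img_w, img_h):
    x_min = (x_c - w / 2) * img_w
    y_min = (y_c - h / 2) * img_h
    x_max = (x_c + w / 2) * img_w
    y_max = (y_c + h / 2) * img_h
    return x_min, y_min, x_max, y_max

# First row of the label file, assuming a 640x480 image
print(yolo_to_pixels(0.152807, 0.696655, 0.300640, 0.557093, img_w=640, img_h=480))
```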
Now when I use the same image for testing to determine the accuracy of detection, it is unable to detect all the tomatoes, and moreover the bounding boxes are way off from the objects, as shown below:
I am not sure what is going on.
I have cloned the repo
https://github.com/AlexeyAB/darknet, did a local make, and trained the model on the custom object. Nothing fancy.
The pictures above were taken with my phone. I have trained darknet using a combination of downloaded images and custom tomato pictures I took with my phone. I have 290 images for training.
Maybe your model can't generalize well. Maybe you are training too much, which can cause over-fitting, or your dataset is too small.
You can try testing on never-seen data (a new tomato picture) and see if it does well.
Double-check your config files in case something is incorrect there, like using a yolov4 cfg with a yolov3 model.
And I recommend that you read this article, which can help you understand better how neural networks work:
https://towardsdatascience.com/understand-neural-networks-model-generalization-7baddf1c48ca

Can I train YOLO on small already segmented out images and test it on a large image for detection?

I have been thinking about building a YOLO model for detecting parking lot occupancy, and I have all the small segmented-out images for every parking space. Can I train YOLO on these small images, already divided into separate empty and occupied classes, and test it on a test image like the aerial view of a parking lot with, say, 28 parking spots, so that the model detects the occupied and empty spaces?
If yes, can someone guide me on how to approach the problem? I will be using YOLO implemented in Keras.
YOLO is an object detection model. During training, it takes the coordinates of bounding boxes in an image as input and learns to identify the objects inside those bounding boxes. As per your problem statement, if you have an aerial view of the parking lot, then draw the bounding boxes, generate xml files (as per your training requirement) and start training. This should ideally give you the desired model for prediction.
Free tool to label images - https://github.com/tzutalin/labelImg
Github project to get an idea of how to train Yolo in Keras on custom dataset - https://github.com/experiencor/keras-yolo2
By no means is this a perfect, tailor-made solution for your problem, given you haven't provided any code or images, but it is a good place to start.
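As a rough sketch of the annotation step mentioned above (this is an illustration, not code from the linked project): writing one Pascal VOC-style xml file, like those produced by labelImg, using only Python's standard library; the file names and box values are assumptions:

```python
import xml.etree.ElementTree as ET

def write_voc_annotation(path, filename, img_w, img_h, boxes):
    """boxes: list of (class_name, xmin, ymin, xmax, ymax) in pixel coordinates."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    ET.SubElement(size, "depth").text = "3"
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = name
        bnd = ET.SubElement(obj, "bndbox")
        ET.SubElement(bnd, "xmin").text = str(xmin)
        ET.SubElement(bnd, "ymin").text = str(ymin)
        ET.SubElement(bnd, "xmax").text = str(xmax)
        ET.SubElement(bnd, "ymax").text = str(ymax)
    ET.ElementTree(ann).write(path)

# Example: two parking spots labeled in one aerial image
write_voc_annotation("lot_001.xml", "lot_001.jpg", 1280, 720,
                     [("occupied", 40, 60, 120, 140), ("empty", 150, 60, 230, 140)])
```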

How feature map in Keras ConvNet represent features?

I know that it might be a dumb question, but I searched everywhere for an answer and could not find one.
Okay, first, properly explaining my question:
When I was learning about CNNs I was told that kernels, or filters, or activation maps represent a feature of the image.
To be specific, assume cat image identification: a feature map would represent "whiskers",
and in images where the activation of this feature map is high, it is inferred that a whisker is present in the image and so the image is a cat. (Correct me if I am wrong.)
Well, now, when I made a Keras ConvNet, I saved the model, then loaded the model and saved all the filters to png images.
What I saw were 3x3 px images where each pixel was of a different colour (green, blue or their various variants and so on).
So how do these 3x3 px random-colour-pattern images of kernels represent in any way the "whisker" or any other feature of a cat?
Or how could I know which png image is which feature, i.e. which is the whisker-detector filter, etc.?
I am asking this because I might be asked in an oral examination by my teacher.
Sorry for the length of the answer (but I had to make it so to explain properly).
You need to have a further look into how convolutional neural networks operate, the main topic being the convolution itself. The convolution occurs between the input image and the filters/kernels to produce feature maps. A feature map is what may highlight important features.
The filters/kernels do not know anything about the input data, so when you save these you are only going to see pseudo-random images.
Put simply, where * is the convolution operator,
input_image * filter = feature map
What you want to save, if you want to visualise what is occurring during convolution, are the feature maps. This website gives a very detailed account of how to do so, and it is the method I have used in the past.
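A rough sketch of that idea (not the linked website's exact code): build a second model whose outputs are the intermediate feature maps, run an image through it, and save the results. The tiny demo network and the random "image" below are assumptions standing in for the asker's saved ConvNet and a real cat photo:

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# A small stand-in ConvNet; in practice this would be the model loaded from disk.
inputs = tf.keras.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu", name="conv1")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(16, 3, activation="relu", name="conv2")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

# Model whose outputs are every Conv2D layer's feature maps.
conv_outputs = [layer.output for layer in model.layers
                if isinstance(layer, tf.keras.layers.Conv2D)]
activation_model = tf.keras.Model(inputs=model.input, outputs=conv_outputs)

image = np.random.rand(1, 128, 128, 3).astype("float32")   # placeholder for a real image
feature_maps = activation_model.predict(image)

# Save each channel of the first conv layer's feature maps; these (not the raw 3x3
# kernels) show which parts of the image a filter responds to.
first = feature_maps[0]            # shape (1, h, w, 8)
for i in range(first.shape[-1]):
    plt.imsave(f"feature_map_{i}.png", first[0, :, :, i], cmap="viridis")
```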

should I label and train on all objects that exist in the training set (yolo darknet)

(yolo - object detection)
If there are two dogs in an image and I labeled and trained on only one of them across all the images in the training set,
will the other dogs in the training set, which I didn't label and train on, affect the process and end up being considered part of the background?
I am asking especially about yolo darknet object detection.
It seems so, because after 3000 batches it didn't detect anything.
So the question: should I train on all objects (like all the dogs in the whole training set), or does it not matter because YOLO will take the features only from the labeled ones and ignore the background?
Yes, it is important that all the objects that you want to find are labeled in the images of the training dataset. You teach the network to find objects where they are, and not to find objects where none exist.
The YOLO CNN tries to solve 3 problems:
mark with a rectangle the objects for which YOLO is trained - positive error on the last layer
don't mark one object as another object - negative error on the last layer
don't mark any objects in the background - negative error on the last layer
I.e. YOLO looks for differences: why the first dog is considered an object while the second is considered background. If you want to find any dogs but you label only some of them, and the labeled dogs are not statistically different from the unlabeled dogs, then detection accuracy will be extremely low, because abs(positive_error) ~= abs(negative_error) and the result of training is sum(positive_errors) + sum(negative_errors) ~= 0. It is a contradictory task - you want, at the same time, both to find the dog and not to find the dog.
But if the labeled dogs are statistically different from the unlabeled dogs, for example if you labeled bulldogs and did not label labradors, then the YOLO network will be trained to distinguish one from the other.
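A toy illustration of that cancellation argument (this is not YOLO code; the single logistic unit and the numbers are assumptions): when identical "dog" inputs get target 1 half of the time and target 0 the other half, gradient descent settles near 0.5 confidence instead of a confident detection:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0            # "dog features" are identical for labeled and unlabeled dogs
w, b = 0.0, 0.0
lr = 0.1

for step in range(5000):
    target = rng.integers(0, 2)          # 1 = labeled dog, 0 = unlabeled dog treated as background
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad = p - target                    # gradient of binary cross-entropy w.r.t. the logit
    w -= lr * grad * x
    b -= lr * grad

print("final confidence on a dog:", 1.0 / (1.0 + np.exp(-(w * x + b))))  # close to 0.5
```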
It seems so, because after 3000 batches it didn't detect anything.
That is not enough; YOLO requires 10,000 - 40,000 iterations.
