I'm currently working on object detection using YOLOv5. I trained a model with a custom dataset which has 3 classes: ['Car', 'Motorcycle', 'Person'].
I have several questions related to YOLOv5.
All the custom images are labelled using Roboflow.
Question 1: As you can see from the table, my dataset has a mix of images with different sizes. Will this be a problem in training? Also, assume I've trained the model and got 'best.pt'. Will that model work efficiently on images/videos of any dimensions?
Question 2:
Is this directory structure correct for training? I also have a 'test' directory, but it seems that directory is not used at all, and the images in the 'test' folder appear to be useless. (I know I'm asking dumb questions, please bear with me.)
Is it OK if I place all my images like this?
And do I need a 'test' folder?
Question 3: What is 'imgsz' in detect.py? Does it downsample the input source?
I've spent more than 3 weeks on YOLO. I love it, but I find some parts difficult to grasp. Kindly provide suggestions for these questions. Thanks in advance.
"question1 : As you can see from the table that my dataset has mix of images with different sizes. Will this be a problem in training? And also assume that i’ve trained the model and got ‘best.pt’. Will that model work efficiently in any dimensions of images/videos."
As long as you've resized/normalized all of your images to the same square size, you should be fine. YOLO trains on square images. You can use a platform like Roboflow to process your images so that they not only come out in the right structure (images and annotation files) but are also resized while generating your dataset so they are all the same size. http://roboflow.com/ - you just need to make a public workspace to upload your images to, and you can use the platform for free. Here's a video that covers custom training with YOLOv5: https://www.youtube.com/watch?v=x0ThXHbtqCQ
Roboflow's python package can also be used to extract your images programmatically: https://docs.roboflow.com/python
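For example, here is a minimal sketch of pulling a generated dataset version with the roboflow package; the API key, workspace, project name, and version number are placeholders you would replace with your own values:

```python
# Sketch: download a YOLOv5-format export of a Roboflow dataset.
# Replace the API key, workspace, project, and version with your own.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project")
dataset = project.version(1).download("yolov5")

print(dataset.location)  # local folder with the exported splits and data.yaml
```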
"Is this directory model correct for training. Even i have ‘test’ directory but it seems that the directory is not at all used. The images in the ‘test’ folder is useless. ( I know that i’m asking dumb questions, please bare with me.)"
Yes, that directory structure is correct for training. It's what I have whenever I run YOLOv5 training too.
You do need a 'test' folder if you want to run inference against the test-folder images to learn more about your model's performance.
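For reference, here is a rough sketch of the dataset layout and data.yaml that YOLOv5 training typically expects; the folder names and paths are placeholders, and the test entry is optional (it is only used when you explicitly request it, e.g. with val.py --task test):

```python
# Sketch of a data.yaml for the layout my_dataset/{train,valid,test}/{images,labels}.
# Only 'train' and 'val' are required; 'test' is optional.
DATA_YAML = """
train: ../my_dataset/train/images
val: ../my_dataset/valid/images
test: ../my_dataset/test/images  # optional

nc: 3
names: ['Car', 'Motorcycle', 'Person']
"""

with open("data.yaml", "w") as f:
    f.write(DATA_YAML)
```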
The 'imgsz' parameter in detect.py sets the height/width of the images for inference. You set it to the value you used for --img when you ran train.py.
For example: did you resize your images to 640 by 640 when generating your dataset for training? Then use (640, 640) for the 'imgsz' parameter (that is the default value), which also means you should have set --img to 640 when you ran train.py.
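If it helps, here is a hedged sketch of running inference with a trained 'best.pt' via PyTorch Hub at the same size used for training (file names are placeholders); the equivalent CLI call would be something like python detect.py --weights best.pt --source my_image.jpg --img 640:

```python
# Sketch: load a custom-trained YOLOv5 model and run inference at size 640,
# matching the --img value used during training. Paths are placeholders.
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
results = model('my_image.jpg', size=640)  # input is letterboxed/resized internally
results.print()
results.save()  # writes annotated images under runs/detect/
```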
detect.py parameters (YOLOv5 Github repo)
train.py parameters (YOLOv5 Github repo)
YOLOv5's Github: Tips for Best Training Results https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results
Roboflow's Model Production Tips: https://docs.roboflow.com/model-tips
I want to hand-write a framework to perform inference for a given neural network. The network is quite complicated, so to make sure my implementation is correct, I need to know exactly how the inference process is done on the device.
I tried to use torchviz to visualize the network, but what I got seems to be the back-propagation compute graph, which is really hard to understand.
Then I tried to convert the PyTorch model to ONNX format, following the instructions, but when I visualized it, it seemed that the original layers of the model had been separated into very small operators.
I just want to get a result like this:
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed is a view of the Netron app.
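As a rough sketch (with a toy model standing in for yours), saving the full module rather than just the state_dict, or exporting a traced TorchScript file, gives Netron something it can open directly:

```python
# Sketch: save formats that Netron can open. A tiny toy model stands in for yours.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)
example = torch.randn(1, 3, 32, 32)

torch.save(model, "model.pt")                        # whole module, not just state_dict
torch.jit.trace(model, example).save("model_ts.pt")  # traced TorchScript export
```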
You can also try the package torchview, which provides several features (especially useful for large models). For instance, you can set the display depth (the depth in the nested hierarchy of modules).
It is also based on the forward pass.
github repo
Disclaimer: I am the author of the package
Note: the accepted input format for the tool is a PyTorch model.
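A minimal sketch of how it's used, with a torchvision model standing in for yours (the depth value is just an example):

```python
# Sketch: render a forward-pass graph with torchview (pip install torchview; needs graphviz).
import torchvision
from torchview import draw_graph

model = torchvision.models.resnet18()
graph = draw_graph(model, input_size=(1, 3, 224, 224), depth=2, expand_nested=True)
graph.visual_graph.render("resnet18_graph", format="png")  # writes resnet18_graph.png
```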
I am implementing YOLOv3 and have trained the model on my custom class (which is 'tomato'). I used the darknet53 weights (https://pjreddie.com/media/files/darknet53.conv.74) to start my training, as per the instructions provided by many sites on training and object detection using YOLOv3. I didn't think it was necessary to list the steps.
One of my object images used for training is shown below (with bounding boxes drawn using LabelImg):
The .txt file with the bounding boxes for the above image contains the following coordinates, as created by LabelImg:
0 0.152807 0.696655 0.300640 0.557093
0 0.468728 0.705306 0.341862 0.539792
0 0.819652 0.695213 0.337242 0.543829
0 0.317164 0.271626 0.324449 0.501730
Now, when I use the same image for testing to check the detection accuracy, it is unable to detect all the tomatoes, and the bounding boxes are way off from the objects, as shown below:
I am not sure what is going on.
I cloned the repo https://github.com/AlexeyAB/darknet, did a local make, and trained the model on the custom object. Nothing fancy.
The pictures above were taken with my phone. I trained darknet using a combination of downloaded images and custom tomato pictures I had taken with my phone. I have 290 images for training.
Maybe your model can't generalize well. Maybe you are training for too long, which can cause over-fitting, or maybe your dataset is too small.
You can try testing on never-seen data (a new tomato picture) and see if it does well.
Double-check your config files to see if something is incorrect there, like using a YOLOv4 cfg with a YOLOv3 model.
I also recommend reading this article, which can help you better understand how neural networks work:
https://towardsdatascience.com/understand-neural-networks-model-generalization-7baddf1c48ca
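Another quick sanity check is to draw the training labels back onto the image yourself: darknet's .txt format is class x_center y_center width height, all normalized to the image size, so if the boxes you draw from the file don't match what you annotated in LabelImg, the problem is in the data rather than the training. A rough sketch (file names are placeholders):

```python
# Sketch: overlay darknet-format labels (class xc yc w h, normalized) on the image.
import cv2

img = cv2.imread("tomato.jpg")            # placeholder image path
h, w = img.shape[:2]

with open("tomato.txt") as f:             # matching label file
    for line in f:
        cls, xc, yc, bw, bh = map(float, line.split())
        x1, y1 = int((xc - bw / 2) * w), int((yc - bh / 2) * h)
        x2, y2 = int((xc + bw / 2) * w), int((yc + bh / 2) * h)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite("tomato_labels_check.jpg", img)
```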
I've noticed that in every tutorial or example of a Keras CNN that I've seen, the input images are numbered, e.g.:
dog0001.jpg
dog0002.jpg
dog0003.jpg
...
Is this necessary?
I'm working with an image dataset with fairly random filenames (the classes come from the directory name), e.g.:
picture_A2.jpg
image41110.jpg
cellofinterest9A.jpg
I actually want to keep the filenames because they mean something to me, but do I need to append sequential numbers to my image files?
No, they can have different names; it really depends on how you load your data. In your case, you can use flow_from_directory to generate the training data, and indeed the directory will be the associated class; this is part of ImageDataGenerator.
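A small sketch of what that looks like, assuming a layout like data/train/<class_name>/<any_filename>.jpg (paths and sizes are placeholders):

```python
# Sketch: filenames are irrelevant here; the parent directory name becomes the class.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = datagen.flow_from_directory(
    "data/train",            # placeholder path; one subdirectory per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)

print(train_gen.class_indices)   # mapping from directory name to class index
# model.fit(train_gen, epochs=10)  # then train on the generator
```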
Good morning,
I have been working with the TensorFlow object detection tutorial, using the ssd_mobilenet model they provide as a frozen graph along with the corresponding checkpoint files (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta).
However, as the images are sometimes badly recognized, I hoped I could feed my own images to the detection model and improve its performance on my images, which are all taken by the same camera.
Google could not help me figure out where to start. The questions I have are:
- Are there any code snippets that show which of those files to load and how to train the existing model?
- Do I need to retrain the loaded model with the old data (i.e. COCO) plus the new data (my images), or can I just retrain it using my data, and will the model remember what it has learned before?
Sorry for these very unspecific questions, but I just cannot figure out where to start.
There is a great walkthrough blog post and code base written by Dat Tran. He trained a model to recognize raccoons in images, using the pre-trained ssd_mobilenet as a starting point. This is the best place I found to start. Hope this helps.
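As a rough orientation (details depend on the exact model and config you start from), the usual workflow is: convert your labelled images to TFRecords, edit the pipeline config so fine_tune_checkpoint points at the downloaded model.ckpt and the input readers point at your data, then launch the training script. A hedged sketch of the relevant pieces, with all paths and the class count as placeholders:

```python
# Sketch only: the pipeline config is a protobuf text file; the fields that usually
# need editing for fine-tuning on your own images look roughly like this.
PIPELINE_CONFIG_SNIPPET = """
model { ssd { num_classes: 1 } }
train_config {
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco/model.ckpt"
}
train_input_reader {
  tf_record_input_reader { input_path: "data/train.record" }
  label_map_path: "data/label_map.pbtxt"
}
"""

# Training is then launched with the script shipped in the object_detection repo, e.g.
#   python object_detection/model_main.py \
#       --pipeline_config_path=pipeline.config --model_dir=training/
```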
I have been trying to develop a machine-learning-based image classification system using scikit-learn; what I am trying to do is multi-class classification. The biggest problem I am facing with scikit-learn is how to load the data. Then I came across one of the examples, face_recognition.py, which uses fetch_lfw_people to fetch data from the internet, and I could see this example actually does multi-class classification. I was trying to find some documentation on the example but was unable to. I have some questions here: what does fetch_lfw_people do, and what does this function load into lfw_people? Also, I saw that there are some text files in the data folder; is the code reading those text files? My main intention is to load my own set of image data, but I am unable to do it with fetch_lfw_people, even when I change the path to my image folder via data_home and set funneled=False; I get errors. I hope I get some answers here.
First things first: you can't directly give images as input to your classifier. You have to extract some features from your images, or you can load your images with OpenCV and use the resulting NumPy array as input to your classifier.
I would suggest you read some basics of image classification, like how you can train your classifier.
Coming to your question about the fetch_lfw_people function: it downloads the pre-processed Labeled Faces in the Wild (LFW) image data. If you are training on your own images, you first have to convert your image data into numerical features.
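To make that concrete, here is a rough sketch (the directory layout and image size are assumptions) of loading your own images into NumPy arrays with OpenCV, instead of using fetch_lfw_people, and training a simple scikit-learn classifier:

```python
# Sketch: assumes a layout of data/<class_name>/<image>.jpg; paths and sizes are placeholders.
import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = [], []
for label in os.listdir("data"):
    class_dir = os.path.join("data", label)
    for fname in os.listdir(class_dir):
        img = cv2.imread(os.path.join(class_dir, fname), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue                              # skip unreadable files
        img = cv2.resize(img, (64, 64))           # every sample must have the same size
        X.append(img.flatten() / 255.0)           # flatten pixels into a feature vector
        y.append(label)

X, y = np.array(X), np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

clf = SVC(kernel="linear").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```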