Retrain object detection model with own images (tensorflow) - python-3.x

Good morning,
I have been working through the TensorFlow object detection tutorial, using the ssd_mobilenet model they provide both as a frozen graph and as the corresponding checkpoint files (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta).
However, since my images are sometimes recognized poorly, I was hoping to feed my own images to the detection model and improve its performance on them. They are all taken by the same camera.
Searching online did not show me where to start. My questions are:
- Are there any code snippets that show which of those files to load and how to train the existing model?
- Do I need to retrain the loaded model with the old data (i.e. COCO) plus the new data (my images), or can I retrain it using only my data and have the model remember what it learned before?
Sorry for these very unspecific questions, but I just cannot figure out where to start.

There is a great walkthrough blog post and code base by Dat Tran. He trained a model to recognize raccoons in images, using the pre-trained ssd_mobilenet as a starting point. It is the best place to start that I have found. Hope this helps.
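As a rough sketch of the "which files to load" part: in the TensorFlow Object Detection API, fine-tuning is usually driven by a pipeline config file, and its fine_tune_checkpoint field points at the checkpoint prefix (model.ckpt), which covers the .data, .index and .meta files together. The config excerpt, the class count and the path below are made-up placeholders, and the helper patch_pipeline_config is hypothetical, not part of the API.

```python
import re

# Hypothetical excerpt of a pipeline .config file; real files are much longer.
SAMPLE_CONFIG = '''
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
}
'''


def patch_pipeline_config(text, num_classes, checkpoint_prefix):
    """Return the config text with the class count and checkpoint prefix replaced."""
    # Lambdas are used as replacements so backslashes in paths are not
    # interpreted as regex escapes.
    text = re.sub(r'num_classes:\s*\d+',
                  lambda m: f'num_classes: {num_classes}', text)
    text = re.sub(r'fine_tune_checkpoint:\s*"[^"]*"',
                  lambda m: f'fine_tune_checkpoint: "{checkpoint_prefix}"', text)
    return text


# Assumed values: one custom class, checkpoint files stored under ssd_mobilenet/.
patched = patch_pipeline_config(SAMPLE_CONFIG, 1, "ssd_mobilenet/model.ckpt")
print(patched)
```

The prefix deliberately omits the file extensions, because the three checkpoint files are addressed by their shared name.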

Related

How to place the dataset for training Yolov5?

I'm currently working on object detection using YOLOv5. I trained a model on a custom dataset with 3 classes: ['Car', 'Motorcycle', 'Person'].
I have several questions about YOLOv5.
All the custom images were labelled using Roboflow.
Question 1: As you can see from the table, my dataset contains images of different sizes. Will this be a problem in training? Also, assume that I've trained the model and got 'best.pt'. Will that model work well on images/videos of any dimensions?
Question 2:
Is this directory layout correct for training? I also have a 'test' directory, but it does not seem to be used at all, so the images in the 'test' folder look useless. (I know I'm asking dumb questions, please bear with me.)
Is it OK if I place all my images like this?
And do I need a 'test' folder?
Question 3: What is 'imgsz' in detect.py? Is it downsampling the input source?
I've spent more than 3 weeks on YOLO. I love it, but I find some parts difficult to grasp. Kindly provide suggestions for these questions. Thanks in advance.
"question1 : As you can see from the table that my dataset has mix of images with different sizes. Will this be a problem in training? And also assume that i’ve trained the model and got ‘best.pt’. Will that model work efficiently in any dimensions of images/videos."
As long as all of your images are resized to the same square size, you should be fine, since YOLO trains on square images. YOLOv5 also rescales images to the --img size when it loads them, so resizing up front is mostly a matter of consistency. You can use a platform like Roboflow to process your images so that they come out in the right structure (for your images and annotation files) and are resized to the same size while your dataset is generated. http://roboflow.com/ - you just need to create a public workspace to upload your images to, and you can use the platform for free. Here's a video that covers custom training with YOLOv5: https://www.youtube.com/watch?v=x0ThXHbtqCQ
Roboflow's python package can also be used to extract your images programmatically: https://docs.roboflow.com/python
"Is this directory model correct for training. Even i have ‘test’ directory but it seems that the directory is not at all used. The images in the ‘test’ folder is useless. ( I know that i’m asking dumb questions, please bare with me.)"
Yes, that directory layout is correct for training. It's what I have whenever I run YOLOv5 training too.
You do need a test folder if you want to run inference on the test images to learn more about your model's performance. Training itself only uses the train and validation folders.
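If you are building the folders yourself rather than exporting them, the split can be sketched like this. The fractions, the seed, the file names and the helper split_dataset are all hypothetical choices for illustration, not something YOLOv5 or Roboflow prescribes.

```python
import random


def split_dataset(names, valid_frac=0.2, test_frac=0.1, seed=0):
    """Shuffle file names reproducibly and split them into train/valid/test."""
    names = sorted(names)                   # fixed starting order
    random.Random(seed).shuffle(names)      # seeded, so the split is repeatable
    n_test = int(len(names) * test_frac)
    n_valid = int(len(names) * valid_frac)
    return {
        "test": names[:n_test],
        "valid": names[n_test:n_test + n_valid],
        "train": names[n_test + n_valid:],
    }


# Made-up file names standing in for a real image folder.
files = [f"img_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(files)
print({name: len(items) for name, items in splits.items()})
```

Each image lands in exactly one split, so the test images stay unseen during training and give an honest picture of performance.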
The 'imgsz' parameter in detect.py sets the height/width that input images are resized to for inference. Set it to the value you used for --img when you ran train.py.
For example, if you resized your images to 640 by 640 when generating your training dataset, use (640, 640) for 'imgsz' (that is the default value). That would also mean you set --img to 640 when you ran train.py.
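To make the "is it downsampling?" part concrete, here is a simplified sketch of aspect-preserving resizing into an imgsz square. It is not YOLOv5's actual code: the function letterbox_shape is hypothetical, and the real implementation differs in details such as how much padding it adds.

```python
def letterbox_shape(width, height, imgsz=640):
    """Return (new_w, new_h, pad_w, pad_h) for fitting an image into an imgsz square.

    The longer side is scaled to imgsz, the aspect ratio is kept, and the
    remaining space on the shorter side is reported as padding.
    """
    scale = imgsz / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    return new_w, new_h, imgsz - new_w, imgsz - new_h


# A few example source sizes: a landscape frame, an exact fit, a portrait image.
for w, h in [(1920, 1080), (640, 640), (480, 960)]:
    print((w, h), "->", letterbox_shape(w, h))
```

So a source larger than imgsz is scaled down and a smaller one is scaled up, which is why the trained model can be run on inputs of different dimensions.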
detect.py parameters (YOLOv5 GitHub repo)
train.py parameters (YOLOv5 GitHub repo)
YOLOv5's GitHub: Tips for Best Training Results https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results
Roboflow's Model Production Tips: https://docs.roboflow.com/model-tips

"I would like to know where I can download the original data, which is used to train the model on official YOLO page"

"I would like to know where I can download the original data, which is used to train the model on official YOLO page, from and how I can add the "seal" data to the original data."
As you can see on the left side of the photo, seals were labeled as "dog" when the image was analyzed with Keras-yolo v3.
Wanting to teach the model "seal", I followed this site (https://sleepless-se.net/2019/06/21/how-to-train-keras%E2%88%92yolo3/) and trained the label "seal". However, the label "person" then disappeared, as you can see on the right side of the photo.
I believe this is because training on "seal" replaced the weights trained on the data from the official YOLO website (which includes the label "person").
To solve the problem, I would like to know where I can download the original data used to train the model on the official YOLO page, and how I can add the "seal" data to it before training. That way, I believe YOLO can learn "seal" without forgetting "person" and the other labels.
Could you please tell me where the data is located?
Or, if there are any other ways, I would appreciate it if you could tell me.
The official pretrained YOLO weights were trained on the COCO dataset. Here's a link:
https://cocodataset.org/#home
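For the "add seal to the original data" part, one piece of the work is merging the class lists and shifting the class ids in your own annotations. Below is a minimal sketch, assuming the annotation format is one line per image of the form `image_path x_min,y_min,x_max,y_max,class_id ...` (check what your training script actually expects). The three-class list is only a stand-in for the full COCO class list, and both helper functions are hypothetical.

```python
def merge_classes(base_classes, new_classes):
    """Append new class names to the base list, keeping the base ids unchanged."""
    merged = list(base_classes)
    for name in new_classes:
        if name not in merged:
            merged.append(name)
    return merged


def remap_annotation_line(line, old_classes, merged_classes):
    """Rewrite the class ids of one annotation line to match the merged list.

    Assumes no spaces in the image path and boxes written as
    x_min,y_min,x_max,y_max,class_id.
    """
    path, *boxes = line.split()
    remapped = []
    for box in boxes:
        x_min, y_min, x_max, y_max, class_id = box.split(",")
        new_id = merged_classes.index(old_classes[int(class_id)])
        remapped.append(",".join([x_min, y_min, x_max, y_max, str(new_id)]))
    return " ".join([path] + remapped)


coco_subset = ["person", "bicycle", "dog"]   # stand-in for the full COCO class list
seal_classes = ["seal"]                      # class list used for the seal-only training
merged = merge_classes(coco_subset, seal_classes)

line = "images/seal_001.jpg 10,20,200,180,0"  # made-up annotation line
print(merged)
print(remap_annotation_line(line, seal_classes, merged))
```

The existing ids stay where they were, so the original annotations can be concatenated with the remapped seal annotations and trained on together.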

How to train CNN on LFW dataset?

I want to train a facial recognition CNN from scratch. I can write a Keras Sequential() model by following popular architectures and copying their networks.
I wish to use the LFW dataset, but I am confused about the technical methodology. Do I have to crop each face to a tight-fitting box? That seems impractical, as the dataset has 13,000+ faces.
Lastly, I know it sounds stupid, but is all I have to do to preprocess the images (of course) and then fit the model to them? What is the exact procedure?
Your question is very open-ended. Before preprocessing and fitting the model, you need to understand object detection. Once you do, you will have the answer to your first question, whether you need to manually crop all 13,000 images. The answer is no. However, you will have to draw bounding boxes around the faces and assign labels to the images if these are not already available in the training data.
Your second question is very vague. What do you mean by exact procedure? Do you mean the steps you need to follow, or how to do the preprocessing and model fitting in Python or another language? There are lots of references available on the internet about preprocessing and model training for specific problems. There are no universal steps that can be applied to every problem.

How to find similar images on the RFCN model in the tensorflow library?

I am writing a deep learning report that uses the TensorFlow library to detect and locate a subject, and I want to find images that are similar to the image being identified. What should I do?
I have a tutorial on finding similar images with a CNN model, but I have not yet managed to do it with RFCN (rfcn_resnet101_coco). Could anyone help?
Thank you very much.

Can I turn the CIFAR-10 dataset into grayscale images and convert it to the same dimensions as the MNIST dataset? Will the model be invalid or fail to learn?

I'm new to the field of deep neural networks. There are various deep learning frameworks around, notably Theano, Torch7, Caffe, and the recently open-sourced TensorFlow. I have tried a couple of the tutorials provided on the TensorFlow site, specifically the MNIST one. I guess this is the hello world of every deep learning framework out there. I also viewed tutorials from here. They were explained in detail, but they do not provide hands-on experience with any deep learning framework. So which framework is better for beginners? I looked up similar questions on Quora. Some said that Theano is tougher to learn but gives more control, while Caffe is easier but gives less control over the network. There was nothing on TensorFlow, as it is new, but from what I've seen the documentation is not that well written, and it also seems tougher to understand. So, as a newbie, what should I choose to learn?
Another question: as I said, MNIST is the hello world of every deep learning framework, and many neural networks for recognizing the MNIST dataset can be found. So, if I use the same network on another dataset, say CIFAR-10, will it work? Let's say that I turn the CIFAR-10 images into grayscale and convert them to the same dimensions as the MNIST images. Will the model be invalid, fail to learn, have bad accuracy, or what?
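The conversion described above can be sketched with NumPy as follows. This is only one possible way to do it: the luma weights and the center crop from 32x32 to 28x28 are assumptions (resizing by interpolation would also work), the function cifar_to_mnist_like is hypothetical, and the random batch stands in for real CIFAR-10 data. It shows that the shapes can be made to match, not how well a network designed for MNIST will then perform.

```python
import numpy as np


def cifar_to_mnist_like(images):
    """Convert (N, 32, 32, 3) uint8 RGB images to (N, 28, 28) float32 grayscale in [0, 1]."""
    images = images.astype(np.float32) / 255.0
    # ITU-R BT.601 luma weights for RGB -> grayscale.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images @ weights          # weighted sum over the channel axis -> (N, 32, 32)
    return gray[:, 2:30, 2:30]       # center crop 32x32 -> 28x28


# Random stand-in for a small batch of CIFAR-10 images.
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
out = cifar_to_mnist_like(fake_batch)
print(out.shape, out.dtype)
```

The center crop discards a 2-pixel border on each side, which is a lossy choice; interpolating down to 28x28 would keep the full field of view instead.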
