How to train YOLO-Tensor flow own dataset - python-3.x

I am trying to make an app which will detect traffic signs in a video's frames. I am using YOLO on TensorFlow, following the steps from https://github.com/thtrieu/darkflow .
How can I train this model with my own dataset of traffic sign images?

If you're using Darkflow on Windows, you need to make some small adjustments to how you use it. If you clone the code and run it straight from the repository, you need to place python in front of the commands given, because flow is a Python file.
e.g. python flow --imgdir sample_img/ --model cfg/yolo-tiny.cfg --load bin/yolo-tiny.weights --json
If you are installing using pip globally (not a bad idea) and you still want to use the flow utility from any directory just make sure you take the flow file with you.
To train, use the commands listed on the github page here: https://github.com/thtrieu/darkflow
If training on your own data you will need to take some extra steps as outlined here: https://github.com/thtrieu/darkflow#training-on-your-own-dataset
Your annotations need to be in the popular PASCAL VOC format, which is a set of XML files containing file information and the bounding box data.
Point your flow command at your new dataset and annotations to train.
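To make the annotation format more concrete, here is a minimal sketch that writes one PASCAL VOC-style XML annotation with the standard library. The file name, label and box coordinates are hypothetical, and real VOC files carry more fields (folder, pose, truncated, difficult, ...); whether Darkflow needs any of those is not checked here, so treat this as an illustration of the layout only.

```python
import xml.etree.ElementTree as ET


def make_voc_annotation(filename, width, height, boxes):
    """Build a minimal PASCAL VOC-style annotation as an XML string.

    boxes is a list of (label, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images

    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        ET.SubElement(bndbox, "xmin").text = str(xmin)
        ET.SubElement(bndbox, "ymin").text = str(ymin)
        ET.SubElement(bndbox, "xmax").text = str(xmax)
        ET.SubElement(bndbox, "ymax").text = str(ymax)

    return ET.tostring(root, encoding="unicode")


# Hypothetical example: one stop sign in a 640x480 frame.
xml_text = make_voc_annotation(
    "sign_0001.jpg", 640, 480, [("stop_sign", 120, 80, 260, 220)]
)
print(xml_text)
```

In practice a labelling tool usually writes these files for you; the sketch is mainly useful for converting labels you already have in another format.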

The best data to practice with is the PASCAL VOC dataset. You need to prepare two folders for training: one folder with the images and one folder with the XML files (the annotation folder). Each image needs one XML file with the same name, containing all the basic information (object name, object position, ...). After that, you only need to choose one of the predefined .cfg files in the cfg folder and run the following command:
flow --model cfg/yolo-new.cfg --train --dataset "path/to/images/folder" --annotation "path/to/annotation/folder"
Read more about the options supported by darkflow to further optimize the training process.
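Since every image needs an XML file with the same name, it can help to check the two folders before starting a long training run. The following is a small sketch of such a check; the folder names, file names and the set of image extensions are assumptions made for the demo.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: extend as needed


def find_unpaired(images_dir, annotations_dir):
    """Return (images without XML, XML files without image) as sorted stems."""
    image_stems = {
        p.stem for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    xml_stems = {
        p.stem for p in Path(annotations_dir).iterdir()
        if p.suffix.lower() == ".xml"
    }
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)


# Small demo on throwaway folders with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp, "images")
    annotations = Path(tmp, "annotations")
    images.mkdir()
    annotations.mkdir()
    for name in ("sign_1.jpg", "sign_2.jpg"):
        (images / name).touch()
    for name in ("sign_1.xml", "sign_3.xml"):
        (annotations / name).touch()
    missing_xml, missing_image = find_unpaired(images, annotations)

print(missing_xml, missing_image)
```

Both returned lists should be empty before you point the --dataset and --annotation options at the folders.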

After spending too much time on how to train a custom dataset for object detection, this is what I found.
Prerequisites:
1: training environment: a system with a GPU with at least 4 GB of memory, or a pre-configured AWS / GCP cloud machine with CUDA 9 installed
2: Ubuntu 16.04
3: images of the object you want to detect. The images should not be too large, otherwise training will run out of memory
4: a labelling tool. Many are available, such as LabelImg; BBox-Label-Tool, which I used, is also a good one
I also tried the Python project dataset-generator, but the labelling it produced did not work well in real-time scenarios.
My suggestion for the training environment is to use an AWS machine rather than spend time installing CUDA and cuDNN locally. Even if you manage to install CUDA locally, you will not be able to train without a GPU of at least 4 GB: training will often break with out-of-memory errors.
Ways to train on the dataset:
1: train ssd_mobilenet_v2 on the dataset using the TensorFlow Object Detection API
the output of this training can be used on both Android and iOS
2: use darknet to train on the dataset, which requires labels in PASCAL VOC format; LabelImg does the labelling job very well
3: retrain with darkflow the weights that darknet produces as output

Related

How to place the dataset for training Yolov5?

I’m currently working on object detection using yolov5. I trained a model with a custom dataset which has 3 classes = [‘Car’,‘Motorcycle’,‘Person’]
I have many questions related to yolov5.
All the custom images are labelled using Roboflow.
Question 1: As you can see from the table, my dataset has a mix of images with different sizes. Will this be a problem in training? Also, assume that I’ve trained the model and got ‘best.pt’. Will that model work efficiently on images/videos of any dimensions?
Question 2:
Is this directory structure correct for training? I even have a ‘test’ directory, but it seems that the directory is not used at all. The images in the ‘test’ folder seem useless. (I know that I’m asking dumb questions, please bear with me.)
Is it OK if I place all my images like this?
And do I need a ‘test’ folder?
Question 3: What is ‘imgsz’ in detect.py? Is it downsampling the input source?
I’ve spent more than 3 weeks on YOLO. I love it, but I find some parts difficult to grasp. Kindly provide suggestions for these questions. Thanks in advance.
"question1 : As you can see from the table that my dataset has mix of images with different sizes. Will this be a problem in training? And also assume that i’ve trained the model and got ‘best.pt’. Will that model work efficiently in any dimensions of images/videos."
As long as you've resized/normalized all of your images to be the same square size, then you should be fine. YOLO trains on square images. You can use a platform like Roboflow to process your images so they not only come out in the right structure (for your images and annotation files) but also resize them while generating your dataset so they are all the same size. http://roboflow.com/ - you just need to make a public workspace to upload your images to and you can use the platform free. Here's a video that covers custom training with YOLOv5: https://www.youtube.com/watch?v=x0ThXHbtqCQ
Roboflow's python package can also be used to extract your images programmatically: https://docs.roboflow.com/python
"Is this directory model correct for training. Even i have ‘test’ directory but it seems that the directory is not at all used. The images in the ‘test’ folder is useless. ( I know that i’m asking dumb questions, please bare with me.)"
Yes, that directory structure is correct for training. It's what I have whenever I run YOLOv5 training too.
You do need a test folder if you want to run inference against the test folder images to learn more about your model's performance.
The 'imgsz' parameter in detect.py is for setting the height/width of the images for inference. You set it at the value you used for --img when you ran train.py.
For example: if you resized your images to 640 by 640 when generating your dataset for training, use (640, 640) for the 'imgsz' parameter (that is the default value). That would also mean you set --img to 640 when you ran train.py.
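On the "is it downsampling?" part of the question, the arithmetic of fitting an arbitrary frame into an imgsz square can be sketched as below. This is an illustration of aspect-preserving resizing with padding (often called letterboxing), which is how YOLOv5's preprocessing is commonly described. The function name is hypothetical, and the repository's actual code differs in details such as padding to a stride multiple, so do not read this as the exact implementation.

```python
def letterbox_shape(width, height, imgsz=640):
    """Return (new_width, new_height, pad_x, pad_y) for fitting an image
    into an imgsz x imgsz square while keeping its aspect ratio."""
    # Scale so that the longer side becomes exactly imgsz.
    scale = imgsz / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Whatever is left on the shorter side is filled with padding.
    return new_w, new_h, imgsz - new_w, imgsz - new_h


# A 1920x1080 video frame is scaled down and padded vertically.
print(letterbox_shape(1920, 1080))  # (640, 360, 0, 280)
# A small 320x240 image would be scaled up instead.
print(letterbox_shape(320, 240))    # (640, 480, 0, 160)
```

So a source larger than imgsz is scaled down, and a smaller one is scaled up, which is why using the same value for training and inference keeps objects at a comparable scale.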
detect.py parameters (YOLOv5 Github repo)
train.py parameters (YOLOv5 Github repo)
YOLOv5's Github: Tips for Best Training Results https://github.com/ultralytics/yolov5/wiki/Tips-for-Best-Training-Results
Roboflow's Model Production Tips: https://docs.roboflow.com/model-tips

Unable to Load FastText model

I am trying to load the FastText model and save it in a form that I can deploy to production, as the file size is 1.2 GB and it would not be good practice to use that in prod.
Can anyone suggest an approach to save and load the model for production ("fasttext-wiki-news-subwords-300")?
I am loading the file using the gensim.downloader API.
You can use the library https://github.com/avidale/compress-fasttext, which is a wrapper around Gensim that can serve compressed versions of unsupervised FastText models.
The compressed versions can be orders of magnitude smaller (e.g. 20mb), with a tolerable loss in quality.
In order to have clarity over exactly what you're getting, and in what format, I strongly recommend downloading things like sets of pretrained vectors from their original sources rather than through the Gensim gensim.downloader convenience methods. (That API also, against most users' expectations & best packaging hygiene, will download & run arbitrary other code that's not part of Gensim's version-controlled source repository or its official PyPI package. See project issue #2283.)
For example, you could grab the raw vectors files direct from: https://fasttext.cc/docs/en/english-vectors.html
The tool from ~david-dale's answer looks interesting, for its radical compression, and if you can verify the compressed versions still work well for your purposes, it may be an ideal approach for memory-limited production deployments.
I would also consider:
A production machine with enough GB of RAM to load the full model may not be too costly, and with these sorts of vector-models, typical access patterns mean you essentially always want the full model in RAM, with no virtual-memory swapping at all. If your deployment is in a web server, there are some memory-mapping tricks possible that can help many processes share the same singly-loaded copy of the model (to avoid time- and memory-consumptive redundant reloads). See this answer for an approach that works with Word2Vec (though that may need some adaptation for FastText & recent Gensim versions).
If you don't need the FastText-specific subword-based synthesis, you can save the full-word vectors to a file in a simple format, then choose to only reload a small subset of the leading vectors (the most common words) using the limit option of load_word2vec_format(). For example:
from gensim.models import KeyedVectors
# save only the word-vectors from a FastText model
# (ft_model is assumed to be an already-loaded Gensim FastText model)
ft_model.wv.save_word2vec_format('wvonly.txt', binary=False)
# ... then, later/elsewhere:
# load only 1st 50,000 word-vectors
wordvecs = KeyedVectors.load_word2vec_format('wvonly.txt', binary=False, limit=50000)
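The limit option keeps memory use down, but the full file still has to be shipped to the production machine. If that matters, the text file itself can be trimmed ahead of time. The sketch below assumes the plain word2vec text layout (a header line with the vector count and dimension, then one word and its vector per line) and uses only the standard library; the function name and the tiny demo file are hypothetical.

```python
import tempfile
from pathlib import Path


def trim_word2vec_text(src_path, dst_path, limit):
    """Copy only the first `limit` vectors of a word2vec text file."""
    with open(src_path, encoding="utf-8") as src:
        vocab_size, dim = src.readline().split()
        keep = min(int(vocab_size), limit)
        with open(dst_path, "w", encoding="utf-8") as dst:
            # Rewrite the header so it matches the number of kept vectors.
            dst.write(f"{keep} {dim}\n")
            for _ in range(keep):
                dst.write(src.readline())
    return keep


# Demo with a made-up 3-word, 2-dimensional file.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp, "full.txt")
    small = Path(tmp, "small.txt")
    full.write_text(
        "3 2\nthe 0.1 0.2\nof 0.3 0.4\nzebra 0.5 0.6\n", encoding="utf-8"
    )
    kept = trim_word2vec_text(full, small, limit=2)
    trimmed_lines = small.read_text(encoding="utf-8").splitlines()

print(kept, trimmed_lines)
```

This only makes sense if the file is ordered with the most common words first, which is worth verifying for your own vectors before relying on it.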

PyTorch loads old data when using tensorboard

While using TensorBoard, I cleared my data directory and trained a new model, but I am still seeing images from an old model. Why is TensorBoard loading old data, where is it being stored, and how do I remove it?
TensorBoard was built to keep caches, so that if a long training run fails you have "bak"-like files from which your board can generate visualizations. Unfortunately, there is no good way to remove these hidden temp files manually, as they do not show up when listing files in bash, even when including files with the . (dot) prefix. This memory is self-managed. As best practices: (1) make your TensorBoard run name dynamic for each run. This can be done using the datetime library in combination with an f-string in Python, so that the name of each run carries a timestamp. (This can be done right from Python, say in a Jupyter notebook, if you import the subprocess package and run your bash command straight from the script.) (2) Additionally, you are strongly advised to keep your logdir (log directory) separate from where you are running the code. Together, these two practices should solve the problems related to tmp files erroneously populating new results.
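The timestamped run name from practice (1) can be sketched as follows. The helper name, the base folder and the run name are hypothetical; the only idea taken from the answer is combining datetime with an f-string so that each run gets its own directory.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def new_run_dir(base_dir, run_name="run"):
    """Create and return a log directory whose name ends in a timestamp."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(base_dir) / f"{run_name}-{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


# Demo in a throwaway location; in real use base_dir would be a folder
# kept separate from the code, as practice (2) suggests.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = new_run_dir(Path(tmp, "tb_logs"), run_name="traffic_signs")
    created = run_dir.is_dir()

print(run_dir.name)
```

The returned path would then be handed to the writer, for example as the log_dir argument of SummaryWriter, and TensorBoard pointed at the base folder with --logdir so that every run shows up under its own name.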
How to "reset" tensorboard data after killing tensorflow instance

How do I deploy deep reinforcement learning neural network I coded in Pytorch to my website?

I have built and trained a neural network in PyTorch and it is ready for production on a website, but how do I deploy it?
There are multiple ways to do it.
But first, a PS: I noticed that you were asking specifically about reinforcement learning only after having posted my answer. Even though I wrote this answer with a static neural network model in mind, I offer at the end of the post a way to apply its ideas to reinforcement learning.
The different options :
From what I know, using PyTorch in production is not especially recommended for big scale production.
It is more common to convert the PyTorch model to the ONNX format (a format to make ai models interchangeable between frameworks). Here is a tutorial if you want to operate this way : https://github.com/onnx/tutorials/blob/master/tutorials/PytorchOnnxExport.ipynb .
Then run it using the ONNX runtime, Caffe2 (by Facebook) or with TensorFlow (by Google).
My answer is not going to explore those solutions (and I did not include tutorials for those options), because I recently did the same thing you are trying to do (building a neural network architecture and wanting to deploy it, and also allowing users to train their neural network with the architecture), yet I did not convert my neural network, for the following reasons:
ONNX is evolving quickly, yet is currently not supporting all the operations you can possibly do in a PyTorch model. So if you have a highly custom or specific neural network (like in my case), you might not be able to convert it to ONNX with ease. You might need to change your architecture, or maybe have to re-write a big part of it so that it can be converted to ONNX.
You will need to use one or two additional tools, where most tutorials are not going really deep, or not explaining the logic behind what they are doing.
Note that you might want to convert your neural network if you call it billions or trillions of times a day; otherwise I think you can stick with PyTorch without issues, even for production, and avoid the pitfalls of converting to ONNX.
First, let's see how we can save a trained neural network, load it back through the architecture of the network, and re-run the trained network.
Second, we will see how to deploy a network to a website, and also how you can allow users to train their networks. It is likely not the best or most efficient way, yet it works.
Saving the network :
First, you need to have imported PyTorch with "import torch". Inside your neural network file, you should save the stateDict of the network you want to re-use (basically a dictionary that maps each layer's parameter names to its learned weights). You could, for example, only save the stateDict of the model with the smallest loss across your epochs.
# network is the variable containing your neural network class
network_stateDict = network.state_dict()
# Saving network stateDict to a variable
Then, when you want to save the stateDict to a file that you can re-use later, use:
torch.save(network_stateDict, "folderPath/myStateDict.pt")
# Saving the stateDict variable to a file
# The pt extension is just a convention in the PyTorch community, pth is also used a lot
Finally, when you want to re-use your trained network later on, you will need to:
network = myNetwork(1, 2, 3)
# Load the architecture of the network in a variable (use the same architecture
# and the same network parameters as the ones used to create the stateDict)
network.load_state_dict(torch.load("folderPath/myStateDict.pt"))
# Load the file containing the stateDict of the trained network with the
# torch.load function, then load the stateDict into the network architecture
# with the load_state_dict function, applied to your network object.
network.eval()
# Switch the network to evaluation mode, so that layers such as dropout and
# batch normalization behave as they should during inference.
output = network(input_data)
# You should now be able to get output data from your
# trained network, by feeding it a single set of input data.
For more information on saving models and stateDicts: https://pytorch.org/tutorials/beginner/saving_loading_models.html
Deploying the network:
Now that we know how to save, restore and feed input data to a network, all that is left to do is to deploy it so that this process is done through the website.
You first need to get (likely from your user) the inputs that your neural network will use. I am not going to include any link, since there are so many different web frameworks.
You would then need to use a framework (like Django) that allows you to write this logic in Python:
import torch
network = myNetwork(1, 2, 3)
network.load_state_dict(torch.load("folderPath/myStateDict.pt"))
network.eval()
input_data = data_fromMyUser
output = network(input_data)
Then you would collect the output to display it, or do whatever you want with it.
If your framework does not give you the ability to use Python, I think it would be a good idea to have a tiny Python script, to which you would give the input data, and which would return the output.
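A tiny script of that kind could look like the sketch below, which reads a JSON request from standard input and writes a JSON response to standard output, so that any web backend able to start a process can call it. The JSON field names are hypothetical, and predict is only a stand-in: in a real deployment it would load the stateDict and call the network as shown earlier.

```python
import io
import json
import sys


def predict(inputs):
    """Hypothetical stand-in for `network(input_data)`: sums the inputs."""
    return {"output": sum(inputs)}


def handle(raw_text):
    """Turn one JSON request string into one JSON response string."""
    payload = json.loads(raw_text)
    return json.dumps(predict(payload["inputs"]))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # The calling web backend pipes the request in and reads the reply back.
    stdout.write(handle(stdin.read()) + "\n")


# Demo with in-memory streams instead of a real pipe.
reply = io.StringIO()
main(io.StringIO('{"inputs": [1, 2, 3]}'), reply)
print(reply.getvalue())
```

Starting a Python process for every request means re-loading the network each time, so this suits low traffic; a long-running service would avoid that cost.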
If you would like to let users train networks, you should just let them start the training of one, and then use torch.save on a stateDict object to save the stateDict to a file.
You or they could later use the trained networks (you would also need to write a little function to make sure that you do not overwrite previous stateDict files).
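The "little function" that avoids overwriting earlier stateDict files could be as simple as picking the next unused numbered file name. The naming scheme below is an assumption made for illustration, and the touch() call merely stands in for torch.save.

```python
import tempfile
from pathlib import Path


def next_free_path(directory, stem="stateDict", suffix=".pt"):
    """Return the first path like stateDict_0000.pt that does not exist yet."""
    directory = Path(directory)
    counter = 0
    while True:
        candidate = directory / f"{stem}_{counter:04d}{suffix}"
        if not candidate.exists():
            return candidate
        counter += 1


# Demo in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    first = next_free_path(tmp)
    first.touch()  # stands in for torch.save(network_stateDict, first)
    second = next_free_path(tmp)

print(first.name, second.name)
```

If several users can trigger training at the same moment, two processes could still pick the same name between the check and the save, so a per-user folder or a unique ID in the name would be safer there.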
How to apply it to reinforcement learning :
I have not deployed a reinforcement learning model, yet I can offer you some ideas and leads to explore.
You could store the inputs that you get from your users in a file or a database, and write a little program that, say every 24 hours or every hour, re-runs the training of the neural network on the now bigger dataset.
You could then apply the suggestions in this answer: run the network, save the stateDict of the model, and then swap the stateDict that your network is using in production.
This is a bit hacky, yet it would allow you to save your trained networks in a "static way" while still having them evolve and change their stateDicts.
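When the production stateDict file is swapped while the website is running, a request should never see a half-written file. One common way to get that, sketched here under the assumption that the new weights are already available as bytes, is to write to a temporary file in the same folder and then rename it over the live file with os.replace. The function name and file names are hypothetical.

```python
import os
import tempfile


def publish_file(new_bytes, live_path):
    """Replace live_path with new_bytes without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(live_path))
    # The temp file lives in the same folder so the rename stays on one disk.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(new_bytes)
        os.replace(tmp_path, live_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo: publish two versions and read back the live one.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "myStateDict.pt")
    publish_file(b"weights-v1", live)
    publish_file(b"weights-v2", live)
    with open(live, "rb") as fh:
        current = fh.read()
    leftovers = os.listdir(tmp)

print(current, leftovers)
```

The serving code still has to notice the new file and call load_state_dict again, for instance by checking the file's modification time between requests.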
Conclusion
This is clearly not the most mass-scale production approach that you could employ, yet it is in my opinion the easiest to put in place.
You also know that the output you get will be the actual output of your neural network, without any distortions or errors in the values.
Have a great day !
Save the trained model however you want (HDF5 or with pickle).
Write the program that will run in production by loading the trained model.
Deploy the program on a distributed system for real-time computation, such as Apache Storm, Flink, Alink, Apache Samoa, etc.
If you feel you need to retrain the model depending on feedback, retrain it on a different cluster or parallel environment and observe the model accuracy; if it looks good, move the model to production. (In the initial days you will need to retrain multiple times; this will decrease as time goes on if your model is designed well.)

Opening tensorboard saved scalars on windows

I am using PyTorch on Windows 10 and am having trouble understanding the correct use of tensorboardX.
After instantiating a writer (writer = SummaryWriter()) and adding the value of the loss function to it in every iteration (writer.add_scalar('data/loss_func', loss.data[0].item(), iteration)), I have a folder which contains the saved run.
My questions are:
1) As far as I understand, after the training is complete, I need to write in the terminal (which corresponds to the command line prompt in Windows):
tensorboard --logdir=C:\Users\DrJohn\Documents\runs
where this is the folder which contains the file created by tensorboardX. What is the valid syntax in the Windows command prompt? I couldn't work this out from the online tutorials.
2) Is it possible to see the learning during the training by using tensorboardX (i.e. to plot the learning curve during the iterations)? Or is the only option to see everything once the training ends?
Thanks in advance
