I am learning to implement a hand gesture recognition project. For this, I have gone through several tutorials that use color information, background subtraction, and various other object segmentation techniques.
However, the approach I would like to use is cascade classifiers, though I don't have much understanding of it yet. I have read several texts and papers, and while I understand the theory, I still don't understand what makes a good set of images to train a cascade classifier on. Is it better to train it on natural color images, or on hand-gesture images preprocessed with Canny edge detection or in some other way?
Also, is there any method that uses online training and testing similar to OpenTLD, but where the steps are explained? The OpenCV documentation for 2.3-2.4.3 is incomplete with respect to machine learning, object recognition, and tracking, except for the code available at: http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html
I know this is a long question, but I wanted to explain my problem thoroughly. It would help me understand the concept better than just reusing code found online.
Sincere thanks in advance!
If you are thinking of a Haar classifier, a good tutorial is here.
I want to hand-write a framework to perform inference of a given neural network. The network is quite complicated, so to make sure my implementation is correct, I need to know exactly how the inference process is executed on the device.
I tried to use torchviz to visualize the network, but what I got seems to be the backpropagation compute graph, which is really hard to understand.
Then I tried to convert the PyTorch model to ONNX format, following the instructions here, but when I visualized the result, the original layers of the model had been separated into very small operators.
I just want to get a result like this:
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed is a view of the Netron app.
You can also try the torchview package, which provides several features that are especially useful for large models. For instance, you can set the display depth (the depth in the nested hierarchy of modules).
It is also based on the forward pass.
GitHub repo
Disclaimer: I am the author of the package
Note: the accepted input format for the tool is a PyTorch model.
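To make the torch.save route from the first answer concrete, here is a minimal sketch (the model here is a toy placeholder, not your actual network). Saving the whole model object rather than just the state_dict keeps the module structure in the file, which is what Netron displays:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network; layer sizes are placeholders.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# Saving the model object (not model.state_dict()) preserves the module
# hierarchy in the file, so Netron can show the layer-level view.
torch.save(model, "model.pt")

# Round-trip check; weights_only=False is required on recent PyTorch
# versions to unpickle full module objects.
loaded = torch.load("model.pt", weights_only=False)
loaded.eval()
```

You can then open `model.pt` directly in the Netron app to get the layer-level view shown in the question.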
I'm trying to make an OpenCV program in Python 3 to detect the faces of my friends. I've seen that one can train a cascade classifier using OpenCV to detect a certain type of object. However, it isn't clear whether that could create a classifier refined enough to pick only my friends' faces out of a large sample, or whether this is something I could achieve without training my own cascade classifier. Can anyone help?
Cascade classifiers are usually built for face detection. You are trying to solve a different problem: face recognition.
Deep learning is a common framework nowadays, but other models do exist. http://www.face-rec.org/algorithms/ does a very good job of presenting the main algorithms.
This presents an interesting implementation in OpenCV.
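To make the detection-vs-recognition distinction concrete: recognition typically boils down to comparing face embeddings against stored references. A toy sketch follows; the 128-d vectors here are random stand-ins for embeddings a real face model would produce:

```python
import math
import random

random.seed(0)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# One stored embedding per known friend (random stand-ins here; in practice
# these come from running a face-embedding model on reference photos).
known = {name: [random.gauss(0, 1) for _ in range(128)] for name in ("alice", "bob")}

def identify(embedding, threshold=0.5):
    """Return the best-matching friend, or None if nobody is similar enough."""
    name, score = max(((n, cosine(embedding, e)) for n, e in known.items()),
                      key=lambda t: t[1])
    return name if score >= threshold else None
```

The threshold is what lets the system say "none of my friends" for strangers, which a detector alone cannot do.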
I am building a license plate recognition system using Python. I browsed the net and found that many people have done the recognition of the characters on the license plate using the kNN algorithm.
Can anyone explain how we predict the characters in the license plate using kNN?
Is there any other algorithm or method that can do the prediction better ?
I am referring to this Git repo https://github.com/MicrocontrollersAndMore/OpenCV_3_License_Plate_Recognition_Python
Well, I did this 5 years ago. I would suggest that nowadays it is probably much better to do this using ML classifier models, but if you want to use OpenCV: OpenCV has a pretty cool way to do ANPR using an OCR.
When I did it, I used a Raspberry Pi to capture and process the images, and ran OpenCV with C++ on another computer. I recommend you check this repo, and if you're interested, look for the book referenced there. I hope my answer helps you find your solution.
https://github.com/MasteringOpenCV/code.
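To answer the kNN part of the question concretely: in that style of pipeline, each segmented character image is resized to a fixed size and flattened into a vector; prediction finds the k training vectors closest to it and takes a majority vote over their labels. A minimal sketch, with tiny 2-d points standing in for flattened character images:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k nearest neighbours
    under Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: pretend these 2-d points are flattened character images.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
print(knn_predict(train, (0.05, 0.1)))  # prints "A"
```

kNN has no real training step; it simply memorizes the labelled vectors, which is why it is popular for small, clean character sets like plate fonts.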
I am new to computer vision, and I am now doing some research on object detection. I have read papers about Faster R-CNN and R-FCN, and also read about YOLO. It seems the biggest problem is speed, and all of them use image data only. Are there any models that combine text and image data, so that information from text can help detection when the training data is small? For example, with little training data the model cannot tell dogs and cats apart clearly, but it can tell there is a bone near the object, and from text it has learned that an object near a bone is most likely a dog, so the model can now tell what the object is. Does this kind of algorithm exist? I haven't found any; I hope you can help. Thanks a lot.
It seems you have mostly referred to research on deep networks for object detection. Prior to the success of deep networks, researchers were looking into the possibility of using text together with image features to implement ideas similar to yours. You might want to refer to papers from ACM Multimedia and IEEE TMM, especially those before 2014.
The problem was that those approaches could not perform as well as even the simplest deep networks that use only images. There is some work on combining images and text, such as this paper. I am sure at least some researchers are already working on this.
Objective: a node.js function that can be passed a news article (title, text, tags, etc.) and will return a category for that article ("Technology", "Fashion", "Food", etc.)
I'm not picky about exactly what categories are returned, as long as the list of possible results is finite and reasonable (10-50).
There are Web APIs that do this (e.g., Alchemy), but I'd prefer not to incur the extra cost (both in terms of external HTTP requests and $$) if possible.
I've had a look at the node module "natural". I'm a bit new to NLP, but it seems I could achieve this by training a BayesClassifier on a reasonable word list. Does this seem like a good/logical approach? Can you think of anything better?
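To be concrete about what that kind of word-list training involves: the BayesClassifier in "natural" is a naive Bayes text classifier. Sketched here in Python rather than node (the class below is a toy stand-in, not natural's API), it is just word counts per category plus Bayes' rule with smoothing:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace-tokenized words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> number of docs
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, counts in self.word_counts.items():
            # log prior + log likelihood with add-one (Laplace) smoothing
            score = math.log(self.doc_counts[cat] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best

nb = NaiveBayes()
nb.train("new phone chip released", "Technology")
nb.train("silicon startup funding", "Technology")
nb.train("pasta recipe dinner", "Food")
nb.train("chef restaurant menu", "Food")
```

With a few hundred labelled articles per category instead of four toy sentences, this is essentially what training the BayesClassifier on a word list would do.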
I don't know if you are still looking for an answer, but let me put in my two cents for anyone who happens to come back to this question.
Having worked in NLP, I would suggest you look into the following approach to solve the problem.
Don't look for a single-package solution. There are great packages out there, no doubt, for lots of things. But when it comes to active research areas like NLP, ML, and optimization, the tools tend to be at least 3 or 4 iterations behind what is there in academia.
Coming to the core problem: what you want to achieve is text classification.
The simplest way to achieve this would be an SVM multiclass classifier.
Simplest, yes, but also with very, very (note the double stress) reasonable classification accuracy, runtime performance, and ease of use.
What you would need to work on is the feature set used to represent your news article/text/tags. You could use a bag-of-words model, add named entities as additional features, and use the article's location/time as features (though for simple category classification these might not give you much improvement).
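A sketch of such a feature set, to make "bag of words plus extra features" concrete (the feature names here are illustrative, not from any particular package): map each article to a fixed-length vector of word counts, then append any hand-crafted extras.

```python
from collections import Counter

def build_vocab(documents):
    """Fixed word-to-index mapping built once from the training corpus."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def featurize(text, vocab, extra_features=()):
    """Bag-of-words counts in vocab order, then any extra features appended
    (e.g. named-entity indicators, an article-location id)."""
    counts = Counter(text.lower().split())
    vec = [counts.get(w, 0) for w in vocab]
    vec.extend(extra_features)
    return vec
```

Vectors built this way are exactly what you would feed to an SVM; unseen words at classification time are simply dropped, since only vocabulary positions get counted.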
The bottom line is: SVMs work great, they have multiple implementations, and at runtime you don't really need much ML machinery.
Feature engineering, on the other hand, is very task-specific. But given a basic set of features and good labelled data, you can train a very decent classifier.
Here are some resources for you:
http://svmlight.joachims.org/
SVM multiclass is what you would be interested in.
And here is a tutorial by the SVM zen himself!
http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
I don't know about the stability of this, but from the code it is a binary SVM classifier, which means that if you have a known set of N tags you want to classify the text into, you will have to train N binary SVM classifiers, one for each of the N category tags.
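The N-binary-classifier (one-vs-rest) wiring just described can be sketched as follows. A tiny perceptron stands in for each binary SVM here (a real setup would call SVMlight instead), but the structure is the same: one scorer per tag, highest score wins.

```python
def train_binary(samples, positive_label, epochs=300, lr=0.1):
    """Train one linear scorer for `positive_label` vs. all other labels.
    samples: list of (feature_vector, label) pairs."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, label in samples:
            target = 1.0 if label == positive_label else -1.0
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
                b += lr * target
    return w, b

def predict(models, x):
    """models maps label -> (w, b); the label whose scorer is highest wins."""
    def score(label):
        w, b = models[label]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Toy 2-d feature vectors with three well-separated categories.
samples = [((2, 0), "A"), ((2, 0.2), "A"),
           ((0, 2), "B"), ((0.2, 2), "B"),
           ((-2, -2), "C"), ((-2, -1.8), "C")]
models = {lbl: train_binary(samples, lbl) for lbl in ("A", "B", "C")}
```

Replacing `train_binary` with a call out to a binary SVM trainer gives you exactly the N-classifier setup described above.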
Hope this helps.