Using tensorflow object detection for either or detection - python-3.x
I have used TensorFlow object detection for quite a while now. I am more of a user; I don't really know how it works. I am wondering: is it possible to train it to recognize that an object either is or is not something? For example, I want to detect cracks on tiles. Can I use object detection so that, when I show it an image of a tile, it tells me if there is a crack (and also shows its location), or tells me that there is no crack on the tile?
I have tried training with pictures with and without defects, using 2 classes (1 for defect and 1 for no defect). But when a picture has a defect, the results keep showing both classes in the same picture. Is there a way to show only the defect?
Basically, I would like to do defect checking. This is a simplified case with 1 defect, but the actual case will have a few defects.
Thank you.
In case you're only expecting input images of tiles, either with defects or not, you don't need a class for no defect.
The API adds a background class for everything which is not the other classes.
So you simply need to define one class, defect, and tiles on which nothing is detected are not defective.
So in your training set - simply give bounding boxes of defects, and no bounding box in case of no defect, and then your model should learn to detect the defects as mentioned above.
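As a minimal sketch of the one-class idea above: once the model is trained with a single defect class, a tile can be called defective whenever at least one detection clears a score threshold. The function name, the parallel-list layout of `boxes` and `scores`, and the 0.5 threshold are assumptions made for illustration; they are not part of the Object Detection API.

```python
def tile_verdict(boxes, scores, min_score=0.5):
    """Decide whether a tile is defective from one-class detector output.

    boxes  -- list of (ymin, xmin, ymax, xmax) tuples (assumed layout)
    scores -- list of confidence scores, one per box
    min_score is an arbitrary illustrative threshold.
    """
    # Keep only the detections the model is reasonably confident about.
    defects = [(box, score) for box, score in zip(boxes, scores)
               if score >= min_score]
    # No surviving detection means the whole tile was treated as background.
    return {"defective": len(defects) > 0, "defects": defects}


if __name__ == "__main__":
    cracked = tile_verdict([(0.1, 0.2, 0.3, 0.6), (0.5, 0.5, 0.6, 0.7)],
                           [0.91, 0.12])
    clean = tile_verdict([(0.4, 0.4, 0.5, 0.5)], [0.08])
    print(cracked["defective"], len(cracked["defects"]))  # True 1
    print(clean["defective"], len(clean["defects"]))      # False 0
```

With several defect types, the same check would apply per class: one class per defect type and still no "no defect" class.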
Related
Small object detection using deep learning
I have to count the number of checked and unchecked boxes on a paper sheet. The checkboxes are very small. Which object detection algorithm would be best for this, or is there another approach? I have some images on which I can do custom training. Note that my task is only object detection and recognition, not localization. One approach is to extract the portion of the image that contains the checkboxes and apply contours to classify each as checked or unchecked. My question is how to extract the portion of the image that contains the scanned document or sheet.
I think you should use a Convolutional Neural Network. It is the best object detection approach that I have ever used. As for the small size of the objects, this kind of network is very good at identifying small, hidden patterns, so I think it will work well for you. Just try it.
create 3d model of an equipment from 2d images
GOAL: I have to create a 3D model of a machine part. I have about 25 images of the same thing taken from different angles. Progress: I am able to extract the coordinates of a label that is on the machine for most of the images. Problem: I have no idea how to proceed. I have read a bit about aero-triangulation, but I couldn't figure out how to implement it. I would really appreciate it if you could guide me in the right direction. It would be really helpful if you could provide your solutions using Python and OpenCV. Edit: Sorry, but I cannot upload the code for this one as it is confidential. Please don't blame me, I am just an intern. I can say, though, that I cropped a template of the label from an image and then used SIFT to match that template on all the images to get the coordinates of the label.
If you want to implement things yourself with OpenCV, I would recommend looking at SIFT (or SURF) features, RANSAC and the epipolar constraint. I believe the OpenCV cookbook describes those. Warning: math involved. And I don't know how to do dense mapping in OpenCV. I know the GUI program "VisualSFM", which can automatically recreate a 3D model from images. It uses SFM and other command line utilities behind the scenes. Since everything is open source, you could create a Python wrapper around the actual libraries (I found https://github.com/mapillary/OpenSfM asking Google). VisualSFM prints the commands it calls, so a hacky way could be to call the same commands from Python. If it is a simple shape and you don't want to automate it, it could be faster to model it yourself (and the result could look better). In 1.5 weeks I managed to learn the basics of Blender and to model a guitar necklace: https://youtu.be/BCGKsh51TNA . And I would now be able to do it in less than 1h. How much time are you ready to invest to find a solution with OpenCV?
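To make the epipolar constraint mentioned above concrete: for a correct match between homogeneous image points x1 and x2, a fundamental matrix F satisfies x2^T F x1 = 0, and a RANSAC loop keeps the matches whose residual is close to zero. The sketch below is plain Python, not OpenCV. The matrix `F_SIDEWAYS` is hand-written for a camera shifted purely along x with identity intrinsics, and the function names and the tolerance are likewise assumptions chosen so the numbers are easy to check. In practice F would be estimated from the SIFT matches.

```python
def epipolar_residual(F, p1, p2):
    """Return x2^T F x1 for pixel coordinates p1 = (x, y) and p2 = (x, y)."""
    x1 = (p1[0], p1[1], 1.0)  # homogeneous coordinates
    x2 = (p2[0], p2[1], 1.0)
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return sum(x2[r] * Fx1[r] for r in range(3))


def epipolar_inliers(F, matches, tol=0.5):
    """Keep the matches whose residual is within tol (illustrative value)."""
    return [(p1, p2) for p1, p2 in matches
            if abs(epipolar_residual(F, p1, p2)) <= tol]


# Assumed example: the cross-product matrix of t = (1, 0, 0), i.e. the
# camera moved sideways without rotating, so matches must share a row.
F_SIDEWAYS = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]

if __name__ == "__main__":
    good = ((10.0, 20.0), (35.0, 20.0))  # same row: consistent with F
    bad = ((10.0, 20.0), (35.0, 27.0))   # row changed: violates constraint
    print(epipolar_residual(F_SIDEWAYS, *good))  # 0.0
    print(epipolar_residual(F_SIDEWAYS, *bad))   # -7.0
    print(len(epipolar_inliers(F_SIDEWAYS, [good, bad])))  # 1
```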
Haar Cascade Training for Parts of a Known Object
I am working on a project where I am trying to extract key features of a bicycle from an overall image. I am currently investigating the use of Haar Cascades to train my computer to find certain regions of interest from said bicycles, e.g. the pedal-sprocket, seat, handle-bars. Then I will extract local features from these sub regions accordingly. The purpose is to create an overall descriptor of a particular bicycle so I can try to match it throughout a sample set of images of other bicycles. My questions are as follows: Can I train a Haar classifier to look for a sub-component of an overall object? For example, say I want to look for the handlebars on a bicycle. How should I design the training? Should I detect the bicycle first, and then detect the handlebars within the overall bicycle region (Similar to detecting the eyes within a face in terms of facial recognition)? Since I know beforehand that all my images will contain a picture of a bicycle, I'm not sure if there is any point in detecting the bicycle to begin with and then looking for sub components. In terms of training a Haar cascade and creating an XML that I can use (in OpenCV 3.1 and Python 3.6), could I just set up the positive and negative images with pictures of bicycles and no bicycles respectively? With the difference being that I isolate the particular area of interest by cropping the image appropriately each time (e.g. where the handlebars are)? Also open to any recommendations about how others might solve the general problem of extracting key features for object matching. This is just one approach I am currently investigating. Thanks!
image processing / computer vision - body part recognition - posture ( standing/ sitting) - supervised learning
I'm after advice from the image processing / computer vision experts here. I'm trying to develop a robust, scaled algorithm to extract the dimensions of a person's body, for example, his upper-body width.
Problems:
* images without faces
* person sitting
* multiple faces
* person is holding something, thus covering part of his body
Ways of doing this:
* Haar - unsupervised, a lot of training data of different body parts, and hope for the best.
* HOG - 1. face detection -> afterwards using HOG and assumptions along the way with different filters
Note: all images will be scaled to the same size. Obviously computation time for the second approach MIGHT be more demanding (doubtful though), but for the first method, training is almost impossible and would take much more time. P.S. I know there's a paper about using pedestrian data, but that would work for full body + standing, not for sitting. I'm open to hearing all your ideas; ask away if you have anything to add. Implementation would be done, hopefully, via node.js. Thank you
DPM (deformable part models) is widely used in computer vision for object detection, and it tends to work under occlusion and also when only part of an object is present in the image. The grammar model for humans is very good and has state-of-the-art results on standard datasets. It takes around a second to perform detection on a single image; it's MATLAB code, so it's expected to be slow. http://www.cs.berkeley.edu/~rbg/latent/
I need a function that describes a set of sequences of zeros and ones?
I have multiple sets with a variable number of sequences. Each sequence is made of 64 numbers that are either 0 or 1, like so:
Set A
sequence 1: 0,0,0,0,0,0,1,1,0,0,0,0,1,1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0
sequence 2: 0,0,0,0,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
sequence 3: 0,0,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0
...
Set B
sequence 1: 0,0,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
sequence 2: 0,0,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0
...
I would like to find a mathematical function that describes all possible sequences in a set, maybe even predicts more, and that does not contain the sequences in the other sets. I need this because I am trying to recognize different gestures in a mobile app based on the cells in a grid that have been touched (1 touch / 0 no touch). The sets represent the gestures, and the sequences are a limited sample of variations of each gesture. Ideally the function describing the sequences in a set would allow me to test user touches against it to determine which set/gesture they are part of. I searched for a solution, using either Excel or Mathematica, but being very ignorant about both and about mathematics in general, I am looking for direction from an expert. Suggestions for basic documentation on the subject are also welcome.
It looks as if you are trying to treat what is essentially 2D data in 1D. For example, let s1 represent the first sequence in set A in your question. Then the command ArrayPlot[Partition[s1, 8]] produces this picture: The other sequences in the same set produce similar plots. One of the sequences from the second set produces, in response to the same operations, the picture: I don't know what sort of mathematical function you would like to define to describe these pictures, but I'm not sure that you need to if your objective is to recognise user gestures. You could do something much simpler, such as calculate the 'average' picture for each of your gestures. One way to do this would be to calculate the average value for each of the 64 pixels in each of the pictures. Perhaps there are 6 sequences in your set A describing gesture A. Sum the sequences element-by-element. You will now have a sequence with values ranging from 0 to 6. Divide each element by 6. Now each element represents a sort of probability that a new gesture, one you are trying to recognise, will touch that pixel. Repeat this for all the sets of sequences representing your set of gestures. To recognise a user gesture, simply compute the difference between the sequence representing the gesture and each of the sequences representing the 'average' gestures. The smallest (absolute) difference will direct you to the gesture the user made. I don't expect that this will be entirely foolproof, it may well result in some user gestures being ambiguous or not recognisable, and you may want to try something more sophisticated. But I think this approach is simple and probably adequate to get you started.
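The averaging approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, and the demo uses short 4-cell toy sequences rather than the 64-cell sequences from the question. `as_grid` plays the role of Partition[s1, 8].

```python
def as_grid(sequence, width=8):
    """Split a flat sequence into rows, like Partition[s, 8]."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def average_template(sequences):
    """Per-cell mean over all example sequences of one gesture."""
    n = len(sequences)
    return [sum(cells) / n for cells in zip(*sequences)]


def distance(sequence, template):
    """Sum of absolute per-cell differences."""
    return sum(abs(s - t) for s, t in zip(sequence, template))


def recognise(sequence, templates):
    """Return the name of the gesture whose average picture is closest."""
    return min(templates, key=lambda name: distance(sequence, templates[name]))


if __name__ == "__main__":
    # Toy data (hypothetical): two gestures on a 4-cell grid.
    sets = {
        "A": [[1, 1, 0, 0], [1, 0, 0, 0]],
        "B": [[0, 0, 1, 1], [0, 1, 1, 1]],
    }
    templates = {name: average_template(seqs) for name, seqs in sets.items()}
    print(templates["A"])                      # [1.0, 0.5, 0.0, 0.0]
    print(recognise([1, 1, 0, 0], templates))  # A
    print(recognise([0, 0, 1, 0], templates))  # B
```

A user gesture that is nearly equidistant from two templates is ambiguous; comparing the two smallest distances is one simple way to flag that case.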
In Mathematica the following expression will enumerate all the possible combinations of {0,1} of length 64.
Tuples[{1, 0}, {64}]
But there are 2^64 or 18446744073709551616 of them, so I'm not sure what use that will be to you. Maybe you just wanted the unique sequences contained in each set; in that case all you need is the Mathematica Union[] function applied to the set. If you have the sets grouped together in a list in Mathematica, say mySets, then you can apply the Union operator to every set in the list by using the map operator.
Union /@ mySets
If you want to do some type of prediction, a little more information might be useful.
Thank you for the clarifications.
Machine Learning
The task you want to solve falls under a discipline known by a variety of names, most commonly Machine Learning or Pattern Recognition, and if you know which examples represent the same gestures, your case would be known as supervised learning. Question: in your case, do you know which gesture each example represents? You have a series of examples for which you know a label (the form of gesture it is), from which you want to train a model, and you want to use that model to assign an unseen example to one of a finite set of classes; in your case, one of a number of gestures. This is typically known as classification.
Learning Resources
There is a very extensive background of research on this topic, but a popular introduction to the subject is Pattern Recognition and Machine Learning by Christopher Bishop. Stanford has a series of machine learning video lectures (Stanford ML) available on the web.
Accuracy
You might want to consider how you will determine the accuracy of your system at predicting the type of gesture for an unseen example. Typically you train the model using some of your examples and then test its performance using examples the model has not seen. Two of the most common methods used to do this are 10-fold cross-validation and repeated 50/50 holdout.
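As a rough illustration of repeated 50/50 holdout, the Python sketch below shuffles the labelled examples, trains on one half and measures accuracy on the other, then averages over several repeats. The function names, the toy data and the 1-nearest-neighbour classifier are all assumptions made for this example; the classifier is only a stand-in and any other method could be plugged in through `train` and `predict`.

```python
import random


def holdout_accuracy(examples, train, predict, repeats=10, seed=0):
    """Mean accuracy over repeated 50/50 train/test splits.

    examples -- list of (sequence, label) pairs, at least two of them
    train    -- function: list of examples -> model
    predict  -- function: (model, sequence) -> label
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    accuracies = []
    for _ in range(repeats):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train_set, test_set = shuffled[:half], shuffled[half:]
        model = train(train_set)
        correct = sum(predict(model, x) == label for x, label in test_set)
        accuracies.append(correct / len(test_set))
    return sum(accuracies) / len(accuracies)


def train_1nn(train_set):
    """A 1-nearest-neighbour 'model' is just the stored training examples."""
    return list(train_set)


def predict_1nn(model, sequence):
    """Label of the stored example with the smallest Hamming distance."""
    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))
    return min(model, key=lambda example: hamming(example[0], sequence))[1]


if __name__ == "__main__":
    # Toy labelled examples (hypothetical), 4 cells each.
    data = [([1, 1, 0, 0], "A"), ([1, 0, 0, 0], "A"), ([1, 1, 1, 0], "A"),
            ([0, 0, 1, 1], "B"), ([0, 1, 1, 1], "B"), ([0, 0, 0, 1], "B")]
    print(holdout_accuracy(data, train_1nn, predict_1nn))
```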
Having a measure of accuracy enables you to compare one method against another to see which is superior. Have you thought about what level of accuracy you require in your task: is 70% accuracy enough, or do you need 85%, 99% or better? Machine learning methods are typically quite sensitive to the specific type of data you have and to the number of examples you have to train the system with; the more examples, generally the better the performance. You could try the method suggested above and compare it against a variety of well-proven methods, amongst which would be Random Forests, Support Vector Machines and Neural Networks. All of these and many more are available to download in a variety of free toolboxes.
Toolboxes
Mathematica is a wonderful system, is infinitely flexible and is my favourite environment, but out of the box it doesn't have a great deal of support for machine learning. I suspect you will make a great deal of progress more quickly by using a custom toolbox designed for machine learning. Two of the most popular free toolboxes are WEKA and R; both support more than 50 different methods for solving your task, along with methods for measuring the accuracy of the solutions. With just a little data reformatting, you can convert your gestures to a simple file format called ARFF, load them into WEKA or R and experiment with dozens of different algorithms to see how each performs on your data. The explorer tool in WEKA is definitely the easiest to use, requiring little more than a few mouse clicks and typing some parameters to get started. Once you have an idea of how well the established methods perform on your data, you have a good baseline against which to compare a customised approach, should they fail to meet your criteria.
Handwritten Digit Recognition
Your problem is similar to a very well-researched machine learning problem known as handwritten digit recognition.
The methods that work well on this public data set of handwritten digits are likely to work well on your gestures.
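The ARFF conversion mentioned under Toolboxes can be sketched as follows. This is a hedged example, not a complete exporter: the function name `to_arff`, the attribute names `cell0`, `cell1`, ... and `gesture`, and the relation name are choices made here for illustration, and the toy data uses 2-cell sequences. Each cell is declared as a nominal {0,1} attribute and the gesture label as the class attribute.

```python
def to_arff(relation, sets):
    """Render gesture examples as ARFF text.

    sets -- dict mapping a gesture name to a list of equal-length
            0/1 sequences (names and layout are assumptions).
    """
    labels = sorted(sets)
    n_cells = len(sets[labels[0]][0])
    lines = ["@RELATION " + relation, ""]
    # One nominal attribute per grid cell.
    for i in range(n_cells):
        lines.append("@ATTRIBUTE cell%d {0,1}" % i)
    # The class attribute lists every gesture name.
    lines.append("@ATTRIBUTE gesture {%s}" % ",".join(labels))
    lines += ["", "@DATA"]
    for label in labels:
        for sequence in sets[label]:
            lines.append(",".join(str(v) for v in sequence) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data (hypothetical): two gestures, 2 cells each.
    text = to_arff("gestures", {"A": [[1, 0]], "B": [[0, 1]]})
    print(text)
    # To save it for WEKA: open("gestures.arff", "w").write(text)
```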