What are the obstacles in today's object detection?

I am new to computer vision and am currently doing some research on object detection. I have read the papers on Faster R-CNN and R-FCN, and also on YOLO. It seems the biggest problem is speed? Also, all of them use image data only. Are there any models that combine text and image data, so that information from text can help detection when the training data is small? For example, with little training data the model may not be able to tell dogs from cats clearly, but it could tell that there is a bone near the object, and it could learn from text that an object near a bone is most likely a dog, so it could then tell what the object is. Does this kind of algorithm exist? I haven't found any and hope you can help. Thanks a lot.

It seems you have mostly referred to research on deep networks for object detection. Prior to the success of deep networks, researchers were looking into the possibility of using text together with image features to implement ideas similar to yours. You might want to refer to papers from ACM Multimedia and IEEE TMM, especially those before 2014.
The problem was that those approaches could not perform as well as even the simplest deep networks that use only images. There is some work on combining both images and text, such as this paper. I am sure at least some researchers are already working on this.

Related

Neuroimage MRI scan CNN model preparation

I would like to clear up a couple of points of confusion. I want to work on a dataset of neuroimaging MRI scans from the ADNI database.
Each Alzheimer's Disease (AD) MRI scan has multiple slices.
Do I have to separate the slices of each scan and label each slice as AD, or should I combine all the slices into a single scan and label that for classification?
Most medical neuroimages come in formats such as DICOM, NIfTI (.nii), etc. Is it mandatory to convert them to PNG or JPG for the CNN model, or can I keep them in NIfTI format?
I have read several existing papers on neuroimaging for Alzheimer's disease but did not find an answer to these questions. I even emailed the authors of one paper; they replied, with sincere apologies, that they could not help because they were very busy.
It would be very helpful if anyone could answer and clear up my confusion.
Thank you.
You can train with NIfTI, using, for example, TorchIO. There's no need to separate each slice; you can use the 3D image as is.
You can find some examples in the documentation.
Disclaimer: I'm the main developer of TorchIO.

Correcting the names in nlp

I have a dataset where a lot of names are written like man1sh instead of manish, or v1kas instead of vikas.
How can one correct these names in NLP?
Any help is appreciated.
Try deep neural network based spell correction (https://medium.com/#majortal/deep-spelling-9ffef96a24f6); this method is the state of the art at the moment. Here is the code (https://github.com/MajorTal/DeepSpell), and someone has already made an improvement over it (https://hackernoon.com/improving-deepspell-code-bdaab1c5fb7e). There is also a published paper, which I am not able to find, that uses a character-level deep neural network for edit distance, with good results and a public dataset.
For the above methods, as with all machine learning solutions, you need data for training. If you don't have data for your case, then the old, simple edit distance methods (http://norvig.com/spell-correct.html) are the only way.
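For names like man1sh, the edit distance route can be sketched roughly as below. This is only a minimal sketch, not the method from any of the links above: it assumes you have a list of known correct names (KNOWN_NAMES here is made up) and that most errors are digits standing in for look-alike letters (the LOOKALIKES table and the function names are assumptions for illustration).

```python
# Hypothetical list of known-good names; in practice this comes from your own data.
KNOWN_NAMES = ["manish", "vikas", "vikram", "anita"]

# Assumed look-alike substitutions (digit -> letter); extend as needed.
LOOKALIKES = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct_name(raw, known=KNOWN_NAMES, max_distance=2):
    """Map look-alike digits to letters, then pick the closest known name."""
    cleaned = "".join(LOOKALIKES.get(ch, ch) for ch in raw.lower())
    if cleaned in known:
        return cleaned
    best = min(known, key=lambda name: edit_distance(cleaned, name))
    # Fall back to the cleaned string when nothing is close enough.
    return best if edit_distance(cleaned, best) <= max_distance else cleaned


print(correct_name("man1sh"), correct_name("v1kas"))
```

The max_distance cutoff is a guess; with real data you would tune it so that genuinely new names are not forced onto an existing one.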

What's the disadvantage of LDA for short texts?

I am trying to understand why Latent Dirichlet Allocation (LDA) performs poorly in short text environments like Twitter. I've read the paper 'A biterm topic model for short text'; however, I still do not understand "the sparsity of word co-occurrences".
From my point of view, the generation part of LDA is reasonable for any kind of text, but what causes bad results in short texts is the sampling procedure. I am guessing LDA samples a topic for a word based on two parts: (1) the topics of other words in the same doc and (2) the topic assignments of other occurrences of this word. Since part (1) cannot reflect the true topic distribution of a short text, it causes a poor topic assignment for each word.
If you have found this question, please feel free to post your idea and help me understand this.
Probabilistic models such as LDA exploit statistical inference to discover latent patterns of data. In short, they infer model parameters from observations. For instance, there is a black box containing many balls with different colors. You draw some balls out from the box and then infer the distributions of colors of the balls. That is a typical process of statistical inference. The accuracy of statistical inference depends on the number of your observations.
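To put a number on the ball example, here is a small sketch. It assumes the draws are independent and that we estimate the fraction of one color by its sample proportion; the 30% figure and the function name are made up for illustration.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


# Suppose 30% of the balls are red (an assumed value for illustration).
few = proportion_standard_error(0.3, 10)     # a handful of draws
many = proportion_standard_error(0.3, 1000)  # many draws
print(f"10 draws: +/- {few:.3f}, 1000 draws: +/- {many:.3f}")
```

The uncertainty shrinks only with the square root of the number of draws, so a handful of observations leaves the estimate very loose.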
Now consider the problem of LDA over short texts. LDA models a document as a mixture of topics, and each word is then drawn from one of its topics. You can imagine a black box containing tons of words generated from such a model. Now you have seen a short document with only a few words. The observations are obviously too few to infer the parameters. This is the data sparsity problem we mentioned.
Actually, besides the lack of observations, the problem also comes from the over-complexity of the model. Usually, a more flexible model requires more observations to infer. The Biterm Topic Model tries to make topic inference easier by reducing the model complexity. First, it models the whole corpus as a mixture of topics, since inferring the topic mixture over the corpus is easier than inferring the topic mixture over a short document. Second, it supposes each biterm is drawn from a topic. Inferring the topic of a biterm is also easier than inferring the topic of a single word in LDA, since more context is added.
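To make the notion of a biterm concrete, here is a rough sketch of extracting biterms from a toy corpus. It covers only the extraction step, not the topic inference, and it assumes a biterm is any unordered pair of words from the same short text; the corpus and function name are made up, and real preprocessing (stop-word removal and so on) may differ.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of word positions in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Toy corpus of already-tokenized short texts (made up for illustration).
corpus = [
    ["apple", "phone", "launch"],
    ["apple", "pie", "recipe"],
    ["phone", "launch", "event"],
]

# Biterms are pooled over the whole corpus, so co-occurrence counts are
# aggregated across documents instead of being estimated per document.
biterm_counts = Counter(b for doc in corpus for b in extract_biterms(doc))
print(biterm_counts.most_common(3))
```

Each three-word text yields only three biterms on its own, but pooled over the corpus the pair ("launch", "phone") already appears twice.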
I hope the explanation makes sense to you. Thanks for mentioning our paper.
Doing a bit of digging, Hong and Davison (2010) showed up as a great example of LDA not working well for classifying tweets. Unfortunately, they don't really give much insight into why it doesn't work.
I suspect there are two reasons LDA doesn't work well for short documents.
First of all, when working on smaller documents, the extra topic layer doesn't add anything to the classification, and what doesn't help probably hurts. If you have really short documents, like tweets, it's really hard to break documents into topics. There isn't much room for anything but one topic in a tweet, after all. Since the topic layer can't contribute much to the classification, it makes room for error to arise in the system.
Second, linguistically, Twitter users prefer to strip off "unnecessary fluff" when tweeting. When working with full documents, there are features (words, word collocations, etc.) that are probably specific, common, and often repeated within a genre. When tweeting, though, these common elements get dropped first, because what remains when the fluff is removed is what's interesting, new, and more perplexing.
For example, let's look at my own tweets because I believe in shameless self-promotion:
Progressbar.py is a fun little package, though I don't get
a chance to use it too often. it even does ETAs for you
https://pypi.python.org/pypi/progressbar …
From a capitalist perspective, the social sciences exist so
idiot engineers don't waste money on building **** no one needs.
Abstract enough to be reusable, specific enough to be useful.
The first is about Python. If you're parsing the URLs, you'll get that, and the .py would give it to you too. However, in a more expressive medium, I'd probably have put the word "Python" in somewhere. The second is programming related as well, but a bit more on the business end. Not once does it even mention anything specific to programming, though. The last one is programming related too, but ties more into the art of programming, expressing a sort of double bind programmers face while coding. It is as difficult as the second, feature-wise.
In both of those last two examples, had I not been writing a microblog post, these would have immediately been followed up with examples that would have been very useful to a classifier, or themselves included more data. Twitter doesn't have room for that kind of stuff, though, and the content that would typify the genre a tweet belongs to is stripped out.
So, in the end, we have two problems. The length is a problem for LDA, because the topics add an extra, unnecessary degree of freedom, and the tweets are a problem for any classifier, because features typically useful in classification get selectively removed by the authors.

News Article Categorization (Subject / Entity Analysis via NLP?); Preferably in Node.js

Objective: a node.js function that can be passed a news article (title, text, tags, etc.) and will return a category for that article ("Technology", "Fashion", "Food", etc.)
I'm not picky about exactly what categories are returned, as long as the list of possible results is finite and reasonable (10-50).
There are Web APIs that do this (e.g., Alchemy), but I'd prefer not to incur the extra cost (both in terms of external HTTP requests and also $$) if possible.
I've had a look at the node module "natural". I'm a bit new to NLP, but it seems like maybe I could achieve this by training a BayesClassifier on a reasonable word list. Does this seem like a good/logical approach? Can you think of anything better?
I don't know if you are still looking for an answer, but let me put my two cents for anyone who happens to come back to this question.
Having worked in NLP, I would suggest the following approach to solve the problem.
Don't look for a single-package solution. There are no doubt great packages out there for lots of things. But when it comes to active research areas like NLP, ML and optimization, the tools tend to be at least 3 or 4 iterations behind what is out there in academia.
Coming to the core problem: what you want to achieve is text classification.
The simplest way to achieve this would be an SVM multiclass classifier.
Simplest, yes, but also with very, very (note the double stress) reasonable classification accuracy, runtime performance and ease of use.
What you would need to work on is the feature set used to represent your news article (title, text, tags). You could use a bag-of-words model and add named entities as additional features. You could also use the article's location/time as features (though for simple category classification this might not give you much improvement).
The bottom line is that SVMs work great. They have multiple implementations, and at runtime you don't really need much ML machinery.
Feature engineering, on the other hand, is very task-specific. But given a basic set of features and good labelled data, you can train a very decent classifier.
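As a rough illustration of the bag-of-words representation mentioned above, here is a minimal sketch in plain Python. The tokenizer, the function names and the two sample headlines are all assumptions for illustration; a real pipeline would likely add stop-word handling, named entities and some weighting scheme.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Assign a column index to every distinct token, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Count vector for one text; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Made-up headlines used only to build a tiny vocabulary.
articles = [
    "New phone launch announced",
    "Chef shares new pasta recipe",
]
vocab = build_vocabulary(articles)
print(bag_of_words("new phone, new recipe", vocab))
```

The resulting count vectors are what you would hand to the classifier as input features.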
Here are some resources for you.
http://svmlight.joachims.org/
SVM multiclass is what you would be interested in.
And here is a tutorial by SVM zen himself!
http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
I don't know about the stability of this, but from the code it's a binary SVM classifier, which means that if you have a known set of N tags you want to classify the text into, you will have to train N binary SVM classifiers, one for each of the N category tags.
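The "N binary classifiers" scheme (one-vs-rest) can be sketched as follows. To keep the example self-contained, a simple perceptron stands in for the binary SVM, so this shows only the one-vs-rest wiring, not SVM training itself; all names and the toy data are made up for illustration.

```python
def train_binary(samples, labels, epochs=20):
    """Train a perceptron (stand-in for a binary SVM). labels are +1 / -1."""
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: nudge the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def train_one_vs_rest(samples, categories):
    """Train one binary classifier per category: that category vs. the rest."""
    models = {}
    for cat in sorted(set(categories)):
        binary_labels = [1 if c == cat else -1 for c in categories]
        models[cat] = train_binary(samples, binary_labels)
    return models


def predict(models, x):
    """Pick the category whose classifier gives the highest score."""
    def score(model):
        weights, bias = model
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(models, key=lambda cat: score(models[cat]))


# Toy bag-of-words vectors over the vocabulary [phone, recipe, dress].
X = [[2, 0, 0], [1, 0, 0], [0, 2, 0], [0, 1, 0], [0, 0, 2], [0, 0, 1]]
y = ["Technology", "Technology", "Food", "Food", "Fashion", "Fashion"]

models = train_one_vs_rest(X, y)
print(len(models), predict(models, [3, 0, 0]))
```

With a real SVM library you would swap train_binary for that library's binary trainer and keep the surrounding loop the same.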
Hope this helps.

OpenCV 2.4.3 with Visual C++ express cascading classifiers images query

I am learning to implement a hand gesture recognition project. For this, I have gone through several tutorials that use color information, background subtraction, and various object segmentation techniques.
However, the method I would like to use is cascading classifiers, and I don't have much understanding of this approach. I have read several texts and papers and I understand the theory, but I still don't understand what kind of images are good for training a cascading classifier. Is it better to train it on natural color images, or on hand gesture images processed with Canny edge detection or in some other way?
Also, is there any method that uses online training and testing similar to OpenTLD, but where the steps are explained? The OpenCV documentation for 2.3-2.4.3 is incomplete with respect to machine learning, object recognition and tracking, except for the code available at: http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html
I know this is a long question, but I wanted to explain my problem thoroughly. It would help me more to understand the concept than to just use code found online.
Sincere thanks in advance!
If you are thinking about a Haar classifier, a good tutorial is here

Resources