NLTK MaxentClassifier train with negative cases

I am new to the NLTK library and I am trying to teach my classifier some labels using my own corpus.
For this I have a file with IOB tags like this:
How O
do B-MYTag
you I-MYTag
know O
, O
where B-MYTag
to O
park O
? O
I do this by:
self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)
and it works.
How can I train my classifier with negative cases?
I would have a similar file with IOB tags, and I would specify that this file is tagged incorrectly (negative weights).
How can I do this?
An example of a negative case would be:
How B-MYTag
do O
you O
know O
, O
where B-MYTag
to O
park O
? O
After that, I expect the classifier to remember that "How" is probably not a MYTag.
The reason for this is to make the classifier learn faster.
If I could just type in statements, the program would process them and at the end ask me whether I am satisfied with the result. If I am, the text would be added to train_set; if not, it would be added to negative_train_set.
This way, it would be easier and faster to teach the classifier the right things.

I'm guessing that you tried a classifier, saw some errors in the results, and want to feed back the wrong outputs as additional training input. There are learning algorithms that optimize on the basis of which answers are wrong or right (neural nets, Brill rules), but the MaxEnt classifier is not one of them. Classifiers that do work like this do all the work internally: They tag the training data, compare the result to the gold standard, adjust their weights or rules accordingly, and repeat again and again.
In short: You can't use incorrect outputs as a training dataset. The idea doesn't even fit the machine learning model, since training data is by assumption correct, so incorrect inputs have probability zero. Focus on improving your classifier by using better features, more data, or a different engine.
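As one illustration of the "better features" suggestion, here is a minimal sketch of turning the IOB lines from the question into the list of (featureset, label) pairs that NLTK classifiers are trained on. The helper names (read_iob, token_features, build_train_set) and the chosen features are hypothetical, not something the question or NLTK prescribes.

```python
def read_iob(lines):
    """Parse 'token TAG' lines into a list of (token, tag) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last space so the tag is always the final field.
        token, tag = line.rsplit(" ", 1)
        pairs.append((token, tag))
    return pairs


def token_features(tokens, i):
    """Hypothetical feature set for the token at position i (an assumption)."""
    return {
        "word": tokens[i].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
        "is_title": tokens[i].istitle(),
    }


def build_train_set(pairs):
    """Build (featureset, label) pairs, one per token."""
    tokens = [token for token, _ in pairs]
    return [(token_features(tokens, i), tag) for i, (_, tag) in enumerate(pairs)]


# The correctly tagged sentence from the question.
iob_lines = [
    "How O", "do B-MYTag", "you I-MYTag", "know O", ", O",
    "where B-MYTag", "to O", "park O", "? O",
]
train_set = build_train_set(read_iob(iob_lines))
print(train_set[1])
```

The resulting train_set has the shape expected by the nltk.MaxentClassifier.train call shown in the question. Adding context features such as the neighbouring words is one way to give the model more evidence that "How" at the start of a sentence is not a MYTag, without needing negative examples.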

Related

Training/Predicting with CNN / ResNet on all classes each iteration - concatenation of input data + Hungarian algorithm

So I've got a simple pytorch example of how to train a ResNet CNN to learn MNIST labeling from this link:
https://zablo.net/blog/post/using-resnet-for-mnist-in-pytorch-tutorial/index.html
It's working great, but I want to hack it a bit so that it does 2 things. First, instead of predicting digits, it predicts animal shapes/colors for a project I'm working on. That's already working quite well and I'm happy with it.
Second, I'd like to hack the training (and possibly the layers) so that prediction is done in parallel on multiple images at a time. In the MNIST example, prediction (or output) would basically be done for an image that has 10 digits at a time, concatenated by me. For clarity, each 10-image input will have the digits 0-9 appearing only once each. The key here is that each of the 10 digits gets a unique class/label from the CNN/ResNet and each class gets assigned exactly once. Digits with high confidence should also prevent digits with lower confidence from using that label (a Hungarian-algorithm type of approach).
So in my use case I want to train on concatenated images (not single images) as in Fig A below and force the classifier to learn to predict the best unique label for each of the concatenated images and do this all at once. Such an approach should outperform single image classification - and it's particularly useful for my animal classification because otherwise the CNN can sometimes return the same ID for multiple animals which is impossible in my application.
I can already predict in series as in Fig B below. And indeed, by looking at the confidence of each prediction, I am able to implement a Hungarian-algorithm-like approach post-prediction to assign the best (most confident) unique IDs in each batch of 4 animals. But this doesn't always work, and I'm wondering if ResNet can try to learn the greedy Hungarian assignment as well.
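For reference, the post-prediction step described above can be sketched as follows. This is not the asker's code: it is a brute-force search over all permutations, which finds the same optimum as the Hungarian algorithm but is only practical for small groups such as 4 animals. The confidence values and the function name best_unique_assignment are made up for illustration.

```python
from itertools import permutations


def best_unique_assignment(confidences):
    """Return the label permutation that maximizes total confidence.

    confidences[i][j] is the confidence that image i has label j.
    The matrix is assumed to be square (as many labels as images).
    """
    n = len(confidences)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # perm[i] is the label given to image i; each label is used once.
        score = sum(confidences[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


# Hypothetical per-image confidences for a group of 4 animals.
confidences = [
    [0.90, 0.05, 0.03, 0.02],
    [0.60, 0.30, 0.05, 0.05],  # argmax alone would also pick label 0 here
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.15, 0.20, 0.60],
]
labels, total = best_unique_assignment(confidences)
print(labels, total)
```

Taking the argmax of each row independently would assign label 0 to both of the first two images; the joint search pushes the second image to label 1 instead. For larger groups, scipy.optimize.linear_sum_assignment solves the same assignment problem in polynomial time.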
In particular, it's not clear that simply augmenting the input data and labels in the training set will implement A automatically, because I don't know how to penalize or disallow returning the same label twice for each group of images. For now I can generate these training datasets like this:
print (train_loader.dataset.data.shape)
print (train_loader.dataset.targets.shape)
torch.Size([60000, 28, 28])
torch.Size([60000])
And I guess I would want the targets to be [60000, 10]. And each input image would be [1, 28, 28, 10]? But I'm not sure what the correct approach would be.
Any advice or available links?
I think this is a specific type of training, but I forgot the name.

How to get the probability of a particular token(word) in a sentence given the context

I'm trying to calculate the probability, or any type of score, for words in a sentence using NLP. I've tried this with the GPT2 model using the Huggingface Transformers library, but I couldn't get satisfactory results because of the model's unidirectional nature, which didn't seem to predict within context. So I was wondering whether there is a way to calculate this using BERT, since it's bidirectional.
I found this related post the other day, but it had no answer that was useful for me.
Hope I will be able to receive ideas or a solution for this. Any help is appreciated. Thank you.

BERT is trained as a masked language model, i.e., it is trained to predict tokens that were replaced by a [MASK] token.
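import torch
from transformers import AutoTokenizer, BertForMaskedLM
tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = BertForMaskedLM.from_pretrained("bert-base-cased")
# Encode a sentence with one masked position; encode() adds [CLS] and [SEP].
input_idx = tok.encode(f"The {tok.mask_token} were the best rock band ever.")
# The first element of the model output holds the logits: (batch, sequence, vocabulary).
logits = bert(torch.tensor([input_idx]))[0]
# Most likely vocabulary id at every position of the single sentence in the batch.
prediction = logits[0].argmax(dim=1)
# Position 2 is the [MASK] token (after [CLS] and "The").
print(tok.convert_ids_to_tokens(prediction[2].item()))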
It prints token no. 11581 which is:
Beatles
To get a normalized probability distribution over BERT's vocabulary, you can normalize the logits using the softmax function over the vocabulary dimension, i.e., F.softmax(logits, dim=-1) (assuming the standard import torch.nn.functional as F).
The tricky thing is that words might be split into multiple subwords. You can simulate that by adding multiple [MASK] tokens, but then you have the problem of how to reliably compare the scores of predictions of different lengths. I would probably average the probabilities, but maybe there is a better way.
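To make the averaging idea concrete, here is a toy sketch in plain Python with no BERT involved. The four-entry vocabulary, the logits, and the word_score helper are invented for illustration; with a real model the logits would come from the masked positions. It computes a softmax over the vocabulary for each masked position and then scores a multi-subword candidate by the arithmetic mean of its subword probabilities, with the geometric mean shown as one possible alternative (an assumption, not something the answer proposes).

```python
import math


def softmax(logits):
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and logits for two consecutive [MASK] positions.
vocab = ["Beat", "##les", "Stone", "##s"]
logits_per_mask = [
    [2.0, 0.0, 1.0, 0.0],  # first masked position
    [0.0, 3.0, 0.0, 1.0],  # second masked position
]
probs_per_mask = [softmax(row) for row in logits_per_mask]


def word_score(subwords):
    """Score a candidate word given as one subword per masked position."""
    probs = [probs_per_mask[pos][vocab.index(sub)] for pos, sub in enumerate(subwords)]
    arithmetic = sum(probs) / len(probs)
    # Geometric mean, i.e. the exponential of the average log-probability.
    geometric = math.exp(sum(math.log(p) for p in probs) / len(probs))
    return arithmetic, geometric


print(word_score(["Beat", "##les"]))
print(word_score(["Stone", "##s"]))
```

Both means are per-subword averages, so candidates that span different numbers of [MASK] tokens land on a comparable scale. Whether either average ranks candidates well in practice would need to be checked on real data.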

How to make OneClassSVM model more accurate? (Scikit-learn)

I have been attempting to classify an author using multiple texts written by this author, which I would then use to find similarities in other texts to identify that author in the test group.
I have been successful with some of the predictions; however, I am still getting results where it fails to predict the author.
I have pre-processed the texts beforehand with stemming, tokenizing, stop-word removal, punctuation removal, etc. in an attempt to make it more accurate.
I am unfamiliar with how exactly the OneClassSVM parameters work. What parameters could I use to best suit my problem, and how could I make my model more accurate in its predictions?
Here is what I have so far:
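from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM
vectorizer = TfidfVectorizer()
author_corpus = self.pre_process(author_corpus)
test_corpus = self.pre_process(test_corpus)
train = author_corpus
test = test_corpus
# Fit the vocabulary on the author's texts only, then reuse it for the test texts.
train_vectors = vectorizer.fit_transform(train)
test_vectors = vectorizer.transform(test)
# gamma is ignored by the linear kernel; it only affects 'rbf', 'poly' and 'sigmoid'.
model = OneClassSVM(kernel='linear', gamma='auto', nu=0.01)
model.fit(train_vectors)
# predict() returns +1 for inliers (same author) and -1 for outliers.
test_predictions = model.predict(test_vectors)
print(test_predictions[:10])
print(model.score_samples(test_vectors)[:10])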

You can use an SVM, but deep learning is really well-suited for this. I did a Kaggle competition on classifying documents, and deep learning was amazing for it.
If you don't think you have a big enough dataset, you might want to just take a text classifier model and re-train the last layer on your author, then fine-tune the rest of the model.

I’ve heard positive things about Andrew Ng’s deep learning class on Coursera. I learned all I know about AI using the Microsoft Professional Certification in AI on edx.

Text representations: How to differentiate between strings of similar topic but opposite polarities?

I have been clustering a corpus: I group sentences together by computing their tf-idf and checking for similarity weights above a certain threshold, using a gensim model.
tfidf_dic = DocSim.get_tf_idf()
ds = DocSim(model,stopwords=stopwords, tfidf_dict=tfidf_dic)
sim_scores = ds.calculate_similarity(source_doc, target_docs)
The problem is that despite setting high threshold values, sentences of similar topics but opposite polarities get clustered together:
Here is an example of the similarity weights obtained between "don't like it" & "i like it"
Are there any other methods, libraries or alternative models that can differentiate the polarities effectively by assigning them very low similarities or opposite vectors?
This is so that the outputs "i like it" and "dont like it" are in separate clusters.
PS: Pardon me if there are any conceptual errors as I am rather new to NLP. Thank you in advance!

The problem is in how you represent your documents. Tf-idf is good for representing long documents where keywords play a more important role. Here, it is probably the idf part of tf-idf that disregards the polarity because negative particles like "no" or "not" will appear in most documents and they will always receive a low weight.
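The idf effect can be seen in a small, self-contained toy example. This is a sketch with an invented six-sentence corpus and a plain log(N/df) idf, not the gensim-based DocSim pipeline from the question; the helper names tfidf_vectors and cosine are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse tf-idf vectors (dicts) and the idf table, using idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}
    vectors = [
        {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented corpus in which "not" occurs in most sentences.
docs = [
    "i do not like it",
    "i like it",
    "this is not cheap",
    "it is not working",
    "we do not know",
    "that was not expected",
]
vectors, idf = tfidf_vectors(docs)
similarity = cosine(vectors[0], vectors[1])
print(round(idf["not"], 3), round(idf["like"], 3), round(similarity, 3))
```

In this toy corpus "not" receives the lowest idf of all tokens, so the two sentences with opposite polarity still come out with a cosine similarity above 0.8.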
I would recommend trying some neural embeddings that might capture the polarity. If you want to keep using Gensim, you can try doc2vec, but you would need quite a lot of training data for that. If you don't have much data to estimate the representation, I would use some pre-trained embeddings.
Even averaging word embeddings might work (you can load FastText embeddings in Gensim). Alternatively, if you want a stronger model, you can try BERT or another large pre-trained model from the Transformers package.

Unfortunately, simple text representations based merely on the sets-of-words don't distinguish such grammar-driven reversals-of-meaning very well.
The method needs to be sensitive to meaningful phrases, and the hierarchical, grammar-driven inter-word dependencies, to model that.
Deeper neural networks using convolutional/recurrent techniques do better, or methods which tree-model sentence-structure.
For ideas see for example...
"Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank"
...or a more recent summary presentation...
"Representations for Language: From Word Embeddings to Sentence Meanings"

SVM training to infer point position with respect to two other points in a video

I would like to train an SVM with OpenCV in C++ so as to infer the position of a point in the image with respect to two other points to which the wanted point is related.
Basically, I have the trajectories of the three points during a whole video and I would like to use these trajectories as training data for the SVM.
I'm new to machine learning techniques, and after some reading I think I've understood that an SVM returns a boolean result (true if some conditions are satisfied at the same time, false if not). In my case I need a position in the image as the result.
I'm not sure how I should organize the training set, I was thinking to do something like that:
T1 T2 T3 label=1
where T1 T2 and T3 contain all the points belonging to the three trajectories that I know as correct;
T1 T2 T4 label=-1
where T1 and T2 are the same as before while T4 contains random points that don't lie on the trajectory T3.
Once I have trained the SVM with different trajectories from different videos I would like to pass three points: P1(x,y) and P2(x,y) corresponding to T1 and T2 at time t and a random point P(x,y), and the SVM should predict if the random point is in the wanted position or not.
Could anybody explain to me whether this approach is wrong, and why?
Thanks

This approach is wrong mostly because your problem is not a binary classification problem. It is rather a regression problem. Your desired output is a value, not a binary number, so training an SVM, or any other binary classifier, is a bad idea. A classification problem is a search for a mapping from your input data into some finite (and small) set of possible labels (like "true" and "false", or "cat", "dog" or "face"). Regression, on the other hand, is a search for a mapping from your input data into a (possibly multi-dimensional) real-valued space, so instead of labels you are looking for actual values. In your case you are looking for coordinates, which are (as I suppose) two real numbers. If you model your problem as a binary classification then:
There is no sensible way of creating a training set (you have only "positive" examples; you can generate "negative" ones by taking points which are not correct, but nearly all points are incorrect; it would be better to train a one-class SVM, but as mentioned before, it is not a classification problem at all)
Actual testing would be of horrible complexity, as you have to ask for each point "is it a correct answer?"
Instead, you should train any regression model with data of form
(point_1, point_2) -> point_3
so model can find a function which maps your two input points onto one output point. There are many possible models for this task:
linear regression
neural network
SVR (support vector regression)
In short:
your output is a label, discrete value from the finite set -> classifier
your output is a continuous value -> regression model
If it is still not clear to you, I suggest a good video from Stanford University:
http://www.youtube.com/watch?v=5RLRKkzYWuQ
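As a rough sketch of the regression setup described above, (point_1, point_2) -> point_3, here is an ordinary least-squares linear regression in plain Python. It is written in Python rather than the OpenCV C++ the question uses, and the data is synthetic: the third point is assumed, purely for illustration, to be the midpoint of the other two shifted by (1, -1). The function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b an n x m list of lists.
    """
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))] for i in range(n)]


def fit_linear_regression(inputs, targets):
    """Least-squares fit via the normal equations (X^T X) W = X^T Y."""
    rows = [list(x) + [1.0] for x in inputs]  # append a bias term
    d, k = len(rows[0]), len(targets[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[j] for r, t in zip(rows, targets)) for j in range(k)]
           for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, x):
    """Predict the output point for one input (x1, y1, x2, y2)."""
    row = list(x) + [1.0]
    return tuple(sum(v * w[j] for v, w in zip(row, weights))
                 for j in range(len(weights[0])))


# Synthetic samples: each input is (x1, y1, x2, y2) for P1 and P2 at one time step.
inputs = [
    (0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.0),
    (2.0, 1.0, 1.0, 3.0), (1.0, 2.0, 3.0, 1.0), (3.0, 3.0, 2.0, 2.0),
]
# Assumed relation for the demo: P3 = midpoint(P1, P2) + (1, -1).
targets = [((x1 + x2) / 2 + 1.0, (y1 + y2) / 2 - 1.0) for x1, y1, x2, y2 in inputs]

weights = fit_linear_regression(inputs, targets)
print(predict(weights, (4.0, 0.0, 2.0, 6.0)))
```

The model outputs the coordinates of the third point directly, so there is no need to test candidate points one by one. If the real relation between the trajectories is not linear, a neural network or SVR, as listed in the answer, may fit better.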
