I'm using the official FastText Python library (v0.9.2) for intent classification.
import fasttext
model = fasttext.train_supervised(input='./test.txt',
                                  loss='softmax',
                                  dim=200,
                                  bucket=2000000,
                                  epoch=25,
                                  lr=1.0)
Here test.txt contains just one sample, like:
__label__greetings hi
When I predict two utterances, the results are:
print(model.words)
print('hi', model.predict('hi'))
print('bye', model.predict('bye'))
app_1 | ['hi']
app_1 | hi (('__label__greetings',), array([1.00001001]))
app_1 | bye ((), array([], dtype=float64))
This is my expected output. However, if I set two samples for the same label:
__label__greetings hi
__label__greetings hello
the result for the OOV input is not correct.
app_1 | ['hi', '</s>', 'hello']
app_1 | hi (('__label__greetings',), array([1.00001001]))
app_1 | bye (('__label__greetings',), array([1.00001001]))
I understand that the problem is with the </s> token (maybe the \n in the text file?): when no word of the input is in the vocabulary, the text is replaced by </s>. Is there any training option or other way to avoid this behavior?
Thanks!
In addition to gojomo's answer: your training dataset is far too small.
If you don't have a significant annotated dataset, you can try zero-shot classification: starting from a pretrained language model, you only define some labels and let the model try to classify the sentences.
Here you can see and test an interesting demo.
Also read this good article about zero-shot classification, covering both theory and implementation.
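For illustration only (not part of the linked demo), here is a minimal sketch using the Hugging Face transformers zero-shot pipeline; the checkpoint and candidate labels are assumptions chosen for this example:

from transformers import pipeline

# Zero-shot classification: no task-specific training data, only candidate labels.
# The model name is an assumption; any NLI-based checkpoint works similarly.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier("hi there, how are you?",
                    candidate_labels=["greetings", "farewell", "question"])
print(result["labels"][0], result["scores"][0])  # best label and its score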
FastText is a big, data-hungry algorithm that starts with random-initialization. You shouldn't expect results to be sensible or indeed match any set of expectations on toy-sized datasets - where (for example) 100%-minus-epsilon of your n-gram buckets won't have received any training.
I also wouldn't expect supervised mode to ever reliably predict no labels on realistic data-sets – it expects all of its training data to have labels, and I've not seen mention of its use to predict an implied 'ghost' category of "not in training data" versus a single known label (as in 'one-class classification').
(Speculatively, I think you might have to feed FastText supervised mode explicitly __label__not-greetings labeled contrast data – perhaps just synthesized random strings if you've got nothing else – in order for it to have any hope of meaningfully predicting "not-greetings".)
Given that, I'd not consider your first result for the input bye correct, nor the second result not correct. Both are just noise results from an undertrained model being asked to make a kind of distinction it's not known for being able to make.
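To make the contrast-data suggestion above concrete, here is a rough sketch; the synthesized random strings and the parameters are illustrative assumptions, and on data this small the results will still be noisy:

import os
import random
import string
import tempfile

import fasttext

# Synthesize some __label__not-greetings contrast examples, as suggested above.
def random_token(length=5):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))

lines = ["__label__greetings hi", "__label__greetings hello"]
lines += ["__label__not-greetings " + random_token() + " " + random_token()
          for _ in range(50)]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(lines) + "\n")
    train_path = f.name

model = fasttext.train_supervised(input=train_path, epoch=25, lr=1.0)
print(model.predict("hi"))
print(model.predict("bye"))   # with luck, now leans towards __label__not-greetings
os.remove(train_path)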
I am doing multi-label text classification using a pre-trained BERT model. Here is an example of the prediction made for one sentence:
[screenshot: predicted label probabilities]
I want to get the words in the sentence on which the prediction was based, like this: [screenshot: highlighted words]
If anyone has any idea, please enlighten me.
Multi-Label Text Classification (first image) and Token Classification (second image) are two different tasks, each of which the model needs to be specifically trained for.
The first returns a probability for each label considering the entire sentence. The second returns such predictions for each individual word in the sentence, usually while considering the rest of the sentence as context.
So you cannot really take the output of a Text Classifier and use it for Token Classification, because the information you get is not detailed enough.
What you can and should do is train a Token Classification model, although you obviously will need token-level-annotated data to do so.
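As a minimal sketch of what token classification looks like in practice (assuming the Hugging Face transformers library; the NER checkpoint below is just a placeholder for a model trained on your own token-level annotations):

from transformers import pipeline

# Token classification assigns a label (and score) to each word, not to the
# whole sentence. The NER checkpoint is only a stand-in for whatever
# token-level model you end up training on your own annotated data.
token_classifier = pipeline("token-classification",
                            model="dslim/bert-base-NER",
                            aggregation_strategy="simple")

for entity in token_classifier("Angela Merkel visited Paris last week."):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))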
I am working with BERT for relation extraction from a binary classification TSV file. It is my first time using BERT, so there are some points I need to understand better:
How can I get an output where, given test data, the classification results are shown along with whether each example was classified correctly or not?
How does BERT extract features from the sentences, and is there a method to see which features were chosen?
I once used the hidden layers and another time I did not, and the accuracy without the hidden layers was higher than with them. Is there a reason for that?
I'm trying to build a Keras model to classify text into 45 different classes. I'm a little confused about preparing my data in the input format required by Google's BERT model.
Some blog posts insert the data as a TF dataset with input_ids, segment ids, and mask ids, as in this guide, but some only use input_ids and masks, as in this guide.
Also, the second guide notes that the segment mask and attention mask inputs are optional.
Can anyone explain whether or not those two are required for a multiclass classification task?
If it helps, each row of my data can consist of any number of sentences within a reasonably sized paragraph. I want to be able to classify each paragraph/input to a single label.
I can't seem to find many guides/blogs about using BERT with Keras (TensorFlow 2) for a multiclass problem; indeed, many of them are for multi-label problems.
I guess it is too late to answer, but I had the same question. I went through the Hugging Face code and found that if the attention mask and segment (token type) ids are None, then by default the model pays attention to all tokens and all segments are given id 0.
If you want to check it out, you can find the code here
Let me know if this clarifies it or you think otherwise.
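If you want a quick sanity check of that behavior, here is a small sketch (assuming the Hugging Face transformers TF classes; on a single, unpadded sequence, omitting the optional inputs should not change the output):

import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

enc = tokenizer("a short example sentence", return_tensors="tf")

# Passing all three inputs explicitly...
full = model(input_ids=enc["input_ids"],
             attention_mask=enc["attention_mask"],
             token_type_ids=enc["token_type_ids"])

# ...versus relying on the defaults (attend to every token, segment id 0).
defaults = model(input_ids=enc["input_ids"])

# With a single unpadded sequence the two outputs should be (near) identical.
print(tf.reduce_max(tf.abs(full.last_hidden_state - defaults.last_hidden_state)))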
I have a word2vec model and I use it to embed all the words in my train and test sets, but the word2vec model does not contain the proper words. Can I use a random vector as the embedding for all the proper words?
If I can, please give me some tips and some paper references.
Thank you
It's not clear what you're asking; in particular what do you mean by "proper words"?
But, if after training, words that you expect to be in the model aren't in the model, that is usually caused by either:
(1) Problems with how you preprocessed/tokenized your corpus, so that the words you thought were provided were not. So double check what data you're passing to training.
(2) A mismatch of parameters and expectations. For example, if performing training with a min_count of 5 (the default in some word2vec libraries), any words occurring fewer than 5 times will be ignored, and thus not receive word-vectors. (This is usually a good thing for overall word-vector quality, as low-frequency words can't get good word-vectors for themselves, yet by being interleaved with other words can still mildly interfere with those other words' training.)
Usually double-checking inputs, enabling logging and watching for any suspicious indicators of problems, and carefully examining the post-training model for what it does contain can help deduce what went wrong.
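As a small illustration of point (2), here is a sketch using gensim (assumed here as the word2vec library; the toy corpus is made up):

from gensim.models import Word2Vec

# Toy corpus: "hello" and "world" occur 11 times each, "rare" only once.
sentences = [["hello", "world"]] * 10 + [["hello", "rare", "world"]]

# With min_count=5 (gensim's default), "rare" is silently dropped from the
# vocabulary, so it ends up with no word-vector at all.
model = Word2Vec(sentences, vector_size=50, min_count=5, epochs=5)
print("rare" in model.wv.key_to_index)   # False

# Lowering min_count keeps it, at the cost of a noisy, poorly-trained vector.
model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=5)
print("rare" in model.wv.key_to_index)   # True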
I am new to the nltk library and I am trying to teach my classifier some labels with my own corpus.
For this I have a file with IOB tags like this:
How O
do B-MYTag
you I-MYTag
know O
, O
where B-MYTag
to O
park O
? O
I do this by:
self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)
and it works.
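(For reference, train_set here is presumably the usual NLTK format, a list of (featureset, label) pairs built from the IOB file; a hypothetical sketch:)

# Hypothetical reconstruction of train_set from the IOB file above:
# one (featureset, label) pair per token.
train_set = [
    ({"word": "How",  "prev": "<START>"}, "O"),
    ({"word": "do",   "prev": "How"},     "B-MYTag"),
    ({"word": "you",  "prev": "do"},      "I-MYTag"),
    ({"word": "know", "prev": "you"},     "O"),
    # ... and so on for the remaining tokens
]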
How can I train my classifier with negative cases?
I would have a similar file with IOB tags, and I would specify that this file is labeled wrong (negative weights).
How can I do this?
An example of a negative case would be:
How B-MYTag
do O
you O
know O
, O
where B-MYTag
to O
park O
? O
After that, I would expect the classifier to remember that How is probably not a MYTag...
The reason for this is so the classifier learns faster.
If I could just type in statements, the program would process them and at the end ask me whether I am satisfied with the result. If I am, the text would be added to train_set; if not, it would be added to negative_train_set.
This way, it would be easier and faster to teach the classifier the right stuff.
I'm guessing that you tried a classifier, saw some errors in the results, and want to feed back the wrong outputs as additional training input. There are learning algorithms that optimize on the basis of which answers are wrong or right (neural nets, Brill rules), but the MaxEnt classifier is not one of them. Classifiers that do work like this do all the work internally: They tag the training data, compare the result to the gold standard, adjust their weights or rules accordingly, and repeat again and again.
In short: You can't use incorrect outputs as a training dataset. The idea doesn't even fit the machine learning model, since training data is by assumption correct so incorrect inputs have probability zero. Focus on improving your classifier by using better features, more data, or a different engine.
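As a hedged sketch of the suggested alternative (feeding corrected, positively labeled examples back into the training set rather than a "negative" file), using NLTK's MaxentClassifier with a made-up feature extractor:

import nltk

# Hypothetical per-token feature extractor; yours will differ.
def extract_features(tokens, i):
    return {"word": tokens[i], "prev": tokens[i - 1] if i > 0 else "<START>"}

tokens = ["How", "do", "you", "know", ",", "where", "to", "park", "?"]
gold   = ["O", "B-MYTag", "I-MYTag", "O", "O", "B-MYTag", "O", "O", "O"]

train_set = [(extract_features(tokens, i), tag) for i, tag in enumerate(gold)]
classifier = nltk.MaxentClassifier.train(train_set, trace=0)

# When you spot a wrong prediction, add the *correctly labeled* example back to
# train_set and retrain -- there is no supported "negative weight" training set.
featureset = extract_features(tokens, 0)
if classifier.classify(featureset) != gold[0]:
    train_set.append((featureset, gold[0]))
    classifier = nltk.MaxentClassifier.train(train_set, trace=0)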