Text Classification using ByT5 - PyTorch

I am trying to pre-train a ByT5 model for a text classification task.
But ByT5 always predicts one specific label, no matter how much hyperparameter tuning I do.
The same data gives decent results with mT5.
Can anyone please tell me what I should do?

Related

BERT for relation extraction

I am working with BERT for relation extraction from a binary-classification TSV file. This is my first time using BERT, so there are some points I need to understand better:
How can I get output such as giving it test data and showing the classification results, i.e. whether each example is classified correctly or not?
How does BERT extract features from the sentences, and is there a way to know which features are chosen?
One time I used the hidden layers and another time I didn't, and the accuracy without the hidden layer was higher than with it. Is there a reason for that?

Is there any way to classify text based on some given keywords using python?

I've been trying to learn a bit of machine learning for a project I'm working on. At the moment I have managed to classify text using an SVM with scikit-learn and spaCy, with good results, but I want to classify the text not only with the SVM but also based on a list of keywords that I have. For example, if the sentence contains the word "fast" or "seconds", I would like it to be classified as performance.
I'm really new to machine learning and I would really appreciate any advice.
I assume that you are already taking a portion of your data, classifying it manually and then using the result as your training data for the SVM algorithm.
If yes, then you could just append your list of keywords (features) and desired classifications (labels) to your training data. If you are not doing it already, I'd recommend using the SnowballStemmer on your training data features.
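A minimal sketch of both ideas in plain Python, so it runs without scikit-learn or spaCy: `augment_training_data` appends each keyword as its own labelled training example, as the answer suggests, and `keyword_label` applies the question's rule ("fast" or "seconds" means performance) before falling back to a model. The keyword map, the function names and the `model_predict` callable are hypothetical stand-ins for your own data and trained SVM.

```python
import re

# Hypothetical keyword -> label map, taken from the example in the question.
KEYWORD_LABELS = {"fast": "performance", "seconds": "performance"}


def augment_training_data(texts, labels, keyword_labels):
    """Append each keyword as a one-word training example labelled with its class."""
    aug_texts = list(texts) + list(keyword_labels.keys())
    aug_labels = list(labels) + list(keyword_labels.values())
    return aug_texts, aug_labels


def keyword_label(text, keyword_labels):
    """Return the label of the first listed keyword found in the text, else None."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in keyword_labels:
            return keyword_labels[token]
    return None


def classify(text, model_predict, keyword_labels=KEYWORD_LABELS):
    """Use the keyword rule when it fires; otherwise defer to the trained model."""
    label = keyword_label(text, keyword_labels)
    return label if label is not None else model_predict(text)


if __name__ == "__main__":
    texts, labels = augment_training_data(["Nice colours"], ["design"], KEYWORD_LABELS)
    print(list(zip(texts, labels)))
    # A dummy model that always answers "other" stands in for the SVM.
    print(classify("The page loads in two seconds", lambda t: "other"))
```

Exact token matching misses inflected forms, e.g. "second" when only "seconds" is listed; stemming both the keywords and the input text (the SnowballStemmer suggestion above) is one way to close that gap.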

Wrong predictions from MNIST Keras model

I am new to neural networks, so I built my first neural network, which is pretty close to the one on the Keras learn page, given below:
https://github.com/aakarsh1011/Neural-Network/blob/master/MNSIT%20classification.ipynb
Kindly look at the end, where I read a random image and tried to predict it: it comes out as a bag, and when trained with epochs=5 it was predicted as a sandal.
Is something wrong with my code or my labeling?
UPDATE: Being new to the field, I didn't know the importance of epochs, so I asked this question; I was afraid of overfitting the model by training it too much. But there is no definite way to choose this; it's all trial and error. GOOD LUCK!
First of all, as far as I can see, your code is correct. Your model's wrong prediction may be caused by it not being trained for long enough. I would highly recommend setting epochs=100; you will be able to see the model's accuracy rise. You should generally give your model plenty of epochs for training, as long as its validation accuracy keeps improving. It will simply take some time. Try out different numbers of epochs to find one that doesn't take too long but still gives an acceptable result.
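The trial-and-error search over epochs can be partly automated with early stopping, a technique swapped in here rather than taken from the answer: training stops once the validation loss has not improved for a set number of epochs (the "patience"). Keras ships this as the `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience` and `restore_best_weights` arguments); the plain-Python sketch below only illustrates the stopping rule, applied to a hypothetical list of per-epoch validation losses.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    val_losses: validation loss recorded after each epoch (0-indexed).
    patience: epochs to wait without improvement before stopping.
    min_delta: minimum decrease in loss that counts as an improvement.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                # Patience exhausted: stop here, keep the weights from best_epoch.
                return best_epoch, epoch
    # Never ran out of patience: training used every epoch.
    return best_epoch, len(val_losses) - 1


if __name__ == "__main__":
    # Hypothetical losses: improvement stalls after epoch 2.
    losses = [0.90, 0.60, 0.50, 0.52, 0.55, 0.58, 0.40]
    print(early_stopping_epoch(losses, patience=3))
```

With `patience=3` the rule stops at epoch 5 and keeps epoch 2, even though epoch 6 would have been better; a larger patience trades extra training time for a lower chance of stopping too early.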

Can I extract significance values for Logistic Regression coefficients in PySpark?

Is there a way to get the significance level of each coefficient we receive after we fit a logistic regression model on training data?
I tried to find a way but could not figure it out myself.
I think I might get the significance level of each feature by running a chi-squared test, but first, I am not sure whether I can run the test on all features together, and second, my data values are numeric, so whether it would give me the right result remains a question as well.
Right now I am running the modeling part using statsmodels and scikit-learn, but I would certainly like to know how I can get these results from PySpark ML or MLlib itself.
If anyone can shed some light on this, it would be helpful.
I only use MLlib. I think that when you train a model you can use the toPMML method to export it in PMML format (an XML file); then you can parse the XML file to get the feature weights. Here is an example:
https://spark.apache.org/docs/2.0.2/mllib-pmml-model-export.html
Hope that will help
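A minimal sketch of the parsing step, using only Python's standard `xml.etree.ElementTree`: it collects the intercept and the `NumericPredictor` coefficients of every `RegressionTable` in a PMML regression model. The `SAMPLE_PMML` string is hand-written for illustration and is not real Spark output; check the structure of your own exported file before relying on the element names. Note that these are raw weights, not significance levels; if p-values are what you need, PySpark's `GeneralizedLinearRegression` with `family="binomial"` exposes `summary.pValues`, which may be closer to the question (worth confirming in the docs for your Spark version).

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def pmml_regression_weights(pmml_text):
    """Return one dict per RegressionTable: target category, intercept, weights."""
    root = ET.fromstring(pmml_text)
    tables = []
    for elem in root.iter():
        if _local(elem.tag) != "RegressionTable":
            continue
        weights = {}
        for child in elem:
            if _local(child.tag) == "NumericPredictor":
                weights[child.get("name")] = float(child.get("coefficient"))
        tables.append({
            "target": elem.get("targetCategory"),
            "intercept": float(elem.get("intercept", "0")),
            "weights": weights,
        })
    return tables


# Hand-written, illustrative PMML (hypothetical field names and values).
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-0.5" targetCategory="1">
      <NumericPredictor name="field_0" exponent="1" coefficient="1.25"/>
      <NumericPredictor name="field_1" exponent="1" coefficient="-0.75"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>"""

if __name__ == "__main__":
    for table in pmml_regression_weights(SAMPLE_PMML):
        print(table)
```

For an exported file on disk, `ET.tostring(ET.parse(path).getroot())` or simply reading the file's text and passing it to the function both work.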

Timeline Detection

I am trying to solve a timeline detection problem using text classification. As a newbie, I am confused about how to go about this. Is this a classification problem? That is, can I use the years (timelines) as outcomes and solve this as a classification problem?
You should be able to solve this as a classification problem as you suggest. An option could be to find or build a corpus consisting of texts tagged with the period in which they're set, and train a classification algorithm on this data set.
Another option could be to train a word space model on such a data set, and generate vectors for different periods of time (e.g. the 50s, 60s etc.). You could then create a document vector for the text you wish to classify, and find which of these time vectors yields the best match.
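A minimal sketch of the period-vector idea, under a loud simplification: plain bag-of-words counts stand in for a trained word space model (such as word2vec or random indexing), each period's vector is the summed counts of its tagged texts, and a new text is assigned to the period with the highest cosine similarity. The tiny corpus and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words count vector; a stand-in for a real word space vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def period_vectors(tagged_texts):
    """Sum the count vectors of all texts tagged with each period."""
    vectors = {}
    for text, period in tagged_texts:
        vectors.setdefault(period, Counter()).update(bow(text))
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_period(text, vectors):
    """Return the period whose vector best matches the document vector."""
    doc = bow(text)
    return max(vectors, key=lambda period: cosine(doc, vectors[period]))


if __name__ == "__main__":
    corpus = [
        ("the jukebox played rock and roll at the diner", "1950s"),
        ("the jukebox and the drive-in were full", "1950s"),
        ("we chatted on the smartphone and streamed music", "2010s"),
        ("she posted the video to social media from her smartphone", "2010s"),
    ]
    vecs = period_vectors(corpus)
    print(best_period("he checked his smartphone before streaming a film", vecs))
```

This is effectively a nearest-centroid classifier, so it also doubles as a simple baseline for the plain classification approach in the previous paragraph.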
Might not work, but it could be interesting to see what results you get.
Hope this helps!
