F1 score closely related to accuracy score in a classification model

I am running into an interesting problem. It might not be a problem, but I think it is. I trained an artificial neural network classification model and generated an accuracy score and an F1 score to compare after completing the model. My accuracy score was 61.90% and my F1 score was 61.87%. I am working with an imbalanced dataset, by the way.
I found an article saying that the two scores can end up this close, but I am not sure. This might be considered a "beginner question," and I am sorry if it is.
Have any of you run into this before? What could it possibly suggest about my model or dataset?
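For context, here is a minimal sketch of how such a comparison is typically produced with scikit-learn (the post does not show code, so the library choice and the toy labels are assumptions); note that f1_score's average argument decides how class imbalance is weighted.

# Hedged sketch: toy labels/predictions, not the poster's data.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 0, 0, 1, 1, 2]   # hypothetical ground truth (imbalanced)
y_pred = [0, 0, 0, 1, 1, 0, 2]   # hypothetical model predictions

acc = accuracy_score(y_true, y_pred)
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # often tracks accuracy
f1_macro = f1_score(y_true, y_pred, average="macro")        # treats all classes equally

print(acc, f1_weighted, f1_macro)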

Related

Training accuracy is high, but prediction accuracy on the training set is low

I am using Keras 2.2.4 to train a text-based emotion recognition model; it is a three-class classification task.
I put accuracy in the metrics of model.compile(). While training, the accuracy shown in the progress bar was normal, around 86%, and the loss was around 0.35, so I believed it was working properly.
After training, I found that prediction on the test set was pretty bad: accuracy was only around 0.44. My instinct said it might be an overfitting issue. However, out of curiosity, I fed the training set into the model for prediction, and the accuracy was also pretty bad, the same as on the test set.
This result suggests it might not be an overfitting issue, and I cannot come up with any possible reason why this happens. Also, the difference between the accuracy reported during training on the training set and the accuracy on that same training set after training is even more confusing.
Has anyone encountered the same situation, and what might the problem be?
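A minimal, self-contained sketch of the kind of check described above (the toy model and random data are assumptions, not the poster's code): compare the accuracy Keras reports during fit() with a fresh evaluate() on the same training data.

import numpy as np
from tensorflow import keras

# Toy stand-ins for the real text features and 3-class emotion labels.
X_train = np.random.rand(500, 20).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 3, 500), 3)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
# Note: older standalone Keras (e.g. 2.2.4) logs this metric under the key "acc".
print("accuracy reported during training:", history.history["accuracy"][-1])

loss, acc = model.evaluate(X_train, y_train, verbose=0)
print("accuracy on the same training data after training:", acc)
# A large gap between these two numbers usually points to a mismatch in how the
# data is prepared for fit() versus predict()/evaluate(), rather than to overfitting.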

Difference between model accuracy from test data and confusion matrix accuracy

I am working on an NLP project where I want to do text classification using a neural network.
I am getting very good accuracy on the test set: 98%.
But when I check the accuracy computed from the confusion matrix, it's just 52%.
How is that possible? What am I missing here?
Question
What is the difference between the two accuracies? Which one should be considered the actual accuracy, and why?
Code on test set
loss, acc = model.evaluate(Xtest, y_test_array)
It looks like your dataset has class imbalance, and the metric calculated from the confusion matrix (it is NOT accuracy; it is probably something like an F1 score) is low because the minority class is recognized poorly. At the same time, accuracy is high because the majority class is recognized well.
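To make the distinction concrete, here is a minimal sketch (assuming scikit-learn; the toy data is illustrative, not the poster's): accuracy derived from the confusion matrix matches accuracy_score by definition, while F1 can be far lower under class imbalance.

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

y_true = np.array([0] * 95 + [1] * 5)   # 95% majority class
y_pred = np.zeros(100, dtype=int)       # a model that always predicts the majority class

cm = confusion_matrix(y_true, y_pred)
cm_accuracy = np.trace(cm) / cm.sum()   # (TN + TP) / total, i.e. accuracy computed from the matrix

print("accuracy_score:            ", accuracy_score(y_true, y_pred))              # 0.95
print("accuracy from conf. matrix:", cm_accuracy)                                 # 0.95, identical
print("F1 for the minority class: ", f1_score(y_true, y_pred, zero_division=0))   # 0.0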

Sentiment analysis using images

I am trying sentiment analysis of images.
I have 4 classes: hilarious, very funny, funny, not funny.
I tried pre-trained models like VGG16/19 and DenseNet201, but my model is overfitting: training accuracy is more than 95% while test accuracy is around 30%.
Can someone give suggestions on what else I can try?
Training images: 6K
You can try the following to reduce overfitting:
Implement Early Stopping: compute the validation loss at each epoch and stop once it has not improved for a patience threshold of epochs (a sketch follows this answer).
Implement Cross Validation: refer to the Cross-validation section in https://cs231n.github.io/classification/#val
Use Batch Normalisation: it normalises the activations of layers to unit variance and zero mean, which improves model generalisation.
Use Dropout (either on its own or together with batch norm): it randomly zeros some activations, which encourages the network to use all of its neurons.
Also, if your dataset isn't too challenging, make sure you don't use an overly complex model and overkill the task.
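For the early-stopping item above, here is a minimal Keras-style sketch (the frozen VGG16 backbone, dropout rate, and random toy data are assumptions used only to make it runnable):

import numpy as np
from tensorflow import keras

# Frozen pre-trained backbone to limit overfitting on ~6K images.
base = keras.applications.VGG16(include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

model = keras.Sequential([
    base,
    keras.layers.Dropout(0.5),                    # dropout, as suggested above
    keras.layers.Dense(4, activation="softmax"),  # 4 humour classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watch validation loss each epoch
    patience=5,                    # stop after 5 epochs without improvement
    restore_best_weights=True,
)

# Random placeholder batch; substitute the real 6K-image dataset here.
X = np.random.rand(32, 224, 224, 3).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 4, 32), 4)
model.fit(X, y, validation_split=0.25, epochs=50, callbacks=[early_stop])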

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
In the dataset there are about 50 input variables, and I have 65,000 entries.
I am currently running an RNN with a single hidden layer of 35 nodes. I am using PyTorch's NLLLoss as my loss function and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to work fairly well.
After running for 150 epochs, I notice that I start to converge around 0.80 accuracy on my test data. I would like this to be even higher, but it seems like the model is stuck oscillating around some sort of saddle point or local minimum. (A graph of this accompanied the original post.)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
Not sure why exactly you are using only one hidden layer, or what the shape of your history data is, but here are some things you can try:
Try more than one hidden layer.
Experiment with LSTM and GRU layers, and with combinations of these layers together with the RNN.
Experiment with the shape of your data, i.e. how much history you look at to predict the weather.
Make sure your features are scaled properly, since you have about 50 input variables (a sketch follows this list).
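For the feature-scaling point (last item above), a minimal scikit-learn sketch; the random data simply stands in for the 65,000 rows and 50 variables:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(65000, 50)        # placeholder for the 50 input variables
y = np.random.randint(0, 2, 65000)   # placeholder Yes/No labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)     # learn mean/std from the training split only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # avoids leaking test statistics into training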
Your question is a little ambiguous, as you mentioned an RNN with a single hidden layer. Also, without knowing the entire neural network architecture, it is tough to say how you can bring in improvements. So, I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try the SGD or Adam optimizers, which are very commonly used? (A minimal sketch follows this answer.)
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can compare whether 80% accuracy is good or not?
150 epochs for a binary classification task looks like too much. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
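To illustrate the optimizer and dropout suggestions (the layer sizes follow the question; everything else is an assumption), a minimal PyTorch sketch:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(50, 35),        # 50 input variables -> 35 hidden units, as in the question
    nn.ReLU(),
    nn.Dropout(p=0.3),        # dropout for regularization
    nn.Linear(35, 2),
    nn.LogSoftmax(dim=1),     # pairs with NLLLoss, which the poster already uses
)

criterion = nn.NLLLoss()
# Adam optimizer with weight decay acting as an L2 regularization term.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)

x = torch.randn(64, 50)              # placeholder batch of features
y = torch.randint(0, 2, (64,))       # placeholder Yes/No labels

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()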

Bleu Score in Model Evaluation Metric

In many seq2seq implementations, I have seen that they use an accuracy metric when compiling the model and the BLEU score only for predictions.
Why don't they use the BLEU score during training to make it more efficient, if I understand it correctly?
The Bilingual Evaluation Understudy (BLEU) score was meant to replace human evaluators, hence the word "understudy" in its name.
When you are training, you already have the target value and can directly compare your generated output with it. But when you predict on a dataset, you don't have a way to measure whether the sentence you translated into is correct. That is why you use BLEU: no human can check after each machine translation whether what you predicted is correct or not, and BLEU provides a sanity check.
P.S. An understudy is someone learning from a mentor so as to replace them if need be; BLEU "learns" from humans and is then able to score the translation.
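For concreteness, here is a minimal sketch of computing a BLEU score with NLTK (the library choice and example sentences are assumptions; seq2seq frameworks differ in how they expose this):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of tokenized reference translations
candidate = ["the", "cat", "sat", "on", "the", "mat"]    # tokenized model output

# Smoothing avoids zero scores when short sentences miss some higher-order n-grams.
score = sentence_bleu(reference, candidate, smoothing_function=SmoothingFunction().method1)
print("BLEU:", score)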
For further reference check out https://www.youtube.com/watch?v=9ZvTxChwg9A&list=PL1w8k37X_6L_s4ncq-swTBvKDWnRSrinI&index=28
If you have any queries, comment below.

Resources