I get different validation loss for almost the same accuracy - keras

I made two different convolutional neural networks for a multi-class classification problem, and I tested the performance of the two networks using the evaluate_generator function in Keras. Both models give me comparable accuracies: one gives me 55.9% and the other 54.8%. However, the model that gives me 55.9% has a validation loss of 5.37, and the other 1.24.
How can these test losses be so different when the accuracies are similar? If anything, I would expect the loss for the model with 55.9% accuracy to be lower, but it's not.
Isn't loss the total sum of errors the network is making?
Insights would be appreciated.

Isn't loss the total sum of errors the network is making?
Well, not really. A loss function (or cost function) maps an event, or the values of one or more variables, onto a real number intuitively representing some "cost" associated with the event.
For example, in regression tasks the loss function can be mean squared error; in classification, binary or categorical cross-entropy. These loss functions measure how close your model's understanding of the data is to reality.
Why are both loss and accuracy high?
A high loss doesn't mean your model knows nothing. In the basic case, you can think of it this way: the smaller the loss, the more confident the model is in its choices.
So a model with a higher loss is not really sure about its answers.
You can also read this discussion about high loss and high accuracy.
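To make this concrete, here is a minimal NumPy sketch (with made-up probabilities, not taken from the question) of two predictions that are equally "correct" for accuracy purposes but carry very different cross-entropy losses:

    import numpy as np

    # One sample, true class = 0 (one-hot).
    y_true = np.array([1.0, 0.0, 0.0])

    # Both predictions put the highest probability on class 0,
    # so both count as correct and accuracy is identical.
    confident   = np.array([0.90, 0.05, 0.05])   # sure about its answer
    unconfident = np.array([0.40, 0.35, 0.25])   # barely picks the right class

    def categorical_crossentropy(y_true, y_pred):
        # The loss depends only on the probability given to the true class.
        return -np.sum(y_true * np.log(y_pred))

    print(categorical_crossentropy(y_true, confident))    # ~0.105
    print(categorical_crossentropy(y_true, unconfident))  # ~0.916

The argmax, and therefore the accuracy, is the same in both cases; only the confidence, and therefore the loss, differs.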

Even though the accuracies are similar, the loss values of different models are not directly comparable.
Accuracy measures the fraction of correctly classified samples over the total number of samples.
With regards to the loss value, from the Keras documentation:
Returns
Scalar test loss (if the model has no metrics) or list of scalars (if the model computes other metrics).
If this doesn't help in your case (I don't have a way to reproduce the issue), please check the following known issues in Keras regarding the evaluate_generator function:
evaluate_generator


What happens if optimal training loss is too high

I am training a Transformer. In many of my setups I obtain validation and training losses that look like this:
Then I understand that I should stop training at around epoch 1. But then the training loss is very high. Is this a problem? Does the value of the training loss actually mean anything?
Thanks
Regarding your first question - it is not necessarily a problem that your training loss is high, since there is no threshold for what counts as a high training loss. It depends on your dataset, your actual test metrics and your business goals.
More specifically, the problems with the value of training loss:
The number isn't intuitive, since the loss objective is a metric optimized for gradient descent (i.e. a differentiable function, usually the log version of it).
You probably have intuitive business metrics (e.g., precision, recall) oriented towards your end goal, which you should use to decide if your model is good or not.
Your train loss is calculated on the training dataset, which is not always representative of a good model, as can be seen in the overfitted model you posted. You shouldn't use this number to make decisions for the goodness of the model.
It depends on what you are trying to achieve. Is 80% accuracy high or low?
Regarding your second question - technically, the higher the number, the worse the model converged, so you should always try to lower it (while taking overfitting into consideration).
Comparatively, you can say that one model has a higher loss than another, and then try multiple hyperparameters (e.g., dropout, different optimizers) to push back the point where the validation loss diverges.
You are describing overfitting: Your model's expressive power is too strong and it is memorizing the training data, rather than learning useful representations that can generalize to the validation data.
To mitigate this issue, you should apply stronger regularization to your model to prevent it from memorizing and steer it towards useful representations.
Regularization methods include (but are not limited to) the following; a Keras sketch wiring some of them up follows the list:
Input augmentations
Dropout
Early stopping
Weight decay
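Below is a minimal Keras sketch (a hypothetical 10-class image model; shapes, rates and layer sizes are placeholders, not taken from the question) showing how these methods can be combined:

    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential([
        # Input augmentation: random flips/rotations, active during training only.
        layers.RandomFlip("horizontal", input_shape=(32, 32, 3)),
        layers.RandomRotation(0.1),
        layers.Conv2D(32, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),  # weight decay
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),
        layers.Dropout(0.5),                                      # dropout
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])

    # Early stopping: halt when the validation loss stops improving.
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                               restore_best_weights=True)
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=100, callbacks=[early_stop])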

Loss function negative log likelihood giving loss despite perfect accuracy

I am debugging a sequence-to-sequence model and purposely tried to perfectly overfit a small dataset of ~200 samples (sentence pairs of length between 5 and 50). I am using negative log-likelihood loss in PyTorch. I get a low loss (~1e-5), but the accuracy on the same dataset is only 33%.
I trained the model on 3 samples as well and obtained 100% accuracy, yet during training I still had a loss (in the same region of ~1e-5). I was under the impression that negative log-likelihood only gives a loss if there is a mismatch between the predicted and target labels?
Is a bug in my code likely?
There is no bug in your code.
The way things usually work in deep nets is that the network predicts logits (i.e., unnormalized log-probabilities). These logits are then transformed into probabilities using softmax (or a sigmoid function). Cross-entropy is finally evaluated based on the predicted probabilities.
The advantage of this approach is that it is numerically stable and easy to train with. On the other hand, because of the softmax you can never have "perfect" 0/1 probabilities for your predictions: even when your network has perfect accuracy, it will never assign probability 1 to the correct prediction, only "close to one". As a result, the loss will always be positive (albeit small).
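Here is a small PyTorch sketch (with hypothetical logits) demonstrating this: even a prediction that is as "perfect" as the network can make still has a strictly positive negative log-likelihood.

    import torch
    import torch.nn.functional as F

    # Logits strongly favouring the correct class (index 0).
    logits = torch.tensor([[10.0, -10.0, -10.0]])
    target = torch.tensor([0])

    log_probs = F.log_softmax(logits, dim=1)
    loss = F.nll_loss(log_probs, target)

    print(loss.item())      # ~4e-9: tiny, but strictly greater than zero
    print(log_probs.exp())  # probabilities close to, but never exactly, 1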

Model underfitting

I have trained a model and it took me quite a while to find the correct hyperparameters.
The model has now been trained for 15h and it seems to do its job quite well.
When I observed the training and validation loss though, the training loss is somewhat higher than the validation loss. (red curve: training, green: validation)
I use dropout to regularize my model, and as far as I have understood, dropout is only applied during training, which might be the reason.
Now I am wondering whether I have trained a valid model.
It doesn't seem like the model is heavily underfitted, does it?
Thanks in advance for any advice,
cheers,
M
First, check whether you have a good dataset: if it is a classification task, get an equal number of images for all classes, and get them from the same source rather than from different sources. Regularization and dropout are used against overfitting/high variance, so don't worry about those here.
I think your model is doing well: at the start of training the two errors differ, but as the epochs increase they both settle onto a steady path, which is good. The reason for the gap may be what I mentioned above, or you could try shuffling the data before using train_test_split to get a better distribution across the training and validation sets.
A plot of learning curves shows a good fit if:
The plot of training loss decreases to a point of stability.
The plot of validation loss decreases to a point of stability and has a small gap with the training loss.
In your case these conditions are satisfied.
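If you want to check these conditions yourself, here is a minimal sketch (assuming you kept the return value of model.fit, e.g. history = model.fit(...), trained with validation data):

    import matplotlib.pyplot as plt

    def plot_learning_curves(history):
        # Both curves should decrease to a point of stability,
        # with only a small gap between them.
        plt.plot(history.history["loss"], label="training loss")
        plt.plot(history.history["val_loss"], label="validation loss")
        plt.xlabel("epoch")
        plt.ylabel("loss")
        plt.legend()
        plt.show()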
Still, if you want to deal with high bias/underfitting, here are a few methods:
Train bigger models
Train longer, and use better optimization techniques
Try different neural network architectures and hyperparameters
You can also use cross-validation or GridSearchCV to find a better optimizer or better hyperparameters. This may take really long, because you have to train on different parameters each time; considering that one run already takes you 15 hours, the search might be very long, but you will find better parameters and can then train with them. A minimal manual search sketch follows at the end of this answer.
Above all I think your model is doing okay.
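As a rough illustration of such a search, here is a minimal manual sketch; build_model, x_train, y_train, x_val and y_val are hypothetical placeholders for your own model factory and data:

    from itertools import product

    best_val_loss, best_params = float("inf"), None
    for lr, dropout_rate in product([1e-2, 1e-3, 1e-4], [0.2, 0.5]):
        model = build_model(lr=lr, dropout_rate=dropout_rate)  # hypothetical factory
        history = model.fit(x_train, y_train,
                            validation_data=(x_val, y_val),
                            epochs=10, verbose=0)
        val_loss = min(history.history["val_loss"])
        if val_loss < best_val_loss:
            best_val_loss, best_params = val_loss, (lr, dropout_rate)

    print(best_params, best_val_loss)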
If your model underfits, its performance will be lower, similar to the case of overfitting, because it cannot learn effectively enough to reach the optimal result, i.e. the proper function to fit the given distribution. So you would have to use less regularization, e.g. less dropout, to get the optimal result.
Furthermore, sampling can also be crucial: there can be training/validation splits where your model performs well on the validation set and less well on the training set, and vice versa. This is one of the reasons why we use cross-validation and different sampling methods, e.g. stratified k-fold.

Why in model.evaluate() from Keras the loss is used to calculate accuracy?

It may be a stupid question but:
I noticed that the choice of the loss function modifies the accuracy obtained during evaluation.
I thought that the loss was used only during training, and that the goodness of the model's predictions of course depends on it, but not the accuracy, i.e. the number of right predictions over the total number of samples.
EDIT
I didn't explain my self correctly.
My question comes because I recently trained a model with binary_crossentropy loss and the accuracy coming from model.evaluate() was 96%.
But it wasn't correct!
I checked "manually" and the model was getting only 44% of the predictions right. Then I changed to categorical_crossentropy and the accuracy was correct.
MAYBE ANSWER
From: another question
I have found the problem. metrics=['accuracy'] calculates accuracy automatically from cost function. So using binary_crossentropy shows binary accuracy, not categorical accuracy. Using categorical_crossentropy automatically switches to categorical accuracy and now it is the same as calculated manually using model1.predict().
Keras chooses the performance metric based on your loss function. When you use binary_crossentropy, it also uses binary_accuracy, which is computed differently from categorical_accuracy. You should always use categorical_crossentropy if you have more than one output neuron.
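A minimal sketch of the fix (the model itself is a placeholder): request the accuracy flavour explicitly, so it no longer depends on which loss Keras sees.

    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",   # multi-class, one-hot targets
        metrics=["categorical_accuracy"],  # explicit, unambiguous metric
    )
    # With metrics=["accuracy"], Keras infers the accuracy flavour from the
    # loss: binary_crossentropy -> binary_accuracy, which can report a
    # misleading number (like the 96% above) on a multi-class problem.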
The model tries to minimize the loss function chosen. It adjusts the weights to do this. A different loss function results in different weights.
Those weights determine how many correct predictions are made over the total number of samples. So it is correct behavior to see that the loss function chosen will affect the model accuracy.

Validation loss in keras while training LSTM and stability of LSTM

I am using Keras to train my LSTM model for a time series problem. My activation function is linear and the optimizer is RMSprop.
However, I observe that while the training loss decreases slowly over time and fluctuates around a small value, the validation loss jumps up and down with a large variance.
Therefore, I come up with two questions:
1. Does the validation loss affect the training process? Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates a lot?
2. How can I make the model more stable so that it returns more stable validation loss values?
Thanks
Does the validation loss affect the training process?
No. The validation set is just a small sample of data that is excluded from the training process. It is run through the network at the end of an epoch to test how well training is going, so that you can check whether the model is overfitting (i.e. training loss much lower than validation loss).
Fluctuation in validation loss
This is a bit tougher to answer without seeing the network or the data. It could just mean that your model isn't generalizing well to unseen data: it's not seeing enough similar trends between the training and validation data, and each time the weights are adjusted to better suit the training data, the model becomes less accurate on the validation set. You could try turning down the learning rate, but if your training loss is decreasing slowly, the learning rate is probably fine. I think in this situation you have to ask yourself a few questions. Do I have enough data? Does a true time series trend exist in my data? Have I normalized my data correctly? Is my network too large for the data I have?
I had this issue - while the training loss was decreasing, the validation loss was not. I checked and found the following while I was using an LSTM:
I simplified the model - instead of 20 layers, I opted for 8 layers.
Instead of scaling within the range (-1, 1), I chose (0, 1); this alone reduced my validation loss by an order of magnitude (see the sketch after this list).
I reduced the batch size from 500 to 50 (just trial and error).
I added more features, which I thought intuitively would add some new intelligent information to the X -> y pair.
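A minimal sketch of the (0, 1) scaling mentioned above, using scikit-learn's MinMaxScaler (the variable names are placeholders):

    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler(feature_range=(0, 1))
    # Fit on the training data only, then reuse the same transform everywhere,
    # so no information leaks from the validation set into training.
    x_train_scaled = scaler.fit_transform(x_train)
    x_val_scaled = scaler.transform(x_val)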
Possible reasons:
Your validation set is very small compared to your training set, which is common. A small change in weights makes the validation loss fluctuate much more than the training loss. This does not necessarily mean that your model is overfitting, as long as the overall tendency of the validation loss keeps decreasing.
Maybe your training and validation data come from different sources and have different distributions. This can happen when your data is a time series and you split your train/validation data at a specific timestamp.
Does the validation loss affect the training process?
No. Validation (a single forward pass) and training (forward and backward passes) are different processes, so a forward pass alone does not change how you train next.
Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates a lot?
Not as part of the core training loop, but you can implement such a method yourself. However, one thing should be noted: the model is trying to learn the best solution to your cost function, which is fed by training data only, so changing the learning rate by observing the validation loss doesn't always make much sense.
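That said, Keras does ship a ready-made callback for exactly this; a minimal sketch:

    from tensorflow import keras

    reduce_lr = keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",  # watch the validation loss
        factor=0.5,          # halve the learning rate...
        patience=5,          # ...after 5 epochs without improvement
    )
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           callbacks=[reduce_lr])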
How can I make the model more stable so that it returns more stable validation loss values?
The reasons are explained above. If it is the first case, enlarging the validation set will make your loss look more stable, but that does NOT mean the model fits better. My suggestion: as long as you are sure your model does not overfit (the gap between training loss and validation loss is not too large), you can just save the model which gives the lowest validation loss.
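A minimal sketch of saving only the model with the lowest validation loss, using Keras's ModelCheckpoint callback (the file path is a placeholder):

    from tensorflow import keras

    checkpoint = keras.callbacks.ModelCheckpoint(
        "best_model.h5",
        monitor="val_loss",
        save_best_only=True,  # overwrite only when val_loss improves
    )
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           callbacks=[checkpoint])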
If it is the second case, it can be complicated, depending on your situation. You could try to exclude samples in the training set which are not "similar" to your validation set, or enlarge your model's capacity if you have enough data. Or perhaps add more metrics to monitor how well the training is going.
