Loss function negative log likelihood giving loss despite perfect accuracy - nlp

I am debugging a sequence-to-sequence model and purposely tried to perfectly overfit a small dataset of ~200 samples (sentence pairs of length between 5 and 50). I am using negative log-likelihood loss in PyTorch. I get a very low loss (~1e-5), but the accuracy on the same dataset is only 33%.
I also trained the model on just 3 samples and obtained 100% accuracy, yet during training the loss was still nonzero (again in the region of ~1e-5). I was under the impression that negative log-likelihood only produces a nonzero loss if there is a mismatch between the predicted and target labels?
Is a bug in my code likely?

There is no bug in your code.
The way things usually work in deep nets is that the network predicts logits (unnormalized log-probabilities). These logits are then transformed into probabilities using softmax (or a sigmoid function). Cross-entropy is finally evaluated based on the predicted probabilities.
The advantage of this approach is that it is numerically stable and easy to train with. On the other hand, because of the softmax you can never have "perfect" 0/1 probabilities for your predictions: even when your network has perfect accuracy, it will never assign probability 1 to the correct prediction, only something close to one. As a result, the loss will always be positive (albeit small).
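To make this concrete, here is a minimal PyTorch sketch (with made-up logits, not the asker's actual model) showing that even at 100% accuracy the negative log-likelihood stays slightly above zero:

    import torch
    import torch.nn.functional as F

    # Three toy samples, three classes; the arg-max prediction is correct for all of them.
    logits = torch.tensor([[10.0, -5.0, -5.0],
                           [-5.0, 10.0, -5.0],
                           [-5.0, -5.0, 10.0]])
    targets = torch.tensor([0, 1, 2])

    log_probs = F.log_softmax(logits, dim=1)
    loss = F.nll_loss(log_probs, targets)          # same as F.cross_entropy(logits, targets)
    accuracy = (logits.argmax(dim=1) == targets).float().mean()

    print(loss.item())       # tiny but strictly positive (roughly 6e-7 here)
    print(accuracy.item())   # 1.0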

Related

I get different validation loss for almost the same accuracy

I made two different convolutional neural networks for a multi-class classification task and tested the performance of both networks using the evaluate_generator function in Keras. Both models give me comparable accuracies: one gives 55.9% and the other 54.8%. However, the model that gives 55.9% has a validation loss of 5.37, while the other has 1.24.
How can these test losses be so different when the accuracies are similar? If anything, I would expect the loss for the model with 55.9% accuracy to be lower, but it's not.
Isn't loss the total sum of errors the network is making?
Insights would be appreciated.
Isn't loss the total sum of errors the network is making?
Well, not really. A loss function or cost function is a function that maps an event or the values of one or more variables onto a real number intuitively representing some "cost" associated with the event.
For example, in regression tasks the loss function can be mean squared error; in classification, binary or categorical cross-entropy. These loss functions measure how close your model's understanding of the data is to reality.
Why are both loss and accuracy high?
High loss doesn't mean your model doesn't know anything. In the basic case you can think of it this way: the smaller the loss, the more confident the model is in its choices.
So a model with a higher loss is simply less sure about its answers.
You can also read this discussion about high loss and accuracy
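As an illustration (made-up probabilities, not the two networks from the question), two models can classify exactly the same samples correctly yet have very different cross-entropy loss, depending on how confident they are on the samples they get wrong:

    import numpy as np

    def mean_cross_entropy(p_true_class):
        # mean negative log-probability assigned to the correct class
        return float(np.mean(-np.log(p_true_class)))

    # Probability each model assigns to the *correct* class for 4 samples.
    # In the binary case a prediction counts as correct when this is above 0.5,
    # so both models are right on the first two samples and wrong on the last two.
    model_a = np.array([0.9, 0.8, 0.4, 0.45])   # wrong, but not confidently wrong
    model_b = np.array([0.9, 0.8, 0.01, 0.02])  # confidently wrong on the same samples

    print(mean_cross_entropy(model_a))  # ~0.51
    print(mean_cross_entropy(model_b))  # ~2.21 -> same accuracy, much higher loss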
Even though the accuracies are similar, the loss values are not necessarily correlated when comparing different models.
Accuracy measures the fraction of correctly classified samples over the Total Population of your samples.
With regards to the loss value, from keras documentation:
Return value
Scalar test loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics).
If this doesn't help in your case (I don't have a way to reproduce the issue), please check the following known issues in Keras with regards to the evaluate_generator function:
evaluate_generator

For a regression model, why does the validation set passed to model.fit give a different metric result than model.evaluate?

I have a regression model with Euclidean distance as the loss function and RMSE as the evaluation metric (lower is better). When I passed my train and test sets to model.fit, I got train_rmse and test_rmse values that made sense to me. But when I pass the test set to model.evaluate after loading the weights of the trained model, I get different results, approximately twice the result from model.fit. And I am aware of the difference that should exist between the train evaluation and the test evaluation, as I know from Keras that:
the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
But here I am talking about the result of the test set passed to model.fit, which I believed is evaluated on the final model. In the Keras documentation, they say about the validation argument, in which I am passing the test set:
validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data.
When I searched about the problem, I found several related issues:
1- Some people, like here, report that this issue occurs when the model has batch normalization layers, or when you do transfer learning and freeze some BN layers, like here. My model has BN layers and I did not freeze any layer. Also, I used the same model for a multi-class classification problem (not regression) and the result was the same for the test set in model.fit and model.evaluate.
2- Other people say this is related to either the prediction or the metric calculation, like here, where they found that the difference is caused by different dtypes for y_true and y_pred (if one is float32 and the other float64, for example, the metric calculation will differ). When they unified the dtypes, the problem was fixed.
I believed that the last case applied to me, since in the regression task my labels are now tf.float32. My y_true labels are already cast to tf.float32 through the tfrecord, so I tried to cast y_pred to tf.float32 before the RMSE calculation, and I still get the difference in the result.
So my questions are:
Why is there this difference in the results?
Which should I rely on for the test set: the model.fit result or model.evaluate?
I know that for training loss and accuracy, Keras does a running average over the batches, and I know that for model.evaluate these metrics are calculated by passing over the whole dataset once with the final model. But how are the validation loss and accuracy calculated for the validation set passed to model.fit?
UPDATE:
The problem was a shape conflict between y_true and y_pred. The y_true label is saved in the tfrecords as a single float value and ends up with shape [batch_size], while the regression model gives the prediction as [batch_size, 1]. The result of tf.subtract(y_true, y_pred) in the RMSE equation is then a matrix of shape [batch_size, batch_size], and after taking the mean of that you would never guess it is wrong; the code does not throw any error, but the RMSE calculation is wrong. I am still working to make the shapes consistent, but haven't found a good solution yet.
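Here is a small TensorFlow sketch of that shape conflict (toy numbers, not the real data): because y_true is [batch_size] and y_pred is [batch_size, 1], the subtraction broadcasts to a [batch_size, batch_size] matrix and the RMSE is silently wrong; squeezing the prediction fixes it.

    import tensorflow as tf

    y_true = tf.constant([1.0, 2.0, 3.0, 4.0])          # shape [4], as loaded from the tfrecord
    y_pred = tf.constant([[1.1], [2.1], [2.9], [4.2]])  # shape [4, 1], as the model predicts

    # Broken: broadcasting produces a [4, 4] difference matrix, and no error is raised.
    rmse_broken = tf.sqrt(tf.reduce_mean(tf.square(tf.subtract(y_true, y_pred))))

    # Fixed: make the shapes match before subtracting.
    rmse_fixed = tf.sqrt(tf.reduce_mean(tf.square(y_true - tf.squeeze(y_pred, axis=-1))))

    print(rmse_broken.numpy())  # ~1.59, meaningless
    print(rmse_fixed.numpy())   # ~0.13, the intended RMSE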

ResUNet segmentation output is bad although precision and recall values are high on training and validation

I recently implemented the ResUNet for parasite segmentation on blood sample images. The model is described in this paper, https://arxiv.org/pdf/1711.10684.pdf, and here is the code https://github.com/DuFanXin/deep_residual_unet/blob/master/res_unet.py. The segmentation output is a binary image. I trained the model with a weighted binary cross-entropy loss, giving more weight to the parasite class since there is a class imbalance in my images. The last output layer has a sigmoid activation.
I calculate precision, recall, and the Dice coefficient to verify how good the segmentation is during training. On training and validation I got good numerical results:
Training
dice_coeff: 0.6895, f2: 0.8611, precision: 0.6320, recall: 0.9563
Validation
val_dice_coeff: 0.6433, val_f2: 0.7752, val_precision: 0.6052, val_recall: 0.8499
However, when I try to visually inspect the segmentations of the validation set, my algorithm outputs all black. After analyzing the predictions returned by the model, almost all values are close to zero, so it cannot correctly differentiate between background and foreground. The problem is: why do my metrics show good numerical values while the segmentation output does not?
I mean, are the metrics not giving me good information? Why is the recall value high even if the output is all black?
I trained for about 50 epochs, and my training curves show constant learning. Is this because of the vanishing gradient problem?
No, you do not have a vanishing gradient issue.
I am almost 100% sure that the problem is related to the way in which you test.
The numbers in your training/validation do not lie.
Ensure that you apply exactly the same preprocessing to your test dataset as is applied during training.
For example: if you use the rescale=1/255.0 parameter in Keras' ImageDataGenerator(), make sure that when you load a test image you divide it by 255.0 before predicting on it.
Note that this is just one example; the inconsistency between your train and test preprocessing may stem from other causes.
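A minimal sketch of matching the training-time rescale at prediction time (file names, image size, and the 1/255 rescale are assumptions here):

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.preprocessing import image

    # compile=False: only inference is needed, so the custom weighted loss isn't required here
    model = tf.keras.models.load_model("resunet.h5", compile=False)

    # Reproduce exactly what ImageDataGenerator(rescale=1/255.0) did during training.
    img = image.load_img("test_sample.png", target_size=(256, 256))
    x = image.img_to_array(img) / 255.0            # same rescale as in training
    x = np.expand_dims(x, axis=0)                  # add the batch dimension

    pred_mask = model.predict(x)                   # values in [0, 1] from the sigmoid
    binary_mask = (pred_mask > 0.5).astype(np.uint8)   # threshold to get the binary output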

Optimum batch_size for model.evaluate() in Keras?

Training accuracy and validation accuracy both reach nearly 0.87, but in the testing part the evaluate() function gives fluctuating results depending on the batch_size parameter value. Testing accuracy varies from 0.5 to 0.66. Does the optimal batch_size value for evaluate() have to be the same as in fit()?
I don't see how the batch size parameter of the evaluate function could change the accuracy of your model. Only the batch size used during training can modify the performance of your model (see this). Are you testing the same trained model in your different tests? If you're testing newly trained models every time, that explains the variation in accuracy you observe (because of the random initialization of the weights, for example).
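A quick sanity check (the file name and test arrays are placeholders, and the model is assumed to be compiled with a single accuracy metric): load one trained model once and evaluate it with several batch sizes; for a deterministic model the reported accuracy should be essentially identical each time.

    from tensorflow.keras.models import load_model

    model = load_model("trained_model.h5")   # the same saved model for every test
    # x_test, y_test: your fixed, unshuffled test arrays
    for bs in (1, 16, 32, 128):
        loss, acc = model.evaluate(x_test, y_test, batch_size=bs, verbose=0)
        print(f"batch_size={bs}: loss={loss:.4f}, accuracy={acc:.4f}")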

Why in model.evaluate() from Keras is the loss used to calculate accuracy?

It may be a stupid question but:
I noticed that the choice of the loss function modifies the accuracy obtained during evaluation.
I thought that the loss was used only during training, and that while the goodness of the model's predictions of course depends on it, it should not affect the accuracy, i.e. the number of correct predictions over the total number of samples.
EDIT
I didn't explain myself correctly.
My question comes because I recently trained a model with binary_crossentropy loss and the accuracy coming from model.evaluate() was 96%.
But it wasn't correct!
I checked "manually" and the model was only getting 44% of the predictions right. Then I changed to categorical_crossentropy and the accuracy was correct.
MAYBE ANSWER
From: another question
I have found the problem. metrics=['accuracy'] calculates accuracy automatically from cost function. So using binary_crossentropy shows binary accuracy, not categorical accuracy. Using categorical_crossentropy automatically switches to categorical accuracy and now it is the same as calculated manually using model1.predict().
Keras chooses the performance metric to use based on your loss function. When you use binary_crossentropy it accordingly uses binary_accuracy, which is computed differently from categorical_accuracy. You should always use categorical_crossentropy if you have more than one output neuron.
The model tries to minimize the loss function chosen. It adjusts the weights to do this. A different loss function results in different weights.
Those weights determine how many correct predictions are made over the total number of samples. So it is correct behavior to see that the loss function chosen will affect the model accuracy.
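One way to avoid the ambiguity entirely (a sketch, assuming `model` is your multi-class classifier trained on one-hot labels) is to name the accuracy metric explicitly instead of relying on 'accuracy' being resolved from the loss:

    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy"],   # explicit, no silent switch to binary_accuracy
    )

With the metric named explicitly, model.evaluate() reports categorical accuracy regardless of which loss is chosen.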
