Validation loss decreases fast while training loss decreases slowly - Keras

I am training a U-Net on CT data containing 64 images per patient.
While training, I observe that the validation loss decreases really fast, while the training loss decreases very slowly.
After about 20 epochs the validation loss is roughly constant, while it takes 500 epochs for the training loss to converge.
I already tried a deeper network as well as other learning rates, but the model behaves the same.
Thanks for any advice,
Cheers,
M

Related

Multilabel text classification with BERT and highly imbalanced training data

I'm trying to train a multilabel text classification model using BERT. Each piece of text can belong to 0 or more of a total of 485 classes. My model consists of a dropout layer and a linear layer added on top of the pooled output from the bert-base-uncased model from Hugging Face. The loss function I'm using is the BCEWithLogitsLoss in PyTorch.
I have millions of labeled observations to train on. But the training data are highly unbalanced, with some labels appearing in less than 10 observations and others appearing in more than 100K observations! I'd like to get a "good" recall.
My first attempt at training without adjusting for data imbalance produced a micro recall rate of 70% (good enough) but a macro recall rate of 45% (not good enough). These numbers indicate that the model isn't performing well on underrepresented classes.
How can I effectively adjust for the data imbalance during training to improve the macro recall rate? I see we can provide label weights to BCEWithLogitsLoss loss function. But given the very high imbalance in my data leading to weights in the range of 1 to 1M, can I actually get the model to converge? My initial experiments show that a weighted loss function is going up and down during training.
Alternatively, is there a better approach than using BERT + dropout + linear layer for this type of task?
In your case it might help to balance the labels in the training data. You have a lot of data, so you can afford to lose a part of it by balancing. But before you do this, I recommend reading this answer about balancing classes in training data.
If you really only care about recall, you could also tune your model to maximize recall directly.
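On the weighting question: capping the per-class weights usually tames the instability that a 1-to-1M weight range causes. Below is a minimal NumPy sketch of the same weighted loss formula that BCEWithLogitsLoss(pos_weight=...) computes; the class counts and the cap of 50 are made-up illustration values, not a recommendation:

```python
import numpy as np

def weighted_bce_with_logits(logits, targets, pos_weight):
    # Numerically stable weighted binary cross-entropy on raw logits,
    # the formula behind PyTorch's BCEWithLogitsLoss(pos_weight=...).
    log_sig = -np.logaddexp(0.0, -logits)        # log(sigmoid(x))
    log_one_minus = -np.logaddexp(0.0, logits)   # log(1 - sigmoid(x))
    loss = -(pos_weight * targets * log_sig + (1 - targets) * log_one_minus)
    return loss.mean()

# Hypothetical per-class positive counts for 4 classes out of 1M samples
pos_counts = np.array([5.0, 100.0, 10_000.0, 100_000.0])
n_samples = 1_000_000

# pos_weight = negatives / positives, capped so the rarest classes
# don't dominate every gradient step
pos_weight = np.minimum((n_samples - pos_counts) / pos_counts, 50.0)

logits = np.zeros((2, 4))                        # dummy raw model outputs
targets = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0, 0.0]])
loss = weighted_bce_with_logits(logits, targets, pos_weight)
```

Without the cap, the rarest class here would get a weight near 200,000; with it, the loss still up-weights rare classes but stays in a range where Adam-style optimizers can make steady progress.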

Why does the validation accuracy of my CNN model behave like this?

Why does my validation accuracy remain constant and then start increasing gradually with the number of epochs?

Can the increase in training loss lead to better accuracy?

I'm working on a competition on Kaggle. First, I trained a Longformer base with the competition dataset and achieved a quite good result on the leaderboard. Due to the CUDA memory limit and time limit, I could only train 2 epochs with a batch size of 1. The loss started at about 2.5 and gradually decreased to 0.6 at the end of my training.
I then continued training for 2 more epochs using those saved weights. This time I used a slightly larger learning rate (the one from the Longformer paper) and added the validation data to the training data (meaning I no longer split the dataset 90/10). I did this to try to achieve a better result.
However, this time the loss started at about 0.4 and steadily increased to 1.6 by about halfway through the first epoch. I stopped because I didn't want to waste computational resources.
Should I have waited more? Could it eventually lead to a better test result? I think the model could have been slightly overfitting at first.
Your model got fitted to the original training data the first time you trained it. When you added the validation data to the training set the second time around, the distribution of your training data must have changed significantly. Thus, the loss increased in your second training session since your model was unfamiliar with this new distribution.
Should you have waited longer? Yes, the loss would eventually have decreased (although not necessarily to a value lower than the original training loss).
Could it have led to a better test result? Probably. It depends on if your validation data contains patterns that are:
Not present in your training data already
Similar to those that your model will encounter in deployment
In fact, it's possible for an increase in training loss to coincide with an increase in training accuracy. Accuracy is not perfectly (negatively) correlated with any loss function, simply because a loss function is a continuous function of the model outputs whereas accuracy is a discrete function of them. For example, a model that predicts with low confidence but is always correct is 100% accurate, whereas a model that predicts with high confidence but is occasionally wrong can achieve a lower loss value yet less than 100% accuracy.
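That low-confidence-correct versus high-confidence-sometimes-wrong comparison can be checked with a few lines of arithmetic; the probabilities below are made up purely for illustration:

```python
import math

def bce(p, y):
    # binary cross-entropy for one predicted probability p against label y
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

labels = [1, 1, 1, 1]
probs_low_conf  = [0.55, 0.55, 0.55, 0.55]  # always correct, low confidence
probs_high_conf = [0.99, 0.99, 0.99, 0.30]  # one wrong, high confidence

loss_low  = sum(bce(p, y) for p, y in zip(probs_low_conf, labels)) / 4
loss_high = sum(bce(p, y) for p, y in zip(probs_high_conf, labels)) / 4
acc_low  = sum((p > 0.5) == bool(y) for p, y in zip(probs_low_conf, labels)) / 4
acc_high = sum((p > 0.5) == bool(y) for p, y in zip(probs_high_conf, labels)) / 4
# the high-confidence model has the lower loss despite the lower accuracy
```

Here the always-correct model scores 100% accuracy with a loss of about 0.60, while the occasionally-wrong model scores 75% accuracy with a loss of about 0.31.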

After the 18th epoch, the loss and validation loss suddenly rise from 0.17 to 0.5 and never decrease from there. Training has effectively stopped.

I'm training a deep learning model using tf.keras. The loss function is triplet loss. The optimizer is Adam, with a learning rate of 0.00005.
Initially the training loss was 0.38 and it started converging slowly. At the 17th epoch the val_loss reached 0.1705. Then suddenly, in the 18th epoch, both the training loss and val_loss jumped to 0.5, and this continued for 5-6 epochs. The loss values didn't change at all.
Any insight on this behavior would be great. Thank you.
For some models, the loss goes down during training, but at some point an update is applied to a variable that drastically changes the model's output. If this happens, the model is sometimes unable to recover from the new state, and the best it can do from then on is to output 0.5 for every input (I assume you are doing a binary classification task).
Why do these harmful updates happen even though they are bad for your model? Because updates are computed by gradient descent, which uses only the first derivative. This means the model knows how it needs to change a specific variable, but only very close to that variable's current value. If your learning rate is too high, the update can be too big, and a single gradient descent step can badly hurt your model's performance.
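The effect of an oversized learning rate is visible even on the simplest possible loss surface. A toy sketch on f(w) = w squared, whose gradient is 2w (the learning rates here are chosen purely to show the two regimes):

```python
def descend(lr, steps=20, w=1.0):
    # plain gradient descent on f(w) = w**2, gradient 2*w
    for _ in range(steps):
        w = w - lr * 2 * w   # each step multiplies w by (1 - 2*lr)
    return w

w_small = descend(lr=0.1)   # |1 - 2*0.1| = 0.8 < 1: shrinks toward the minimum
w_large = descend(lr=1.1)   # |1 - 2*1.1| = 1.2 > 1: every step overshoots further
```

With lr=0.1 the iterate decays toward the minimum at 0; with lr=1.1 each "correction" overshoots past the minimum and the parameter grows without bound, which is the one-dimensional version of an update the model cannot recover from.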
I read the first case as training loss much greater than validation loss: that is underfitting.
I read the second case as training loss much less than validation loss: that is overfitting.
Not an expert, but my assumptions have been:
- Typically, validation loss should be similar to, but slightly higher than, training loss. As long as validation loss is lower than or even equal to training loss, keep training.
- If training loss is decreasing without an increase in validation loss, again keep training.
- If validation loss starts increasing, it is time to stop.
- If overall accuracy is still not acceptable, review the mistakes the model is making and think about what you could change:
  - More data? More / different data augmentations? Generative data?
  - A different architecture?
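The "stop when validation loss starts increasing" rule is exactly what early stopping implements (Keras users would reach for `callbacks.EarlyStopping`); a minimal framework-free sketch of the rule, with a hypothetical loss curve:

```python
def early_stop_epoch(val_losses, patience=3):
    # Return the epoch at which training should stop: the point where the
    # validation loss has not improved for `patience` consecutive epochs.
    best = float("inf")
    wait = 0
    for epoch, vl in enumerate(val_losses):
        if vl < best:
            best, wait = vl, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the end

# validation loss improves, then climbs: stop `patience` epochs past the minimum
stop = early_stop_epoch([0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.8])
```

The `patience` parameter encodes how much post-minimum wobble you tolerate before declaring that validation loss is genuinely rising; in Keras you would usually also set `restore_best_weights=True` so the model rolls back to the epoch with the lowest validation loss.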
Underfitting – Validation and training error high
Overfitting – Validation error is high, training error low
Good fit – Validation error low, slightly higher than the training
error
Unknown fit - Validation error low, training error ‘high’

Overfitting after one epoch

I am training a model using Keras.
model = Sequential()
model.add(LSTM(units=300, input_shape=(timestep, 103), use_bias=True,
               dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

while True:
    history = model.fit_generator(
        generator=data_generator(x_[train_indices], y_[train_indices],
                                 batch=batch, timestep=timestep),
        steps_per_epoch=int(train_indices.shape[0] / batch),
        epochs=1,
        verbose=1,
        validation_steps=int(validation_indices.shape[0] / batch),
        validation_data=data_generator(x_[validation_indices], y_[validation_indices],
                                       batch=batch, timestep=timestep))
It is multioutput classification according to the scikit-learn.org definition:
Multioutput regression assigns each sample a set of target values. This can be thought of as predicting several properties for each data point, such as wind direction and magnitude at a certain location.
Thus, it is a recurrent neural network. I tried out different timestep sizes, but the result/problem is mostly the same.
After one epoch, my training loss is around 0.0X and my validation loss is around 0.6X, and these values stay stable for the next 10 epochs.
The dataset is around 680,000 rows; 9/10 is training data and 1/10 is validation data.
I'm asking for intuition here:
Is my model already overfitted after just one epoch?
Is 0.6X even a good value for a validation loss?
High-level question:
Since it is a multioutput classification task (not multi-class), I see sigmoid with binary_crossentropy as the only option. Do you suggest another approach?
I've experienced this issue and found that the learning rate and batch size have a huge impact on the learning process. In my case, I did two things:
- Reduce the learning rate (try 0.00005)
- Reduce the batch size (8, 16, 32)
Moreover, you can try the basic steps for preventing overfitting:
- Reduce the complexity of your model
- Increase the training data, and balance the samples per class
- Add more regularization (Dropout, BatchNorm)
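Those suggestions translate directly into Keras. A sketch reusing the shapes from the question; the exact units, dropout rates, and learning rate are illustrative starting points, not tuned values:

```python
from tensorflow.keras import Sequential, layers, optimizers

timestep = 64  # placeholder; use your own sequence length

model = Sequential([
    layers.Input(shape=(timestep, 103)),
    layers.LSTM(units=150, dropout=0.4, recurrent_dropout=0.2),  # smaller, more dropout
    layers.Dense(units=536, activation="sigmoid"),               # multilabel output
])
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.Adam(learning_rate=5e-5),     # reduced learning rate
              metrics=["accuracy"])
```

Sigmoid with binary_crossentropy is kept as-is because, for multioutput/multilabel targets, each of the 536 outputs is an independent binary decision; only the capacity, dropout, and learning rate change here.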
