Why does the validation accuracy of a CNN model behave like this? - conv-neural-network

Why does my validation accuracy remain constant and then start increasing gradually with the number of epochs?

Related

Why is my validation loss always in a certain range of values?

I have been working with a convolutional autoencoder model. My encoder works fine, but I am having problems with my decoder. I have tried two different models.
For Model 1, the training loss starts at 0.0436 and after some epochs settles in the range between 0.0280 and 0.0320. The validation loss starts at 0.0306 and after some epochs settles between 0.0275 and 0.0285.
For Model 2, the training loss decreased nicely, but the validation loss started at 0.2702 and after some epochs settled in the range between 0.1450 and 0.1550.
I have used 15000 images from MS COCO, 'mse' as the loss function, and 'Adam' as the optimizer with a learning rate of 0.00001.
I have tried adding a dropout layer and regularization, but nothing is working. I first tried it with 3000 images and later increased the dataset to 15000, but I am still getting the same problem. The total number of parameters in my model is 221,449,088.

After the 18th iteration, the loss and validation loss suddenly rise from 0.17 to 0.5 and never decrease from there; the training has stalled

I'm training a deep learning model using tensorflow.keras. The loss function is triplet loss. The optimizer is Adam with a learning rate of 0.00005.
Initially the training loss was 0.38 and it started converging slowly. At the 17th epoch the val_loss became 0.1705. Then suddenly, in the 18th epoch, the training loss and val_loss both became 0.5, and the same continued for 5-6 epochs; the loss values didn't change.
Any insight on this behavior would be great. Thank you.
For some models, the loss goes down during training, but at some point an update is applied to a variable that drastically changes your model's output. If this happens, sometimes the model is not able to recover from this new state, and the best it can do from then on is to output 0.5 for every input (I assume you are doing a binary classification task).
Why can these erroneous updates happen even though they are bad for your model? Because updates are done using gradient descent, and gradient descent uses only the first derivative. That means the model knows how it needs to change a specific variable, but only very close to that variable's current value. If your learning rate is too high, the update can be too big, and a single gradient descent step can be very bad for your model's performance.
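If you suspect a single oversized update knocked the model into this degenerate state, two common remedies are lowering the learning rate and clipping gradients. A minimal sketch with tf.keras; the learning rate and clipnorm values are illustrative, not tuned, and the toy model merely stands in for the question's network:

import tensorflow as tf

# Toy stand-in model; in the question this would be the network
# trained with triplet loss.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(8),
])

# Lower the learning rate and clip the global gradient norm so that a
# single bad batch cannot apply a drastic update to the weights.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-5,  # the question used 5e-5; smaller steps are safer
    clipnorm=1.0,        # rescale gradients whose L2 norm exceeds 1.0
)

model.compile(loss='mse', optimizer=optimizer)  # 'mse' stands in for triplet loss

Gradient clipping in particular caps the size of any single update, which directly addresses the "one bad step the model cannot recover from" failure mode.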
I read the first case as training loss much greater than validation loss: that is underfitting.
I read the second case as training loss much less than validation loss: that is overfitting.
I am not an expert, but my assumptions have been:
Typically validation loss should be similar to, but slightly higher than, training loss. As long as validation loss is lower than or even equal to training loss, one should keep doing more training.
If training loss is reducing without an increase in validation loss, then again keep doing more training.
If validation loss starts increasing, then it is time to stop (see the EarlyStopping sketch after this list).
If the overall accuracy is still not acceptable, then review the mistakes the model is making and think about what one can change:
More data? More / different data augmentations? Generative data?
Different architecture?
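One way to automate the "stop when validation loss starts increasing" rule is Keras's EarlyStopping callback. A minimal sketch, assuming a compiled model and arrays x_train, y_train, x_val, y_val are already defined; the patience value is illustrative:

from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for `patience` consecutive epochs,
# and roll the model back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=200,  # upper bound; the callback usually stops earlier
                    callbacks=[early_stop])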
Underfitting – validation and training error both high
Overfitting – validation error high, training error low
Good fit – validation error low, slightly higher than the training error
Unknown fit – validation error low, training error 'high'

Validation loss decreases fast while training loss decreases slowly

I am training a U-Net on CT data containing 64 images for each patient.
While training, I observe that the validation loss decreases really fast, while the training loss decreases very slowly.
After about 20 epochs, the validation loss is fairly constant, while it takes 500 epochs for the training loss to converge.
I already tried a deeper network as well as other learning rates, but the model behaves the same.
Thanks for any advice,
Cheers,
M

Optimum batch_size for model.evaluate() in Keras?

Training accuracy and validation accuracy both come out near 0.87, but in the testing stage the evaluate() function gives fluctuating results depending on the batch_size parameter value: testing accuracy varies from 0.5 to 0.66. Does the optimum batch_size value for evaluate() have to be the same as in fit()?
I don't see how the batch size parameter of the evaluate function could change the accuracy of your model. Only the batch size used during training can modify your model's performance (see this). Are you testing the same trained model in your different tests? If you're testing newly trained models every time, that explains the variation in accuracy you observe (because of the random initialization of the weights, for example).
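A quick way to convince yourself is to evaluate the same trained model with several batch sizes and compare: apart from floating-point accumulation order, the metrics should match, because batch_size in evaluate() only controls how many samples are fed per forward pass. A sketch, assuming a trained model and test arrays x_test, y_test:

# Evaluate the SAME trained model with different batch sizes; the scores
# should agree up to floating-point noise.
for bs in (1, 32, 256):
    loss, acc = model.evaluate(x_test, y_test, batch_size=bs, verbose=0)
    print('batch_size=%d: loss=%.4f, acc=%.4f' % (bs, loss, acc))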

Multivariate LSTM Forecast Loss and evaluation

I have a CNN-RNN model architecture with bidirectional LSTMs for a time series regression problem. My loss does not converge over 50 epochs. Each epoch has 20k samples. The loss keeps bouncing between 0.001 and 0.01.
batch_size = 1
epochs = 50
model.compile(loss='mean_squared_error', optimizer='adam')
trainingHistory = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, shuffle=False)
I tried to train the model with incorrectly paired X and Y data, for which the loss stays around 0.5. Is it a reasonable conclusion that my X and Y have a non-linear relationship which can be learned by my model over more epochs?
The predictions of my model capture the pattern, but with an offset. I use dynamic time warping distance to manually check the accuracy of the predictions. Is there a better way?
Model :
model = Sequential()
model.add(LSTM(units=128, dropout=0.05, recurrent_dropout=0.35, return_sequences=True,
               batch_input_shape=(batch_size, featureSteps, input_dim)))
model.add(LSTM(units=32, dropout=0.05, recurrent_dropout=0.35, return_sequences=False))
model.add(Dense(units=2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
If you tested with:
Wrong data: loss ~0.5
Correct data: loss ~0.01
Then your model is actually capable of learning something.
There are a few possibilities here:
Your output data does not fit in the range of the last layer's activation
Your model reached a limit for the current learning rate (gradient update steps are too big and can't improve the model anymore).
Your model is not good enough for the task.
Your data has some degree of random factors
Case 1:
Make sure your Y is within the range of your last activation function.
For a tanh (the LSTM's default), all Y data should be between -1 and +1
For a sigmoid, between 0 and 1
For a softmax, between 0 and 1, but make sure your last dimension is not 1, otherwise all results will be 1, always.
For a relu, between 0 and infinity
For linear, any value
Convergence tends to be better with a bounded activation than with one that goes to infinity.
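For Case 1, a minimal sketch of rescaling targets to match a tanh output, using scikit-learn's MinMaxScaler (the variable names follow the question; the choice of scaler is illustrative):

from sklearn.preprocessing import MinMaxScaler

# Rescale targets into [-1, 1] so they fit the range of a tanh output.
# MinMaxScaler expects a 2D array of shape (samples, features).
scaler = MinMaxScaler(feature_range=(-1, 1))
trainY_scaled = scaler.fit_transform(trainY)

# After predicting, map the outputs back to the original scale:
predictions = scaler.inverse_transform(model.predict(trainX))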
Case 2:
If the data is OK, try decreasing the learning rate once your model stagnates.
The default learning rate for Adam is 0.001; we often divide it by 10 (and you can do so again if the loss stalls):
from keras.optimizers import Adam

# after training enough with the default value:
model.compile(loss='mse', optimizer=Adam(lr=0.00001))
trainingHistory2 = model.fit(.........)

# you can even do this again if you notice that the loss decreased and stopped again:
model.compile(loss='mse', optimizer=Adam(lr=0.000001))
If the problem was the learning rate, this will let your model learn more than it already did (there might be some difficulty at the beginning, until the optimizer adjusts itself).
Case 3:
If you had no success, maybe it's time to increase the model's capacity.
Maybe add more units to the layers, add more layers, or even change the model.
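As an illustration of Case 3, the question's model could be given more capacity, for example as below; the layer sizes are arbitrary starting points, not tuned values:

from keras.models import Sequential
from keras.layers import LSTM, Dense

# A wider and deeper variant of the question's model; batch_size,
# featureSteps and input_dim are the same variables the question defines.
model = Sequential()
model.add(LSTM(units=256, return_sequences=True,
               batch_input_shape=(batch_size, featureSteps, input_dim)))
model.add(LSTM(units=128, return_sequences=True))   # extra recurrent layer
model.add(LSTM(units=64, return_sequences=False))
model.add(Dense(units=2, activation='softmax'))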
Case 4:
There's probably nothing you can do about this...
But if you increased the model like in case 3, be careful with overfitting (keep some test data to compare the test loss versus the training loss).
Models that are too powerful can simply memorize your data instead of learning important insights about it.
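A simple way to keep an eye on that is to hold out part of the training data and watch both loss curves; the validation_split fraction here is illustrative:

# Hold out the last 20% of the training data as a validation set.
# A training loss that keeps falling while validation loss rises is
# the classic sign of memorization.
history = model.fit(trainX, trainY, epochs=50, validation_split=0.2)

print('final train loss:', history.history['loss'][-1])
print('final val loss:  ', history.history['val_loss'][-1])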
