Accuracy graph for neural network is fluctuating a lot - python-3.x

I've made a neural network, and its architecture is as follows:
It has two branches that are merged. One branch is a convolutional network that takes matrices as input; the other is a fully connected layer that takes a vector as input. The two branches are merged and sent to a fully connected layer followed by an output layer. My network runs; however, I get the following graphs:
For accuracy:
For Loss:
I think my loss graph is alright, but the accuracy one is fluctuating a lot. My overall accuracy is 60%. Do you think these graphs suggest under-fitting, or is this normal? Insights would be appreciated.

Fluctuations are common with mini-batch training. A perfectly smooth loss curve (or a steadily increasing accuracy) would only be expected if every update used the entire dataset, which is usually impractical from a computational viewpoint.
When your training loss increases, the validation accuracy decreases, which is a good sign. The last graph, together with my previous observation, makes overfitting unlikely (at least on the development set).
The graphs do not look out of the ordinary (apart from the spikes I have already mentioned, which are normal during batch training).
It may or may not be a case of underfitting. If using a more complex network on both branches (or even on just one of them) gives you a better result, then it is a case of underfitting.
However, underfitting has nothing to do with the spikes that you see on the graphs.
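To judge the trend behind the spikes, it can help to smooth the curve before reading it. Below is a minimal sketch in plain Python; the function name `moving_average`, the window size and the accuracy values are made up for illustration and are not taken from the question.

```python
from collections import deque


def moving_average(values, window=5):
    """Return the running mean of the last `window` values at each step."""
    recent = deque(maxlen=window)  # old values fall out automatically
    smoothed = []
    for v in values:
        recent.append(v)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical per-epoch validation accuracies with batch-induced spikes.
val_acc = [0.50, 0.62, 0.48, 0.60, 0.55, 0.63, 0.52, 0.61]
print(moving_average(val_acc, window=4))
```

If the smoothed curve still rises or flattens in a sensible way, the spikes are most likely just batch noise.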
Hope this helps you with your problem :)

Related

Improve the neural network by analyzing the loss curve

I built a network based on LSTM and tuned its parameters. The results are shown in the figure and are not impressive.
How can I tell what is wrong? Is the dataset bad, or is the network not well built?
Since the validation loss decreased initially and later increased, what you're experiencing is overfitting.
Since the training loss kept decreasing, your model has learnt the training set too closely and no longer generalizes well. This is why the validation loss increased.
To avoid overfitting, you need to regularize your model. You can use L1 or L2 regularization. You can also try dropout.
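As a rough illustration of what these two techniques do, here is a minimal sketch in plain Python. The function names, the `lam` coefficient and the dropout `rate` are assumptions chosen for the example; in practice you would use the equivalents built into your framework.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=0.01):
    """Training objective = loss on the data + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)


def dropout(activations, rate, rng):
    """Inverted dropout (training time only): zero each activation with
    probability `rate` and rescale the survivors by 1 / (1 - rate).
    Assumes 0 <= rate < 1."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the example is repeatable
print(regularized_loss(1.0, [1.0, 2.0], lam=0.1))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng))
```

The penalty pushes weights towards zero, and dropout stops the network from relying on any single unit; both make it harder to memorize the training set.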
Now coming to your question:
If the dataset is of good quality, i.e. it is annotated well and contains features that can actually predict the target, then the dataset and the model together determine the quality of the predictions.
Since you're using RNNs, which have a large number of parameters, make sure the dataset is also large enough that the RNN does not overfit a small dataset. If the available dataset is small, start with a small neural network with fewer parameters and gradually scale the model up until you're satisfied with the prediction scores.
You can also refer to this: https://towardsdatascience.com/rnn-training-tips-and-tricks-2bf687e67527

Data augmentation affects convergence speed

Data augmentation is surely a great regularization method, and it improves my accuracy on the unseen test set. However, I do not understand why it reduces the convergence speed of the network? I know each epoch takes a longer time to train since image transformations are applied on the fly. But why does it affect the convergence? For my current setup, the network hits a 100% training accuracy after 5 epochs without data augmentation (and clearly overfits) - with data augmentation, it takes 23 epochs to hit 95% training accuracy and never seems to hit 100%.
Any links to research papers or comments on the reasoning behind this?
I guess you are evaluating accuracy on the training set, right? That is a mistake...
Without augmentation your network simply overfits. You have a fixed number of images, for instance 1000, and during training your network can easily memorize the labels of that dataset. And you are evaluating the model on the same fixed (not augmented) dataset.
When you train your network with data augmentation, you are basically training a model on a dataset of effectively unlimited size. You are doing augmentation on the fly, which means that the model "sees" new images every time, and it cannot memorize them perfectly with 100% accuracy. And you are evaluating the model on the augmented (effectively unlimited) dataset.
When you train your model with and without augmentation, you evaluate it on different datasets, so it is not correct to compare the two accuracies.
Piece of advice:
Do not look at training-set accuracy; it is simply misleading when you use augmentation. Instead, evaluate your model on the test set (or validation set), which is not augmented. By doing this, you'll see the real accuracy increase for your model.
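A minimal sketch of that split between augmented training data and untouched evaluation data, in plain Python. Images are represented as lists of rows, and the function names and the single flip transform are hypothetical stand-ins for a real augmentation pipeline.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def training_batch(images, rng, flip_prob=0.5):
    """Augment on the fly: every call may return different variants,
    so the model rarely sees exactly the same batch twice."""
    return [horizontal_flip(img) if rng.random() < flip_prob else img
            for img in images]


def evaluation_batch(images):
    """Validation/test images pass through unchanged, so the metric is
    always computed on the same fixed data."""
    return list(images)


images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
rng = random.Random(0)
print(training_batch(images, rng))  # may differ from call to call
print(evaluation_batch(images))     # always the original images
```

Accuracy measured on `evaluation_batch` output is comparable between the run with augmentation and the run without it; accuracy on `training_batch` output is not.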
P.S. If you want to find out more about image augmentations, I really recommend checking this guide - https://notrocketscience.blog/complete-guide-to-data-augmentation-for-computer-vision/

validation loss higher only for some tasks

I am training a multi-task network. The validation loss is higher than the training loss only for some tasks; for the others, the network seems to converge pretty well. For one task in particular, the validation loss is much higher than the training loss, and it affects the average. I added some data augmentation, normalization, dropout, batch norm, etc. to avoid overfitting in general. How can I handle this one task?
I recommend focusing on that single task. Study the residuals (for regression) or the misclassified examples (for classification). Without more information about the problem, I cannot help you more than that!
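As a starting point for that kind of error analysis, here is a minimal sketch in plain Python. The function names and the toy values are hypothetical; you would feed in the validation targets and predictions of the problematic task only.

```python
from collections import Counter


def residual_summary(y_true, y_pred):
    """Regression task: summarize the residuals (target - prediction)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    return {
        "mean_residual": sum(residuals) / n,               # systematic bias
        "mean_abs_residual": sum(abs(r) for r in residuals) / n,
        "largest_residual": max(residuals, key=abs),        # worst example
    }


def confusion_counts(y_true, y_pred):
    """Classification task: count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


print(residual_summary([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))
print(confusion_counts(["a", "a", "b"], ["a", "b", "b"]))
```

A mean residual far from zero, or one (true, predicted) pair that dominates the counts, points to where the validation data for that task differs from what the network learned.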

Best Way to Overcome Early Convergence for Machine Learning Model

I have a machine learning model built that tries to predict weather data, and in this case I am doing a prediction on whether or not it will rain tomorrow (a binary prediction of Yes/No).
In the dataset there are about 50 input variables, and I have 65,000 entries.
I am currently running an RNN with a single hidden layer of 35 nodes. I am using PyTorch's NLLLoss as my loss function, and Adaboost as the optimization function. I've tried many different learning rates, and 0.01 seems to work fairly well.
After running for 150 epochs, I notice that I start to converge around 0.80 accuracy on my test data, and I would like this to be higher. However, the model seems to be stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
Not sure why exactly you are using only one hidden layer or what the shape of your history data is, but here are some things you can try:
Try more than one hidden layer.
Experiment with LSTM and GRU layers, and with combinations of these layers together with RNN.
Check the shape of your data, i.e. the history you look at to predict the weather.
Make sure your features are scaled properly, since you have about 50 input variables.
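For the scaling point in the list above, one common choice is standardization with statistics computed on the training data only. This is a minimal sketch in plain Python; the function names and the toy rows are assumptions for illustration, not the asker's data.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation on training data."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply the training statistics to any split (train, validation, test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


train = [[0.0, 10.0], [2.0, 10.0]]
means, stds = fit_scaler(train)
print(transform(train, means, stds))
```

Reusing the training statistics on the test split keeps information about the test data from leaking into preprocessing.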
Your question is a little ambiguous, as you mentioned an RNN with a single hidden layer. Also, without knowing the entire neural network architecture, it is tough to say how you can improve it. So, I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try the SGD or Adam optimizers, which are very useful?
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm so that you can judge whether 80% accuracy is good or not?
150 epochs just for a binary classification task looks like too many. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
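On the baseline question above: the simplest baseline for a Yes/No target is to always predict the most frequent training label. A minimal sketch in plain Python follows; the function name and the toy labels are made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in test_labels if y == majority)
    return hits / len(test_labels)


# Hypothetical labels: it does not rain on most days.
train_labels = ["No", "No", "No", "Yes"]
test_labels = ["No", "No", "Yes", "No"]
print(majority_baseline_accuracy(train_labels, test_labels))
```

If this number is already close to 0.80 on your data, the network has learned little beyond the class imbalance.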

Validation loss in keras while training LSTM and stability of LSTM

I am using Keras now to train my LSTM model for a time series problem. My activation function is linear and the optimizer is Rmsprop.
However, I observe that while the training loss decreases slowly over time and fluctuates around a small value, the validation loss jumps up and down with a large variance.
Therefore, I have two questions:
1. Does the validation loss affect the training process? Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates a lot?
2. How can I make the model more stable so that it returns more stable values of validation loss?
Thanks
Does the validation loss affect the training process?
No. The validation loss is computed on a small sample of data that is excluded from the training process. That sample is run through the network at the end of each epoch to test how well training is going, so that you can check whether the model is overfitting (i.e. training loss much lower than validation loss).
Fluctuation in validation loss
This is a bit tougher to answer without the network or data. It could just mean that your model isn't generalizing well to unseen data: the validation data does not share enough trends with the training data, so each time the weights are adjusted to better fit the training data, the model becomes less accurate on the validation set. You could possibly turn down the learning rate, but if your training loss is decreasing slowly, the learning rate is probably fine. I think in this situation you have to ask yourself a few questions. Do I have enough data? Does a true time series trend exist in my data? Have I normalized my data correctly? Is my network too large for the data I have?
I had this issue: while the training loss was decreasing, the validation loss was not. Here is what I changed while using LSTM:
I simplified the model: instead of 20 layers, I opted for 8 layers.
Instead of scaling to the range (-1, 1), I chose (0, 1); this alone reduced my validation loss by an order of magnitude.
I reduced the batch size from 500 to 50 (just trial and error).
I added more features, which I thought intuitively would add new information to the X->y pair.
Possible reasons:
Your validation set is very small compared to your training set, which is usually the case. A small change in the weights then makes the validation loss fluctuate much more than the training loss. This does not necessarily mean that your model is overfitting, as long as the overall trend of the validation loss keeps decreasing.
Maybe your training and validation data come from different sources, so they may have different distributions. This can happen when your data is a time series and you split the training/validation data at a specific timestamp.
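The first reason can be made concrete with a back-of-the-envelope calculation. Assuming the validation samples are independent, the standard error of an accuracy estimate shrinks with the square root of the sample count, and a mean loss scales the same way. The function name and the numbers below are illustrative assumptions.

```python
import math


def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples
    independent examples: sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Same true accuracy, different validation set sizes (hypothetical).
for n in (100, 1000, 10000):
    print(n, accuracy_standard_error(0.8, n))
```

Going from 100 to 10,000 validation samples cuts the noise by a factor of ten, which is why a small validation set produces a visibly jumpier curve.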
Does the validation loss affect the training process?
No, validation (a single forward pass) and training (forward and backward passes) are different processes. Hence a forward pass on the validation data does not change how you train next.
Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates a lot?
No, but I guess you can implement your own method to do so. However, note that the model is trying to learn the best solution to your cost function, which is fed by training data only, so changing the learning rate by observing the validation loss doesn't make too much sense.
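If you do want to try it, a hand-rolled version of such a method might look like the sketch below. The class name `ReduceOnPlateau` and its default values are assumptions made for this example. Keras also ships a `ReduceLROnPlateau` callback that can monitor `val_loss`, but it only acts if you add it to training yourself.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` consecutive
    epochs without a new best validation loss."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = ReduceOnPlateau(lr=0.01, patience=2)
for loss in [1.0, 0.9, 0.95, 0.97, 0.8]:  # hypothetical validation losses
    print(loss, scheduler.step(loss))
```

Note that a noisy validation loss can trigger reductions that have nothing to do with real plateaus, which is one more reason to be careful with this approach.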
How can I make the model more stable so that it returns more stable values of validation loss?
The reasons are explained above. In the first case, enlarging the validation set will make your loss look more stable, but it does NOT mean the model fits better. My suggestion: as long as you are sure your model does not overfit (the gap between training loss and validation loss is not too large), you can just save the model that gives the lowest validation loss.
In the second case, it can be complicated, depending on your situation. You could try to exclude samples from the training set that are not "similar" to your validation set, or enlarge your model's capacity if you have enough data. Or perhaps add more metrics to monitor how well the training is going.
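The suggestion to keep the model with the lowest validation loss can be sketched as follows. This is plain Python with a toy "model"; `train_one_epoch` and `validate` are hypothetical callables standing in for your own training and evaluation code. In Keras, the `ModelCheckpoint` callback with `save_best_only=True` serves the same purpose.

```python
import copy


def train_keeping_best(weights, epochs, train_one_epoch, validate):
    """Train for `epochs` epochs and return the weights (and loss) from the
    epoch with the lowest validation loss, not from the last epoch."""
    best_loss = float("inf")
    best_weights = copy.deepcopy(weights)
    for _ in range(epochs):
        weights = train_one_epoch(weights)
        loss = validate(weights)
        if loss < best_loss:
            # Snapshot a copy so later epochs cannot overwrite it.
            best_loss, best_weights = loss, copy.deepcopy(weights)
    return best_weights, best_loss


# Toy stand-ins: one weight that grows by 1 per epoch; the loss is lowest at 3.
def toy_step(w):
    return [w[0] + 1]


def toy_val(w):
    return (w[0] - 3) ** 2


print(train_keeping_best([0], 5, toy_step, toy_val))
```

With a jumpy validation curve, the final epoch is often not the best one, so returning the snapshot instead of the last weights costs little and avoids an unlucky finish.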
