Tensorflow object detection API - validation loss behaviour - python-3.x

I am trying to use the TensorFlow object detection API to recognize a specific object (guitars) in pictures and videos.
As for the data, I downloaded the images from the OpenImage dataset, and derived the .tfrecord files. I am testing with different numbers, but for now let's say I have 200 images in the training set and 100 in the evaluation one.
I'm training the model using "ssd_mobilenet_v1_coco" as a starting point, together with the "model_main.py" script, so that I get both training and validation results.
When I visualize the training progress in TensorBoard, I get two plots: one for the training loss and one for the validation loss (screenshots omitted here).
I am generally new to computer vision and trying to learn, so I was trying to figure out the meaning of these plots.
The training loss goes as expected, decreasing over time.
In my (probably simplistic) view, I was expecting the validation loss to start at high values, decrease as training goes on, and then start increasing again if the training goes on for too long and the model starts overfitting.
But in my case, I don't see this behavior for the validation curve, which seems to be trending upwards basically all the time (excluding fluctuations).
Have I been training the model for too little time to see the behavior I'm expecting? Are my expectations wrong in the first place? Am I misinterpreting the curves?

Ok, I fixed it by decreasing the initial_learning_rate from 0.004 to 0.0001.
It was the obvious solution, considering the wild oscillations of the validation loss, but at first I thought it wouldn't work, since there already seems to be a learning rate scheduler in the config file.
However, immediately below that in the config file there's a num_steps option, and the accompanying comment states:
# Note: The below line limits the training process to 200K steps, which we
# empirically found to be sufficient enough to train the pets dataset. This
# effectively bypasses the learning rate schedule (the learning rate will
# never decay). Remove the below line to train indefinitely.
Honestly, I don't remember if I commented out the num_steps option... If I didn't, my learning rate was kept at the initial value of 0.004, which turned out to be too high.
If I did comment it out (so that the learning rate schedule was active), I guess the rate still started from too high a value before decaying.
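For reference, this is roughly what the relevant part of the pipeline config looks like in the stock ssd_mobilenet_v1 sample (the exact decay values may differ between config versions; the change described above is just lowering initial_learning_rate):

    train_config {
      optimizer {
        rms_prop_optimizer {
          learning_rate {
            exponential_decay_learning_rate {
              initial_learning_rate: 0.0001   # was 0.004 in the sample config
              decay_steps: 800720
              decay_factor: 0.95
            }
          }
          momentum_optimizer_value: 0.9
          decay: 0.9
          epsilon: 1.0
        }
      }
      # ...
      num_steps: 200000   # the line discussed in the comment above
    }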
Anyway, it's working much better now, I hope this can be useful if anyone is experiencing the same problem.

Related

Can the increase in training loss lead to better accuracy?

I'm working on a competition on Kaggle. First, I trained a Longformer base with the competition dataset and achieved a quite good result on the leaderboard. Due to the CUDA memory limit and time limit, I could only train 2 epochs with a batch size of 1. The loss started at about 2.5 and gradually decreased to 0.6 at the end of my training.
I then continued training for 2 more epochs using the saved weights. This time I used a slightly larger learning rate (the one from the Longformer paper) and added the validation data to the training data (meaning I no longer split the dataset 90/10). I did this to try to achieve a better result.
However, this time the loss started at about 0.4 and constantly increased to 1.6 at about half of the first epoch. I stopped because I didn't want to waste computational resources.
Should I have waited more? Could it eventually lead to a better test result? I think the model could have been slightly overfitting at first.
Your model got fitted to the original training data the first time you trained it. When you added the validation data to the training set the second time around, the distribution of your training data must have changed significantly. Thus, the loss increased in your second training session since your model was unfamiliar with this new distribution.
Should you have waited more? Yes, the loss would have eventually decreased (although not necessarily to a value lower than the original training loss).
Could it have led to a better test result? Probably. It depends on whether your validation data contains patterns that are:
Not present in your training data already
Similar to those that your model will encounter in deployment
In fact, it's possible for an increase in training loss to come with an increase in training accuracy. Accuracy is not perfectly (negatively) correlated with any loss function, simply because a loss function is a continuous function of the model outputs, whereas accuracy is a discrete function of them. For example, a model that predicts with low confidence but is always correct is 100% accurate, whereas a model that predicts with high confidence but is occasionally wrong can produce a lower loss value yet less than 100% accuracy.
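As a small numeric sketch of that last point (the probabilities here are made up; cross-entropy loss assumed):

    import numpy as np

    def cross_entropy(p_true_class):
        """Mean negative log-likelihood assigned to the correct class."""
        return float(np.mean(-np.log(p_true_class)))

    # Model A: always correct, but only 60% confident in the true class.
    loss_a = cross_entropy(np.full(100, 0.60))  # accuracy 100%

    # Model B: 99% confident when right (90 cases), only 1% on the true class when wrong (10 cases).
    loss_b = cross_entropy(np.concatenate([np.full(90, 0.99), np.full(10, 0.01)]))  # accuracy 90%

    print(f"Model A: loss={loss_a:.3f}, accuracy=100%")  # ~0.511
    print(f"Model B: loss={loss_b:.3f}, accuracy=90%")   # ~0.470 -- lower loss, lower accuracy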

How do I prevent a lack of VRAM halfway through training a Huggingface Transformers (Pegasus) model?

I'm taking a pre-trained Pegasus model through Huggingface Transformers (specifically google/pegasus-cnn_dailymail, using Huggingface Transformers through PyTorch) and I want to fine-tune it on my own data. The dataset is quite large, however, and I've run into the problem of running out of VRAM halfway through training, which, given the size of the dataset, can happen days after training started. That makes a trial-and-error approach very inefficient.
I'm wondering how I can make sure ahead of time that it doesn't run out of memory. I would think that the memory usage of the model is in some way proportional to the size of the input, so I've passed truncation=True, padding=True, max_length=1024 to my tokenizer, which, if my understanding is correct, should make all the tokenizer outputs the same size per line. Considering that the batch size is also constant, I would think the amount of VRAM in use should be stable, so I should be able to split the dataset into manageable parts, look at the RAM/VRAM use of the first run, and infer that it will run smoothly from start to finish.
However, the opposite seems to be true. I've been observing the amount of VRAM used at any time and it can vary wildly, from ~12GB at one time to suddenly requiring more than 24GB and crashing (because I don't have more than 24GB).
So, how do I make sure that the amount of VRAM in use stays within reasonable bounds for the full duration of training, and avoid crashing due to a lack of VRAM when I'm already days into the training process?
padding=True actually doesn't pad to max_length, but to the longest sample in the list you pass to the tokenizer. To pad to max_length you need to set padding='max_length'.
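A minimal sketch of the difference (using the checkpoint from the question; any tokenizer would behave the same way):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/pegasus-cnn_dailymail")
    texts = ["a short document", "a much longer document " * 50]

    # padding=True pads only to the longest sample in this batch -> shapes vary between batches.
    dynamic = tokenizer(texts, truncation=True, padding=True, max_length=1024, return_tensors="pt")

    # padding='max_length' pads every sample to max_length -> every batch has the same shape,
    # so peak memory per batch is predictable.
    fixed = tokenizer(texts, truncation=True, padding="max_length", max_length=1024, return_tensors="pt")

    print(dynamic["input_ids"].shape)  # (2, <longest sample in this batch>)
    print(fixed["input_ids"].shape)    # (2, 1024)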

Abrupt increase in RMSE in LSTM while working on Time Series Prediction

I have the following LSTM network (Fig 1) for predicting the Bitcoin price. The input is the hourly close price of Bitcoin. I am facing some issues and any advice is appreciated.
Earlier, on the same network, my RMSE was 6.71 on the test set and 7.41 on the training set. I recompiled the whole code and there was an abrupt increase to 233.51 for the training set and 345.56 for the test set. Can anyone help me find the reason behind this?
Also, how can I improve the accuracy of my network, as it is very low in every iteration?
How should I decide the parameters for my LSTM network (units, epochs, batch_size, time_steps to input)?
Thank you in advance for any help extended.
Your question requires a lot more information - for example, data size, timestep lookback, data preprocessing procedure, etc. But I would recommend you debug your problem with the following method. First, check whether your input/output data are processed properly. Then, try training a simpler model than an LSTM, since the LSTM could be overfitting (a simple baseline sketch is shown below). But sometimes, if the input signal is too random, it is normal for your model's results to fluctuate heavily, as there is no correlation in the data.
PS: never use a machine learning model to predict stock prices. It never works.
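For instance, a hedged sketch of the kind of sanity check I mean: compare your LSTM's RMSE against a naive persistence baseline (predict the next hourly close as the current one). If the LSTM can't beat this, the problem is more likely in the data or preprocessing than in the network size (the prices below are made up):

    import numpy as np

    # Hypothetical hourly close prices (replace with your real series).
    prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0])

    # Persistence baseline: the prediction for hour t is simply the price at hour t-1.
    y_true = prices[1:]
    y_pred = prices[:-1]

    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    print(f"Persistence baseline RMSE: {rmse:.3f}")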

Why does removing validation samples from Keras model improve test accuracy so much

I'm doing a programming assignment for Andrew Ng's Deep Learning course on convolutional models that involves training and evaluating a model using Keras. What I've observed after a little playing with various knobs is something curious: the test accuracy of the model greatly improves (from roughly 50% to roughly 90%) by setting the validation_split parameter of Model.fit to 0. This is surprising to me; I would have thought that eliminating the validation samples would lead to overfitting, which would in turn reduce accuracy on the test set.
Can someone please explain why this is happening?
You're right, there is more training data, but the increase is pretty negligible: I was setting the validation split to 0.1, so removing it only increases the training data by about 11.1%. However, thinking about it some more, I realized that removing the validation step doesn't have any effect on the model itself, hence no impact on test accuracy. I think I must have changed some other parameter too, though I don't remember which.
As Matias says, it means there is more training data to work with.
However, I'd also make sure that the test accuracy is actually increasing from 50 to 90% consistently. Run it a couple of times to check. There is a possibility that, because there are very few validation samples, the model simply got lucky. That's why it is important to have a lot of validation data: to make sure the model isn't just getting lucky, and that there's actually a method to the madness.
I go over some of the "norms" when it comes to training and testing data in my book about stock prediction (another great way in my opinion to learn about Deep Learning). Feel free to check it out and learn more, as it's great for beginners.
Good Luck!

Validation loss in keras while training LSTM and stability of LSTM

I am using Keras to train my LSTM model for a time series problem. My activation function is linear and the optimizer is RMSprop.
However, I observe that while the training loss decreases slowly over time and fluctuates around a small value, the validation loss jumps up and down with large variance.
Therefore, I have two questions:
1. Does the validation loss affect the training process? Will the algorithm look at the validation loss and slow down the learning rate if it fluctuates a lot?
2. How can I make the model more stable so that it returns more stable validation loss values?
Thanks
Does the validation loss affect the training process?
No. The validation loss is computed on a small sample of data that is excluded from the training process. It is run through the network at the end of each epoch to check how well training is going, so you can tell whether the model is overfitting (i.e. training loss much lower than validation loss).
Fluctuation in validation loss
This is a bit tougher to answer without seeing the network or data. It could just mean that your model isn't generalizing well to unseen data, i.e. it's not seeing enough similar trends between the training and validation data, and each time the weights are adjusted to better suit the training data, the model becomes less accurate on the validation set. You could possibly turn down the learning rate, but if your training loss is decreasing slowly, the learning rate is probably fine. I think in this situation you have to ask yourself a few questions: Do I have enough data? Does a true time series trend exist in my data? Have I normalized my data correctly? Is my network too large for the data I have?
I had this issue too - the training loss was decreasing, but the validation loss was not. While using an LSTM, I checked and found the following helped:
I simplified the model - instead of 20 layers, I opted for 8 layers.
Instead of scaling within the range (-1, 1), I chose (0, 1); this alone reduced my validation loss by an order of magnitude (see the scaling sketch after this list).
I reduced the batch size from 500 to 50 (just trial and error).
I added more features, which I thought would intuitively add some new information to the X -> y pairs.
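For reference, a minimal sketch of the scaling change mentioned above, using scikit-learn's MinMaxScaler (the array here is a made-up stand-in for the real feature matrix):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.random.randn(1000, 5)  # stand-in for your real features

    # Scale to (0, 1) instead of (-1, 1); fit on the training split only to avoid leakage.
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_train, X_val = X[:800], X[800:]
    X_train_scaled = scaler.fit_transform(X_train)
    X_val_scaled = scaler.transform(X_val)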
Possible reasons:
Your validation set is very small compared to your training set, which is common. A small change in the weights makes the validation loss fluctuate much more than the training loss. This does not necessarily mean that your model is overfitting, as long as the overall trend of the validation loss keeps decreasing.
Maybe your training and validation data come from different sources and have different distributions. This can happen when your data is a time series and you split train/validation at a specific timestamp.
Does the validation loss affect the training process?
No. Validation (a single forward pass) and training (forward and backward passes) are different processes, so a forward pass over the validation set does not change how the model trains next.
Will the algorithm look at the validation loss and slow down the learning rate in case it fluctuates alot?
No, but I guess you could implement your own method to do so. However, one thing should be noted: the model is trying to learn the best solution to your cost function, which is fed by the training data only, so changing the learning rate by observing the validation loss doesn't make too much sense.
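That said, if you do want the learning rate to react to the validation loss, Keras ships a callback that does roughly this (a hedged sketch; whether it actually helps depends on your problem):

    from tensorflow.keras.callbacks import ReduceLROnPlateau

    # Halve the learning rate whenever the validation loss hasn't improved for 5 epochs.
    reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5, min_lr=1e-6)

    # model.fit(X_train, y_train,
    #           validation_data=(X_val, y_val),
    #           epochs=100,
    #           callbacks=[reduce_lr])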
How can i make the model more stable so that it will return a more stable values of validation loss?
The reasons are explained above. If it is the first case, enlarging the validation set will make your loss look more stable, but it does NOT mean the model fits better. My suggestion: as long as you are sure your model is not overfitting (the gap between training loss and validation loss is not too large), you can just save the model that gives the lowest validation loss.
If it's the second case, it can be more complicated, depending on your situation. You could try to exclude samples from the training set that are not "similar" to your validation set, or enlarge your model's capacity if you have enough data, or perhaps add more metrics to monitor how well the training is going.
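In Keras, one way to do that last step is the ModelCheckpoint callback (a minimal sketch; the file path and fit arguments are placeholders):

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Keep only the weights from the epoch with the lowest validation loss.
    checkpoint = ModelCheckpoint("best_model.h5", monitor="val_loss",
                                 save_best_only=True, mode="min")

    # model.fit(X_train, y_train,
    #           validation_data=(X_val, y_val),
    #           epochs=100,
    #           callbacks=[checkpoint])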