Keras NaN value when computing the loss - python-3.x

My question is related to this one
I am working to implement the method described in the article https://drive.google.com/file/d/1s-qs-ivo_fJD9BU_tM5RY8Hv-opK4Z-H/view . The final algorithm to use is shown there (it is on page 6), where:
d are unit vectors
xhi (ξ) is a non-zero number
D is the loss function (sparse cross-entropy in my case)
The idea is to do adversarial training: modify the data in the direction where the network is most sensitive to small changes, then train the network on the modified data but with the same labels as the original data.
The loss function used to train the model is here:
l is a loss measure on the labelled data
Rvadv is the value inside the gradient in the picture of algorithm 1
the article chose alpha = 1
The idea is to incorporate the model's performance on the labelled dataset into the loss.
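For what it's worth, here is an illustrative TensorFlow sketch of that kind of update. It is my interpretation of the description above, not the paper's exact Algorithm 1; the function name, the eps/xhi/alpha values and the return convention are all placeholders:

import tensorflow as tf

def adversarial_step(model, loss_fn, x_batch, y_batch, xhi=1e-6, eps=2.0, alpha=1.0):
    # Probe direction: a random unit vector d, scaled by the small number xhi.
    d = tf.random.normal(tf.shape(x_batch))
    d = d / (tf.norm(d) + 1e-12)
    with tf.GradientTape() as tape:
        tape.watch(d)
        D = loss_fn(y_batch, model(x_batch + xhi * d, training=False))
    g = tape.gradient(D, d)
    # Perturbation in the direction where the loss is most sensitive.
    r_adv = eps * g / (tf.norm(g) + 1e-12)

    # Train on the perturbed data with the original labels:
    # total loss = l (loss on labelled data) + alpha * Rvadv.
    with tf.GradientTape() as tape:
        l = loss_fn(y_batch, model(x_batch, training=True))
        Rvadv = loss_fn(y_batch, model(x_batch + r_adv, training=True))
        total = l + alpha * Rvadv
    grads = tape.gradient(total, model.trainable_variables)
    return total, grads

An optimizer call such as optimizer.apply_gradients(zip(grads, model.trainable_variables)) would then complete the descent step.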
I am trying to implement this method in Keras with the MNIST dataset and a mini-batch of 100 samples. When I try to do the final gradient descent to update the weights, after some iterations NaN values appear and I don't know why. I posted the notebook in a Colab session (I don't know for how long it will stay up, so I also posted the code in a gist):
collab session: https://colab.research.google.com/drive/1lowajNWD-xvrJDEcVklKOidVuyksFYU3?usp=sharing
gist : https://gist.github.com/DridriLaBastos/e82ec90bd699641124170d07e5a8ae4c

This is a fairly standard NaN-in-training problem; I suggest you read this answer about the NaN issue with the Adam solver for the common causes and solutions.
Basically, I just made the following two changes and the code ran without NaN in the gradients (see the sketch after the list):
Reduce the learning rate of the optimizer in model.compile to optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3).
Replace C = [loss(label, pred) for label, pred in zip(yBatchTrain, dumbModel(dataNoised, training=False))] with C = loss(yBatchTrain, dumbModel(dataNoised, training=False)).
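For illustration only, here is a minimal sketch of what those two changes look like; dumbModel, yBatchTrain and dataNoised are dummy stand-ins for the objects defined in the gist:

import tensorflow as tf

# Dummy stand-ins with MNIST-like shapes, just so the snippet runs on its own.
dumbModel = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
yBatchTrain = tf.zeros((100,), dtype=tf.int32)   # mini-batch of 100 labels
dataNoised = tf.random.normal((100, 28, 28))     # perturbed mini-batch

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Change 1: compile with a smaller Adam learning rate.
dumbModel.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss=loss)

# Change 2: hand the whole batch to the loss object instead of building a
# Python list of per-sample losses.
C = loss(yBatchTrain, dumbModel(dataNoised, training=False))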
If you still get this kind of error, the next few things you could try are:
Clip the loss or the gradients (for example via the optimizer's clipnorm argument, sketched below)
Switch all tensors from tf.float32 to tf.float64
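For instance, gradient clipping can be enabled directly on the Keras optimizer (the values here are arbitrary):

import tensorflow as tf

# clipnorm rescales each gradient so its L2 norm never exceeds 1.0;
# clipvalue would instead clamp individual gradient entries.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)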
Next time you face this kind of error, you can use tf.debugging.check_numerics to find the root cause of the NaN; a short example follows.
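A self-contained illustration of that debugging call (the tensors here are made up for the example):

import tensorflow as tf

x = tf.constant([1.0, 0.0])
y = tf.math.log(x)  # produces -inf for the zero entry

# Raises an InvalidArgumentError naming the offending tensor as soon as it
# contains a NaN or Inf, so you can pinpoint where the bad values appear.
tf.debugging.check_numerics(y, message="NaN/Inf detected in y")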

Related

Training loss for Faster-RCNN either becoming NaN or infinity

I want to implement the PyTorch Faster-RCNN module on a custom dataset that I curated and labelled. The implementation details look straightforward; there was a demo that showed training and inference on a custom dataset (a person-detection problem). Just like the person-detection dataset, where there is only one class (person) along with the background class, my personal dataset also has only one class. I therefore saw no need to make any changes to the hyper-parameters. It is unclear what is causing the training loss to become either NaN or infinity. This happens on the first epoch. Grateful for any suggestions on this issue.
Edit:
I found that my masks were sometimes out of bounds, which was blowing up the loss, so the cumulative loss was going to either positive or negative infinity. It probably would not have mattered but for the fact that the train_one_epoch function provided by the PyTorch detection Colab notebook example with the PennFudan dataset adds a check that sys.exit()s when the loss stops being finite (tested with math.isfinite()). A sketch of that guard is shown below.
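As a rough sketch, that guard looks something like the following (modeled on the torchvision detection training loop; the exact wording and structure in the reference script may differ):

import math
import sys

def guard_loss(loss_dict):
    """Abort training as soon as the summed detection losses stop being finite."""
    loss_value = sum(loss.item() for loss in loss_dict.values())
    if not math.isfinite(loss_value):
        print(f"Loss is {loss_value}, stopping training")
        print(loss_dict)
        sys.exit(1)
    return loss_value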

NaN error during backpropagation of torch.sqrt() even though an epsilon is added inside the square root

I am training a neural network with a custom loss function that calls torch.sqrt() at one point. To avoid getting NaN gradients during backpropagation I add a small epsilon value inside the square root; however, the model parameters still diverge to NaN values from time to time.
I set torch.autograd.set_detect_anomaly(True), which traced the NaN error back to the following sqrt call.
b = torch.sqrt((m[...,0,0] - m[...,1,1])**2/4 + m[...,0,1]**2 + torch.finfo(torch.float).tiny)
Since the derivative of sqrt at epsilon, i.e. 1/(2*sqrt(epsilon)) with epsilon = torch.finfo(torch.float).tiny, is still very big (approximately 4e18), I increased the epsilon to torch.finfo(torch.half).tiny (which equals roughly 6e-5). The error still persists, though. At this point I am out of ideas and also couldn't find any solutions on the internet, so I appreciate any help.
P.S. The optimizer used is L-BFGS, if that matters.
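For reference, here is a small standalone check of the magnitudes mentioned above; it only reproduces the arithmetic and is not a fix:

import torch

for eps in (torch.finfo(torch.float).tiny, torch.finfo(torch.half).tiny):
    x = torch.tensor(0.0, requires_grad=True)
    y = torch.sqrt(x + eps)        # same pattern as the loss term above
    y.backward()
    # d/dx sqrt(x + eps) at x = 0 is 1 / (2 * sqrt(eps)):
    # roughly 4.6e18 for float32 tiny and roughly 64 for float16 tiny.
    print(f"eps = {eps:.3e}, gradient = {x.grad.item():.3e}")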

What do sklearn.cross_validation scores mean?

I am working on a time-series prediction problem using GradientBoostingRegressor, and I think I'm seeing significant overfitting, as evidenced by a significantly better RMSE for training than for prediction. In order to examine this, I'm trying to use sklearn.model_selection.cross_validate, but I'm having problems understanding the result.
First: I was calculating RMSE by fitting to all my training data, then "predicting" the training-data outputs using the fitted model and comparing those with the training outputs (the same ones I used for fitting). The RMSE that I observe is of the same order of magnitude as the predicted values and, more importantly, it's in the same ballpark as the RMSE I get when I submit my predicted results to Kaggle (although the training RMSE is lower, reflecting overfitting).
Second, I use the same training data, but apply sklearn.model_selection.cross_validate as follows:
cross_validate(predictor, features, targets, cv=5, scoring="neg_mean_squared_error")
I figure neg_mean_squared_error should be (minus) the square of my RMSE; a small conversion sketch is shown below. Accounting for that, I still find that the error reported by cross_validate is one or two orders of magnitude smaller than the RMSE I was calculating as described above.
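For concreteness, a minimal sketch of that conversion on toy data (the real predictor, features and targets are not reproduced here):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_validate

# Toy stand-ins for the real features/targets.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 10))
targets = features @ rng.normal(size=10) + rng.normal(scale=0.1, size=500)

predictor = GradientBoostingRegressor(max_depth=3)
cv_results = cross_validate(predictor, features, targets, cv=5,
                            scoring="neg_mean_squared_error")

# neg_mean_squared_error is the negative MSE, so flip the sign and take the
# square root to get something comparable to an RMSE.
rmse_per_fold = np.sqrt(-cv_results["test_score"])
print(rmse_per_fold, rmse_per_fold.mean())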
In addition, when I modify my GradientBoostingRegressor max_depth from 3 to 2, which I would expect to reduce overfitting and thus improve the CV error, I find that the opposite is the case.
I'm keenly interested in using cross-validation so I don't have to validate my hyperparameter choices by using up Kaggle submissions, but given what I've observed, I'm not clear that the results will be understandable or useful.
Can someone explain how I should be using Cross Validation to get meaningful results?
I think there is a conceptual problem here.
If you want to compute the error of a prediction, you should not use the training data. As the name says, that data is used only for training; to evaluate accuracy scores you have to use data that the model has never seen.
About cross-validation, I can say that it's an approach to choosing the training/testing split. The process is as follows: you divide your data into n groups and iterate, changing which group you pick as the testing set. If you have n groups you will do n iterations, and each time the training and testing sets will be different.
Basically, what you should do is something like this (see the sketch after this list):
Train the model using months 0 to 30 (for example).
Check the predictions made with months 31 to 35 as input.
If the input has to be the same length, divide the features in half (that should be about 17 months).
I hope I understood correctly; otherwise, leave a comment.
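As a hedged sketch of that idea with scikit-learn's TimeSeriesSplit (the toy data, the number of splits and the month counts are assumptions, not taken from the original dataset):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Toy monthly data, ordered in time.
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 5))          # 36 "months" of features
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=36)

# Each split trains only on earlier months and tests on later ones,
# mimicking "train on months 0-30, predict months 31-35".
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = GradientBoostingRegressor(max_depth=2).fit(X[train_idx], y[train_idx])
    rmse = np.sqrt(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
    print(f"train up to month {train_idx[-1]}, test RMSE = {rmse:.3f}")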

How do I test my classifier accuracy against random values?

I've set up my first scikit-learn example to play with and I'm trying to gauge accuracy on my predictions. I've got training and test lists set up fine, but I'm getting ~0.95 accuracy even if I give it random values.
This looks to be because I'm predicting 0/1 labels and 95% of the labels are zeros, so it's always guessing 0 and getting 0.95 accuracy (I think?). Obviously this isn't what I want.
How do I go about deciding if my classifiers are working, and how do I get meaningful accuracy values?
You have a clear class-imbalance issue. Your classifier is predicting 0 all the time, knowing it will be right 95% of the time. You can inspect this by calling predict(X_test) on your fitted classifier. If all the values are 0, you know this is the case.
To get a better idea of how the model performs, you can upsample the data labelled 1 or downsample the data labelled 0. You can use this package, which builds on scikit-learn and implements a number of resampling methods. Alternatively, you can use scikit-learn's resample utility, which will bootstrap new data points for you; a sketch is shown below.
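A rough sketch of the upsampling idea with scikit-learn's resample (the data and variable names are placeholders):

import numpy as np
from sklearn.utils import resample

# Toy imbalanced data: roughly 95% class 0, 5% class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)

X_minority, y_minority = X[y == 1], y[y == 1]
X_majority, y_majority = X[y == 0], y[y == 0]

# Bootstrap the minority class up to the size of the majority class.
X_up, y_up = resample(X_minority, y_minority,
                      replace=True, n_samples=len(y_majority), random_state=0)

X_balanced = np.vstack([X_majority, X_up])
y_balanced = np.concatenate([y_majority, y_up])
print(np.bincount(y_balanced))   # roughly equal class counts now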

How to correctly get the weights using spark for synthetic dataset?

I'm running LogisticRegressionWithSGD on Spark on a synthetic dataset. I've calculated the error in MATLAB using vanilla gradient descent and in R, and it is ~5%; I also recovered weights similar to the ones used in the model that generated y. The dataset was generated using this example.
While I am able to get a very close error rate at the end by tuning the step size, the weights for the individual features aren't the same. In fact, they vary a lot. I tried LBFGS on Spark and it is able to recover both the error and the weights correctly in a few iterations. My problem is with logistic regression with SGD on Spark.
The weight I'm getting:
[0.466521045342,0.699614292387,0.932673108363,0.464446310304,0.231458578991,0.464372487994,0.700369689073,0.928407671516,0.467131704168,0.231629845549,0.46465456877,0.700207596219,0.935570594833,0.465697758292,0.230127949916]
The weight I want:
[2,3,4,2,1,2,3,4,2,1,2,3,4,2,1]
Intercept I'm getting: 0.2638102010832128
Intercept I want: 1
Q.1. Is the problem with the synthetic dataset? I have tried tuning miniBatchFraction, stepSize, the number of iterations and the intercept, but I couldn't get it right.
Q.2. Why is Spark giving me these weird weights? Would it be wrong to expect similar weights from Spark's model?
Please let me know if extra details are needed to answer my question.
It actually did converge: your weights are scaled to lie between 0 and 1, while the expected maximum value is 4. Multiply everything you got from SGD by 4 and you can see the correspondence, even for the intercept value. A quick check is shown below.
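A quick numeric check of that claim, using only the numbers quoted above:

import numpy as np

sgd_weights = np.array([0.466521045342, 0.699614292387, 0.932673108363,
                        0.464446310304, 0.231458578991, 0.464372487994,
                        0.700369689073, 0.928407671516, 0.467131704168,
                        0.231629845549, 0.46465456877, 0.700207596219,
                        0.935570594833, 0.465697758292, 0.230127949916])
expected = np.array([2, 3, 4, 2, 1, 2, 3, 4, 2, 1, 2, 3, 4, 2, 1])

# Rescaling by the expected maximum (4) brings the SGD weights roughly in
# line with the weights used to generate the data; the intercept behaves
# the same way.
print(np.round(sgd_weights * 4, 2))
print(expected)
print(round(0.2638102010832128 * 4, 2))   # ~1.06, close to the expected 1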
