Can the target values be NaN in a multiple-output neural network in Lasagne? - theano

Given a dataset with 3 different outcomes, some of which may have missing values, can neural network training in Lasagne deal with these NaN targets, or should a separate neural network be trained for each outcome (with the NaN cases removed from training)?
Let's say we have the following targets:
1,2,3
NAN,1,5
1,NAN,2
0,NAN,NAN
Please note that imputation is not what I am interested in.
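One common way to handle this without imputation is to mask the missing targets out of the loss, so that a single network with three outputs only receives gradient from the observed entries. Below is a minimal Lasagne/Theano sketch of that idea; the layer sizes, optimizer, and variable names are illustrative assumptions, not part of the original question.

```python
# Sketch: mask NaN targets out of a 3-output regression loss (illustrative sizes/names).
import theano
import theano.tensor as T
import lasagne

input_var = T.matrix('inputs')
target_var = T.matrix('targets')   # may contain NaN for missing outcomes

l_in = lasagne.layers.InputLayer(shape=(None, 10), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=32,
                                  nonlinearity=lasagne.nonlinearities.rectify)
l_out = lasagne.layers.DenseLayer(l_hid, num_units=3,
                                  nonlinearity=lasagne.nonlinearities.linear)

prediction = lasagne.layers.get_output(l_out)

# 1 where the target is observed, 0 where it is NaN
mask = T.switch(T.isnan(target_var), 0.0, 1.0)
# replace NaNs by 0 so the arithmetic stays finite, then zero out their error
safe_target = T.switch(T.isnan(target_var), 0.0, target_var)
per_element = mask * lasagne.objectives.squared_error(prediction, safe_target)
# average only over the observed entries
loss = per_element.sum() / T.maximum(mask.sum(), 1.0)

params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)
train_fn = theano.function([input_var, target_var], loss, updates=updates)
```

With this masking, a row like NAN,1,5 still contributes gradient for the second and third outputs, so one network can cover all three outcomes.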

Related

How to do backpropagation only on a select few labels instead of all labels in a multilabel classification?

I am using a pretrained neural network (ResNet) on multiple datasets.
This neural network has in its output all the labels that are present across the datasets, that is, a union of all labels.
For example:
If dataset A has labels x, y, z, w,
dataset B has labels m, l, n, o, x, y,
and dataset C has labels w, z, m, o,
then the neural network would have all labels in its final layer, that is: m, l, n, o, w, x, y, z.
Now depending on which dataset I have, I want the model to train only on the dataset's own labels and not do backpropagation on other labels.
How can this be achieved?
I am working in PyTorch.
Maybe use three loss functions, each with a different value for the pos_weight argument, with zeros corresponding to the classes not included in a dataset.
Why do you care if the network does backpropagation on other labels? That is how it is supposed to work.
If the idea is to reduce the number of output features from what they have in the pretrained network, just remove the last layer of the network and add in your own with the desired output features. Then train as you would.
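A related option to the pos_weight suggestion above is to keep the per-label losses and multiply them by a 0/1 mask, so no gradient flows into the output units for labels a dataset does not annotate. A minimal PyTorch sketch of that masking idea (label names, mask construction, and the loss are assumptions for the example, not the asker's code):

```python
# Sketch: restrict a multi-label BCE loss to the labels a given dataset annotates.
import torch
import torch.nn as nn

all_labels = ['m', 'l', 'n', 'o', 'w', 'x', 'y', 'z']   # order of the final layer (illustrative)
dataset_a_labels = {'x', 'y', 'z', 'w'}

# 1 for labels that dataset A annotates, 0 for the rest
label_mask = torch.tensor([1.0 if lab in dataset_a_labels else 0.0
                           for lab in all_labels])

criterion = nn.BCEWithLogitsLoss(reduction='none')      # keep per-label losses

def masked_loss(logits, targets, mask):
    # logits, targets: (batch, num_labels); mask: (num_labels,)
    per_label = criterion(logits, targets)               # (batch, num_labels)
    per_label = per_label * mask                          # zero out foreign labels
    return per_label.sum() / mask.sum().clamp(min=1) / logits.size(0)

# usage: loss = masked_loss(model(x), y, label_mask); loss.backward()
```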

Can `nan` weights occur, in PyTorch, as the number of epochs increases?

I am trying to run a neural network model for a larger number of epochs (say 800) in PyTorch. I observed that the network generates nan values at some epoch (say 200) and may behave normally after some epochs (say 400) or may continue with nan values (up to 800). I am guessing that some weights of the neural network are becoming nan.
Is it true that if I increase the number of epochs, then the weights of the neural network may become nan?
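Weights can indeed become NaN as training continues, typically after gradients overflow (for example with a too-large learning rate or exploding gradients). A small, illustrative sketch of how one might detect the first NaN parameter and clip gradients as a mitigation; the model, optimizer, and loop variables are assumed to exist and are not from the original question:

```python
# Sketch: detect the first epoch at which any parameter becomes NaN, and clip gradients.
import torch

def has_nan_params(model):
    return any(torch.isnan(p).any() for p in model.parameters())

# inside the training loop (model, optimizer, loss, epoch assumed to exist):
#     loss.backward()
#     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#     optimizer.step()
#     if has_nan_params(model):
#         print(f"NaN weights first appeared at epoch {epoch}")
#         break
```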

Keras NaN value when computing the loss

My question is related to this one
I am working to implement the method described in the article https://drive.google.com/file/d/1s-qs-ivo_fJD9BU_tM5RY8Hv-opK4Z-H/view . The final algorithm to use is here (it is on page 6):
d is a unit vector
xhi is a non-zero number
D is the loss function (sparse cross-entropy in my case)
The idea is to do adversarial training: modify the data in the direction where the network is most sensitive to small changes, and train the network on the modified data but with the same labels as the original data.
The loss function used to train the model is here:
l is a loss measure on the labelled data
Rvadv is the value inside the gradient in the picture of Algorithm 1
the article chose alpha = 1
The idea is to incorporate the model's performance on the labelled dataset into the loss.
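For concreteness, here is a deliberately simplified sketch of the perturbation idea described above (move each input in the direction of steepest loss increase, then reuse the original labels). It is not a faithful reproduction of the article's Algorithm 1 or of the notebook's code; the loss, input shape, and epsilon are assumptions.

```python
# Sketch: perturb each input along the direction of steepest loss increase.
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def adversarial_batch(model, x, y, epsilon=0.1):
    # x: (batch, features) float tensor, y: integer labels
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x, training=False))
    g = tape.gradient(loss, x)
    d = g / (tf.norm(g, axis=-1, keepdims=True) + 1e-12)  # unit direction per example
    return x + epsilon * d, y                              # same labels as the original data
```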
I am trying to implement this method in Keras with the MNIST dataset and a mini-batch of 100 samples. When I try to do the final gradient descent to update the weights, NaN values appear after some iterations, and I don't know why. I posted the notebook in a Colab session (I don't know how long it will stay up, so I also posted the code in a gist):
Colab session: https://colab.research.google.com/drive/1lowajNWD-xvrJDEcVklKOidVuyksFYU3?usp=sharing
gist : https://gist.github.com/DridriLaBastos/e82ec90bd699641124170d07e5a8ae4c
This is a fairly standard NaN-in-training problem; I suggest you read this answer about NaN issues with the Adam solver for the cause and solution in the common case.
Basically I just made the following two changes and the code runs without NaN in the gradients:
Reduce the learning rate of the optimizer in model.compile to optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3)
Replace C = [loss(label,pred) for label, pred in zip(yBatchTrain,dumbModel(dataNoised,training=False))] with C = loss(yBatchTrain,dumbModel(dataNoised,training=False))
If you still get this kind of error, the next few things you could try are:
Clip the loss or the gradients
Switch all tensors from tf.float32 to tf.float64
Next time you face this kind of error, you can use tf.debugging.check_numerics to find the root cause of the NaN
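For reference, here is a sketch of where those checks could go in a custom training step; the model, loss, and step structure are assumptions, not the notebook's code:

```python
# Sketch: hook tf.debugging.check_numerics and gradient clipping into a custom train step.
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(model, optimizer, x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
        loss = tf.debugging.check_numerics(loss, "loss became NaN/Inf")
    grads = tape.gradient(loss, model.trainable_variables)
    grads = [tf.clip_by_norm(g, 1.0) for g in grads]   # optional gradient clipping
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Or, to instrument every op at once while debugging:
# tf.debugging.enable_check_numerics()
```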

NaN loss sometimes for the same image when training my YOLO TensorFlow model

Success instance
I am running my custom YOLO (you only look once) network implemented purely in TensorFlow. However, the loss calculation for the same image varies each time I feed it to the network. The confidence and width sometimes become NaN. Any suggestions on how to debug this, and what could be a possible cause of this behavior?
Failed instance
I have checked that my input image is in the range 0-1. The placeholders' data type is float32.
The kernel initializer of the network was constrained to output values between 0 and 1. This solved my NaN problem.
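For illustration only (the original network is written in plain TensorFlow with placeholders, so the exact API may differ), a kernel initializer restricted to [0, 1] could look like this with Keras layers:

```python
# Sketch: initialise convolution kernels uniformly in [0, 1] (layer parameters are illustrative).
import tensorflow as tf

init = tf.keras.initializers.RandomUniform(minval=0.0, maxval=1.0)
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                              kernel_initializer=init)
```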

NaN loss occurs with custom loss function, even if gradient is set to 0

I've been trying to implement a custom loss function for a TF Estimator, but TensorFlow is returning NaN losses. This occurs even when setting the learning rate to low numbers (1e-10) or 0. When forcefully setting the gradient to 0, the neural network works, implying an issue with the gradients. I have already checked the dataset for NaN values, outliers, etc. In addition, I have attempted to remove any functions that could potentially interfere with the automatic differentiation, to no avail (e.g. map_fn). What else could be causing these problems?
The loss function does not weight all of the neural network's predictions equally; some predictions are taken into account multiple times when generating the loss
The loss function also requires features not run through the neural network
Error message:
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
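One way to narrow this down (a hypothetical model_fn skeleton, not the asker's code) is to wrap each piece of the custom loss in tf.debugging.check_numerics, so the Estimator reports which tensor first turns NaN instead of raising the generic NanLossDuringTrainingError:

```python
# Sketch: instrument a custom Estimator loss with tf.debugging.check_numerics.
import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Illustrative network and loss; the real ones are not shown in the question.
    logits = tf.keras.layers.Dense(1)(features['x'])
    per_example = tf.square(logits - labels)
    per_example = tf.debugging.check_numerics(per_example, "per-example loss")
    # weights for predictions counted multiple times, and extra features not fed
    # through the network, would enter the loss here
    weights = tf.cast(features['aux'], tf.float32)
    loss = tf.reduce_mean(per_example * weights)
    loss = tf.debugging.check_numerics(loss, "final loss")
    optimizer = tf.compat.v1.train.AdamOptimizer(params.get('learning_rate', 1e-3))
    train_op = optimizer.minimize(loss,
                                  global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
```

If the forward loss is finite but the gradients still blow up, the usual suspects are ops whose gradient is undefined at the evaluated point (for example sqrt or log at 0, or a division by zero): since 0 * NaN is still NaN, a zero learning rate does not hide such a gradient.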
