'loss: nan' during training of Neural Network in Keras - python-3.x

I am training a neural net in Keras. During the first epoch the loss returns normal values, then suddenly becomes loss: nan before the epoch ends, and the accuracy drops significantly. From the second epoch onwards the loss stays nan and the accuracy is 0, and this continues for the rest of the epochs.
The frustrating bit is that there is no consistency in the output between training runs: loss: nan shows up at a different point in the first epoch each time.
There have been a couple of questions on this site that give "guides" for similar problems; I just haven't seen one worked through so explicitly in Keras.
I am trying to get my neural network to classify a 1 or a 0.
Here are some things I have tried; my output and code follow below.
Standardization // Normalization
I posted a question about my data here. I was able to figure it out and apply sklearn's StandardScaler() and MinMaxScaler() to my dataset. Neither standardization nor normalization helped with the issue.
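For reference, the standardization variant I tried looked roughly like this (just a sketch; X_train and X_test are the same arrays used in the MinMaxScaler code further down):

from sklearn.preprocessing import StandardScaler

# Standardization variant (zero mean, unit variance); fit on the training set only
sc = StandardScaler()
X_train_total_scale = sc.fit_transform(X_train)
X_test_total_scale = sc.transform(X_test)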
Learning Rate
The optimizers I have tried are adam and SGD. In both cases I tried lowering the default learning rate to see if that would help; the same issue arose either way.
Activations
I thought it was pretty standard to use relu, but I saw someone on the internet suggesting tanh; I tried it, no dice.
Batch Size
Tried 32, 50, 128 and 200. A batch size of 50 got me the farthest into the 1st epoch; nothing else helped.
Combating Overfitting
Added a dropout layer and tried a whole range of dropout rates.
Other Observations
The epochs train really, really fast given the dimensions of the data (I could be wrong about what to expect).
loss: nan could have something to do with my loss function being binary_crossentropy; maybe some values are giving that loss function a hard time.
kernel_initializer='uniform' has been left untouched and unconsidered in my quest to figure this out.
The internet also told me there could be a NaN value in my data, but I think that was for an error that broke someone else's script (a quick check is sketched below).
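A quick sanity check along those lines (using the same X_train_total_scale and y_train arrays that go into fit() in the code below):

import numpy as np

# Look for NaN or inf values in the scaled features
print(np.isnan(X_train_total_scale).any())  # True if any NaN is present
print(np.isinf(X_train_total_scale).any())  # True if any inf is present

# Confirm the targets really are just 0s and 1s for binary_crossentropy
print(np.unique(y_train))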
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers

# Scale features to [0, 1]; the scaler is fit on the training set only
sc = MinMaxScaler()
X_train_total_scale = sc.fit_transform(X_train)
X_test_total_scale = sc.transform(X_test)

print(X_train_total_scale.shape)  # (4140, 2756)
print(y_train.shape)              # (4140,)

## NN
# adam = optimizers.Adam(lr=0.0001)
sgd = optimizers.SGD(lr=0.0001, decay=1e-6, momentum=0.9, nesterov=True)

classifier = Sequential()
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu', input_dim=2756))
classifier.add(Dropout(0.6))
classifier.add(Dense(units=1379, kernel_initializer='uniform', activation='relu'))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

classifier.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train_total_scale, y_train,
               validation_data=(X_test_total_scale, y_test),
               batch_size=50, epochs=100)
(output shown for batch size 200 to avoid an overly long text block)
200/4140 [>.............................] - ETA: 7s - loss: 0.6866 - acc: 0.5400
400/4140 [=>............................] - ETA: 4s - loss: 0.6912 - acc: 0.5300
600/4140 [===>..........................] - ETA: 2s - loss: nan - acc: 0.5300
800/4140 [====>.........................] - ETA: 2s - loss: nan - acc: 0.3975
1000/4140 [======>.......................] - ETA: 1s - loss: nan - acc: 0.3180
1200/4140 [=======>......................] - ETA: 1s - loss: nan - acc: 0.2650
1400/4140 [=========>....................] - ETA: 1s - loss: nan - acc: 0.2271
1600/4140 [==========>...................] - ETA: 1s - loss: nan - acc: 0.1987
1800/4140 [============>.................] - ETA: 1s - loss: nan - acc: 0.1767
2000/4140 [=============>................] - ETA: 0s - loss: nan - acc: 0.1590
2200/4140 [==============>...............] - ETA: 0s - loss: nan - acc: 0.1445
2400/4140 [================>.............] - ETA: 0s - loss: nan - acc: 0.1325
2600/4140 [=================>............] - ETA: 0s - loss: nan - acc: 0.1223
2800/4140 [===================>..........] - ETA: 0s - loss: nan - acc: 0.1136
3000/4140 [====================>.........] - ETA: 0s - loss: nan - acc: 0.1060
3200/4140 [======================>.......] - ETA: 0s - loss: nan - acc: 0.0994
3400/4140 [=======================>......] - ETA: 0s - loss: nan - acc: 0.0935
3600/4140 [=========================>....] - ETA: 0s - loss: nan - acc: 0.0883
3800/4140 [==========================>...] - ETA: 0s - loss: nan - acc: 0.0837
4000/4140 [===========================>..] - ETA: 0s - loss: nan - acc: 0.0795
4140/4140 [==============================] - 2s 368us/step - loss: nan - acc: 0.0768 - val_loss: nan - val_acc: 0.0000e+00
Epoch 2/100
200/4140 [>.............................] - ETA: 1s - loss: nan - acc: 0.0000e+00
400/4140 [=>............................] - ETA: 0s - loss: nan - acc: 0.0000e+00
600/4140 [===>..........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
800/4140 [====>.........................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1000/4140 [======>.......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1200/4140 [=======>......................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1400/4140 [=========>....................] - ETA: 0s - loss: nan - acc: 0.0000e+00
1600/4140 [==========>...................] - ETA: 0s - loss: nan - acc: 0.0000e+00
... and so on...
I hope to be able to get a full training run done (obviously), but I would also like to learn some of the intuition people use to figure out these problems on their own!

Firstly, check for NaNs or inf in your dataset.
You could try different optimizers, e.g. rmsprop.
The learning rate could be smaller, though I haven't used anything lower than 0.0001 (which is what you're using) myself.
Regarding "I thought that it was pretty standard to use relu but I saw on the internet somewhere someone talking about using tanh, tried it, no dice": try a leaky ReLU or ELU if you're concerned about this.
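A minimal sketch of what those suggestions could look like on the question's model (layer sizes copied from the question; the LeakyReLU alpha and the RMSprop learning rate are just illustrative values, not tested recommendations):

from keras.models import Sequential
from keras.layers import Dense, Dropout, LeakyReLU
from keras import optimizers

classifier = Sequential()
# Linear Dense layers followed by LeakyReLU instead of a fused 'relu' activation
classifier.add(Dense(units=1379, kernel_initializer='uniform', input_dim=2756))
classifier.add(LeakyReLU(alpha=0.01))
classifier.add(Dropout(0.6))
classifier.add(Dense(units=1379, kernel_initializer='uniform'))
classifier.add(LeakyReLU(alpha=0.01))
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# rmsprop with a small learning rate as an alternative optimizer
rmsprop = optimizers.RMSprop(lr=0.0001)
classifier.compile(optimizer=rmsprop, loss='binary_crossentropy', metrics=['accuracy'])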

Related

using tfa.layers.crf on top of biLSTM

I am trying to implement an NER model based on a CRF with the tensorflow-addons library. The model takes the sequence of words in word-to-index and character-level format, concatenates the two embeddings, and feeds them to the BiLSTM layer. Here is the implementation code:
import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from tensorflow.keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
from tensorflow_addons.layers import CRF

# word-level input and embedding
word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(input_dim=n_words + 2, output_dim=dim_word_emb,
                     input_length=max_sent_len, mask_zero=True)(word_input)

# char-level input, embedding and per-word LSTM
char_input = Input(shape=(max_sent_len, max_word_len,))
char_emb = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=dim_char_emb,
                                     input_length=max_word_len, mask_zero=True))(char_input)
char_emb = TimeDistributed(LSTM(units=20, return_sequences=False,
                                recurrent_dropout=0.5))(char_emb)

# main LSTM
main_input = concatenate([word_emb, char_emb])
main_input = SpatialDropout1D(0.3)(main_input)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
                               recurrent_dropout=0.6))(main_input)

kernel = TimeDistributed(Dense(50, activation="relu"))(main_lstm)
crf = CRF(n_tags + 1)  # CRF layer
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel)  # output

model = Model([word_input, char_input], potentials)
model.add_loss(tf.abs(tf.reduce_mean(kernel)))
model.compile(optimizer="rmsprop", loss='categorical_crossentropy')
When I start to fit the model, I get these warnings:
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
And training process goes like this:
438/438 [==============================] - 80s 163ms/step - loss: nan - val_loss: nan
Epoch 2/10
438/438 [==============================] - 71s 163ms/step - loss: nan - val_loss: nan
Epoch 3/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 4/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 5/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 6/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 7/10
438/438 [==============================] - 70s 161ms/step - loss: nan - val_loss: nan
Epoch 8/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 9/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 10/10
438/438 [==============================] - 70s 159ms/step - loss: nan - val_loss: nan
I am almost sure that the problem is with the way I am setting the loss function, but I do not know exactly how I should set it. I also searched for this problem but did not find an answer.
Also, when I test the model it cannot predict the labels correctly and assigns the same label everywhere. Can anybody explain how I should solve this problem?
Change your loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy(). I guess categorical crossentropy is not a good choice.
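A minimal sketch of that change, assuming the same model object as in the question (the rmsprop optimizer is kept from the original code):

import tensorflow_addons as tfa

# Swap categorical crossentropy for the focal loss suggested above
model.compile(optimizer="rmsprop",
              loss=tfa.losses.SigmoidFocalCrossEntropy())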

What's the meaning of the number before the progress bar when tensorflow is training

Could anyone tell me the meaning of the '10' and '49' in the following TensorFlow log?
Many thanks.
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 5.899410247802734 secs
10/10 [==============================] - 23s 2s/step - loss: 2.6726 - acc: 0.1459
49/49 [==============================] - 108s 2s/step - loss: 2.3035 - acc: 0.2845 - val_loss: 2.6726 - val_acc: 0.1459
Epoch 2/100
10/10 [==============================] - 1s 133ms/step - loss: 2.8799 - acc: 0.1693
49/49 [==============================] - 17s 337ms/step - loss: 1.9664 - acc: 0.4042 - val_loss: 2.8799 - val_acc: 0.1693
The 10 and 49 correspond to the number of batches your dataset has been divided into in each epoch.
For example, if your training dataset contains 10000 images in total and your batch size is 64, there will be math.ceil(10000 / 64) = 157 batches in each epoch.
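As a quick illustration of that calculation (10000 and 64 are just the example numbers above, not values from the question's log):

import math

num_samples = 10000  # total training images in the example
batch_size = 64

steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 157 -- the number shown before the progress bar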

Predicting the price of the natural gas using LSTM neural network

I want to build a model using Keras to predict the price of the natural gas.
The dataset contains daily and monthly gas prices since 1997 and is available Here.
The following graph shows the prices over a sequence of days: X is days and Y is the price.
I have tried an LSTM with 4, 50, and 100 cells in the hidden layer, but the accuracy was still bad and the model failed to predict future prices.
I added another two (fully connected) hidden layers with 100 and 128 cells, but that did not work either.
This is the model and the result from the training process:
import time

from keras.models import Sequential
from keras.layers import LSTM, Dense

num_units = 100
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 5
num_epochs = 10

# SEQ_LEN is defined elsewhere in the original script
log_file_name = f"{SEQ_LEN}-SEQ-{1}-PRED-{int(time.time())}"

# Initialize the model (of a Sequential type)
model = Sequential()
# Adding the input layer and the LSTM layer
model.add(LSTM(units=num_units, activation=activation_function, input_shape=(None, 1)))
# Adding the output layer
model.add(Dense(units=1))
# Compiling the RNN
model.compile(optimizer=optimizer, loss=loss_function, metrics=['accuracy'])
# Using the training set to train the model
history = model.fit(train_x, train_y, batch_size=batch_size, epochs=num_epochs,
                    validation_data=(test_x, test_y))
and the output is:
Train on 4362 samples, validate on 1082 samples
Epoch 1/10
4362/4362 [==============================] - 11s 3ms/step - loss: 0.0057 - acc: 2.2925e-04 - val_loss: 0.0016 - val_acc: 0.0018
Epoch 2/10
4362/4362 [==============================] - 9s 2ms/step - loss: 6.2463e-04 - acc: 4.5851e-04 - val_loss: 0.0013 - val_acc: 0.0018
Epoch 3/10
4362/4362 [==============================] - 9s 2ms/step - loss: 6.1073e-04 - acc: 2.2925e-04 - val_loss: 0.0014 - val_acc: 0.0018
Epoch 4/10
4362/4362 [==============================] - 8s 2ms/step - loss: 5.4403e-04 - acc: 4.5851e-04 - val_loss: 0.0014 - val_acc: 0.0018
Epoch 5/10
4362/4362 [==============================] - 7s 2ms/step - loss: 5.4765e-04 - acc: 4.5851e-04 - val_loss: 0.0012 - val_acc: 0.0018
Epoch 6/10
4362/4362 [==============================] - 8s 2ms/step - loss: 5.1991e-04 - acc: 4.5851e-04 - val_loss: 0.0013 - val_acc: 0.0018
Epoch 7/10
4362/4362 [==============================] - 7s 2ms/step - loss: 5.7324e-04 - acc: 2.2925e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 8/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.4248e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 9/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.3868e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
Epoch 10/10
4362/4362 [==============================] - 7s 2ms/step - loss: 4.6654e-04 - acc: 4.5851e-04 - val_loss: 0.0011 - val_acc: 0.0018
How do I know the number of layers and cells for a problem like this? Can anyone suggest a network structure that can solve this problem?

In Keras model fit which parameters can tell whether Data is wrong or model is not good

I am training a simple model in Keras for a label classification task with the following code.
This dataset has 5 classes, so the final layer of the network has 5 outputs.
The labels are also one-hot encoded. Here are my results:
32/4000 [..............................] - ETA: 0s - loss: 0.2264 - acc: 0.8750
2176/4000 [===============>..............] - ETA: 0s - loss: 0.3092 - acc: 0.8755
4000/4000 [==============================] - 0s 26us/step - loss: 0.2870 - acc: 0.8805 - val_loss: 15.9636 - val_acc: 0.0070
Epoch 99/100
32/4000 [..............................] - ETA: 0s - loss: 0.1408 - acc: 0.9688
2176/4000 [===============>..............] - ETA: 0s - loss: 0.2696 - acc: 0.8824
4000/4000 [==============================] - 0s 25us/step - loss: 0.2729 - acc: 0.8868 - val_loss: 15.9731 - val_acc: 0.0070
Epoch 100/100
32/4000 [..............................] - ETA: 0s - loss: 0.2299 - acc: 0.9375
2176/4000 [===============>..............] - ETA: 0s - loss: 0.2861 - acc: 0.8787
4000/4000 [==============================] - 0s 25us/step - loss: 0.2763 - acc: 0.8865 - val_loss: 15.9791 - val_acc: 0.0070
10/1000 [..............................] - ETA: 0s
1000/1000 [==============================] - 0s 26us/step
32/5000 [..............................] - ETA: 0s
5000/5000 [==============================] - 0s 9us/step
When I run tests at the end of training, I get almost 100% error on the test data.
I have looked at many related posts but could not figure out what is wrong.
Any advice?

Keras deep learning output format issue

I'm a newbie in deep learning and Keras. I really hope folks with experience in this field can help me answer the following question.
I downloaded the cifar10_cnn.py code from the Keras GitHub. I ran it with Python 3.5.2 and Keras 2.0.2, and tried both backends, TensorFlow 0.12.0-rc0 and Theano 0.9.0. Unfortunately, both of them print output like the one below:
Epoch 1/200
1/1562 [..............................] - ETA: 92s - loss: 2.2861 - acc: 0.1562
3/1562 [..............................] - ETA: 65s - loss: 2.3133 - acc: 0.1354
5/1562 [..............................] - ETA: 59s - loss: 2.3202 - acc: 0.1125
7/1562 [..............................] - ETA: 57s - loss: 2.3168 - acc: 0.1071
What I expect is something like this:
Epoch 1/200
32/50000 [..............................] - ETA: 3138s - loss: 2.3238 - acc: 0.0625
64/50000 [..............................] - ETA: 1579s - loss: 2.3165 - acc: 0.0625
96/50000 [..............................] - ETA: 1059s - loss: 2.3091 - acc: 0.0625
128/50000 [..............................] - ETA: 798s - loss: 2.3070 - acc: 0.0781
160/50000 [..............................] - ETA: 643s - loss: 2.3056 - acc: 0.0750
You can see that 50000/32 = 1562.5, but I don't know why the output changed like that. It's very confusing for a newcomer to see the numerator is 1 and the denominator is 1562. Is this change related to Python 3?
Another thing that confuses me is where this output comes from: which API produces the output above?
