training loss is nan in keras LSTM - keras

I ran this code in Google Colab with a GPU to create a multilayer LSTM for time series prediction.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.optimizers import SGD

model = Sequential()
# First LSTM layer returns the full sequence so it can feed the second LSTM layer
model.add(LSTM(units=50, activation='relu', return_sequences=True,
               input_shape=(1, len(FeaturesDataFrame.columns))))
model.add(Dropout(0.2))
model.add(LSTM(3, return_sequences=False))
model.add(Dense(1))
# SGD with momentum; clipvalue clips each gradient element to [-5, 5]
opt = SGD(learning_rate=0.01, momentum=0.9, clipvalue=5.0)
model.compile(loss='mean_squared_error', optimizer=opt)
Note that I have used gradient clipping. Still, when I train this model, it returns nan as the training loss:
history = model.fit(X_t_reshaped, train_labels, epochs=20, batch_size=96, verbose=2)
This is the result
Epoch 1/20
316/316 - 2s - loss: nan
Epoch 2/20
316/316 - 1s - loss: nan
Epoch 3/20
316/316 - 1s - loss: nan
Epoch 4/20
316/316 - 1s - loss: nan
Epoch 5/20
316/316 - 1s - loss: nan
Epoch 6/20
316/316 - 1s - loss: nan
Epoch 7/20
316/316 - 1s - loss: nan
Epoch 8/20
316/316 - 1s - loss: nan
Epoch 9/20
316/316 - 1s - loss: nan
Epoch 10/20
316/316 - 1s - loss: nan
Epoch 11/20
316/316 - 1s - loss: nan
Epoch 12/20
316/316 - 1s - loss: nan
Epoch 13/20
316/316 - 1s - loss: nan
Epoch 14/20
316/316 - 1s - loss: nan
Epoch 15/20
316/316 - 1s - loss: nan
Epoch 16/20
316/316 - 1s - loss: nan
Epoch 17/20
316/316 - 1s - loss: nan
Epoch 18/20
316/316 - 1s - loss: nan
Epoch 19/20
316/316 - 1s - loss: nan
Epoch 20/20
316/316 - 1s - loss: nan

I'm more familiar with PyTorch than Keras. However, there are still a few things I would recommend:
Check your data. Ensure that there are no missing or null values in the data that you pass into your model. This is the most likely culprit: a single null value will cause the loss to be NaN.
Try lowering the learning rate (0.001 or something even smaller) and/or removing gradient clipping. I've actually had gradient clipping be the cause of NaN loss before.
Try scaling your data (though unscaled data will usually cause infinite losses rather than NaN losses). Use StandardScaler or one of the other scalers in sklearn.
If all that fails, pass some very simple dummy data into the model and see if the problem persists. Then you will know whether it is a code problem or a data problem. Hope this helps, and feel free to ask questions if you have them.
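The data checks suggested above can be sketched with plain NumPy. This is only an illustration: X_demo is a hypothetical stand-in for X_t_reshaped, the helper names report_bad_values and standardize are made up for this example, and the manual scaling is a simplified version of what sklearn's StandardScaler does.

```python
import numpy as np

def report_bad_values(name, array):
    """Print and return the number of NaN and infinite entries in an array."""
    array = np.asarray(array, dtype=float)
    n_nan = int(np.isnan(array).sum())
    n_inf = int(np.isinf(array).sum())
    print(f"{name}: shape={array.shape}, NaN={n_nan}, inf={n_inf}")
    return n_nan, n_inf

def standardize(train, eps=1e-8):
    """Zero-mean, unit-variance scaling per feature (last axis).

    The eps guard avoids dividing by zero for constant columns.
    """
    flat = train.reshape(-1, train.shape[-1])
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    return (train - mean) / np.maximum(std, eps)

# Hypothetical stand-in for X_t_reshaped: (samples, timesteps=1, features=3)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=1000.0, scale=250.0, size=(8, 1, 3))
X_demo[2, 0, 1] = np.nan  # plant one missing value

bad_counts = report_bad_values("X_demo", X_demo)

# Drop samples that contain any NaN or inf, then scale what is left
keep = np.isfinite(X_demo).all(axis=(1, 2))
X_clean = standardize(X_demo[keep])
clean_counts = report_bad_values("X_clean", X_clean)
```

If the real inputs and labels both report zero NaN and inf values after this kind of check, the data is less likely to be the culprit and the learning rate or model configuration is the next thing to look at.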

Related

neural network always give me accuracy equal zero with Keras [duplicate]

import keras
import pandas as pd
import numpy as np
from google.colab import files
uploaded = files.upload()
import io
dataset = pd.read_csv(io.BytesIO(uploaded['kc_house_data.csv']))
dataset.head()
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view ... grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
0 7129300520 20141013T000000 221900.0 3 1.00 1180 5650 1.0 0 0 ... 7 1180 0 1955 0 98178 47.5112 -122.257 1340 5650
1 6414100192 20141209T000000 538000.0 3 2.25 2570 7242 2.0 0 0 ... 7 2170 400 1951 1991 98125 47.7210 -122.319 1690 7639
2 5631500400 20150225T000000 180000.0 2 1.00 770 10000 1.0 0 0 ... 6 770 0 1933 0 98028 47.7379 -122.233 2720 8062
3 2487200875 20141209T000000 604000.0 4 3.00 1960 5000 1.0 0 0 ... 7 1050 910 1965 0 98136 47.5208 -122.393 1360 5000
4 1954400510 20150218T000000 510000.0 3 2.00 1680 8080 1.0 0 0 ... 8 1680 0 1987 0 98074 47.6168 -122.045 1800 7503
5 rows × 21 columns
features = dataset.drop(columns=['price', 'id', 'date'])
labels = dataset[['price']]
model = keras.models.Sequential([
    keras.layers.Dense(19, 'relu', input_shape=(18,)),
    keras.layers.Dense(19, 'relu'),
    keras.layers.Dense(1)
])
from keras import metrics
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
model.fit(features, labels, epochs=10, batch_size=5)
Epoch 1/10
676/676 [==============================] - 2s 3ms/step - loss: 65482420224.0000 - accuracy: 0.0000e+00
Epoch 2/10
676/676 [==============================] - 2s 3ms/step - loss: 65566482432.0000 - accuracy: 0.0000e+00
Epoch 3/10
676/676 [==============================] - 2s 3ms/step - loss: 65582268416.0000 - accuracy: 0.0000e+00
Epoch 4/10
676/676 [==============================] - 2s 3ms/step - loss: 65601855488.0000 - accuracy: 0.0000e+00
Epoch 5/10
676/676 [==============================] - 2s 3ms/step - loss: 65537380352.0000 - accuracy: 0.0000e+00
Epoch 6/10
676/676 [==============================] - 2s 2ms/step - loss: 65665077248.0000 - accuracy: 0.0000e+00
Epoch 7/10
676/676 [==============================] - 2s 2ms/step - loss: 65604460544.0000 - accuracy: 0.0000e+00
Epoch 8/10
676/676 [==============================] - 2s 2ms/step - loss: 65511895040.0000 - accuracy: 0.0000e+00
Epoch 9/10
676/676 [==============================] - 2s 3ms/step - loss: 65589620736.0000 - accuracy: 0.0000e+00
Epoch 10/10
676/676 [==============================] - 2s 3ms/step - loss: 65584775168.0000 - accuracy: 0.0000e+00
<keras.callbacks.History at 0x7f3dc673af90>
This is a regression problem, not a classification one.
Accuracy is not a meaningful metric for a regression problem.
Change the metric to 'mean_squared_error' in the model.compile call, for example:
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mean_squared_error"])
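To see why accuracy stays at zero here, it helps to compare it with regression metrics on a few numbers. The sketch below is a simplification (Keras may threshold or round predictions before comparing them with labels, depending on the setup), and the predictions are made-up values, not output from the model above.

```python
import numpy as np

# Hypothetical house prices and close-but-not-exact predictions
y_true = np.array([221900.0, 538000.0, 180000.0, 604000.0, 510000.0])
y_pred = np.array([230000.0, 525000.0, 195000.0, 590000.0, 500000.0])

# Exact-match accuracy: the fraction of predictions that equal the label exactly
exact_match_accuracy = float(np.mean(y_pred == y_true))

# Regression metrics measure how far off the predictions are instead
mse = float(np.mean((y_pred - y_true) ** 2))
rmse = float(np.sqrt(mse))
mae = float(np.mean(np.abs(y_pred - y_true)))

print(f"accuracy={exact_match_accuracy}, rmse={rmse:.1f}, mae={mae:.1f}")
```

A continuous prediction essentially never matches a price exactly, so accuracy reads zero even when the predictions are reasonably close, while the error metrics still show how close they are.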

Nan loss from Keras Sequential Model Training

I have a TensorFlow Sequential network that consistently returns a loss of NaN during training.
I am using pandas and Keras.
An example of the data is:
Actual_GP1 Budgeted_GP_Value_Cleanup Budgeted_GP_Value_New \
0 2.0 2.0 95.00
1 2.0 2.0 63684.55
3 2.0 2.0 26022.57
4 2.0 2.0 440759.17
6 2.0 2.0 95.00
7 2.0 2.0 3519120.00
9 2.0 2.0 4.00
12 2.0 2.0 4.00
13 2.0 2.0 355960.00
14 2.0 2.0 62745.00
Costing_Date Created_Time Date_Time_16 Delivery_Date Engineering_Date \
0 4 1.579523 4.0 4.0 4
1 4 1.575390 4.0 4.0 4
3 4 1.575471 4.0 4.0 4
4 4 1.575020 4.0 4.0 4
6 4 1.579508 4.0 4.0 4
7 4 1.578304 4.0 4.0 4
9 4 1.574600 4.0 4.0 4
12 4 1.570805 4.0 4.0 4
13 4 1.573831 4.0 4.0 4
14 4 1.576153 4.0 4.0 4
Exchange_Rate GP ... Last_Activity_Time Modified_Time \
0 2.0 100.0 ... 4.000000 1.579523
1 2.0 30.0 ... 1.579519 1.579519
3 2.0 44.0 ... 1.579516 1.579516
4 2.0 37.0 ... 1.579516 1.579516
6 2.0 100.0 ... 4.000000 1.579508
7 2.0 44.0 ... 1.579507 1.579507
9 2.0 100.0 ... 1.579506 1.579506
12 2.0 32.0 ... 1.579506 1.579506
13 2.0 44.0 ... 1.579506 1.579506
14 2.0 44.5 ... 1.579506 1.579506
Next_step_actioned_by PO_Date PO_Week Production_End_Date \
0 4.0 1.580429 4.000000 4
1 4.0 1.579824 1.579478 4
3 4.0 1.575850 1.575850 4
4 4.0 1.575418 1.575245 4
6 4.0 1.580429 4.000000 4
7 4.0 1.583798 1.583798 4
9 4.0 1.579219 1.578874 4
12 4.0 1.580429 1.580083 4
13 4.0 1.585613 1.585526 4
14 4.0 1.580429 1.580083 4
Production_Start_Date Project_Value Prototype_Date \
0 4 95.00 4
1 4 212281.82 4
3 4 3.00 4
4 4 4.00 4
6 4 95.00 4
7 4 7998000.00 4
9 4 4.00 4
12 4 4.00 4
13 4 809000.00 4
14 4 141000.00 4
Revenue_Forecast_Probability_Weighting
0 1.0
1 2.0
3 3.0
4 4.0
6 1.0
7 5.0
9 4.0
12 4.0
13 7.0
14 8.0
I understand that some of the dates in this sample are categorically labelled, but that is due to missing values.
The target value for this model is a probability of success, which is based on historical data, and I have left it out of this question. It is a value in [0, 100].
The network configuration is:
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
train_dataset = dataset.shuffle(len(df)).batch(1)
print(df.shape)

def get_compiled_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(24, activation='relu', input_shape=(df.shape[-1],)),
        tf.keras.layers.Dense(16, activation='relu'),
        tf.keras.layers.Dense(8, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
    return model

model = get_compiled_model()
model.fit(train_dataset, epochs=20)
model.save("keras_saved_model.h5")
with an output of
(574, 24)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 574 steps
Epoch 1/20
574/574 [==============================] - 2s 3ms/step - loss: nan - acc: 0.3275
Epoch 2/20
574/574 [==============================] - 1s 1ms/step - loss: nan - acc: 0.6655
Epoch 3/20
574/574 [==============================] - 1s 1ms/step - loss: nan - acc: 0.6655
Epoch 4/20
574/574 [==============================] - 1s 1ms/step - loss: nan - acc: 0.6655
Epoch 5/20
574/574 [==============================] - 1s 1ms/step - loss: nan - acc: 0.6655
Epoch 7/20
574/574 [==============================] - 1s 1ms/step - loss: nan - acc: 0.6655
and so on.
Could someone please point me in the right direction regarding this constant accuracy and these NaN loss values?
EDIT:
The solution was to divide the target value by 100 so it would fit in the range [0,1], since the final activation layer is a sigmoid function.
Thanks to Matias Valdenegro for pointing this out
Providing the answer here for the community, even though it was already given in the comments.
Since the target value ranges over [0, 100], the user normalized it by dividing it by 100 so that it matches the output range of the sigmoid activation function, which resolved the issue.
You can apply a similar normalization to a feature using the code below.
To get the min and max values of a numerical column:
def _min_max_params(column):
    # Minimum and maximum of a numerical column in the training DataFrame
    min_value = traindf[column].min()
    max_value = traindf[column].max()
    return {'min': min_value, 'max': max_value}

feature_name = 'column_name_to_normalize'
params = _min_max_params(feature_name)

def min_max_scale(col):
    # Scale values to [0, 1] using the training-set minimum and maximum
    return (col - params['min']) / (params['max'] - params['min'])

normalized_feature = tf.feature_column.numeric_column(
    feature_name,
    normalizer_fn=min_max_scale)
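The target rescaling itself can be illustrated without TensorFlow. In this minimal sketch the target values and logits are invented for the example; it only shows that a sigmoid output stays inside (0, 1), so a target on a 0 to 100 scale has to be divided by 100 to be reachable, and that predictions can be mapped back afterwards.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, the same shape as a Dense(1, activation='sigmoid') output."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical targets on the original 0-100 scale
target_percent = np.array([0.0, 25.0, 50.0, 80.0, 100.0])

# A sigmoid output always lies strictly between 0 and 1 ...
logits = np.linspace(-10.0, 10.0, 5)
outputs = sigmoid(logits)

# ... so targets should be rescaled to the same range before training
target_unit = target_percent / 100.0

# Predictions can be mapped back to the original scale afterwards
predicted_percent = outputs * 100.0
```

An alternative design would be to keep the targets on the 0 to 100 scale and use a linear output layer instead; which one works better depends on the data.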

Sample weighting didn't help in imbalanced data training

I am training a two-layer LSTM network with 16 to 32 cells in each layer on a fairly imbalanced dataset. Based on my seven class frequencies, the sample weights calculated through the simple formula total_samples/class_frequency are [3.7, 5.6, 26.4, 3.2, 191.6, 8.4, 13.2], and I add the weight for each sample to the (data, label) tuple output by my dataset generator before running the Keras model.fit() function. The training code was:
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
mc = ModelCheckpoint(model_file, monitor='val_acc', mode='max', verbose=1, save_best_only=True)
es = EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=50)
history = model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps, validation_data=val_data,
                    validation_steps=val_steps, verbose=verbose, callbacks=[es, mc])
Then I used the best saved model to evaluate it and calculate performance statistics by this code (my data is in tensorflow datasets):
saved_model = load_model(model_file)
iterator = test_data.make_one_shot_iterator()
next_element = iterator.get_next()
y_test = y_pred = np.empty(0)
for i in range(test_steps):
    batch = sess.run(next_element)
    x_test_batch = batch[0]
    y_test_batch = batch[1]
    y_pred_batch = saved_model.predict_on_batch(x_test_batch)
    # Convert one-hot labels and predicted probabilities to class indices
    y_test = np.append(y_test, np.argmax(y_test_batch, axis=1))
    y_pred = np.append(y_pred, np.argmax(y_pred_batch, axis=1))
print('\nTest data classification report:\n{}\n'.format(classification_report(y_test, y_pred)))
But what I see in the output statistics is that the weighted results are overall worse than the unweighted ones (with all weights set to 1), even for the rare classes (highest weights). Here are the statistics:
For weighted run:
class prec. recall f1 support
0.0 1.00 0.97 0.98 79785
1.0 0.89 0.88 0.88 52614
2.0 0.61 0.76 0.68 11090
3.0 0.96 0.93 0.95 91160
4.0 0.59 0.92 0.72 1530
5.0 0.89 0.90 0.89 34746
6.0 0.81 0.87 0.84 22289
accuracy 0.92 293214
macro avg 0.82 0.89 0.85 293214
For unweighted run:
class prec. recall f1 support
0.0 0.99 0.98 0.99 79785
1.0 0.89 0.90 0.90 52614
2.0 0.79 0.66 0.72 11090
3.0 0.95 0.96 0.95 91160
4.0 0.85 0.82 0.83 1530
5.0 0.89 0.92 0.90 34746
6.0 0.88 0.86 0.87 22289
accuracy 0.93 293214
macro avg 0.89 0.87 0.88 293214
What is wrong here?
You should use the class_weight argument of fit (or fit_generator) to apply weights to your classes.
First, create a dictionary in label: weight format:
class_weight = {0: 3.7,
                1: 5.6,
                2: 26.4,...}
Then pass it to fit:
history = model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps, validation_data=val_data,
                    class_weight=class_weight, validation_steps=val_steps, verbose=verbose, callbacks=[es, mc])
If you want to apply a weight per instance instead, create an array that contains the weight for each corresponding instance in the training data and pass it as sample_weight to fit.
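Both forms of weighting can be derived from the labels themselves. The sketch below uses a small made-up label array (not the asker's data) and the total_samples/class_frequency formula from the question; it assumes the labels are available as integer class indices.

```python
import numpy as np

# Hypothetical integer labels for a small imbalanced 3-class problem
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])

# Weight each class by total_samples / class_frequency, as in the question
classes, counts = np.unique(labels, return_counts=True)
class_weight = {int(c): float(len(labels) / n) for c, n in zip(classes, counts)}

# Per-instance alternative: look up each sample's class weight
sample_weight = np.array([class_weight[int(y)] for y in labels])

print(class_weight)
print(sample_weight)
```

The dictionary is the shape expected by the class_weight argument, and the array is the shape expected by sample_weight. Very large ratios between weights (such as 191.6 versus 3.2 in the question) can trade precision for recall on the rare classes, so it may be worth experimenting with dampened weights as well.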

loss NAN when use keras training ANN classification

I have some data that I want to classify.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2474 entries, 0 to 5961
Data columns (total 4 columns):
Age 2474 non-null int64
Pre_Hospitalization_Disposal 2474 non-null object
Injury_to_hospital_time 2474 non-null float64
Discharge_results 2474 non-null int64
dtypes: float64(1), int64(2), object(1)
memory usage: 96.6+ KB
Age, Pre_Hospitalization_Disposal, and Injury_to_hospital_time are the features.
Discharge_results is the value I want to predict.
I have checked that my data contains no null values:
print(len(DataSet.index[(pd.isnull(DataSet['Age'])) |
(pd.isnull(DataSet['Pre_Hospitalization_Disposal'])) |
(pd.isnull(DataSet['Injury_to_hospital_time'])) |
(pd.isnull(DataSet['Discharge_results']))]))
My code:
(train, test) = train_test_split(DataSet, test_size=0.2, random_state=42)
trainY = train["Discharge_results"].astype('float')
testY = test["Discharge_results"].astype('float')
cs = MinMaxScaler()
trainContinuous = cs.fit_transform(train[['Age','Injury_to_hospital_time']])
testContinuous = cs.transform(test[['Age','Injury_to_hospital_time']])
zipBinarizer = LabelBinarizer().fit(DataSet["Pre_Hospitalization_Disposal"])
trainCategorical = zipBinarizer.transform(train["Pre_Hospitalization_Disposal"])
testCategorical = zipBinarizer.transform(test["Pre_Hospitalization_Disposal"])
trainX = np.hstack([trainCategorical, trainContinuous])
testX = np.hstack([testCategorical, testContinuous])
model = Sequential()
model.add(Dense(16, input_dim=trainX.shape[1] ,activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="softmax"))
model.compile(loss="sparse_categorical_crossentropy", optimizer='Adam')
history = model.fit(trainX, trainY, validation_data=(testX, testY),epochs=200, batch_size=32)
But I get a NaN loss when training.
results:
Train on 1979 samples, validate on 495 samples
Epoch 1/10
1979/1979 [==============================] - 2s 1ms/step - loss: nan - val_loss: nan
Epoch 2/10
1979/1979 [==============================] - 0s 165us/step - loss: nan - val_loss: nan
Epoch 3/10
1979/1979 [==============================] - 0s 139us/step - loss: nan - val_loss: nan
Epoch 4/10
1979/1979 [==============================] - 0s 137us/step - loss: nan - val_loss: nan
Epoch 5/10
1979/1979 [==============================] - 0s 137us/step - loss: nan - val_loss: nan
Epoch 6/10
1979/1979 [==============================] - 0s 141us/step - loss: nan - val_loss: nan
Epoch 7/10
1979/1979 [==============================] - 0s 138us/step - loss: nan - val_loss: nan
Epoch 8/10
1979/1979 [==============================] - 0s 141us/step - loss: nan - val_loss: nan
Epoch 9/10
1979/1979 [==============================] - 0s 140us/step - loss: nan - val_loss: nan
Epoch 10/10
1979/1979 [==============================] - 0s 144us/step - loss: nan - val_loss: nan
Can anyone help me? Many thanks!
It looks like there is a mismatch between your labels and your training loss. The sparse_categorical_crossentropy loss is for classification models with multiple categories. If you want to use this loss, your labels should be integers (the index of the correct category), but I see in your code that your labels are floats:
trainY = train["Discharge_results"].astype('float')
Moreover, the last Dense layer of your model should have n_classes units instead of just 1.
If your labels really are floats, you are probably working on a regression problem and should use a different loss function (for example mean_squared_error).
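The problem with a single softmax unit can be shown numerically. This is a NumPy sketch rather than the Keras implementation, and n_classes = 3 is only an assumed value for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# One output unit: softmax normalizes over a single value, so it is always 1.0
single_unit = softmax(np.array([[-3.2], [0.0], [7.5]]))

# n_classes units: each row is a proper probability distribution
n_classes = 3  # hypothetical number of distinct Discharge_results values
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
probs = softmax(logits)

# Sparse categorical cross-entropy picks the probability at the integer label index
labels = np.array([0, 2])
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

print(single_unit.ravel(), probs.sum(axis=1), loss)
```

With one unit the output is 1.0 regardless of the input, so the network cannot learn anything, and any label other than 0 points at a class index that does not exist in the output.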

Keras with TensorFlow: ... always got loss: nan when one input column all zeros

I am using tensorflow==1.2.1 and Keras==2.0.6 to build a model:
input_num = X_norm_keras[:,2:].shape[1]
model_keras = Sequential()
model_keras.add(Dense(5, input_dim=input_num, activation='relu',kernel_regularizer=regularizers.l2(0.2)))
model_keras.add(Dense(1, activation='linear',kernel_regularizer=regularizers.l2(0.2)))
model_keras.compile(loss='mean_squared_error', optimizer='adam')
model_keras.fit(X_norm_train[:,2:], y_norm_train, batch_size=20, epochs=10)
But I got the following output:
Epoch 1/10
20/20 [==============================] - 0s - loss: nan
Epoch 2/10
20/20 [==============================] - 0s - loss: nan
Epoch 3/10
20/20 [==============================] - 0s - loss: nan
Epoch 4/10
20/20 [==============================] - 0s - loss: nan
Epoch 5/10
20/20 [==============================] - 0s - loss: nan
Epoch 6/10
20/20 [==============================] - 0s - loss: nan
Epoch 7/10
20/20 [==============================] - 0s - loss: nan
Epoch 8/10
20/20 [==============================] - 0s - loss: nan
Epoch 9/10
20/20 [==============================] - 0s - loss: nan
Epoch 10/10
20/20 [==============================] - 0s - loss: nan
Update:
I found out that the cause was one input column being all zeros. However, I am wondering why this would be a problem in TensorFlow. In real life, it is possible for one of the input features in the training set to be all zeros over the desired time period. Other algorithms, such as random forest or ridge regression, handle such cases fine. Why would TensorFlow fail in this case?
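One possible explanation, which the post does not confirm, is that the NaN values come from the normalization step rather than from TensorFlow itself: the variable names (X_norm_keras, X_norm_train) suggest the inputs were standardized, and standardizing a constant column divides 0 by 0. The sketch below uses a made-up matrix to show the effect and one way to guard against it.

```python
import numpy as np

# Hypothetical feature matrix whose second column is all zeros
X = np.array([[1.0, 0.0, 10.0],
              [2.0, 0.0, 20.0],
              [3.0, 0.0, 30.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)

# Naive standardization divides 0 by 0 in the constant column
with np.errstate(divide="ignore", invalid="ignore"):
    X_naive = (X - mean) / std

# Guarded version: treat a zero standard deviation as 1 so the column stays at 0
safe_std = np.where(std == 0.0, 1.0, std)
X_safe = (X - mean) / safe_std

print(np.isnan(X_naive).any(), np.isnan(X_safe).any())
```

If this is what happened, checking the normalized inputs with np.isnan before calling fit would show the NaN values directly; an all-zero column that is passed through unchanged should not by itself produce a NaN loss in a Dense network.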
