Regression with Keras API not giving consistent results

I am doing a comparative study of a simple regression (one independent variable and one target variable) solved in two ways: LinearRegression vs. a neural network (NN, Keras API). My sample data is as follows:
x1 y
121.9114 121.856
121.856 121.4011
121.4011 121.3222
121.3222 121.9502
121.9502 122.0644
LinearRegression Code:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)
Note: the LR model gives me an RMSE of 0.22 consistently in every run.
NN Code:
from tensorflow.keras import models, layers

nn_model = models.Sequential()
nn_model.add(layers.Dense(2, input_dim=1, activation='relu'))
nn_model.add(layers.Dense(1))
nn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
nn_model.fit(X_train, y_train, epochs=40, batch_size=32)
Training Loss:
Epoch 1/40
539/539 [==============================] - 0s 808us/sample - loss: 16835.0895 - mean_absolute_error: 129.5276
Epoch 2/40
539/539 [==============================] - 0s 163us/sample - loss: 16830.6868 - mean_absolute_error: 129.5106
Epoch 3/40
539/539 [==============================] - 0s 204us/sample - loss: 16826.2856 - mean_absolute_error: 129.4935
...
Epoch 39/40
539/539 [==============================] - 0s 187us/sample - loss: 16668.3582 - mean_absolute_error: 128.8823
Epoch 40/40
539/539 [==============================] - 0s 168us/sample - loss: 16663.9828 - mean_absolute_error: 128.8654
The NN-based solution gives me RMSE = 136.7476.
Interestingly, the NN-based solution gives a different RMSE on each run, because the training loss differs from run to run.
For example, in the first run shown above, the loss starts at 16835 and ends at 16663 after the 40th epoch; in this case the model gives RMSE = 136.74.
If I run the same code a second time, the loss starts at 16144 and ends at 5 after the 40th epoch; in this case the RMSE comes to 7.3.
Sometimes I even see an RMSE of 0.22, when the training loss starts around 400 and ends (40th epoch) at 0.06.
This Keras behavior makes it hard for me to tell whether there is a problem with the Keras API, whether I am doing something wrong, or whether this problem is simply not suitable for Keras.
Could you please help me understand the issue and what the best way is to stabilize the NN-based solution?
Some Additional Info:
My training and test data are always fixed, so no data is shuffled.
number of records in train data = 539
number of records in test data = 154
I also tried MinMaxScaling on the train and test sets, but it does not bring stability to the predictions.

There are multiple questions regarding the consistency/reproducibility of Keras. I answered one of them here a while ago, and since then I have realized that some additional modifications are needed to achieve consistency:
According to the Keras FAQ and this Kaggle experiment, you CANNOT achieve consistency if you are using GPU processing. So they recommend setting CUDA_VISIBLE_DEVICES="" and fixing the Python hash seed with PYTHONHASHSEED=0 (this must be done outside the script you're using Keras in).
You also have to set some seeds:
1) numpy random seed
import numpy as np
np.random.seed(1)
2) tensorflow random seed
import tensorflow as tf
tf.set_random_seed(2)  # tf.random.set_seed(2) in TF 2.x
3) python random seed
import random
random.seed(3)
Additionally, you have to set two arguments to model.fit (relevant if you have multiprocessing capabilities). These are not often mentioned in the answers I've seen around:
model.fit(..., shuffle=False, use_multiprocessing=False)
Make sure that you are training your model on a CPU. Later versions of tensorflow-gpu may still identify and select a GPU even when you set CUDA_VISIBLE_DEVICES="".
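For reference, a minimal sketch that collects the settings above in one script (model, X_train, and y_train are assumed to exist; the TF seed call differs between versions, and PYTHONHASHSEED should ideally be exported before Python even starts, as noted above):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # hide GPUs so training runs on the CPU
os.environ["PYTHONHASHSEED"] = "0"        # ideally set outside the script, before Python starts

import random
import numpy as np
import tensorflow as tf

random.seed(3)
np.random.seed(1)
tf.random.set_seed(2)                     # TF 2.x; use tf.set_random_seed(2) in TF 1.x

model.fit(X_train, y_train, epochs=40, batch_size=32,
          shuffle=False, use_multiprocessing=False)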

Related

VAE in Keras to visualize latent space on 3 classes of images

I am training a Variational Auto-Encoder (VAE) with unlabelled input images. My interest here is to visualize the 3 classes of unlabelled data in the latent space.
I set the latent dimension to 128 and further use PCA to visualize in 2D.
I am new to this and seeking some clarity on this.
Firstly, while I train the network I see the accuracy and validation accuracy being displayed. Since the input images are not labeled, I wonder what exactly the accuracy is computed from. (According to what I have read, accuracy = number of samples correctly predicted / total number of samples.)
Secondly, my training code looks like this:
vae.compile(optimizer='rmsprop', loss=kl_reconstruction_loss, metrics=['accuracy'])
history = vae.fit_generator(X_train, epochs=15,
                            validation_data=next(X_val), validation_steps=5,
                            callbacks=[ReduceLROnPlateau(monitor='val_loss', factor=0.5, verbose=2,
                                                         patience=4, cooldown=1, min_lr=0.0001)])
During training, the training loss goes to zero and the accuracy to 1 within 2 epochs.
Here, for three classes of unlabelled data, the trained network does not cluster the 3 classes very well. I am not clear on why the accuracy shoots up to 1 and the loss drops to 0 while the network does not generalize well on the test dataset:
Epoch 1/2
2164/2164 [==============================] - 872s 403ms/step - loss: 6668.9662 - accuracy: 0.7253 - val_loss: 3785921.0000 - val_accuracy: 0.9982
Epoch 2/2
2164/2164 [==============================] - 869s 401ms/step - loss: 0.0000e+00 - accuracy: 1.0000 - val_loss: 3857374.2500 - val_accuracy: 0.9983
Any insights/suggestions?
The purpose of a VAE is to compress the input into a well-behaved latent representation and then use this latent representation to accurately reconstruct the input. There are two terms in the loss for a VAE, each one corresponding to one of the tasks in the first sentence. The first term in the loss - the KL divergence term - forces the latent representation to be drawn from a multidimensional unit Gaussian distribution. The second term in the loss makes the VAE accurately reconstruct the input. Typically people use something like the L2 loss between the input and the output (there are more fancy things to use, but L2 loss usually does OK).
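For concreteness, here is a minimal sketch of such a two-term loss. The z_mean and z_log_var tensors are assumed to be the encoder outputs, and the reconstruction term assumes image batches of shape (batch, H, W, C); the names are illustrative, not taken from the question.
import tensorflow as tf

def make_vae_loss(z_mean, z_log_var):
    # Returns a loss with an L2 reconstruction term plus a KL term that pulls
    # the latent distribution towards a unit Gaussian.
    def vae_loss(x, x_reconstructed):
        reconstruction = tf.reduce_mean(tf.square(x - x_reconstructed), axis=[1, 2, 3])
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(reconstruction + kl)
    return vae_loss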
Since the model is not performing classification, accuracy is not a good metric to use. A better way to see how accurately the VAE is reconstructing the input is to monitor something like mean squared error.
The output of your code (where training loss goes to 0 after 1 epoch) indicates that the model is overfitting the training data. Try regularizing your model (or using a less capacious one) or reducing the number of steps per training epoch so you can monitor the performance of your model on the validation data more often.
Also, your usage of next(X_val) is just grabbing one batch from your validation generator. You probably want to pass in the validation data generator itself, not a single batch from it. Removing the next() call will achieve that.
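A minimal sketch of both fixes together (monitoring MSE instead of accuracy, and passing the validation generator itself); this assumes X_train and X_val are generators, which may differ from your exact setup:
vae.compile(optimizer='rmsprop', loss=kl_reconstruction_loss, metrics=['mse'])

history = vae.fit_generator(X_train, epochs=15,
                            validation_data=X_val,   # the generator, not next(X_val)
                            validation_steps=5,
                            callbacks=[ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                                         verbose=2, patience=4,
                                                         cooldown=1, min_lr=0.0001)])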

UNET training: accuracy starts over 0.99

I am trying to do some image segmentation using U-Net (similar to this but 2D). However, the accuracy starts really high even at the beginning of epoch 1.
32/3616 [..............................] - ETA: 4:59:02 - loss: 0.6761 - accuracy: 0.9964
64/3616 [..............................] - ETA: 5:02:32 - loss: 0.4355 - accuracy: 0.9966
Is this normal? To me, it feels like it is not learning!
Which one is an indication of learning: the loss, the accuracy, or both?
P.S.: I am using a CPU; I will try training on GPUs to speed things up.
You are probably dealing with an imbalanced dataset. Your network can reach an accuracy of 99% when the structures you are trying to segment are small (taking up, for example, 1% of the image). Then, if your network predicts only 0s, you will get 99% accuracy (because it will be correct on 99% of the "empty" pixels).
You should use more informative metrics to keep track of your network's performance, such as the Dice score (see the sketch below).
The loss is a better indicator of learning here.
Also, training a U-Net for a real-life task on a CPU will be virtually impossible (it would take weeks to months). You should use a GPU for reasonable training times.
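As a reference, a minimal sketch of a Dice coefficient metric for Keras, assuming binary masks (the smoothing term and names are illustrative, not from the question):
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Dice = 2 * |intersection| / (|A| + |B|), computed on flattened masks
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

# e.g. model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[dice_coefficient])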

Keras autoencoder reconstruction error per feature

I saw that h2o has an anomaly option, reconstruction error per feature; does Keras have such an option? I want to change a few cases in my test dataset so that they have a significantly larger reconstruction error, so I was thinking that RE per feature would help me, or do you have another suggestion for influencing the RE? My model's accuracy is pretty good, I think:
0s - loss: 0.0768 - acc: 0.9563 - val_loss: 0.1227 - val_acc: 0.9522
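Keras does not expose a per-feature reconstruction error helper the way h2o does, but it can be computed directly from the autoencoder's predictions; a hedged sketch (the autoencoder model and variable names are assumptions):
import numpy as np

reconstructed = autoencoder.predict(X_test)             # trained Keras autoencoder (assumed)
per_feature_error = np.square(X_test - reconstructed)   # shape (n_samples, n_features)
per_feature_mse = per_feature_error.mean(axis=0)        # average reconstruction error per feature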

Keras output metrics interpretation

I ran my sample code using Keras.
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 224, 224)),
    Flatten(),
    Dense(10, activation='softmax')])
model.compile(Adam(lr=1e-4), loss="categorical_crossentropy", metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2,
                    validation_data=test_batches, nb_val_samples=test_batches.nb_sample)
It gave this output:
None
Epoch 1/2
500/500 [==============================] - 147s - loss: 2.2464 - acc: 0.3520 - val_loss: 6.4765 - val_acc: 0.1100
Epoch 2/2
500/500 [==============================] - 140s - loss: 0.8074 - acc: 0.7880 - val_loss: 3.8807 - val_acc: 0.1450
I'm not able to find the meaning of loss, acc, val_loss, and val_acc. Any explanation or link to the docs would be helpful.
This is closest to what I'm looking for. In the above code I'm fitting the model, but it is also reporting a validation accuracy. From which dataset is this validation accuracy calculated?
Loss is the objective function that you are minimizing to train the neural network. The loss value reported is the mean of the loss function across batches in the training set. Accuracy (acc) is the mean accuracy across batches, also on the training set. Accuracy is simply the fraction of samples in the dataset that the model classified correctly.
The val metrics, however, are computed on the full validation set, which is the dataset you passed via the validation_data parameter. This is done to check for overfitting during training.
Regarding your first question: I respectfully recommend that you familiarize yourself with the basic mechanics of a neural network or look into one of the many MOOCs, e.g. this excellent one from fast.ai. This is also beyond the scope of this forum, since it doesn't seem to be about programming.
Your validation accuracy is calculated from the data that you provide via the validation_data parameter of your model.fit_generator() call. In your case you have set it to test_batches, which is methodologically very likely not correct. You need to split your data into three sets: one for training, one for validation (so you can see the progress of your training on unseen data and get useful information for tuning your hyperparameters), and one for testing (to evaluate the final score of your model). A sketch of such a split is shown below.
One more thing: nb_val_samples is no longer a parameter of fit_generator. See the documentation here.
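A minimal sketch of the three-way split described above, using sklearn's train_test_split for illustration (the variable names and split ratios are assumptions, not the asker's code):
from sklearn.model_selection import train_test_split

# First carve off a final test set, then split the rest into train/validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25)

model.fit(X_train, y_train, epochs=2, validation_data=(X_val, y_val))
score = model.evaluate(X_test, y_test)   # evaluate once, on data never used during training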

keras giving same loss on every epoch

I am a newbie to Keras.
I ran it on a dataset where my objective was to reduce the log loss.
For every epoch it gives me the same loss value. I am confused about whether I am on the right track or not.
For example:
Epoch 1/5
91456/91456 [==============================] - 142s - loss: 3.8019 - val_loss: 3.8278
Epoch 2/5
91456/91456 [==============================] - 139s - loss: 3.8019 - val_loss: 3.8278
Epoch 3/5
91456/91456 [==============================] - 143s - loss: 3.8019 - val_loss: 3.8278
Epoch 4/5
91456/91456 [==============================] - 142s - loss: 3.8019 - val_loss: 3.8278
Epoch 5/5
91456/91456 [==============================] - 142s - loss: 3.8019 - val_loss: 3.8278
Here 3.8019 is the same in every epoch; it is supposed to decrease.
I ran into this issue as well. After much deliberation, I figured out that it was my activation function on my output layer.
I had this model to predict a binary outcome:
model = Sequential()
model.add(Dense(16,input_shape=(8,),activation='relu'))
model.add(Dense(32,activation='relu'))
model.add(Dense(32,activation='relu'))
model.add(Dense(1, activation='softmax'))
but I needed this for binary cross-entropy:
model = Sequential()
model.add(Dense(16,input_shape=(8,),activation='relu'))
model.add(Dense(32,activation='relu'))
model.add(Dense(32,activation='relu'))
model.add(Dense(1, activation='sigmoid'))
I would look at the problem you are trying to solve and the output it needs, to make sure your activation functions are what they should be.
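For completeness, a sketch of how the corrected sigmoid model above would typically be compiled and trained for a binary target (the optimizer, epoch count, and data names are assumptions):
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)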
What is your learning rate? Try decreasing it to 0.0001 and using Adam.
It's actually not clear whether the problem is the learning rate or the model complexity. Could you elaborate on the following points:
What is your data size, and what is it about?
What is your model's complexity? We can compare the model's complexity against your data: if your dataset is large, you need a more complex model.
Did you normalize your outputs? For the inputs it may not be a big deal, since non-normalized inputs can still give results, but if your outputs are numbers bigger than 1 you need to normalize your data. If you check the last-layer activation function of your model, it is usually sigmoid, softmax, or tanh, which squeeze the output into 0-1 or -1 to 1. You need to normalize your targets according to your last activation function, and then invert the scaling afterwards to get real-life results (see the sketch after this list).
Since you're new to deep learning, these checks should be enough to see whether there is a problem with your model. Can you please check them and reply?
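A hedged sketch of the output-normalization idea from the last point, using sklearn's MinMaxScaler (the model and variable names are illustrative, not from the question):
from sklearn.preprocessing import MinMaxScaler

y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))   # squeeze targets into 0-1

model.fit(X_train, y_train_scaled, epochs=40)

# Predictions come back in the 0-1 range, so invert the scaling to get real-life values
y_pred = y_scaler.inverse_transform(model.predict(X_test))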
Usually this problem occurs when the model you are training does not have enough capacity (or the cost function is not appropriate). In some cases it happens that, by mistake, the data we feed into the model is not prepared correctly, so the labels for each sample might not be correct; this leaves the model helpless and unable to decrease the loss.
I was having the same issue and was using the following model
model = Sequential([
    Dense(10, activation='relu', input_shape=(n_cols, )),
    Dense(3, activation='softmax')
])
I realized that my problem is actually a regression problem, and I was using 'softmax' as the final-layer activation (which is applicable to classification problems) instead of something else. When I modified the code as below, I was able to resolve the issue of getting the same loss value in every epoch:
model = Sequential([
    Dense(10, activation='relu', input_shape=(n_cols, )),
    Dense(3, activation='relu'),
    Dense(1)
])
So the problem was actually caused by using a classification-related activation function for a regression problem, or vice versa. You may want to check whether you are making the same mistake.
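For reference, the regression model above would typically be compiled with a regression loss rather than cross-entropy (a sketch; the optimizer, metric, and data names are assumptions):
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(X_train, y_train, epochs=30)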
