Regularization strategy in Keras

I have been trying to set up a non-linear regression problem in Keras. Unfortunately, the results show that overfitting is occurring. Here is the code:
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers, regularizers

model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(outdim, activation='linear'))
adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2, initial_epoch=0)
The results without regularization are shown in the "Without regularization" plot. The training mean absolute error (MAE) is much lower than the validation MAE, and a fixed gap persists between the two, which is a sign of overfitting.
L2 regularization was specified for each layer like so:
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(outdim, activation='linear'))
adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2, initial_epoch=0)
The results are shown in the "L2 regularized result" plot. The test MAE is now close to the training MAE, which is good. However, the training MAE is poor at 0.03 (without regularization it was much lower, at 0.0028).
What can I do to reduce the training MAE while keeping the regularization?

Based on your results, it looks like you need to find the right amount of regularization to balance training accuracy against generalization to the validation set. This may be as simple as reducing the L2 parameter: try lowering lambda from 0.001 to 0.0001 and comparing your results.
If you can't find a good parameter setting for L2, you could try dropout regularization instead. Just add model.add(Dropout(0.2)) between each pair of dense layers, and experiment with the dropout rate if necessary. A higher dropout rate corresponds to stronger regularization; a sketch combining both suggestions follows.
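As a minimal sketch of both suggestions together (assuming the same number_of_neurons, outdim, and training arrays as in the question; the exact lambda and dropout rate are starting points to tune, not recommendations):
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import optimizers, regularizers

# Weaker L2 penalty (1e-4 instead of 1e-3) plus dropout between the dense layers.
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu',
                kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.2))
model.add(Dense(int(number_of_neurons), activation='relu',
                kernel_regularizer=regularizers.l2(1e-4)))
model.add(Dropout(0.2))
model.add(Dense(outdim, activation='linear'))
adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=adam, metrics=['mae'])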

Related

Keras Tuner best model does not work better than a manually configured model, and MSE is very high on the train set with the best model

I am working with time-series data and used Keras Tuner to find the best model. Keras Tuner reports a very good MSE for the best model, but when I use this best model to predict the train and test sets, it returns a high MSE for the training set and a lower MSE for the test set, while the RMSE looks normal for both. Also, when I use the model that I configured manually, the results are better than the best model from Keras Tuner. I cannot understand why the results do not make sense; am I doing something wrong? Here is the code.
import os
from tensorflow import keras
from kerastuner.tuners import BayesianOptimization

def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.ConvLSTM2D(
        filters=hp.Int('units1', min_value=25, max_value=512, step=32, default=128),
        kernel_size=(1, 1),
        activation=hp.Choice('activation1', values=['relu', 'tanh', 'sigmoid'], default='relu'),
        input_shape=(n_past, 1, 1, 1)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(
        units=hp.Int('units3', min_value=10, max_value=128, step=8, default=128),
        activation=hp.Choice('activation_2', values=['relu', 'tanh', 'sigmoid'], default='relu')))
    model.add(keras.layers.Dense(
        1, activation=hp.Choice('activation_2', values=['relu', 'tanh', 'sigmoid'], default='relu')))
    model.compile(
        loss='mae',
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG', default=1e-3)),
        metrics=['mae'])
    return model

bayesian_opt_tuner = BayesianOptimization(
    build_model,
    objective='mae',
    max_trials=20,
    executions_per_trial=1,
    directory=os.path.normpath('C:/keras_tuning'),
    project_name='timeseries_temp_ts_test_from_TF_ex',
    overwrite=True)

EVALUATION_INTERVAL = 200
EPOCHS = 2

bayesian_opt_tuner.search(trainX, trainy,
                          epochs=EPOCHS,
                          validation_data=(testX, testy),
                          validation_steps=50,
                          steps_per_epoch=EVALUATION_INTERVAL)

model = bayesian_opt_tuner.get_best_models(1)[0]
model.summary()
The best MSE score reported by the tuner is 0.365387, but when I predict the train and test sets, the MSE is 28.58 for the train set and 6.36 for the test set (RMSE 5.35 and 2.52). With my own model, shown below, the MSE for the train and test sets is 5.95 and 2.39 (RMSE 2.44 and 1.55).
model = Sequential()
model.add(ConvLSTM2D(filters=64, kernel_size=(1, 1), activation='relu', input_shape=(n_past, 1, 1, 1)))
model.add(Flatten())
model.add(Dense(32))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()

Keras NN loss is 1

I'm getting started with a simple NN, but my loss remains at one on every iteration. Can somebody point out what I'm doing wrong here?
This is from a Kaggle introductory course, and my modified training set contains shop id, category id, item id, month, and revenue. I'm basically trying to predict revenue per shop per category for the following month.
I've scaled revenue and trained a simple NN with 2 hidden layers; however, it doesn't seem like the training is working, as the loss remains constant. I haven't done anything with the labels (i.e. shop ids, category ids), but I would still think the loss would change on each iteration.
If you have some comments on coding practice, I would be interested as well.
Thanks.
import pandas as pd
from sklearn.preprocessing import StandardScaler
import keras
from keras.models import Sequential
from keras.layers import Dense

X_train = grouped_train.drop('revenue', axis=1)
y_train = grouped_train['revenue']
print('X & y trains')
print(X_train.head())
print(y_train.head())
scaler = StandardScaler()
y_train = pd.DataFrame(scaler.fit_transform(y_train.values.reshape(-1,1)))
print('Scaled y train')
print(y_train.head())
keras.backend.clear_session()
model = Sequential()
model.add(Dense(30, activation='relu', input_shape=(4,)))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='relu'))
model.summary()
print('Compile & fit')
model.compile(loss='mean_squared_error', optimizer='RMSprop')
model.fit(X_train, scaled_data, batch_size=128, epochs=13)
predictions = pd.DataFrame(model.predict(test))
print('Scaled predictions')
print(predictions.head())
print('Unscaled predictions')
print(pd.DataFrame(scaler.inverse_transform(predictions)).head())
Looks like you are using the wrong activation for the final layer. You have a regression problem, so the standard final activation should be activation='linear'. With ReLU on the output, the network can easily get stuck predicting 0 for every sample, and because the target was standardized to unit variance, the MSE of an all-zero prediction is exactly 1, which matches the constant loss you see.
model.add(Dense(1, activation='relu'))    # current
model.add(Dense(1, activation='linear'))  # suggested fix
Edit:
Additionally, model.fit is using scaled_data; shouldn't scaled_data be replaced with y_train?
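The corrected call would presumably be (y_train here being the scaled target created earlier in the question):
model.fit(X_train, y_train, batch_size=128, epochs=13)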

How to get 90%+ test accuracy on IMDB data?

I was trying to train a model on the IMDB data. I get the expected train accuracy of about 96%+, but I am not satisfied with the test accuracy. My goal is 90%+ accuracy on the test data, yet with every classifier I have tried I get only 84% to 89% test accuracy. Below are some of the classifiers I already tried; in most cases I also did some parameter tuning, such as increasing the epochs or changing the optimizer. My question is: how can I increase the test accuracy to 90%+?
Classifiers I tried so far:
First:
from keras.models import Sequential
from keras.layers import (Embedding, Bidirectional, LSTM, GlobalMaxPool1D, Dense,
                          Dropout, ZeroPadding1D, Convolution1D, MaxPooling1D, Flatten)

model = Sequential()
model.add(Embedding(vocab_size, 32, input_length=max_words))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(GlobalMaxPool1D())
model.add(Dense(20, activation="relu"))
model.add(Dropout(0.05))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_size=100)
Second:
model = Sequential([
    Embedding(vocab_size, 32, input_length=max_words),
    Dropout(0.2),
    ZeroPadding1D(padding=1),
    Convolution1D(64, 5, activation='relu'),
    Dropout(0.2),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10, batch_size=100)
Checking the state-of-the-art results on the IMDB dataset, I don't think you can get above 90% with simple models like the ones you are using. However, you may try using pretrained embeddings such as GloVe instead of training your own embedding; a sketch of that idea is below. Also, I found this repo with a BERT implementation in Keras that provides a demo of IMDB classification; it is able to get ~99% accuracy.
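As a minimal sketch of the pretrained-embedding idea (the glove.6B.100d.txt file and the tokenizer's word_index are assumptions, not part of the question):
import numpy as np
from keras.layers import Embedding

EMBEDDING_DIM = 100  # must match the GloVe file used

# Parse the GloVe text file into a {word: vector} lookup.
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Build the embedding matrix from the tokenizer's word index;
# words missing from GloVe keep all-zero rows.
embedding_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
for word, i in word_index.items():
    if i < vocab_size and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

# Frozen pretrained embedding layer to swap in for the trained Embedding above.
embedding_layer = Embedding(vocab_size, EMBEDDING_DIM,
                            weights=[embedding_matrix],
                            input_length=max_words,
                            trainable=False)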

Issue with accuracy never changing in a Keras ANN

I am trying to build a simple ANN to learn whether two images are similar or not, using two distance equations. Here is how I set things up. I took 3 images (1: an anchor, 2: a positive sample, 3: a negative sample) and computed two different distance measurements between them, one using ResNet features and another using HOG features. The two distance measurements are then saved along with the two picture paths and the correct label (0/1), where 0 = same and 1 = not the same.
Now I am trying to build out my ANN to learn the difference between the two values and see if this will allow me to tell whether two images are similar. But nothing happens when I train the ANN. I think there are two possibilities:
1: I didn't set up the ANN correctly.
2: There is no connection at all.
Please help me see what the issue is. Here is my code:
# Load the Pandas libraries with alias 'pd'
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# fix random seed for reproducibility
np.random.seed(7)
import csv
data = pd.read_csv("encoding.csv")
print(data.columns)
X = data[['resnet', 'hog','label']]
x = X[['resnet', 'hog']]
y = X[['label']]
model = Sequential()
#get number of columns in training data
n_cols = x.shape[1]
#add model layers
model.add(Dense(16, activation='relu', input_shape=(n_cols,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y, epochs=30, batch_size=32, validation_split=0.10)
Right now, all it does is repeat this over and over:
167/167 [==============================] - 0s 3ms/step - loss: 8.0189 - acc: 0.4970 - val_loss: 7.5517 - val_acc: 0.5263
Here is the csv file that I am using:
EDIT
I have changed the setup a bit, and now the validation accuracy does bounce up to 73%. But then it bounces around and ends at 40%. What does that mean?
Here is the new model:
from keras.layers import Dropout, BatchNormalization

model = Sequential()
# get number of columns in training data
n_cols = x.shape[1]
model.add(Dense(256, activation='relu', input_shape=(n_cols,)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))
#sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=['accuracy'])
model.compile(loss="binary_crossentropy", optimizer='rmsprop', metrics=['accuracy'])
#model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y, epochs=100, batch_size=64, validation_split=0.10)
This makes no sense:
model.add(Dense(1, activation='softmax'))
Softmax with one neuron will always produce a constant value of 1.0, due to the normalization. For binary classification with the binary_crossentropy loss, you should use one neuron with a sigmoid activation:
model.add(Dense(1, activation='sigmoid'))
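A quick numeric check of that claim (a standalone sketch, not part of the original answer):
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# With a single logit, normalization forces the output to 1.0 regardless of input.
print(softmax(np.array([[-3.7]])))  # [[1.]]
print(softmax(np.array([[42.0]])))  # [[1.]]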
Two things to try:
First, add complexity to your network. It is pretty simple, so add more layers/neurons in order to capture more information from your data. Start with something like this and see if it changes anything:
model.add(Dense(256, activation='relu', input_shape=(n_cols,)))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Second, consider training for more epochs; an ANN can take a long time to converge.
Update
More things to try:
Normalize and scale your data (see the sketch after this list).
The dataset may be too small: the more data you get, the better your model will be.
Try different hyperparameters: maybe decrease your learning rate to something like 1e-4 or 1e-5, and try different batch sizes.
Add more regularization: try dropout between each layer.
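As a minimal sketch of the normalize-and-scale suggestion (assuming the x and y frames from the question and a compiled model as above; the held-out split is an illustrative addition):
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hold out a validation set, then standardize the two distance features
# using statistics computed on the training split only.
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.1, random_state=7)
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

model.fit(x_train, y_train, epochs=100, batch_size=64, validation_data=(x_val, y_val))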

Find Most Important Input from a Neural Network

I trained a neural network with 37 inputs. It has around 85% accuracy. Is it possible for me to find out which input has the most effect? I tried this code, but I cannot figure out how to find the most important input:
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
One possible solution is to wrap your model with keras.wrappers.scikit_learn and then use recursive feature elimination (RFE) in scikit-learn:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.feature_selection import RFE
import matplotlib.pyplot as plt

def create_model():
    # create model
    model = Sequential()
    model.add(Dense(512, activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=128, verbose=0)
rfe = RFE(estimator=model, n_features_to_select=1, step=1)
rfe.fit(X, y)

# This snippet assumes the scikit-learn digits dataset, hence the reshape
# of the per-feature ranking back to the image shape for plotting.
ranking = rfe.ranking_.reshape(digits.images[0].shape)

# Plot pixel ranking
plt.matshow(ranking, cmap=plt.cm.Blues)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()
If you need to visualize the weights, see here. For a rough first look, a sketch is below.
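As a crude heuristic sketch (my addition, not the linked approach), you can inspect the magnitude of the first-layer weights per input feature, reusing the get_weights() call from the question:
import numpy as np

# Shape (n_inputs, n_units): one row of outgoing weights per input feature.
weights = model.layers[0].get_weights()[0]

# Sum of absolute outgoing weights as a rough importance score per input.
importance = np.abs(weights).sum(axis=1)
print(np.argsort(importance)[::-1])  # input indices, most "important" first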
