Training the AI model is taking so long - Keras

I am solving a multi-class classification problem. The data set looks like below:
| feature 1 | feature 3 | feature 4 | feature 2 |
|-----------|-----------|-----------|-----------|
| 1.302     | 102.987   | 1.298     | 99.8      |
| 1.318     | 102.587   | 1.998     | 199.8     |
The 4 features are floats and my target variable classes are 1, 2, or 3. When I build the following model and train it, it takes very long to converge (24 hours and still running).
I used a Keras model like below:
def create_model(optimizer='adam', init='uniform'):
    # create model
    if verbose: print("**Create model with optimizer: %s; init: %s" % (optimizer, init))
    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1], kernel_initializer=init, activation='relu'))
    model.add(Dense(8, kernel_initializer=init, activation='relu'))
    model.add(Dense(4, kernel_initializer=init, activation='relu'))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
Fitting the model:
best_epochs = 200
best_batch_size = 5
best_init = 'glorot_uniform'
best_optimizer = 'rmsprop'
verbose=0
model_pred = KerasClassifier(build_fn=create_model, optimizer=best_optimizer, init=best_init, epochs=best_epochs, batch_size=best_batch_size, verbose=verbose)
model_pred.fit(X_train,y_train)
I followed the tutorial here: https://www.kaggle.com/stefanbergstein/keras-deep-learning-on-titanic-data
and also a fastai model like below:
cont_names = [ 'feature1', 'feature2', 'feature3', 'feature4']
procs = [FillMissing, Categorify, Normalize]
test = TabularList.from_df(test,cont_names=cont_names, procs=procs)
data = (TabularList.from_df(train, path='.', cont_names=cont_names, procs=procs)
        .random_split_by_pct(valid_pct=0.2, seed=43)
        .label_from_df(cols=dep_var)
        .add_test(test, label=0)
        .databunch())
learn = tabular_learner(data, layers=[1000, 200, 15], metrics=accuracy, emb_drop=0.1, callback_fns=ShowGraph)
I followed the tutorial below
https://medium.com/@nikkisharma536/applying-deep-learning-on-tabular-data-for-regression-and-classification-problems-1e5f80743259
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)
(138507, 4) (138507, 1) (34627, 4) (34627, 1)
Not sure why both models are taking so long to run. Is there any error in my inputs? Any help is appreciated.

With 200 epochs and 138,507 training examples, you are showing the network roughly 200 × 138,507 ≈ 27.7 million examples over the course of training. Those are big numbers.
Assuming that you are using your CPU for training, this may take several hours, even days, depending on your hardware.
One thing you could do is lower the number of epochs to see whether you already get an acceptable model earlier.
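If cutting the epoch count feels too blunt, two related levers are worth a try (the snippet below is a minimal sketch, not the exact setup from the question): let an EarlyStopping callback stop training once the validation loss stops improving, and raise the batch size, since batch_size=5 means roughly 138,507 / 5 ≈ 27,700 weight updates per epoch. The sketch reuses the create_model function from the question; the batch size, patience, and validation split are assumptions to tune.
from keras.callbacks import EarlyStopping

# Stop once validation loss has not improved for 5 epochs and keep the best weights seen
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model = create_model(optimizer='rmsprop', init='glorot_uniform')
model.fit(X_train, y_train,
          validation_split=0.2,   # hold out 20% of the training data for validation
          epochs=200,             # upper bound; EarlyStopping usually ends much sooner
          batch_size=256,         # far fewer weight updates per epoch than batch_size=5
          callbacks=[early_stop],
          verbose=1)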

Related

Choosing a multi-label classifier with a high number of labels

First of all, I'm very new to ML and I have this task:
I need to build an ML model that gives each client a list of the 10 professions which best fit their data: bachelor's degree type, favourite subjects, favourite professional sectors, etc.
My team and I have already extracted the info from an SQL database, created a dataframe with all the relevant information, and one-hot encoded it. This is the result:
df_all_dumm.shape
(773, 1029)
So I have 773 clients and 1029 columns (a lot of them, but we thought it necessary because all the columns are numeric categorical). Most of the columns are one-hot-encoded profession columns (columns 99 to 998), where there is a "1" if the profession has been suggested to the client and a "0" if not.
I'm a bit lost as to whether this dataset approach is suitable for multi-label classification, and about what method to use (NN, RandomTrees classifiers, scikit multi-learn models...). I have already tried some multi-learn models like MLkNN or BRkNNaClassifier, but the results are very poor (F1 score = 0.1 - 0.2).
This is the dataset (it doesn't contain any private data, so I think there is no problem uploading it; also, I don't know if this is the proper way to paste a link, sorry again):
https://drive.google.com/file/d/1nID4q7EfpoiNKdWz6N4FRUgIQEniwFRV/view?usp=sharing
EDIT:
I have created a Sequential Keras model:
# Slicing target columns from the rest of the df
data = pd.read_csv('df_all_dumm.csv')
data_c = data.copy()
data_in = data_c.copy()
data_out = data_c.iloc[:, 99:999]
data_in = data_in.drop(data_out.columns, axis=1)
data_in = data_in.drop(['id'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(data_in,
                                                    data_out,
                                                    test_size=0.3,
                                                    random_state=42)
print("{0:2.2f}% of data in train set".format(len(X_train)/len(data.index)*100))
print("{0:2.2f}% of data in test set".format(len(X_test)/len(data.index)*100))
# Dataframes to tensors
X_train_tf = tf.convert_to_tensor(X_train.values)
X_test_tf = tf.convert_to_tensor(X_test.values)
y_train_tf = tf.convert_to_tensor(y_train.values)
y_test_tf = tf.convert_to_tensor(y_test.values)
from numpy import asarray
from sklearn.datasets import make_multilabel_classification
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
# get the model
def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(512, input_dim=n_inputs, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(600, input_dim=n_inputs, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(700, input_dim=n_inputs, activation='relu'))
    model.add(Dense(900, input_dim=n_inputs, activation='relu'))
    model.add(Dense(n_outputs, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# load dataset
X, y = X_train_tf, y_train_tf
n_inputs, n_outputs = X.shape[1], y.shape[1]
# get model
model = get_model(n_inputs, n_outputs)
# fit the model on all data
model.fit(X, y, validation_split=0.33, epochs=100, batch_size=10)
prec = model.evaluate(X_test_tf, y_test_tf)[1]
print("La precisiĆ³n de la red es: {} %".format(round(prec*100,2)))
La precisiĆ³n de la red es: 4.74 %
So, this is the model we created. I think the main problem here is that we have 900 different output labels and only around 500 training samples.
We also thought about applying some clustering algorithm first to cluster the professions into, say, 5 groups, training a NN for every group, and then predicting.

LSTM Autoencoder producing poor results in test data

I'm applying an LSTM autoencoder for anomaly detection. Since anomalies are very few compared to normal data, only normal instances are used for training. The testing data consists of both anomalies and normal instances. During training, the model loss looks good, but on the test data the model produces poor accuracy, i.e. anomaly and normal points are not well separated.
The snippet of my code is below:
.............
.............
X_train = X_train.reshape(X_train.shape[0], lookback, n_features)
X_valid = X_valid.reshape(X_valid.shape[0], lookback, n_features)
X_test = X_test.reshape(X_test.shape[0], lookback, n_features)
.....................
......................
N = 1000
batch = 1000
lr = 0.0001
timesteps = 3
encoding_dim = int(n_features/2)
lstm_model = Sequential()
lstm_model.add(LSTM(N, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=False))
lstm_model.add(RepeatVector(timesteps))
# Decoder
lstm_model.add(LSTM(timesteps, activation='relu', return_sequences=True))
lstm_model.add(LSTM(encoding_dim, activation='relu', return_sequences=True))
lstm_model.add(TimeDistributed(Dense(n_features)))
lstm_model.summary()
adam = optimizers.Adam(lr)
lstm_model.compile(loss='mse', optimizer=adam)
cp = ModelCheckpoint(filepath="lstm_classifier.h5",
                     save_best_only=True,
                     verbose=0)
tb = TensorBoard(log_dir='./logs',
                 histogram_freq=0,
                 write_graph=True,
                 write_images=True)
lstm_model_history = lstm_model.fit(X_train, X_train,
                                    epochs=epochs,
                                    batch_size=batch,
                                    shuffle=False,
                                    verbose=1,
                                    validation_data=(X_valid, X_valid),
                                    callbacks=[cp, tb]).history
.........................
test_x_predictions = lstm_model.predict(X_test)
mse = np.mean(np.power(preprocess_data.flatten(X_test) - preprocess_data.flatten(test_x_predictions), 2), axis=1)
error_df = pd.DataFrame({'Reconstruction_error': mse,
                         'True_class': y_test})
# Confusion Matrix
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]
conf_matrix = confusion_matrix(error_df.True_class, pred_y)
plt.figure(figsize=(5, 5))
sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
Please suggest what can be done in the model to improve the accuracy.
If your model is not performing well on the test set, I would check a few things:
1. The training set is not contaminated with anomalies or any information from the test set. If you use scaling, make sure you did not fit the scaler on the training and test sets combined.
2. Based on my experience, if an autoencoder has low training loss but cannot discriminate well enough on the test data (provided your training set is pure), it has learned the fine details of the training set but not the general structure.
3. Your threshold value might be off, and you may need a better thresholding procedure. One example can be found here: https://dl.acm.org/citation.cfm?doid=3219819.3219845
If the problem is the 2nd one, the solution is to increase generalization. With autoencoders, one of the most effective generalization tools is the size of the bottleneck. Again based on my experience with anomaly detection in flight radar data, lowering the bottleneck dimension significantly increased my multi-class classification accuracy: I was using 14 features with an encoding_dim of 7, but an encoding_dim of 4 gave even better results. The value of the training loss was not important in my case because I was only comparing reconstruction errors, but since you are classifying with a threshold on the reconstruction error, a more robust thresholding procedure may improve accuracy, as in the paper I've shared.
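As a concrete illustration of the thresholding point (a minimal sketch, not the procedure from the linked paper): compute the reconstruction error on the anomaly-free training set the same way the question computes mse on the test set, and take a high percentile of that distribution as the threshold. The 99th percentile below is an assumption to tune on a validation set; preprocess_data.flatten is the helper already used in the question.
train_predictions = lstm_model.predict(X_train)
train_errors = np.mean(np.power(preprocess_data.flatten(X_train) -
                                preprocess_data.flatten(train_predictions), 2), axis=1)

# Flag anything whose reconstruction error exceeds the 99th percentile of the
# training-set errors (the training set contains only normal instances)
threshold = np.percentile(train_errors, 99)
pred_y = [1 if e > threshold else 0 for e in error_df.Reconstruction_error.values]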

How to merge new features at later stage of model?

I have training data in the form of numpy arrays that I will use in a ConvLSTM.
Following are the dimensions of the array:
trainX = (5000, 200, 8), where 5000 is the number of samples, 200 is the number of time steps per sample, and 8 is the number of features per time step (samples, timesteps, features).
Out of these 8 features, 3 features remain the same throughout all time steps in a sample (in other words, these features are directly related to the samples), for example day of the week, month number, weekday (these change from sample to sample). To reduce the complexity, I want to keep these three features separate from the initial training set and merge them with the output of the ConvLSTM layer before applying the dense layer for classification (softmax activation). E.g.:
The initial training set dimension would be (7000, 200, 5) and the auxiliary input dimensions to be merged would be (7000, 3), because these 3 features are directly related to the sample. How can I implement this using Keras?
Following is my code, written using the Functional API, but I don't know how to merge the two inputs.
#trainX.shape=(7000,200,5)
#trainy.shape=(7000,4)
#testX.shape=(3000,200,5)
#testy.shape=(3000,4)
#trainMetadata.shape=(7000,3)
#testMetadata.shape=(3000,3)
verbose, epochs, batch_size = 1, 50, 256
samples, n_features, n_outputs = trainX.shape[0], trainX.shape[2], trainy.shape[1]
n_steps, n_length = 4, 50
input_shape = (n_steps, 1, n_length, n_features)
model_input = Input(shape=input_shape)
clstm1 = ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu',return_sequences = True)(model_input)
clstm1 = BatchNormalization()(clstm1)
clstm2 = ConvLSTM2D(filters=128, kernel_size=(1,3), activation='relu',return_sequences = False)(clstm1)
conv_output = BatchNormalization()(clstm2)
metadata_input = Input(shape=trainMetadata.shape)
merge_layer = np.concatenate([metadata_input, conv_output])
dense = Dense(100, activation='relu', kernel_regularizer=regularizers.l2(l=0.01))(merge_layer)
dense = Dropout(0.5)(dense)
output = Dense(n_outputs, activation='softmax')(dense)
model = Model(inputs=merge_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit([trainX, trainMetadata], trainy, validation_data=([testX, testMetadata], testy), epochs=epochs, batch_size=batch_size, verbose=verbose)
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
y = model.predict(testX)
but I am getting Value error at merge_layer statement. Following is the ValueError
ValueError: zero-dimensional arrays cannot be concatenated
What you are describing cannot be done with the Sequential model of Keras.
You need to use the Model class (the functional API); see the Guide to the Keras Model.
With this API you can build the complex model you are looking for.
Here you have an example of how to use it: How to Use the Keras Functional API for Deep Learning.
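A minimal sketch of what that merge could look like with the functional API, reusing the variable names from the question (n_steps, n_length, n_features, n_outputs, trainMetadata); the Flatten layer and the exact layer sizes are assumptions, and the key point is that the merge uses the Concatenate layer rather than np.concatenate:
from keras.layers import Input, Dense, Dropout, Flatten, Concatenate, ConvLSTM2D, BatchNormalization
from keras.models import Model

seq_input = Input(shape=(n_steps, 1, n_length, n_features))
x = ConvLSTM2D(filters=64, kernel_size=(1, 3), activation='relu', return_sequences=True)(seq_input)
x = BatchNormalization()(x)
x = ConvLSTM2D(filters=128, kernel_size=(1, 3), activation='relu', return_sequences=False)(x)
x = BatchNormalization()(x)
x = Flatten()(x)                                      # ConvLSTM output is 4D; flatten it before merging

meta_input = Input(shape=(trainMetadata.shape[1],))   # one vector of 3 static features per sample
merged = Concatenate()([x, meta_input])               # a Keras layer, not np.concatenate

dense = Dense(100, activation='relu')(merged)
dense = Dropout(0.5)(dense)
output = Dense(n_outputs, activation='softmax')(dense)

model = Model(inputs=[seq_input, meta_input], outputs=output)   # both Inputs, not the merge layer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Fitting and evaluating then pass both inputs, e.g. model.fit([trainX, trainMetadata], trainy, ...) as in the question, with trainX first reshaped to (samples, n_steps, 1, n_length, n_features).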

Predicting Time Series Data with LSTM in Keras

I'm trying to prepare a model that will predict the first two numbers from a given array of numbers. So, the input dataset is like this -
[1 2 3 5]
[4 8 5 9]
[10 2 3 15]
Output will be -
[1 2]
[4 8]
[10 2]
So, the architectures of RNNs are like below (taken from here).
Then, the basic architecture I'm trying to achieve should be something close to this:
So, it should be a Many-to-Many network (resembling the fourth image).
Question - So, how can I create this type of model with Keras?
My Findings -
I tried something like this -
n_samples = 10000
input = np.random.randint(5,10, (n_samples,5))
output = input[...,0:2]
rinp = input.reshape(n_samples,1,5)
model = Sequential()
model.add(LSTM(10, input_shape=(1,5)))
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(rinp, output, epochs=1000, batch_size=500, verbose=1)
But as you can see, this is not even close. It is like an MLP and does not utilize any time steps: the input shape is (n_samples, 1, 5), so there is only one time step.
So, my implementation is wrong.
I've seen some examples of One-to-One, Many-to-One and Many-to-Many examples from here.
In Many-to-Many example, the author used the following code snippet.
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:,0]:
    print('%.1f' % value)
As you can see from the X and y values, the described model is like the one below, which is not the one I'm trying to achieve.
Any example regarding the architecture I'm trying to implement would be greatly helpful.
It looks like you are trying to build a Sequence-to-Sequence (seq2seq) model based on the drawing. There is a very nice tutorial online to get you started. Instead of predicting sentences, you can just predict fixed length tokens of length 2. This architecture and variants are often used for machine translation. Based on your data, I'm guessing you are trying to experiment with the problem of long-term dependencies in recurrent networks; otherwise it wouldn't make sense to use seq2seq for any practical purposes.
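For the fixed-length case in the question, a minimal encoder-decoder sketch (my own illustration, not the tutorial's code) could look like the following: the encoder LSTM reads the whole sequence, RepeatVector copies its summary once per output step, and a TimeDistributed Dense emits one value per step, so the model outputs exactly two numbers per input sequence. The unit counts and training settings are arbitrary assumptions.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_samples, seq_len, out_len = 10000, 5, 2
X = np.random.randint(5, 10, (n_samples, seq_len, 1)).astype('float32')
y = X[:, :out_len, :]                               # targets: the first two numbers of each sequence

model = Sequential()
model.add(LSTM(32, input_shape=(seq_len, 1)))       # encoder: summarize the whole input sequence
model.add(RepeatVector(out_len))                    # one copy of the summary per output step
model.add(LSTM(32, return_sequences=True))          # decoder
model.add(TimeDistributed(Dense(1)))                # one predicted value per output step
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=128, verbose=1)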

LSTM with Keras for mini-batch training and online testing

I would like to implement an LSTM in Keras for streaming time-series prediction -- i.e., running online, getting one data point at a time. This is explained well here, but as one would assume, the training time for an online LSTM can be prohibitively slow. I would like to train my network on mini-batches, and test (run prediction) online. What is the best way to do this in Keras?
For example, a mini-batch could be a sequence of 1000 data values ([33, 34, 42, 33, 32, 33, 36, ... 24, 23]) that occur at consecutive time steps. To train the network I've specified an array X of shape (900, 100, 1), where there are 900 sequences of length 100, and an array y of shape (900, 1). E.g.,
X[0] = [[33], [34], [42], [33], ...]
X[1] = [[34], [42], [33], [32], ...]
...
X[899] = [..., [24]]
y[899] = [23]
So for each sequence X[i], there is a corresponding y[i] that represents the next value in the time-series -- what we want to predict.
In test I want to predict the next data values 1000 to 1999. I do this by feeding an array of shape (1, 100, 1) for each step from 1000 to 1999, where the model tries to predict the value at the next step.
Is this the recommended approach and setup for my problem? Enabling statefulness may be the way to go for a purely online implementation, but in Keras this requires a consistent batch_input_shape in training and testing, which would not work for my intent of training on mini-batches and then testing online. Or is there a way I can do this?
UPDATE: Trying to implement the network as @nemo recommended
I ran my own dataset on an example network from a blog post "Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras", and then tried implementing the prediction phase as a stateful network.
The model building and training is the same for both:
# Create and fit the LSTM network
numberOfEpochs = 10
look_back = 30
model = Sequential()
model.add(LSTM(4, input_dim=1, input_length=look_back))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, nb_epoch=numberOfEpochs, batch_size=1, verbose=2)
# trainX.shape = (6883, 30, 1)
# trainY.shape = (6883,)
# testX.shape = (3375, 30, 1)
# testY.shape = (3375,)
Batch prediction is done with:
trainPredict = model.predict(trainX, batch_size=batch_size)
testPredict = model.predict(testX, batch_size=batch_size)
To try a stateful prediction phase, I ran the same model setup and training as before, but then the following:
w = model.get_weights()
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
trainPredictions, testPredictions = [], []
for trainSample in trainX:
    trainPredictions.append(model.predict(trainSample.reshape((1, look_back, 1)), batch_size=batch_size))
trainPredict = numpy.concatenate(trainPredictions).ravel()
for testSample in testX:
    testPredictions.append(model.predict(testSample.reshape((1, look_back, 1)), batch_size=batch_size))
testPredict = numpy.concatenate(testPredictions).ravel()
To inspect the results, the plots below show the actual (normalized) data in blue, the predictions on the training set in green, and the predictions on the test set in red.
The first figure is from using batch prediction, and the second from stateful. Any ideas what I'm doing incorrectly?
If I understand you correctly, you are asking whether you can enable statefulness after training. This should be possible, yes. For example:
# Train a stateless model first (the shapes and unit count here are illustrative)
inp = Input(shape=(timesteps, 1))
net = Dense(1)(SimpleRNN(4, stateful=False)(inp))
model = Model(inputs=inp, outputs=net)
model.fit(...)
w = model.get_weights()

# Rebuild the same architecture with stateful=True and copy the weights over
inp = Input(batch_shape=(1, timesteps, 1))   # stateful layers need a fixed batch size
net = Dense(1)(SimpleRNN(4, stateful=True)(inp))
model = Model(inputs=inp, outputs=net)
model.set_weights(w)
After that you can predict in a stateful way.
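Note that a stateful layer keeps its internal state across predict calls, so when you move on to a new, independent sequence you should call model.reset_states() first; otherwise the state from the previous sequence carries over into the new predictions.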
