Keras LSTM, batch data structuring differences

I am trying to understand the differences and implications of structuring sequential data for use in a Keras LSTM model. I would like to forecast electricity demand, which has a natural daily/weekly shape driven by, for example, temperature, weekday, and hour of day. Say I have 1 month's worth of demand and inputs, i.e. an array of shape (30 days * 24 hours of demand, 3 features), and I want to predict the next 30 days of demand based on expected future inputs. What are the implications of the following (particularly with respect to statefulness)?
#A. feed in 1 batch of 1 hour at a time. this seems the slowest to train
model.add(LSTM(n_neurons, batch_input_shape=(1, 1, 3), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()
#B. feed in 720 batches of 1 hour at a time
#is this the same as A, except I need to forecast 720 hours/timesteps at a time
model.add(LSTM(n_neurons, batch_input_shape=(720, 1, 3), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=720, verbose=1, shuffle=False)
    model.reset_states()
#C. feed in 1 batch of 720 hourly timesteps at a time
#is this the same as A, except I need to forecast 720 hours/timesteps at a time
model.add(LSTM(n_neurons, batch_input_shape=(1, 720, 3), stateful=True)) #probably don't need stateful=True here (?)
model.add(Dense(720))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()
#D. some variation so that number_of_batches x timesteps = 720 (no overlap of sequences).
#Timesteps most likely in multiples of 24 timesteps representing hours of the day to capture the profile shape
model.add(LSTM(n_neurons, batch_input_shape=(number_of_batches, timesteps, 3), stateful=True))
model.add(Dense(timesteps))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(n_epoch):
    model.fit(X, y, epochs=1, batch_size=number_of_batches, verbose=1, shuffle=False)
    model.reset_states()
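For reference, here is how I would reshape the same (720, 3) hourly array for each of these layouts (a rough sketch with placeholder values, just to make the shapes explicit):
import numpy as np
# 720 hourly rows, 3 features each (placeholder data standing in for the month of demand drivers)
raw = np.random.rand(720, 3)
# A and B: every sample is a single timestep -> shape (720, 1, 3)
X_single_step = raw.reshape(720, 1, 3)
# C: one sample containing the whole month as one sequence -> shape (1, 720, 3)
X_one_sequence = raw.reshape(1, 720, 3)
# D: non-overlapping windows, e.g. 30 daily sequences of 24 hours -> shape (30, 24, 3)
number_of_batches, timesteps = 30, 24
X_daily = raw.reshape(number_of_batches, timesteps, 3)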
I've read around so much but still don't quite get the full hang of LSTMs, so any help is much much appreciated! Any recommendations are welcome too.

Related

Issue with accuracy never changing in an ANN in Keras

I am trying to build a simple ANN to learn whether two images are similar or not, using two distance measures. Here is how I set things up. I took triplets of images (1: an anchor, 2: a positive sample, 3: a negative sample) and computed two different distance measurements: one using ResNet features and another using HOG features. The two distance measurements are then saved with the two picture paths as well as the correct label (0 = same, 1 = not the same).
Now I am trying to build out my ANN to learn from the two values and see whether this lets me tell if two images are similar. But nothing happens when I train the ANN. I think there are two possibilities:
1: I didn't set up the ANN correctly.
2: There is no connection at all.
Please help me see what the issue is.
Here is my code:
# Load the Pandas libraries with alias 'pd'
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# fix random seed for reproducibility
np.random.seed(7)
import csv
data = pd.read_csv("encoding.csv")
print(data.columns)
X = data[['resnet', 'hog','label']]
x = X[['resnet', 'hog']]
y = X[['label']]
model = Sequential()
#get number of columns in training data
n_cols = x.shape[1]
#add model layers
model.add(Dense(16, activation='relu', input_shape=(n_cols,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation= 'softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y,
          epochs=30,
          batch_size=32,
          validation_split=0.10)
Right now all it does is this over and over again:
167/167 [==============================] - 0s 3ms/step - loss: 8.0189 - acc: 0.4970 - val_loss: 7.5517 - val_acc: 0.5263
EDIT
So I have changed the setup a bit and now it does bounce up to 73% validation accuracy. But then it bounces around and ends at 40%. What does that mean?
Here is the new model:
from keras.layers import Dropout, BatchNormalization

model = Sequential()
#get number of columns in training data
n_cols = x.shape[1]
model.add(Dense(256, activation='relu', input_shape=(n_cols,)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(1, activation= 'sigmoid'))
#sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#model.compile(loss = "binary_crossentropy", optimizer = sgd, metrics=['accuracy'])
model.compile(loss = "binary_crossentropy", optimizer = 'rmsprop', metrics=['accuracy'])
#model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x, y,
          epochs=100,
          batch_size=64,
          validation_split=0.10)
This makes no sense:
model.add(Dense(1, activation= 'softmax'))
Softmax with one neuron will produce a constant value of 1.0 due to the normalization. For binary classification with the binary_crossentropy loss, you should use one neuron with sigmoid activation.
model.add(Dense(1, activation= 'sigmoid'))
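To see why, here is a quick numpy sketch (an illustration, not part of the original code) showing that softmax over a single logit always returns 1.0, so the output never changes and nothing can be learned:
import numpy as np
def softmax(z):
    # standard softmax over the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
# a layer with a single output unit: whatever the logit is, softmax returns 1.0
for logit in [-5.0, 0.0, 3.2]:
    print(softmax(np.array([logit])))  # -> [1.] every time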
Two things to try:
First, add complexity to your network; it is pretty simple. Add more layers/neurons in order to capture more information from your data.
Start with something like this, and see if it changes anything:
model.add(Dense(256, activation='relu', input_shape=(n_cols,)))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation= 'sigmoid'))
Second, consider adding more epochs; ANNs can take a long time to converge.
Update
More things to try:
Normalize and scale your data
Your dataset may be too small -> the more data you get, the better your model will be
Try different hyperparameters; maybe decrease your learning rate to something like 1e-4 or 1e-5, and try different batch_size values, etc.
Add more regularization: try dropout between each layer (a sketch combining this with scaling follows below)
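As a rough illustration of the scaling and dropout suggestions (a sketch only, using scikit-learn's StandardScaler and the x/y frames from the question; adjust the layer sizes and dropout rate as needed):
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout
# x is the ['resnet', 'hog'] feature frame and y the ['label'] column from the question
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)  # zero mean / unit variance per feature
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(x_scaled.shape[1],)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_scaled, y, epochs=100, batch_size=64, validation_split=0.10)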

when training sample increases accuracy decreases

I am testing Keras's IMDB dataset. The question is: when I load the train and test split with a vocabulary of 2000 words, I get close to 87% accuracy:
(X_train, train_labels), (X_test, test_labels) = imdb.load_data(num_words=2000)
but when I bump the vocabulary up to something like 5000 or 10000, the model performs poorly:
(X_train, train_labels), (X_test, test_labels) = imdb.load_data(num_words=10000)
Here is my model:
model = models.Sequential()
model.add(layers.Dense(256, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu' ))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(x_val, y_val))
Can anyone explain why this is the case? I thought that with more samples (and less overfitting) I should get a very good model.
Thanks for any advice
Increasing num_words doesn't increase the number of samples, only the size of the vocabulary. Statistically that means more distinct words per sample, pushing you in the direction of the curse of dimensionality, which is harmful for the model.
From the docs:
num_words: integer or None. Top most frequent words to consider. Any less frequent word will appear as oov_char value in the sequence data.
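To make the dimensionality point concrete, here is a small sketch (not from the original answer) of the usual multi-hot vectorization step, where the width of the model input equals num_words:
import numpy as np
from keras.datasets import imdb
def vectorize(sequences, dimension):
    # multi-hot encode each review: one column per word index in the vocabulary
    out = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        out[i, seq] = 1.0
    return out
num_words = 10000  # vocabulary size, and therefore the input dimension of the first Dense layer
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=num_words)
X_train_vec = vectorize(X_train, num_words)  # shape (25000, num_words)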

Multiple Sequences RNN/LSTM in Keras

I have multiple sequences of varying length. Each has about 9 features. I want to predict the values of all the continuous features at time t+1. The data is in a list of length 2000 (so, 2000 total sequences). How could one do this in Keras?
model = Sequential()
model.add(LSTM(100, input_shape=(None,9)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=1, batch_size=1, verbose=1)
This is all I really have, but I'm getting some size mismatches. Any suggestions?
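One common way to resolve the size mismatch (a sketch only, assuming sequences is the list of 2000 arrays of shape (length_i, 9) and y holds the 9 target values per sequence) is to pad the ragged list to a common length, mask the padding, and give the output layer one unit per feature:
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
max_len = max(len(s) for s in sequences)
# pad along the time axis; the feature dimension (9) is preserved
X = pad_sequences(sequences, maxlen=max_len, dtype='float32', padding='pre', value=0.0)
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(max_len, 9)))  # ignore the padded timesteps
model.add(LSTM(100))
model.add(Dense(9))  # one output per continuous feature at t+1
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, np.asarray(y), epochs=1, batch_size=32, verbose=1)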

Regularization strategy in Keras

I have been trying to set up a non-linear regression problem in Keras. Unfortunately, the results show that overfitting is occurring. Here is the code:
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation = 'relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0)))
model.add(Dense(outdim, activation='linear'))
Adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=Adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2 , initial_epoch=0)
The results without regularization are shown here: Without regularization. The mean absolute error for training is much lower than for validation, and the two keep a fixed gap, which is a sign of overfitting.
L2 regularization was specified for each layer like so,
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation = 'relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu', kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(int(number_of_neurons), activation='relu',kernel_regularizer=regularizers.l2(0.001)))
model.add(Dense(outdim, activation='linear'))
Adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=Adam, metrics=['mae'])
model.fit(X, Y, epochs=1000, batch_size=500, validation_split=0.2, shuffle=True, verbose=2 , initial_epoch=0)
The results for this are shown here: L2 regularized result. The MAE for test is close to training, which is good. However, the training MAE is poor at 0.03 (without regularization it was much lower, at 0.0028).
What can I do to reduce the training MAE with regularization?
Based on your results, it looks like you need to find the right amount of regularization to balance training accuracy with good generalization to the test set. This may be as simple as reducing the L2 parameter. Try reducing lambda from 0.001 to 0.0001 and comparing your results.
If you can't find a good parameter setting for L2, you could try dropout regularization instead. Just add model.add(Dropout(0.2)) between each pair of dense layers, and experiment with the dropout rate if necessary. A higher dropout rate corresponds to more regularization.
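For illustration, a sketch of what the dropout variant might look like (reusing number_of_neurons, X_train and outdim from the question; the 0.2 rate is just a starting point):
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(int(number_of_neurons), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(int(number_of_neurons), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(int(number_of_neurons), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(int(number_of_neurons), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(outdim, activation='linear'))
model.compile(loss='mean_squared_error', optimizer=optimizers.Adam(lr=0.001), metrics=['mae'])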

Keras: shape error when using validation data

Trying to add validation to model.fit, but whenever I do I get an error:
ValueError: Cannot feed value of shape (6, 4, 10) for Tensor 'lstm_input_1:0', which has shape '(32, 4, 10)'
Model:
data_dim = 10
timesteps = 4
batch_size = 32
model = Sequential()
model.add(LSTM(128, batch_input_shape=(batch_size, timesteps, data_dim), return_sequences=True, stateful=True))
model.add(LSTM(64, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(2, activation='softmax'))
sgd = SGD(lr=0.001, momentum=0.0, decay=0.0, nesterov=False)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, nb_epoch=50, batch_size=batch_size, validation_split=0.5)
What could be the error? If I remove validation_split the training works just fine. I've also tried to manually split my training set into two and add it with validation_data=(x_val, y_val) but I got the exact same error.
The issue comes from the fact that you hard-code the batch_size of your inputs. You have fixed it to 32, so when Keras validates your model it ends up sending the validation data in a batch of 6 samples. This might be because you don't have enough validation data, or because the number of samples isn't a multiple of 32. However, I would leave the batch size free if I were you, like this:
model.add(LSTM(128, input_shape=(timesteps, data_dim), return_sequences=True))
You specify input_shape instead of batch_input_shape (and drop stateful=True, since a stateful LSTM needs to know its batch size). That way, your network will accept any batch size; every layer further down the model adapts to whatever batch size comes in, as long as it is not hard-coded.
I hope this helps :)
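If statefulness is actually needed, an alternative (a sketch only, reusing x_train, y_train and the model from the question) is to keep the fixed batch_input_shape and instead trim both splits to a multiple of the batch size:
batch_size = 32
def trim_to_batches(x, y, batch_size):
    # drop the trailing samples so every batch is exactly batch_size long
    n = (len(x) // batch_size) * batch_size
    return x[:n], y[:n]
split = int(len(x_train) * 0.5)
x_tr, y_tr = trim_to_batches(x_train[:split], y_train[:split], batch_size)
x_val, y_val = trim_to_batches(x_train[split:], y_train[split:], batch_size)
# nb_epoch matches the question's (older) Keras API; newer versions use epochs=
model.fit(x_tr, y_tr, nb_epoch=50, batch_size=batch_size, validation_data=(x_val, y_val))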
