Keras LSTM network predictions align with input - keras

The above may sound ideal, but I'm trying to predict a step in front - i.e. with a look_back of 1. My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

def create_scaled_datasets(data, scaler_transform, train_perc=0.9):
    # Set training size
    train_size = int(len(data) * train_perc)
    # Reshape for scaler transform
    data = data.reshape((-1, 1))
    # Scale data to the scaler's feature_range, which is (0, 1) below
    data_scaled = scaler_transform.fit_transform(data)
    # Reshape again
    data_scaled = data_scaled.reshape((-1, 1))
    # Split into train and test data keeping time order
    train, test = data_scaled[0:train_size + 1, :], data_scaled[train_size:len(data), :]
    return train, test

# Instantiate scaler transform
scaler = MinMaxScaler(feature_range=(0, 1))
# Build the model
model = Sequential()
model.add(LSTM(5, input_shape=(1, 1), activation='tanh', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(12, input_shape=(1, 1), activation='tanh', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(2, input_shape=(1, 1), activation='tanh', return_sequences=False))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# Create train/test data sets
train, test = create_scaled_datasets(data, scaler)
# Labels are the training values shifted along by one step
trainY = []
for i in range(len(train) - 1):
    trainY = np.append(trainY, train[i + 1])
train = np.reshape(train, (train.shape[0], 1, train.shape[1]))
plotting_test = test
test = np.reshape(test, (test.shape[0], 1, test.shape[1]))
model.fit(train[:-1], trainY, epochs=150, verbose=0)
testPredict = model.predict(test)
plt.plot(testPredict, 'g')
plt.plot(plotting_test, 'r')
plt.show()
with output plot of:
In essence, what I want to achieve is for the model to predict the next value, and I attempt to do this by training on the actual values as the features, and the labels being the actual values shifted along one (look_back of 1). Then I predict on the test data. As you can see from the plot, the model does a pretty good job, except it doesn't seem to be predicting the future, but instead seems to be predicting the present... I would expect the plot to look similar, except the green line (the predictions) to be shifted one point to the left. I have tried increasing the look_back value, but it seems to always do the same thing, which makes me think I'm training the model wrong, or attempting to predict incorrectly. If I am reading this wrong and the model is indeed doing what I want but I'm interpreting wrong (also very possible) how do I then predict further into the future?

To add to @MSalters' comment, and building somewhat on this: it is possible, although not guaranteed, that you can "help" your model learn something better than the identity by forcing it to learn the difference between the current step and the next, rather than the actual value of the next step.
To take this one step further, you could also keep an exponential moving average and learn the difference from that, somewhat like what was done here.
In short, it makes statistical sense to predict the same value, as it is a low-risk guess. With a difference as the target, the equivalent lazy guess is zero, and the model may not settle on it.
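One way to check whether the network is really just echoing its input is to compare it with a naive "repeat the last value" baseline. The sketch below is not from the original post; the function name `persistence_report` and the alignment convention (that `predicted[t]` is the forecast of `actual[t + 1]`) are assumptions made for illustration.

```python
import numpy as np

def persistence_report(actual, predicted):
    """Compare one-step-ahead predictions with a naive persistence baseline.

    Assumption: predicted[t] is the model's forecast of actual[t + 1].
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    target = actual[1:]        # values the model is supposed to forecast
    forecast = predicted[:-1]  # forecasts that have a known target
    naive = actual[:-1]        # baseline: the next value equals the current one
    return {
        "model_mse": float(np.mean((forecast - target) ** 2)),
        "naive_mse": float(np.mean((naive - target) ** 2)),
        # close to zero means the model is mostly copying its input
        "echo_mse": float(np.mean((forecast - naive) ** 2)),
    }
```

If `model_mse` is no better than `naive_mse` and `echo_mse` is close to zero, the model has probably learned the identity rather than anything about the next step.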
Other things I noticed:
Dropout - there is no need to use any regularization before you are able to over-fit. It just complicates debugging.
Just one step into the past - you are likely losing a lot of required information, which leaves your net with no idea what to do, so it guesses the same value. If you gave it even a single value more from the past, it could form a decent approximation of the derivative. That sounds important (only you know).
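The two suggestions above (learn the difference, and look more than one step into the past) can be combined when building the training arrays. This is only a sketch under assumptions: the helper name `make_diff_windows` and its return layout are hypothetical, and it uses plain NumPy rather than anything from the question's code.

```python
import numpy as np

def make_diff_windows(series, look_back=2):
    """Build LSTM inputs from step-to-step differences instead of raw values.

    Returns X with shape (samples, look_back, 1), y holding the next
    difference, and last_values for turning a predicted difference back
    into a level.
    """
    series = np.asarray(series, dtype=float).ravel()
    diffs = np.diff(series)  # diffs[t] = series[t + 1] - series[t]
    X, y = [], []
    for t in range(look_back, len(diffs)):
        X.append(diffs[t - look_back:t])  # the look_back most recent changes
        y.append(diffs[t])                # the change to predict
    X = np.array(X).reshape(-1, look_back, 1)
    y = np.array(y)
    last_values = series[look_back:len(series) - 1]
    return X, y, last_values
```

A predicted level is then `last_values + predicted_difference`, so a model that always outputs zero reproduces exactly the "predicting the present" behaviour, which makes that failure easy to spot.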

Related

keras, LSTM - predict on inputs of different length?

I have fitted an LSTM that deals with inputs of different length:
model = Sequential()
model.add(LSTM(units=10, return_sequences=False, input_shape=(None, 5)))
model.add(Dense(units=1, activation='sigmoid'))
Having fitted the model, I want to test it on inputs of different size.
len(x_test) # = 100000
x_test[0].shape # = (1, 5)
x_test[1].shape # = (3, 5)
x_test[2].shape # = (8, 5)
Testing on a single instance j is not a problem (model.predict(x_test[j])), but looping over all of them is really slow.
Is there a way of speeding up the computation? model.predict(x_test) does not work.
Thank you!
The most common way to speed up model inference is to run inference on GPU, instead of the CPU (I'm assuming you are not already doing that). You can set up GPU support by following the official guide here. Unless you are explicitly asking keras to run inference on CPU, your code should work as is, without any changes. To confirm if you are using GPU, you can use this article.
Hope the answer was helpful!
The best solution that I have found so far is grouping together data windows with the same length. For my problem, it's enough to significantly speed up the computation.
Hope this trick would help other people.
import numpy as np

def predict_custom(model, x):
    """x should be a list of np.arrays with different number of rows, but same number of columns"""
    # dictionary with key = length of the window, value = indices of samples with such length
    dic = {}
    for i, sample in enumerate(x):
        dic.setdefault(sample.shape[0], []).append(i)
    y_pred = np.full((len(x), 1), np.nan)
    # loop over dictionary and predict together samples of the same length
    for key, indexes in dic.items():
        # gather the samples of this length in a 3D np.array
        x_3d = np.stack([x[i] for i in indexes], axis=0)
        # use dictionary values to insert results in the corresponding rows of y_pred
        y_pred[indexes] = model.predict(x_3d)
    return y_pred

how effective is transfer learning? keeping only two specific output features without resetting features

I want to keep only two specific output features without resetting features.
Resetting features would lose the pre-trained weights.
For example, I don't want to do...
# https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=transfer%20learning%20ant%20bees
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)
(The code above follows the PyTorch transfer learning tutorial.)
I want to do this to see how effective transfer learning is.
Even without transfer learning, a model might be effective. Removing 998 out of 1000 categories and leaving only two, ant and bee, could give a good classifier, since only two choices are left.
I do not want to re-train the model. I want to use the weights as they are; otherwise it would be the same as transfer learning.
You can certainly try this. You can reduce the model output to just the two logits you want to compare with:
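# ant_index and bee_index are the positions of the two classes in the model's 1000 outputs
chosen_cats = torch.tensor([ant_index, bee_index], dtype=torch.long)
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    # keep only the two logits of interest
    outputs = torch.index_select(outputs, 1, chosen_cats)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)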
In this scenario, the preds will be 0 or 1, with 0 predicting ant and 1 predicting bee, so you will need to also modify your labels to reflect this.
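The column selection and the label change can be illustrated without the network. The sketch below uses NumPy as a stand-in for the tensor operations (in PyTorch the selection is the `torch.index_select` call shown above). The class indices 310 and 309 are an assumption about the ImageNet-1k ordering and should be checked against the class list you actually use; the helper names are hypothetical.

```python
import numpy as np

# Assumed positions of "ant" and "bee" in the 1000-way output; verify before use.
ANT_INDEX, BEE_INDEX = 310, 309

def restrict_to_two_classes(logits, chosen=(ANT_INDEX, BEE_INDEX)):
    """Keep only the chosen logit columns; prediction 0 = first class, 1 = second."""
    logits = np.asarray(logits, dtype=float)
    two = logits[:, list(chosen)]
    return two, two.argmax(axis=1)

def remap_labels(labels, chosen=(ANT_INDEX, BEE_INDEX)):
    """Convert labels given as 1000-way class indices to 0/1 in the order of `chosen`."""
    lookup = {cls: pos for pos, cls in enumerate(chosen)}
    return np.array([lookup[int(label)] for label in labels])
```

If the labels already come from a two-folder dataset (0 for ants, 1 for bees), no remapping is needed as long as `chosen` lists the classes in that same order.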

GridSearchCv Pipeline MultiOutputClassifier with XGBoostClassifier - how to pass early_stopping_rounds and eval_set?

I want to do multioutput prediction of labels and continuous data. My data consists of time series, one 10 time-points series of 30 observables per sample. I want to predict 10 labels that are binary, and 5 that are continuous, based on this data.
For the sake of simplicity I have flattened the time series data - ending up with one row per sample.
Since there are many labels to predict about the same system, and since relationships exist between them, I want to use multi-output prediction to do so. My idea is to divide the task into two parts: one for MultiOutputClassification, another for MultiOutputRegression.
I generally like XGBoost and wish to use it for this task, but of course I want to prevent overfitting when doing so. So I have a piece of code as follows, and I wish to pass the early_stopping_rounds to the fit method of the XGBClassifier, but don't know how to.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
pipeline = Pipeline([
    ('imputer', SimpleImputer()),  # XGBoost can deal with NaNs, but MultiOutputClassifier cannot
    ('classifier', MultiOutputClassifier(XGBClassifier()))
])
param_grid = dict(
    classifier__estimator__n_estimators=[100],  # this works
    # classifier__estimator__early_stopping_rounds=[30],  # needs to be passed to .fit
    # classifier__estimator__scale_pos_weight=[scale_pos_weight],  # XGBoostError: Invalid Parameter format for scale_pos_weight expect float
)
clf = GridSearchCV(estimator=pipeline, param_grid=param_grid, scoring='roc_auc', refit='roc_auc', cv=5, n_jobs=-1)
clf.fit(X_train, y_train[CLASSIFICATION_LABELS])
y_hat_proba = np.array(clf.predict_proba(X_test))
y_hat = pd.DataFrame(np.array([y_hat_proba[:, i, 0] for i in range(y_hat_proba.shape[1])]), columns=CLASSIFICATION_LABELS)
auc_roc_scores = np.array([roc_auc_score(y_test[label], (y_hat[label] > 0.5).astype(int)) for label in y_hat.columns])
print(f'average ROC AUC score: {np.mean(auc_roc_scores).round(3)}+/-{np.std(auc_roc_scores).round(3)}')
>>> average ROC AUC score: 0.499+/-0.002
I tried passing it to fit as follows:
classifier__estimator__early_stopping_rounds=30
classifier__early_stopping_rounds=30
I get AUC ROC scores of 0.5 on the labels, which means this clearly isn't working and hence why I want to pass the early_stopping_rounds parameter and the eval_set. I suppose that being able to pass scale_pos_weight could also be useful, but probably doesn't work for MultiOutput prediction. At the moment I get the feeling that this is not the way to go to solve this, and in case you agree I would appreciate alternative suggestions.

Basic time series prediction with lstm

I have a sequence and I would like to do the simplest LSTM possible to predict the rest of the sequence.
Meaning I want to start by using only the previous step to predict the next one and then add more steps.
I want to use the predicted values as inputs also.
So I believe what I want to achieve is many-to-many, as mentioned in the answers to "Understanding Keras LSTMs".
I have read other questions on the topic on stackoverflow but still didn't manage to make it work. In my code, I'm using the tutorial here https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/ and the function create_dataset to create two arrays with only a shift of one step.
Here is my code and the error I got.
import numpy
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Here I'm scaling my data as advised
scaler = MinMaxScaler(feature_range=(0, 1))
Rot = scaler.fit_transform(Rot)
# I'm creating the model using batch_size=1 but I'm not sure why this is necessary
batch_size = 1
model = Sequential()
model.add(LSTM(1, batch_input_shape=(batch_size, 1, 1), stateful=True, return_sequences=True, input_shape=(None, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
# I want to use only the previous value for now
look_back = 1
# as len(Rot) = 41000 I'm taking 36000 for training
train_size = 36000
# create_dataset is the helper from the tutorial linked above
X, Y = create_dataset(Rot[:train_size, :], look_back)
X = numpy.reshape(X, (X.shape[0], X.shape[1], 1))
Y = numpy.reshape(Y, (X.shape[0], X.shape[1], 1))
And now I train my network as advised by @Daniel Möller.
epochs = 10
for epoch in range(epochs):
    model.reset_states()
    model.train_on_batch(X, Y)
And I get this error:
PartialTensorShape: Incompatible shapes during merge: [35998,1] vs. [1,1]
[[{{node lstm_11/TensorArrayStack/TensorArrayGatherV3}}]].
Do you know why I get this error, given that I seem to have done everything as in the topic mentioned above?
In this LSTM network batch_size=1 because the layer is stateful. When stateful=True, the training set size and the test set size must both be divisible by batch_size (the remainder must be zero).
batch_input_shape=(batch_size,1,1) is already defined, so why also pass input_shape=(None,1)?
return_sequences=True is normally used when another LSTM layer follows the current one, but here none does.
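The shapes in the error message ([35998,1] vs. [1,1]) suggest that a batch of 35998 samples was passed to a model whose batch size is fixed at 1. One possible arrangement, sketched here under assumptions, is to keep the batch dimension at 1 and put the steps on the time axis; the layer would then need something like batch_input_shape=(1, None, 1), which is an assumption and not part of the original code. Both helper names are hypothetical.

```python
import numpy as np

def to_single_sequence(values):
    """Shape a 1-D series as one sample: (batch=1, timesteps, features=1).

    Targets are the same series shifted by one step (look_back of 1).
    """
    values = np.asarray(values, dtype=float).ravel()
    X = values[:-1].reshape(1, -1, 1)
    Y = values[1:].reshape(1, -1, 1)
    return X, Y

def trim_to_batch_multiple(samples, batch_size):
    """Drop trailing samples so the count is divisible by batch_size."""
    samples = np.asarray(samples)
    usable = (len(samples) // batch_size) * batch_size
    return samples[:usable]
```

The second helper covers the divisibility requirement mentioned above for stateful layers when batch_size is larger than 1.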

Tensorflow- How to display accuracy rate for a linear regression model

I have a linear regression model that seems to work. I first load the data into X and the target column into Y, after that I implement the following...
X_train, X_test, Y_train, Y_test = train_test_split(
    X_data,
    Y_data,
    test_size=0.2
)
rng = np.random
n_rows = X_train.shape[0]
X = tf.placeholder("float")
Y = tf.placeholder("float")
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")
pred = tf.add(tf.multiply(X, W), b)
cost = tf.reduce_sum(tf.pow(pred-Y, 2)/(2*n_rows))
optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate).minimize(cost)
init = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
with tf.Session() as sess:
    sess.run([init, init_local])
    for epoch in range(FLAGS.training_epochs):
        avg_cost = 0
        for (x, y) in zip(X_train, Y_train):
            sess.run(optimizer, feed_dict={X:x, Y:y})
        # display logs per epoch step
        if (epoch + 1) % FLAGS.display_step == 0:
            c = sess.run(
                cost,
                feed_dict={X:X_train, Y:Y_train}
            )
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")
    accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
    print(sess.run(accuracy))
I cannot figure out how to print out the model's accuracy. For example, in sklearn, it is simple, if you have a model you just print model.score(X_test, Y_test). But I do not know how to do this in tensorflow or if it is even possible.
I think I'd be able to calculate the Mean Squared Error. Does this help in any way?
EDIT
I tried implementing tf.metrics.accuracy as suggested in the comments but I'm having an issue implementing it. The documentation says it takes 2 arguments, labels and predictions, so I tried the following...
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
print(sess.run(accuracy))
But this gives me an error...
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value accuracy/count
[[Node: accuracy/count/read = Identity[T=DT_FLOAT, _class=["loc:@accuracy/count"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](accuracy/count)]]
How exactly does one implement this?
It turns out that, since this is a multi-class linear regression problem and not a classification problem, tf.metrics.accuracy is not the right approach.
Instead of displaying the accuracy of my model as a percentage, I focused on reducing the Mean Squared Error (MSE).
From looking at other examples, tf.metrics.accuracy is never used for linear regression, only for classification. Normally tf.metrics.mean_squared_error is the right approach.
I implemented two ways of calculating the total MSE of my predictions to my testing data...
pred = tf.add(tf.matmul(X, W), b)
...
...
Y_pred = sess.run(pred, feed_dict={X:X_test})
mse = tf.reduce_mean(tf.square(Y_pred - Y_test))
OR
mse = tf.metrics.mean_squared_error(labels=Y_test, predictions=Y_pred)
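Both compute the same quantity, but the second approach is more concise. Note that tf.metrics.mean_squared_error returns a (value, update_op) pair backed by local variables, so the local variables have to be initialized and the update op run before the value is meaningful.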
There's a good explanation of how to measure the accuracy of a Linear Regression model here.
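For comparison with sklearn's model.score, which for regressors reports the coefficient of determination (R^2) rather than a percentage accuracy, both numbers can be computed directly from the test predictions. This is a minimal NumPy sketch, not the answer's TensorFlow code; the function name `regression_scores` is hypothetical.

```python
import numpy as np

def regression_scores(y_true, y_pred):
    """Return the mean squared error and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    ss_res = float(np.sum(residual ** 2))                  # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2
```

An R^2 of 1 means perfect predictions and 0 means no better than always predicting the mean of the test targets. It can be fed with the Y_pred array obtained from sess.run above.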
I didn't think this was clear at all from the Tensorflow documentation, but you have to declare the accuracy operation, and then initialize all global and local variables, before you run the accuracy calculation:
accuracy, accuracy_op = tf.metrics.accuracy(labels=tf.argmax(Y_test, 0), predictions=tf.argmax(pred, 0))
# ...
init_global = tf.global_variables_initializer()
init_local = tf.local_variables_initializer()
sess.run([init_global, init_local])
# ...
# run accuracy calculation
I read something on Stack Overflow about the accuracy calculation using local variables, which is why the local variable initializer is necessary.
After reading the complete code you posted, I noticed a couple other things:
In your calculation of pred, you use pred = tf.add(tf.multiply(X, W), b). tf.multiply performs element-wise multiplication, and will not give you the fully connected layers you need for a neural network (which I am assuming is what you are ultimately working toward, since you're using TensorFlow). To implement fully connected layers, where each layer i (including input and output layers) has n_i nodes, you need separate weight and bias matrices for each pair of successive layers. The dimensions of the i-th weight matrix (the weights between the i-th layer and the (i+1)-th layer) should be (n_i, n_{i+1}), and the i-th bias matrix should have dimensions (n_{i+1}, 1). Then, going back to the multiplication operation: replace tf.multiply with tf.matmul, and you're good to go. I assume that what you have is probably fine for a single-class linear regression problem, but this is definitely the way you want to go if you plan to solve a multiclass regression problem or implement a deeper network.
Your weight and bias variables are scalars. You give the variables the initial value of np.random.randn(), which, according to the documentation, generates a single floating point number when no arguments are given. The dimensions of your weight and bias tensors need to be supplied as arguments to np.random.randn(). Better yet, you can initialize these to random values in TensorFlow: W = tf.Variable(tf.random_normal([dim0, dim1], seed=seed)) (I always initialize random variables with a seed value for reproducibility).
Just a note in case you don't know this already, but non-linear activation functions are required for neural networks to be effective. If all your activations are linear, then no matter how many layers you have, it will reduce to a simple linear regression in the end. Many people use relu activation for hidden layers. For the output layer, use softmax activation for multiclass classification problems where the output classes are exclusive (i.e., where only one class can be correct for any given input), and sigmoid activation for multiclass classification problems where the output classes are not exclusive.
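The shape bookkeeping described above can be checked with a few lines of NumPy. This is a sketch under assumptions, not TensorFlow code from the answer: samples are stored as rows, so each bias has shape (1, n_{i+1}) instead of the column form (n_{i+1}, 1), and the helper names `init_layers` and `forward` are hypothetical.

```python
import numpy as np

def init_layers(sizes, seed=0):
    """Create one (W, b) pair per pair of successive layers.

    W has shape (n_i, n_{i+1}); b has shape (1, n_{i+1}) so it broadcasts over rows.
    """
    rng = np.random.default_rng(seed)  # seeded for reproducibility
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1, np.zeros((1, n_out)))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, layers):
    """Matrix-multiply through the layers, with ReLU on all but the last."""
    a = np.asarray(X, dtype=float)
    for k, (W, b) in enumerate(layers):
        z = a @ W + b  # matmul, not element-wise multiplication
        a = np.maximum(z, 0.0) if k < len(layers) - 1 else z
    return a
```

With sizes [3, 4, 2], an input of shape (5, 3) produces an output of shape (5, 2). Leaving the last layer linear matches a regression output.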

Resources