Let's say I want to predict the electricity consumption of my building.
The input is 1 week of consumption and I want the output to be the predicted consumption for the following day, hour by hour, i.e. consumption at midnight, consumption at 1 a.m., etc.
For simulation and simplicity, I use a very simple consumption function: c(h) = h, where h is the hour of the day and c(h) is the consumption for this hour (here the identity). Hence, the consumption for 1 day is [0, 1, 2, ..., 23].
I first trained on 1 example of 7 vectors of 24 hours of consumption (i.e. 1 week) and got quite sensible results:
from keras.models import Sequential
from keras import layers
import matplotlib.pyplot as plt
import numpy as np
from keras.layers import RepeatVector
Inputs = np.tile(np.arange(24), (7)).reshape(1, 7, 24)
Outputs = np.arange(24).reshape(1, 24)
model = Sequential()
model.add(layers.LSTM(32,  # 32 is chosen at random
                      input_shape=(None, Inputs.shape[-1]),
                      activation='relu'))
model.add(layers.Dense(24, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit(x=Inputs, y=Outputs, epochs=150,
batch_size= 1,
validation_data = (Inputs, Outputs))
plt.figure()
plt.plot(model(Inputs, training=False)[0])
plt.show()
But now, I want to learn scalar values (no more vectors) since I guess the results should be different. And that's where I get lost.
If I only change the shapes, I've got something that works:
Inputs = np.tile(np.arange(24), (7)).reshape(1, 7*24, 1)
Outputs = np.arange(24).reshape(1, 24, 1)
model2 = Sequential()
model2.add(layers.LSTM(32,  # 32 is chosen at random
                       input_shape=(7*24, Inputs.shape[-1]),
                       # activation='relu'
                       ))
model2.add(layers.Dense(24, activation='linear'))
model2.compile(loss='mean_squared_error', optimizer='adam')
history = model2.fit(x=Inputs, y=Outputs, epochs=200,
batch_size= 1,
validation_data = (Inputs, Outputs))
plt.figure()
y = model2(Inputs, training=False)
plt.plot(y[0])
plt.show()
It works, but I'm really not sure this is the right way.
I've tried a more sophisticated approach (suited to many-to-many problems), but it never worked:
Inputs = np.tile(np.arange(24), (7)).reshape(1, 7 * 24, 1)
Outputs = np.arange(24).reshape(1, 24, 1)
model2 = Sequential()
model2.add(layers.LSTM(32, input_shape=(7*24, 1))) # encoder layer
model2.add(RepeatVector(7*24)) # repeat vector
model2.add(layers.LSTM(32, return_sequences=True)) # decoder layer
model2.add(layers.TimeDistributed(layers.Dense(1)))
model2.compile(optimizer='adam', loss='mse')
print(model2.summary())
history = model2.fit(Inputs, Outputs, epochs=500, verbose=1, batch_size=1, validation_data = (Inputs, Outputs))
plt.figure()
res = model2(Inputs, training=False)
plt.plot(res[0])
plt.show()
In the version above, Keras complains about the shapes during the loss evaluation, and none of the many things I've tried worked :(
What's wrong with this code?
Optionally, is this encoder/decoder approach the right approach? ;)
Thanks in advance.
Your output shape should be (None, 24, 1), to match Outputs. But you have RepeatVector(7*24), so the output shape after the TimeDistributed Dense layer is (None, 7*24, 1). Change it to RepeatVector(24).
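To see why the repeat count has to match the target length, here is a NumPy-only sketch of the shape flow through the encoder/decoder stack. It does not run Keras: `encoded` and `dense_kernel` are hypothetical stand-ins for the encoder output and the Dense(1) weights, `repeat_vector` is a helper written for this illustration, and the decoder LSTM is skipped on the assumption that it keeps the shape (batch, steps, 32).

```python
import numpy as np

units = 32      # size of the encoder LSTM output, as in the question
horizon = 24    # number of hourly values to predict

# Stand-in for the encoder output: one vector per sample, shape (batch, units).
encoded = np.zeros((1, units))

def repeat_vector(x, n):
    """Mimic what RepeatVector(n) does to shapes: (batch, units) -> (batch, n, units)."""
    return np.repeat(x[:, np.newaxis, :], n, axis=1)

# Stand-in for TimeDistributed(Dense(1)): the same kernel applied at every step.
dense_kernel = np.zeros((units, 1))

good = repeat_vector(encoded, horizon) @ dense_kernel
bad = repeat_vector(encoded, 7 * 24) @ dense_kernel

targets = np.arange(24).reshape(1, 24, 1)
print(good.shape, targets.shape)  # same shape, so a loss can be computed
print(bad.shape)                  # 168 steps cannot be compared with 24 targets
```

The repeat count sets the length of the decoded sequence, so it should equal the number of values to predict, not the length of the input.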
I just started to build my first CNN. I'm practicing with the MNIST dataset, this is the code I just wrote:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dropout, Flatten, Dense
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import RobustScaler
import os
import numpy as np
import matplotlib.pyplot as plt
# CONSTANTS
EPOCHS = 300
TIME_STEPS = 30000
NUM_CLASSES = 10
# Loading data
print('Loading data:')
(train_X, train_y), (test_X, test_y) = mnist.load_data()
print('X_train: ' + str(train_X.shape))
print('Y_train: ' + str(train_y.shape))
print('X_test: ' + str(test_X.shape))
print('Y_test: ' + str(test_y.shape))
print('------------------------------')
# Splitting train/val
print('Splitting training/validation set:')
X_train = train_X[0:TIME_STEPS, :]
X_val = train_X[TIME_STEPS:TIME_STEPS*2, :]
print('X_train: ' + str(X_train.shape))
print('X_val: ' + str(X_val.shape))
# Normalizing data
print('------------------------------')
print('Normalizing data:')
X_train = X_train/255
X_val = X_val/255
print('X_train: ' + str(X_train.shape))
print('X_val: ' + str(X_val.shape))
# Building model
model = Sequential()
model.add(Conv1D(filters=32, kernel_size=5, input_shape=(28, 28)))
model.add(Conv1D(filters=16, kernel_size=4, activation="relu"))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(NUM_CLASSES, activation='softmax'))
model.compile(optimizer=Adam(), loss=categorical_crossentropy, metrics=['accuracy'])
model.summary()
model.fit(x=X_train, y=X_train, batch_size=10, epochs=EPOCHS, shuffle=False)
I'm going to explain what I did; any corrections would be helpful so I can learn more:
The first thing I did was split the training set into two parts: a training part and a validation part, on which I would like to do the training before testing on the test set.
Then I normalized the data (is this standard practice when working with images?).
I then built my CNN with a simple structure: the first layer receives the inputs (with dimension 28x28), and I chose 32 filters, which should be enough to perform well on this dataset. The kernel size is the part I did not understand, since I thought the kernel was the equivalent of the filter; I selected a low number to avoid problems. The second layer is similar to the first, but it has an activation function (relu, although I'm not convinced; I was thinking of using a softmax to pass a set of probabilities to the fully connected layer).
The last 3 layers are the fully connected part that produces the output.
In the fit function I used a batch size of 10, and I think this could be one of the reasons I get the error:
ValueError: Shapes (10, 28, 28) and (10, 10) are incompatible
Even after removing it, I still get the following error:
ValueError: Shapes (None, 28, 28) and (None, 10) are incompatible
Am I missing something important?
You are passing in the X_train variable twice, once as the x argument and once as the y argument. Instead of passing X_train as the y argument in .fit(), you should pass an array of the values you are trying to predict. Given that you are using MNIST, I assume you are trying to predict the written digit, so your y array should have shape (n_samples, 10), with the digit one-hot encoded.
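A minimal sketch of that one-hot encoding, using only NumPy. The `one_hot` helper and the `sample_labels` array are made up for this illustration; in the question's code the labels would be the matching slice of train_y, and keras.utils.to_categorical does the same job.

```python
import numpy as np

NUM_CLASSES = 10

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n,) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[labels]

# Hypothetical stand-in for the first few entries of train_y.
sample_labels = np.array([5, 0, 4, 1, 9])
y_onehot = one_hot(sample_labels, NUM_CLASSES)

print(y_onehot.shape)  # one row per sample, one column per digit class
print(y_onehot[0])     # a single 1.0 at index 5
```

With targets in this shape, the (None, 10) softmax output and the y array agree, which is what categorical_crossentropy expects.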
I have training data in the form of NumPy arrays that I will use in a ConvLSTM.
The dimensions of the array are as follows.
trainX = (7000, 200, 8), where 7000 is the number of samples, 200 is the number of time steps per sample, and 8 is the number of features per time step, i.e. (samples, timesteps, features).
Out of these 8 features, 3 remain the same throughout all time steps in a sample (in other words, these features are directly related to the sample), for example day of the week, month number, and weekday (these change from sample to sample). To reduce the complexity, I want to keep these three features separate from the initial training set and merge them with the output of the ConvLSTM layer before applying the dense layer for classification (softmax activation), e.g.
the initial training set dimensions would be (7000, 200, 5) and the auxiliary input to be merged would have dimensions (7000, 3), because these 3 features are directly related to the sample. How can I implement this using Keras?
Below is the code I wrote using the Functional API, but I don't know how to merge these two inputs.
#trainX.shape=(7000,200,5)
#trainy.shape=(7000,4)
#testX.shape=(3000,200,5)
#testy.shape=(3000,4)
#trainMetadata.shape=(7000,3)
#testMetadata.shape=(3000,3)
verbose, epochs, batch_size = 1, 50, 256
samples, n_features, n_outputs = trainX.shape[0], trainX.shape[2], trainy.shape[1]
n_steps, n_length = 4, 50
input_shape = (n_steps, 1, n_length, n_features)
model_input = Input(shape=input_shape)
clstm1 = ConvLSTM2D(filters=64, kernel_size=(1,3), activation='relu',return_sequences = True)(model_input)
clstm1 = BatchNormalization()(clstm1)
clstm2 = ConvLSTM2D(filters=128, kernel_size=(1,3), activation='relu',return_sequences = False)(clstm1)
conv_output = BatchNormalization()(clstm2)
metadata_input = Input(shape=trainMetadata.shape)
merge_layer = np.concatenate([metadata_input, conv_output])
dense = Dense(100, activation='relu', kernel_regularizer=regularizers.l2(l=0.01))(merge_layer)
dense = Dropout(0.5)(dense)
output = Dense(n_outputs, activation='softmax')(dense)
model = Model(inputs=merge_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit([trainX, trainMetadata], trainy, validation_data=([testX, testMetadata], testy), epochs=epochs, batch_size=batch_size, verbose=verbose)
_, accuracy = model.evaluate(testX, testy, batch_size=batch_size, verbose=0)
y = model.predict(testX)
But I am getting a ValueError at the merge_layer statement:
ValueError: zero-dimensional arrays cannot be concatenated
What you are describing cannot be done using the Sequential model of Keras.
You need to use the Model class API: Guide to Keras Model.
With this API you can build the complex model you are looking for.
Here is an example of how to use it: How to Use the Keras Functional API for Deep Learning
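The shape bookkeeping behind such a merge can be sketched with plain NumPy arrays. This is only an illustration of the shapes, not Keras code: inside a Functional API model the same two steps would be done with the Flatten and Concatenate layers on symbolic tensors (np.concatenate cannot operate on those), and the model would take both inputs as a list. The batch size of 8 is an arbitrary assumption, and the (1, 46, 128) feature map assumes the question's two (1, 3) kernels without padding on a width of 50.

```python
import numpy as np

batch = 8  # small hypothetical batch, only to keep the arrays tiny

# Assumed shape of the second ConvLSTM2D output in the question: the width
# shrinks from 50 to 48 to 46 with two unpadded (1, 3) kernels, 128 filters.
conv_output = np.zeros((batch, 1, 46, 128))
metadata = np.zeros((batch, 3))  # the 3 per-sample features

# Flatten everything except the batch axis, then join along the feature axis.
flat = conv_output.reshape(batch, -1)
merged = np.concatenate([flat, metadata], axis=-1)

print(flat.shape)    # (batch, 1 * 46 * 128)
print(merged.shape)  # three extra feature columns per sample
```

The point is that both branches must be rank 2, (batch, features), before they are joined, so the auxiliary input needs a per-sample shape of (3,) rather than the shape of the whole metadata array.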
I want to have the y_pred output as either +1 or -1 only. It should not have the intermediate real values and not even zero.
classifier = Sequential()
#adding layers
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation ='relu', input_shape = (22,)))
# Adding the second hidden layer
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'tanh'))
# Compiling Neural Network
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting our model
classifier.fit(x_train, y_train, batch_size = 10, epochs = 100)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
The output values of y_pred are in the range [-1, 1], but I expected only the values 1 or -1.
To function properly, neural networks require an activation function that can get non-integer values. If you need rigidly discrete output, you need to translate the output values yourself.
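One way to do that translation after prediction is to threshold the tanh output at zero. This is a sketch under assumptions: the `to_plus_minus_one` helper, the threshold of 0.0, and the choice to map an exact 0 to +1 are all illustrative, and the `y_pred` array stands in for classifier.predict(x_test).

```python
import numpy as np

def to_plus_minus_one(y_pred, threshold=0.0):
    """Map real-valued outputs to +1 / -1; values equal to the threshold go to +1."""
    return np.where(np.asarray(y_pred) >= threshold, 1, -1)

# Hypothetical tanh outputs, standing in for classifier.predict(x_test).
y_pred = np.array([[-0.93], [0.12], [0.0], [0.78], [-0.05]])
labels = to_plus_minus_one(y_pred)

print(labels.ravel())
```

The network still trains on continuous outputs; only the final predictions are made discrete.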
When you use the binary_crossentropy loss together with the accuracy metric, Keras applies a threshold of 0.5 to the output when it computes accuracy: anything above 0.5 counts as 1 and anything below as 0. The loss itself is computed on the raw output and assumes targets of 0 and 1. With the built-in 'accuracy' string there is no easy way to change that threshold; you will have to write your own function.
Here is a Stackoverflow link that will guide you in doing that.
Edited to add:
I found what I think is a working solution: https://bleyddyn.github.io/posts/2017/10/keras-lstm/
I'm trying to use a Conv/LSTM network for controlling a robot. I think I have everything set up so I could start training it on batches of data from a replay memory, but I can't figure out how to actually use it to control a robot. Simplified test code is below.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
    input_shape=(timesteps,) + input_dim
    model = Sequential()
    model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), input_shape=input_shape) )
    model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(512, return_sequences=True, activation='relu', unroll=True))
    model.add(Dense(num_actions, activation='softmax', ))
    model.compile(loss='categorical_crossentropy', optimizer='adam' )
    return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, timesteps, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# stateful should be false here
pred = model.train_on_batch( batch, y )
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps and I don't think the LSTM internal state would be correct if I tried a rolling queue of input images.
batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
    image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
    # stateful should be true here
    pred = model.predict( image, batch_size=1 )
    # take action based on pred
The error I get on the "model.predict( image..." line is:
ValueError: Error when checking : expected time_distributed_1_input to have shape (None, 10, 84, 84, 3) but got array with shape (1, 1, 84, 84, 3)
Which is understandable, but I can't find a way around it.
I don't know Keras well enough to even know if I'm using the TimeDistributed layers correctly.
So, is this even possible in Keras? If so, how?
If not, is it possible in TF or PyTorch?
Thanks for any suggestions!
Edited to add running code, although it's not necessarily correct. Still needs to be tested on an OpenAI gym task.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
    input_shape=(1,None) + input_dim
    model = Sequential()
    model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), batch_input_shape=input_shape) )
    model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(512, return_sequences=True, activation='relu', stateful=True))
    model.add(Dense(num_actions, activation='softmax', ))
    model.compile(loss='categorical_crossentropy', optimizer='adam' )
    return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, 1, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# Need to find a way to prevent the optimizer from updating every b, but accumulate updates over an entire batch (batch_size).
for b in range(batch_size):
    pred = model.train_on_batch( np.reshape(batch[b,:], (1,timesteps,84,84,3)), np.reshape(y[b,:], (1,timesteps,num_actions)) )
    #for t in range(timesteps):
    #    pred = model.train_on_batch( np.reshape(batch[b,t,:], (1,1,84,84,3)), np.reshape(y[b,t,:], (1,1,num_actions)) )
    model.reset_states() # Don't carry internal state between batches
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps
#batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
#pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
    image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
    # stateful should be true here
    pred = model.predict( image, batch_size=1 )
    # take action based on pred
    print( pred )
The first thing you need is to understand your data.
Do these 5 dimensions mean anything?
I'll try to guess:
- 1 learning example
- 1 time step (this is added by TimeDistributed, normal 2D convolutions don't take this)
- 84 image side
- 84 another image side
- 3 channels (RGB)
The purpose of TimeDistributed is to add that extra timesteps dimension, so you can simulate a sequence in layers that are not supposed to work with sequences.
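Conceptually, this amounts to folding the time axis into the batch axis, applying the per-image operation, and unfolding again. The NumPy sketch below illustrates that idea only; `time_distributed` and `flatten_images` are hypothetical helpers written for this example, not the Keras implementation, and a batch of 2 is used just to keep the array small.

```python
import numpy as np

def time_distributed(fn, x):
    """Apply fn, which expects (batch, ...), to every time step of x shaped (batch, time, ...)."""
    batch, time = x.shape[:2]
    merged = x.reshape((batch * time,) + x.shape[2:])   # fold time into batch
    out = fn(merged)
    return out.reshape((batch, time) + out.shape[1:])   # unfold it again

def flatten_images(images):
    """Per-image operation: (n, height, width, channels) -> (n, features)."""
    return images.reshape(images.shape[0], -1)

frames = np.zeros((2, 10, 84, 84, 3))  # (batch, time steps, height, width, channels)
features = time_distributed(flatten_images, frames)

print(features.shape)  # batch and time axes are preserved, the image axes are flattened
```

The wrapped operation never sees the time axis, which is why an ordinary 2D convolution can be reused on a sequence of images.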
Your error message is telling you this:
Your input_shape parameter is (None, 10, 84, 84, 3), where None is the batch size (number of samples/examples).
Your input data, which is image in your code, is (1, 1, 84, 84, 3).
There is a mismatch: you are supposed to use batches containing 10 time steps (as defined by your input_shape). It's OK for the stateful=False model to pack 10 images into a batch and train with that.
But later, in the stateful=True case, you will need that input_shape to be just one step. (You can either create a new model just for predicting and copy all weights from the training model to the predicting model, or try to use None in the time steps dimension, meaning you can train and predict with different numbers of time steps.)
Now, unlike the convolutional layers, the LSTM layer already expects time steps. So you should find a way to squeeze your data into fewer dimensions.
The LSTM will expect (None, timeSteps, features). The time steps are the same as before: 10 for training, 1 for predicting, and you could try to go with None there.
So, instead of a Flatten() inside a TimeDistributed, you should simply reshape the data, condensing the dimensions that are not batch size or steps:
model.add(Reshape((10,9*9*32))) # 10 time steps; the batch size doesn't participate in this definition, and it will remain as it is.
The 9*9*32 comes from the sides of the preceding convolutional layer and its 32 filters. (I'm just not sure the sides are 9, maybe they're 8; you can check in the current model.summary().)
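The side length can be checked by hand with the usual output-size formula for a convolution without padding. The sketch below assumes the three layers from the question (3x3 kernels, stride 2, Keras' default 'valid' padding, 84x84 input); `conv_output_size` is a helper written for this illustration.

```python
def conv_output_size(size, kernel, stride):
    """Output length of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1

side = 84
for _ in range(3):  # three Convolution2D layers, each 3x3 with stride 2
    side = conv_output_size(side, kernel=3, stride=2)

features_per_step = side * side * 32  # 32 filters in the last convolution
print(side, features_per_step)
```

Under those assumptions the sides go 84, 41, 20, 9, so each time step carries 9 * 9 * 32 = 2592 features into the LSTM.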
Finally, for the stateful=True case, you will have to define the model with batch_shape instead of input_shape. The amount of samples in a batch must be a fixed number, because the model will assume the samples in the second batch are new steps belonging to the samples in the previous batch. (The number of samples will then need to be the same for all batches).
I have a single training batch of 600 sequential points (x(t), y(t)) with x(t) being a 25 dimensional vector and y(t) being my target (1 dim). I would like to train an LSTM to predict how the series would continue given a few additional x(t) [t> 600]. I tried the following model:
model = Sequential()
model.add(LSTM(128, input_shape = (600,25), batch_size = 1, activation= 'tanh', return_sequences = True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=20, verbose=2)
prediction = model.predict(testX, batch_size = 1)
Fitting works fine, but I keep getting the following error at the prediction step:
Error when checking : expected lstm_46_input to have shape (1, 600, 25) but got array with shape (1, 10, 25)
What am I missing?
Here are my shapes:
trainX.shape = (1,600,25)
trainY.shape = (1,600,1)
testX.shape = (1,10,25)
According to Keras documentation input of LSTM (or any RNN) layers should be of shape (batch_size, timesteps, input_dim) where your input shape is
trainX.shape = (1,600,25)
This means that for training you are passing only one sample with 600 timesteps and 25 features per timestep. But I have a feeling that you actually have 600 training samples, each with 25 timesteps and 1 feature per timestep. In that case your input (trainX) should have shape 600 x 25 x 1 and your training target (trainY) should have shape 600 x 1. If my assumption is right, your test data should have shape 10 x 25 x 1. The first LSTM layer should then be written as
model.add(LSTM(128, input_shape = (25,1), batch_size = 1, activation= 'tanh', return_sequences = False))
If your training data is in fact (1,600,25), this means you are unrolling the LSTM feedback 600 times: the first input has an impact on the 600th input. If this is what you want, you can use the Keras function "pad_sequences" to append zeros to the test matrix so that it has the shape (1,600,25). The network should predict zeros for the padded steps, and you will need to add 590 zeros to your testY.
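The padding step can be sketched with np.pad instead of pad_sequences, purely to show the resulting shape. The `testX` array of ones is a hypothetical stand-in for the real test input, and padding after the real steps is an assumption made to match "append" (pad_sequences pads at the front unless told otherwise).

```python
import numpy as np

train_steps = 600
testX = np.ones((1, 10, 25))  # hypothetical stand-in for the real test input

# Pad only the time axis (axis 1), adding zeros after the 10 real steps.
missing = train_steps - testX.shape[1]
padded = np.pad(testX, ((0, 0), (0, missing), (0, 0)), mode="constant")

print(padded.shape)  # now matches the (1, 600, 25) shape the model was built with
```

Only the first 10 steps of the prediction would then correspond to real inputs.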
If you only want say 10 previous timesteps to affect your current Y prediction, then you will want to turn your trainX into shape (590,10,25). The input line will be something like:
model.add(LSTM(n_hid, stateful=True, return_sequences=False, batch_input_shape=(1,nTS,x_train.shape[2])))
The processing to get it in the form you want could be something like this:
def formatTS(XX, yy, window_length):
    x_train = np.zeros((XX.shape[0]-window_length,window_length,XX.shape[1]))
    for i in range(x_train.shape[0]):
        x_train[i] = XX[i:i+window_length,:]
    y_train = yy[window_length:]
    return x_train, y_train
Then your testing will work just fine since it is already in the shape (1,10,25).
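As a quick check of the windowing, the sketch below runs the same function on made-up data with the question's sizes (600 steps, 25 features, 1 target). The arrays `XX` and `yy` are hypothetical; with the question's data they would be trainX[0] and trainY[0].

```python
import numpy as np

def formatTS(XX, yy, window_length):
    """Slice a (time, features) series into overlapping windows of window_length steps."""
    n_windows = XX.shape[0] - window_length
    x_train = np.zeros((n_windows, window_length, XX.shape[1]))
    for i in range(n_windows):
        x_train[i] = XX[i:i + window_length, :]
    y_train = yy[window_length:]  # the target that follows each window
    return x_train, y_train

# Hypothetical data with the question's sizes.
XX = np.arange(600 * 25, dtype=float).reshape(600, 25)
yy = np.arange(600, dtype=float).reshape(600, 1)

x_train, y_train = formatTS(XX, yy, window_length=10)
print(x_train.shape, y_train.shape)
```

Each window of 10 steps is paired with the target at the step right after it, which gives 590 training samples from one 600-step series.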