I have created the following SimpleRNN using Keras:
X = X.reshape((X.shape[0], X.shape[1], 1))
tr_X, ts_X, tr_y, ts_y = train_test_split(X, y, train_size=.8)
batch_size = 1000
print('RNN model...')
model = Sequential()
model.add(SimpleRNN(64, activation='relu', batch_input_shape=(batch_size, X.shape[1], 1)))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
print('\n')
model.fit(tr_X, tr_y,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_X, ts_y))
For the model summary, I get the following:
Layer (type) Output Shape Param #
=================================================================
simple_rnn_1 (SimpleRNN) (1000, 64) 4224
_________________________________________________________________
dense_1 (Dense) (1000, 1) 65
=================================================================
Total params: 4,289
Trainable params: 4,289
Non-trainable params: 0
_________________________________________________________________
I have a dataset of 10,000 samples and 64 features, and my goal is to train a classification model on this dataset (the class labels are binary, 0 and 1). Now, I am trying to understand what is going on here. As seen in the 'Output Shape' column, simple_rnn_1 has (1000, 64). I interpret it as 1000 rows (which is the batch) and 64 features. Assuming the code above is logically correct, my questions are:
How does the RNN handle this matrix (i.e., (1000, 64))? Does it feed in each column, something like in this figure?
Should SimpleRNN() units always be equal to the number of features?
Thank you
In your code, you defined batch_input_shape with the shape (batch_size, X.shape[1], 1),
which means you will feed the RNN batch_size examples, where each example contains X.shape[1] timesteps (the number of pink boxes in your image) and each timestep has shape 1 (a scalar).
So yes, an input shape of (1000, 64, 1) works exactly as you said: each column is fed to the RNN, one timestep at a time.
No! units is your output dimension. Usually, more units means a more complex network (just as in a regular neural network) and hence more parameters to learn.
units is the size of the RNN's internal state, so in your example, if you declare units=2000, your output will be (1000, 2000).
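A minimal sketch to see this (same shapes as your model, with units deliberately changed to 2000):
from keras.models import Sequential
from keras.layers import SimpleRNN

model = Sequential()
model.add(SimpleRNN(2000, activation='relu', batch_input_shape=(1000, 64, 1)))
model.summary()  # output shape of the SimpleRNN layer: (1000, 2000)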
Related
In Keras, LSTM input is of shape [batch, timesteps, feature]. What if I indicate the input as keras.Input(shape=(20, 1)) and feed a matrix of shape (100, 20, 1) as input? What batch size is it considering in this case? Is the batch size 100, with 20 time steps in each batch?
TL;DR
The batch, timesteps, and features in your case are defined as None, 20, and 1, where the batch represents the batch_size parameter passed during model.fit. The model does not need to know this beforehand. Therefore, when you define your input layer (or your LSTM layer's input shape), you simply define (timesteps, features), which is (20, 1). A simple model.summary() shows that this input size is translated to (None, 20, 1) when the computation graph is created.
Deeper dive into the subject
A good way to understand what's going on is to simply print the summary of your model. Let me take a simple example here and walk you through the steps.
#Creating a simple stacked LSTM model
from tensorflow.keras import layers, Model
import numpy as np
inp = layers.Input((20,1)) #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 20, 1)] 0
lstm_14 (LSTM) (None, 20, 5) 140
lstm_15 (LSTM) (None, 4) 160
dense_8 (Dense) (None, 1) 5
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
As you see here, the flow of tensors (more specifically, how the shapes of tensors change as they flow down the network) is displayed. I am using the functional API, which allows me to explicitly create an input layer of shape (20, 1), which I then pass to the LSTM. Interestingly, you can see that the actual shape of this input layer is (None, 20, 1). This is the batch, timesteps, features that you are referring to.
The timesteps are 20 and there is a single feature, so that's easy to understand; the None, however, is a placeholder for the batch_size parameter that you define during model.fit.
#Fit model
X_train, y_train = np.random.random((100,20,1)), np.random.random((100,))
model.fit(X_train, y_train, batch_size=10, epochs=2)
Epoch 1/2
10/10 [==============================] - 1s 4ms/step - loss: 0.6938
Epoch 2/2
10/10 [==============================] - 0s 3ms/step - loss: 0.6932
In this example, I set the batch_size to 10. This means that when you train the model, each "step" passes a batch of shape (10, 20, 1) to the model, and there are 10 such steps per epoch, because the overall size of the training data is (100, 20, 1). This is indicated by the 10/10 you see in front of the progress bar for each epoch.
Another interesting thing to note is that you don't necessarily need to define the dimensions of the input, as long as you obey the basic rules of model training and batch-size constraints. Here is an example: I define the number of timesteps as None, which means I can now pass variable-length timesteps (variable-length sentences, for example) to encode using the LSTM layers.
from tensorflow.keras import layers, Model
import numpy as np
inp = layers.Input((None,1)) #<------
x = layers.LSTM(5, return_sequences=True)(inp)
x = layers.LSTM(4)(x)
out = layers.Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
model.compile(loss='binary_crossentropy')
model.summary()
Model: "model_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_12 (InputLayer) [(None, None, 1)] 0
lstm_18 (LSTM) (None, None, 5) 140
lstm_19 (LSTM) (None, 4) 160
dense_10 (Dense) (None, 1) 5
=================================================================
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
This means that the model doesn't need to know beforehand how many timesteps it will have to work with, just as it doesn't need to know beforehand what batch_size it will get. These things can be inferred during model.fit or passed as a parameter. Notice how model.summary() simply propagates this lack of information about the timesteps dimension to the subsequent layers.
An important note, though: LSTMs can work with variable-size inputs because all you have to do is pass the timesteps as None, as in the example above; however, you have to ensure that each batch, on its own, has the same number of timesteps. In other words, to work with variable-sized sentences, say [(20, 1), (25, 1), (20, 1), ...], either use a batch size of 1 so that each batch trivially has a consistent size, or create a generator that builds batches of equal batch_size from sentences of constant length, for example a first batch of five (20, 1) sentences, a second batch of five (25, 1) sentences, and so on (see the sketch below). The second method is faster than the first, but can be more painful to set up.
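A minimal sketch of that second approach (a hypothetical bucket_batches helper; it assumes sequences is a list of (timesteps, 1) arrays and targets a matching list of labels):
import numpy as np

def bucket_batches(sequences, targets, batch_size):
    # Group sequences by length so every batch has a consistent timestep count
    by_len = {}
    for seq, tgt in zip(sequences, targets):
        by_len.setdefault(len(seq), []).append((seq, tgt))
    for pairs in by_len.values():
        for i in range(0, len(pairs), batch_size):
            chunk = pairs[i:i + batch_size]
            X = np.stack([s for s, _ in chunk])  # (batch, timesteps, 1)
            y = np.array([t for _, t in chunk])
            yield X, y
You could then feed each (X, y) pair with model.train_on_batch, or wrap the generator for model.fit (tf.keras accepts Python generators).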
Bonus
Also, for anyone curious about the effect of batch_size on model training: a large batch_size can be very helpful for speeding up computation, and increasing it is sometimes preferred over decaying the learning rate, but it can cause what is known as a generalization gap. This topic is well explored in this awesome paper.
These two papers should give a lot of clarity on how to use batch_size as a powerful parameter for your model training, one that is quite often ignored.
I am trying to predict a time series with an LSTM, writing my code in Python using Keras.
I have 30 features as input (continuous values) and a binary output.
I would like to use the 20 previous timesteps (t-20, t-19, ..., t-1) of each input feature in order to predict the output at the next timestep (t+1).
My batch size is fixed at 52. What exactly does this mean?
I don't understand how to define the shape of the input layer.
The stacked LSTM example in the Keras documentation says that the last dimension of the 3D tensor will be data_dim.
Is that the input dimension or the output dimension?
If it is the output dimension, then I can't use more than one input feature, since in my case the input_shape would be (batch_size=52, time_step=20, data_dim=1).
Also, in case data_dim is the input dimension, I have tried to define a stacked model with four layers, and the model summary looks like this:
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         (52, 20, 30)              0
_________________________________________________________________
lstm_3 (LSTM)                (52, 20, 128)             81408
_________________________________________________________________
lstm_4 (LSTM)                (52, 128)                 131584
_________________________________________________________________
dense_2 (Dense)              (52, 1)                   129
=================================================================
Total params: 213,121
Trainable params: 213,121
Non-trainable params: 0
Does this architecture make sense? Am I making some obvious mistakes?
My snippet of code is the one below:
input_layer=Input(batch_shape=(batch_size,input_timesteps,input_dims))
lstm1=LSTM(num_neurons,activation = 'relu',dropout=0.0,stateful=False,return_sequences=True)(input_layer)
lstm2=LSTM(num_neurons,activation = 'relu',dropout=0.0,stateful=False,return_sequences=False)(lstm1)
output_layer=Dense(1, activation='sigmoid')(lstm2)
model=Model(inputs=input_layer,outputs=output_layer)
I am getting very poor results and thus trying to debug each step.
If you want to use deep learning techniques, you should first try to overfit, and then reduce the complexity until you reach a break-even point between model complexity, training error, and test error.
You are actually using a feature space in the hidden layer that is larger than the input; are you sure your data can fit this representation?
Do you have enough rows to let the model learn something this complex?
Otherwise, I would suggest something like the following, in order to extract the most important dimensions:
num_neurons1 = int(input_dims/2)
num_neurons2 = int(input_dims/4)
input_layer = Input(batch_shape=(batch_size, input_timesteps, input_dims))
lstm1 = LSTM(num_neurons1, activation='relu', dropout=0.0, stateful=False, return_sequences=True, kernel_initializer="he_normal")(input_layer)
lstm2 = LSTM(num_neurons2, activation='relu', dropout=0.0, stateful=False, return_sequences=False, kernel_initializer="he_normal")(lstm1)
output_layer = Dense(1, activation='sigmoid')(lstm2)
model = Model(inputs=input_layer, outputs=output_layer)
Also, you are using relu as the activation function.
Does it fit your data? It would be better to have only positive data after rescaling and normalization.
In case it does fit, you can also use a proper kernel initialization (he_normal, as above).
To better understand the problem, you could also post the optimizer parameters and the training behaviour across epochs.
Binary classification problem: I want to have one input layer (optional), one Conv1D layer, and then an output layer of one neuron predicting either 1 or 0.
Here is my model:
x_train = np.expand_dims(x_train,axis=1)
x_valid = np.expand_dims(x_valid,axis=1)
#x_train = x_train.reshape(x_train.shape[0], 1, x_train.shape[1])
#x_valid = x_train.reshape(x_valid.shape[0], 1, x_train.shape[1])
model = Sequential()
#hidden layer
model.add(Convolution1D(filters = 1, kernel_size = (3),input_shape=(1,x_train.shape[2])))
#output layer
model.add(Flatten())
model.add(Dense(1, activation = 'softmax'))
sgd = SGD(lr=0.01, nesterov=True, decay=1e-6, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print('model compiled successfully')
model.fit(x_train, y_train, nb_epoch = nb_epochs, validation_data=(x_valid,y_valid), batch_size=100)
Input shape: x_train.shape = (5, 1, 133906), which is (batch, steps, channels) respectively. The steps dimension was added through expand_dims. The actual size is (5, 133906), i.e. 5 samples of time-series data of length 133906, sampled randomly, sometimes at 2 ms and sometimes at 5 ms.
Error Message: ValueError: Negative dimension size caused by subtracting 3 from 1 for 'conv1d_1/convolution/Conv2D' (op: 'Conv2D') with input shapes: [?,1,1,133906], [1,3,133906,1].
How do I resolve this issue? What should the size of x_train and the input_size argument passed inside Conv1D be?
A Convolution1D layer takes input in the format [batch, steps, channels].
The length of the convolution window (kernel size) cannot be larger than the number of steps.
Therefore, if you want to keep your defined input shape of
x_train.shape = (5,1,133906)
you need to change the kernel size to 1, i.e. change the Convolution1D line to
model.add(Convolution1D(filters=1, kernel_size=1, input_shape=(1, x_train.shape[2])))
However, this will only make your example run. Depending on your goals, data type, etc., you might want to try different combinations of kernel size and input-data dimensions to obtain the best results.
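For instance, if the intent is for the convolution window to slide along the 133906 time steps (an assumption about your data, not something stated in the question), a sketch of the alternative layout would keep time as the steps axis with a single channel:
# Assumed alternative: shape (5, 133906, 1) = (batch, steps, channels)
x_train = x_train.reshape(x_train.shape[0], -1, 1)
x_valid = x_valid.reshape(x_valid.shape[0], -1, 1)
model.add(Convolution1D(filters=1, kernel_size=3, input_shape=(x_train.shape[1], 1)))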
I am trying to introduce sparsity in the training samples. My data matrix has a size of (say) NxP, and I want to pass it through a Keras layer whose weights have the same size as the input; that is, the trainable weight matrix W has a shape of NxP. I want to do a Hadamard product (element-wise multiplication) of the input matrix with this layer: W multiplied element-wise with the input. How do I get a trainable layer for W in this case?
EDIT:
By the way, thank you so much for the quick reply. However, the Hadamard product I want is between two matrices: one is the input, let's call it X, and my X has a shape of NxP. I want the kernel in the Hadamard layer to be the same size as X, so the kernel should have a shape of NxP too; the element-wise multiplication of the two matrices is achieved by the call function.
But the current implementation gives the kernel a size of P only. Also, I tried changing the shape of the kernel in build as follows:
self.kernel = self.add_weight(name='kernel',
                              shape=input_shape,
                              initializer='uniform',
                              trainable=True)
But it gives me the error below:
TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 16). Consider casting elements to a supported type.
Here P is 16, and I will get N at runtime; N corresponds to the number of training samples.
Thank you in advance for the help.
Take the example from the documentation on writing your own Keras layer, and in the call function just define the output to be x * self.kernel.
This is my POC:
from keras import backend as K
from keras.engine.topology import Layer
from keras.models import Sequential
from keras.layers import Dense, Activation
import numpy as np
np.random.seed(7)
class Hadamard(Layer):
    def __init__(self, **kwargs):
        super(Hadamard, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(1,) + input_shape[1:],
                                      initializer='uniform',
                                      trainable=True)
        super(Hadamard, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        print(x.shape, self.kernel.shape)
        return x * self.kernel

    def compute_output_shape(self, input_shape):
        print(input_shape)
        return input_shape
N = 10
P = 64
model = Sequential()
model.add(Dense(128, input_shape=(N, P), activation='relu'))
model.add(Dense(64))
model.add(Hadamard())
model.add(Activation('relu'))
model.add(Dense(32))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
model.fit(np.ones((10, N, P)), np.ones((10, N, 1)))
print(model.predict(np.ones((20, N, P))))
If you need to use it as the first layer you should include the input shape parameter:
N = 10
P = 64
model = Sequential()
model.add(Hadamard(input_shape=(N, P)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
This results in:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
hadamard_1 (Hadamard) (None, 10, 64) 640
=================================================================
Total params: 640
Trainable params: 640
Non-trainable params: 0
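As a quick sanity check, build creates the kernel with shape (1,) + input_shape[1:], i.e. (1, 10, 64) here; the leading 1 simply broadcasts over the batch, which is why the summary reports 10 * 64 = 640 trainable parameters:
w = model.layers[0].get_weights()[0]
print(w.shape)  # (1, 10, 64): one trainable weight per element of the N x P input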
I have a single training batch of 600 sequential points (x(t), y(t)) with x(t) being a 25 dimensional vector and y(t) being my target (1 dim). I would like to train an LSTM to predict how the series would continue given a few additional x(t) [t> 600]. I tried the following model:
model = Sequential()
model.add(LSTM(128, input_shape = (600,25), batch_size = 1, activation= 'tanh', return_sequences = True))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=20, verbose=2)
prediction = model.predict(testX, batch_size = 1)
Fitting works fine, but I keep getting the following error at the prediction step:
Error when checking : expected lstm_46_input to have shape (1, 600, 25) but got array with shape (1, 10, 25)
What am I missing?
Here are my shapes:
trainX.shape = (1,600,25)
trainY.shape = (1,600,1)
testX.shape = (1,10,25)
According to the Keras documentation, the input of LSTM (or any RNN) layers should be of shape (batch_size, timesteps, input_dim), while your input shape is
trainX.shape = (1,600,25)
So for training you are passing only one sample, with 600 timesteps and 25 features per timestep. But I have a feeling that you actually have 600 training samples, each having 25 timesteps and 1 feature per timestep. In that case your input (trainX) should be of shape 600 x 25 x 1, and your training target (trainY) should be 600 x 1. If my assumption is right, then your test data should be of shape 10 x 25 x 1, and the first LSTM layer should be written as
model.add(LSTM(128, input_shape = (25,1), batch_size = 1, activation= 'tanh', return_sequences = False))
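If that guess is right, a minimal reshaping sketch (under that assumption, reinterpreting the same data) would be:
trainX = trainX.reshape(600, 25, 1)  # 600 samples, 25 timesteps, 1 feature
trainY = trainY.reshape(600, 1)
testX = testX.reshape(10, 25, 1)     # likewise for the (1, 10, 25) test block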
If your training data is in fact (1, 600, 25), it means you are unrolling the LSTM feedback 600 times: the first input has an impact on the 600th input. If this is what you want, you can use the Keras function pad_sequences to append zeros to the test matrix so that it has the shape (1, 600, 25); the network should predict zeros for the padded region, and you will need to append 590 zeros to your testY. A sketch of that padding step follows.
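A sketch of that padding step (assuming the zeros should be appended at the end, i.e. padding='post'):
from keras.preprocessing.sequence import pad_sequences

# testX has shape (1, 10, 25); pad each sequence out to 600 timesteps with zeros
testX_padded = pad_sequences(testX, maxlen=600, padding='post', dtype='float32')
print(testX_padded.shape)  # (1, 600, 25)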
If instead you only want, say, the 10 previous timesteps to affect your current Y prediction, then you will want to turn your trainX into shape (590, 10, 25). The input line will be something like:
model.add(LSTM(n_hid, stateful=True, return_sequences=False, batch_input_shape=(1,nTS,x_train.shape[2])))
The processing to get it in the form you want could be something like this:
def formatTS(XX, yy, window_length):
    x_train = np.zeros((XX.shape[0] - window_length, window_length, XX.shape[1]))
    for i in range(x_train.shape[0]):
        x_train[i] = XX[i:i + window_length, :]
    y_train = yy[window_length:]
    return x_train, y_train
Then your testing will work just fine since it is already in the shape (1,10,25).
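For example, with the shapes from this question and a window of 10 (matching the test data):
XX = trainX.reshape(600, 25)  # the raw series of 600 25-dimensional x(t) vectors
yy = trainY.reshape(600)
x_train, y_train = formatTS(XX, yy, window_length=10)
print(x_train.shape, y_train.shape)  # (590, 10, 25) (590,)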