Confusion about Keras RNN input shape requirement

I have read plenty of posts on this point. They are inconsistent with each other, and every answer seems to have a different explanation, so I thought I would ask based on my analysis of all of them.
As the Keras RNN documentation states, the input shape is always of the form (batch_size, timesteps, input_dim). I am a bit confused about that, but my guess (I am not sure) is that input_dim is always 1, while timesteps depends on your problem (it could be the data dimension as well). Is that roughly correct?
The reason for this question is that I always get an error when I try to set input_dim to my dataset's dimension (as the name input_dim suggests!), so I made the assumption that input_dim represents the shape of the input vector fed to the LSTM at each time step. Am I wrong again?
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import LSTM, Dense

# C: input array, r: binary labels (defined elsewhere)
C = C.reshape((C.shape[0], C.shape[1], 1))
tr_C, ts_C, tr_r, ts_r = train_test_split(C, r, train_size=.8)
batch_size = 1000
print('Build model...')
model = Sequential()
model.add(LSTM(8, batch_input_shape=(batch_size, C.shape[1], 1), stateful=True, activation='relu'))
model.add(Dense(1, activation='relu'))
print('Training...')
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(tr_C, tr_r,
          batch_size=batch_size, epochs=1,
          shuffle=True, validation_data=(ts_C, ts_r))
Thanks!

Indeed, input_dim is the shape of the input vector at a single time step. In other words, input_dim is the number of input features.
It's not necessarily 1, though. If you're working with more than one variable, it can be any number.
Suppose you have 10 sequences, each sequence has 200 time steps, and you're measuring just a temperature. Then you have one feature:
input_shape = (200,1) -- notice that the batch size (number of sequences) is ignored here
batch_input_shape = (10,200,1) -- only in specific cases, such as stateful=True, will you need a batch input shape.
Now suppose you're measuring not only temperature, but also pressure and volume. Now you've got three input features:
input_shape = (200,3)
batch_input_shape = (10,200,3)
In other words, the first dimension is the number of different sequences. The second is the length of the sequence (how many measures along time). And the last is how many vars at each time.
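
For instance, a minimal runnable sketch of the three-feature case (random data stands in for the real measurements; the layer sizes are arbitrary):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# 10 sequences, 200 time steps each, 3 features (temperature, pressure, volume)
data = np.random.rand(10, 200, 3)
targets = np.random.rand(10, 1)

model = Sequential()
model.add(LSTM(32, input_shape=(200, 3)))  # batch size is not part of input_shape
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
model.fit(data, targets, batch_size=10, epochs=1)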

Related

dimension of the input layer for embeddings in Keras

It is not clear to me whether there is any difference between specifying the input dimension, Input(shape=(20,)), or leaving it unspecified, Input(shape=(None,)), in the following example:
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

input_layer = Input(shape=(None,))
emb = Embedding(86, 300)(input_layer)
lstm = Bidirectional(LSTM(300))(emb)
output_layer = Dense(10, activation="softmax")(lstm)
model = Model(input_layer, output_layer)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc"])
history = model.fit(my_x, my_y, epochs=1, batch_size=632, validation_split=0.1)
my_x (shape: 2000, 20) contains integers referring to characters, while my_y contains the one-hot encoding of some labels. With Input(shape=(None,)), I see that I could use model.predict(my_x[:, 0:10]), i.e., I could give only 10 characters as an input instead of 20: how is that possible? I was assuming that all the 20 dimensions in my_x were needed to predict the corresponding y.
What you say with Input(shape=(20,)) is that the sequences you feed into the model have a strict length of 20; with Input(shape=(None,)), the length is left unspecified. While many models need a fixed input length, recurrent neural networks (such as the LSTM you use here) do not need a fixed sequence length. So the LSTM does not care whether your sequence contains 20 or 100 time steps, as it simply loops over them. However, when you specify the number of time steps as 20, the LSTM expects 20 and will raise an error if it does not get them.
For more information, see this post by Tim.
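
As a quick illustration (a sketch assuming the model above has already been trained on my_x and my_y), both of these calls succeed precisely because the input length was left as None:
# With Input(shape=(None,)), the model accepts sequences of any length,
# so both of these predictions run without error.
preds_full = model.predict(my_x[:, 0:20])   # full 20-character sequences
preds_short = model.predict(my_x[:, 0:10])  # only the first 10 characters also works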

CNN alternates between good performance and chance

I have a binary classification problem I am trying to solve with a CNN written in Keras. The inputs are very sparse 200x125x2 tensors (they can be thought of as two images stacked together), and their nonzero elements are only ones (representing neuron spike trains). The input is generated using a data generator that I have built, so the model is trained using the fit_generator function.
I have tried various architectures, and some show decent performance (~88%), but the thing is that sometimes when I train new models, they don't seem to work at all, giving a chance-level (50%) result every epoch. The weird thing is that this sometimes happens to the same architectures that worked well before. I am running the code on Google Colab (GPU) with TensorFlow 2.0. I have checked multiple times that I haven't changed anything in the code. I know that random initialization of the weights and biases may cause slight changes in performance, but this looks like something else.
Any ideas will be very helpful. Thanks!
Here is the relevant code for one of the models that had this problem (I am using unusual kernels, I know):
import time
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SeparableConv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

# General settings (parameters() and data_generator() are defined elsewhere)
x_max = 10
x_size, t_size, N_features = parameters(x_max)
batch_size = 64
N_epochs = 10
N_final = 10*N_features
N_final = int(N_final - N_final%(batch_size))
N_val = 100*batch_size
N_test = N_final/5

# Setting up the architecture of the network and compiling
model = Sequential()
model.add(SeparableConv2D(50, (50,30), data_format='channels_first', input_shape=(2, x_size, t_size)))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(SeparableConv2D(100, (10,6), data_format='channels_first'))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fitting the model on generated data
filepath = "......hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
start = time.time()
fit_history = model.fit_generator(generator=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  steps_per_epoch=N_final//batch_size,
                                  validation_data=data_generator(batch_size, x_max, 'delta', '_', 100),
                                  validation_steps=N_val//batch_size,
                                  callbacks=[checkpoint],
                                  epochs=N_epochs)
end = time.time()
The most suspicious thing I see is a 'relu' near the end of the model. Depending on the initialization and on the learning rate, ReLUs can be unlucky and fall into an all-zeros case. When this happens, they completely stop gradients and don't train anymore.
By the looks of your problem (sometimes it works, sometimes it doesn't), it seems very plausible that it's the relu.
So, the first suggestion (this always solves it) is to add a batch normalization before the activation:
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
Hint: if you are going to use it with the 4D tensors before the Flatten, remember to specify the channels dimension: BatchNormalization(axis=1).
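
For example, a sketch of what that might look like in the convolutional part of the model above (my illustration, not the original answer's code; BatchNormalization and Activation come from the same keras layers module):
model.add(SeparableConv2D(50, (50, 30), data_format='channels_first', input_shape=(2, x_size, t_size)))
model.add(BatchNormalization(axis=1))  # axis=1 because the data format is channels_first
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2, data_format='channels_first'))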

2-dimensional LSTM in Keras

I am new to Keras and LSTMs -- I want to train a model on 2-dimensional sequences (i.e., movement in a grid space), as opposed to 1-dimensional sequences (like characters of text).
As a test, I first tried just one dimension, and I am doing it successfully with the following setup:
model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=X[0].shape, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(512, return_sequences=False, dropout=0.2))
model.add(Dense(len(y[0]), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
model.fit(X, y, epochs=50)
I'm formatting the data like this:
data = ...  # list of integers (1D)
inputs = []
outputs = []
for i in range(len(data) - SEQUENCE_LENGTH):
    inputs.append(data[i:i + SEQUENCE_LENGTH])
    outputs.append(data[i + SEQUENCE_LENGTH])
X = np.array([to_categorical(np.array(input), CATEGORY_LENGTH) for input in inputs])
y = to_categorical(np.array(outputs), CATEGORY_LENGTH)
This is straightforward and converges quickly.
But if instead of a list of integers, my data consists of 2D tuples, I can no longer create categorical (one-hot) arrays to pass to the LSTM layers.
I've tried not using categorical arrays and simply passing the tuples to the model. In this case, I've changed my output layer to:
model.add(Dense(1, activation="linear"))
But that does not converge, or at least moves incredibly slowly.
How can I adapt this code to handle input with additional dimensions?
This previous answer should apply to your question as well. The only difference is that you will have to convert your tuples to a data frame beforehand.
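
Alternatively, following the input_shape = (timesteps, features) convention from the first answer above, a minimal sketch (toy data and my own reading of the approach, not the original answer's code) treats the two coordinates as two features per time step and regresses the next pair directly:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

SEQUENCE_LENGTH = 10
# Toy stand-in for the real data: a walk on a grid as (x, y) tuples
data = [tuple(p) for p in np.random.randint(0, 5, size=(200, 2))]

inputs, outputs = [], []
for i in range(len(data) - SEQUENCE_LENGTH):
    inputs.append(data[i:i + SEQUENCE_LENGTH])
    outputs.append(data[i + SEQUENCE_LENGTH])

X = np.array(inputs, dtype=float)   # shape: (samples, SEQUENCE_LENGTH, 2)
y = np.array(outputs, dtype=float)  # shape: (samples, 2)

model = Sequential()
model.add(LSTM(512, input_shape=(SEQUENCE_LENGTH, 2)))
model.add(Dense(2, activation="linear"))  # predict the next (x, y) pair
model.compile(loss="mse", optimizer="rmsprop")
model.fit(X, y, epochs=5)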

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

# v and dictionary are defined elsewhere
word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index, :] = word_vectors[word]

model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, epochs=20)
Here the word_vectors are taken from spacy.vectors and have length 300. The input is an np array that looks like [0,0,12,15,0...] of dimension 186, where the integers are the token ids of the input, and I've constructed the embedding weight matrix accordingly. The output is [0,0,1,0,...0] of length 26 for each training sample, indicating the label that should go with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy continually decreases... and by the end of the first epoch, and for the rest of training, it's exactly 0, and I'm not sure why this is happening. I've trained plenty of models with keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot? Meaning, is only one element of the label vector one and the rest zero?
If so, then maybe try using a softmax activation with a categorical crossentropy loss like in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will help constrain the network to output probability distributions in the last layer (i.e., the softmax layer's outputs sum up to 1).
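
Concretely, the change to the model above would look something like this sketch (Y_train stays one-hot encoded):
model.add(Dense(v.num_labels, activation='softmax'))  # softmax instead of sigmoid
model.compile(loss='categorical_crossentropy',        # instead of binary_crossentropy
              optimizer='adam',
              metrics=['accuracy'])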

Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with keras to forecast groups of time series, and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (the intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# The TimeDistributed wrapper allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, epochs=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected
timedistributed_1 to have shape (None, 4, 24) but got array with shape
(100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), i.e. (100, 3, 24), but the network produces an output of shape (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes, you may need padding or a network node that removes that time step.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, which is not possible. That dimension is what Keras considers the time dimension, and it is preserved.
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue was that output_dim is never used when creating the model; it only appears in the target data, trainY. While that is only a code smell (there might not be anything wrong with it), it's something worth checking.
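
If you do need to go from 4 input series to 3 output series while keeping the look-ahead axis, one possible sketch (my suggestion, not part of the original answer, assuming a Keras version where Dense acts on the last axis of a 3D tensor) is to permute the axes, project the series dimension down with a Dense layer, and permute back:
from keras.layers import Permute

model = Sequential()
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
model.add(LSTM(10, return_sequences=True))
model.add(TimeDistributed(Dense(look_ahead)))  # (batch, input_dim, look_ahead)
model.add(Permute((2, 1)))                     # (batch, look_ahead, input_dim)
model.add(Dense(output_dim))                   # (batch, look_ahead, output_dim)
model.add(Permute((2, 1)))                     # (batch, output_dim, look_ahead)
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=epoch_number, batch_size=batch_size, verbose=1)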
