Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with Keras to forecast groups of time series, and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (the intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected
timedistributed_1 to have shape (None, 4, 24) but got array with shape
(100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?

The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), so (100, 3, 24) but the network is producing an output shape of (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes you may need padding or a network node that removes said timestep.

I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, which is not possible. This dimension is what it considers to be the time dimension, and it is preserved. From the documentation:
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue is that output_dim is never used when creating the model; it only appears in the target data (trainY). It's only a code smell (the observation alone does not prove anything is wrong), but it's worth checking.
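One way to follow the "add another layer at the end" suggestion is sketched below. This is only an assumption about what could work, not the one correct fix: the last LSTM returns a single vector, a Dense layer produces output_dim * look_ahead values, and a Reshape layer arranges them as (output_dim, look_ahead) so they match trainY. The helper name build_model is hypothetical, and the Keras imports sit inside it so that the shape check at the bottom runs with NumPy alone.

```python
import numpy as np

input_dim, output_dim = 4, 3
look_back, look_ahead = 24, 24
n = 100

trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)


def build_model():
    # Imported lazily so the shape check below runs without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense, LSTM, Reshape

    model = Sequential()
    model.add(LSTM(10, input_shape=(input_dim, look_back), return_sequences=True))
    # The last LSTM drops the sequence axis: output is (batch, 10).
    model.add(LSTM(10))
    # One value per (series, step) pair of the target: (batch, 72).
    model.add(Dense(output_dim * look_ahead))
    # Rearranged to match trainY: (batch, 3, 24).
    model.add(Reshape((output_dim, look_ahead)))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model


# NumPy stand-in for what Reshape does to the Dense output.
dense_out = np.zeros((n, output_dim * look_ahead))
reshaped = dense_out.reshape(n, output_dim, look_ahead)
print(reshaped.shape == trainY.shape)  # True
```

With Keras installed, build_model() returns a model that can be fitted on trainX and trainY directly, because its output shape no longer depends on input_dim.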


How to prepare data for a many to one binary classification LSTM?

I have a time series data set for 38,000 distinct patients that comprises 48 hours of physiological data with 30 features, so every patient has 48 rows (one per hour) and a binary outcome (0/1) at the end of the 48th hour only. The total training set has 38,000 * 48 = 1,824,000 rows.
To my understanding this is a many-to-one LSTM binary classification, so should my input shape be (38000, 48, 30), i.e. (sample_size, time_steps, features), and should return_sequences be set to False so that only the output of the last time step is returned?
Can somebody review my understanding on this?
Yes, you are mostly on the right track. Refer to the code below for a better understanding of this.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Bidirectional

# number of features per time step
total_features = 30
no_of_patients = 38000
time_steps = 48

model = Sequential()
# You can also use a Bidirectional layer to speed up the learning; if another
# recurrent layer follows it, keep return_sequences=True there:
# model.add(Bidirectional(LSTM(64, return_sequences=True),
#                         input_shape=(time_steps, total_features)))

# return_sequences should be False if there is only one LSTM layer. With several
# recurrent layers, only the last one should have return_sequences=False.
# input_shape excludes the sample axis, so it is (time_steps, total_features).
# The unit count (64) is a placeholder; the original value is not shown.
model.add(LSTM(64, input_shape=(time_steps, total_features),
               return_sequences=False))
# two softmax outputs, so the 0/1 labels must be one-hot encoded
model.add(Dense(2, activation='softmax'))
Let me know if you have any confusion in the above code or if you need more explanation
Yes, you're mostly right:
shape of inputs = (patients, 48, 30)
shape of targets = (patients, 1)
You should use return_sequences=False in your last LSTM layer. (If you have more recurrent layers before the last LSTM, keep return_sequences=True in them)
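To make the shapes concrete, here is a minimal sketch of the data preparation, assuming the flat table is sorted by patient and then by hour so that each patient occupies 48 consecutive rows. The array names and the tiny patient count are hypothetical stand-ins for the real data.

```python
import numpy as np

n_patients, time_steps, n_features = 5, 48, 30  # 38,000 patients in the real data

# Hypothetical flat table: 48 consecutive rows per patient, sorted by patient then hour.
flat = np.arange(n_patients * time_steps * n_features, dtype='float32')
flat = flat.reshape(n_patients * time_steps, n_features)

# One label per patient, recorded at the end of hour 48.
labels = np.array([0, 1, 0, 0, 1])

X = flat.reshape(n_patients, time_steps, n_features)  # (samples, time_steps, features)
y = labels.reshape(n_patients, 1)                     # (samples, 1)

# The first patient's block must be the first 48 rows of the flat table.
print(np.array_equal(X[0], flat[:time_steps]))  # True
print(X.shape, y.shape)                         # (5, 48, 30) (5, 1)
```

If the rows are not already grouped per patient, they would need to be sorted first; a plain reshape only works because consecutive rows belong to the same patient.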

Modify ResNet50 output layer for regression

I am trying to create a ResNet50 model for a regression problem, with an output value ranging from -1 to 1.
I omitted the classes argument, and in my preprocessing step I resize my images to 224,224,3.
I try to create the model with
def create_resnet(load_pretrained=False):
    if load_pretrained:
        weights = 'imagenet'
    else:
        weights = None
    # Get base model
    base_model = ResNet50(weights=weights)
    optimizer = Adam(lr=1e-3)
    base_model.compile(loss='mse', optimizer=optimizer)
    return base_model
and then create the model, print the summary and use the fit_generator to train
history = model.fit_generator(batch_generator(X_train, y_train, 100, 1),
                              validation_data=batch_generator(X_valid, y_valid, 100, 0),
                              shuffle=1)
I get an error though that says
ValueError: Error when checking target: expected fc1000 to have shape (1000,) but got array with shape (1,)
Looking at the model summary, this makes sense, since the final Dense layer has an output shape of (None, 1000)
fc1000 (Dense) (None, 1000) 2049000 avg_pool[0][0]
But I can't figure out how to modify the model. I've read through the Keras documentation and looked at several examples, but pretty much everything I see is for a classification model.
How can I modify the model so it is formatted properly for regression?
Your code is throwing the error because you're using the original fully-connected top layer that was trained to classify images into one of 1000 classes. To make the network work, you need to replace this top layer with your own, which should have a shape compatible with your dataset and task.
Here is a small snippet I was using to create an ImageNet pre-trained model for the regression task (face landmarks prediction) with Keras:
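from keras.applications import InceptionResNetV2
from keras.layers import Dense, Flatten, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.models import Model

def create_model(input_shape, top='flatten'):
    if top not in ('flatten', 'avg', 'max'):
        raise ValueError('unexpected top layer type: %s' % top)
    # connects base model with new "head"
    BottleneckLayer = {
        'flatten': Flatten(),
        'avg': GlobalAveragePooling2D(),
        'max': GlobalMaxPooling2D()
    }[top]
    # pre-trained ImageNet weights, without the original classification head
    base = InceptionResNetV2(input_shape=input_shape,
                             include_top=False,
                             weights='imagenet')
    x = BottleneckLayer(base.output)
    # NUM_OF_LANDMARKS is a constant defined elsewhere in my code
    x = Dense(NUM_OF_LANDMARKS, activation='linear')(x)
    model = Model(inputs=base.inputs, outputs=x)
    return model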
In your case, I guess you only need to replace InceptionResNetV2 with ResNet50. Essentially, you are creating a pre-trained model without top layers:
base = ResNet50(input_shape=input_shape, include_top=False)
And then attaching your custom layer on top of it:
x = Flatten()(base.output)
x = Dense(NUM_OF_LANDMARKS, activation='sigmoid')(x)
model = Model(inputs=base.inputs, outputs=x)
That's it.
You also can check this link from the Keras repository that shows how ResNet50 is constructed internally. I believe it will give you some insights about the functional API and layers replacement.
Also, I would say that both regression and classification tasks are not that different if we're talking about fine-tuning pre-trained ImageNet models. The type of task mostly depends on your loss function and the top layer's activation function. Otherwise, you still have a fully-connected layer with N outputs but they are interpreted in a different way.
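For the original question (one output value between -1 and 1), a head along the following lines might be a reasonable starting point. It is a sketch under assumptions rather than a tested recipe: the function name create_resnet_regressor, the global average pooling and the tanh activation are my choices, not something taken from the question. A sigmoid output is bounded to (0, 1), so it cannot reach the negative half of the target range, while tanh (or a linear output) can.

```python
import numpy as np


def create_resnet_regressor(input_shape=(224, 224, 3), load_pretrained=False):
    # Imported lazily so the activation check below runs without Keras installed.
    from keras.applications import ResNet50
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model

    weights = 'imagenet' if load_pretrained else None
    # include_top=False drops the 1000-class fc1000 layer.
    base = ResNet50(weights=weights, include_top=False, input_shape=input_shape)
    x = GlobalAveragePooling2D()(base.output)
    # Single output; tanh keeps predictions inside (-1, 1).
    out = Dense(1, activation='tanh')(x)
    model = Model(inputs=base.input, outputs=out)
    model.compile(loss='mse', optimizer='adam')
    return model


# Why the activation matters for targets in [-1, 1]:
z = np.array([-4.0, 0.0, 4.0])
sigmoid = 1.0 / (1.0 + np.exp(-z))
print(sigmoid.min() > 0.0)     # True: sigmoid can never output a negative value
print(np.tanh(z).min() < 0.0)  # True: tanh covers the negative half of the range
```

The resulting model expects targets of shape (batch, 1), which matches the (1,) target shape reported in the error message.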

how to build LSTM RNN network for binary classification?

I am trying to build a deep learning network for binary classification using LSTM based RNN.
Here is what I have tried using python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
import numpy as np
train = np.loadtxt("TrainDatasetFinal.txt", delimiter=",")
test = np.loadtxt("testDatasetFinal.txt", delimiter=",")
y_train = train[:,7]
y_test = test[:,7]
train_spec = train[:,6]
test_spec = test[:,6]
model = Sequential()
model.add(Embedding(8, 256, input_length=1))
model.add(LSTM(output_dim=128, activation='sigmoid',
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(train_spec, y_train, batch_size=2000, nb_epoch=11)
score = model.evaluate(test_spec, y_test, batch_size=2000)
Here is a sample from the dataset
(Patient Number, time in millisecond, accelerometer x-axis,y-axis,
z-axis,magnitude, spectrogram,label (0 or 1))
I believe my problem is in these lines, but I cannot identify the error:
model.add(Embedding(8, 256, input_length=1))
model.add(LSTM(output_dim=128, activation='sigmoid',
and this is the error I get:
InvalidArgumentError (see above for traceback): indices[0,0] = -2147483648 is not in [0, 8)
Is the sample from your dataset shown above the data you are feeding into the model? If so, there is a problem: your data is 2-dimensional, but an RNN needs a 3-dimensional input tensor with a batch (sample) dimension, a time dimension and a feature dimension. It looks like you are missing a proper time dimension. You should not have a column with 15, 31, 46, ... (time in milliseconds); time should be shaped into its own dimension, so that your input data looks like a "cube". Otherwise, you don't need a temporal model at all.
Furthermore, you should standardize your input, since your features have vastly different orders of magnitude.
Moreover, the batch size of 2000 is almost certainly too large. Are you trying to express that your whole training set has 2000 samples? In that case, you may not have enough training data for the model you are building.
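A minimal sketch of both steps (standardizing, then cutting the 2D table into a 3D "cube") could look like this. The data is random and the helper make_windows is hypothetical; with real data the windows would have to be built per patient so that no window spans two patients, and the mean and standard deviation would be computed on the training set only.

```python
import numpy as np


def make_windows(features, labels, window):
    """Slice a (rows, n_features) array into overlapping windows.

    Returns X with shape (rows - window + 1, window, n_features) and
    y holding the label of the last row of each window.
    """
    n_windows = features.shape[0] - window + 1
    X = np.stack([features[i:i + window] for i in range(n_windows)])
    y = labels[window - 1:]
    return X, y


rng = np.random.default_rng(0)
# Hypothetical sensor columns with very different orders of magnitude.
raw = rng.normal(size=(50, 5)) * np.array([1, 1, 1, 10, 1000])
labels = rng.integers(0, 2, size=50)

# Standardize each column to zero mean and unit variance.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

X, y = make_windows(scaled, labels, window=10)
print(X.shape, y.shape)  # (41, 10, 5) (41,)
```

X now has the (samples, time steps, features) layout that an LSTM layer expects, and the time column itself is no longer needed as a feature.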

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index,:] = word_vectors[word]
model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
here the word_vectors are stripped from spacy.vectors and have length 300, the input is an np_array which looks like [0,0,12,15,0...] of dimension 186, where the integers are the token ids in the input, and I've constructed the embedded weight matrix accordingly. The output layer is [0,0,1,0,...0] of length 26 for each training sample, indicating the label that should go with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy is continually decreasing... and by the end of the first epoch/for the rest of training, it's exactly 0 and I'm not sure why this is happening. I've trained plenty of models with keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot, meaning that exactly one element of the label vector is one and the rest are zero?
If so, then maybe try using a softmax activation with a categorical crossentropy loss like in the following official example:
This will help constrain the network to output a probability distribution on the last layer (i.e. the softmax layer outputs sum to 1).
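The difference between the two activations can be checked with a few lines of NumPy. This is only an illustration with made-up scores; the commented Keras lines at the end are my guess at the minimal change to the model in the question, not a verified fix.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()


logits = np.array([2.0, -1.0, 0.5, 0.0])  # hypothetical scores for 4 labels

print(np.isclose(softmax(logits).sum(), 1.0))  # True: a proper distribution
print(np.isclose(sigmoid(logits).sum(), 1.0))  # False: independent per-label scores

# In the model from the question, the change would be limited to two lines:
#   model.add(Dense(v.num_labels, activation='softmax'))
#   model.compile(loss='categorical_crossentropy', optimizer='adam',
#                 metrics=['accuracy'])
```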

Keras: LSTM with class weights

My question is closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps is nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
nb_samples, 10, 40
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
               return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However, I get the exception "class_weight not supported for 3+ dimensional targets". This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1), I get the exception "Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1)".
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used
for scaling the loss function (during training only). You can either
pass a flat (1D) Numpy array with the same length as the input samples
(1:1 mapping between weights and samples), or in the case of temporal
data, you can pass a 2D array with shape (samples, sequence_length),
to apply a different weight to every timestep of every sample. In this
case you should make sure to specify sample_weight_mode="temporal" in
compile().
In your case you would need to supply a 2D array with the same shape as your labels.
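A sketch of how the class weights from the question could be turned into such a 2D weight array is shown below. It assumes one 0/1 target per time step, i.e. targets of shape (samples, timesteps, 1), and uses small made-up sizes. The compile and fit calls are left as comments because sample_weight_mode is only accepted by some Keras versions.

```python
import numpy as np

cw = {0: 1, 1: 25}             # class weights from the question
nb_samples, nb_tsteps = 4, 10  # small hypothetical sizes

rng = np.random.default_rng(0)
# Targets shaped like the model output: (samples, timesteps, 1).
y = rng.integers(0, 2, size=(nb_samples, nb_tsteps, 1))

# One weight per (sample, timestep): drop the trailing axis and map class -> weight.
sample_weight = np.where(y[..., 0] == 1, cw[1], cw[0]).astype('float32')
print(sample_weight.shape)  # (4, 10)

# Assumed usage, for Keras versions that accept sample_weight_mode:
#   model.compile(loss='binary_crossentropy', optimizer='adam',
#                 sample_weight_mode='temporal')
#   model.fit(X_train, y, sample_weight=sample_weight)
```

Every positive time step then contributes 25 times as much to the loss as a negative one, which is the effect class_weight would have had on 2D targets.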
If this is still an issue: I think the TimeDistributed layer expects and returns a 3D array (similar to what you get with return_sequences=True in a regular LSTM layer). Try adding a Flatten() layer or another LSTM layer at the end, before the prediction layer:
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using temporal is a workaround. Check out this stack. The issue is also documented on github.
