Struggle with LSTM and RNN using Keras

I'm working on a speech recognition problem running on Colab using LSTM. The audio files were converted into spectrograms and then normalized. There are 6840 spectrograms in total and the shape of each one is (288, 864, 4).
I already tried a few examples with RNN and CNN and they worked, but when I try an example using an LSTM I get shape errors: every time there is either one more or one less dimension than expected. Here are some of these cases:
rnn = keras.Sequential()
rnn.add(keras.layers.SimpleRNN(500, input_shape = (864, 4)))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.Dense(212, activation = 'softmax'))
rnn.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',metrics = ['accuracy'])
rnn.fit(X_train, y_train, epochs = 5, validation_data=(X_test, y_test))
scores = rnn.evaluate(X_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', '%.2f' % (scores[1] * 100), '%')
The following error is raised on the first LSTM layer : ValueError: Input 0 of layer lstm_54 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 500]
If I remove the SimpleRNN line and feed the input directly to the first LSTM like this
rnn.add(keras.layers.LSTM(500, return_sequences = True, input_shape = (288, 864, 4)))
I get : ValueError: Input 0 of layer lstm_56 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 288, 864, 4]
I tried reshaping the images to (4, 288 * 864) and got the same error when trying to use the RNN layer, but with just the LSTM I got InvalidArgumentError: Incompatible shapes: [32] vs. [32,4].
No idea where the 32 came from, though.
One last thing, not really an issue but more of a request, is there any library that can resize images the simple way? 288x864 is too big for Colab, so I'll have to do it eventually to be able to load all 6840 images and feed it to the neural network. Right now I'm just using 100 samples to test.
Feel free to leave suggestions about other methods, cabalistic number of nodes/layers or anything like that.

An LSTM expects 3-dimensional input, [n_samples, n_timesteps, n_features], so the SimpleRNN layer that feeds it also needs return_sequences enabled:
rnn.add(keras.layers.SimpleRNN(500, return_sequences = True, input_shape = (864, 4)))
Next, with return_sequences = True on the last LSTM, the Dense layer produces one prediction per time step rather than one per sample, which will not match your labels, so remove return_sequences from the last LSTM layer:
rnn.add(keras.layers.LSTM(500))
If you still want to keep return_sequences = True on the last LSTM layer, you might want to wrap the Dense layer in a TimeDistributed layer.
I tried it on the following input and it seems to work:
X_train = np.random.rand(100, 864, 4)
y_train = np.random.rand(100, 1)
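Putting the two changes together, a minimal sketch might look like the following. It assumes TensorFlow's bundled Keras; build_model is a hypothetical helper, and the sizes are scaled down so it runs quickly (the question uses input shape (864, 4), 500 units and 212 classes).

```python
import numpy as np
from tensorflow import keras


def build_model(timesteps, features, units, n_classes):
    """Hypothetical helper: recurrent stack ending in one prediction per sample."""
    model = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        # A recurrent layer that feeds another recurrent layer must return the full sequence.
        keras.layers.SimpleRNN(units, return_sequences=True),
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.Dropout(0.2),
        # The last recurrent layer returns only its final output: (batch, units).
        keras.layers.LSTM(units),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Small stand-in data: 6 samples, 20 time steps, 4 features, 5 classes.
model = build_model(timesteps=20, features=4, units=8, n_classes=5)
X_demo = np.random.rand(6, 20, 4).astype("float32")
y_demo = np.random.randint(0, 5, size=(6,))
model.fit(X_demo, y_demo, epochs=1, verbose=0)
probs = model.predict(X_demo, verbose=0)
print(probs.shape)  # (6, 5): one softmax vector per sample
```

With the real data, each spectrogram would still need to be reduced to a 2D (timesteps, features) array per sample before it can be fed to this kind of model.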
For resizing, the PIL module from the Pillow package has plenty of image manipulation methods.
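As a rough illustration of resizing with Pillow, the sketch below shrinks a blank RGBA image of the same size as the spectrograms to a quarter of its width and height. The blank image is a stand-in; in practice the image would come from Image.open on a real file, and the target size here is an arbitrary choice.

```python
import numpy as np
from PIL import Image

# Stand-in for one spectrogram image. Note that PIL sizes are (width, height).
original = Image.new("RGBA", (864, 288))

# Shrink to a quarter of the width and height (target size chosen arbitrarily).
small = original.resize((216, 72))

# Converted to an array, the layout is (height, width, channels).
arr = np.asarray(small)
print(small.size, arr.shape)  # (216, 72) (72, 216, 4)
```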

Related

ValueError Input 0 of layer sequential_13 is incompatible with the layer: expected ndim=3, found ndim=4 Full shape received: (None, None, None, None)

I am trying to use a SimpleRNN to predict Parkinson's gait using the Physionet database. I am feeding the RNN images with a height of 240 pixels and a width of 16 pixels. I am also using ModelCheckpoint, monitoring validation accuracy, to save the best weights. When I set the input shape of the RNN, I get this error:
ValueError: Input 0 of layer sequential_13 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, None, None, None)
RNN model:
model = Sequential()
model.add(SimpleRNN(24, kernel_initializer='glorot_uniform', input_shape=(64,240), return_sequences = True))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('softmax'))
opt = optimizers.RMSprop(learning_rate=0.001, decay=1e-6)
epoch=10
early_stopping = EarlyStopping(monitor='val_accuracy', patience=60, verbose=1, mode='auto')
checkpoint = ModelCheckpoint("model_parkinsons.h5",
                             monitor='val_accuracy', verbose=0, save_best_only=True,
                             save_weights_only=False, mode='auto', save_freq='epoch')
model.compile(loss='binary_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
Batch size:64
Height of the image: 240
a.shape
Output: (64, 16, 240, 1)
I tried passing a.shape[1:] as the input shape,
but I get an error saying that 3 dimensions were expected but 4 were found.
Please help me resolve this.
In your first layer, you specified the input shape of your network. This shape does not include the batch size. So, if you specify "input_shape=(64, 240)", your final input would need to have the shape (batch_size, 64, 240). Since 64 is your batch size, something definitely went wrong there. Additionally, your input has four dimensions, (64, 16, 240, 1), but your first layer takes three-dimensional inputs. I do not quite understand what you want to achieve with your model, but it should work if you feed a[:, :, :, 0] into your model instead of a. You also need to set "input_shape=(16, 240)" in your first layer. With these two changes, the RNN processes one column of the image (240 values) per time step. This approach does not make much sense to me (RNNs are not normally used for image processing, at least not in this form), but I do not see any other way to interpret what you already did.
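The slicing step can be checked with NumPy alone. The sketch below uses a zero array as a stand-in for a, with the shape reported in the question.

```python
import numpy as np

# Stand-in batch with the shape from the question: (batch, width, height, channels).
a = np.zeros((64, 16, 240, 1), dtype="float32")

# Drop the trailing channel axis so that each sample is 2D: (timesteps, features).
a_3d = a[:, :, :, 0]
print(a_3d.shape)  # (64, 16, 240)

# The per-sample shape, without the batch size, is what input_shape should be.
per_sample_shape = a_3d.shape[1:]
print(per_sample_shape)  # (16, 240)
```

np.squeeze(a, axis=-1) gives the same result and fails loudly if the last axis is not of size 1.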

ValueError: Input 0 of layer simple_rnn_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 50]

I am new to TensorFlow and I am trying to build a multivariate (two features per time step), multi-step (forecasting 12 time steps into the future) forecast model.
I created a TensorFlow dataset to feed to my model.
When I print the shape of my dataset, I get the following:
<PrefetchDataset shapes: ((None, None, 2), (None, None)), types: (tf.float32, tf.float32)>
This is what I understand:
(None, None, 2) = input tensor "features": (batch size, input time steps, features per time step)
(None, None) = output tensor "label": (batch size, forecasted future time steps)
I then create my model as follows:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)
model = keras.models.Sequential([
    keras.layers.SimpleRNN(50),
    keras.layers.SimpleRNN(100),
    keras.layers.Dense(12),
])
optimizer = keras.optimizers.SGD(lr=1.5e-6, momentum=0.9)
model.compile(loss="mae",
              optimizer=optimizer,
              metrics=["mae"])
When I fit the model
model.fit(train_set, epochs=5,
          validation_data=valid_set)
I get the following error:
ValueError: Input 0 of layer simple_rnn_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 50]
I do understand that the SimpleRNN layer expects a 3-dimensional tensor, but I think my input has that dimension.
Thanks a lot for the help.
If you need me to share how I am creating my dataset, I will gladly do so.
The issue was coming from the second layer, not the first one. The first layer outputs a single vector per sample rather than a sequence, so its output has rank 2, e.g. (batch_size, 50). But the second layer requires a three-dimensional input. To solve this, I added return_sequences=True to the first RNN layer.
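To make the rank difference concrete, here is a toy NumPy forward pass of a tanh RNN. It is not Keras code: simple_rnn is a hypothetical function with random weights, meant only to show which shapes come out with and without return_sequences.

```python
import numpy as np


def simple_rnn(x, units, return_sequences=False, seed=0):
    """Toy tanh RNN forward pass with random weights, for shape illustration only."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape  # raises ValueError if x is not 3D
    w_in = rng.normal(size=(features, units))
    w_rec = rng.normal(size=(units, units))
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    # Either the whole sequence (batch, timesteps, units) or only the last step (batch, units).
    return np.stack(outputs, axis=1) if return_sequences else h


x = np.random.rand(8, 30, 2)  # (batch, timesteps, features)
last_only = simple_rnn(x, 50)
full_seq = simple_rnn(x, 50, return_sequences=True)
print(last_only.shape)  # (8, 50): rank 2, cannot feed another recurrent layer
print(full_seq.shape)   # (8, 30, 50): rank 3, can feed another recurrent layer
```

Passing last_only into a second simple_rnn call fails at the shape unpacking, which mirrors the "expected ndim=3, found ndim=2" error, while passing full_seq works.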
If train_set is a numpy array, pass train_set.reshape((1,50)) to model.fit()
model.fit(train_set.reshape((1,50)), epochs=5,
validation_data=valid_set)
Then you wouldn't need to apply return_sequences=True to the first RNN cell either.

How to solve "logits and labels must have the same first dimension" error

I'm trying out different Neural Network architectures for a word based NLP.
So far I've used bidirectional models, models with embeddings and models with GRUs, guided by this tutorial: https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571 and it all worked out well.
When I tried using LSTMs, however, I got an error saying:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
How can I solve this?
My source and target dataset consists of 7200 sample sentences. They are integer tokenized and embedded. The source dataset is post padded to match the length of the target dataset.
Here is my model and the relevant code:
lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=X.shape[1], input_shape=X.shape[1:]))
lstm_model.add(LSTM(128, return_sequences=False, dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add((Dense(target_vocab_size, activation='softmax')))
lstm_model.compile(optimizer=Adam(0.002), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = lstm_model.fit(X, Y, batch_size = 32, callbacks=CALLBACK, epochs = 100, validation_split = 0.25) #At this line the error is raised!
With the shapes:
X.shape = (7200, 147)
Y.shape = (7200, 147, 1)
src_vocab_size = 188
target_vocab_size = 186
I've already looked at similar questions on here and tried adding a Reshape layer
simple_lstm_model.add(Reshape((-1,)))
but this only causes the following error:
"TypeError: __int__ returned non-int (type NoneType)"
It's really weird as I preprocess the dataset the same way for all models and it works just fine except for the above.
You should pass return_sequences=True and return_state=False when calling the LSTM constructor.
In your snippet, the LSTM returns only its last output instead of the sequence of outputs for every input embedding. In theory, you could have spotted this from the error message:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
The logits should be three-dimensional: batch size × sequence length × number of classes. The length of the sequences is 147 and indeed 32 × 147 = 4704 (the number of labels in one batch). This tells you that the sequence-length dimension has disappeared from the logits.
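The arithmetic can be reproduced with placeholder arrays. The sketch below assumes, as the error message suggests, that the loss flattens the labels to one entry per token and expects one row of logits for each of them; the arrays are zeros and stand in for one batch.

```python
import numpy as np

batch_size, seq_len, n_classes = 32, 147, 186

# One batch of Y has shape (32, 147, 1); flattened, that is one label per token.
labels = np.zeros((batch_size, seq_len, 1), dtype="int64")
flat_labels = labels.reshape(-1)

# return_sequences=False: one row of logits per sentence.
logits_last_only = np.zeros((batch_size, n_classes))

# return_sequences=True: one row of logits per token.
logits_per_step = np.zeros((batch_size, seq_len, n_classes))
flat_logits_per_step = logits_per_step.reshape(-1, n_classes)

print(flat_labels.shape)           # (4704,)
print(logits_last_only.shape)      # (32, 186): first dimension does not match 4704
print(flat_logits_per_step.shape)  # (4704, 186): first dimension matches
```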

TypeError when trying to create a BLSTM network in Keras

I'm a bit new to Keras and deep learning. I'm currently trying to replicate this paper but when I'm compiling the second model (with the LSTMs) I get the following error:
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'"
The description of the model is this:
Input (length T is appliance specific window size)
Parallel 1D convolution with filter size 3, 5, and 7
respectively, stride=1, number of filters=32,
activation type=linear, border mode=same
Merge layer which concatenates the output of
parallel 1D convolutions
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Bidirectional LSTM consists of a forward LSTM
and a backward LSTM, output_dim=128
Dense layer, output_dim=128, activation type=ReLU
Dense layer, output_dim= T , activation type=linear
My code is this:
from keras import layers, Input
from keras.models import Model
def lstm_net(T):
    input_layer = Input(shape=(T,1))
    branch_a = layers.Conv1D(32, 3, activation='linear', padding='same', strides=1)(input_layer)
    branch_b = layers.Conv1D(32, 5, activation='linear', padding='same', strides=1)(input_layer)
    branch_c = layers.Conv1D(32, 7, activation='linear', padding='same', strides=1)(input_layer)
    merge_layer = layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])
    print(merge_layer.shape)
    BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
    print(BLSTM1.shape)
    BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
    dense_layer = layers.Dense(128, activation='relu')(BLSTM2)
    output_dense = layers.Dense(1, activation='linear')(dense_layer)
    model = Model(input_layer, output_dense)
    model.name = "lstm_net"
    return model
model = lstm_net(40)
After that I get the above error. My goal is to give as input a batch of 8 sequences of length 40 and get as output a batch of 8 sequences of length 40 too. I found this issue on Keras Github LSTM layer cannot connect to Dense layer after Flatten #818 and there #fchollet suggests that I should specify the 'input_shape' in the first layer which I did but probably not correctly. I put the two print statements to see how the shape is changing and the output is:
(?, 40, 96)
(?, 256)
The error occurs on the line where BLSTM2 is defined and can be seen in full here
Your problem lies in these three lines:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, input_shape=(8,40,96)))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
By default, an LSTM returns only the output of the last time step, so your data loses its sequential nature. That's why the following layer raises an error. Change these lines to:
BLSTM1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(merge_layer)
print(BLSTM1.shape)
BLSTM2 = layers.Bidirectional(layers.LSTM(128))(BLSTM1)
This makes the input to the second LSTM a sequence as well.
Aside from this, I would not use input_shape in a layer in the middle of the model, as it is inferred automatically.

Dimensions not matching in keras LSTM model

I want to use an LSTM neural network with Keras to forecast groups of time series and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, with no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the input dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected timedistributed_1 to have shape (None, 4, 24) but got array with shape (100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), i.e. (100, 3, 24), but the network produces an output of shape (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes, you may need padding or a network node that removes the extra time step.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, which is not possible. This dimension is what it treats as the time dimension, so it is preserved.
"The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension."
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
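A NumPy sketch of what this means for the shapes in the question: the same dense weights are applied at every index along axis 1, so that axis keeps its length of 4. The random arrays are stand-ins for the second LSTM's output and for the Dense weights, not values from a trained model.

```python
import numpy as np

n, input_dim, look_ahead, units = 100, 4, 24, 10

# Stand-in for the output of the second LSTM: (samples, "time" axis, units).
lstm_out = np.random.rand(n, input_dim, units)

# Stand-in weights for Dense(look_ahead), shared across every index of axis 1.
w = np.random.rand(units, look_ahead)
b = np.zeros(look_ahead)

# Apply the same dense transform at each index along axis 1, then restack.
out = np.stack([lstm_out[:, t, :] @ w + b for t in range(input_dim)], axis=1)
print(out.shape)  # (100, 4, 24): axis 1 stays at input_dim, it cannot become 3
```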
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue is that output_dim is never used when creating the model; it only appears in the target data (trainY). This is only a code smell (it does not necessarily mean something is wrong), but it is worth checking.
