poor performance keras lstm - python-3.x

I want to create a lstm model to classify signals.
Let's say I have 1000 files of signals. Each file contains a matrix of shape (500, 5) that means that in each file, I have 5 features (columns) and 500 rows.
0 1 2 3 4
0 5 5.3 2.3 4.2 2.2
... ... ... ... ... ...
499 2500 1.2 7.4 6.7 8.6
For each file, there is one output which is a boolean (True or False). the shape is (1,)
I created a database, data, with a shape (1000, 5, 500) and the target vector is of shape (1000, 1).
Then I split data (X_train, X_test, y_train, y_test).
Is it okay to give the matrix like this to the lstm model? Because I have very poor performance. From what I have seen, people give only a 1D or 2D data and they reshape their data after to give a 3D input to the lstm layer.
The code with the lstm is like this:
input_shape=(X_train.shape[1], X_train.shape[2]) #(5,500), i.e timesteps and features
model = Sequential()
model.add(LSTM(20, return_sequences=True))
model.add(LSTM(20))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
I changed the number of cells in a LSTM layer and the number of layers but the score is basically the same (0.19). Is it normal to have such a bad score in my case? Is there a better way to go ?
Thanks

By transforming your data into (samples, 5, 500) you are giving the LSTM 5 timesteps and 500 features. From your data it seems you would like to process all 500 rows and 5 features of each column to make a prediction. The LSTM input is (samples, timesteps, features). So if your rows represent timesteps in which 5 measurements are taken, then you need to permute the last 2 dimensions and set input_shape=(500, 5) in the first LSTM layer.
Also since your output is Boolean, you get a more stable training by using activation='sigmoid' in your final dense layer and train with loss='binary_crossentropy for binary classification.

Related

Request for improvement suggestion on my CNN learning model?

I’m trying to build a classification model for production line. If I understand correctly , it’s possible to use a CNN to classify numerical data .(and not only pictures)
My data is an array of 21 columns  per line:
20 different measurements and the last column is a type . It can be 0 or 1 or 2
each line of the array use a timestamp as index
type 0 represents 80 % of the production, and do not need extra treatment
but type 1 and 2 need extra treatment after production (so I need to clearly identify them)
To recreate something a CNN can use , I created a dataset where each label has for learning data an array made of the last previous 20 lines since it’s position .
So each label has for corresponding learning data , a square array of 20x20 measurements (like a picture ) .
(data already have been normalized using keras ColumnTransformer
after reading about unbalanced dataset , i decided to include only a type 0 each time I found a type 1 or 2 . At the end my dataset size is 18 000 lines , data shape '(18206, 20, 20)'
my learning model is pretty basic and looks like this :
train, test, train_label, test_label = train_test_split(X,y,test_size=0.3,shuffle=True)
##Call CNN model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic*3, input_shape=(sizePic,sizePic,), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))
# Compile model
sgd = optimizers.SGD(lr=0.03)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())
# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200,verbose=1)
# evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r"%self.learning_scores)
at the end prediction scores are :
scores [0.6088506683505354, 0.7341632843017578]
I have been changing parameters like batch_size and learning rate , but with no big improvement . To my understanding, it's better to start this way than adding layers to the model , is this correct ?
Any suggestion ??
thanks for your time
You are not using any conv layer, only fully connected layers (and don't be afraid of adding some conv layers because they have way less parameters than dense layers)

Predicting with different time step with a model trained for different time step data

I have trained my LSTM with 3 time steps. Following is the Keras LSTM layer.
model.add(LSTM(32, return_sequences=True, input_shape=(None, 3))).
ex:
X Y
[[1,2,3],[2,3,4],[4,5,6]] [[4],[5],[7]]
Now I need to predict the next value of a sequence with different time_steps (ex: 2)
X= [[1,2]]
When I use X= [[1,2]] I am getting following error
ValueError: Error when checking input: expected lstm_1_input to have shape (None, 3)
but got array with shape (1, 2)
Should I provide the same shape while I used for training.
Or can I still use a different timesteps (input shape) for predicting.
Appreciate your help on this issue.
I believe you need to use the same shape when using your model to predict on new data. Your data was trained with 3 timesteps (train_X), so you should feed it a 3-timesteps input when you model.predict your test data (test_X).

How to correctly get layer weights from Conv2D in keras?

I have Conv2D layer defines as:
Conv2D(96, kernel_size=(5, 5),
activation='relu',
input_shape=(image_rows, image_cols, 1),
kernel_initializer=initializers.glorot_normal(seed),
bias_initializer=initializers.glorot_uniform(seed),
padding='same',
name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160, image is 1 channel.
I am trying to visualize weights from this convolutional layer but not sure how to get them.
Here is how I am doing this now:
1.Call
layer.get_weights()[0]
This returs an array of shape (5, 5, 1, 96). 1 is because images are 1-channel.
2.Take 5 by 5 filters by
layer.get_weights()[0][:,:,:,j][:,:,0]
Very ugly but I am not sure how to simplify this, any comments are very appreciated.
I am not sure in these 5 by 5 squares. Are they filters actually?
If not could anyone please tell how to correctly grab filters from the model?
I tried to display the weights like so only the first 25. I have the same question that you do is this the filter or something else. It doesn't seem to be the same filters that are derived from deep belief networks or stacked RBM's.
Here is the untrained visualized weights:
and here are the trained weights:
Strangely there is no change after training! If you compare them they are identical.
and then the DBN RBM filters layer 1 on top and layer 2 on bottom:
If i set kernel_intialization="ones" then I get filters that look good but the net loss never decreases though with many trial and error changes:
Here is the code to display the 2D Conv Weights / Filters.
ann = Sequential()
x = Conv2D(filters=64,kernel_size=(5,5),input_shape=(32,32,3))
ann.add(x)
ann.add(Activation("relu"))
...
x1w = x.get_weights()[0][:,:,0,:]
for i in range(1,26):
plt.subplot(5,5,i)
plt.imshow(x1w[:,:,i],interpolation="nearest",cmap="gray")
plt.show()
ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)
x1w = x.get_weights()[0][:,:,0,:]
for i in range(1,26):
plt.subplot(5,5,i)
plt.imshow(x1w[:,:,i],interpolation="nearest",cmap="gray")
plt.show()
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6 and used the images normalized between 0 and 1 instead of 0 and 255 by dividing the images by 255.0. Now the convolution filters are changing and the output of the first convolutional filter looks like so:
The trained filter you'll notice is changed (not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
And if I take the last convolution layer (no dense layers in between) and feed it to a classifier untrained it is similar to classifying raw images in terms of accuracy but if I train the convolution layers the last convolution layer output increases the accuracy of the classifier (random forest).
So I would conclude the convolution layers are indeed filters as well as weights.
In layer.get_weights()[0][:,:,:,:], the dimensions in [:,:,:,:] are x position of the weight, y position of the weight, the n th input to the corresponding conv layer (coming from the previous layer, note that if you try to obtain the weights of first conv layer then this number is 1 because only one input is driven to the first conv layer) and k th filter or kernel in the corresponding layer, respectively. So, the array shape returned by layer.get_weights()[0] can be interpreted as only one input is driven to the layer and 96 filters with 5x5 size are generated. If you want to reach one of the filters, you can type, lets say the 6th filter
print(layer.get_weights()[0][:,:,:,6].squeeze()).
However, if you need the filters of the 2nd conv layer (see model image link attached below), then notice for each of 32 input images or matrices you will have 64 filters. If you want to get the weights of any of them for example weights of the 4th filter generated for the 8th input image, then you should type
print(layer.get_weights()[0][:,:,8,4].squeeze()).
enter image description here

Dimensions not matching in keras LSTM model

I want to use an LSTM neural Network with keras to forecast groups of time series and I am having troubles in making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no
transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('test X:', trainX.shape)
print('test Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# add the first LSTM layer (the dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This trows:
Exception: Error when checking model target: expected
timedistributed_1 to have shape (None, 4, 24) but got array with shape
(100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), so (100, 3, 24) but the network is producing an output shape of (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes you may need padding or a network node that removes said timestep.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, while it's not possible. This dimension is what it considers as the time dimension: it is preserved.
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue was that output_dim is never used when creating the model, it only appears in the validation data. While it's only a code smell (there might not be anything wrong with this observation), it's something worth checking.

Keras: LSTM with class weights

my question is quite closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps be nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
nb_samples, 10, 40
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However I get an exception class_weight not supported for 3+ dimensional targets. This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1) I get the exception Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1).
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used
for scaling the loss function (during training only). You can either
pass a flat (1D) Numpy array with the same length as the input samples
(1:1 mapping between weights and samples), or in the case of temporal
data, you can pass a 2D array with shape (samples, sequence_length),
to apply a different weight to every timestep of every sample. In this
case you should make sure to specify sample_weight_mode="temporal" in
compile().
In your case you would need to supply a 2D array with the same shape as your labels.
If this is still an issue.. I think the TimeDistributed Layer expects and returns a 3D array (kind of similar to if you have return_sequences=True in the regular LSTM layer). Try adding a Flatten() layer or another LSTM layer at the end before the prediction layer.
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using temporal is a workaround. Check out this stack. The issue is also documented on github.

Resources