Multi-step Time Series Prediction w/ seq2seq LSTM - keras

I am trying to predict time series data using an encoder/decoder with LSTM layers. So far, I am using 20 points of past data to predict 20 future points. For each sample of 20 past data points, the 1st value in the predicted sequence is very close to the true 1st value in each sequence:
[Figure: predicting 1 step into the future]
However, for the 2nd value in each sequence (2 timesteps into the future), the predicted values look like they are "shifted":
[Figure: predicting 2 steps into the future]
This "shifted" nature is true for all values of the predicted sequences, with the shifts increasing as I go farther into the predicted sequence. Here is the code for my model:
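from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
# Encoder: compress the 20-step input sequence (1 feature) into one vector
model.add(LSTM(128, input_shape=(20, 1), return_sequences=False))
# Repeat that vector once per output timestep
model.add(RepeatVector(20))
# Decoder: unroll over the 20 output steps
model.add(LSTM(128, return_sequences=True))
model.add(TimeDistributed(Dense(1)))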
Is it something with RepeatVector? Any help would be appreciated.
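For reference, the setup described above (20 past points in, 20 future points out) corresponds to input and target arrays shaped as in the following minimal NumPy sketch. The helper name make_windows and the sine-wave test series are assumptions made for illustration, not part of the original post.

```python
import numpy as np

def make_windows(series, n_in=20, n_out=20):
    """Split a 1-D series into (past, future) pairs (hypothetical helper)."""
    series = np.asarray(series, dtype="float32")
    n_samples = len(series) - n_in - n_out + 1
    # Each input window holds n_in consecutive points ...
    X = np.stack([series[i:i + n_in] for i in range(n_samples)])
    # ... and each target window holds the n_out points that follow it
    y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    # Add the trailing feature axis that LSTM layers expect: (samples, timesteps, 1)
    return X[..., None], y[..., None]

series = np.sin(np.linspace(0, 20, 200))  # stand-in data
X, y = make_windows(series)
print(X.shape, y.shape)  # (161, 20, 1) (161, 20, 1)
```

Here y[:, 0] is the "1 step into the future" target and y[:, 1] the "2 steps into the future" target, so each can be plotted against the corresponding column of the predictions.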


Custom loss for single-label, multi-class problem

I have a single-label, multi-class classification problem, i.e., a given sample is in exactly one class (say, class 3), but for training purposes, predicting class 2 or 5 is still okay to not penalise the model that heavily.
For example, the ground truth for 1 sample is [0,1,1,0,1] of 5 classes, instead of a one-hot vector. This implies that, the model predicting any one (not necessarily all) of the above classes (2,3 or 5) is fine.
For every batch, the predicted output dimension is of the shape bs x n x nc, where bs is the batch size, n is the number of samples per point and nc is the number of classes. The ground truth is also of the same shape as the predicted tensor.
For every batch, I'm expecting my loss function to compare n tensors across nc classes and then average it across n.
Eg: When dimensions are 32 x 8 x 5000. There are 32 batch points in a batch (for bs=32). Each batch point has 8 vector points, and each vector point has 5000 classes. For a given batch point, I wish to compute loss across all (8) vector points, compute their average and do so for the rest of the batch points (32). Final loss would be loss over all losses from each batch point.
How can I approach designing such a loss function? Any help would be deeply appreciated
P.S.: Let me know if the question is ambiguous
One way to go about this is to use a sigmoid function on the network output, which removes the implicit interdependency between class scores that a softmax function has.
As for the loss function, you can then calculate the loss based on the highest prediction for any of your target classes and ignore all other class predictions. For your example:
import torch

# your model output
y_out = torch.tensor([[0.1, 0.2, 0.95, 0.1, 0.01]], requires_grad=True)
# class labels
y = torch.tensor([[0,1,1,0,1]])
Since we only care about the highest score among the target classes, we set the scores of all target classes to the maximum value achieved by any one of them:
class_mask = y == 1
max_class_score = torch.max(y_out[class_mask])
y_hat = torch.where(class_mask, max_class_score, y_out)
We can then apply a regular cross-entropy loss function:
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(y_hat, y.float())
loss.backward()
When inspecting the gradients, we see that this only updates the target-class prediction that achieved the highest score, as well as all predictions for classes outside the target set.
>>> y_out.grad
tensor([[ 0.3326, 0.0000, -0.6653, 0.3326, 0.0000]])
Predictions for other target classes do not receive a gradient update. Note that if you have a very high ratio of possible classes, this might slow down your convergence.
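The example above handles a single sample. A sketch of how the same idea could be extended to the bs x n x nc tensors from the question is shown below. The function name any_target_loss is hypothetical, and the sketch assumes every vector point has at least one target class; it takes the maximum per vector point rather than over the whole batch, then averages over n and over the batch.

```python
import torch
import torch.nn.functional as F

def any_target_loss(logits, targets):
    """logits, targets: (bs, n, nc); targets is multi-hot (0/1).

    Hypothetical helper; assumes at least one target class per vector point.
    """
    mask = targets.bool()
    # Highest score among the target classes of each vector point: (bs, n, 1)
    lowest = torch.finfo(logits.dtype).min
    max_target = logits.masked_fill(~mask, lowest).max(dim=-1, keepdim=True).values
    # Overwrite every target-class score with that maximum, keep the others
    y_hat = torch.where(mask, max_target, logits)
    # Cross-entropy against the multi-hot targets, one value per vector point
    log_p = F.log_softmax(y_hat, dim=-1)
    per_point = -(targets.float() * log_p).sum(dim=-1)  # (bs, n)
    # Average over the n vector points, then over the batch
    return per_point.mean(dim=1).mean()

# Same numbers as the single-sample example, reshaped to (bs=1, n=1, nc=5)
logits = torch.tensor([[[0.1, 0.2, 0.95, 0.1, 0.01]]], requires_grad=True)
targets = torch.tensor([[[0, 1, 1, 0, 1]]])
loss = any_target_loss(logits, targets)
loss.backward()
print(loss.item(), logits.grad)
```

As in the single-sample case, only the best-scoring target class and the non-target classes receive a gradient.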

Request for improvement suggestion on my CNN learning model?

I'm trying to build a classification model for a production line. If I understand correctly, it's possible to use a CNN to classify numerical data (and not only pictures).
My data is an array with 21 columns per line:
20 different measurements, and the last column is a type, which can be 0, 1 or 2
each line of the array uses a timestamp as its index
type 0 represents 80% of the production and does not need extra treatment
but types 1 and 2 need extra treatment after production (so I need to identify them clearly)
To create something a CNN can use, I built a dataset in which the training data for each label is an array made of the 20 lines preceding its position.
So each label has, as its corresponding training data, a square array of 20x20 measurements (like a picture).
(The data have already been normalized using scikit-learn's ColumnTransformer.)
After reading about unbalanced datasets, I decided to include only one type 0 sample each time I found a type 1 or 2. In the end my dataset has about 18,000 samples, with data shape (18206, 20, 20).
My model is pretty basic and looks like this:
train, test, train_label, test_label = train_test_split(X,y,test_size=0.3,shuffle=True)
##Call CNN model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic*3, input_shape=(sizePic,sizePic,), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))
# Compile model
sgd = optimizers.SGD(lr=0.03)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())
# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200,verbose=1)
# evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r"%self.learning_scores)
In the end the evaluation scores (loss, accuracy) are:
scores [0.6088506683505354, 0.7341632843017578]
I have been changing parameters like batch_size and the learning rate, but with no big improvement. To my understanding, it's better to start this way than by adding layers to the model. Is this correct?
Any suggestions?
Thanks for your time.
You are not using any conv layers, only fully connected (Dense) layers. Don't be afraid of adding some conv layers, because they have far fewer parameters than dense layers.
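To put rough numbers on the parameter claim, here is a small arithmetic sketch. It assumes a dense layer that sees the flattened 20x20 input, and a convolution with 3x3 kernels; the 32 units/filters and the kernel size are illustrative choices, not values from the question.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a 2-D convolution (independent of image size)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# A 20x20 single-channel "picture", as in the question
flat_inputs = 20 * 20
print(dense_params(flat_inputs, 32))  # 12832
print(conv2d_params(3, 3, 1, 32))     # 320
```

The convolution reuses the same small kernel at every position, which is why its parameter count does not grow with the input size.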

poor performance keras lstm

I want to create a lstm model to classify signals.
Let's say I have 1000 files of signals. Each file contains a matrix of shape (500, 5) that means that in each file, I have 5 features (columns) and 500 rows.
        0    1    2    3    4
0       5  5.3  2.3  4.2  2.2
...   ...  ...  ...  ...  ...
499  2500  1.2  7.4  6.7  8.6
For each file, there is one output which is a boolean (True or False). the shape is (1,)
I created a database, data, with a shape (1000, 5, 500) and the target vector is of shape (1000, 1).
Then I split data (X_train, X_test, y_train, y_test).
Is it okay to give the matrix like this to the lstm model? Because I have very poor performance. From what I have seen, people give only a 1D or 2D data and they reshape their data after to give a 3D input to the lstm layer.
The code with the lstm is like this:
input_shape=(X_train.shape[1], X_train.shape[2]) #(5,500), i.e timesteps and features
model = Sequential()
model.add(LSTM(20, return_sequences=True))
model.add(LSTM(20))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
I changed the number of cells in a LSTM layer and the number of layers but the score is basically the same (0.19). Is it normal to have such a bad score in my case? Is there a better way to go ?
Thanks
By arranging your data as (samples, 5, 500), you are giving the LSTM 5 timesteps with 500 features each. From your data it seems you would rather process all 500 rows, each with 5 features, to make a prediction. The LSTM input is (samples, timesteps, features). So if your rows represent timesteps at which 5 measurements are taken, you need to permute the last 2 dimensions and set input_shape=(500, 5) in the first LSTM layer.
Also, since your output is Boolean, you will get more stable training by using activation='sigmoid' in your final dense layer and training with loss='binary_crossentropy' for binary classification.
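The permutation mentioned above could look like the following NumPy sketch. The random array is only a stand-in for the real data, and the variable names are assumptions.

```python
import numpy as np

# Stand-in for the real data: 1000 files stored as (samples, features, rows)
data = np.random.rand(1000, 5, 500).astype("float32")

# Swap the last two axes so that rows become timesteps:
# (samples, timesteps, features)
data_tf = np.transpose(data, (0, 2, 1))
print(data_tf.shape)  # (1000, 500, 5)

# The first LSTM layer would then take input_shape=(500, 5)
input_shape = data_tf.shape[1:]
```

Note that np.transpose moves whole axes, whereas np.reshape to the same target shape would keep the memory order and scramble which value belongs to which timestep and feature.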

How to correctly get layer weights from Conv2D in keras?

I have a Conv2D layer defined as:
Conv2D(96, kernel_size=(5, 5),
activation='relu',
input_shape=(image_rows, image_cols, 1),
kernel_initializer=initializers.glorot_normal(seed),
bias_initializer=initializers.glorot_uniform(seed),
padding='same',
name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160, image is 1 channel.
I am trying to visualize weights from this convolutional layer but not sure how to get them.
Here is how I am doing this now:
1. Call
layer.get_weights()[0]
This returns an array of shape (5, 5, 1, 96). The 1 is there because the images are 1-channel.
2. Take the 5 by 5 filters with
layer.get_weights()[0][:,:,:,j][:,:,0]
This is very ugly, but I am not sure how to simplify it; any comments are much appreciated.
I am not sure about these 5 by 5 squares. Are they actually the filters?
If not, could anyone please tell me how to correctly grab the filters from the model?
I tried to display the weights like so (only the first 25). I have the same question as you: are these the filters or something else? They don't seem to be the same as the filters derived from deep belief networks or stacked RBMs.
Here is the untrained visualized weights:
and here are the trained weights:
Strangely there is no change after training! If you compare them they are identical.
and then the DBN RBM filters layer 1 on top and layer 2 on bottom:
If I set kernel_initializer="ones", I get filters that look good, but the net loss never decreases despite many trial-and-error changes:
Here is the code to display the 2D Conv Weights / Filters.
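import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, Activation

ann = Sequential()
x = Conv2D(filters=64, kernel_size=(5, 5), input_shape=(32, 32, 3))
ann.add(x)
ann.add(Activation("relu"))
...
# Kernels applied to the first input channel, shape (5, 5, 64), before training
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    # subplot indices start at 1, filter indices at 0
    plt.imshow(x1w[:, :, i - 1], interpolation="nearest", cmap="gray")
plt.show()
ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)
# The same kernels after training
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i - 1], interpolation="nearest", cmap="gray")
plt.show()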
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6 and used the images normalized between 0 and 1 instead of 0 and 255 by dividing the images by 255.0. Now the convolution filters are changing and the output of the first convolutional filter looks like so:
The trained filter you'll notice is changed (not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
If I take the output of the last convolution layer (no dense layers in between) of an untrained network and feed it to a classifier (a random forest), the accuracy is similar to classifying the raw images. If I train the convolution layers first, the output of the last convolution layer increases the accuracy of the classifier.
So I would conclude that the convolution layer weights are indeed the filters.
In layer.get_weights()[0], the four dimensions are, in order: the row position of the weight within the kernel, the column position of the weight within the kernel, the index n of the input to this conv layer (coming from the previous layer; for the first conv layer here this dimension has size 1, because only one input channel is fed to it), and the index k of the filter (kernel) in this layer. So the shape (5, 5, 1, 96) returned by layer.get_weights()[0] means that the layer has one input and 96 filters of size 5x5. To get a single filter, for example the one at index 6 (indices start at 0), you can type
print(layer.get_weights()[0][:,:,:,6].squeeze())
However, if you need the filters of the 2nd conv layer (see the model image linked below), notice that each of the 64 filters has a separate set of weights for each of the 32 inputs. To get, for example, the weights that the filter at index 4 applies to the input at index 8, you should type
print(layer.get_weights()[0][:,:,8,4].squeeze())
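The indexing can be tried without a trained model by using random arrays with the same layout. In this sketch, w1 stands in for the first layer's weights from the question; w2 is a hypothetical second layer with 32 inputs and 64 filters, and its 5x5 kernel size is an assumption.

```python
import numpy as np

# Stand-ins with the layout Keras uses for Conv2D kernels:
# (kernel_rows, kernel_cols, inputs, filters)
w1 = np.random.rand(5, 5, 1, 96)   # first layer from the question
w2 = np.random.rand(5, 5, 32, 64)  # hypothetical second layer

# First layer: one 5x5 kernel per filter, since there is a single input channel.
# Moving the filter axis to the front gives an array that is easy to loop over.
first_layer_filters = np.moveaxis(w1[:, :, 0, :], -1, 0)
print(first_layer_filters.shape)  # (96, 5, 5)

# Second layer: each filter holds one kernel per input
k, c = 4, 8
kernel = w2[:, :, c, k]  # weights that filter 4 applies to input 8
print(kernel.shape)  # (5, 5)
```

With this layout, first_layer_filters[j] is the same array as layer.get_weights()[0][:,:,:,j].squeeze() for the first layer.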
[Figure: model image]

Keras: LSTM with class weights

my question is quite closely related to this question but also goes beyond it.
I am trying to implement the following LSTM in Keras where
the number of timesteps is nb_tsteps=10
the number of input features is nb_feat=40
the number of LSTM cells at each time step is 120
the LSTM layer is followed by TimeDistributedDense layers
From the question referenced above I understand that I have to present the input data as
nb_samples, 10, 40
where I get nb_samples by rolling a window of length nb_tsteps=10 across the original timeseries of shape (5932720, 40). The code is hence
model = Sequential()
model.add(LSTM(120, input_shape=(X_train.shape[1], X_train.shape[2]),
return_sequences=True, consume_less='gpu'))
model.add(TimeDistributed(Dense(50, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(20, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(10, activation='relu')))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(3, activation='relu')))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
Now to my question (assuming the above is correct so far):
The binary responses (0/1) are heavily imbalanced and I need to pass a class_weight dictionary like cw = {0: 1, 1: 25} to model.fit(). However I get an exception class_weight not supported for 3+ dimensional targets. This is because I present the response data as (nb_samples, 1, 1). If I reshape it into a 2D array (nb_samples, 1) I get the exception Error when checking model target: expected timedistributed_5 to have 3 dimensions, but got array with shape (5932720, 1).
Thanks a lot for any help!
I think you should use sample_weight with sample_weight_mode='temporal'.
From the Keras docs:
sample_weight: Numpy array of weights for the training samples, used
for scaling the loss function (during training only). You can either
pass a flat (1D) Numpy array with the same length as the input samples
(1:1 mapping between weights and samples), or in the case of temporal
data, you can pass a 2D array with shape (samples, sequence_length),
to apply a different weight to every timestep of every sample. In this
case you should make sure to specify sample_weight_mode="temporal" in
compile().
In your case you would need to supply a 2D array of shape (nb_samples, sequence_length), i.e. one weight for the label at each timestep of each sample.
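A minimal sketch of building such a weight array from the class weights in the question is shown below. The random labels are only a stand-in, and the assumed label shape (nb_samples, nb_tsteps, 1) is chosen to match a model that outputs one value per timestep.

```python
import numpy as np

class_weight = {0: 1.0, 1: 25.0}  # from the question

# Stand-in labels with shape (nb_samples, nb_tsteps, 1)
rng = np.random.default_rng(0)
y_train = (rng.random((1000, 10, 1)) < 0.05).astype("int64")

# One weight per sample and timestep: shape (nb_samples, nb_tsteps)
labels_2d = y_train[..., 0]
sample_weight = np.where(labels_2d == 1, class_weight[1], class_weight[0])
print(sample_weight.shape)  # (1000, 10)
```

Following the quoted docs, this array would be passed as sample_weight to model.fit() after compiling with sample_weight_mode="temporal".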
If this is still an issue: I think the TimeDistributed layer expects and returns a 3D array (similar to setting return_sequences=True in a regular LSTM layer). Try adding a Flatten() layer or another LSTM layer before the final prediction layer.
d = TimeDistributed(Dense(10))(input_from_previous_layer)
lstm_out = Bidirectional(LSTM(10))(d)
output = Dense(1, activation='sigmoid')(lstm_out)
Using temporal sample weights is a workaround. See the related Stack Overflow question; the issue is also documented on GitHub.
