I am using Keras to predict time series with an LSTM, and I noticed that we can predict using data whose number of timesteps differs from that of the training data. For example:
import numpy as np
import keras.optimizers
from keras.models import Sequential
from keras.layers import Dense,Activation,Dropout,TimeDistributed
from keras.layers import LSTM
Xtrain = np.random.rand(10,3,2) #Here timestep is 3
Ytrain = np.random.rand(10,1)
model = Sequential()
model.add(LSTM(input_dim = Xtrain.shape[2],output_dim =10,return_sequences = False))
model.add(Activation("sigmoid"))
model.add(Dense(1))
KerasOptimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss="mse", optimizer=KerasOptimizer)
model.fit(Xtrain,Ytrain,nb_epoch = 1,batch_size = 1)
XBis = np.random.rand(10,4,2) #here timestep is 4
XTer = np.random.rand(10,2,2) #here timestep is 2
model.predict(Xtrain)
model.predict(XBis)
model.predict(XTer)
So my question is: why does this work? If we train a model with n timesteps and predict on data with n+1 timesteps, maybe the model uses only the first n timesteps. But if we predict with n-1 timesteps, how does it work?
If you look at how the LSTM layer is defined in your example, you will note that you are not telling specifically what is the size of the time dimension, only the number of features present at each time point (input_dim) and the number of desired output features (output_dim). Also, since you have return_sequences=False it will only output the result at the last time point, so the tensor yielded by the layer will always have the shape [batch size] x [output dim] (in this case, 10 x 10), discarding the time dimension.
So the size of the time dimension does not really affect the "applicability" of the model; the layer will just go through all the available time steps and give you the last output.
Of course, that does not mean that the model will necessarily work well for any input. If all the examples in your training data have a time dimension of size N but you then try to predict using N+1, N-1, 100 * N or whatever else, you may not get reliable results.
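To make the point concrete, here is a minimal NumPy sketch. It assumes a plain tanh recurrent cell rather than a full LSTM (the gating is different, but the loop over time works the same way), and the function name and random weights are illustrative only, not Keras internals. The weights depend on the feature and unit counts, never on the number of time steps.

```python
import numpy as np

def simple_rnn_last_output(x, w_in, w_rec, b):
    """Run a tanh recurrent cell over x of shape (batch, time, features)
    and return only the final hidden state, shape (batch, units)."""
    batch, time_steps, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    for t in range(time_steps):  # loop length comes from the data, not the weights
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec + b)
    return h

rng = np.random.default_rng(0)
features, units = 2, 10
w_in = rng.normal(size=(features, units))   # shape fixed by features and units
w_rec = rng.normal(size=(units, units))     # shape fixed by units only
b = np.zeros(units)

for steps in (3, 4, 2):  # same weights, three different sequence lengths
    out = simple_rnn_last_output(rng.random((10, steps, features)), w_in, w_rec, b)
    print(steps, out.shape)  # always (10, 10)
```

The output shape is the same in all three cases because only the last hidden state is returned, which is what return_sequences=False does.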
I am trying to decrease the execution time of the Keras sequential model that runs in a loop several times.
My training dataset shape: (1,9526,32736,1) (1,ntimes,ngrid,1)
and test data shape is (1,1059,32736,1)
The test data time dimension is not fixed (variable) but the ngrid is fixed.
I created a dummy dimension in the end so that when I call the training data in the for loop the dimension shape will be (1,ntimes,1)
This is the description of what model does:
First, the model does the convolution along the time axis for a single grid point.
Subtracts the output of the convolution from the input data.
Does the convolution (along the time axis) of the output from the second layer.
The above steps are repeated 32736 ngrid times.
Here is the code:
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input,Conv1D,subtract
from tqdm import tqdm
print(tf.__version__)  # 2.4.1
print(keras.__version__)  # 2.4.0
no_epochs = 1000
validation_split = 0
verbosity = 0
pred = np.ones(xtest.shape[1:3])
for i in tqdm(range(ngrid)):
    keras.backend.clear_session()
    inputs = Input(shape=(None,1),batch_size=1,name='input_layer')
    smoth1 = Conv1D(1, kernel_size=90,padding='same',activation='linear')(inputs)
    diff = subtract([inputs, smoth1])
    smoth2 = Conv1D(1, kernel_size=30,padding='same',activation='linear')(diff)
    model = Model(inputs=inputs, outputs=smoth2)
    model.compile(optimizer='adam', loss='mse')
    model.fit(xtrain[:,:,i,:],ytrain[:,:,i,:],epochs=no_epochs,validation_split=validation_split,verbose=verbosity)
    pred[:,i] = model.predict(xtest[:,:,i,:]).squeeze()
    del model
I am looking for other alternatives that can speed up my code. Any suggestions are greatly appreciated.
I am trying to train a model that has more than one output and as a result, also has more than one loss function attached to it when I compile it.
I haven't done something similar in the past (not from scratch at least).
Here's some code I am using to figure out how this works.
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
batch_size = 50
input_size = 10
output_size = 5  # placeholder value; the question does not say what it was
i = Input(shape=(input_size,))
x = Dense(100)(i)
x_1 = Dense(output_size)(x)
x_2 = Dense(output_size)(x)
model = Model(i, [x_1, x_2])
model.compile(optimizer = 'adam', loss = ["mse", "mse"])
# Data creation
x = np.random.random_sample([batch_size, input_size]).astype('float32')
y = np.random.random_sample([batch_size, output_size]).astype('float32')
loss = model.train_on_batch(x, [y,y])
print(loss) # sample output [0.8311912, 0.3519104, 0.47928077]
I would expect the variable loss to have two entries (one for each loss function), however, I get back three. I thought maybe one of them is the weighted average but that does not look to be the case.
Could anyone explain how passing in multiple loss functions works, because obviously, I am misunderstanding something.
I believe the three outputs are the sum of all the losses, followed by the individual losses on each output.
For example, if you look at the sample output you've printed there:
0.3519104 + 0.47928077 = 0.83119117 ≈ 0.8311912
Your assumption that there should be two losses is incorrect. You have a model with two outputs, and you specified one loss for each output, but the model has to be trained on a single loss, so Keras trains the model on a new loss that is the sum of the per-output losses.
You can control how these losses are mixed using the loss_weights parameter in model.compile. By default, each weight is 1.0.
So in the end what train_on_batch returns is the loss, output one mse, and output two mse. That is why you get three values.
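The arithmetic can be reproduced outside Keras. The following NumPy sketch uses hypothetical helper names (mse, combined_losses) and random stand-in predictions; it only illustrates the weighted-sum rule described above and does not call Keras.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all elements."""
    return float(np.mean((y_true - y_pred) ** 2))

def combined_losses(y_true_list, y_pred_list, loss_weights=None):
    """Return [total, loss_1, loss_2, ...], with total = sum(weight_i * loss_i)."""
    per_output = [mse(t, p) for t, p in zip(y_true_list, y_pred_list)]
    if loss_weights is None:
        loss_weights = [1.0] * len(per_output)  # assumed default: every weight is 1.0
    total = sum(w * l for w, l in zip(loss_weights, per_output))
    return [total] + per_output

rng = np.random.default_rng(0)
y = rng.random((50, 5))
pred_1, pred_2 = rng.random((50, 5)), rng.random((50, 5))  # stand-ins for the two outputs

losses = combined_losses([y, y], [pred_1, pred_2])
print(losses)  # three values: total, then one per output
weighted = combined_losses([y, y], [pred_1, pred_2], loss_weights=[0.5, 2.0])
print(weighted)
```

With unit weights the first entry equals the sum of the other two, which matches the sample output in the question.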
I have a time series data set for 38,000 distinct patients comprising 48 hours of physiological data with 30 features, so every patient has 48 rows (one per hour) and a binary outcome (0/1) at the end of the 48th hour only. The total training set has 38,000 * 48 = 1,824,000 rows.
To my understanding this is a many-to-one LSTM binary classification, so should my input shape be (38000, 48, 30), i.e. (sample_size, time_steps, features), and should return_sequences be set to False so that only the output of the last time step is returned?
Can somebody review my understanding on this?
Thanks.
Yes, you are mostly on the right track. Refer to the code below for a better understanding.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Bidirectional
# number of features per time step
total_features = 30
no_of_patients = 38000
time_steps = 48
model = Sequential()
# Optionally, a Bidirectional LSTM layer can come first; it must keep
# return_sequences=True so that the next LSTM still receives a sequence.
# model.add(
#     Bidirectional(LSTM(
#         units=100,
#         return_sequences=True
#     ), input_shape=(time_steps, total_features)))
# return_sequences should be False if there is only one LSTM layer. With several
# stacked layers, only the last one should have return_sequences=False.
# input_shape excludes the batch (patient) dimension.
model.add(LSTM(
    units=100,
    input_shape=(time_steps, total_features),
    return_sequences=False
))
# a single sigmoid unit matches one 0/1 label per patient
model.add(Dense(1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer='rmsprop',
    metrics=['accuracy']
)
Let me know if you have any confusion in the above code or if you need more explanation
Yes, you're mostly right:
shape of inputs = (patients, 48, 30)
shape of targets = (patients, 1)
You should use return_sequences=False in your last LSTM layer. (If you have more recurrent layers before the last LSTM, keep return_sequences=True in them)
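If the data starts as a flat table with one row per patient-hour, it has to be reshaped into the 3D input first. This is a small NumPy sketch that assumes the rows are sorted by patient and then by hour; it uses 5 patients in place of 38,000, and both the data and the labels are made up for illustration.

```python
import numpy as np

n_patients, time_steps, n_features = 5, 48, 30  # 5 stands in for 38,000

# Flat table: one row per patient-hour, assumed sorted by patient, then by hour.
flat = np.arange(n_patients * time_steps * n_features, dtype="float32").reshape(
    n_patients * time_steps, n_features)

# (patients * 48, 30) -> (patients, 48, 30)
X = flat.reshape(n_patients, time_steps, n_features)
# One label per patient (made-up values), shaped (patients, 1).
y = np.array([0, 1, 0, 0, 1], dtype="float32").reshape(-1, 1)

print(X.shape, y.shape)  # (5, 48, 30) (5, 1)
# Row 48 of the flat table is hour 0 of the second patient.
print(np.array_equal(X[1, 0], flat[48]))  # True
```

If the rows are not sorted this way, they need to be sorted before reshaping, otherwise hours from different patients end up in the same sequence.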
I want to create a word embedding pretraining network which adds something on top of word2vec CBOW. Therefore, I'm trying to implement word2vec CBOW first. Since I'm very new to keras, I'm unable to figure out how to implement CBOW in it.
Initialization:
I have calculated the vocabulary and have the mapping of word to integers.
Input to the (yet to be implemented) network:
A list of 2*k + 1 integers (representing the central word and 2*k words in context)
Network Specification
A shared Embedding layer should take this list of integers and give their corresponding vector outputs. Further a mean of 2*k context vector is to be taken (I believe this can be done using add_node(layer, name, inputs=[2*k vectors], merge_mode='ave')).
It will be very helpful if anyone can share a small code-snippet of this.
P.S.: I was looking at word2veckeras, but couldn't follow its code because it also uses gensim.
UPDATE 1:
I want to share the embedding layer in the network. The embedding layer should be able to take the context words (2*k) as well as the current word. I can do this by taking all 2*k + 1 word indices in the input and writing a custom lambda function that handles them. But after that I also want to add a negative sampling network, for which I'll have to take the embeddings of more words and compute their dot product with the context vector. Can someone provide an example where the Embedding layer is a shared node in the Graph() network?
Graph() has been deprecated from keras
Any arbitrary network can be created by using keras functional API.
The following demo code (written for Keras 1.x and Python 2) creates a word2vec CBOW model with negative sampling, tested on randomized inputs:
from keras import backend as K
import numpy as np
from keras.utils.np_utils import accuracy
from keras.models import Sequential, Model
from keras.layers import Input, Lambda, Dense, merge
from keras.layers.embeddings import Embedding
k = 3 # context windows size
context_size = 2*k
neg = 5 # number of negative samples
# generate weight matrix for embeddings
embedding = []
for i in range(10):
embedding.append(np.full(100, i))
embedding = np.array(embedding)
print embedding
# Creating CBOW model
word_index = Input(shape=(1,))
context = Input(shape=(context_size,))
negative_samples = Input(shape=(neg,))
shared_embedding_layer = Embedding(input_dim=10, output_dim=100, weights=[embedding])
word_embedding = shared_embedding_layer(word_index)
context_embeddings = shared_embedding_layer(context)
negative_words_embedding = shared_embedding_layer(negative_samples)
cbow = Lambda(lambda x: K.mean(x, axis=1), output_shape=(100,))(context_embeddings)
word_context_product = merge([word_embedding, cbow], mode='dot')
negative_context_product = merge([negative_words_embedding, cbow], mode='dot', concat_axis=-1)
model = Model(input=[word_index, context, negative_samples], output=[word_context_product, negative_context_product])
model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
input_context = np.random.randint(10, size=(1, context_size))
input_word = np.random.randint(10, size=(1,))
input_negative = np.random.randint(10, size=(1, neg))
print "word, context, negative samples"
print input_word.shape, input_word
print input_context.shape, input_context
print input_negative.shape, input_negative
output_dot_product, output_negative_product = model.predict([input_word, input_context, input_negative])
print "word cbow dot product"
print output_dot_product.shape, output_dot_product
print "cbow negative dot product"
print output_negative_product.shape, output_negative_product
Hope it helps!
UPDATE 1:
I've completed the code and uploaded it here
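For readers who want to check what the shared embedding and the two dot products compute, here is a NumPy-only sketch of the same arithmetic. It is not the Keras model itself: the variable names are illustrative, the lookups are plain array indexing, and the embedding uses the same fixed initialisation as the demo (row i is filled with the value i).

```python
import numpy as np

vocab_size, dim, k, neg = 10, 100, 3, 5
# Row i of the embedding matrix is filled with the value i.
embedding = np.repeat(np.arange(vocab_size, dtype="float64")[:, None], dim, axis=1)

rng = np.random.default_rng(0)
word = rng.integers(vocab_size, size=(1,))            # centre word index
context = rng.integers(vocab_size, size=(1, 2 * k))   # 2*k context word indices
negatives = rng.integers(vocab_size, size=(1, neg))   # negative sample indices

# One shared matrix serves all three lookups.
cbow = embedding[context].mean(axis=1)                             # (1, dim)
word_score = np.einsum("bd,bd->b", embedding[word], cbow)          # (1,)
negative_scores = np.einsum("bnd,bd->bn", embedding[negatives], cbow)  # (1, neg)

print(cbow.shape, word_score.shape, negative_scores.shape)
```

Because every row is constant, each score equals dim * word_index * mean(context indices), which gives a quick sanity check on the shapes and axes.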
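You could try something like this. Here I've initialized the embedding matrix to fixed values. For an input array of shape (1, 6) you'll get an output of shape (1, 100), which is the average of the 6 input embeddings.
from keras import backend as K
import numpy as np
from keras.models import Sequential
from keras.layers import Lambda
from keras.layers.embeddings import Embedding
model = Sequential()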
k = 3 # context windows size
context_size = 2*k
# generate weight matrix for embeddings
embedding = []
for i in range(10):
embedding.append(np.full(100, i))
embedding = np.array(embedding)
print embedding
model.add(Embedding(input_dim=10, output_dim=100, input_length=context_size, weights=[embedding]))
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(100,)))
model.compile('rmsprop', 'mse')
input_array = np.random.randint(10, size=(1, context_size))
print input_array.shape
output_array = model.predict(input_array)
print output_array.shape
print output_array[0]
I want to use an LSTM neural network with Keras to forecast groups of time series, and I am having trouble making the model match what I want. The dimensions of my data are:
input tensor: (data length, number of series to train, time steps to look back)
output tensor: (data length, number of series to forecast, time steps to look ahead)
Note: I want to keep the dimensions exactly like that, no transposition.
A dummy data code that reproduces the problem is:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed, LSTM
epoch_number = 100
batch_size = 20
input_dim = 4
output_dim = 3
look_back = 24
look_ahead = 24
n = 100
trainX = np.random.rand(n, input_dim, look_back)
trainY = np.random.rand(n, output_dim, look_ahead)
print('train X:', trainX.shape)
print('train Y:', trainY.shape)
model = Sequential()
# Add the first LSTM layer (The intermediate layers need to pass the sequences to the next layer)
model.add(LSTM(10, batch_input_shape=(None, input_dim, look_back), return_sequences=True))
# Add the second LSTM layer (the input dimensions are only needed in the first layer)
model.add(LSTM(10, return_sequences=True))
# the TimeDistributed object allows a 3D output
model.add(TimeDistributed(Dense(look_ahead)))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, nb_epoch=epoch_number, batch_size=batch_size, verbose=1)
This throws:
Exception: Error when checking model target: expected
timedistributed_1 to have shape (None, 4, 24) but got array with shape
(100, 3, 24)
The problem seems to be when defining the TimeDistributed layer.
How do I define the TimeDistributed layer so that it compiles and trains?
The error message is a bit misleading in your case. Your output node of the network is called timedistributed_1 because that's the last node in your sequential model. What the error message is trying to tell you is that the output of this node does not match the target your model is fitting to, i.e. your labels trainY.
Your trainY has a shape of (n, output_dim, look_ahead), so (100, 3, 24) but the network is producing an output shape of (batch_size, input_dim, look_ahead). The problem in this case is that output_dim != input_dim. If your time dimension changes you may need padding or a network node that removes said timestep.
I think the problem is that you expect output_dim (!= input_dim) at the output of TimeDistributed, which is not possible. TimeDistributed treats this dimension as the time dimension, so it is preserved. From the documentation:
The input should be at least 3D, and the dimension of index one will
be considered to be the temporal dimension.
The purpose of TimeDistributed is to apply the same layer to each time step. You can only end up with the same number of time steps as you started with.
If you really need to bring down this dimension from 4 to 3, I think you will need to either add another layer at the end, or use something different from TimeDistributed.
PS: one hint towards finding this issue was that output_dim is never used when creating the model; it only appears when building the target data trainY. While it's only a code smell (there might not be anything wrong with this observation), it's something worth checking.
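The shape behaviour can be reproduced with a NumPy sketch. The helper time_distributed_dense is a hypothetical stand-in for TimeDistributed(Dense(...)), and the random array stands in for the output of the second LSTM layer. The final step shows one possible extra linear map over axis 1, as an illustration of "adding another layer" rather than a recommended architecture.

```python
import numpy as np

def time_distributed_dense(x, w, b):
    """Apply the same dense weights to every slice along axis 1.
    x: (batch, steps, in_features) -> (batch, steps, out_features)."""
    return np.stack([x[:, t, :] @ w + b for t in range(x.shape[1])], axis=1)

rng = np.random.default_rng(0)
n, input_dim, output_dim, units, look_ahead = 100, 4, 3, 10, 24

lstm_out = rng.random((n, input_dim, units))  # stand-in for the second LSTM's output
w, b = rng.normal(size=(units, look_ahead)), np.zeros(look_ahead)

out = time_distributed_dense(lstm_out, w, b)
print(out.shape)  # (100, 4, 24): axis 1 is still input_dim, not output_dim

# Hypothetical extra layer: a linear map over axis 1 that turns 4 series into 3.
w_axis1 = rng.normal(size=(input_dim, output_dim))
out_3 = np.einsum("bid,io->bod", out, w_axis1)
print(out_3.shape)  # (100, 3, 24), which matches trainY
```

The first print reproduces the shape from the error message, and the second shows that the mismatch only disappears once something explicitly acts on axis 1.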