I'm trying to prepare a model that will predict the first two numbers from a given array of numbers. So, the input dataset is like this -
[1 2 3 5]
[4 8 5 9]
[10 2 3 15]
Output will be -
[1 2]
[4 8]
[10 2]
So, the architectures of RNNs look like the ones below (taken from here).
The basic architecture I'm trying to achieve should be something close to this -
So, it should be a Many-To-Many network. (Resembles the fourth image)
Question - So, how can I create this type of model with Keras?
My Findings -
I tried something like this -
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_samples = 10000
input = np.random.randint(5, 10, (n_samples, 5))
output = input[..., 0:2]
rinp = input.reshape(n_samples, 1, 5)  # only one time step per sample

model = Sequential()
model.add(LSTM(10, input_shape=(1, 5)))
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(rinp, output, epochs=1000, batch_size=500, verbose=1)
But as you can see, this is not even close; it behaves like an MLP. It does not utilize any time steps, because the input shape is (n_samples, 1, 5), so there is only one time step.
So, my implementation is wrong.
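(As an aside, one way to at least expose real time steps to the LSTM is to treat each number in the array as its own step, i.e. shape (n_samples, 5, 1) instead of (n_samples, 1, 5). The sketch below only illustrates that reshaping, with arbitrary layer sizes; it still produces the output once at the end rather than per step, so it is not yet the many-to-many architecture described above.)

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_samples = 10000
data = np.random.randint(5, 10, (n_samples, 5)).astype('float32')
target = data[:, 0:2]
rinp = data.reshape(n_samples, 5, 1)  # each of the 5 numbers becomes one time step

model = Sequential()
model.add(LSTM(10, input_shape=(5, 1)))  # the LSTM now sees 5 time steps
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(rinp, target, epochs=100, batch_size=500, verbose=1)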
I've seen One-to-One, Many-to-One and Many-to-Many examples from here.
In the Many-to-Many example, the author used the following code snippet.
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0, :, 0]:
    print('%.1f' % value)
As you can see from the X and y values, the described model is like the one below -
Which is not the one I'm trying to achieve.
Any example regarding the architecture I'm trying to implement would be greatly helpful.
It looks like you are trying to build a Sequence-to-Sequence (seq2seq) model based on the drawing. There is a very nice tutorial online to get you started. Instead of predicting sentences, you can just predict fixed length tokens of length 2. This architecture and variants are often used for machine translation. Based on your data, I'm guessing you are trying to experiment with the problem of long-term dependencies in recurrent networks; otherwise it wouldn't make sense to use seq2seq for any practical purposes.
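For illustration, here is a minimal encoder-decoder sketch of that idea in Keras, using toy data shaped like the question's arrays. The layer sizes, epoch count and batch size are arbitrary assumptions, not a tuned solution:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# Toy data: inputs are length-4 sequences, targets are their first two values.
n_samples, in_len, out_len = 10000, 4, 2
X = np.random.randint(5, 10, (n_samples, in_len, 1)).astype('float32')
y = X[:, :out_len, :]

model = Sequential()
model.add(LSTM(32, input_shape=(in_len, 1)))   # encoder: reads the whole input sequence
model.add(RepeatVector(out_len))               # repeat the context vector for each output step
model.add(LSTM(32, return_sequences=True))     # decoder: emits out_len steps
model.add(TimeDistributed(Dense(1)))           # one value per output step
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=50, batch_size=256, verbose=1)

print(model.predict(X[:1]))  # should be close to the first two values of X[0]

The RepeatVector/TimeDistributed pair is the usual way in Keras to get a fixed-length many-to-many output whose length differs from the input length.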
Related
I just started learning about neural networks, and I found that TensorFlow (Keras) seems to be a reasonable tool for this purpose. I'd like to know how to start configuring a neural network for the purpose of optimizing photonic structures. Basically, I have a lot of numerical results that relate a certain geometry of the structure to a resulting spectrum, e.g. every geometry is defined by 3 numbers: radius, grating period and grating position, and every spectrum is an array of 200 numbers in the range (0:1). I have ~1000 such results. Now I'd like to define a neural network using all the spectra as inputs and the geometry parameters I used as outputs, so the input should be a matrix of size (1000, 200) and the output should be a matrix of size (1000, 3). Then, I hope that the trained network can take a desired spectrum as input and find or predict the geometry of the corresponding photonic structure. Is this doable with TensorFlow/Keras?
My first guess would be something like this:
inputs = tf.keras.layers.Input(shape=(1000, 200), batch_size=1)
x = tf.keras.layers.Dense(300, activation="relu")(inputs)
x = tf.keras.layers.Dense(300, activation="relu")(x)
outputs = tf.keras.layers.Dense(3, activation="relu")(x)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs, name="my_model_name")
However, to be honest, I'm confused by all the answers I've already found about using multiple inputs of many dimensions and multiple outputs in the network, so I'd appreciate any help.
UPDATE:
I think I found code that works; however, I'm not sure if it is the optimal version. Could anyone verify this? I use data with shapes (600, 201) and (600, 3) to train the network.
from keras.models import Sequential
from keras.layers import Dense

def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(200, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(200, activation='relu'))
    model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
    model.compile(loss='mae', optimizer='adam')
    return model

# yyy: spectra of shape (600, 201), xxx: geometry parameters of shape (600, 3)
model = get_model(n_inputs, n_outputs)
history = model.fit(yyy, xxx, verbose=0, epochs=100)
yhat = model.predict(yyy2)
print('Predicted: %s' % yhat)
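For comparison, here is a rough sketch of the same idea in the tf.keras functional API. The key point is that Input takes the per-sample shape, e.g. (200,) for one spectrum, not (1000, 200); the 1000 (or 600) is just the number of rows passed to fit. The names spectra and geometry are placeholders for your arrays:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(200,))           # one spectrum of 200 values per sample
x = tf.keras.layers.Dense(200, activation='relu')(inputs)
x = tf.keras.layers.Dense(200, activation='relu')(x)
outputs = tf.keras.layers.Dense(3)(x)                  # radius, grating period, grating position
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
model.compile(loss='mae', optimizer='adam')

# spectra: (n_samples, 200), geometry: (n_samples, 3)
model.fit(spectra, geometry, epochs=100, verbose=0)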
The above may sound ideal, but I'm trying to predict one step ahead - i.e. with a look_back of 1. My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

def create_scaled_datasets(data, scaler_transform, train_perc=0.9):
    # Set training size
    train_size = int(len(data) * train_perc)
    # Reshape for scaler transform
    data = data.reshape((-1, 1))
    # Scale data to range (0, 1)
    data_scaled = scaler_transform.fit_transform(data)
    # Reshape again
    data_scaled = data_scaled.reshape((-1, 1))
    # Split into train and test data, keeping time order
    train, test = data_scaled[0:train_size + 1, :], data_scaled[train_size:len(data), :]
    return train, test

# Instantiate scaler transform
scaler = MinMaxScaler(feature_range=(0, 1))

# Build the model
model = Sequential()
model.add(LSTM(5, input_shape=(1, 1), activation='tanh', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(12, activation='tanh', return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(2, activation='tanh', return_sequences=False))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Create train/test data sets (data is the raw 1-D series loaded earlier)
train, test = create_scaled_datasets(data, scaler)

# Labels are the values shifted one step ahead (look_back of 1)
trainY = []
for i in range(len(train) - 1):
    trainY = np.append(trainY, train[i + 1])

train = np.reshape(train, (train.shape[0], 1, train.shape[1]))
plotting_test = test
test = np.reshape(test, (test.shape[0], 1, test.shape[1]))

model.fit(train[:-1], trainY, epochs=150, verbose=0)
testPredict = model.predict(test)

plt.plot(testPredict, 'g')
plt.plot(plotting_test, 'r')
plt.show()
with output plot of:
In essence, what I want to achieve is for the model to predict the next value, and I attempt to do this by training on the actual values as the features, with the labels being the actual values shifted along by one (look_back of 1). Then I predict on the test data. As you can see from the plot, the model does a pretty good job, except that it doesn't seem to be predicting the future, but instead seems to be predicting the present... I would expect the plot to look similar, except with the green line (the predictions) shifted one point to the left. I have tried increasing the look_back value, but it always seems to do the same thing, which makes me think I'm training the model wrong, or attempting to predict incorrectly. If I am reading this wrong and the model is indeed doing what I want but I'm interpreting it wrong (also very possible), how do I then predict further into the future?
To add to #MSalters' comment, and somewhat based on this: it is possible, although not guaranteed, that you could "help" your model learn something better than the identity if you force it to learn not the actual value of the next step, but instead the difference from the current step to the next.
To take this one step further, you could also keep an exponential moving average and learn the difference from that, somewhat like was done here.
In short, it makes statistical sense to predict the same value, as it is a low-risk guess; learning the difference instead may not collapse to zero in the same way.
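A rough sketch of that difference-learning idea, on a made-up 1-D series (the data, window length and layer sizes are purely illustrative assumptions):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Hypothetical series; substitute your own data.
series = np.sin(np.linspace(0, 50, 2000)) + np.random.normal(0, 0.05, 2000)

look_back = 10
X, y = [], []
for i in range(len(series) - look_back):
    X.append(series[i:i + look_back])
    # Target is the *difference* between the next value and the last value in the window.
    y.append(series[i + look_back] - series[i + look_back - 1])
X = np.array(X).reshape(-1, look_back, 1)
y = np.array(y)

model = Sequential()
model.add(LSTM(16, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=20, batch_size=32, verbose=0)

# Recover an actual forecast by adding the predicted difference back to the last observed value.
last_window = series[-look_back:].reshape(1, look_back, 1)
next_value = series[-1] + model.predict(last_window, verbose=0)[0, 0]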
Other things I noticed:
Dropout - there's no need for any regularization before you've managed to over-fit; it just complicates debugging.
Just one step into the past - you are likely losing a lot of required information, which in effect forces your net to have no idea what to do and so guess the same value. If you gave it even a single additional value from the past, it could form a decent approximation of the derivative. That sounds important (only you know).
I'm trying to build a classification model for a production line. If I understand correctly, it's possible to use a CNN to classify numerical data (and not only pictures).
My data is an array of 21 columns per line:
20 different measurements, and the last column is a type. It can be 0, 1 or 2.
Each line of the array uses a timestamp as its index.
Type 0 represents 80% of the production and does not need extra treatment,
but types 1 and 2 need extra treatment after production (so I need to clearly identify them).
To recreate something a CNN can use, I created a dataset where each label's learning data is an array made of the 20 lines preceding its position.
So each label has, as its corresponding learning data, a square array of 20x20 measurements (like a picture).
(The data has already been normalized using a ColumnTransformer.)
After reading about unbalanced datasets, I decided to include one type-0 sample each time I found a type 1 or 2. In the end my dataset is about 18,000 lines, with data shape (18206, 20, 20).
My learning model is pretty basic and looks like this:
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras import optimizers

train, test, train_label, test_label = train_test_split(X, y, test_size=0.3, shuffle=True)

## Call CNN model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic * 3, input_shape=(sizePic, sizePic,), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

# Compile model
sgd = optimizers.SGD(lr=0.03)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())

# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200, verbose=1)

# Evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r" % self.learning_scores)
At the end, the evaluation scores are:
scores [0.6088506683505354, 0.7341632843017578]
I have been changing parameters like batch_size and learning rate, but with no big improvement. To my understanding, it's better to start this way rather than by adding layers to the model - is this correct?
Any suggestions?
Thanks for your time.
You are not using any conv layers, only fully connected layers (and don't be afraid of adding some conv layers, because they have far fewer parameters than dense layers).
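A minimal sketch of what that could look like on the 20x20 windows, assuming the same (18206, 20, 20) input and integer labels; the filter counts and layer sizes here are arbitrary:

from keras.models import Sequential
from keras.layers import Reshape, Conv2D, MaxPooling2D, Flatten, Dense

sizePic = 20
model = Sequential()
# Treat each 20x20 window as a single-channel "image".
model.add(Reshape((sizePic, sizePic, 1), input_shape=(sizePic, sizePic)))
model.add(Conv2D(16, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])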
I have a sequence and I would like to do the simplest LSTM possible to predict the rest of the sequence.
Meaning I want to start by using only the previous step to predict the next one and then add more steps.
I want to use the predicted values as inputs also.
So I believe what I want to achieve is many-to-many, as mentioned in the answers to Understanding Keras LSTMs.
I have read other questions on the topic on Stack Overflow but still haven't managed to make it work. In my code, I'm using the tutorial here https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/ and its create_dataset function to create two arrays with only a shift of one step.
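For reference, the create_dataset helper from that tutorial looks roughly like this (reproduced from memory, so treat the exact indexing as approximate):

import numpy

def create_dataset(dataset, look_back=1):
    # dataset: array of shape (n, 1); returns windows of length look_back and the value that follows each one
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)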
Here is my code and the error I got.
"Here I'm scaling my data as advised"
scaler = MinMaxScaler(feature_range=(0, 1))
Rot = scaler.fit_transform(Rot)
"I'm creating the model using batch_size=1 but I'm not sure why this is necessary"
batch_size = 1
model = Sequential()
model.add(LSTM(1,batch_input_shape=(batch_size,1,1),stateful=True,return_sequences=True,input_shape=(None,1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
"I want to use only the previous value for now"
look_back = 1
"as len(Rot) = 41000 I'm taking 36000 for training"
train_size = 36000
X,Y = create_dataset(Rot[:train_size,:],look_back)
X = numpy.reshape(X,(X.shape[0], X.shape[1], 1))
Y = numpy.reshape(Y,(X.shape[0], X.shape[1], 1))
And now I train my network as advised by @Daniel Möller:
epochs = 10
for epoch in range(epochs):
    model.reset_states()
    model.train_on_batch(X, Y)
And I get this error:
PartialTensorShape: Incompatible shapes during merge: [35998,1] vs. [1,1]
[[{{node lstm_11/TensorArrayStack/TensorArrayGatherV3}}]]
Do you know why I get such an error? It seems I did everything as in the topic mentioned above.
In this LSTM network batch_size=1 because it is stateful. When stateful=True, the train-set and test-set sizes must be evenly divisible by batch_size (the remainder must be zero).
batch_input_shape=(batch_size, 1, 1) is already defined, so why define input_shape=(None, 1) again?
return_sequences=True is only needed when another LSTM (or a time-distributed layer) follows the existing LSTM layer, which is not the case here.
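Putting those points together, a sketch of a corrected setup could look like the following. It assumes X and Y come from create_dataset as in the question, with X reshaped to (n, 1, 1) and Y to (n, 1):

from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size = 1
look_back = 1

model = Sequential()
# Only batch_input_shape is given (input_shape would be redundant), and
# return_sequences is dropped because only a Dense layer follows.
model.add(LSTM(1, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# X: (n, 1, 1), Y: (n, 1), produced by create_dataset as in the question.
# With stateful=True, train without shuffling and reset the state between epochs.
for epoch in range(10):
    model.fit(X, Y, batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()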
I would like to implement an LSTM in Keras for streaming time-series prediction -- i.e., running online, getting one data point at a time. This is explained well here, but as one would assume, the training time for an online LSTM can be prohibitively slow. I would like to train my network on mini-batches, and test (run prediction) online. What is the best way to do this in Keras?
For example, a mini-batch could be a sequence of 1000 data values ([33, 34, 42, 33, 32, 33, 36, ... 24, 23]) that occur at consecutive time steps. To train the network I've specified an array X of shape (900, 100, 1), where there are 900 sequences of length 100, and an array y of shape (900, 1). E.g.,
X[0] = [[33], [34], [42], [33], ...]]
X[1] = [[34], [42], [33], [32], ...]]
...
X[899] = [..., [24]]
y[899] = [23]
So for each sequence X[i], there is a corresponding y[i] that represents the next value in the time-series -- what we want to predict.
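In code, that window construction might look roughly like this (make_windows is just an illustrative helper, not from any library):

import numpy as np

def make_windows(series, window=100):
    # series: 1-D array of consecutive values, e.g. [33, 34, 42, ...]
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.array(X).reshape(-1, window, 1)   # e.g. (900, 100, 1) for 1000 values
    y = np.array(y).reshape(-1, 1)           # e.g. (900, 1)
    return X, y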
In test I want to predict the next data values 1000 to 1999. I do this by feeding an array of shape (1, 100, 1) for each step from 1000 to 1999, where the model tries to predict the value at the next step.
Is this the recommended approach and setup for my problem? Enabling statefulness may be the way to go for a purely online implementation, but in Keras this requires a consistent batch_input_shape in training and testing, which would not work for my intent of training on mini-batches and then testing online. Or is there a way I can do this?
UPDATE: Trying to implement the network as #nemo recommended
I ran my own dataset on an example network from a blog post "Time Series Prediction with LSTM Recurrent Neural Networks in Python with Keras", and then tried implementing the prediction phase as a stateful network.
The model building and training is the same for both:
import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Create and fit the LSTM network
numberOfEpochs = 10
look_back = 30

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=numberOfEpochs, batch_size=1, verbose=2)

# trainX.shape = (6883, 30, 1)
# trainY.shape = (6883,)
# testX.shape = (3375, 30, 1)
# testY.shape = (3375,)
Batch prediction is done with:
trainPredict = model.predict(trainX, batch_size=batch_size)
testPredict = model.predict(testX, batch_size=batch_size)
To try a stateful prediction phase, I ran the same model setup and training as before, but then the following:
w = model.get_weights()
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
trainPredictions, testPredictions = [], []
for trainSample in trainX:
    trainPredictions.append(model.predict(trainSample.reshape((1, look_back, 1)), batch_size=batch_size))
trainPredict = numpy.concatenate(trainPredictions).ravel()
for testSample in testX:
    testPredictions.append(model.predict(testSample.reshape((1, look_back, 1)), batch_size=batch_size))
testPredict = numpy.concatenate(testPredictions).ravel()
To inspect the results, the plots below show the actual (normalized) data in blue, the predictions on the training set in green, and the predictions on the test set in red.
The first figure is from using batch prediction, and the second from stateful. Any ideas what I'm doing incorrectly?
If I understand you correctly you are asking if you can enable statefulness after training. This should be possible, yes. For example:
from keras.layers import Input, SimpleRNN, Dense
from keras.models import Model

# Train a stateless model first
inp = Input(shape=(timesteps, 1))
net = Dense(1)(SimpleRNN(units, stateful=False)(inp))
model = Model(inputs=inp, outputs=net)
model.fit(...)
w = model.get_weights()
# Rebuild the same architecture with stateful=True (fixed batch size) and reuse the weights
inp = Input(batch_shape=(1, timesteps, 1))
net = Dense(1)(SimpleRNN(units, stateful=True)(inp))
model = Model(inputs=inp, outputs=net)
model.set_weights(w)
After that you can predict in a stateful way.
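For online prediction, a sketch of the loop might then look like this (stream_of_windows is a placeholder for however you receive data, one window of shape (1, timesteps, 1) at a time):

for x_t in stream_of_windows:
    y_t = model.predict(x_t, batch_size=1)   # the RNN state carries over between calls
model.reset_states()                          # clear the state when the stream ends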