Bi-LSTM with Keras : dimensions must be equal but are 7 and 300 - python-3.x

I am creating a Bi-LSTM with Keras for the first time, but I am having difficulties. To give you the context, here are the steps I have taken:
I created an embedding matrix with GloVe for my x:
def create_embeddings(fichier, dictionnaire, dictionnaire_tokens):
    with open(fichier) as file:
        line = file.readline()
        max_words = max(dictionnaire_tokens.values()) + 1  # 1032
        max_size_dimensions = 300
        emb_matrix = np.zeros((max_words, max_size_dimensions))
        for item, count in dictionnaire_tokens.items():
            # tokens missing from the GloVe dictionary keep their zero row
            vecteur = dictionnaire.get(item)
            if vecteur is not None:
                emb_matrix[count] = vecteur
    return emb_matrix
I one-hot encoded my y's:
def one_hot_encoding(file):
    with open(file) as file:
        line = file.readline()
        liste = []
        while line:
            tag = line.split(" ")[1]
            tag = [tag]
            line = file.readline()
            liste.append(tag)
        one_hot = MultiLabelBinarizer()
        array = one_hot.fit_transform(liste)
    return array
I built and compiled my model with Keras:
from tensorflow.keras.layers import Bidirectional

model = Sequential()
embedding_layer = Embedding(input_dim=1031 + 1,
                            output_dim=300,
                            weights=[embedding_matrix],
                            trainable=False)
model.add(embedding_layer)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=True))
model.add(bilstm_layer)
model.add(Dense(300, activation="relu"))
# crf_layer = CRF(units=len(self.tags), sparse_target=True)
# model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()
Input of my embedding layer (the embedding matrix):
[[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
[ 0. 0. 0. ... 0. 0. 0. ]
...
[-0.068577 -0.71314 0.3898 ... -0.077923 -1.0469 0.56874 ]
[ 0.32461 0.50463 0.72544 ... 0.17634 -0.28961 0.29007 ]
[-0.33771 -0.24912 -0.032685 ... -0.033254 -0.45513 -0.13319 ]]
However, when I train the model, I get the following error: ValueError: Dimensions must be equal, but are 7 and 300 for '{{node binary_crossentropy/mul}} = Mul[T=DT_FLOAT](binary_crossentropy/Cast, binary_crossentropy/Log)' with input shapes: [?,7], [?,300,300].
My embedding matrix was built with GloVe 300d, so it has 300 dimensions, while I only have 7 labels. So I need to make my x and y dimensions compatible, but how? Thank you!

keras.backend.clear_session()
from tensorflow.keras.layers import Bidirectional
model = Sequential()
_input = keras.layers.Input(shape=(300,1))
model.add(_input)
bilstm_layer = Bidirectional(LSTM(units=300, return_sequences=False))
model.add(bilstm_layer)
model.add(Dense(7, activation="relu")) #here 7 is the number of classes you have and None is the batch_size
#crf_layer = CRF(units=len(self.tags), sparse_target=True)
#model.add(crf_layer)
model.compile(optimizer="adam", loss='binary_crossentropy', metrics='acc')
model.summary()
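If you would rather keep the GloVe Embedding layer, a variant worth trying (a sketch of mine, not part of the answer above) is to let the Bi-LSTM return only its last output and size the final Dense layer to the 7 classes, so the model output becomes (batch, 7) and matches the one-hot labels. The padded sequence length of 50 below is only an illustrative assumption:
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

embedding_matrix = np.zeros((1032, 300))      # stand-in for the real GloVe matrix

model = keras.Sequential([
    keras.Input(shape=(50,)),                 # 50 = assumed padded sentence length
    Embedding(input_dim=1032, output_dim=300,
              weights=[embedding_matrix], trainable=False),
    Bidirectional(LSTM(units=300)),           # return_sequences=False by default
    Dense(7, activation="sigmoid"),           # 7 classes -> output shape (batch, 7)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["acc"])
model.summary()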

Related

How to setup a base model in inference mode?

The Keras documentation about fine-tuning states that it is important to "keep the BatchNormalization layers in inference mode by passing training=False when calling the base model." (Interestingly, every non-official example I have found on the topic ignores this setting.)
The documentation follows up with this example:
from tensorflow import keras
from keras.applications.xception import Xception

base_model = keras.applications.Xception(
    weights='imagenet',        # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False)         # Do not include the ImageNet classifier at the top.
base_model.trainable = False

inputs = keras.Input(shape=(150, 150, 3))
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(inputs)
# We make sure that the base_model is running in inference mode here,
# by passing `training=False`. This is important for fine-tuning, as you will
# learn in a few paragraphs.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
The thing is that the example adds preprocessing in front of the base model, while my model (EfficientNetB3) already includes preprocessing, and I don't know how to call my base_model with `training=False` without prepending an additional layer:
base_model = EfficientNetB3(weights='imagenet', include_top=False, input_shape=input_shape)
base_model.trainable=False
model = Sequential()
model.add(base_model) # How to set base_model training=False?
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.2))
model.add(Dense(10, activation="softmax", name="classifier"))
How to prove that training=False or training=True has an effect:
@Frightera explained to me how to "lock" the model's state, and I wanted to prove to myself that the lock happens by checking the BatchNormalization non-trainable variables. My understanding is that if I call the model with training=True, then it should update those variables. However, this is not the case, or am I missing something?
import tensorflow as tf
from tensorflow import keras
from keras.applications.efficientnet import EfficientNetB3
import numpy as np

class WrappedEffNet(keras.layers.Layer):
    def __init__(self, **kwargs):
        super(WrappedEffNet, self).__init__(**kwargs)
        self.model = EfficientNetB3(weights='imagenet',
                                    include_top=False,
                                    input_shape=(224, 224, 3))
        self.model.trainable = False

    def call(self, x, training=False):
        return self.model(x, training=training)  # Modified to pass also True.

base_model_wrapped = WrappedEffNet()
random_vector = tf.random.uniform((1, 224, 224, 3))
o1 = base_model_wrapped(random_vector)
o2 = base_model_wrapped(random_vector, training=False)

# Getting all non-trainable variable values from all BatchNormalization layers.
array_a = np.array([])
for layer in base_model_wrapped.model.layers:
    if hasattr(layer, 'moving_mean'):
        v = layer.moving_mean.numpy()
        array_a = np.concatenate([array_a, v])
        v = layer.moving_variance.numpy()
        array_a = np.concatenate([array_a, v])

o3 = base_model_wrapped(random_vector, training=True)  # Changing to True, shouldn't this update BatchNormalization non-trainable variables?

array_b = np.array([])
for layer in base_model_wrapped.model.layers:
    if hasattr(layer, 'moving_mean'):
        v = layer.moving_mean.numpy()
        array_b = np.concatenate([array_b, v])
        v = layer.moving_variance.numpy()
        array_b = np.concatenate([array_b, v])

print(np.allclose(array_a, array_b))  # Shouldn't this be False?
It is not possible to invoke the call method of the base model inside a Sequential model as you would in the functional API. However, you can treat the model as if it were a custom layer:
class WrappedEffNet(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(WrappedEffNet, self).__init__(**kwargs)
        self.model = keras.applications.EfficientNetB3(weights='imagenet',
                                                       include_top=False,
                                                       input_shape=(224, 224, 3))
        self.model.trainable = False

    def call(self, x, training):
        return self.model(x, training=False)
Sanity check:
base_model_wrapped = WrappedEffNet()
random_vector = tf.random.uniform((1, 224, 224, 3))
o1 = base_model_wrapped(random_vector)
o2 = base_model_wrapped(random_vector, training = False)
o3 = base_model_wrapped(random_vector, training = True)
np.allclose(o1, o2), np.allclose(o1, o3), np.allclose(o2, o3)
# (True, True, True)
It is inference mode regardless of the value of training.
Model summary is the same as Sequential:
Layer (type)                                       Output Shape      Param #
=============================================================================
wrapped_eff_net (WrappedEffNet)                    (1, 7, 7, 1536)   10783535
global_average_pooling2d (GlobalAveragePooling2D)  (1, 1536)         0
dropout (Dropout)                                  (1, 1536)         0
classifier (Dense)                                 (1, 10)           15370
=============================================================================
Total params: 10,798,905
Trainable params: 15,370
Non-trainable params: 10,783,535
_____________________________________________________________________________
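For completeness, a functional-API alternative (a sketch of mine, mirroring the documentation snippet quoted in the question rather than anything from this answer) avoids the wrapper entirely, because a sub-model called in the functional style accepts training=False directly:
import tensorflow as tf
from tensorflow import keras

base_model = keras.applications.EfficientNetB3(
    weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)              # BatchNormalization stays in inference mode
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(10, activation="softmax", name="classifier")(x)
model = keras.Model(inputs, outputs)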
Edit: To see the difference in BatchNormalization's moving statistics:
import tensorflow as tf
import numpy as np
x = np.random.randn(1, 2) * 20 + 0.1
bn = tf.keras.layers.BatchNormalization()
input_layer = tf.keras.layers.Input((x.shape[-1], ))
output = bn(input_layer)
model = tf.keras.Model(inputs=input_layer, outputs=output)
model.trainable = False:
model.trainable = False
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = True -->', model(x, training = True).numpy())
    print('training = False -->', model(x, training = False).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[ 2.5019286 12.437845 ]]
training = False --> [[ 2.5019286 12.437845 ]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[ 2.5019286 12.437845 ]]
training = False --> [[ 2.5019286 12.437845 ]]
model.trainable = True, training = True:
model.trainable = True
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = True -->', model(x, training = True).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0. 0.]
training = True --> [[0. 0.]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.02503179 0.12444062]
training = True --> [[0. 0.]]
model.trainable = True, training = False:
model.trainable = True
for i in range(2):
    print('Input:', x)
    print('Moving mean:', model.layers[1].moving_mean.numpy())
    print('training = False -->', model(x, training = False).numpy())
    print()
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.04981326 0.24763682]
training = False --> [[ 2.476884 12.313342]]
Input: [[ 2.50317905 12.44406219]]
Moving mean: [0.04981326 0.24763682]
training = False --> [[ 2.476884 12.313342]]

Issue with Prediction function using LSTM states in Autoencoder

I am trying to put an embedding layer into the LSTM-based encoder-decoder Keras model (https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/), where the embedding layer expects a 2D decoder input while the decoder output stays 3D.
I am facing an issue with the dimensions of the decoder input in the prediction function.
Error: Error when checking input: expected input_4 to have 2 dimensions, but got array with shape (1, 5, 5)
Here 5 is the original input cardinality, which is being compressed to a fixed vector of size 2 by the embedding.
source = np.array([1, 2, 3])
encoder = source[None, :]
# Source is: [[1 2 3]]            ## 2D encoder input
decoder_input = source[:-1]
shifted_target = np.concatenate(([0], decoder_input))
shifted_target_input = shifted_target[None, :]
# Shifted Target is: [[0 1 2]]    ## 2D decoder input
target_output = encoder
target_output = to_categorical(encoder, num_classes=5)
# Target is: [[[0. 1. 0. 0. 0.]   ## 3D decoder output, cardinality = 5
#              [0. 0. 1. 0. 0.]
#              [0. 0. 0. 1. 0.]]]
#########################################
LSTM Embedding Code:
def define_models(embedding, vocab_size, n_units):
    Embedding_Layer = Embedding(output_dim=embedding, input_dim=vocab_size, input_length=None, name="Embedding")

    encoder_inputs = Input(shape=(None,), name="Encoder_input")
    encoder = LSTM(n_units, return_state=True, name='Encoder_lstm')
    embedding_encode = Embedding_Layer(encoder_inputs)
    print(embedding_encode)
    encoder_outputs, state_h, state_c = encoder(embedding_encode)
    encoder_states = [state_h, state_c]

    decoder_inputs = Input(shape=(None,), name="Decoder_input")
    decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True, name="Decoder_lstm")
    embedding_decode = Embedding_Layer(decoder_inputs)
    decoder_outputs, _, _ = decoder_lstm(embedding_decode, initial_state=encoder_states)
    decoder_dense = Dense(vocab_size, activation='softmax', name="Dense_layer")
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

    ## Inference models used for prediction
    encoder_model = Model(encoder_inputs, encoder_states)
    decoder_state_input_h = Input(shape=(n_units,), name="H_state_input")
    decoder_state_input_c = Input(shape=(n_units,), name="C_state_input")
    decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
    decoder_outputs, state_h, state_c = decoder_lstm(embedding_decode, initial_state=decoder_states_inputs)
    decoder_states = [state_h, state_c]
    decoder_outputs = decoder_dense(decoder_outputs)
    decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
    return model, encoder_model, decoder_model
where Encoder input is 2D, Decoder input is 2D and Decoder output is 3D
# define model
train, infenc, infdec = define_models(2,5,100)
train.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
# train model
train.fit([encoder,shifted_target_input], target_output, epochs=2)
#####################################################
##Prediction function
(https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/):
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
    # encode
    state = infenc.predict(source)
    # start of sequence input
    target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
    # collect predictions
    output = list()
    for t in range(n_steps):
        # predict next char
        yhat, h, c = infdec.predict([target_seq] + state)
        # store prediction
        output.append(yhat[0,0,:])
        # update state
        state = [h, c]
        # update target sequence
        target_seq = yhat
    return array(output)
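The thread does not show a fix, but a possible adaptation (my assumption, not from the original question) is that with the shared Embedding layer the decoder expects integer token indices of shape (1, 1), not a one-hot vector of shape (1, 1, cardinality); the seed token is then updated with the argmax of each prediction:
import numpy as np

def predict_sequence_embedding(infenc, infdec, source, n_steps):
    # encode the 2D integer source sequence into the initial decoder state
    state = infenc.predict(source)
    # start-of-sequence token (index 0), shape (1, 1) to match the Embedding input
    target_seq = np.array([[0]])
    output = []
    for _ in range(n_steps):
        yhat, h, c = infdec.predict([target_seq] + state)
        output.append(yhat[0, 0, :])
        state = [h, c]
        # feed the predicted token index back in, not the probability vector
        target_seq = np.array([[np.argmax(yhat[0, 0, :])]])
    return np.array(output)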

Keras model fit ValueError expected input_1 to be one number larger than the size of my array

I'm trying to create an autoencoder neural network for finding outliers using Keras/TensorFlow. My data is a list of texts with one word per line (139 lines): https://pastebin.com/hEvm6qWg
When I fit my model with my data, I get the error:
ValueError: Error when checking input: expected input_1 to have shape (139,) but got array with shape (140,)
But I can't tell why the input is seen as an array of shape (140,). My entire code is as follows:
from keras import Input, Model
from keras.layers import Dense
from keras.preprocessing.text import Tokenizer
with open('drawables.txt', 'r') as arquivo:
dados = arquivo.read().splitlines()
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(dados)
x_dados = tokenizer.texts_to_matrix(dados, mode="freq")
tamanho = len(tokenizer.word_index)
x = Input(shape=(tamanho,))
# Encoder
hidden_1 = Dense(tamanho, activation='relu')(x)
h = Dense(tamanho, activation='relu')(hidden_1)
# Decoder
hidden_2 = Dense(tamanho, activation='relu')(h)
r = Dense(tamanho, activation='sigmoid')(hidden_2)
autoencoder = Model(input=x, output=r)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_dados, epochs=5, shuffle=False)
I am utterly lost. I can't even tell whether my approach to an autoencoder network is correct. What am I doing wrong?
word_index in Tokenizer starts from 1, not from 0.
Example:
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(["this a cat", "this is a dog"])
print (tokenizer.word_index)
Output:
{'this': 1, 'a': 2, 'cat': 3, 'is': 4, 'dog': 5}
The index starts from 1, not from 0. So when we create the term-frequency matrix using these indices,
x_dados = tokenizer.texts_to_matrix(["this a cat", "this is a dog"], mode="freq")
the shape of x_dados will be 2x6, because numpy arrays are indexed from 0.
So the number of columns in x_dados is 1 + len(tokenizer.word_index).
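A quick check of that claim (my addition, not part of the original answer):
print(x_dados.shape)  # (2, 6): one column per word index, plus the unused column 0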
So to fix your code, change
tamanho = len(tokenizer.word_index)
to
tamanho = len(tokenizer.word_index) + 1
Working sample:
dados = ["this is a cat", "that is a dog and a cat"]*100
tokenizer = Tokenizer(filters='')
tokenizer.fit_on_texts(dados)
x_dados = tokenizer.texts_to_matrix(dados, mode="freq")
tamanho = len(tokenizer.word_index)+1
x = Input(shape=(tamanho,))
# Encoder
hidden_1 = Dense(tamanho, activation='relu')(x)
h = Dense(tamanho, activation='relu')(hidden_1)
# Decoder
hidden_2 = Dense(tamanho, activation='relu')(h)
r = Dense(tamanho, activation='sigmoid')(hidden_2)
autoencoder = Model(input=x, output=r)
print (autoencoder.summary())
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_dados, x_dados, epochs=5, shuffle=False)
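One more detail (my note, not from the answer): newer Keras releases only accept the inputs=/outputs= keyword names when constructing a Model, so the constructor call would become:
autoencoder = Model(inputs=x, outputs=r)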

Multiple propositions for multiple class prediction

I am working on a word-prediction problem. I have examples of career paths, and I would like to predict a person's next job from their last two jobs. I have built an LSTM model to do this.
I have a problem when trying to get multiple results from the Keras model.predict_classes function: it only returns one result. I would like to get multiple results, ordered by their probability.
Here is the code:
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Embedding
# generate a sequence from a language model
def generate_seq(model, tokenizer, max_length, seed_text, n_words):
    in_text = seed_text
    # generate a fixed number of words
    for _ in range(n_words):
        # encode the text as integer
        encoded = tokenizer.texts_to_sequences([in_text])[0]
        # pre-pad sequences to a fixed length
        encoded = pad_sequences([encoded], maxlen=max_length, padding='pre')
        # predict probabilities for each word
        yhat = model.predict_classes(encoded, verbose=1)
        print('yhat = ' + str(yhat))
        #print('yhat : ' + str(yhat))
        # map predicted word index to word
        out_word = ''
        for word, index in tokenizer.word_index.items():
            if index == yhat:
                out_word = word
                break
        # append to input
        in_text += ' ' + out_word
    return in_text
# source text
data = """apprenti electricien chefOdeOprojet \n
soudeur chefOdeOsection directeurOusine\n
mecanicien chefOdeOsection directeurOadjoint\n
ingenieur chefOdeOprojet directeurOadjoint directeurOusine\n
ingenieur chefOdeOprojet \n
apprenti soudeur chefOdeOsection chefOdeOprojet\n
ingenieurOetude chefOdeOprojet\n
ingenieurOetude manager chefOdeOprojet directeurOdepartement\n
apprenti gestionOproduction manager directeurOdepartement\n
ingenieurOetude commercial\n
soudeur ingenieurOetude manager directeurOadjoint\n
ingenieurOetude directeurOdepartement directeurOusine\n
apprenti soudeur\n
agentOsecurite chefOsecurite\n
apprenti mecanicien ouvrier manager\n
commercial directeurOadjoint\n
agentOsecurite chefOsecurite\n
directeurOusine retraite\n
ouvrier manager\n
ingenieur vente\n
secretaire comptable\n
comptable chefOcomptable\n
chefOcomptable directeurOdepartement\n
assistant secretaire comptable\n
assistant comptable\n
assistant secretaire commercial\n
commercial chefOdeOprojet\n
commercial vente chefOdeOprojet\n
electricien chefOdeOsection\n
apprenti ouvrier chefOdeOsection\n"""
# integer encode sequences of words
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]
# retrieve vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# encode 2 words -> 1 word
sequences = list()
for line in data.split('\n'):
    encoded = tokenizer.texts_to_sequences([line])[0]
    for i in range(2, len(encoded)):
        sequence = encoded[i-2:i+1]
        sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))
# pad sequences
max_length = max([len(seq) for seq in sequences])
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)
# split into input and output elements
sequences = array(sequences)
X, y = sequences[:,:-1],sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)
# define model
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dropout(0.2))
#model.add(Dense(units = 3, activation = 'relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=0)
# evaluate model
print(generate_seq(model, tokenizer, max_length-1, 'electricien secretaire', 1))
and there is the console display:
Vocabulary Size: 24
Total Sequences: 20
Max Sequence Length: 3
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 2, 10) 240
_________________________________________________________________
lstm_2 (LSTM) (None, 50) 12200
_________________________________________________________________
dropout_2 (Dropout) (None, 50) 0
_________________________________________________________________
dense_2 (Dense) (None, 24) 1224
=================================================================
Total params: 13,664
Trainable params: 13,664
Non-trainable params: 0
_________________________________________________________________
None
1/1 [==============================] - 0s 86ms/step
yhat = [1]
electricien secretaire chefodeoprojet
If I understand your question correctly, you would like to see the probabilities associated with each class of a multi-class problem?
The code looks pretty correct to me, but I would recommend trying a different evaluation step. I have obtained multi-class outputs with the following snippet:
# Fit the model
print("Fitting model...")
model.fit(np.asarray(self.X), np.asarray(self.Y), epochs=200, batch_size=10)
print("Model fitting complete.")
self.TEST = np.asarray(self.TEST).reshape((test_data.shape[0], 1, 128))
print("Predicting on Test (unseen) data...")
predictions = model.predict(self.TEST)
# Sigmoid predictions
labels = np.zeros(predictions.shape)
labels[predictions > 0.5] = 1
print("Prediction labels for unseen: " + str(labels))
The output:
Prediction labels for unseen:
[[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 1. 0.]]
Each row denotes the classification of one sample; the index of the 1 represents which class (A,B,C,D) the sample fell into.
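To get several candidate jobs ordered by probability, as the question asks, one option (a sketch using the question's model and tokenizer, not code from the original answer) is to rank the softmax output of model.predict instead of taking the single class from predict_classes:
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def predict_top_k(model, tokenizer, max_length, seed_text, k=3):
    # encode and pad the seed text exactly as in generate_seq
    encoded = tokenizer.texts_to_sequences([seed_text])[0]
    encoded = pad_sequences([encoded], maxlen=max_length, padding='pre')
    probs = model.predict(encoded, verbose=0)[0]        # softmax over the vocabulary
    index_word = {index: word for word, index in tokenizer.word_index.items()}
    top = np.argsort(probs)[::-1][:k]                   # indices of the k highest probabilities
    return [(index_word.get(i, '?'), float(probs[i])) for i in top]

print(predict_top_k(model, tokenizer, max_length - 1, 'electricien secretaire', k=3))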

Keras, stateless LSTM

Here is a very simple example of an LSTM in stateless mode, trained on the very simple sequences [0–>1] and [0–>2].
Any idea why it won't converge in stateless mode?
We have a batch of size 2 with 2 samples, and it is supposed to keep the state within the batch. When predicting, we would like to receive 1 and then 2.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequence into required data format.
#We are going to extract 2 samples [0–>1] and [0–>2] and convert them into one hot vectors
seqX=numpy.array([[( 1. , 0. , 0.)], [( 1. , 0. , 0.)]])
seqY=numpy.array([( 0. , 1. , 0.) , ( 0. , 0. , 1.)])
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_batch = 2
n_features = n_unique #which is =3
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=( 1, n_features) ))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
model.fit(seqX, seqY, epochs=300, batch_size=n_batch, verbose=2, shuffle=False)
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, batch_size=n_batch, verbose=0)
for i in range(2):
    print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))
Example 2
Here I want to clarify a bit what result I want.
It is the same code example but in stateful mode (stateful=True), and it works perfectly: we feed the network twice with zeros and get 1 and then 2. But I want to get the same result in stateless mode, since it is supposed to keep the state within the batch.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequences into required data format
seqX=numpy.array([[( 1. , 0. , 0.)], [( 1. , 0. , 0.)]])
seqY=numpy.array([( 0. , 1. , 0.) , ( 0. , 0. , 1.)])
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_batch = 1
n_features = n_unique
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, 1, n_features), stateful=True ))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
for epoch in range(300):
    model.fit(seqX, seqY, epochs=1, batch_size=n_batch, verbose=2, shuffle=False)
    model.reset_states()
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, batch_size=1, verbose=0)
for i in range(2):
    print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))
As a correct result we should get:
Sequence
X=0.0 y=1.0, yhat=1.0
X=0.0 y=2.0, yhat=2.0
You must feed one sequence with two steps instead of two sequences with one step:
One sequence, two steps: seqX.shape = (1,2,3)
Two sequences, one step: seqX.shape = (2,1,3)
The input shape is (numberOfSequences, stepsPerSequence, featuresPerStep)
seqX = [[[1,0,0],[1,0,0]]]
If you want to get both steps for y as output, you must use return_sequences=True.
LSTM(n_neurons, input_shape=( 1, n_features), return_sequences=True)
The entire working code:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequence into required data format.
#We are going to extract 2 samples [0–>1] and [0–>2] and convert them into one hot vectors
seqX=numpy.array([[[ 1. , 0. , 0.], [ 1. , 0. , 0.]]])
seqY=numpy.array([[[0. , 1. , 0.] , [ 0. , 0. , 1.]]])
#shapes are (1,2,3) - 1 sequence, 2 steps, 3 features
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_features = n_unique #which is =3
#no need for batch size
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=( 2, n_features),return_sequences=True))
#the input shape must have two steps
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
model.fit(seqX, seqY, epochs=300, verbose=2)
#no shuffling and no batch size needed.
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, verbose=0)
print(seqX)
print(result) #all steps are predicted in a single array (with return_sequences=True)
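A side note (my addition): predict_classes has been removed from recent versions of tf.keras; with return_sequences=True the equivalent is an argmax over the last axis of model.predict:
import numpy as np

result = np.argmax(model.predict(seqX, verbose=0), axis=-1)  # shape (1, 2), one class per step
print(result)  # expected [[1 2]] once the model has converged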
