Keras, stateless LSTM - keras

Here is a very simple example of LSTM in stateless mode and we train it on a very simple sequence [0–>1] and [0–>2]
Any idea why it won’t converge in stateless mode.?
We have a batch of size 2 with 2 samples and it supposed to keep the state within the batch. When predicting we would like to receive successively 1 and 2.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequence into required data format.
#We are going to extract 2 samples [0–>1] and [0–>2] and convert them into one hot vectors
seqX=numpy.array([[( 1. , 0. , 0.)], [( 1. , 0. , 0.)]])
seqY=numpy.array([( 0. , 1. , 0.) , ( 0. , 0. , 1.)])
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_batch = 2
n_features = n_unique #which is =3
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=( 1, n_features) ))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
model.fit(seqX, seqY, epochs=300, batch_size=n_batch, verbose=2, shuffle=False)
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, batch_size=n_batch, verbose=0)
for i in range(2):
print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))
Example 2
Here I want to clarify a bit what result I want.
Same code example but in stateful mode (stateful=True). It works perfectly. We feed the network 2 times with zeros and get 1 and then 2. But I want to get the same result in stateless mode as it supposed to keep the state within the batch.
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequences into required data format
seqX=numpy.array([[( 1. , 0. , 0.)], [( 1. , 0. , 0.)]])
seqY=numpy.array([( 0. , 1. , 0.) , ( 0. , 0. , 1.)])
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_batch = 1
n_features = n_unique
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, batch_input_shape=(n_batch, 1, n_features), stateful=True ))
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
for epoch in range(300):
model.fit(seqX, seqY, epochs=1, batch_size=n_batch, verbose=2, shuffle=False)
model.reset_states()
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, batch_size=1, verbose=0)
for i in range(2):
print('X=%.1f y=%.1f, yhat=%.1f' % (0, i+1, result[i]))
As a correct result we should get:
Sequence
X=0.0 y=1.0, yhat=1.0
X=0.0 y=2.0, yhat=2.0

You must feed one sequence with two steps instead of two sequences with one step:
One sequence, two steps: seqX.shape = (1,2,3)
Two sequences, one step: seqX.shape = (2,1,3)
The input shape is (numberOfSequences, stepsPerSequence, featuresPerStep)
seqX = [[[1,0,0],[1,0,0]]]
If you want to get both steps for y as output, you must use return_sequences=True.
LSTM(n_neurons, input_shape=( 1, n_features), return_sequences=True)
The entire working code:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy
# define sequences
seq = [0, 1, 0, 2]
# convert sequence into required data format.
#We are going to extract 2 samples [0–>1] and [0–>2] and convert them into one hot vectors
seqX=numpy.array([[[ 1. , 0. , 0.], [ 1. , 0. , 0.]]])
seqY=numpy.array([[[0. , 1. , 0.] , [ 0. , 0. , 1.]]])
#shapes are (1,2,3) - 1 sequence, 2 steps, 3 features
# define LSTM configuration
n_unique = len(set(seq))
n_neurons = 20
n_features = n_unique #which is =3
#no need for batch size
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=( 2, n_features),return_sequences=True))
#the input shape must have two steps
model.add(Dense(n_unique, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='Adam')
# train LSTM
model.fit(seqX, seqY, epochs=300, verbose=2)
#no shuffling and no batch size needed.
# evaluate LSTM
print('Sequence')
result = model.predict_classes(seqX, verbose=0)
print(seqX)
print(result) #all steps are predicted in a single array (with return_sequences=True)

Related

why does my model not reach zero loss / still makes mistakes even though I intentionally overfit?

I am using the cartpole environment of openai and tried to make a DQN that solves it. I saw that the DQN doesn't get better scores even though I trained it for 200 episodes. Thus I tried to see how well the keras model performs that should learn the q values for each action.
# imports & constants
import numpy as np
import time, random
import matplotlib.pyplot as plt
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.callbacks import TensorBoard
import tensorflow as tf
import pickle
from collections import deque
NO_NEURONS = 20
BATCH_SIZE = 32
# make a model
model = Sequential()
model.add(Dense(NO_NEURONS, batch_input_shape=(32,4), activation=tf.nn.relu, name="dense1"))
model.add(Dense(NO_NEURONS, activation=tf.nn.relu, name="dense2"))
model.add(Dense(NO_NEURONS, activation=tf.nn.relu, name="dense3"))
model.add(Dense(NO_NEURONS, activation=tf.nn.relu, name="dense4"))
model.add(Dense(2, activation=tf.nn.sigmoid, name="dense5"))
model.compile(optimizer='adam', loss=tf.keras.losses.MeanSquaredError(), metrics=['MSE'])
print(current_qs.shape, future_qs.shape, len(transitions))
# this gives (32, 2) (32, 2) 32 which is good.
print("transitions: ", transitions[0])
print("old states q: ",old_states[0])
print("new states q: ", new_states[0])
"""
This gives:
transitions: (array([ 0.0209578 , -0.34232047, -0.00570952, 0.5816825 ], dtype=float32), 0, 1.0, array([ 0.01411139, -0.537362 , 0.00592413, 0.8725614 ], dtype=float32), False)
old states q: [ 0.00633551 -0.22556473 -0.00497439 0.24501044]
new states q: [ 1.8242138e-03 -4.2061529e-01 -7.4184951e-05 5.3612018e-01]
"""
# here we update the q values
# the reward should always be 1 because for every extra time the environment isn't done we get rewards.
for index, (old_state, action, reward, observation, done) in enumerate(transitions):
if done: # if done
q = reward # then the right q value is the reward
else:
next_q = np.max(future_qs[index])
q = reward + 0.90*next_q
# otherwise it is the reward + the discount factor times the next maximum q value
current_q = current_qs[index]
current_q[action] = q
x.append(old_state)
y.append(current_q)
x = np.array(x)
y = np.array(y)
print(x.shape, y.shape)
# gives (32, 4) (32, 2) which is good.
# actually fit the model
model.fit(x, y, epochs=500, batch_size=64, shuffle=False, verbose=2)
The last 12 epochs look like this and have exactly the same loss
Epoch 500/500
1/1 - 0s - loss: 0.1357 - MSE: 0.1357 - 5ms/epoch - 5ms/ste
For printing the results:
# see the mistakes still
preds = model.predict(x) for i in range(len(y)):
if np.argmax(preds[i]) != np.argmax(y[i]):
print("WRONG: ", i, np.argmax(preds[i]), preds[i], np.argmax(y[i]), y[i])
I get:
WRONG: 6 0 [0.9999985 0.65476906] 1 [0.55540055 1.5391604 ]
WRONG: 8 0 [1. 0.6853456] 1 [0.5077556 1. ]
WRONG: 20 0 [0.9998766 0.99974597] 1 [0.55703723 1.5438437 ]
WRONG: 25 0 [1. 0.6897808] 1 [0.54609764 1. ]
I thought that since I have more weights in my model that it should be able to perfectly learn the weights. Is that assumption wrong or does anyone see something that I missed?

how to build in Keras Many to Many LTSM with Softmax Layer at the end of each time step

I am struggling to create a many to many LTSM with a softmax at the end of each time step.
I have built:
an input of (400 samples, 2 time steps, 8 input dimensions per time step)
a softmax output to recognize 5 different ranks:
(400 samples, 2 time steps, 5 input dimensions per time step)
ranks 0 1 2 3 4
i.e. [[1. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]]
Note: XX dim (400,2,8)
Note: YY dim (400,2,5)
Then I try to run it after splitting between train and test sets,
but no matter what I change, I keep getting this error:
InvalidArgumentError:
logits and labels must have the same first dimension, got logits shape [64,5] and labels shape [320]
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWith#Logits (defined at \Local\Temp/ipykernel_14204/3401061264.py:9) ]] [Op:__inference_train_function_46581]"
I have spent two days without any progress. I will appreciate an expert advice.
This is my model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
model = Sequential()
model.add(LSTM(8, input_shape=(dim_XX[1],dim_XX[2]), activation='relu', return_sequences=True))
model.add(Dense(5,input_shape=(dim_YY[1],dim_YY[2]),activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'],
)
model.fit(X_train,
y_train,
epochs=1,
validation_data=(X_test, y_test))

Multiple propositions for multiple class prediction

I am working on a word prediction problem. I have examples of career path, and I would like to be able to predict a next person's job using their last 2 jobs. I have built a LSTM model to perform it
I have a problem when intenting to get multiple results from keras model.predict_classes function. It only returns 1 result. I would like to get multiple results, ordered by their probability.
Here is the code :
from numpy import array
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import Embedding
# generate a sequence from a language model
def generate_seq(model, tokenizer, max_length, seed_text, n_words):
in_text = seed_text
# generate a fixed number of words
for _ in range(n_words):
# encode the text as integer
encoded = tokenizer.texts_to_sequences([in_text])[0]
# pre-pad sequences to a fixed length
encoded = pad_sequences([encoded], maxlen=max_length, padding='pre')
# predict probabilities for each word
yhat = model.predict_classes(encoded, verbose=1)
print('yhat = ' + yhat)
#print('yhat : ' + str(yhat))
# map predicted word index to word
out_word = ''
for word, index in tokenizer.word_index.items():
if index == yhat:
out_word = word
break
# append to input
in_text += ' ' + out_word
return in_text
# source text
data = """apprenti electricien chefOdeOprojet \n
soudeur chefOdeOsection directeurOusine\n
mecanicien chefOdeOsection directeurOadjoint\n
ingenieur chefOdeOprojet directeurOadjoint directeurOusine\n
ingenieur chefOdeOprojet \n
apprenti soudeur chefOdeOsection chefOdeOprojet\n
ingenieurOetude chefOdeOprojet\n
ingenieurOetude manager chefOdeOprojet directeurOdepartement\n
apprenti gestionOproduction manager directeurOdepartement\n
ingenieurOetude commercial\n
soudeur ingenieurOetude manager directeurOadjoint\n
ingenieurOetude directeurOdepartement directeurOusine\n
apprenti soudeur\n
agentOsecurite chefOsecurite\n
apprenti mecanicien ouvrier manager\n
commercial directeurOadjoint\n
agentOsecurite chefOsecurite\n
directeurOusine retraite\n
ouvrier manager\n
ingenieur vente\n
secretaire comptable\n
comptable chefOcomptable\n
chefOcomptable directeurOdepartement\n
assistant secretaire comptable\n
assistant comptable\n
assistant secretaire commercial\n
commercial chefOdeOprojet\n
commercial vente chefOdeOprojet\n
electricien chefOdeOsection\n
apprenti ouvrier chefOdeOsection\n"""
# integer encode sequences of words
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
encoded = tokenizer.texts_to_sequences([data])[0]
# retrieve vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# encode 2 words -> 1 word
sequences = list()
for line in data.split('\n'):
encoded = tokenizer.texts_to_sequences([line])[0]
for i in range(2, len(encoded)):
sequence = encoded[i-2:i+1]
sequences.append(sequence)
print('Total Sequences: %d' % len(sequences))
# pad sequences
max_length = max([len(seq) for seq in sequences])
sequences = pad_sequences(sequences, maxlen=max_length, padding='pre')
print('Max Sequence Length: %d' % max_length)
# split into input and output elements
sequences = array(sequences)
X, y = sequences[:,:-1],sequences[:,-1]
y = to_categorical(y, num_classes=vocab_size)
# define model
model = Sequential()
model.add(Embedding(vocab_size, 10, input_length=max_length-1))
model.add(LSTM(50))
model.add(Dropout(0.2))
#model.add(Dense(units = 3, activation = 'relu'))
model.add(Dense(vocab_size, activation='softmax'))
print(model.summary())
# compile network
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit network
model.fit(X, y, epochs=500, verbose=0)
# evaluate model
print(generate_seq(model, tokenizer, max_length-1, 'electricien secretaire', 1))
and there is the console display:
Vocabulary Size: 24
Total Sequences: 20
Max Sequence Length: 3
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 2, 10) 240
_________________________________________________________________
lstm_2 (LSTM) (None, 50) 12200
_________________________________________________________________
dropout_2 (Dropout) (None, 50) 0
_________________________________________________________________
dense_2 (Dense) (None, 24) 1224
=================================================================
Total params: 13,664
Trainable params: 13,664
Non-trainable params: 0
_________________________________________________________________
None
1/1 [==============================] - 0s 86ms/step
yhat = [1]
electricien secretaire chefodeoprojet
If I understand your question correctly, you would like to see the probabilities associated with each class of a multi-classification problem?
The code looks pretty correct to me, but I would recommend trying a different evaluation step. I have gotten multi-classifier outputs with the following snippet:
# Fit the model
print "Fitting model..."
model.fit(np.asarray(self.X), np.asarray(self.Y), epochs=200, batch_size=10)
print "Model fitting complete."
self.TEST = np.asarray(self.TEST).reshape(( test_data.shape[0], 1, 128))
print "Predicting on Test (unseen) data..."
predictions = model.predict(self.TEST)
# Sigmoid predictions
labels = np.zeros(predictions.shape)
labels[predictions>0.5] = 1
print "Prediction labels for unseen: " + str(labels)
The output:
Prediction labels for unseen:
[[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 1. 0.]]
Each row denotes the classification of one sample; the index of the 1 represents which class (A,B,C,D) the sample fell into.

DCGAN - Issue in understanding code

This a part of the code for a Deconvolutional-Convoltional Generative Adversarial Network (DC-GAN)
discriminator.trainable = False
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(input=ganInput, output=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
Issue 1 - I do not understand what the line ganInput = Input(shape=(100,)) does. Clearly ganInput is a variable but what is Input? Is it a function ? If Input is a function then what will ganInput contain ?
Issue 2 - What is the role of the Model API ? I read about in the keras documentation but failed to understand what it is doing here.
Please ask for any further clarification / details you need.
Keras with TensorFlow backend
COMPLETE SOURCE CODE :
https://github.com/yashk2810/DCGAN-Keras/blob/master/DCGAN.ipynb
Line ganInput = Input(shape=(100,)) is just defining the shape of your input
which is a tensor of shape (100,)
The model will include all layers required in the computation of output given input. In the case of multi-input or multi-output models, you can use lists as well:
model = Model(inputs=[ganInput1, ganInput2], outputs=[ganOutput1, ganOutput2, ganOutput3])
Which means to compute ganOutput1, ganOutput2, ganOutput3 the Model api requires
input layers ganInput1, ganInput2
This is necessary for backtracking so that way the Model api has what it needs to calculate the output
this line loads the mnist data : (X_train, Y_train), (X_test, Y_test) = mnist.load_data() .... X_train and Y_train has training data and its corresponding target values .... X_test, Y_test has training data and its corresponding target values
# ======================================================
# Here the data is being loaded
# X_train = training data, Y_train = training targets
# X_test = testing data , Y_test = testing targets
# ======================================================
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# ================================================
# Reshaping the training and testing data
# He has added one extra dimension which is always one
# ================================================
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
# ================================================
# Initially pixel values are in range of 0-255
# he makes the pixel values to be between -1 to 1
#==================================================
X_train = (X_train - 127.5) / 127.5
X_train.shape
# ======================================================================
# He builds the generator model over here
# 1] Dense layer with no of neurons = 128*7*7 & takes 100 numbers as input
# 2] Applying Batch Normalization
# 3] Upsampling layer
# 4] Convolution layer with activation LeakyRELU
# 5] Applying BatchNormalization
# 6] UpSampling2D layer
# 7] Convolution layer with activation LeakyRELU
# ======================================================================
generator = Sequential([
Dense(128*7*7, input_dim=100, activation=LeakyReLU(0.2)),
BatchNormalization(),
Reshape((7,7,128)),
UpSampling2D(),
Convolution2D(64, 5, 5, border_mode='same', activation=LeakyReLU(0.2)),
BatchNormalization(),
UpSampling2D(),
Convolution2D(1, 5, 5, border_mode='same', activation='tanh')
])
generator.summary()
# ======================================================================
# He builds the discriminator model over here
# 1] Convolution layer which takes as input an image of shape (28, 28, 1)
# 2] Dropout layer
# 3] Convolution layer for down-sampling with LeakyReLU as activation
# 4] Dropout layer
# 5] Flatten layer to flatten the output
# 6] 1 output node with sigmoid activation
# ======================================================================
discriminator = Sequential([
Convolution2D(64, 5, 5, subsample=(2,2), input_shape=(28,28,1), border_mode='same', activation=LeakyReLU(0.2)),
Dropout(0.3),
Convolution2D(128, 5, 5, subsample=(2,2), border_mode='same', activation=LeakyReLU(0.2)),
Dropout(0.3),
Flatten(),
Dense(1, activation='sigmoid')
])
discriminator.summary()
generator.compile(loss='binary_crossentropy', optimizer=Adam())
discriminator.compile(loss='binary_crossentropy', optimizer=Adam())
discriminator.trainable = False
# =====================================================================
# Remember above generator takes 100 numbers as input in the first layer
# Dense(128*7*7, input_dim=100, activation=LeakyReLU(0.2))
# Input(shape=(100,)) returns a tensor of this shape (100,)
# ====================================================================
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
# ===========================================================
# giving the input tensor of shape (100,) to generator model
# ===========================================================
x = generator(ganInput)
# ===========================================================
# the output of generator will be of shape (batch_size, 28, 28, 1)
# this output of generator will go to discriminator as input
# Remember we have defined discriminator input as shape (28, 28, 1)
# ===========================================================
ganOutput = discriminator(x)
# =========================================================================
# Now it is clear that generators output is needed as input to discriminator
# You have to tell this to Model api for backpropogation
# Your Model api is the whole model you have built
# it tells you that your model is a combination of generator and discriminator model where that data flow is from generator to discriminator
# YOUR_Model = generator -> discriminator
# This is something like you want to train generator and discriminator as one single model and not as two different models
# but at the same time they are actually being trained individually (Hope this makes sense)
# =========================================================================
gan = Model(input=ganInput, output=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
gan.summary()
def train(epoch=10, batch_size=128):
batch_count = X_train.shape[0] // batch_size
for i in range(epoch):
for j in tqdm(range(batch_count)):
# Input for the generator
noise_input = np.random.rand(batch_size, 100)
# getting random images from X_train of size=batch_size
# these are the real images that will be fed to the discriminator
image_batch = X_train[np.random.randint(0, X_train.shape[0], size=batch_size)]
# these are the predicted images from the generator
predictions = generator.predict(noise_input, batch_size=batch_size)
# the discriminator takes in the real images and the generated images
X = np.concatenate([predictions, image_batch])
# labels for the discriminator
y_discriminator = [0]*batch_size + [1]*batch_size
# Let's train the discriminator
discriminator.trainable = True
discriminator.train_on_batch(X, y_discriminator)
# Let's train the generator
noise_input = np.random.rand(batch_size, 100)
y_generator = [1]*batch_size
discriminator.trainable = False
gan.train_on_batch(noise_input, y_generator)

Keras LSTM predict 1 timestep at a time

Edited to add:
I found what I think is a working solution: https://bleyddyn.github.io/posts/2017/10/keras-lstm/
I'm trying to use a Conv/LSTM network for controlling a robot. I think I have everything set up so I could start training it on batches of data from a replay memory, but I can't figure out how to actually use it to control a robot. Simplified test code is below.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
input_shape=(timesteps,) + input_dim
model = Sequential()
model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), input_shape=input_shape) )
model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(512, return_sequences=True, activation='relu', unroll=True))
model.add(Dense(num_actions, activation='softmax', ))
model.compile(loss='categorical_crossentropy', optimizer='adam' )
return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, timesteps, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# stateful should be false here
pred = model.train_on_batch( batch, y )
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps and I don't think the LSTM internal state would be correct if I tried a rolling queue of input images.
batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
# stateful should be true here
pred = model.predict( image, batch_size=1 )
# take action based on pred
The error I get on the "model.predict( image..." line is:
ValueError: Error when checking : expected time_distributed_1_input to have shape (None, 10, 84, 84, 3) but got array with shape (1, 1, 84, 84, 3)
Which is understandable, but I can't find a way around it.
I don't know Keras well enough to even know if I'm using the TimeDistributed layers correctly.
So, is this even possible in Keras? If so, how?
If not, is it possible in TF or PyTorch?
Thanks for any suggestions!
Edited to add running code, although it's not necessarily correct. Still needs to be tested on an OpenAI gym task.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
input_shape=(1,None) + input_dim
model = Sequential()
model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), batch_input_shape=input_shape) )
model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(512, return_sequences=True, activation='relu', stateful=True))
model.add(Dense(num_actions, activation='softmax', ))
model.compile(loss='categorical_crossentropy', optimizer='adam' )
return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, 1, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# Need to find a way to prevent the optimizer from updating every b, but accumulate updates over an entire batch (batch_size).
for b in range(batch_size):
pred = model.train_on_batch( np.reshape(batch[b,:], (1,timesteps,84,84,3)), np.reshape(y[b,:], (1,timesteps,num_actions)) )
#for t in range(timesteps):
# pred = model.train_on_batch( np.reshape(batch[b,t,:], (1,1,84,84,3)), np.reshape(y[b,t,:], (1,1,num_actions)) )
model.reset_states() # Don't carry internal state between batches
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps
#batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
#pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
# stateful should be true here
pred = model.predict( image, batch_size=1 )
# take action based on pred
print( pred )
The first thing you need is to understand your data.
Do these 5 dimensions mean anything?
I'll try to guess:
- 1 learning example
- 1 time step (this is added by TimeDistributed, normal 2D convolutions don't take this)
- 84 image side
- 84 another image side
- 3 channels (RGB)
The purpose of TimeDistributed is to add that extra timesteps dimension, so you can simulate a sequence in layers that are not supposed to work with sequences.
Your error message is telling you this:
Your input_shape parameter is (None, 10, 84, 84, 3), where None is the batch size (number of samples/examples).
Your input data, which is batch in your code is (1, 1, 84, 84, 3).
There is a mismatch, you are supposed to use batches containing 10 time steps (as defined by your input_shape). It's ok for the stateful=False model to pack 10 images in a batch and train with that.
But later, in the stateful=True case, you will need that input_shape to be just one step. (You either create a new model just for predicting and copy all weights from the training model to the predicting model, or you can try to use None in that time steps dimension, meaning you can train and predict with different amounts of time steps)
Now, differently from the convolutionals, the LSTM layer is already expecting time steps. So you should find a way to squeeze your data in less dimensions.
The LSTM will expect (None, timeSteps, features). The time steps are the same as the previous. 10 for training, 1 for predicting, and you could try to go with None there.
So, instead of a Flatten() inside a TimeDistributed, you should simply reshape the data, condensing the dimensions that are not batch size or steps:
model.add(Reshape((8,9*9*32))) #the batch size doesn't participate in this definition, and it will remain as it is.
The 9*9*32 are the sides of the preceding convolutional and its 32 filters. (I'm just not sure the sides are 9, maybe they're 8, you can see in the current model.summary()).
Finally, for the stateful=True case, you will have to define the model with batch_shape instead of input_shape. The amount of samples in a batch must be a fixed number, because the model will assume the samples in the second batch are new steps belonging to the samples in the previous batch. (The number of samples will then need to be the same for all batches).

Resources