TensorFlow Keras model not accepting my Python generator output - keras

I am following some tutorials on setting up my first convolutional neural network for image classification.
The tutorials load all images into memory and pass them to model.fit(). I can't do that because my data set is too large.
I wrote this generator to "drip feed" preprocessed images to model.fit(), but I am getting an error, and because I am a newbie I am having trouble diagnosing it.
The images are processed as greyscale only.
Here is the generator I wrote:
# need to preprocess images in batches because of memory limits
# tdata expects a list of tuples <(string) file_path, (int) class_num>
def image_generator(tdata, batch_size):
    start_from = 0
    while True:
        # Slice array into batch data
        batch = tdata[start_from:start_from+batch_size]
        # Keep track of position
        start_from += batch_size
        # Create batch lists
        batch_x = []
        batch_y = []
        # Read in each input, perform preprocessing and get labels
        for img_path, class_num in batch:
            # Read raw img data as np array
            # Returns as shape (600, 300, 1)
            img_arr = create_np_img_array(img_path)
            # Normalize img data (/255)
            img_arr = normalize_img_array(img_arr)
            # Add to the batch x data list
            batch_x.append(img_arr)
            # Add to the batch y classification list
            batch_y.append(class_num)
        yield (batch_x, batch_y)
Creating an instance of the generator:
img_gen = image_generator(training_data, 30)
Setting up my model like so:
# create the model
model = Sequential()
# input layer has the input_shape param, which is the dimensions of the np array
model.add( Conv2D(256, (3, 3), activation='relu', input_shape = (600, 300, 1)) )
model.add( MaxPooling2D( (2,2)) )
# second hidden layer
model.add( MaxPooling2D((2, 2)) )
model.add( Conv2D(256, (3, 3), activation='relu') )
# third hidden layer
model.add( MaxPooling2D((2, 2)))
model.add( Conv2D(256, (3, 3), activation='relu') )
# fourth hidden layer
model.add( Flatten() )
model.add( Dense(64, activation='relu') )
# output layer
model.add( Dense(2) )
model.summary()
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
# pass generator
model.fit(img_gen, epochs=5)
Then model.fit() fails from trying to call shape on an int.
~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in _get_dynamic_shape(t)
797
798 def _get_dynamic_shape(t):
--> 799 shape = t.shape
800 # Unknown number of dimensions, `as_list` cannot be called.
801 if shape.rank is None:
AttributeError: 'int' object has no attribute 'shape'
Any suggestions on what I've done wrong?

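Converting the outputs of the generator to NumPy arrays, and yielding those instead of the lists, stopped the error:
np_x = np.array(batch_x)
np_y = np.array(batch_y)
It seems Keras didn't accept the labels as a plain Python list of ints: it treats a yielded list as a nested structure of separate elements and reads .shape from each one, and a plain int has no shape, which is the AttributeError in the traceback.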
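For reference, here is a minimal sketch of the generator with that fix applied: it yields NumPy arrays (float32 images, int32 labels) and wraps back to the start when it runs out of data, whereas the original keeps advancing start_from and yields empty batches after one pass. The load_fn parameter is a hypothetical stand-in for the question's create_np_img_array followed by normalize_img_array, which are not shown. Leftover samples that don't fill a last batch are skipped on each pass; that is a design choice of this sketch, not a Keras requirement.

```python
import numpy as np


def image_generator(tdata, batch_size, load_fn):
    """Yield (batch_x, batch_y) NumPy batches forever.

    tdata: list of (file_path, class_num) tuples.
    load_fn: hypothetical callable mapping a file path to a normalized
             (600, 300, 1) array; stands in for the question's
             create_np_img_array + normalize_img_array helpers.
    """
    start_from = 0
    while True:
        # Wrap around when the next slice would run past the end of the data
        if start_from + batch_size > len(tdata):
            start_from = 0
        batch = tdata[start_from:start_from + batch_size]
        start_from += batch_size
        # Stack images into one (batch, 600, 300, 1) array and labels into (batch,)
        batch_x = np.stack([load_fn(path) for path, _ in batch]).astype("float32")
        batch_y = np.array([class_num for _, class_num in batch], dtype="int32")
        yield batch_x, batch_y


# Usage with the question's model. The generator never ends, so fit needs
# steps_per_epoch to know where an epoch stops:
# img_gen = image_generator(training_data, 30, my_load_fn)
# model.fit(img_gen, steps_per_epoch=len(training_data) // 30, epochs=5)
```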

Related

Getting the output from a specific layer in a CNN

I'm trying to understand how to work with CNNs. I would like to be able to extract the "encoded" version of an image from this model. The encoded version should be the output of the Dense(4096) layer.
def get_siamese_model(input_shape):
    # Define the tensors for the two input images
    left_input = Input(input_shape)
    right_input = Input(input_shape)
    # Convolutional Neural Network
    model = Sequential()
    model.add(Conv2D(64, (10,10), activation='relu', input_shape=input_shape,
                     kernel_initializer=initialize_weights, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(128, (7,7), activation='relu',
                     kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(128, (4,4), activation='relu', kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(MaxPooling2D())
    model.add(Conv2D(256, (4,4), activation='relu', kernel_initializer=initialize_weights,
                     bias_initializer=initialize_bias, kernel_regularizer=l2(2e-4)))
    model.add(Flatten())
    model.add(Dense(4096, activation='sigmoid',
                    kernel_regularizer=l2(1e-3),
                    kernel_initializer=initialize_weights,bias_initializer=initialize_bias))
    # Generate the encodings (feature vectors) for the two images
    encoded_l = model(left_input)
    encoded_r = model(right_input)
    # Add a customized layer to compute the absolute difference between the encodings
    L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
    L1_distance = L1_layer([encoded_l, encoded_r])
    # Add a dense layer with a sigmoid unit to generate the similarity score
    prediction = Dense(1,activation='sigmoid',bias_initializer=initialize_bias)(L1_distance)
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)
    # return the model
    return siamese_net
Moreover, can I give only one image as input and get the encoding of that image?
Many thanks!
Edit: I have tried doing this:
layer_name = 'sequential_3'
model2 = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
but I get this error:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 105, 105, 1), dtype=tf.float32, name='conv2d_12_input'), name='conv2d_12_input', description="created by layer 'conv2d_12_input'") at layer "conv2d_12". The following previous layers were accessed without issue: []
And I don't know how to fix it.
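No answer is recorded here, but a likely explanation (hedged; not verified against this exact model) is that the Sequential branch is a nested model reused for both inputs. Its .output tensor belongs to its own internal input (conv2d_12_input in the error), not to siamese_net's inputs, so Model(inputs=model.input, outputs=...) cannot connect them, hence "Graph disconnected". Because the branch is itself a model, it can be used as the encoder directly, e.g. encoder = siamese_net.get_layer('sequential_3') and then encoder.predict(image[np.newaxis]) for one (105, 105, 1) image, which should return a (1, 4096) array. The NumPy sketch below mirrors that structure with made-up weights: one shared encode function used by both branches, which can also be called on a single image by itself. The 7x7 average pooling and the single weight matrix are simplifying assumptions standing in for the Conv2D/MaxPooling2D stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up weights standing in for the trained network (assumption: the real
# encoder is the Conv2D/MaxPooling2D/Dense(4096) Sequential in the question).
W_ENC = rng.normal(scale=0.01, size=(15 * 15, 4096))
W_OUT = rng.normal(scale=0.01, size=(4096, 1))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(images):
    """Shared branch: (batch, 105, 105, 1) -> (batch, 4096) sigmoid encodings."""
    batch = images.shape[0]
    # 7x7 average pooling (105 = 15 * 7) as a cheap stand-in for the conv stack
    pooled = images[..., 0].reshape(batch, 15, 7, 15, 7).mean(axis=(2, 4))
    return sigmoid(pooled.reshape(batch, -1) @ W_ENC)


def siamese(left, right):
    """Both inputs go through the same encode(); L1 distance -> similarity score."""
    l1_distance = np.abs(encode(left) - encode(right))
    return sigmoid(l1_distance @ W_OUT)
```

Calling encode on a batch of one image is the NumPy counterpart of calling the nested Sequential on its own.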

Keras multi-class prediction only returning 1 prediction with softmax and categorical_crossentropy

I'm using Keras and TensorFlow to train a model that predicts a matching font from an image of some letters. My data folder contains a separate folder for each letter, holding images of that letter in varying forms. My code for training the model looks like this:
LETTER_IMAGES_FOLDER = "datasets"
MODEL_FILENAME = "fonts_model.hdf5"
MODEL_LABELS_FILENAME = "model_labels.dat"
data = pd.read_csv('annotations.csv')
paths = list(data['Path'].values)
Y = list(data['Font'].values)
encoder = LabelEncoder()
encoder.fit(Y)
Y = encoder.transform(Y)
Y = np_utils.to_categorical(Y)
data = []
# loop over the input images
for image_file in paths:
    # Load the image and convert it to grayscale
    image = cv2.imread(image_file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Add a third channel dimension to the image to make Keras happy
    image = np.expand_dims(image, axis=2)
    # Add the letter image to our training data (its label is already in Y)
    data.append(image)
data = np.array(data, dtype="float") / 255.0
train_x, test_x, train_y, test_y = model_selection.train_test_split(data,Y,test_size = 0.1, random_state = 0)
# Save the mapping from labels to one-hot encodings.
# We'll need this later when we use the model to decode what its predictions mean
with open(MODEL_LABELS_FILENAME, "wb") as f:
    pickle.dump(encoder, f)
# Build the neural network!
model = Sequential()
# First convolutional layer with max pooling
model.add(Conv2D(20, (5, 5), padding="same", input_shape=(100, 100, 1), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Second convolutional layer with max pooling
model.add(Conv2D(50, (5, 5), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation="relu"))
print (len(encoder.classes_))
model.add(Dense(len(encoder.classes_), activation="softmax"))
# Ask Keras to build the TensorFlow model behind the scenes
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# Train the neural network
model.fit(train_x, train_y, validation_data=(test_x, test_y), batch_size=32, epochs=2, verbose=1)
# Save the trained model to disk
model.save(MODEL_FILENAME)
Once the model has been created, I predict with it as follows:
predictions = model.predict(letter_image)
print(predictions)  # this has a length of 1
The problem is that "predictions" is always an array of size 1, and I'm not sure why. I'm using softmax and categorical_crossentropy, and the last Dense layer has more than one unit. Could someone please tell me why I'm not getting the top n predictions here?
I've also tried sigmoid with binary_crossentropy but got the same result. I think there's something more to it that I'm missing.
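No answer is recorded here. A likely cause (an assumption, since the shape of letter_image isn't shown) is that model.predict always returns one row per input sample, i.e. an array of shape (num_samples, num_classes). For a single image the outer length is 1 and the class probabilities sit in predictions[0]; the input itself must also carry the batch axis, i.e. shape (1, 100, 100, 1). The sketch below uses hypothetical helper names: it prepares one grayscale image the same way as in training and reads the top n classes out of the prediction. Here class_names would be encoder.classes_ from the pickled LabelEncoder.

```python
import numpy as np


def prepare_image(gray):
    """(100, 100) uint8 grayscale -> (1, 100, 100, 1) float batch, scaled like training."""
    x = np.asarray(gray, dtype="float32") / 255.0
    x = np.expand_dims(x, axis=-1)       # channel axis, as in training
    return np.expand_dims(x, axis=0)     # batch axis: one sample


def top_n_predictions(predictions, class_names, n=3):
    """predictions: (1, num_classes) softmax output of model.predict for one image."""
    probs = np.asarray(predictions)[0]            # drop the batch axis
    order = np.argsort(probs)[::-1][:n]           # highest probability first
    return [(class_names[i], float(probs[i])) for i in order]


# Usage (model and encoder as in the question):
# predictions = model.predict(prepare_image(gray_letter))
# print(top_n_predictions(predictions, encoder.classes_, n=3))
```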

Deep learning Keras model CTC_Loss gives loss = infinity

I have a CRNN model for text recognition that was published on GitHub and trained on English.
Now I'm using the same algorithm for Arabic.
My CTC function is:
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
My Model is:
def get_Model(training):
    img_w = 128
    img_h = 64
    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    rnn_size = 128
    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)
    # Initialising the CNN
    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)
    conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)
    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=act, name='dense1')(inner)
    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)
    # transforms RNN output to character activations:
    inner = Dense(num_classes+1, kernel_initializer='he_normal',
                  name='dense2')(concatenate([gru_2, gru_2b]))
    y_pred = Activation('softmax', name='softmax')(inner)
    Model(inputs=input_data, outputs=y_pred).summary()
    labels = Input(name='the_labels', shape=[30], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')
    # Keras doesn't currently support loss funcs with extra parameters
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
    # clipnorm seems to speed up convergence
    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    if training:
        return Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
    return Model(inputs=[input_data], outputs=y_pred)
Then I compile it with the SGD optimizer (I tried both SGD and Adam):
sgd = SGD(lr=0.0000002, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
Then I fit the model on my training set (images of words up to 30 characters, labelled with sequences of length 30):
model.fit_generator(generator=tiger_train.next_batch(),
                    steps_per_epoch=int(tiger_train.n / batch_size),
                    epochs=30,
                    callbacks=[checkpoint],
                    validation_data=tiger_val.next_batch(),
                    validation_steps=int(tiger_val.n / val_batch_size))
Once training starts, it gives me loss = inf. After a lot of searching, I didn't find any similar problem.
So my question is: how can I solve this, and what can make the CTC loss compute an infinite cost?
Thanks in advance
I found the problem: it was a dimensions problem.
For a CRNN OCR model using a CTC layer, if you are detecting a sequence of length n, the image must be wide enough to produce at least n time steps plus one extra for each pair of identical adjacent characters, which is 2*n-1 in the worst case. More is better, until you reach the image/time-steps ratio that lets the CTC layer recognize the letters correctly. If the image width gives fewer time steps than that, the loss becomes nan (or infinite).
This error also happens when the text in an image has two identical characters in a row, e.g. the "pp" in "happen", because CTC must place a blank between them, which costs an extra time step. One workaround is to remove the samples that have this characteristic.
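The condition can be checked before training. In the question's model, img_w = 128 and two 2x2 poolings give 128 // 4 = 32 time steps, and ctc_lambda_func drops the first 2, leaving 30: just enough for a 30-character word with no doubled letters, but too few for one that has any. The helper names below are hypothetical; usable_timesteps only encodes the layer arithmetic of this particular model.

```python
def min_ctc_timesteps(label):
    """Smallest number of time steps CTC needs to emit `label`:
    one per character plus one blank between each pair of equal neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def usable_timesteps(img_w=128, pool_size=2, num_pools=2, cropped=2):
    """Time steps reaching the CTC loss in the question's model: img_w shrinks
    by pool_size per pooling layer, then ctc_lambda_func drops `cropped` steps."""
    return img_w // (pool_size ** num_pools) - cropped


def ctc_feasible(label, **model_kwargs):
    """True if the model has enough time steps to align `label` at all."""
    return min_ctc_timesteps(label) <= usable_timesteps(**model_kwargs)
```

Filtering the training set with ctc_feasible, or widening the input images (which raises usable_timesteps), are two ways to act on the answer above.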

DCGAN - Issue in understanding code

This is part of the code for a Deep Convolutional Generative Adversarial Network (DCGAN):
discriminator.trainable = False
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(input=ganInput, output=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
Issue 1: I do not understand what the line ganInput = Input(shape=(100,)) does. Clearly ganInput is a variable, but what is Input? Is it a function? If Input is a function, what will ganInput contain?
Issue 2: What is the role of the Model API? I read about it in the Keras documentation but failed to understand what it is doing here.
Please ask for any further clarification or details you need.
I am using Keras with the TensorFlow backend.
Complete source code:
https://github.com/yashk2810/DCGAN-Keras/blob/master/DCGAN.ipynb
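The line ganInput = Input(shape=(100,)) just defines your input: Input is a function that returns a symbolic tensor, so ganInput holds a placeholder for a batch of vectors with 100 numbers each. Its full shape is (None, 100), where None is the batch size; no actual data is stored in it until the model is run on real arrays.
The Model will include all layers required to compute the outputs from the given inputs. For multi-input or multi-output models, you can use lists as well:
model = Model(inputs=[ganInput1, ganInput2], outputs=[ganOutput1, ganOutput2, ganOutput3])
This means that to compute ganOutput1, ganOutput2 and ganOutput3, the Model API requires the input layers ganInput1 and ganInput2.
This is necessary so that Model can trace the graph of layers back from the outputs to the inputs; that way it has everything it needs to compute the outputs and to backpropagate through them.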
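To make this concrete, here is a NumPy-only sketch with random, untrained weights (an illustration, not the notebook's code): the noise batch has shape (batch_size, 100), matching Input(shape=(100,)), and the combined model is just the composition D(G(z)), which is what Model(input=ganInput, output=ganOutput) records. The 5x5 convolutions are replaced by 1x1 channel projections and BatchNormalization is omitted; both are simplifying assumptions. Since the real convolutions use 'same' padding, they also leave height and width unchanged, so the shapes match the real generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in weights (hypothetical; the real ones are learned by Keras)
W_DENSE = rng.normal(scale=0.01, size=(100, 128 * 7 * 7))
W_CONV1 = rng.normal(scale=0.1, size=(128, 64))   # 1x1 "conv": 128 -> 64 channels
W_CONV2 = rng.normal(scale=0.1, size=(64, 1))     # 1x1 "conv": 64 -> 1 channel
W_DISC = rng.normal(scale=0.01, size=(28 * 28, 1))


def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)


def upsample2d(x):
    # Nearest-neighbour x2 on height and width, like UpSampling2D()'s default
    return x.repeat(2, axis=1).repeat(2, axis=2)


def toy_generator(z):
    """(batch, 100) noise -> (batch, 28, 28, 1) images in [-1, 1]."""
    h = leaky_relu(z @ W_DENSE).reshape(-1, 7, 7, 128)  # Dense + Reshape
    h = leaky_relu(upsample2d(h) @ W_CONV1)             # (batch, 14, 14, 64)
    return np.tanh(upsample2d(h) @ W_CONV2)             # (batch, 28, 28, 1)


def toy_discriminator(x):
    """(batch, 28, 28, 1) images -> (batch, 1) probability of being real."""
    return 1.0 / (1.0 + np.exp(-(x.reshape(len(x), -1) @ W_DISC)))


def toy_gan(z):
    # The combined model: data flows from the generator into the discriminator
    return toy_discriminator(toy_generator(z))
```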
This line loads the MNIST data: (X_train, Y_train), (X_test, Y_test) = mnist.load_data(). X_train and Y_train hold the training data and its corresponding target values; X_test and Y_test hold the test data and its corresponding target values.
# ======================================================
# Here the data is being loaded
# X_train = training data, Y_train = training targets
# X_test = testing data , Y_test = testing targets
# ======================================================
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# ================================================
# Reshaping the training and testing data
# He has added one extra dimension which is always one
# ================================================
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_train = X_train.astype('float32')
# ================================================
# Initially pixel values are in range of 0-255
# he rescales the pixel values to the range -1 to 1
#==================================================
X_train = (X_train - 127.5) / 127.5
X_train.shape
# ======================================================================
# He builds the generator model over here
# 1] Dense layer with 128*7*7 neurons that takes 100 numbers as input
# 2] Batch Normalization
# 3] Reshape to (7, 7, 128)
# 4] UpSampling2D layer
# 5] Convolution layer with LeakyReLU activation
# 6] Batch Normalization
# 7] UpSampling2D layer
# 8] Convolution layer with tanh activation (outputs in -1 to 1, like the rescaled images)
# ======================================================================
generator = Sequential([
Dense(128*7*7, input_dim=100, activation=LeakyReLU(0.2)),
BatchNormalization(),
Reshape((7,7,128)),
UpSampling2D(),
Convolution2D(64, 5, 5, border_mode='same', activation=LeakyReLU(0.2)),
BatchNormalization(),
UpSampling2D(),
Convolution2D(1, 5, 5, border_mode='same', activation='tanh')
])
generator.summary()
# ======================================================================
# He builds the discriminator model over here
# 1] Convolution layer which takes as input an image of shape (28, 28, 1)
# 2] Dropout layer
# 3] Convolution layer for down-sampling with LeakyReLU as activation
# 4] Dropout layer
# 5] Flatten layer to flatten the output
# 6] 1 output node with sigmoid activation
# ======================================================================
discriminator = Sequential([
Convolution2D(64, 5, 5, subsample=(2,2), input_shape=(28,28,1), border_mode='same', activation=LeakyReLU(0.2)),
Dropout(0.3),
Convolution2D(128, 5, 5, subsample=(2,2), border_mode='same', activation=LeakyReLU(0.2)),
Dropout(0.3),
Flatten(),
Dense(1, activation='sigmoid')
])
discriminator.summary()
generator.compile(loss='binary_crossentropy', optimizer=Adam())
discriminator.compile(loss='binary_crossentropy', optimizer=Adam())
discriminator.trainable = False
# =====================================================================
# Remember above generator takes 100 numbers as input in the first layer
# Dense(128*7*7, input_dim=100, activation=LeakyReLU(0.2))
# Input(shape=(100,)) returns a symbolic tensor of shape (None, 100): a batch of 100-number vectors
# ====================================================================
ganInput = Input(shape=(100,))
# getting the output of the generator
# and then feeding it to the discriminator
# new model = D(G(input))
# ===========================================================
# giving the input tensor of shape (100,) to generator model
# ===========================================================
x = generator(ganInput)
# ===========================================================
# the output of generator will be of shape (batch_size, 28, 28, 1)
# this output of generator will go to discriminator as input
# Remember we have defined discriminator input as shape (28, 28, 1)
# ===========================================================
ganOutput = discriminator(x)
# =========================================================================
# Now it is clear that the generator's output is needed as input to the discriminator
# You have to tell this to the Model API so it can backpropagate through both
# The Model API object is the whole model you have built:
# it is a combination of the generator and discriminator models, where data flows from generator to discriminator
# YOUR_Model = generator -> discriminator
# This way you train the generator and discriminator as one single model rather than as two different models,
# but because discriminator.trainable was set to False before gan.compile(), training gan only updates the generator;
# the discriminator is trained separately through its own compiled model (Keras reads trainable at compile time)
# =========================================================================
gan = Model(input=ganInput, output=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
gan.summary()
def train(epoch=10, batch_size=128):
    batch_count = X_train.shape[0] // batch_size
    for i in range(epoch):
        for j in tqdm(range(batch_count)):
            # Input for the generator
            noise_input = np.random.rand(batch_size, 100)
            # getting random images from X_train of size=batch_size
            # these are the real images that will be fed to the discriminator
            image_batch = X_train[np.random.randint(0, X_train.shape[0], size=batch_size)]
            # these are the predicted images from the generator
            predictions = generator.predict(noise_input, batch_size=batch_size)
            # the discriminator takes in the real images and the generated images
            X = np.concatenate([predictions, image_batch])
            # labels for the discriminator as a NumPy array: 0 = generated, 1 = real
            y_discriminator = np.array([0]*batch_size + [1]*batch_size)
            # Let's train the discriminator
            discriminator.trainable = True
            discriminator.train_on_batch(X, y_discriminator)
            # Let's train the generator: it is rewarded when D labels its images as real
            noise_input = np.random.rand(batch_size, 100)
            y_generator = np.array([1]*batch_size)
            discriminator.trainable = False
            gan.train_on_batch(noise_input, y_generator)

Keras LSTM predict 1 timestep at a time

Edited to add:
I found what I think is a working solution: https://bleyddyn.github.io/posts/2017/10/keras-lstm/
I'm trying to use a Conv/LSTM network to control a robot. I think I have everything set up to start training it on batches of data from a replay memory, but I can't figure out how to actually use it to control the robot. Simplified test code is below.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
    input_shape=(timesteps,) + input_dim
    model = Sequential()
    model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), input_shape=input_shape) )
    model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(512, return_sequences=True, activation='relu', unroll=True))
    model.add(Dense(num_actions, activation='softmax', ))
    model.compile(loss='categorical_crossentropy', optimizer='adam' )
    return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, timesteps, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# stateful should be false here
pred = model.train_on_batch( batch, y )
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps and I don't think the LSTM internal state would be correct if I tried a rolling queue of input images.
batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
    image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
    # stateful should be true here
    pred = model.predict( image, batch_size=1 )
    # take action based on pred
The error I get on the "model.predict( image..." line is:
ValueError: Error when checking : expected time_distributed_1_input to have shape (None, 10, 84, 84, 3) but got array with shape (1, 1, 84, 84, 3)
This is understandable, but I can't find a way around it.
I don't know Keras well enough to even know if I'm using the TimeDistributed layers correctly.
So, is this even possible in Keras? If so, how?
If not, is it possible in TF or PyTorch?
Thanks for any suggestions!
Edited to add running code, although it's not necessarily correct. It still needs to be tested on an OpenAI Gym task.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten, Input
from keras.layers import Convolution2D
from keras.layers.recurrent import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.utils import to_categorical
def make_model(num_actions, timesteps, input_dim, l2_reg=0.005 ):
    input_shape=(1,None) + input_dim
    model = Sequential()
    model.add(TimeDistributed( Convolution2D(8, (3, 3), strides=(2,2), activation='relu' ), batch_input_shape=input_shape) )
    model.add(TimeDistributed( Convolution2D(16, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed( Convolution2D(32, (3, 3), strides=(2,2), activation='relu', ) ))
    model.add(TimeDistributed(Flatten()))
    model.add(LSTM(512, return_sequences=True, activation='relu', stateful=True))
    model.add(Dense(num_actions, activation='softmax', ))
    model.compile(loss='categorical_crossentropy', optimizer='adam' )
    return model
batch_size = 16
timesteps = 10
num_actions = 6
model = make_model( num_actions, 1, (84,84,3) )
model.summary()
# Fake training batch. Would be pulled from a replay memory
batch = np.random.uniform( low=0, high=255, size=(batch_size,timesteps,84,84,3) )
y = np.random.randint( 0, high=5, size=(160) )
y = to_categorical( y, num_classes=num_actions )
y = y.reshape( batch_size, timesteps, num_actions )
# Need to find a way to prevent the optimizer from updating every b, but accumulate updates over an entire batch (batch_size).
for b in range(batch_size):
    pred = model.train_on_batch( np.reshape(batch[b,:], (1,timesteps,84,84,3)), np.reshape(y[b,:], (1,timesteps,num_actions)) )
    #for t in range(timesteps):
    #    pred = model.train_on_batch( np.reshape(batch[b,t,:], (1,1,84,84,3)), np.reshape(y[b,t,:], (1,1,num_actions)) )
    model.reset_states() # Don't carry internal state between batches
# move trained network to robot
# This works, but it isn't practical to not get outputs (actions) until after 10 timesteps
#batch = np.random.uniform( low=0, high=255, size=(1,timesteps,84,84,3) )
#pred = model.predict( batch, batch_size=1 )
# This is what I would need to do on my robot, with the LSTM keeping state between calls to predict
max_time = 10 # or 100000, or forever, etc.
for i in range(max_time) :
    image = np.random.uniform( low=0, high=255, size=(1,1,84,84,3) ) # pull one image from camera
    # stateful should be true here
    pred = model.predict( image, batch_size=1 )
    # take action based on pred
    print( pred )
The first thing you need is to understand your data.
Do these 5 dimensions mean anything?
I'll try to guess:
- 1: number of samples (learning examples) in the batch
- 1: time steps (this axis is added by TimeDistributed; normal 2D convolutions don't take it)
- 84: image height
- 84: image width
- 3: channels (RGB)
The purpose of TimeDistributed is to add that extra time-steps dimension, so that layers which are not meant to work with sequences can be applied to every step of a sequence.
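Conceptually, TimeDistributed folds the time axis into the batch axis, runs the wrapped layer once, and unfolds the result, so the same layer (and the same weights) is applied to every step. A NumPy sketch of that idea, not Keras's actual implementation, using a Flatten-like function as the wrapped layer and the 9x9x32 output of the question's last convolution:

```python
import numpy as np


def time_distributed(layer_fn, x):
    """Apply layer_fn, written for (batch, ...) inputs, to every time step of
    x with shape (batch, timesteps, ...), sharing the same layer across steps."""
    batch, steps = x.shape[:2]
    y = layer_fn(x.reshape(batch * steps, *x.shape[2:]))  # fold time into batch
    return y.reshape(batch, steps, *y.shape[1:])          # unfold it again


def flatten(x):
    # Like Flatten(): keep the first axis, merge all the others
    return x.reshape(len(x), -1)
```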
Your error message is telling you this:
Your input_shape parameter is (10, 84, 84, 3), so the model expects arrays of shape (None, 10, 84, 84, 3), where None is the batch size (number of samples).
Your input data, which is image in your code, has shape (1, 1, 84, 84, 3).
There is a mismatch: the model expects sequences of 10 time steps (as defined by your input_shape). It's fine for the stateful=False model to pack 10 images into each sequence and train with that.
But later, in the stateful=True case, you will need the time-steps dimension to be just one step. You can either create a separate model just for predicting and copy all the weights from the training model into it, or you can try using None in the time-steps dimension, which lets you train and predict with different numbers of time steps.
Unlike the convolutional layers, the LSTM layer already expects time steps, so you should find a way to squeeze your data into fewer dimensions.
The LSTM expects (None, timeSteps, features). The time steps are the same as in the previous layers: 10 for training, 1 for predicting, or you could try None there.
So, instead of a Flatten() inside a TimeDistributed, you can simply reshape the data, condensing the dimensions that are not batch size or steps:
model.add(Reshape((10, 9*9*32))) # 10 time steps; the batch size doesn't participate in this definition and stays as it is
The 9*9*32 are the sides of the preceding convolutional layer's output and its 32 filters. (For an 84x84 input the sides should be 9; you can confirm this in model.summary().)
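The side length can also be computed rather than read off model.summary(). For a convolution without padding (Keras's default padding='valid'), each side becomes (side - kernel) // stride + 1. A small sketch for the question's three 3x3, stride-2 convolutions:

```python
def conv_output_side(side, kernel=3, stride=2):
    """Output side of a 'valid' (unpadded) convolution."""
    return (side - kernel) // stride + 1


def stack_output_side(side, num_layers=3, kernel=3, stride=2):
    """Apply conv_output_side once per convolution layer in the stack."""
    for _ in range(num_layers):
        side = conv_output_side(side, kernel, stride)
    return side


side = stack_output_side(84)            # 84 -> 41 -> 20 -> 9
features_per_step = side * side * 32    # 32 filters in the last convolution
```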
Finally, for the stateful=True case, you have to define the model with batch_shape (batch_input_shape in a Sequential model) instead of input_shape. The number of samples in a batch must be fixed, because the model assumes the samples in the second batch are new steps belonging to the samples in the previous batch, so every batch must contain the same number of samples.
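Why stateful=True makes one-step prediction work can be seen with a tiny NumPy LSTM cell (random weights, tanh activation; an illustrative sketch, not Keras's implementation, and TinyLSTM is a hypothetical name). Feeding the 10 steps one call at a time while keeping h and c between calls gives the same final output as processing the whole sequence in one call; starting from a fresh state on every call, which is effectively what a stateless model does, does not.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Single-sample LSTM cell that can keep (h, c) between calls."""

    def __init__(self, input_dim, units, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
        self.U = rng.normal(scale=0.1, size=(units, 4 * units))
        self.b = np.zeros(4 * units)
        self.units = units
        self.reset_states()

    def reset_states(self):
        self.h = np.zeros(self.units)
        self.c = np.zeros(self.units)

    def step(self, x):
        # Input, forget, candidate and output gates from current input and previous h
        i, f, g, o = np.split(x @ self.W + self.h @ self.U + self.b, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return self.h

    def predict(self, steps, stateful):
        # A stateless model starts from zero state on every call
        if not stateful:
            self.reset_states()
        out = self.h
        for x in steps:
            out = self.step(x)
        return out
```

On the robot, the equivalent is calling model.reset_states() at the start of an episode and then model.predict on (1, 1, 84, 84, 3) inputs, as in the edited code in the question.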
