Model Fit Keras

I am having some trouble understanding some of the arguments of the model.fit function in Keras.
In my problem I have a total of 1147 samples, and I have split those samples into training and validation sets (80% for training and 20% for validation). I am using the same batch size for training and validation. So I have this:
Total_Samples = 1147
Training_Samples = 918
Validation_Samples = 229
Batch_Size = 16 # For Training and Validation
1st Question: Is the steps_per_epoch = Total_Samples/Batch_Size?
2nd Question: Is the validation_steps = Validation_Samples/Batch_Size?
Thanks in advance!

The steps_per_epoch will be Training_Samples (not Total_Samples) divided by Batch_Size. Similarly, the validation_steps will be Validation_Samples divided by Batch_Size.
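For the numbers in the question, a minimal sketch of the arithmetic (using math.ceil so the last partial batch is also counted; the model and generator names below are just placeholders for illustration):
import math

Training_Samples = 918
Validation_Samples = 229
Batch_Size = 16

steps_per_epoch = math.ceil(Training_Samples / Batch_Size)      # 58
validation_steps = math.ceil(Validation_Samples / Batch_Size)   # 15

# Placeholder call: model, train_generator and validation_generator
# are assumed to be defined elsewhere.
model.fit(train_generator,
          steps_per_epoch=steps_per_epoch,
          epochs=10,
          validation_data=validation_generator,
          validation_steps=validation_steps)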

Related

CNN and batch size

Hello, for a customized CNN that I train on a picture dataset with the method fit_generator, I don't understand why it doesn't work with a low batch size but does work when I increase the batch_size parameter. Could somebody explain to me what's wrong?
nb_train_samples = 700
nb_validation_samples = 70
epochs = 50
batch_size = 5
The CNN training loop doesn't work when the batch size is too low (e.g. 5).
TL;DR: In simple words, your steps_per_epoch and validation_steps cannot be more than len(train_generator) if the generator is already batched (assuming it is not repeated).
The generator is already batched. You are trying to pass more steps than len(training_set) with nb_train_samples // batch_size which is equal to 140.
You don't need to set steps_per_epoch when using generators, unless you want fewer steps.
Example:
train_generator = train_datagen.flow_from_directory(
    ...
    batch_size=20)
train_generator.samples # returns 2000
So in this case len(train_generator) returns 100. If you want to use fewer data points, you can specify steps_per_epoch like:
steps_per_epoch=train_generator.samples // 32  # equals 62
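A short sketch of how that looks when fitting (the model name is assumed; the numbers follow the example above):
# len() of a flow_from_directory iterator is measured in batches:
# ceil(2000 samples / batch_size of 20) = 100.
len(train_generator)   # 100

# Let Keras use the full generator, i.e. 100 steps per epoch:
model.fit_generator(train_generator, epochs=10)

# Or explicitly train on fewer steps per epoch:
model.fit_generator(train_generator,
                    steps_per_epoch=train_generator.samples // 32,  # 62
                    epochs=10)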

Keras fit_generator takes more time per epoch

I am doing image classification using Keras. I have 8k images (input) in the training sample and 2k images (input) in the test sample, with epochs set to 25. I noticed that each epoch is very slow (the first iteration takes approximately an hour).
Can anyone suggest how I can overcome this, and what the reason is that it takes such a long time?
Code below.
#PART1 - Initialise neural network
from keras.models import Sequential
#package to perform the first layer, which is convolution; using 2D as it is for images (for video it would be 3D)
from keras.layers import Convolution2D
#to perform max pooling on convolved layer
from keras.layers import MaxPool2D
#to convert the pool feature map into large feature vector, will be input for ANN
from keras.layers import Flatten
#to add layers on the ANN
from keras.layers import Dense
#STEP -1
#Initializing CNN
classifier = Sequential()
#add convolution layer
classifier.add(Convolution2D(filters=32,kernel_size=(3,3),strides=(1, 1),input_shape= (64,64,3),activation='relu'))
#filters - number of feature detectors that we are going to apply to the image
#kernel_size - dimension of the feature detector
#strides - moving through one unit at a time
#input_shape - shape of the input image on which we are going to apply the filter through the convolution operation;
#we will have to convert the image into that shape in image preprocessing before feeding it into the convolution
#channel 3 for RGB and 1 for grayscale, plus the pixel dimensions
#activation - function we use to introduce non-linearity
#STEP -2
#add pooling
#this step will significantly reduce the size of feature map , and makes it easier for computation
classifier.add(MaxPool2D(pool_size=(2,2)))
#pool_size - factor by which to downscale
#STEP -3
#flatten the feature map
classifier.add(Flatten())
#STEP -4
#hidden layer
classifier.add(Dense(units=128,activation='relu',kernel_initializer='uniform'))
#output layer
classifier.add(Dense(units=1,activation='sigmoid'))
#Compiling the CNN using stochastic gradient descend
classifier.compile(optimizer='adam',loss = 'binary_crossentropy',
                   metrics=['accuracy'])
#loss function should be categorical_crossentropy if the output has more than 2 classes
#PART2 - Fitting CNN to image
#copied from keras documentation
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
    '/Users/arunramji/Downloads/Sourcefiles/CNN_Imageclassification/Convolutional_Neural_Networks/dataset/training_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    '/Users/arunramji/Downloads/Sourcefiles/CNN_Imageclassification/Convolutional_Neural_Networks/dataset/test_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
classifier.fit_generator(
    training_set,
    steps_per_epoch=8000,    #number of input (image)
    epochs=25,
    validation_data=test_set,
    validation_steps=2000)   # number of training sample
classifier.fit(
    training_set,
    steps_per_epoch=8000,    #number of input (image)
    epochs=25,
    validation_data=test_set,
    validation_steps=2000)
You are setting steps_per_epoch to the wrong value (this is why it takes longer than necessary): it should not be the number of data points. steps_per_epoch should be the size of the dataset divided by the batch size, which is 8000/32 = 250 for your training set, and 63 for your validation set.
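One way to write the corrected call for the code above (using these numbers; you can also leave steps_per_epoch unset so Keras falls back to len(training_set)):
classifier.fit_generator(
    training_set,
    steps_per_epoch=250,     # 8000 training images / batch size of 32
    epochs=25,
    validation_data=test_set,
    validation_steps=63)     # ceil(2000 test images / 32)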
Update:
As Matias pointed out in his answer, your steps_per_epoch parameter setting in your fit method led to the huge slowdown per epoch.
From the fit_generator documentation:
steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

validation_steps: Only relevant if validation_data is a generator. Total number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch. It should typically be equal to the number of samples of your validation dataset divided by the batch size. Optional for Sequence: if unspecified, will use the len(validation_data) as a number of steps.
Actually, Keras handles the two parameters inconsistently: the fit method raises a ValueError if you use a plain dataset instead of a data generator and set the parameters like batch_size=batch_size, steps_per_epoch=num_samples:
ValueError: Number of samples 60000 is less than samples required for specified batch_size 200 and steps 60000
But when the data comes from a data generator, it doesn't catch the same problem, which lets you run into an issue like the current one.
I wrote a little example to check this.
The fit method with steps_per_epoch=num_samples:
Number of samples: 60000
Number of samples per batch: 200
Train for 60000 steps, validate for 50 steps
Epoch 1/5
263/60000 [..............................] - ETA: 4:07:09 - loss: 0.2882 - accuracy: 0.9116
with an ETA (estimated time) of 4:07:09, since this is 60000 steps, each a batch of 200 samples.
The same fit with steps_per_epoch=num_samples // batch_size:
Number of samples: 60000
Number of samples per batch: 200
Train for 300 steps, validate for 50 steps
Epoch 1/5
28/300 [=>............................] - ETA: 1:15 - loss: 1.0946 - accuracy: 0.6446
with ETA: 1:15
Solution:
steps_per_epoch=(training_set.shape[0] // batch_size)
validation_steps=(validation_set.shape[0] // batch_size)
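A minimal sketch of those formulas in context, assuming NumPy arrays for the training and validation sets (the shapes below are only illustrative, matching the 60000-sample, batch-size-200 example above):
import numpy as np

batch_size = 200

# Hypothetical arrays standing in for the real data.
training_set = np.zeros((60000, 28, 28))
validation_set = np.zeros((10000, 28, 28))

steps_per_epoch = training_set.shape[0] // batch_size      # 60000 // 200 = 300
validation_steps = validation_set.shape[0] // batch_size   # 10000 // 200 = 50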
Further possible issues regarding performance:
As @SajanGohil wrote in his comment, train_datagen.flow_from_directory performs some tasks like file operations and preprocessing before the actual training process, which sometimes takes more time than the training itself.
To avoid this extra time, you can do the preprocessing separately, once, before the whole training process, and then use the preprocessed data at training time.
In any case, CNNs on large sets of images are rather time- and resource-consuming tasks, which is why GPU usage is generally assumed.

No effect of batch_size on number of iterations in model.fit in keras

I have a simple model for demonstration:
import numpy as np
import keras
from keras.layers import Input, Dense
from keras.models import Model

print(keras.__version__)
input_layer = Input(shape=(100,))
encoded = Dense(2, activation='relu')(input_layer)
X = np.ones((1000, 100))
Y = np.ones((1000, 2))
print(X.shape)
model = Model(input_layer, encoded)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x=X, y=Y, batch_size=2)
Output is:
2.2.4
(1000, 100)
Epoch 1/1
1000/1000 [==============================] - 3s 3ms/step - loss: 1.3864
Why are there 1000 iterations in one epoch (as shown in the output)?
I tried changing the batch size, but it does not change the output. I expected it to be 1000/2 = 500. Please explain what is wrong with my understanding and how I can set the batch size appropriately.
Thanks
In model.fit the numbers in the left part of the progress bar count samples, so it is always the current samples / total number of samples.
Maybe you are confused because it works differently in model.fit_generator, where you actually see iterations or batches being counted.
Changing the batch size does have an effect: the bar progresses faster, although you do not explicitly see it as steps. I had the same question in my mind some time ago.
If you want to explicitly see each step, you can use steps_per_epoch and validation_steps.
An example is listed below.
model.fit_generator(training_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps)
In this case, steps_per_epoch = number_of_training_samples / batch_size, while validation_steps = number_of_validation_samples / batch_size.
During the training, you will see 500 steps instead of 1000 (provided that you have 1000 training samples and your batch_size is 2).
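If you really want the progress bar to count batches instead of samples, one option (a sketch, reusing the X, Y and model defined above) is to wrap the arrays in a simple generator and call fit_generator:
def batch_generator(X, Y, batch_size):
    # Yield (inputs, targets) batches forever, as fit_generator expects.
    while True:
        for i in range(0, len(X), batch_size):
            yield X[i:i + batch_size], Y[i:i + batch_size]

batch_size = 2
steps_per_epoch = X.shape[0] // batch_size   # 1000 // 2 = 500

model.fit_generator(batch_generator(X, Y, batch_size),
                    steps_per_epoch=steps_per_epoch,
                    epochs=1)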

VGG bottleneck features + LSTM in keras

I have pre-stored bottleneck features (.npy files) obtained from VGG16 for around 10k images. Training an SVM classifier (3-class classification) on these features gave me an accuracy of 90% on the test set. These images are obtained from videos. I want to train an LSTM in Keras on top of these features. My code snippet can be found below. The issue is that the training accuracy is not going above 43%, which is unexpected. Please help me debug the issue. I have tried different learning rates.
#Assume all necessary imports are done
classes = 3
frames = 5
channels = 3
img_height = 224
img_width = 224
epochs = 20
#Model definition
model = Sequential()
model.add(TimeDistributed(Flatten(),input_shape=(frames,7,7,512)))
model.add(LSTM(256,return_sequences=False))
model.add(Dense(1024,activation="relu"))
model.add(Dense(3,activation="softmax"))
optimizer = Adam(lr=0.1,beta_1=0.9,beta_2=0.999,epsilon=None,decay=0.0)
model.compile(loss="categorical_crossentropy",optimizer=optimizer,metrics=["accuracy"])
model.summary()
train_data = np.load(open('bottleneck_features_train.npy','rb'))
#final_img_data shape --> 2342,5,7,7,512
#one_hot_labels shape --> 2342,3
model.fit(final_img_data,one_hot_labels,epochs=epochs,batch_size=2)
You are probably overshooting the local minimum because the learning rate is too high. Try decreasing the learning rate to 0.01 to 0.001 and increasing the number of epochs. Also, decrease the Dense layer neurons from 1024 to half of that; otherwise you may overfit.
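A sketch of those suggestions applied to the model in the question (the learning rate, layer size and epoch count below are illustrative values in the suggested ranges, not tuned settings):
model = Sequential()
model.add(TimeDistributed(Flatten(), input_shape=(frames, 7, 7, 512)))
model.add(LSTM(256, return_sequences=False))
model.add(Dense(512, activation="relu"))    # halved from 1024
model.add(Dense(3, activation="softmax"))

optimizer = Adam(lr=0.001)                  # down from 0.1
model.compile(loss="categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])

model.fit(final_img_data, one_hot_labels, epochs=60, batch_size=2)  # more epochs than the original 20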

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word embeddings from spaCy for training a sequence -> label RNN query classifier. Here's my code:
word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index,:] = word_vectors[word]
model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
Here the word_vectors are taken from spacy.vectors and have length 300. The input is an np_array which looks like [0,0,12,15,0...] of dimension 186, where the integers are the token ids in the input, and I've constructed the embedding weight matrix accordingly. The output for each training sample is [0,0,1,0,...0] of length 26, indicating the label that should go with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy is continually decreasing... and by the end of the first epoch/for the rest of training, it's exactly 0 and I'm not sure why this is happening. I've trained plenty of models with keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot? Meaning only one of the elements of the label vector is one and the rest are zero.
If so, then maybe try using a softmax activation with a categorical crossentropy loss like in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will help constrain the network to output probability distributions in the last layer (i.e. the softmax layer's outputs sum to 1).
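A sketch of that change applied to the model in the question (softmax output plus categorical cross-entropy; everything else kept as in the original snippet):
model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
# Softmax + categorical cross-entropy for one-hot labels, as suggested above.
model.add(Dense(v.num_labels, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)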

Resources