How to reshape the input for Conv1D with 1-dimensional data (Keras)?

My data looks like this:
10000 columns = features and 68 rows = samples; the first column is the label and should be omitted.
How can I reshape it in the proper manner? Currently the performance is quite bad, and I'm guessing it's the encoding.
I'm struggling to see which dimension is which, since my data is 1-dimensional.
Here is the dataset.
Or a minimal representation:
30, 0.5, 0.2, 0.004, 0.001, 0.1, 0.003, 0.0005, 0.003
20, 0.1, 0.003, 0.0005, 0.003, 0.003, 0.1, 0.4, 0.33
25, 0.9, 0.63, 0.0005, 0.003, 0.0005, 0.003, 0.1, 0.003
26, 0.08, 0.83, 0.0005, 0.003, 0.1, 0.003, 0.0005, 0.003
39, 0.003, 0.1, 0.4, 0.33, 0.9, 0.63, 0.0005, 0.003
(First column is age, the rest are numbers between 0 and 1).
And here is how to use it:
import pandas as pd
from sklearn.model_selection import train_test_split

data1_df = pd.read_csv("GSE106648_data1.csv")
data2_df = pd.read_csv("GSE106648_data2.csv")

# Split the data
X1, y1 = data1_df.values[:, 1:], data1_df.values[:, 0]
X2, y2 = data2_df.values[:, 1:], data2_df.values[:, 0]
X1_train, X1_valid, y1_train, y1_valid = train_test_split(X1, y1, test_size=0.2, shuffle=True)
X2_train, X2_valid, y2_train, y2_valid = train_test_split(X2, y2, test_size=0.2, shuffle=True)
How I reshaped it:
sample_size = X1_train.shape[0] # number of samples in train set
time_steps = X1_train.shape[1] # number of features in train set
input_dimension = 1 # each feature is represented by 1 number
# We need to reshape the Test and validation data as well:
X1_train_reshaped = X1_train.reshape(X1_train.shape[0],X1_train.shape[1],1)
X1_valid_reshaped = X1_valid.reshape(X1_valid.shape[0],X1_valid.shape[1],1)
X2_train_reshaped = X2_train.reshape(X2_train.shape[0],X2_train.shape[1],1)
X2_valid_reshaped = X2_valid.reshape(X2_valid.shape[0],X2_valid.shape[1],1)
X1_reshaped = X1.reshape(X1.shape[0],X1.shape[1],1)
X2_reshaped = X2.reshape(X2.shape[0],X2.shape[1],1)
And the model:
from tensorflow.keras import Sequential, layers, optimizers

def conv1D_model():
    n_timesteps = X1_train_reshaped.shape[1]
    n_features = X1_train_reshaped.shape[2]  # 1
    model = Sequential(name="model_conv1D")
    model.add(layers.Input(shape=(n_timesteps, n_features)))
    model.add(layers.Conv1D(filters=8, kernel_size=4, activation='LeakyReLU', name="Conv1D_1"))
    model.add(layers.MaxPooling1D(pool_size=4, name="MaxPooling1D_1"))
    model.add(layers.Flatten())
    model.add(layers.Dense(50, activation='LeakyReLU', name="Dense_1"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(n_features, activation='LeakyReLU', name="output"))
    optimizer = optimizers.Adam(learning_rate=1e-4)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
my_model = conv1D_model()
history = my_model.fit(X1_train_reshaped,y1_train,batch_size=50,epochs=100,validation_data=(X1_valid_reshaped,y1_valid), shuffle=True)

A few remarks in general; some I'll pose as questions, as thinking about them might help you progress as well.
Why is X written in capital letters, while your y is lowercase? Is this the vector-vs-matrix convention?
When trying to reproduce your code, it would also be great to see which packages (and versions) you are using.
In my case you will be able to reproduce with:
python == 3.9
keras == 2.11.0
tensorflow == 2.11.0
pandas == 1.5.2
What do your cryptic (at least for me) input features stand for? Which range of values can you expect from them (you answered that one already)?
What is it your model tries to predict (oh, I see: the age)? What is the range of values for your expected output (this becomes clearer once we know your target is age)?
What is the difference between data1 and data2? Is this your train-validation-test split? But then you split it again in the code, and finally only use X1 for training? Something has to be straightened out here. (I will go ahead and assume that X1 is train and validate while X2 is the pure test set - which is arguably good practice).
Some things on architecture
A convolutional model is perfect for image data, where you have spatial links between pixels and, at the same time, expect patterns to re-emerge across the whole image, so a convolutional filter benefits from weight sharing. I cannot see these characteristics in your problem.
I would go and try with a fully connected model first.
def architecture(num_features):
    # fewer layers will probably also do the trick
    model = Sequential(name="model_Dense")
    model.add(layers.Input(shape=(1, num_features)))
    model.add(layers.Dense(num_features, activation='LeakyReLU', name="Dense_1"))
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(30, activation='LeakyReLU', name="Dense_2"))
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(10, activation='LeakyReLU', name="Dense_3"))
    model.add(layers.Dropout(0.2))
    model.add(layers.Dense(1, activation='LeakyReLU', name="output"))
    optimizer = optimizers.Adam()
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
I might be wrong, as I do not know your data. Ten minutes later, going down your rabbit hole, I realized you are talking of time steps in your code. I hope your samples share the same reference time step at the beginning; otherwise we will really have to rethink the whole approach, maybe going in the direction of recurrent neural networks, or you will have to do some proper preprocessing.
Again, after realizing that each sample has an input dimension of 10k values: to me everything shouts preprocessing. Can you maybe downsample your feature size, as sketched below? This depends on your sample rate and, again, on the type of data and patterns you are trying to learn from (a plot of a time series would also have helped a lot while answering your question and understanding your data).
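For instance, a crude and purely illustrative way to downsample is to keep only every n-th feature; whether that makes sense depends entirely on what the 10k values represent:
# Purely illustrative decimation (the factor 10 is an assumption): keep every
# 10th of the 10000 features, shrinking each sample from 10000 to 1000 values.
downsample_factor = 10
X1_small = X1[:, ::downsample_factor]
print(X1_small.shape)  # (num_samples, 1000)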
Some things on reshaping
If I understood your task correctly, you get an input vector of some fancy features (genes, I guess? Or something from biology) and then want to predict a single value (the age).
To me the way you load the data is pretty much fine already.
# prepare samples and ground truth
X1, y1 = data1_df.values[:, 1:], data1_df.values[:, 0]
num_samples = X1.shape[0]
num_features = X1.shape[1]
X1 = X1.reshape(num_samples, 1, num_features)
I do away with the manual validation splitting by using the Keras built-in validation_split argument, where you only have to set the fraction of samples that is then used for validation.
model.fit(x_train, y_train, batch_size=1, validation_split=0.2, epochs=1)
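Tied together with the dense architecture above, a minimal sketch could look like this (batch size and epoch count are placeholders, not recommendations):
# Minimal sketch, assuming the reshaped X1, y1 and num_features from above and
# the architecture() function defined earlier.
model = architecture(num_features)
history = model.fit(X1, y1,
                    batch_size=16,
                    validation_split=0.2,  # 20% of the samples held out for validation
                    epochs=100,
                    shuffle=True)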
For reproducibility it is advised to set a seed.
import tensorflow as tf
tf.random.set_seed(2)
First training run and sanity checking
Things to check: if increasing your dropout a lot does not decrease your accuracy, you can probably do away with quite a few input values, meaning you can decrease the sample rate.
You have only 67 samples in data1 (everything shouts augmentation, or collect more data), so a batch size larger than that is meaningless, while a batch size of 67 would mean you only have a single weight-update step per epoch. That is great for gradient estimation, but probably not very compute-efficient once your dataset size increases; see the small arithmetic sketch below.
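To make that concrete (simple arithmetic; the batch sizes are arbitrary examples):
import math

# With 67 samples and an 80/20 split you train on roughly 54 samples; the
# number of weight updates per epoch is ceil(n_train / batch_size).
n_train = round(67 * 0.8)  # ~54 training samples
for batch_size in (8, 32, 54):
    print(batch_size, math.ceil(n_train / batch_size))  # 7, 2 and 1 updates per epoch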
I did train this thing, because that is the fun part. After 100 epochs I feel it has pretty much converged already, at an error of approx. 7 years; you tell me whether that prediction accuracy is already good enough for what you are trying to achieve.
Epoch 98/100
7/7 [==============================] - 2s 278ms/step - loss: 261.2974 - mae: 12.7917 - val_loss: 81.9025 - val_mae: 7.2442
Epoch 99/100
7/7 [==============================] - 2s 277ms/step - loss: 269.4073 - mae: 13.1898 - val_loss: 64.5845 - val_mae: 6.4933
Epoch 100/100
7/7 [==============================] - 2s 276ms/step - loss: 251.1944 - mae: 12.9838 - val_loss: 70.1897 - val_mae: 6.6773
An easy next step to decrease your error could be to round your predictions to integers, as your ground-truth ages are all integers (see the sketch below). However, that again depends on which prediction resolution (year, month, etc.) you require.
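Something like this (a tiny illustrative sketch, not part of the original code; it assumes X2 is your held-out set and num_features is defined as above):
import numpy as np

# Reshape X2 the same way as X1 above, predict, then round to whole years.
age_pred = model.predict(X2.reshape(X2.shape[0], 1, num_features))
age_pred_years = np.rint(age_pred).astype(int)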
Looks like a fun project. You'll manage!

Related

Why is my loss function tending to infinity, even though it works appropriately when the x and y coordinates are swapped?

I have a cookie-cutter linear regression PyTorch model to calculate the expected years of experience from an individual's salary. A visualisation of the dataset can be viewed below, where the parameters are as follows:
model = LinearRegressionModel(1, 1) # single dimension
criterion = nn.MSELoss(reduction = "mean") # mean squared error, minimise total loss
learning_rate = 5e-4
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # Stochastic Gradient Descent
EPOCHS = 10000
model = model.double()
The model is as follows:
class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out
When I am applying my training function to the dataset I get the following output
epoch 0, loss 8369994489.500052
epoch 5, loss 5.837550943575215e+79
epoch 10, loss 4.071328967185933e+149
epoch 15, loss 2.8394989130314087e+219
epoch 20, loss 1.9803740110638713e+289
epoch 25, loss inf
epoch 30, loss inf
epoch 35, loss inf
epoch 40, loss inf
epoch 45, loss nan
Where my test train split is as follows:
def test_train_split(df):
    training_data = df.sample(frac=0.8, random_state=25)
    testing_data = df.drop(training_data.index)
    y_train, x_train = (
        training_data["YearsExperience"].to_numpy(),
        training_data["Salary"].to_numpy(),
    )
    y_test, x_test = (
        testing_data["YearsExperience"].to_numpy(),
        testing_data["Salary"].to_numpy(),
    )
    return x_train, y_train, x_test, y_test
However, when I swap my x and y values, thus changing my model to calculate the salary of an individual depending on their experience, my training gives the following output:
epoch 0, loss 9643590644.01929
epoch 5, loss 1910502419.8189254
epoch 10, loss 394543586.1592383
epoch 15, loss 97350361.21930182
epoch 20, loss 39076027.76543479
epoch 25, loss 27637810.070729867
epoch 30, loss 25381050.43396528
epoch 35, loss 24924174.726827644
epoch 40, loss 24820147.1727601
epoch 45, loss 24785300.2845243
epoch 50, loss 24764025.725635834
epoch 55, loss 24745422.391813274
epoch 60, loss 24727353.293723747
The output above is working as intended.
The test/train split is as follows; notice the order of the tuples:
def test_train_split(df):
    training_data = df.sample(frac=0.8, random_state=25)
    testing_data = df.drop(training_data.index)
    x_train, y_train = (
        training_data["YearsExperience"].to_numpy(),
        training_data["Salary"].to_numpy(),
    )
    x_test, y_test = (
        testing_data["YearsExperience"].to_numpy(),
        testing_data["Salary"].to_numpy(),
    )
    return x_train, y_train, x_test, y_test
So my question is:
Why is this happening? AFAIK the model doesn't care about the data, since it's just trying to identify heuristics to minimise the loss, so why can I get a working solution when flipping the axes?
How can I fix my model so it works for my intended question: predicting someone's expected years of experience given their salary as input?
To get the model working with my intended question, I have tried:
Tweaking the learning rate
Trying different optimisers
Trying different loss functions
Changing the number of epochs
The issue arose from an exploding gradient in the loss function: I was using MSELoss and my dataset contained large numbers. The mean squared error was initially high, so the stochastic gradient descent updates pushed the loss to ever higher values; these values were so large that, once squared, they immediately overflowed to np.inf, and by then it was too late to recover the gradient for future iterations.
The solution:
Scale the dataset down whilst drastically decreasing the learning rate (I tried learning rates on the order of 1e-8).
Or
Use a different loss function. This is the solution I went with: I used the mean absolute error, because of its properties, which I learned about from this resource: https://neptune.ai/blog/pytorch-loss-functions
When could it be used?
Regression problems, especially when the distribution of the target variable has outliers, such as small or big values that are a great distance from the mean value. It is considered to be more robust to outliers.
The property above fits my use case, so I implemented it accordingly, which stopped my exploding-gradient issue. A minimal sketch of the fix is below.
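A minimal sketch of that fix, reusing the names from the question (the scaling factor, learning rate and epoch count are illustrative, not the exact values used):
import torch
import torch.nn as nn

# Scale the salary inputs down and switch from MSELoss to mean absolute error.
x_train_t = torch.from_numpy(x_train / 1000.0).double().unsqueeze(1)  # salaries in thousands
y_train_t = torch.from_numpy(y_train).double().unsqueeze(1)

model = LinearRegressionModel(1, 1).double()
criterion = nn.L1Loss()                                   # MAE instead of MSELoss
optimizer = torch.optim.SGD(model.parameters(), lr=5e-4)

for epoch in range(10000):
    optimizer.zero_grad()
    loss = criterion(model(x_train_t), y_train_t)
    loss.backward()
    optimizer.step()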

CNN classifier with keras shows accuracy (and categorical accuracy) of 1 but still many predictions are false

I have trained a CNN classifier and get weird results. When training, it reaches an accuracy of 1 (and also categorical accuracy, whatever the difference might be). However, when I predict on training samples manually, I rarely get the right class after an np.argmax(), which seems very odd. I figured it might be a bad mapping of classes, but after checking the generator's class mapping, it looks ok.
I suspect the way I input the images for testing is different from the way the data generator feeds the images for training, it's the only possible explanation. Here's some code:
datagen = ImageDataGenerator(rescale=1./255)
train_classif_generator = datagen.flow_from_directory('full_ae_output/classifier_classes',target_size=image_dims_original, batch_size=batch_size,shuffle=True, color_mode='grayscale')
classifier = Sequential()
classifier.add(Conv2D(8, (3, 3), padding='same', input_shape=image_input_dims))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2,2), padding='same'))
#2nd convolution layer
classifier.add(Conv2D(8, (3, 3), padding='same'))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2,2), padding='same'))
#3rd convolution layer
classifier.add(Conv2D(16, (3, 3), padding='same'))
classifier.add(Activation('relu'))
classifier.add(MaxPooling2D(pool_size=(2,2), padding='same'))
# Classifier
classifier.add(Flatten())
classifier.add(Dense(n_classes*2,activation='relu'))
#classifier.add(Dense(256, activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(n_classes, activation='softmax'))
classifier.summary()
classifier.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['categorical_accuracy'])
Epoch 1/3
92/92 [==============================] - 108s 1s/step - loss: 0.0638 - categorical_accuracy: 0.9853
Epoch 2/3
92/92 [==============================] - 107s 1s/step - loss: 0.0141 - categorical_accuracy: 0.9969
Epoch 3/3
92/92 [==============================] - 108s 1s/step - loss: 0.0188 - categorical_accuracy: 0.9938
input_class = 10
i = 0
image_path = glob.glob("full_ae_output/classifier_classes/class"+"{0:0=3d}".format(input_class)+"/*")[i]
input_img = np.array([np.array(Image.open(image_path).convert('L').resize(image_dims_original[::-1]))/255])
pred = classifier.predict(np.expand_dims(input_img,axis=3))
print("Predicted class = ",np.argmax(pred[0]))
I didn't recompute the actual accuracy, but I suspect it is lower than 50%, since I never get the right class for any sample I try.
Any ideas what might be going wrong? Is the training accuracy computed by Keras wrong?
Found it! It's just a matter of interpolation: the data generator uses nearest by default, while opencv resize uses bilinear interpolation. It's unbelievable how this difference messes up the classifier and changes all the predictions. I fixed it and get my 100% accuracy now. Issue solved!
The training accuracy computed by Keras is definitely not false.
Your intuition is good: indeed, it is the preprocessing step on the test images that causes this problem.
I would recommend that you load one image, check how ImageDataGenerator works behind the curtains, and verify that the exact same preprocessing steps are applied when you use Pillow; a sketch is below.
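For example, something along these lines should mimic the generator's preprocessing more closely (a sketch reusing the variables from the question; 'nearest' is the ImageDataGenerator default interpolation):
import numpy as np
from tensorflow.keras.preprocessing import image

# Load and resize with the same interpolation the generator uses, then apply
# the same 1/255 rescaling and add the batch dimension.
img = image.load_img(image_path,
                     color_mode='grayscale',
                     target_size=image_dims_original,
                     interpolation='nearest')
x = image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)          # shape: (1, height, width, 1)
pred = classifier.predict(x)
print("Predicted class =", np.argmax(pred[0]))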

Fine-Tuning DistilBertForSequenceClassification: Is not learning, why is loss not changing? Weights not updated?

I am relatively new to PyTorch and Huggingface transformers and experimented with DistilBertForSequenceClassification on this Kaggle dataset.
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
import torch.optim as optim
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup
n_epochs = 5 # or whatever
batch_size = 32 # or whatever
bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
#bert_distil.classifier = nn.Sequential(nn.Linear(in_features=768, out_features=1), nn.Sigmoid())
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(bert_distil.parameters(), lr=0.1)
X_train = []
Y_train = []
for row in train_df.iterrows():
    seq = tokenizer.encode(preprocess_text(row[1]['text']), add_special_tokens=True, pad_to_max_length=True)
    X_train.append(torch.tensor(seq).unsqueeze(0))
    Y_train.append(torch.tensor([row[1]['target']]).unsqueeze(0))
X_train = torch.cat(X_train)
Y_train = torch.cat(Y_train)

running_loss = 0.0
bert_distil.cuda()
bert_distil.train(True)
for epoch in range(n_epochs):
    permutation = torch.randperm(len(X_train))
    j = 0
    for i in range(0, len(X_train), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train[indices], Y_train[indices]
        batch_x.cuda()
        batch_y.cuda()
        outputs = bert_distil.forward(batch_x.cuda())
        loss = criterion(outputs[0], batch_y.squeeze().cuda())
        loss.requires_grad = True
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        j += 1
        if j == 20:
            #print(outputs[0])
            print('[%d, %5d] running loss: %.3f loss: %.3f ' %
                  (epoch + 1, i*1, running_loss / 20, loss.item()))
            running_loss = 0.0
            j = 0
[1, 608] running loss: 0.689 loss: 0.687
[1, 1248] running loss: 0.693 loss: 0.694
[1, 1888] running loss: 0.693 loss: 0.683
[1, 2528] running loss: 0.689 loss: 0.701
[1, 3168] running loss: 0.690 loss: 0.684
[1, 3808] running loss: 0.689 loss: 0.688
[1, 4448] running loss: 0.689 loss: 0.692 etc...
Regardless of what I tried, the loss never decreased or increased, nor did the predictions get better. It seems to me that I forgot something, so that the weights are actually not updated. Does someone have an idea?
What I tried:
Different loss functions
BCE
CrossEntropy
even MSE-loss
One-Hot Encoding vs A single neuron output
Different learning rates, and optimizers
I even changed all the targets to one single label, but even then the network didn't converge.
Looking at the running loss and the minibatch loss is easily misleading. You should look at the epoch loss, because the inputs to it are the same for every epoch.
Besides, there are some problems in your code; after fixing all of them the behavior is as expected: the loss slowly decreases after each epoch, and it can also overfit a small minibatch. Please look at the code; changes include using model(x) instead of model.forward(x), calling cuda() only once, a smaller learning rate, etc.
Tuning and fine-tuning ML models is difficult work.
n_epochs = 5
batch_size = 1

bert_distil = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(bert_distil.parameters(), lr=1e-3)

X_train = []
Y_train = []
for row in train_df.iterrows():
    seq = tokenizer.encode(row[1]['text'], add_special_tokens=True, pad_to_max_length=True)[:100]
    X_train.append(torch.tensor(seq).unsqueeze(0))
    Y_train.append(torch.tensor([row[1]['target']]))
X_train = torch.cat(X_train)
Y_train = torch.cat(Y_train)

running_loss = 0.0
bert_distil.cuda()
bert_distil.train(True)
for epoch in range(n_epochs):
    permutation = torch.randperm(len(X_train))
    for i in range(0, len(X_train), batch_size):
        optimizer.zero_grad()
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train[indices].cuda(), Y_train[indices].cuda()
        outputs = bert_distil(batch_x)
        loss = criterion(outputs[0], batch_y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print('[%d] epoch loss: %.3f' %
          (epoch + 1, running_loss / len(X_train) * batch_size))
    running_loss = 0.0
Output:
[1] epoch loss: 0.695
[2] epoch loss: 0.690
[3] epoch loss: 0.687
[4] epoch loss: 0.685
[5] epoch loss: 0.684
I would highlight two possible reasons for your "stable" results:
I agree that the learning rate is surely too high, which prevents the model from making any significant updates.
But what is important to know is that, based on state-of-the-art papers, fine-tuning has a very marginal effect on the core NLP abilities of Transformers. For example, one paper says that fine-tuning only applies really small weight changes. Citing it: "Finetuning barely affects accuracy on NEL, COREF and REL indicating that those tasks are already sufficiently covered by pre-training". Several papers suggest that fine-tuning for classification tasks is basically a waste of time. Thus, considering that DistilBert is actually a student model of BERT, maybe you won't get better results. Try pre-training with your data first; generally, pre-training has a more significant impact.
I got a similar problem when I tried to use xxxForSequenceClassification to fine-tune my downstream task.
In the end, I changed xxxForSequenceClassification to xxxModel and added Dropout - FC - Softmax. Magically it was solved; the loss decreased as expected.
I'm still trying to find out why.
Hope it may help you.
FYI, transformers version: 3.5.0
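For reference, a minimal sketch of what such an "xxxModel + Dropout - FC" head could look like (the class name, dropout rate and [CLS] pooling choice are my own assumptions, not from the original post; I output raw logits and leave the softmax to nn.CrossEntropyLoss):
import torch.nn as nn
from transformers import DistilBertModel

class DistilBertWithHead(nn.Module):
    """Sketch of a custom classification head on top of the bare DistilBertModel."""
    def __init__(self, n_classes=2, dropout=0.3):
        super().__init__()
        self.backbone = DistilBertModel.from_pretrained('distilbert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(self.backbone.config.dim, n_classes)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.backbone(input_ids, attention_mask=attention_mask)[0]
        cls = hidden[:, 0]                 # [CLS] token representation
        return self.fc(self.dropout(cls))  # raw logits; nn.CrossEntropyLoss applies softmax internally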
Maybe the poor performance is due to gradients being applied to the BERT backbone. Validate it like so:
print([p.requires_grad for p in bert_distil.distilbert.parameters()])
As an alternative solution, try freezing the weights of your trained model:
for param in bert_distil.distilbert.parameters():
    param.requires_grad = False
As you are trying to optimize the weights of a trained model during fine-tuning on your data, you face issues described, among other sources, in the ULMFiT paper (https://arxiv.org/abs/1801.06146).
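If you do freeze the backbone, one option (a sketch, not from the original answer) is to hand only the remaining trainable parameters to the optimizer:
# Only the classification head keeps requires_grad=True after freezing, so the
# optimizer updates just those weights; the learning rate is illustrative.
optimizer = optim.Adam(
    (p for p in bert_distil.parameters() if p.requires_grad), lr=1e-3)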

Inconsistencies of loss function results with Keras

I am implementing a CNN coupled to a multiple instance learning layer. In brief, I've got this, with C the number of categories:
[1 batch of images, 1 label] > CNN > Custom final layer -> [1 vector of size C]
My final layer just sums up the previous layer for the moment. To be clear, 1 batch of inputs gives only 1 single output. The batch therefore corresponds to multiple instances fetched into 1 single bag associated with 1 label.
When I train my model and validate it with the same set:
history = model.fit_generator(
    generator=training_generator,
    steps_per_epoch=training_set.batch_count,
    epochs=max_epoch,
    validation_data=training_generator,
    validation_steps=training_set.batch_count)
I get 2 different results between the training and the validation sets, even though they are the same set:
35/35 [==============================] - 30s 843ms/step - loss: 1.9647 - acc: 0.2857 - val_loss: 1.9403 - val_acc: 0.3714
The loss function is just the categorical cross entropy as implemented in Keras (I've got 3 categories). I have implemented my own loss function to get some insight about what happens. Unfortunately, I obtain another inconsistency between the regular loss and my custom loss function:
35/35 [==============================] - 30s 843ms/step - loss: 1.9647 - acc: 0.2857 - bag_loss: 1.1035 - val_loss: 1.9403 - val_acc: 0.3714 - val_bag_loss: 1.0874
My loss function:
def bag_loss(y_true, y_predicted):
    y_true_mean = keras.backend.mean(y_true, axis=0, keepdims=False)
    y_predicted_mean = keras.backend.mean(y_predicted, axis=0, keepdims=False)
    loss = keras.losses.categorical_crossentropy(y_true_mean, y_predicted_mean)
    return loss
The final layer of my model (I only show the call method, for concision):
def call(self, x):
    x = kb.sum(x, axis=0, keepdims=True)
    x = kb.dot(x, self.kernel)
    x = kb.bias_add(x, self.bias)
    out = kb.sigmoid(x)
    return out
After inspecting the code with TensorBoard and the TensorFlow Debugger, I found that my bag loss and the regular loss do indeed return the same value at some point. But then Keras performs 6 additional additions on the regular sigmoid loss (1 for each layer in my model). Can someone help me untangle this ball of surprising results? I expect the regular loss, the validation loss and my bag loss to be the same.
OK, I finally found the culprit: the L2 regularization, which was turned on while I thought it was off. The regularization term is added to the cross-entropy to calculate the effective loss, as sketched below.
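A small sketch of that bookkeeping (assuming TF 2.x eager execution; y_true and y_pred stand for a batch of labels and model outputs, and are not defined in the original post): the value Keras reports is the data loss plus every penalty collected in model.losses.
import tensorflow as tf

# Data term: plain categorical cross-entropy, averaged over the batch.
data_loss = tf.reduce_mean(
    tf.keras.losses.categorical_crossentropy(y_true, y_pred))
# Regularization term: one entry in model.losses per regularized layer.
reg_loss = tf.add_n(model.losses)
effective_loss = data_loss + reg_loss  # this is what the progress bar reports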
Most of the time, a good night of sleep and a careful inspection of your code are more than enough to answer your question.

No effect of batch_size on number of iterations in model.fit in keras

I have a simple model for demonstration:
input_layer = Input(shape=(100,))
encoded = Dense(2, activation='relu')(input_layer)
X = np.ones((1000, 100))
Y = np.ones((1000, 2))
print(X.shape)
model = Model(input_layer, encoded)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x=X, y=Y, batch_size = 2)
Output is:
2.2.4
(1000, 100)
Epoch 1/1
1000/1000 [==============================] - 3s 3ms/step - loss: 1.3864
Why are there 1000 iterations in one epoch (as shown in the output)?
I tried changing the batch size, but it does not change the output. I guess it should have been 1000/2 = 500. Please explain what is wrong with my understanding and how I can set the batch size appropriately.
Thanks
In model.fit, the numbers on the left of the progress bar count samples, so it is always the current number of samples / total number of samples.
Maybe you are confused because it works differently in model.fit_generator; there you actually see iterations or batches being counted.
Changing batch_size does have an effect: the bar just progresses faster, although you do not explicitly see it as steps. I had the same question in my mind some time ago.
If you want to explicitly see each step, you can use steps_per_epoch and validation_steps.
An example is listed below.
model.fit_generator(training_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=validation_steps)
In this case, steps_per_epoch = number_of_training_samples / batch_size, while validation_steps = number_of_validation_samples / batch_size.
During the training, you will see 500 steps instead of 1000 (provided that you have 1000 training samples and your batch_size is 2).
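To make the arithmetic explicit, using the numbers from the question:
# 1000 samples with batch_size=2 gives 500 gradient updates per epoch, even
# though the Keras 2.2.4 progress bar counts samples (1000/1000), not steps.
num_samples = 1000
batch_size = 2
updates_per_epoch = num_samples // batch_size
print(updates_per_epoch)  # 500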
