How to change the batch size during training? - keras

During training, at each epoch, I'd like to change the batch size (for experimental purposes).
Creating a custom Callback seems appropriate, but batch_size isn't a member of the Model class.
The only way I see would be to override fit_loop and expose batch_size to the callback at each loop. Is there a cleaner or faster way to do it without using a callback?

For others who land here: I found the easiest way to adjust the batch size in Keras is simply to call fit more than once (with different batch sizes):
model.fit(X_train, y_train, batch_size=32, epochs=20)
# ...continue training with a larger batch size
model.fit(X_train, y_train, batch_size=512, epochs=10)

I think it is better to use a custom data generator so you have control over the data you pass to the training loop: you can generate batches of different sizes, process data on the fly, etc. Here is an outline:
def data_gen(data):
    while True:  # generator yields forever
        # process data into a batch; it could be any size
        # it's your responsibility to construct the batch
        yield x, y  # here x and y are a single batch
Now you can train with model.fit_generator(data_gen(data), steps_per_epoch=100), which will consume 100 batches per epoch. You can also use a Sequence if you want to encapsulate this inside a class, as sketched below.
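For example, here is a minimal, hypothetical sketch of the Sequence approach (the class name, array names, and batch sizes are made up for illustration); each __getitem__ call can return a batch of whatever size you assign to that index:

import numpy as np
from keras.utils import Sequence

class VariableBatchSequence(Sequence):
    def __init__(self, x, y, batch_sizes):
        # batch_sizes holds one entry per batch, e.g. [32, 64, 128]
        self.x, self.y = x, y
        self.offsets = np.concatenate([[0], np.cumsum(batch_sizes)])

    def __len__(self):
        return len(self.offsets) - 1  # number of batches per epoch

    def __getitem__(self, idx):
        start, end = self.offsets[idx], self.offsets[idx + 1]
        return self.x[start:end], self.y[start:end]

Training with model.fit_generator(VariableBatchSequence(X_train, y_train, [32, 64, 128])) would then run three batches of different sizes per epoch.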

For most purposes the accepted answer is the best: don't change the batch size. 99% of the time this question comes up, there's probably a better way.
For the 1% who do have an exceptional case where changing the batch size mid-network is appropriate, there's a GitHub discussion that addresses this well:
https://github.com/keras-team/keras/issues/4807
To summarize it: Keras doesn't want you to change the batch size, so you need to cheat and add a dimension, telling Keras it's working with a batch_size of 1. For example, your batch of 10 CIFAR-10 images was sized [10, 32, 32, 3]; now it becomes [1, 10, 32, 32, 3]. You'll need to reshape this appropriately throughout the network. Use tf.expand_dims and tf.squeeze to add and remove a dimension trivially.
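As a rough illustration of that shape trick (just a sketch, not code from the linked issue):

import tensorflow as tf

# a "real" batch of 10 CIFAR-10 images, shape [10, 32, 32, 3]
batch = tf.random.uniform([10, 32, 32, 3])

# wrap it so Keras sees a batch_size of 1: shape becomes [1, 10, 32, 32, 3]
wrapped = tf.expand_dims(batch, axis=0)

# later, inside the network, drop the dummy dimension again: back to [10, 32, 32, 3]
unwrapped = tf.squeeze(wrapped, axis=0)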

Related

How to sample along with another dataloader in PyTorch

Assume I have train/valid/test datasets with batch_size set and shuffling enabled as usual.
When I do train/valid/test, I want to sample a certain number (call it memory_size) of new samples from the entire dataset for each sample.
For example, I set batch_size to 256, let the dataset be shuffled, and set memory_size to 80.
In every forward step, I want to use not only each sample from the dataloader, but also memory_size samples drawn from the entire original dataset, and use them inside forward. Call these new samples the Memory (yes, I want to adopt the idea from Memory Networks). The Memory can overlap between samples in the train set.
I'm using PyTorch and PyTorch-Lightning. Can I create a new memory dataloader for each of train_dataloader, val_dataloader, and test_dataloader and load it alongside the original dataloader, or is there a better way to achieve what I want?
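One possible approach (just a sketch with made-up tensor shapes, not an established recipe) is to keep the normal dataloaders and draw the Memory directly from the underlying dataset inside each step, rather than building a second dataloader:

import torch
from torch.utils.data import DataLoader, TensorDataset

full_dataset = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))
train_loader = DataLoader(full_dataset, batch_size=256, shuffle=True)
memory_size = 80

def training_step(batch):
    x, y = batch
    # draw a fresh Memory of memory_size samples from the entire dataset
    mem_idx = torch.randint(0, len(full_dataset), (memory_size,))
    memory_x = torch.stack([full_dataset[i][0] for i in mem_idx])
    # ...use x, y and memory_x inside the forward pass
    return x, y, memory_x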

using flow_from_directory for training and validation, without augmentation

I am training a simple CNN with Nt=148 + Nv=37 images for training and validation respectively. I used the ImageDataGenerator.flow_from_directory() method because I plan to use data augmentation in the future, but for the time being I don't want any data augmentation. I just want to read the images from disk one by one (and each exactly once; this is primarily important for the validation) to avoid loading all of them into memory.
But the following makes me think that something different than expected is happening:
- The training and validation accuracies take values which do not resemble a fraction with 148 or 37 as the denominator. Trying to estimate a reasonable denominator from a submultiple of the deltas leads to numbers much bigger than 148 and 37 (about 534 or 551; see (*) below for why I think they should be multiples of 19).
- Verifying all predictions on both the training and validation datasets (with a separate program, which reads the validation directory only once and doesn't use the above generators) shows a number of failures which is not exactly (1-val_acc)*Nv, as I would expect.
(*) Lastly, I found that the batch size I used for both is 19, so I expect that I am providing 19*7=133 or 19*8=152 training images per epoch and 19 or 38 images as the validation set at each epoch end.
By the way: is it possible to use model.fit_generator() with generators built from ImageDataGenerator.flow_from_directory() to achieve:
- no data augmentation
- both generators respectively supplying all images to the training process and to the validation process exactly once per epoch
- shuffling is fine, and actually desired, so that each epoch runs differently
In the meantime I am leaning toward setting the batch size equal to the validation set length (i.e. 37). Since it divides the training set size evenly, I think the numbers should work out.
But I am still unsure whether the following code achieves the requirement of "no data augmentation at all":
valid_augmenter = ImageDataGenerator(rescale=1./255)
val_batch_size = 37
train_generator = train_augmenter.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=val_batch_size,
    class_mode='binary',
    color_mode='grayscale',
    follow_links=True )
validation_generator = valid_augmenter.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    batch_size=val_batch_size,
    class_mode='binary',
    color_mode='grayscale',
    follow_links=True )
There are some issues in your situation.
First of all, that number of images is quite low. Gather a lot more images and use augmentation.
Second, typical splits of the total data are:
80% for training
20% for validation.
Put the images you select in folders with those proportions.
Third, you can check whether your code generates data by adding this argument to your flow_from_directory call, after the last argument (and putting a comma after that last argument):
save_to_dir='folder_to_see_augmented_images'
Then run the model (compile, and then fit) and check the contents of the save_to_dir folder.
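Concretely, the generator call from the question would then look something like this (a sketch based on the code above; the directory name is just a placeholder):

train_generator = train_augmenter.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=val_batch_size,
    class_mode='binary',
    color_mode='grayscale',
    follow_links=True,
    save_to_dir='folder_to_see_augmented_images' )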

Normalization of input data in Keras

One common task in DL is that you normalize input samples to zero mean and unit variance. One can "manually" perform the normalization using code like this:
import numpy as np

mean = np.mean(X, axis=0)
std = np.std(X, axis=0)
X = [(x - mean) / std for x in X]
However, one must then keep the mean and std values around to normalize the test data, in addition to the trained Keras model. Since the mean and std are learnable parameters, perhaps Keras can learn them? Something like this:
m = Sequential()
m.add(SomeKerasLayerForNormalizing(...))  # hypothetical normalizing layer
m.add(Conv2D(20, (5, 5), input_shape=(21, 100, 3), padding='valid'))
# ... rest of the network
m.add(Dense(1, activation='sigmoid'))
I hope you understand what I'm getting at.
Add BatchNormalization as the first layer and it works as expected, though not exactly like the OP's example. You can see the detailed explanation here.
Both the OP's example and batch normalization use a learned mean and standard deviation of the input data during inference. But the OP's example uses a simple mean that gives every training sample equal weight, while the BatchNormalization layer uses a moving average that gives recently-seen samples more weight than older samples.
Importantly, batch normalization works differently from the OP's example during training. During training, the layer normalizes its output using the mean and standard deviation of the current batch of inputs.
A second distinction is that the OP's code produces an output with a mean of zero and a standard deviation of one. Batch Normalization instead learns a mean and standard deviation for the output that improves the entire network's loss. To get the behavior of the OP's example, Batch Normalization should be initialized with the parameters scale=False and center=False.
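A minimal sketch of that setup (layer sizes borrowed from the OP's snippet; this is an illustration, not a drop-in equivalent of the manual normalization):

from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, Flatten, Dense

m = Sequential()
# normalize inputs; scale=False, center=False to mimic plain zero-mean / unit-variance
m.add(BatchNormalization(scale=False, center=False, input_shape=(21, 100, 3)))
m.add(Conv2D(20, (5, 5), padding='valid'))
m.add(Flatten())
m.add(Dense(1, activation='sigmoid'))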
There's now a Keras layer for this purpose, Normalization. At time of writing it is in the experimental module, keras.layers.experimental.preprocessing.
https://keras.io/api/layers/preprocessing_layers/core_preprocessing_layers/normalization/
Before you use it, you call the layer's adapt method with the data X you want to derive the scale from (i.e. mean and standard deviation). Once you do this, the scale is fixed (it does not change during training). The scale is then applied to the inputs whenever the model is used (during training and prediction).
import keras
from keras.layers.experimental.preprocessing import Normalization

norm_layer = Normalization()
norm_layer.adapt(X)

model = keras.Sequential()
model.add(norm_layer)
# ... continue as usual.
Maybe you can use sklearn.preprocessing.StandardScaler to scale your data. This object lets you save the scaling parameters, so you can reuse them later. You can then combine the scaled features with other inputs to your model as mixed data, for example:
Your_model
[param1_scaler, param2_scaler]
Here are some links:
https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
https://keras.io/getting-started/functional-api-guide/
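A short sketch of the StandardScaler idea (assuming X_train and X_test are 2-D arrays of shape (n_samples, n_features); the fitted scaler can be pickled and kept next to the model):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from the training data
X_test_scaled = scaler.transform(X_test)        # reuses the same mean and std

model.fit(X_train_scaled, y_train, epochs=10)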
There's BatchNormalization, which learns mean and standard deviation of the input. I haven't tried using it as the first layer of the network, but as I understand it, it should do something very similar to what you're looking for.

value of steps per epoch passed to keras fit generator function

What is the need for setting the steps_per_epoch value when calling fit_generator(), when ideally it should be the number of total samples / batch size?
Keras' generators are infinite.
Because of this, Keras cannot know by itself how many batches the generators should yield to complete one epoch.
When you have a static number of samples, it makes perfect sense to use samples // batch_size for one epoch. But you may want to use a generator that performs random data augmentation, for instance. Because of the random process, you will never have two identical training epochs, so there is no clear limit.
So, these parameters in fit_generator allow you to control the yields per epoch as you wish, although in standard cases you'll probably keep to the most obvious option: samples//batch_size.
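In code, the standard case looks something like this (a sketch; train_generator, X_train, and the numbers are placeholders):

batch_size = 32
steps = len(X_train) // batch_size  # the obvious choice: samples // batch_size

model.fit_generator(train_generator, steps_per_epoch=steps, epochs=20)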
Without data augmentation, the number of samples is static as Daniel mentioned.
Then, the number of samples for training is steps_per_epoch * batch size.
By using ImageDataGenerator in Keras, we create additional training data through augmentation, so the effective number of training samples can be set by yourself.
If you want twice the training data, just set steps_per_epoch to (original sample size * 2) / batch_size.

How to set batch size and epoch value in Keras for infinite data set?

I want to feed images to a Keras CNN. The program randomly feeds either an image downloaded from the net, or an image of random pixel values. How do I set batch size and epoch number? My training data is essentially infinite.
Even if your dataset is infinite, you have to set both batch size and number of epochs.
For the batch size, you can use the largest batch size that fits into your GPU/CPU RAM, by trial and error. For example, you can try power-of-two batch sizes like 32, 64, 128, 256.
For the number of epochs, this is a parameter that always has to be tuned for the specific problem. You can use a validation set and train until the validation loss stops improving, or until the training loss is almost constant (i.e. it converges). Make sure to use a different part of the dataset to decide when to stop training. Then you can report final metrics on yet another set (the test set).
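One way to implement that stopping criterion is with the standard EarlyStopping callback (a sketch; train_generator and val_generator are placeholders for your infinite data generators):

from keras.callbacks import EarlyStopping

# stop once the validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model.fit_generator(train_generator,
                    steps_per_epoch=1000,   # how many batches count as one "epoch"
                    epochs=10000,           # an upper bound; training stops earlier
                    validation_data=val_generator,
                    validation_steps=100,
                    callbacks=[early_stop])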
Batching exists because implementations are vectorised for faster, more efficient execution. When the data is large, it cannot all fit in memory, so we use a batch size to still get some of that vectorisation.
In my opinion, one should use a batch size as large as your machine can handle.
