Keras: Dealing with large image datasets

I am trying to fit a model using a large image dataset. I have 14 GB of RAM, and the dataset is 40 GB. I tried to use fit_generator, but I end up with a method that does not release the loaded batches after using them.
If there is any way to solve the problem, or any resources on it, please point me to them.
Thanks.
The generator code is:
class Data_Generator(Sequence):
    def __init__(self, image_filenames, labels, batch_size):
        self.image_filenames, self.labels = image_filenames, labels
        self.batch_size = batch_size
    def __len__(self):
        return int(np.ceil(len(self.image_filenames) / float(self.batch_size)))
    def __format_labels__(self, gd_truth):
        cols=gd_truth.columns
        y=[]
        for col in cols:
            y.append(gd_truth[col].values)
        return y
    def __getitem__(self, idx):
        batch_x = self.image_filenames[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]
        gd_truth=pd.DataFrame(data=batch_y,columns=self.labels.columns)
        #gd_truth=batch_y
        return np.array([read_image(file_name) for file_name in batch_x]),self.__format_labels__(gd_truth) #np.array(batch_y)
Then I created two generators, one for the training images and one for the validation images:
my_training_batch_generator = Data_Generator(training_filenames, trainTargets, batch_size)
my_validation_batch_generator = Data_Generator(validation_filenames, valTargets, batch_size)
The fit_generator call is as follows:
num_epochs=10
model.fit_generator(generator=my_training_batch_generator,
                    steps_per_epoch=(num_training_samples // batch_size),
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=my_validation_batch_generator,
                    validation_steps=(num_validation_samples // batch_size),
                    max_queue_size=16)
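The idea the generator above relies on can be sketched without Keras: keep only the file names in memory and read each batch from disk when it is requested. The sketch below is an illustration under assumptions, not a fix for the problem described: `LazyBatches` and the `.npy` files are hypothetical stand-ins for `Data_Generator` and `read_image`. Note also that with `max_queue_size=16` Keras may keep up to 16 prepared batches queued at once, so a smaller value reduces the amount of data held in memory.

```python
import os
import tempfile

import numpy as np


class LazyBatches:
    """Sequence-like loader: stores paths only and reads one batch per request."""

    def __init__(self, paths, batch_size):
        self.paths = list(paths)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return int(np.ceil(len(self.paths) / self.batch_size))

    def __getitem__(self, idx):
        batch_paths = self.paths[idx * self.batch_size:(idx + 1) * self.batch_size]
        # Arrays are read here; nothing is cached on the object itself.
        return np.stack([np.load(p) for p in batch_paths])


# Demo data: five small fake "images" written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(5):
    path = os.path.join(tmp_dir, f"img_{i}.npy")
    np.save(path, np.full((8, 8, 3), i, dtype=np.uint8))
    paths.append(path)

batches = LazyBatches(paths, batch_size=2)
first = batches[0]  # reads img_0 and img_1 only
last = batches[2]   # final partial batch with a single image
```

Because the loader holds no reference to the arrays it returns, a batch can be freed as soon as the caller drops it.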

Related

How albumentations work with keras Sequence

I have read this tutorial for using albumentations with keras sequence. The code is as follows :
import numpy as np
from tensorflow.python.keras.utils.data_utils import Sequence
class CIFAR10Sequence(Sequence):
    def __init__(self, x_set, y_set, batch_size, augmentations):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.augment = augmentations
    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # apply the augmentation pipeline to every image in the batch
        return np.stack([
            self.augment(image=x)["image"] for x in batch_x
        ], axis=0), np.array(batch_y)
The thing is, I don't understand how it is augmenting (i.e. providing more samples for) the data. The way I see it, it is just transforming the samples in the dataset, not generating new ones.
Following the tutorial you provided you may see that the author defines AUGMENTATIONS_TRAIN and AUGMENTATIONS_TEST objects which perform the actual augmentation.
Then these objects are passed to the sequence generator above:
train_gen = CIFAR10Sequence(x_train, y_train, hparams.train_batch_size, augmentations=AUGMENTATIONS_TRAIN)
so that calling self.augment actually augments every image in the batch:
self.augment(image=x)["image"] for x in batch_x
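And yes, augmentation doesn't mean creating new objects but applying random transformations to existing ones to create 'artificial' objects which are somewhat different from the originals. Because the transformations are random and are applied each time a batch is requested, the model sees a different variant of each image in every epoch, even though the number of samples per epoch stays the same.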
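A minimal sketch of this behaviour, assuming a plain NumPy stand-in for the albumentations pipeline: `random_brightness` and `AugmentedBatches` are hypothetical names, and the class only mirrors the `__len__`/`__getitem__` interface of a Keras Sequence so that it runs without Keras. Requesting the same batch twice gives different pixel values, while the number of batches does not change.

```python
import numpy as np


def random_brightness(image, rng):
    """Hypothetical augmentation: add one random offset to every pixel."""
    return image + rng.uniform(-0.2, 0.2)


class AugmentedBatches:
    def __init__(self, x, batch_size, augment, seed=0):
        self.x, self.batch_size, self.augment = x, batch_size, augment
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        batch = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        # A fresh random transform is drawn on every call.
        return np.stack([self.augment(img, self.rng) for img in batch])


x = np.zeros((6, 4, 4, 3))
seq = AugmentedBatches(x, batch_size=3, augment=random_brightness)
epoch1 = seq[0]
epoch2 = seq[0]  # same index, requested again as it would be in a later epoch
```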

keras Data Augmentation

I know that ImageDataGenerator generates one randomly augmented image for each input image. Now I would like to generate two augmented images for each input image:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
train_ds = datagen.flow_from_directory('/home/train/')
To explain further: I would like to apply 2 distinct augmentation functions to the same image, i.e. if we sample 5 images, we end up with 2 × 5 = 10 augmented observations in the batch.
So how can I proceed?
I would recommend creating a custom data generator that inherits from tf.keras.utils.Sequence. There are a number of ways to go about this, but this should be along the lines of what you are looking for:
import math
import numpy as np
import tensorflow as tf
class double_aug_generator(tf.keras.utils.Sequence):
    def __init__(self, x, y, batch_size, aug_params1, aug_params2):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.datagen = tf.keras.preprocessing.image.ImageDataGenerator(**aug_params1)
        # dictionary of parameters for the second augmentation, in the
        # format expected by ImageDataGenerator.apply_transform
        self.aug_params2 = aug_params2
    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)
    def load(self, file_names):
        # load and return raw images (H x W x C arrays) however you like
        raise NotImplementedError
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # load images
        images = self.load(batch_x)
        # apply first augmentation (random, drawn from aug_params1) to each image
        aug1 = [self.datagen.random_transform(img) for img in images]
        # apply second augmentation (fixed parameters) to each original image
        aug2 = [self.datagen.apply_transform(img, self.aug_params2) for img in images]
        # return both versions, so the batch holds 2 * len(images) samples
        return np.stack(aug1 + aug2), np.concatenate([batch_y, batch_y])
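The batch-doubling step on its own can be sketched in plain NumPy, independent of ImageDataGenerator. This is only an illustration under assumptions: `flip_horizontal`, `darken` and `double_augment` are hypothetical helpers standing in for the two augmentation functions. Five input images produce 2 × 5 = 10 observations, with the labels repeated in the same order.

```python
import numpy as np


def flip_horizontal(image):
    """Hypothetical first augmentation: mirror the width axis."""
    return image[:, ::-1, :]


def darken(image):
    """Hypothetical second augmentation: halve the pixel intensities."""
    return image * 0.5


def double_augment(images, labels, aug_a, aug_b):
    """Return both augmented versions of every image, with labels repeated."""
    first = [aug_a(img) for img in images]
    second = [aug_b(img) for img in images]
    return np.stack(first + second), np.concatenate([labels, labels])


images = np.arange(5 * 2 * 2 * 1, dtype=float).reshape(5, 2, 2, 1)
labels = np.array([0, 1, 0, 1, 1])
batch_x, batch_y = double_augment(images, labels, flip_horizontal, darken)
```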

Errors and difficulties with: Keras generators with h5 data: Cancelled operations

I'm experiencing many errors/problems with Keras generators and multi-processing.
I have used:
history = model.fit(training_generator,
                    steps_per_epoch=trainingSetSize // batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=validationGenerator,
                    validation_steps= validationSetSize // batch_size,
                    callbacks=callbacks,
                    use_multiprocessing=True,
                    workers=nb_workers,
                    max_queue_size=2*nb_workers)
to launch the training.
My generator produces batches of (batch_size,64,64,2) tensors. One problem is that I notice the following warning/error message:
Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
even though I added steps_per_epoch = X_size // batch_size.
Also, by adding a print() inside the generator, I noticed that at the end of an epoch it generates an "empty" tensor of shape (0,64,64,2).
Any ideas, proposals, comments or answers?
This is the generator's code:
class custom_gen(Sequence):
    def __init__(self, fn, datasetSize, batch_size, trainingsetName,trainingsetTargetsName):
        self.fn=fn
        self.datasetSize= datasetSize
        self.batch_size = batch_size
        self.trainingsetTargetsName = trainingsetTargetsName
        self.trainingsetName = trainingsetName
        self.lock = threading.Lock()
    #compulsory method: total number of batches that the generator must produce
    def __len__(self):
        return self.datasetSize // self.batch_size
    #idx arg is managed by Python/Keras by themselves!?
    def __getitem__(self, idx):
        with self.lock:
            f=h5py.File(self.fn,'r')
            X=f[self.trainingsetName][idx * self.batch_size:(idx + 1) * self.batch_size]
            Y=f[self.trainingsetTargetsName][idx * self.batch_size:(idx + 1) * self.batch_size]
            print('Generated X.shape='+str(X.shape))
            print('Generated Y.shape='+str(Y.shape))
            return X,Y
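The empty (0,64,64,2) batch is consistent with how slicing works: a slice that starts past the end of an array returns an empty array instead of raising an error. The sketch below shows this with a NumPy array standing in for the HDF5 dataset (an assumption: h5py datasets are taken to behave the same way for plain slices), so an empty batch suggests that `__getitem__` was called with an index beyond the end of the data. `get_batch` and `get_batch_checked` are hypothetical helpers, and this does not by itself explain the "Cancelled" message.

```python
import numpy as np

data = np.zeros((10, 64, 64, 2))  # stand-in for the HDF5 dataset
batch_size = 4


def get_batch(idx):
    """Same slicing as in __getitem__ above, without any bounds check."""
    return data[idx * batch_size:(idx + 1) * batch_size]


n_full_batches = len(data) // batch_size  # what __len__ returns above
partial = get_batch(n_full_batches)       # leftover samples after the full batches
empty = get_batch(n_full_batches + 1)     # index past the end of the data


def get_batch_checked(idx):
    """Variant that raises instead of silently returning a short or empty batch."""
    if not 0 <= idx < n_full_batches:
        raise IndexError(f"batch index {idx} out of range")
    return get_batch(idx)
```

Raising `IndexError` for out-of-range indices is one way to make such calls visible instead of feeding an empty batch to the model.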

Order of rotated images by using a custom generator

I use a custom image data generator for my project. It receives batches of images and returns versions of the images rotated by 0, 90, 180 and 270 degrees, with the corresponding class indices {0:0, 1:90, 2:180, 3:270}. Let's assume we have images A, B and C in a batch and images A to Z in the whole data set. All the images are naturally in the 0 degree orientation. Initially I returned all the rotated images at the same time. Here is a sample of a returned batch: [A0,B0,C0,A1,B1,C1,...,A3,B3,C3]. But this gave me useless results. To evaluate my approach, I trained the same model with my generator and with the built-in Keras ImageDataGenerator using flow_from_directory. For the built-in function I manually rotated the original images and stored them in separate folders. Here are the accuracy plots for comparison:
I used only a few images just to see if there is any difference. From the plots it is obvious that the custom generator is not correct. Hence I think it must return the images as [[A0,B0,C0],[D0,E0,F0]...[...,Z0]], then [[A1,B1,C1],[D1,E1,F1]...[...,Z1]] and so on. To do this I must call the following function multiple times (in my case 4).
def next(self):
    with self.lock:
        # get input data index and size of the current batch
        index_array = next(self.index_generator)
    # create array to hold the images
    return self._get_batches_of_transformed_samples(index_array)
This function iterates through the directory and returns batches of images. When it reaches the last image it finishes and the next epoch starts. In my case, I want to run this 4 times in one epoch, passing the rotation angle as an argument like this: self._get_batches_of_transformed_samples(index_array, rotation_angle). Is this possible? If not, what could be the solution? Here is the current data generator code:
def _get_batches_of_transformed_samples(self, index_array):
    # create list to hold the images and labels
    batch_x = []
    batch_y = []
    # create angle categories corresponding to number of rotation angles
    angle_categories = list(range(0, len(self.target_angles)))
    # generate rotated images and corresponding labels
    for rotation_angle, angle_indice in zip(self.target_angles, angle_categories):
        for i, j in enumerate(index_array):
            if self.filenames is None:
                image = self.images[j]
                if len(image.shape) == 2: image = cv2.cvtColor(image,cv2.COLOR_GRAY2RGB)
            else:
                is_color = int(self.color_mode == 'rgb')
                image = cv2.imread(self.filenames[j], is_color)
                if is_color:
                    if not image is None:
                        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # do nothing if the image is none
            if not image is None:
                rotated_im = rotate(image, rotation_angle, self.target_size[:2])
                if self.preprocess_func: rotated_im = self.preprocess_func(rotated_im)
                # add dimension to account for the channels if the image is greyscale
                if rotated_im.ndim == 2: rotated_im = np.expand_dims(rotated_im, axis=2)
                batch_x.append(rotated_im)
                batch_y.append(angle_indice)
    # convert lists to numpy arrays
    batch_x = np.asarray(batch_x)
    batch_y = np.asarray(batch_y)
    batch_y = to_categorical(batch_y, len(self.target_angles))
    return batch_x, batch_y
def next(self):
    with self.lock:
        # get input data index and size of the current batch
        index_array = next(self.index_generator)
    # create array to hold the images
    return self._get_batches_of_transformed_samples(index_array)
Hmm, I would probably do this through keras.utils.Sequence:
from keras.utils import Sequence
import numpy as np
import cv2
class RotationSequence(Sequence):
    def __init__(self, x_set, y_set, batch_size, rotations=(0,90,180,270)):
        self.rotations = rotations
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))
    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        x, y = [], []
        for rot in self.rotations:
            # rotate() is not defined here; you need to implement it yourself
            x += [rotate(cv2.imread(file_name), rot) for file_name in batch_x]
            y += list(batch_y)
        return np.array(x), np.array(y)
    def on_epoch_end(self):
        # reshuffle after every epoch (x_set and y_set must be NumPy arrays for this indexing)
        shuffle_idx = np.random.permutation(len(self.x))
        self.x, self.y = self.x[shuffle_idx], self.y[shuffle_idx]
And then just pass the batcher to model.fit_generator() (or to model.fit() in recent Keras versions, which accepts a Sequence directly):
rotation_batcher = RotationSequence(...)
model.fit_generator(rotation_batcher,
                    steps_per_epoch=len(rotation_batcher),
                    validation_data=validation_batcher,
                    epochs=epochs)
This allows you to have more control over the batches being fed into your model. This implementation will almost run: you just need to implement the rotate() function used in __getitem__. Also, each returned batch will contain 4 times batch_size samples, because every image in the batch is duplicated once per rotation. Hope this is helpful to you.
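One possible rotate() for the four angles used here can be sketched with NumPy alone. This is a sketch under assumptions, not the answer's implementation: it only supports multiples of 90 degrees, it assumes square images so that all rotated copies stack into one array, and `rotated_batch` is a hypothetical helper that uses the rotation index as the label, as in the question, rather than reusing batch_y.

```python
import numpy as np


def rotate(image, angle):
    """Rotate an H x W (x C) array counter-clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("this sketch only supports multiples of 90 degrees")
    return np.rot90(image, k=(angle // 90) % 4, axes=(0, 1))


def rotated_batch(images, rotations=(0, 90, 180, 270)):
    """Build [A0, B0, C0, A1, B1, C1, ...] with the rotation index as label."""
    x, y = [], []
    for class_index, angle in enumerate(rotations):
        for image in images:
            x.append(rotate(image, angle))
            y.append(class_index)
    return np.stack(x), np.array(y)


# Three square single-channel "images" standing in for a batch A, B, C.
images = np.arange(3 * 4 * 4 * 1).reshape(3, 4, 4, 1)
batch_x, batch_y = rotated_batch(images)
```

For arbitrary angles or non-square images, an interpolating rotation with a fixed output size would be needed instead.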

Keras Custom Layer Error (Operation IsVariableInitialized has been marked as not fetchable)

I'm trying to create a custom Keras layer on a toy dataset, and am having issues. At a high level, I want to create an "Input Gate" layer, which would have trainable weights to turn each column of input on or off. So I'm starting with just trying to multiply the inputs by a sigmoid'd version of the learned weights. My code is as follows:
### This is my custom layer
class InputGate(Layer):
    def __init__(self, **kwargs):
        super(InputGate, self).__init__(**kwargs)
    def build(self, input_shape):
        self.kernel = self.add_weight(name='input_gate',
                                      shape=input_shape[1:],
                                      initializer='random_uniform',
                                      trainable=True)
        super(InputGate, self).build(input_shape) # Be sure to call this somewhere!
    def call(self, inputs):
        gate_amount = K.sigmoid(self.kernel)
        return inputs * gate_amount
    def get_config(self):
        config = {}
        base_config = super(InputGate, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
    def compute_output_shape(self, input_shape):
        return input_shape
def create_linear_model(x, y, num_noise_vars = 0, reg_strength=0):
    new_x = get_x_with_noise(x, num_noise_vars=num_noise_vars)
    model = Sequential([
        InputGate(input_shape=(1+num_noise_vars,)),
        Dense(1, kernel_regularizer=l2(reg_strength))
    ])
    model.compile(optimizer="rmsprop", loss="mse")
    model.optimizer.lr = 0.001
    return {"model": model, "new_x": new_x}
def get_x_with_noise(x, num_noise_vars):
    noise_vars = []
    for noise_var in range(num_noise_vars):
        noise_vars.append(np.random.random(len(x)))
    noise_vars.append(x)
    x_with_noise = noise_vars
    new_x = np.array(list(zip(*x_with_noise)))
    return new_x
x = np.random.random(500)
y = (x * 3) + 10
num_noise_vars = 5
info = create_linear_model(x, y, num_noise_vars=num_noise_vars)
model = info["model"]
new_x = info["new_x"]
results = model.fit(new_x, y, epochs=num_epochs, verbose=0)
And then I get the following error:
ValueError: Operation 'input_gate_14/IsVariableInitialized' has been marked as not fetchable.
This layer is mostly taken from the docs (https://keras.io/layers/writing-your-own-keras-layers/). I'm using Keras 2.0.9 with the TensorFlow backend on a CPU (MacBook Air).
This layer seems as simple as can be, and googling the error leads me to discussions that don't seem relevant. Anyone have ideas of what's causing this?
Any help is much appreciated! Thanks!
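For reference, the forward computation the layer is meant to perform can be sketched in NumPy. This only illustrates what `call` computes (inputs multiplied by a sigmoid of per-column weights); it does not reproduce or address the error. `sigmoid` and `input_gate` are hypothetical helper names, and the kernel values are chosen by hand rather than learned.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def input_gate(inputs, kernel):
    """Scale each input column by sigmoid(kernel[column])."""
    return inputs * sigmoid(kernel)


inputs = np.ones((2, 3))
kernel = np.array([0.0, 10.0, -10.0])  # neutral, nearly open, nearly closed
gated = input_gate(inputs, kernel)
```

A kernel entry of 0 passes half of the column through, while large positive or negative entries switch the column almost fully on or off.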
