keras training on big datasets seperately keras - keras

I am working on a keras denoising neural network that denoise high Dimension x-ray images. The idea is to train on some datasets eg.1,2,3 and after having the weights, another datasets eg.4,5,6 will start with a new training with weights initialized from the previous training. Implementation-wise it works, however the weights resulted from the last rotation perform better only on the datasets that were used to train on in this rotation. Same goes for other rotation.
In other words, weights resutlted from training on dataset: 4,5,6 doesn't give the good results on an image of dataset 1 as intended as the weights that were trained on datasets: 1,2,3. which shouldn't be what I intend to do
The idea is that weights should be tweaked to work with all datasets effectively, as training on the whole dataset doesn't fit into memory.
I tried other solutions such as creating custom generator that takes images from disk and do the training as batches which is very slow as it depends on factors like I/O operations happening on disk or the time complexity of processing functions happening inside the custom keras generator!
Below is a code that shows what I am doing. I have 12 datasets, seperated into 4 checkpoints. data is loaded and training goes and saves final model to an array and next training takes the weights from the previous rotation and continues.
EPOCHES = 150
NUM_CHKPTS = 4
weights = []
for chk in range(1,NUM_CHKPTS+1):
log_dir = os.path.join(os.getcwd(), 'resnet_checkpts_' + str(EPOCHES) + "_tl2_chkpt" + str(chk))
if not os.path.isdir(log_dir):
os.makedirs(log_dir)
else:
print('Training log directory already exists # {}.'.format(log_dir))
tb_output = TensorBoard(log_dir=log_dir, histogram_freq=1)
print("Loading Data From CHKPT #" + str(chk))
h5f = h5py.File('C:\\autoencoder\\datasets\\mix\\chk' + str(chk) + '.h5','r')
org_patch = h5f['train_data'][:]
noisy_patch = h5f['train_noisy'][:]
h5f.close()
input_patch, test_patch, noisy_patch, test_noisy_patch = train_test_split(org_patch, noisy_patch, train_size=0.8, shuffle=True)
print("Reshaping")
train_data = np.array([np.reshape(input_patch[i], (52, 52, 1)) for i in range(input_patch.shape[0])], dtype = np.float32)
train_noisy_data = np.array([np.reshape(noisy_patch[i], (52, 52, 1)) for i in range(noisy_patch.shape[0])], dtype = np.float32)
test_data = np.array([np.reshape(test_patch[i], (52, 52, 1)) for i in range(test_patch.shape[0])], dtype = np.float32)
test_noisy_data = np.array([np.reshape(test_noisy_patch[i], (52, 52, 1)) for i in range(test_noisy_patch.shape[0])], dtype = np.float32)
print('Number of training samples are:', train_data.shape[0])
print('Number of test samples are:', test_data.shape[0])
# IN = np.ones((len(XTRAINFILES), 52, 52, 1 ))
if chk == 1:
print("Generating the Model For The First Time..")
autoencoder_model = model_autoencoder(train_noisy_data)
print("Done!")
else:
autoencoder_model=load_model(weights[chk-2])
checkpt_path = log_dir + r"\\cp-{epoch:04d}.ckpt"
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpt_path, verbose=0, save_weights_only=True, save_freq='epoch')
optimizer = tf.keras.optimizers.Adam(lr=0.0001)
autoencoder_model.compile(loss='mse',optimizer=optimizer)
autoencoder_model.fit(train_noisy_data, train_data,
batch_size=128,
epochs=EPOCHES, shuffle=True, verbose=1,
validation_data=(test_noisy_data, test_data),
callbacks=[tb_output, checkpoint_callback])
weight_dir = log_dir+'\\model_resnet_new_OL' + str(EPOCHES) + 'epochs.h5'
weights.append(weight_dir)
autoencoder_model.save(weight_dir) # Defined saved model name by number of epochs.
Tensorboard Graphs, Rotations are 1,2,3,4 from up down :

Your model will forget previous dataset as you train on new dataset.
I read in reinforcement learning, when game are used to train Deep Reinforcement Learning (DRL), then you have to create memory replay, which collect data from different rounds of game, because each round of game has different data, then randomly some of that data is chosen to train model. that way DRL model can learn to play different rounds of game without forgetting previous rounds.
You can try to create a single dataset by taking some random samples from each dataset.
When you train model on new dataset that make sure data from all previous rotation are in current rotation.
Also in transfer learning, when you train model on new dataset, you have to freeze previous layers so that model don`t forget previous training. you are not using transfer learning but still when you start training on 2nd dataset your 1st dataset will slowly be removed from memory of weights.
you can try freezing initial layers of decoder so that they are not updated when extracting feature, assuming all of the dataset contain similar images, that way your model will not forget previous training as in transfer learning. but still when you train on new dataset previous will be forgotten.

Related

What is the point of data augmentation?

The code below is based on François Collet. He uses it to show that when the training set is small (2000 images), data augmentation improves the classification power in the validation set (which is true!).
My questions are:
If the model.fit_generator method uses steps_per_epoch = 2000 // batch_size. Are we using 2000 images per epoch?
If yes. What is the point of data augmentation if I use an augmented sample size equal in size to the original one?
batch_size = 32
# Train data augmentation
train_datagen = ImageDataGenerator(
rescale = 1./255,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True)
train_generator = train_datagen.flow_from_directory(
train_dir,
target_size = (150, 150),
batch_size = batch_size,
class_mode = 'binary')
# Train data generation
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_directory(
validation_dir,
target_size = (150, 150),
batch_size = 20,
class_mode = 'binary')
# training and validation
history = model.fit_generator(train_generator,
steps_per_epoch = 2000 // batch_size,
epochs = 100,
validation_data = validation_generator,
validation_steps = 500 // batch_size)
The code that you posted is quite dated, but serves the purpose for the explanation.
Admittedly we have 2000 images. We use them all, but the number of steps that are performed in that epoch is 2000//batch_size, since you update the weights of the network after a batch of size batch_size. Therefore you perform 2000//batch_size steps.
At the same time, think about augmentation as enrichment at run-time. When we use augmentation you do not create new examples which are physically stored on your drive, but when you load the batch into the memory. This means that, out of the batch that contains batch_size elements, some of them are modified(augmented), and fed into your network. Each augmentation has a probability associated, that is there is N % probability (you can set it even manually if you want) that your image is subjected to that specific augmentation.
But this means that as the training progresses, as the number of epochs increase, your network gets to see many more images than the initial size of 2000.
In the snippet that you have provided, steps_per_epoch = 2000 // batch_size essentially means that the model will see the whole 2000 images during an epoch but out of those 2000 images many of them will be replaced with their augmented counterparts based on either a specific probability that you can provide or by randomly choosing the images.
For eg. Consider you are building a Dog-vs-Cats classifier and the dataset is made up of images of right-facing dogs and left-facing cats only. In this case, if you don't apply augmentation (horizontal_flipping) then the model might learn that all the left facing animals are cats which will lead to incorrect results when given an image of a left-facing dog.
Augmentation here (specifically horizontal_flipping) will randomly flip the images of cats and dogs enabling the model to reach a better solution and hence make it more robust!
Augmentation Happens in-place no new images are generated.

Anomalies have similar error values to normal data

I have inertial measurement unit (IMU) data for which I am building an anomaly detection autoencoder neural net. I have about 5k training samples of which I am using 10% for validation. I also have about 50 (though I can make more) samples to test anomaly detection. My dataset has 12 IMU features. I train for about 10,000 epochs and I attain mean squared errors for reconstruction (MSE) of about 0.004 during training. After training, I perform an MSE calculation on the test data and I get values very similar to those in the train data (0.003) and I do not know why!
I am making my test set by slicing 50 samples from the overall data (not part of X_train) and changing one of the features to all zeros. I have also tried adding noise to one of the features as well as making multiple features zero.
np.random.seed(404)
np.random.shuffle(all_imu_data)
norm_imu_data = all_imu_data[:len_slice]
anom_imu_data = all_imu_data[len_slice:]
anom_imu_data[:,6] = 0
scaler = MinMaxScaler()
norm_data = scaler.fit_transform(norm_imu_data)
anom_data = scaler.transform(anom_imu_data)
X_train = pd.DataFrame(norm_data)
X_test = pd.DataFrame(anom_data)
I have tried many different network sizes by ranging number of hidden layers and number of hidden nodes/layer. As an example, I show a topology like [12-7-4-7-12]:
input_dim = num_features
input_layer = Input(shape=(input_dim, ))
encoder = Dense(int(7), activation="tanh", activity_regularizer=regularizers.l1(10e-5))(input_layer)
encoder = Dense(int(4), activation="tanh")(encoder)
decoder = Dense(int(7), activation="tanh")(encoder)
decoder = Dense(int(input_dim), activation="tanh")(decoder)
autoencoder = Model(inputs=input_layer, outputs=decoder)
autoencoder.compile(optimizer='adam', loss='mse', metrics=['mse'])
history = autoencoder.fit(X_train, X_train,
epochs=nb_epoch,
batch_size=batch_size,
shuffle=True,
validation_split=0.1,
verbose=1,
callbacks=[checkpointer, tensorboard]).history
pred_train = autoencoder.predict(X_train)
pred_test = autoencoder.predict(X_test)
mse_train = np.mean(np.power(X_train - pred_train, 2), axis=1)
mse_test = np.mean(np.power(X_test - pred_test, 2), axis=1)
print('MSE mean() - X_train:', np.mean(mse_train))
print('MSE mean() - X_test:', np.mean(mse_test))
After doing this, I get MSE mean numbers of 0.004 for Train and 0.003 for Test. Therefore, I cannot select a good threshold for anomalous data, as there are a lot of normal points that have larger MSE scores than the 'anomalous' data.
Any thoughts as to why this network is unable to detect these anomalies?
It is completely normal. You train your autoencoder on a sub sample of your whole data. Therefore, there are also anomalies contaminating your training data. The purpose of the autoencoder is to find a perfect reconstruction of your original data which it does including the anomalies. It is a very powerful tool, so if you show it anomalies in the training data, it will reconstruct them easily.
You need to remove 5% of your anomalous data with another anomaly detection algorithm (for example isolation forest) and do the subsampling on that part of the data (without outliers).
After that, you can find your outliers easily.

Nan loss in keras with triplet loss

I'm trying to learn an embedding for Paris6k images combining VGG and Adrian Ung triplet loss. The problem is that after a small amount of iterations, in the first epoch, the loss becomes nan, and then the accuracy and validation accuracy grow to 1.
I've already tried lowering the learning rate, increasing the batch size (only to 16 beacuse of memory), changing optimizer (Adam and RMSprop), checking if there are None values on my dataset, changing data format from 'float32' to 'float64', adding a little bias to them and simplify the model.
Here is my code:
base_model = VGG16(include_top = False, input_shape = (512, 384, 3))
input_images = base_model.input
input_labels = Input(shape=(1,), name='input_label')
embeddings = Flatten()(base_model.output)
labels_plus_embeddings = concatenate([input_labels, embeddings])
model = Model(inputs=[input_images, input_labels], outputs=labels_plus_embeddings)
batch_size = 16
epochs = 2
embedding_size = 64
opt = Adam(lr=0.0001)
model.compile(loss=tl.triplet_loss_adapted_from_tf, optimizer=opt, metrics=['accuracy'])
label_list = np.vstack(label_list)
x_train = image_list[:2500]
x_val = image_list[2500:]
y_train = label_list[:2500]
y_val = label_list[2500:]
dummy_gt_train = np.zeros((len(x_train), embedding_size + 1))
dummy_gt_val = np.zeros((len(x_val), embedding_size + 1))
H = model.fit(
x=[x_train,y_train],
y=dummy_gt_train,
batch_size=batch_size,
epochs=epochs,
validation_data=([x_val, y_val], dummy_gt_val),callbacks=callbacks_list)
The images are 3366 with values scaled in range [0, 1].
The network takes dummy values because it tries to learn embeddings from images in a way that images of the same class should have small distance, while images of different classes should have high distances and than the real class is part of the training.
I've noticed that I was previously making an incorrect class division (and keeping images that should be discarded), and I didn't have the nan loss problem.
What should I try to do?
Thanks in advance and sorry for my english.
In some case, the random NaN loss can be caused by your data, because if there are no positive pairs in your batch, you will get a NaN loss.
As you can see in Adrian Ung's notebook (or in tensorflow addons triplet loss; it's the same code) :
semi_hard_triplet_loss_distance = math_ops.truediv(
math_ops.reduce_sum(
math_ops.maximum(
math_ops.multiply(loss_mat, mask_positives), 0.0)),
num_positives,
name='triplet_semihard_loss')
There is a division by the number of positives pairs (num_positives), which can lead to NaN.
I suggest you try to inspect your data pipeline in order to ensure there is at least one positive pair in each of your batches. (You can for example adapt some of the code in the triplet_loss_adapted_from_tf to get the num_positives of your batch, and check if it is greater than 0).
Try increasing your batch size. It happened to me also. As mentioned in the previous answer, network is unable to find any num_positives. I had 250 classes and was getting nan loss initially. I increased it to 128/256 and then there was no issue.
I saw that Paris6k has 15 classes or 12 classes. Increase your batch size 32 and if the GPU memory occurs you can try with model with less parameters. You can work on Efficient B0 model for starting. It has 5.3M compared to VGG16 which has 138M parameters.
I have implemented a package for triplet generation so that every batch is guaranteed to include postive pairs. It is compatible with TF/Keras only.
https://github.com/ma7555/kerasgen (Disclaimer: I am the owner)

how can i improve number predictions?

I've got some number classification model, on test data it works OK, but when I want to classifier other images, I faced with problems that my model can't exactly predict what number is it. Pls, help me improve the model.predict() performance.
I've tried to train my model in many ways, in the code below there is a function that creates classification model, I trained this model actually many ways, [1K < n < 60K] of input test data, [3 < e < 50] of trained iterations.
def load_data():
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = tf.keras.utils.normalize(train_images, axis = 1)
test_images = tf.keras.utils.normalize(test_images, axis = 1)
return (train_images, train_labels), (test_images, test_labels)
def create_model():
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation = tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation = tf.nn.softmax))
data = load_data(n=60000, k=5)
model.compile(optimizer ='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(data[0][0][:n], data[0][1][:n], epochs = e)# ive tried from 3-50 epochs
model.save(config.model_name)
def load_model():
return tf.keras.models.load_model(config.model_name)def predict(images):
try:
model = load_model()
except:
create_model()
model = load_model()
images = tf.keras.utils.normalize(images, axis = 0)
d = load_data()
plot_many_images([d[0][0][0].reshape((28,28)), images[0]],['data', 'image'])
predictions = model.predict(images)
return predictions
I think that my input data isn't looking like the data is predicting model, but I've tried to make it as similar as I can. On this pic(https://imgur.com/FfLGMEK) on the LEFT is train data image, and on RIGHT is my parsed image, they are both 28x28 pix, both a cv2.noramalized
for the test image predictions I've used this(https://imgur.com/RMfKtag) sudoku, it's already formatted to be similar with a test data numbers, but when I test this image with the model prediction the result is not so nice(https://imgur.com/RQFvLNE)
As you can see predicted data leaves much to be desired.
P.S. the (' ') items in predicted data result made by my hands(I've replaced numbers at that positions by ' '), cos after predictions they all have some value(1-9), its not necessary now.
what do you mean "on test data it works OK"? if you mean its works good for train data but do not has a good prediction on test data, maybe your model was over-fit in training phase. i suggest to use train/validation/test approach to train your network.

Is image needed to rescale before predicting with model that trained with ImageDataGenerator(1./255)?

After training model with ImageDataGenerator(1/255.), do I need to rescale image before predicting ?
I thought it is necessary but experiment result said NO.
I trained a Resnet50 model which has 37 class on top layer.
Model was trained with ImageDataGenerator like this.
datagen = ImageDataGenerator(rescale=1./255)
generator=datagen.flow_from_directory(
directory=os.path.join(os.getcwd(), data_folder),
target_size=(224,224),
batch_size=256,
classes=None,
class_mode='categorical')
history = model.fit_generator(generator, steps_per_epoch=generator.n / 256, epochs=10)
Accuracy achieved 98% after 10 epochs on my train dataset.
The problem is, when i tried to predict each image in TRAIN dataset, prediction was wrong ( result is 33 whatever input image was )
img_p = './data/pets/shiba_inu/shiba_inu_27.jpg'
img = cv2.imread(img_p, cv2.IMREAD_COLOR)
img = cv2.resize(img, (224,224))
img_arr = np.zeros((1,224,224,3))
img_arr[0, :, :, :] = img / 255.
pred = model.predict(img_arr)
yhat = np.argmax(pred, axis=1)
yhat is 5, but y is 33
When I replace this line
img_arr[0, :, :, :] = img / 255.
by this
img_arr[0, :, :, :] = img
yhat is exactly 33.
Someone might suggest to use predict_generator() instead of predict(), but I want to understand what I did wrong here.
I knew what's wrong here.
I'm using Imagenet pretrained model, which DO NOT rescale image by divide it to 255. I have to use resnet50.preprocess_input before train/test.
preprocess_input function can be found here.
https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py
You must do every preprocessing that you do on train data, on each data that you want to feed to your trained network. actually when, for example, you rescale train images and train a network, your network train to get a matrix with entries between 0 and 1 and find the proper category. so if after training phase, you feed an image without rescaling, you feed a matrix with entries between 0 and 255 to your trained network while your network did not learn how treat with such matrix.
If you are following pre-processing exactly same as at the time of training then, you might look at the part of your code where you are predicting class using yhat = np.argmax(pred, axis=1) my hunch is that there might be class mismatch in accordance to indexing, to check how your classes are indexed when you use flow_from_directory use class_map = generator.class_indices this will return you a dictionary which will show you how your classes are mapped against index.
Note: The reason I state this because I've faced similar problem, using Keras flow_from_directory doesn't sort classes and hence it's quite possible that your prediction class 1 lies on the index 10 while np.argmax will return you class 1'.

Resources