fit() gives different result than fit_generator() on the same dataset - python-3.x

I have been playing around with CNNs to try to remove coherent noise from relatively large images (1500x250). Since the images are large, I cannot load too many into memory at once. Because of this, I tried to implement a generator which feeds the images to the network. I struggled for a while with bad results, but assumed the issues were in the network. I then tried fit() with a subset of my data and got extremely good results without changing the network at all. Testing the generator on the same subset resulted in bad results. What is the catch that I cannot see? Why does my generator fail?
My dataset is approximately 114,000 images, which is roughly 475 GB, which is why I cannot load it all into memory at once. The generator does produce results that recreate the images, but they are extremely bad. My generator class is here:
# imports inferred from the snippet (keras imported as `k`)
from math import floor
import numpy as np
import keras as k

class genOne(k.utils.Sequence):
    def __init__(self, img_rows, img_cols, channels, batch_size, clean_dir,
                 noisy_dir, clean_files, noisy_files, shuffle=True):
        """Initialize variables:
        img_rows, img_cols, channels: the shape of the image
        batch_size : Self explanatory
        clean_dir, noisy_dir : directories with files
        clean_files : Randomized list with clean images
        noisy_files : Randomized list with noise"""
        self.img_rows = img_rows
        self.img_cols = img_cols
        self.channels = channels
        self.batch_size = batch_size
        self.clean_dir = clean_dir
        self.noisy_dir = noisy_dir
        self.clean_files = clean_files.tolist()
        self.noisy_files = noisy_files.tolist()
        self.shuffled_noisy = []
        self.tmp_noisy = []
        self.tmp_clean = []
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        """Sets the number of batches per epoch"""
        return floor((len(self.noisy_files)*len(self.clean_files))/self.batch_size)

    def __getitem__(self, index):
        """Generates data for each batch:
        combine every type of noise with each image."""
        X = np.empty((self.batch_size, self.img_rows, self.img_cols,
                      self.channels))
        Y = np.zeros((self.batch_size, self.img_rows, self.img_cols,
                      self.channels))
        for i in range(self.batch_size):
            if not self.tmp_noisy:
                self.tmp_noisy = self.shuffled_noisy
                self.tmp_clean.pop(0)
            x_test = self.tmp_noisy.pop(0)
            X[i,] = np.expand_dims(np.load(self.noisy_dir + x_test).T[
                :self.img_rows, :self.img_cols], -1)
            Y[i,] = np.expand_dims(np.load(self.clean_dir + self.tmp_clean[0]).T[
                :self.img_rows, :self.img_cols], -1)
            y_test = self.tmp_clean[0]
            # Input equals ground truth + noise
            X[i,] += Y[i,]
            # Normalize data between 0 and 1
            X[i,] = ((X[i,]/np.amax(np.absolute(X[i,])))+1)/2
            Y[i,] = ((Y[i,]/np.amax(np.absolute(Y[i,])))+1)/2
        return X, Y

    def on_epoch_end(self):
        """Refresh all data on epoch end"""
        self.tmp_noisy = self.noisy_files
        self.tmp_clean = self.clean_files
        if self.shuffle == True:
            np.random.shuffle(self.tmp_noisy)
            np.random.shuffle(self.tmp_clean)
        self.shuffled_noisy = self.tmp_noisy
I have 475 clean images and 300 images consisting of pure noise. I combine them such that each clean image is fed into the network with each type of noise. The small case that worked with fit() was simply 300 images, where every image was a different clean image combined with a different noise.
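(For anyone trying to reproduce this: a quick sanity check is to pull a batch straight from the Sequence on the same subset that worked with fit() and compare it to the in-memory arrays. The sketch below is only illustrative; it reuses the names defined above, and x_small/y_small are hypothetical placeholders for whatever arrays were passed to fit().)

import numpy as np

# Build the Sequence on the small subset, with shuffling off so batches are comparable
gen = genOne(img_rows=1500, img_cols=250, channels=1, batch_size=4,
             clean_dir=clean_dir, noisy_dir=noisy_dir,
             clean_files=clean_files, noisy_files=noisy_files, shuffle=False)

X_batch, Y_batch = gen[0]            # first batch produced by the Sequence
print(X_batch.shape, Y_batch.shape)  # expect (4, 1500, 250, 1)
print(X_batch.min(), X_batch.max())  # expect values in [0, 1] after normalization

# x_small, y_small = ...             # hypothetical: the arrays that worked with fit()
# print(np.allclose(Y_batch[0], y_small[0]))  # does the pairing/scaling match?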
I am aware that my driver version is rather old, which requires an old version of TensorFlow. I cannot update this, so I'm stuck with TensorFlow 1.4.1.
Specs:
2x Nvidia Geforce GTX 1080 7.9 GB
Nvidia Driver version 367.44
cuDNN 6.0.21
CUDA 8.0
Debian Wheezy 7
Tensorflow-gpu 1.4.1
Keras 2.0.8
Python 3.6.7

Related

Pytorch low gpu util after first epoch

Hi, I'm training my PyTorch model on a remote server.
All the jobs are managed by Slurm.
My problem is that training is extremely slow after the first epoch.
I checked GPU utilization.
During my first epoch, utilization looked like the image below; I can see the GPU was utilized.
But from the second epoch on, the utilized percentage is almost zero.
My dataloader code looks like this:
# imports inferred from the snippet
import torch
from PIL import Image
from torch.utils.data import Dataset

class img2selfie_dataset(Dataset):
    def __init__(self, path, transform, csv_file, cap_vec):
        self.path = path
        self.transformer = transform
        self.images = [path + item for item in list(csv_file['file_name'])]
        self.smiles_list = cap_vec

    def __getitem__(self, idx):
        img = Image.open(self.images[idx])
        img = self.transformer(img)
        label = self.smiles_list[idx]
        label = torch.Tensor(label)
        return img, label.type(torch.LongTensor)

    def __len__(self):
        return len(self.images)
My dataloader is defined like this
train_data_set = img2selfie_dataset(train_path, preprocess, train_dataset, train_cap_vec)
train_loader = DataLoader(train_data_set, batch_size = 256, num_workers = 2, pin_memory = True)
val_data_set = img2selfie_dataset(train_path, preprocess, val_dataset, val_cap_vec)
val_loader = DataLoader(val_data_set, batch_size = 256, num_workers = 2, pin_memory = True)
My training step is defined like this:
train_loss = []
valid_loss = []
epochs = 20
best_loss = 1e5
for epoch in range(1, epochs + 1):
    print('Epoch {}/{}'.format(epoch, epochs))
    print('-' * 10)
    epoch_train_loss, epoch_valid_loss = train(encoder_model, transformer_decoder, train_loader, val_loader, criterion, optimizer)
    train_loss.append(epoch_train_loss)
    valid_loss.append(epoch_valid_loss)
    if len(valid_loss) > 1:
        if valid_loss[-1] < best_loss:
            print(f"valid loss on this {epoch} is better than previous one, saving model.....")
            torch.save(encoder_model.state_dict(), 'model/encoder_model.pickle')
            torch.save(transformer_decoder.state_dict(), 'model/decoder_model.pickle')
            best_loss = valid_loss[-1]
            print(best_loss)
    print(f'Epoch : [{epoch}] Train Loss : [{train_loss[-1]:.5f}], Valid Loss : [{valid_loss[-1]:.5f}]')
In my opinion, if this problem came from my code, it wouldn't have hit 100% utilization in the first epoch.
I fixed this issue by moving my training data onto the local drive.
My remote server's (school server) policy was to store personal data on a NAS,
and file I/O from the NAS put a heavy load on the network.
It was also affected by other users' file I/O from the NAS.
After I moved the training data to the local drive, everything is fine.
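If someone hits the same symptom, a rough way to confirm that the slowdown comes from data loading rather than the model is to time how long each iteration waits on the dataloader versus how long the rest of the step takes. This is only a sketch: train_loader is the loader defined above, and the forward/backward pass is left as a placeholder.

import time
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

load_time, step_time, n_batches = 0.0, 0.0, 50
t0 = time.time()
for i, (img, label) in enumerate(train_loader):
    t1 = time.time()
    load_time += t1 - t0                       # time spent waiting on the dataloader (disk/NAS I/O)
    img, label = img.to(device), label.to(device)
    # ... forward / backward / optimizer step would go here ...
    step_time += time.time() - t1              # time spent on the actual training work
    t0 = time.time()
    if i + 1 == n_batches:
        break

print(f"avg load: {load_time / n_batches:.3f}s  avg step: {step_time / n_batches:.3f}s")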

Pytorch freezes when checking dataloader

I am running this block of code for PyTorch and it seems to run forever/freeze in my notebook. I suspect it has something to do with my dataloader, but I can't seem to figure out what is wrong here. I am running this in a GPU environment, and I have previously run a TensorFlow v2 Keras CNN model there and it worked.
In addition, I have also tried model.train() and it was also stuck at the first epoch.
The code I am running:
import time

start_time = time.time()
for data, label in train_dataloader:
    print(data.size())
    print(label.size())
    break
print("Time taken: ", time.time() - start_time)
The dataloader is implemented with these lines of code:
train_dataset = ChestXrayDataset("dataset/CheXpert-v1.0-small/train/train", train_data, IMAGE_SIZE, True)
train_dataloader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
These are the parameters
IMAGE_SIZE = 224 # Image size (224x224)
IMAGENET_MEAN = [0.485, 0.456, 0.406] # Mean of ImageNet dataset (used for normalization)
IMAGENET_STD = [0.229, 0.224, 0.225] # Std of ImageNet dataset (used for normalization)
BATCH_SIZE = 96
LEARNING_RATE = 0.001
LEARNING_RATE_SCHEDULE_FACTOR = 0.1 # Parameter used for reducing learning rate
LEARNING_RATE_SCHEDULE_PATIENCE = 5 # Parameter used for reducing learning rate
MAX_EPOCHS = 100 # Maximum number of training epochs
I have checked the dataloader and this is what I got
<torch.utils.data.dataloader.DataLoader at 0x1f96cd5f6a0>
The class for ChestXrayDataset is shown here
# imports inferred from the snippet
import os
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ChestXrayDataset(Dataset):
    def __init__(self, folder_dir, dataframe, image_size, normalization):
        """
        Init Dataset

        Parameters
        ----------
        folder_dir: str
            folder containing all images
        dataframe: pandas.DataFrame
            dataframe containing all information about the images
        image_size: int
            image size to rescale to
        normalization: bool
            whether to apply normalization with the mean and std from ImageNet
        """
        self.image_paths = []   # List of image paths
        self.image_labels = []  # List of image labels

        # Define list of image transformations
        image_transformation = [
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor()
        ]
        if normalization:
            # Normalization with mean and std from ImageNet
            image_transformation.append(transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD))
        self.image_transformation = transforms.Compose(image_transformation)

        # Get all image paths and image labels from the dataframe
        for index, row in dataframe.iterrows():
            image_path = os.path.join(folder_dir, row.Path)
            self.image_paths.append(image_path)
            if len(row) < 14:
                labels = [0] * 14
            else:
                labels = []
                for col in row[5:]:
                    if col == 1:
                        labels.append(1)
                    else:
                        labels.append(0)
            self.image_labels.append(labels)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        """
        Read image at index and convert to torch Tensor
        """
        # Read image
        image_path = self.image_paths[index]
        image_data = Image.open(image_path).convert("RGB")  # Convert image to RGB channels

        # TODO: Image augmentation code would be placed here

        # Resize and convert image to torch tensor
        image_data = self.image_transformation(image_data)
        return image_data, torch.FloatTensor(self.image_labels[index])
Checking the length of the dataframe you iterate over with dataframe.iterrows(), and what row[5:] actually contains, would help.
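A minimal sketch of those checks (train_data and train_dataset are assumed to be the objects defined above; num_workers=0 rules out problems with worker processes in the notebook):

# How many rows will iterrows() yield, and what does row[5:] actually cover?
print(len(train_data))
first_row = next(train_data.iterrows())[1]
print(len(first_row))
print(first_row[5:])

# Fetch one sample directly from the dataset (no DataLoader involved)
sample_img, sample_label = train_dataset[0]
print(sample_img.size(), sample_label.size())

# Then try a single-process loader; if this returns quickly, the freeze is in the workers
debug_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE,
                          shuffle=True, num_workers=0)
data, label = next(iter(debug_loader))
print(data.size(), label.size())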

Will switching GPU device affect the gradient in PyTorch back propagation?

I use PyTorch. In the computation, I move some data and operators A to the GPU. In a middle step, I move the data and operators B to the CPU and continue the forward pass.
My question is:
My operator B is very memory-consuming and cannot be used on the GPU. Will this (some parts computed on the GPU and the others computed on the CPU) affect the backpropagation?
PyTorch keeps track of the location of tensors. If you use .cpu() or .to('cpu'), PyTorch's native commands, you should be okay.
See, e.g., this model parallel tutorial - the computation is split between two different GPU devices.
If your model fits into GPU memory, you might let PyTorch do the parallel distribution for you within the DataParallel (one process, multiple threads) or DistributedDataParallel (multiple processes and threads, single or multiple nodes) frameworks.
The code below checks whether you have more than one GPU device (torch.cuda.device_count() > 1) and, if so, sets DataParallel mode with model = nn.DataParallel(model):
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)
model.to(device)
DataParallel replicates the same model to all GPUs, where each GPU consumes a different partition of the input data. It can significantly accelerate the training process, but it does not work for use cases where the model is too large to fit onto a single GPU.
To solve this problem, you might resort to a model parallel approach, which splits a single model onto different GPUs, rather than replicating the entire model on each GPU.
(e.g. a model m contains 10 layers: when using DataParallel, each GPU
will have a replica of each of these 10 layers, whereas when using
model parallel on two GPUs, each GPU could host 5 layers)
An example where .to('cuda:0') indicates where the layer should be positioned.
import torch
import torch.nn as nn
import torch.optim as optim

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        self.net1 = torch.nn.Linear(10, 10).to('cuda:0')
        self.relu = torch.nn.ReLU()
        self.net2 = torch.nn.Linear(10, 5).to('cuda:1')

    def forward(self, x):
        x = self.relu(self.net1(x.to('cuda:0')))
        return self.net2(x.to('cuda:1'))
backward() then automatically takes location into consideration.
model = ToyModel()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
optimizer.zero_grad()
outputs = model(torch.randn(20, 10))
labels = torch.randn(20, 5).to('cuda:1')
loss_fn(outputs, labels).backward()
optimizer.step()
https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html
This snippet suggests that the gradient is preserved when computation goes through different devices.
def change_device():
    import torch.nn as nn
    a = torch.rand((4, 32))
    m1 = nn.Linear(32, 32)
    cpu = m1(a)
    gpu = cpu.to(0)
    m2 = nn.Linear(32, 32).to(0)
    out = m2(gpu)
    loss = out.sum()
    loss.backward()
    print(m1.weight.grad)
    # works like magic
    """
    tensor([[ 0.7746,  1.0342,  0.8706,  ...,  1.0993,  0.7975,  0.3915],
            [-0.5369, -0.7169, -0.6034,  ..., -0.7619, -0.5527, -0.2713],
            [ 0.3607,  0.4815,  0.4053,  ...,  0.5118,  0.3713,  0.1823],
            ...,
            [ 1.1200,  1.4955,  1.2588,  ...,  1.5895,  1.1531,  0.5660],
            [-0.1582, -0.2112, -0.1778,  ..., -0.2245, -0.1629, -0.0799],
            [-0.4531, -0.6050, -0.5092,  ..., -0.6430, -0.4665, -0.2290]])
    """
Modifying this snippet shows that the gradient is preserved when a tensor moves from GPU to CPU as well.
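For completeness, a sketch of that modification (first layer on the GPU, second on the CPU) could look like this; as in the snippet above, it assumes a CUDA device is available:

import torch
import torch.nn as nn

def change_device_reverse():
    a = torch.rand((4, 32), device='cuda:0')   # activations start on the GPU
    m1 = nn.Linear(32, 32).to('cuda:0')
    gpu_out = m1(a)
    cpu_in = gpu_out.cpu()                     # move to the CPU mid-forward
    m2 = nn.Linear(32, 32)                     # this layer lives on the CPU
    out = m2(cpu_in)
    loss = out.sum()
    loss.backward()
    # the gradient flows back across the device boundary into the GPU layer
    print(m1.weight.grad is not None)          # True

change_device_reverse()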

Keras: load images batch wise for large dataset

Is it possible in Keras to load only one batch into memory at a time, as I have a 40 GB dataset of images?
If the dataset is small I can use ImageDataGenerator to generate batches, but because of the large dataset I can't load all the images into memory.
Is there any method in Keras to do something similar to the following TensorFlow code:
path_queue = tf.train.string_input_producer(input_paths, shuffle= False)
paths, contents = reader.read(path_queue)
inputs = decode(contents)
input_batch = tf.train.batch([inputs], batch_size=2)
I am using this method to serialize inputs in TensorFlow, but I don't know how to achieve the same in Keras.
Keras has the method fit_generator() in its models. It accepts a python generator or a keras Sequence as input.
You can create a simple generator like this:
fileList = listOfFiles

def imageLoader(files, batch_size):
    L = len(files)

    # this line is just to make the generator infinite, keras needs that
    while True:

        batch_start = 0
        batch_end = batch_size

        while batch_start < L:
            limit = min(batch_end, L)
            X = someMethodToLoadImages(files[batch_start:limit])
            Y = someMethodToLoadTargets(files[batch_start:limit])

            yield (X, Y)  # a tuple with two numpy arrays with batch_size samples

            batch_start += batch_size
            batch_end += batch_size
And fit like this:
model.fit_generator(imageLoader(fileList,batch_size),steps_per_epoch=..., epochs=..., ...)
Normally, you pass to steps_per_epoch the number of batches you will take from the generator.
You can also implement your own Keras Sequence. It's a little more work, but they recommend using it if you're going to do multi-threaded processing.
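For reference, a bare-bones Sequence for this use case could look roughly like the sketch below; someMethodToLoadImages and someMethodToLoadTargets are the same placeholder loaders as above, and ImageSequence is just an illustrative name:

import math
from keras.utils import Sequence

class ImageSequence(Sequence):
    def __init__(self, files, batch_size):
        self.files = files
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch (no steps_per_epoch needed)
        return math.ceil(len(self.files) / self.batch_size)

    def __getitem__(self, idx):
        # each index maps to one batch of file names
        batch_files = self.files[idx * self.batch_size:(idx + 1) * self.batch_size]
        X = someMethodToLoadImages(batch_files)
        Y = someMethodToLoadTargets(batch_files)
        return X, Y

# model.fit_generator(ImageSequence(fileList, batch_size), epochs=...)

Because a Sequence is indexed by batch rather than iterated, Keras can use several workers (and use_multiprocessing=True) without serving duplicate batches.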

Way to train a model on a huge dataset so as to avoid memory errors

I am trying to train a neural network with a dataset that has more than 250k images, but I am stuck because of my limited computer, which has 16 GB RAM and 32 GB swap. Before all the images are loaded, it gives me a memory error. So I was wondering if there is a way to train my neural network using all the images I have? For example, instead of using RAM to load the images into a numpy array, can we load them from free disk space?
EDIT 1:
# imports inferred from the snippet
import cv2
import numpy as np
import keras

def get_array_image(file, path):
    return cv2.imread(path + file)

def generator(features, labels, num_classes, batch_size, path=''):
    # Create empty arrays to contain the batch of features and labels
    batch_features = np.zeros((batch_size, 28, 28, 3))
    batch_labels = np.zeros((batch_size, 1))
    while True:
        for cpt in range(0, len(features), batch_size):
            for i in range(0, batch_size):
                index = cpt + i
                #print('images : ', index)
                batch_features[i] = get_array_image(features[index], path)
                batch_labels[i] = labels[index]
            yield batch_features, keras.utils.to_categorical(batch_labels, num_classes)
This is the generator I used with the fit_generator function, but I have an accuracy problem. I tried it on the MNIST dataset with a small neural network. If I use the fit function, loading all the images (about 60k images for training and 60k for the test), I get an accuracy of about 0.68 after one epoch. But with fit_generator I obtain only 0.1. Am I doing something wrong with my generator? When I print the index variable, it seems fine.
EDIT 2: I solved my problem but I do not understand why it works. When I create the arrays outside the loop I obtain low accuracy, but when they are created inside the loop the accuracy is good with fit_generator. Does someone know what I am missing?
def generator(features, labels, num_classes, batch_size, path='', dtype=np.uint8):
    # Create empty arrays to contain the batch of features and labels
    # batch_features = np.ndarray(shape=(batch_size, 28, 28, 3), dtype=dtype)
    # batch_labels = np.ndarray(shape=(batch_size, 1), dtype=dtype)
    while True:
        for cpt in range(0, len(features), batch_size):
            batch_features = np.ndarray(shape=(batch_size, 28, 28, 3), dtype=dtype)
            batch_labels = np.ndarray(shape=(batch_size, 1), dtype=dtype)
            for i in range(0, batch_size):
                # index = random.randint(0, len(features)-1)
                index = cpt + i
                #print('images : ', index)
                batch_features[i] = get_array_image(features[index], path)
                batch_labels[i] = labels[index]
                # print(batch_labels[i])
                # cv2.imshow('image', batch_features[i])
                # cv2.waitKey(0)
                # cv2.destroyAllWindows()
                # print(features[index])
            print(batch_features.shape)
            yield batch_features, keras.utils.to_categorical(batch_labels, num_classes)
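A plausible explanation (not confirmed in this thread) is that fit_generator prefetches batches into a queue (max_queue_size), so when the same pre-allocated arrays are reused across yields, a queued batch can be overwritten before the model consumes it. Allocating fresh arrays per batch, as in EDIT 2, avoids that; a sketch that instead keeps the pre-allocated buffers but yields copies would be:

def generator(features, labels, num_classes, batch_size, path='', dtype=np.uint8):
    # Buffers allocated once, outside the loop
    batch_features = np.zeros((batch_size, 28, 28, 3), dtype=dtype)
    batch_labels = np.zeros((batch_size, 1), dtype=dtype)
    while True:
        for cpt in range(0, len(features), batch_size):
            for i in range(batch_size):
                index = cpt + i
                batch_features[i] = get_array_image(features[index], path)
                batch_labels[i] = labels[index]
            # yield a copy so a queued batch cannot be overwritten by the next iteration
            yield (batch_features.copy(),
                   keras.utils.to_categorical(batch_labels, num_classes))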
