PyTorch Data Augmentation is taking too long - pytorch

For the task that involves regression, I need to train my models to generate density maps from RGB images. To augment my dataset I have decided to flip all the images horizontally. For that matter, I also have to flip my ground truth images and I did so.
dataset_for_augmentation.listDataset(train_list,
shuffle=True,
transform=transforms.Compose([
transforms.RandomHorizontalFlip(p=1),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
]),
target_transform=transforms.Compose([
transforms.RandomHorizontalFlip(p=1),
transforms.ToTensor()
]),
train=True,
resize=4,
batch_size=args.batch_size,
num_workers=args.workers),
But here is the problem : For some reason, PyTorch transforms.RandomHorizontalFlip function takes only PIL images (numpy is not allowed) as input. So I decided to convert the type to PIL Image.
img_path = self.lines[index]
img, target = load_data(img_path, self.train, resize=self.resize)
if type(target[0][0]) is np.float64:
target = np.float32(target)
img = Image.fromarray(img)
target = Image.fromarray(target)
if self.transform is not None:
img = self.transform(img)
target = self.target_transform(target)
return img, target
And yes, this operation need enormous amount of time. Considering I need this operation to be carried out for thousands of images, 23 seconds (should have been under half a second at most) per batch is not tolerable.
2019-11-01 16:29:02,497 - INFO - Epoch: [0][0/152] Time 27.095 (27.095) Data 23.150 (23.150) Loss 93.7401 (93.7401)
I would appreciate any suggestions to speed up my augmentation process

You don't need to change the DataLoader to do that. You can use ToPILImage():
transform=transforms.Compose([
transforms.ToPILImage(), # check mode assumption in the documentation
transforms.RandomHorizontalFlip(p=1),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
Anyway, I would avoid converting to PIL. It seems completely unnecessary. If you want to flip all images, then why not to do that using NumPy only?
img_path = self.lines[index]
img, target = load_data(img_path, self.train, resize=self.resize)
if type(target[0][0]) is np.float64:
target = np.float32(target)
# assuming width axis=1 -- see my comment below
img = np.flip(img, axis=1)
target = np.flip(target, axis=1)
if self.transform is not None:
img = self.transform(img)
target = self.target_transform(target)
return img, target
And remove the transforms.RandomHorizontalFlip(p=1) from the Compose. As ToTensor(...) also handles ndarray, you are good to go.
Note: I am assuming the width axis is equal to 1, since ToTensor expects it to be there.
From the docs:
Converts a PIL Image or numpy.ndarray (H x W x C) ...

More of an addition to #Berriel answer.
Horizontal Flip
You are using transforms.RandomHorizontalFlip(p=1) for both X and y images. In your case, with p=1, those will be transformed exactly the same but you are missing the point of data augmentation as the network will only see flipped images (instead of only original images). You should go for probability lower than 1 and higher than 0 (usually 0.5) to get high variability in versions of the image.
If that was the case (p=0.5), you can be more than certain that there will occur a situation, where X gets flipped and y doesn't.
I would advise to use albumentations library and it's albumentations.augmentations.transforms.HorizontalFlip to do the flip on both images the same way.
Normalization
You can find normalization with ImageNet means and stds already set up there as well.
Caching
Furthermore, to speed things up you could use torchdata third party library (disclaimer I'm the author). In your case you could transform image from PIL to Tensor, Normalize with albumentations, cache on disk or even better in RAM images after those transformations with torchdata and finally apply your transformations. This way would allow you to only apply HorizontalFlips on your image and target after initial epoch, previous steps would be pre-calculated.

Related

Prepare for Binary Masks used for the image segmentation

I am trying to prepare the masks for image segmentation with Pytorch. I have three questions about data preparation.
What is the appropriate data format to save the binary mask in general? PNG? JPEG?
Is the mask size needed to be set square such as (224x224), not a rectangle such as (224x448)?
Is the mask value fixed when the size is converted from rectangle to square?
For example, the original mask image size is (600x900), which is binary [0,1]. However, when I applied
import torchvision.transforms as transforms
transforms.Compose([
transforms.Resize((300, 300)),
transforms.ToTensor(),
])
to the mask, the output had other values: 0.01, 0.0156, 0.22... except for 0 and 1, since the mask size was converted.
I applied the below code to convert the mask into the binary again if the value is less than 0.3, the value is 0, otherwise, 1.
def __getitem__(self, idx):
img, mask = self.load_data(idx)
if self.img_transforms is not None:
img = self.img_transforms(img)
if self.mask_transforms is not None:
mask = self.mask_transforms(mask)
mask = torch.where(mask<=0.3,0,1)
return img, mask
but I wonder the process is a common approach and efficient.
PNG, because it is lossless by design.
It depends. More convenient is to use standard resolution, (224x224), I would start with that.
Use resize without interpolation transforms.Resize((300, 300), interpolation=InterpolationMode.NEAREST)

Does albumentations normalize mask?

When I pass an image and a mask to albumentations.Normalize(mean, std).
How would I go about incorporating this?
Should I just add it manually in dataset?
Grateful for any tips you have!
Edited:
Normalization works for three-channel images. If your mask image is grayscale image then probably you need to stack(image= np.stack((img,)*3, axis=-1))
it and make three channel image then apply albumentations's Normalization function. Official function for A.Normalize() is as following which deals with RGB images:
def normalize(img, mean, std, max_pixel_value=255.0):
mean = np.array(mean, dtype=np.float32)
mean *= max_pixel_value
std = np.array(std, dtype=np.float32)
std *= max_pixel_value
denominator = np.reciprocal(std, dtype=np.float32)
img = img.astype(np.float32)
img -= mean
img *= denominator
return img
According to Albumentations's docs, you can make a composition of Transforms and use it within PyTorch dataset.
import albumentations as A
from albumentations.pytorch import ToTensorV2
train_transform = A.Compose(
[
A.SmallestMaxSize(max_size=160),
A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.05, rotate_limit=15, p=0.5),
A.RandomCrop(height=128, width=128),
A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=0.5),
A.RandomBrightnessContrast(p=0.5),
A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
ToTensorV2(),
]
)
train_dataset = CatsVsDogsDataset(images_filepaths=train_images_filepaths, transform=train_transform)
But I am not really sure that normalizing mask image is right way or not.

How augmentation increase number of images [duplicate]

I am a little bit confused about the data augmentation performed in PyTorch. Now, as far as I know, when we are performing data augmentation, we are KEEPING our original dataset, and then adding other versions of it (Flipping, Cropping...etc). But that doesn't seem like happening in PyTorch. As far as I understood from the references, when we use data.transforms in PyTorch, then it applies them one by one. So for example:
data_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
'val': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
}
Here , for the training, we are first randomly cropping the image and resizing it to shape (224,224). Then we are taking these (224,224) images and horizontally flipping them. Therefore, our dataset is now containing ONLY the horizontally flipped images, so our original images are lost in this case.
Am I right? Is this understanding correct? If not, then where do we tell PyTorch in this code above (taken from Official Documentation) to keep the original images and resize them to the expected shape (224,224)?
Thanks
I assume you are asking whether these data augmentation transforms (e.g. RandomHorizontalFlip) actually increase the size of the dataset as well, or are they applied on each item in the dataset one by one and not adding to the size of the dataset.
Running the following simple code snippet we could observe that the latter is true, i.e. if you have a dataset of 8 images, and create a PyTorch dataset object for this dataset when you iterate through the dataset, the transformations are called on each data point, and the transformed data point is returned. So for example if you have random flipping, some of the data points are returned as original, some are returned as flipped (e.g. 4 flipped and 4 original). In other words, by one iteration through the dataset items, you get 8 data points(some flipped and some not). [Which is at odds with the conventional understanding of augmenting the dataset(e.g. in this case having 16 data points in the augmented dataset)]
from torch.utils.data import Dataset
from torchvision import transforms
class experimental_dataset(Dataset):
def __init__(self, data, transform):
self.data = data
self.transform = transform
def __len__(self):
return len(self.data.shape[0])
def __getitem__(self, idx):
item = self.data[idx]
item = self.transform(item)
return item
transform = transforms.Compose([
transforms.ToPILImage(),
transforms.RandomHorizontalFlip(),
transforms.ToTensor()
])
x = torch.rand(8, 1, 2, 2)
print(x)
dataset = experimental_dataset(x,transform)
for item in dataset:
print(item)
Results: (The little differences in floating points are caused by transforming to pil image and back)
Original dummy dataset:
tensor([[[[0.1872, 0.5518],
[0.5733, 0.6593]]],
[[[0.6570, 0.6487],
[0.4415, 0.5883]]],
[[[0.5682, 0.3294],
[0.9346, 0.1243]]],
[[[0.1829, 0.5607],
[0.3661, 0.6277]]],
[[[0.1201, 0.1574],
[0.4224, 0.6146]]],
[[[0.9301, 0.3369],
[0.9210, 0.9616]]],
[[[0.8567, 0.2297],
[0.1789, 0.8954]]],
[[[0.0068, 0.8932],
[0.9971, 0.3548]]]])
transformed dataset:
tensor([[[0.1843, 0.5490],
[0.5725, 0.6588]]])
tensor([[[0.6549, 0.6471],
[0.4392, 0.5882]]])
tensor([[[0.5647, 0.3255],
[0.9333, 0.1216]]])
tensor([[[0.5569, 0.1804],
[0.6275, 0.3647]]])
tensor([[[0.1569, 0.1176],
[0.6118, 0.4196]]])
tensor([[[0.9294, 0.3333],
[0.9176, 0.9608]]])
tensor([[[0.8549, 0.2275],
[0.1765, 0.8941]]])
tensor([[[0.8902, 0.0039],
[0.3529, 0.9961]]])
The transforms operations are applied to your original images at every batch generation. So your dataset is left unchanged, only the batch images are copied and transformed every iteration.
The confusion may come from the fact that often, like in your example, transforms are used both for data preparation (resizing/cropping to expected dimensions, normalizing values, etc.) and for data augmentation (randomizing the resizing/cropping, randomly flipping the images, etc.).
What your data_transforms['train'] does is:
Randomly resize the provided image and randomly crop it to obtain a (224, 224) patch
Apply or not a random horizontal flip to this patch, with a 50/50 chance
Convert it to a Tensor
Normalize the resulting Tensor, given the mean and deviation values you provided
What your data_transforms['val'] does is:
Resize your image to (256, 256)
Center crop the resized image to obtain a (224, 224) patch
Convert it to a Tensor
Normalize the resulting Tensor, given the mean and deviation values you provided
(i.e. the random resizing/cropping for the training data is replaced by a fixed operation for the validation one, to have reliable validation results)
If you don't want your training images to be horizontally flipped with a 50/50 chance, just remove the transforms.RandomHorizontalFlip() line.
Similarly, if you want your images to always be center-cropped, replace transforms.RandomResizedCrop by transforms.Resize and transforms.CenterCrop, as done for data_transforms['val'].
Yes the dataset size does not change after the transformations. Every Image is passed to the transformation and returned, thus the size remaining the same.
If you wish to use the original dataset with transformed one concat them.
e.g increased_dataset = torch.utils.data.ConcatDataset([transformed_dataset,original])
The purpose of data augumentation is to increase the diversity of training dataset.
Even though the data.transforms doesn't change the size of dataset, however, every epoch we recall the dataset, the transforms operation will be executed and then get different data.
I changed #Ashkan372 code slightly to output data for multiple epochs:
import torch
from torchvision import transforms
from torch.utils.data import TensorDataset as Dataset
from torch.utils.data import DataLoader
class experimental_dataset(Dataset):
def __init__(self, data, transform):
self.data = data
self.transform = transform
def __len__(self):
return self.data.shape[0]
def __getitem__(self, idx):
item = self.data[idx]
item = self.transform(item)
return item
transform = transforms.Compose([
transforms.ToPILImage(),
transforms.RandomHorizontalFlip(),
transforms.ToTensor()
])
x = torch.rand(8, 1, 2, 2)
print('the original data: \n', x)
epoch_size = 3
batch_size = 4
dataset = experimental_dataset(x,transform)
for i in range(epoch_size):
print('----------------------------------------------')
print('the epoch', i, 'data: \n')
for item in DataLoader(dataset, batch_size, shuffle=False):
print(item)
The output is:
the original data:
tensor([[[[0.5993, 0.5898],
[0.7365, 0.5472]]],
[[[0.1878, 0.3546],
[0.2124, 0.8324]]],
[[[0.9321, 0.0795],
[0.4090, 0.9513]]],
[[[0.2825, 0.6954],
[0.3737, 0.0869]]],
[[[0.2123, 0.7024],
[0.6270, 0.5923]]],
[[[0.9997, 0.9825],
[0.0267, 0.2910]]],
[[[0.2323, 0.1768],
[0.4646, 0.4487]]],
[[[0.2368, 0.0262],
[0.2423, 0.9593]]]])
----------------------------------------------
the epoch 0 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.3529, 0.1843],
[0.8314, 0.2118]]],
[[[0.9294, 0.0784],
[0.4078, 0.9490]]],
[[[0.6941, 0.2824],
[0.0863, 0.3725]]]])
tensor([[[[0.7020, 0.2118],
[0.5922, 0.6235]]],
[[[0.9804, 0.9961],
[0.2902, 0.0235]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
----------------------------------------------
the epoch 1 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.1843, 0.3529],
[0.2118, 0.8314]]],
[[[0.0784, 0.9294],
[0.9490, 0.4078]]],
[[[0.2824, 0.6941],
[0.3725, 0.0863]]]])
tensor([[[[0.2118, 0.7020],
[0.6235, 0.5922]]],
[[[0.9804, 0.9961],
[0.2902, 0.0235]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
----------------------------------------------
the epoch 2 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.3529, 0.1843],
[0.8314, 0.2118]]],
[[[0.0784, 0.9294],
[0.9490, 0.4078]]],
[[[0.6941, 0.2824],
[0.0863, 0.3725]]]])
tensor([[[[0.2118, 0.7020],
[0.6235, 0.5922]]],
[[[0.9961, 0.9804],
[0.0235, 0.2902]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
Different epoch we get different outputs!
TLDR :
The transform operation applies a bunch of transforms with a certain probability to the input batch that comes in the loop. So the model now is exposed to more examples during the course of multiple epochs.
Personally, when I was Training an audio classification model on my own dataset, before augmentation, my model always seem to converge at 72 % accuracy. I used augmentation along with an increased number of training epochs, Which boosted the validation accuracy in the test set to 89 percent.
In PyTorch, there are types of cropping that DO change the size of the dataset. These are FiveCrop and TenCrop:
CLASS torchvision.transforms.FiveCrop(size)
Crop the given image into four corners and the central crop.
This transform returns a tuple of images and there may be a mismatch
in the number of inputs and targets your Dataset returns. See below
for an example of how to deal with this.
Example:
>>> transform = Compose([
>>> TenCrop(size), # this is a list of PIL Images
>>> Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
TenCrop is the same plus the flipped version of the five patches (horizontal flipping is used by default).

Keras data augmentaion changes pixel values for masks (segmentation)

Iam using runtime data augmentation using generators in keras for segmentation problem..
Here is my data generator
data_gen_args = dict(
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.2,
horizontal_flip=True,
validation_split=0.2
)
image_datagen = ImageDataGenerator(**data_gen_args)
def generate_data_generator(generator, Xi, Yi):
genXi = generator.flow(Xi, seed=7, batch_size=32)
genYi = generator.flow(Yi, seed=7,batch_size=32)
while True:
Xi = genXi.next()
Yi = genYi.next()
print(Yi.dtype)
print(np.unique(Yi))
yield (Xi, Yi)
train_generator = generate_data_generator(image_datagen,
x_train,
y_train)
My labels are in a numpy array with data type float 32 and value 0.0 and 1.0.
#Output of np.unique(y_train)
array([0., 1.], dtype=float32
However, the data generator seems to modifies pixel values as shown below:-
#Output of print(np.unique(Yi))
[0.00000000e+00 1.01742386e-04 1.74021334e-04 ... 9.99918878e-01
9.99988437e-01 1.00000000e+00]
It is supposed to have same values(0.0 and 1.0) after data geneartion..
Also, the the official documentation shows an example using same augmentation arguments for generating mask and images together.
However when i remove shift and zoom iam getting (0.0 and 1.0) as output.
Keras verion 2.2.4,Python 3.6.8
UPDATE:-
I saved those images as numpy array and plotted it using matplotlib.It looks like the edges are smoothly interpolated (0.0-1.0) somehow upon including shifts and zoom augmentation. I can round these values in my custom generator as a hack; but i still don't understand the root cause (in case of normal images this is quite unnoticeable and has no adverse effects; but in masks we don't want to change label values )!!!
Still wondering.. is this a bug (nobody has mentioned it so far)or problem with my custom code ??

How to match cv2.imread to the keras image.img_load output

I'm studying deep learning. Trained an image classification algorithm. The problem is, however, that to train images I used:
test_image = image.load_img('some.png', target_size = (64, 64))
test_image = image.img_to_array(test_image)
While for actual application I use:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
But I found that those give a different ndarray (different data):
Last entries from load_image:
[ 64. 71. 66.]
[ 64. 71. 66.]
[ 62. 69. 67.]]]
Last entries from cv2.imread:
[ 15 23 27]
[ 16 24 28]
[ 14 24 28]]]
, so the system is not working. Is there a way to match results of one to another?
OpenCV reads images in BGR format whereas in keras, it is represented in RGB. To get the OpenCV version to correspond to the order we expect (RGB), simply reverse the channels:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
test_image = test_image[...,::-1] # Added
The last line reverses the channels to be in RGB order. You can then feed this into your keras model.
Another point I'd like to add is that cv2.imread usually reads in images in uint8 precision. Examining the output of your keras loaded image, you can see that the data is in floating point precision so you may also want to convert to a floating-point representation, such as float32:
import numpy as np
# ...
# ...
test_image = test_image[...,::-1].astype(np.float32)
As a final point, depending on how you trained your model it's usually customary to normalize the image pixel values to a [0,1] range. If you did this with your keras model, make sure you divide your values by 255 in your image read in through OpenCV:
import numpy as np
# ...
# ...
test_image = (test_image[...,::-1].astype(np.float32)) / 255.0
Recently, I came across the same issue. I tried to convert the color channel and resize the image with OpenCV. However, PIL and OpenCV have very different ways of image resizing.
Here is the exact solution to this problem.
This is the function that takes image file path , convert to targeted size and prepares for the Keras model -
import cv2
import keras
import numpy as np
from keras.preprocessing import image
from PIL import Image
def prepare_image (file):
im_resized = image.load_img(file, target_size = (224,224))
img_array = image.img_to_array(im_resized)
image_array_expanded = np.expand_dims(img_array, axis = 0)
return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
PIL_image = prepare_image ("lena.png")
If you have an OpenCV image then the function will be like this -
def prepare_image2 (img):
# convert the color from BGR to RGB then convert to PIL array
cvt_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
im_pil = Image.fromarray(cvt_image)
# resize the array (image) then PIL image
im_resized = im_pil.resize((224, 224))
img_array = image.img_to_array(im_resized)
image_array_expanded = np.expand_dims(img_array, axis = 0)
return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
img = cv2.imread("lena.png")
cv2_image = prepare_image2 (img)
# finally check if it is working
np.array_equal(PIL_image, cv2_image)
>> True
Besides CV2 using the BGR format and Keras (using PIL as a backend) using the RGB format, there are also significant differences in the resize methods of CV2 and PIL using the same parameters.
Multiple references can be found on the internet but the general idea is that there are subtle differences in pixel coordinate systems used in the two resize algorithms and also potential issues with different methods of casting to float as an intermediate step in the interpolation algo. End result is a visually similar image but one that is slightly shifted/perturbed between versions.
A perfect example of an adversarial attack that can cause huge differences in accuracy despite small input differences.

Resources