I am a little bit confused about the data augmentation performed in PyTorch. Now, as far as I know, when we are performing data augmentation, we are KEEPING our original dataset, and then adding other versions of it (Flipping, Cropping...etc). But that doesn't seem like happening in PyTorch. As far as I understood from the references, when we use data.transforms in PyTorch, then it applies them one by one. So for example:
data_transforms = {
'train': transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
'val': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]),
}
Here , for the training, we are first randomly cropping the image and resizing it to shape (224,224). Then we are taking these (224,224) images and horizontally flipping them. Therefore, our dataset is now containing ONLY the horizontally flipped images, so our original images are lost in this case.
Am I right? Is this understanding correct? If not, then where do we tell PyTorch in this code above (taken from Official Documentation) to keep the original images and resize them to the expected shape (224,224)?
Thanks
I assume you are asking whether these data augmentation transforms (e.g. RandomHorizontalFlip) actually increase the size of the dataset as well, or are they applied on each item in the dataset one by one and not adding to the size of the dataset.
Running the following simple code snippet we could observe that the latter is true, i.e. if you have a dataset of 8 images, and create a PyTorch dataset object for this dataset when you iterate through the dataset, the transformations are called on each data point, and the transformed data point is returned. So for example if you have random flipping, some of the data points are returned as original, some are returned as flipped (e.g. 4 flipped and 4 original). In other words, by one iteration through the dataset items, you get 8 data points(some flipped and some not). [Which is at odds with the conventional understanding of augmenting the dataset(e.g. in this case having 16 data points in the augmented dataset)]
from torch.utils.data import Dataset
from torchvision import transforms
class experimental_dataset(Dataset):
def __init__(self, data, transform):
self.data = data
self.transform = transform
def __len__(self):
return len(self.data.shape[0])
def __getitem__(self, idx):
item = self.data[idx]
item = self.transform(item)
return item
transform = transforms.Compose([
transforms.ToPILImage(),
transforms.RandomHorizontalFlip(),
transforms.ToTensor()
])
x = torch.rand(8, 1, 2, 2)
print(x)
dataset = experimental_dataset(x,transform)
for item in dataset:
print(item)
Results: (The little differences in floating points are caused by transforming to pil image and back)
Original dummy dataset:
tensor([[[[0.1872, 0.5518],
[0.5733, 0.6593]]],
[[[0.6570, 0.6487],
[0.4415, 0.5883]]],
[[[0.5682, 0.3294],
[0.9346, 0.1243]]],
[[[0.1829, 0.5607],
[0.3661, 0.6277]]],
[[[0.1201, 0.1574],
[0.4224, 0.6146]]],
[[[0.9301, 0.3369],
[0.9210, 0.9616]]],
[[[0.8567, 0.2297],
[0.1789, 0.8954]]],
[[[0.0068, 0.8932],
[0.9971, 0.3548]]]])
transformed dataset:
tensor([[[0.1843, 0.5490],
[0.5725, 0.6588]]])
tensor([[[0.6549, 0.6471],
[0.4392, 0.5882]]])
tensor([[[0.5647, 0.3255],
[0.9333, 0.1216]]])
tensor([[[0.5569, 0.1804],
[0.6275, 0.3647]]])
tensor([[[0.1569, 0.1176],
[0.6118, 0.4196]]])
tensor([[[0.9294, 0.3333],
[0.9176, 0.9608]]])
tensor([[[0.8549, 0.2275],
[0.1765, 0.8941]]])
tensor([[[0.8902, 0.0039],
[0.3529, 0.9961]]])
The transforms operations are applied to your original images at every batch generation. So your dataset is left unchanged, only the batch images are copied and transformed every iteration.
The confusion may come from the fact that often, like in your example, transforms are used both for data preparation (resizing/cropping to expected dimensions, normalizing values, etc.) and for data augmentation (randomizing the resizing/cropping, randomly flipping the images, etc.).
What your data_transforms['train'] does is:
Randomly resize the provided image and randomly crop it to obtain a (224, 224) patch
Apply or not a random horizontal flip to this patch, with a 50/50 chance
Convert it to a Tensor
Normalize the resulting Tensor, given the mean and deviation values you provided
What your data_transforms['val'] does is:
Resize your image to (256, 256)
Center crop the resized image to obtain a (224, 224) patch
Convert it to a Tensor
Normalize the resulting Tensor, given the mean and deviation values you provided
(i.e. the random resizing/cropping for the training data is replaced by a fixed operation for the validation one, to have reliable validation results)
If you don't want your training images to be horizontally flipped with a 50/50 chance, just remove the transforms.RandomHorizontalFlip() line.
Similarly, if you want your images to always be center-cropped, replace transforms.RandomResizedCrop by transforms.Resize and transforms.CenterCrop, as done for data_transforms['val'].
Yes the dataset size does not change after the transformations. Every Image is passed to the transformation and returned, thus the size remaining the same.
If you wish to use the original dataset with transformed one concat them.
e.g increased_dataset = torch.utils.data.ConcatDataset([transformed_dataset,original])
The purpose of data augumentation is to increase the diversity of training dataset.
Even though the data.transforms doesn't change the size of dataset, however, every epoch we recall the dataset, the transforms operation will be executed and then get different data.
I changed #Ashkan372 code slightly to output data for multiple epochs:
import torch
from torchvision import transforms
from torch.utils.data import TensorDataset as Dataset
from torch.utils.data import DataLoader
class experimental_dataset(Dataset):
def __init__(self, data, transform):
self.data = data
self.transform = transform
def __len__(self):
return self.data.shape[0]
def __getitem__(self, idx):
item = self.data[idx]
item = self.transform(item)
return item
transform = transforms.Compose([
transforms.ToPILImage(),
transforms.RandomHorizontalFlip(),
transforms.ToTensor()
])
x = torch.rand(8, 1, 2, 2)
print('the original data: \n', x)
epoch_size = 3
batch_size = 4
dataset = experimental_dataset(x,transform)
for i in range(epoch_size):
print('----------------------------------------------')
print('the epoch', i, 'data: \n')
for item in DataLoader(dataset, batch_size, shuffle=False):
print(item)
The output is:
the original data:
tensor([[[[0.5993, 0.5898],
[0.7365, 0.5472]]],
[[[0.1878, 0.3546],
[0.2124, 0.8324]]],
[[[0.9321, 0.0795],
[0.4090, 0.9513]]],
[[[0.2825, 0.6954],
[0.3737, 0.0869]]],
[[[0.2123, 0.7024],
[0.6270, 0.5923]]],
[[[0.9997, 0.9825],
[0.0267, 0.2910]]],
[[[0.2323, 0.1768],
[0.4646, 0.4487]]],
[[[0.2368, 0.0262],
[0.2423, 0.9593]]]])
----------------------------------------------
the epoch 0 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.3529, 0.1843],
[0.8314, 0.2118]]],
[[[0.9294, 0.0784],
[0.4078, 0.9490]]],
[[[0.6941, 0.2824],
[0.0863, 0.3725]]]])
tensor([[[[0.7020, 0.2118],
[0.5922, 0.6235]]],
[[[0.9804, 0.9961],
[0.2902, 0.0235]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
----------------------------------------------
the epoch 1 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.1843, 0.3529],
[0.2118, 0.8314]]],
[[[0.0784, 0.9294],
[0.9490, 0.4078]]],
[[[0.2824, 0.6941],
[0.3725, 0.0863]]]])
tensor([[[[0.2118, 0.7020],
[0.6235, 0.5922]]],
[[[0.9804, 0.9961],
[0.2902, 0.0235]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
----------------------------------------------
the epoch 2 data:
tensor([[[[0.5882, 0.5961],
[0.5451, 0.7333]]],
[[[0.3529, 0.1843],
[0.8314, 0.2118]]],
[[[0.0784, 0.9294],
[0.9490, 0.4078]]],
[[[0.6941, 0.2824],
[0.0863, 0.3725]]]])
tensor([[[[0.2118, 0.7020],
[0.6235, 0.5922]]],
[[[0.9961, 0.9804],
[0.0235, 0.2902]]],
[[[0.2314, 0.1765],
[0.4627, 0.4471]]],
[[[0.0235, 0.2353],
[0.9569, 0.2392]]]])
Different epoch we get different outputs!
TLDR :
The transform operation applies a bunch of transforms with a certain probability to the input batch that comes in the loop. So the model now is exposed to more examples during the course of multiple epochs.
Personally, when I was Training an audio classification model on my own dataset, before augmentation, my model always seem to converge at 72 % accuracy. I used augmentation along with an increased number of training epochs, Which boosted the validation accuracy in the test set to 89 percent.
In PyTorch, there are types of cropping that DO change the size of the dataset. These are FiveCrop and TenCrop:
CLASS torchvision.transforms.FiveCrop(size)
Crop the given image into four corners and the central crop.
This transform returns a tuple of images and there may be a mismatch
in the number of inputs and targets your Dataset returns. See below
for an example of how to deal with this.
Example:
>>> transform = Compose([
>>> TenCrop(size), # this is a list of PIL Images
>>> Lambda(lambda crops: torch.stack([ToTensor()(crop) for crop in crops])) # returns a 4D tensor
>>> ])
>>> #In your test loop you can do the following:
>>> input, target = batch # input is a 5d tensor, target is 2d
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1) # avg over crops
TenCrop is the same plus the flipped version of the five patches (horizontal flipping is used by default).
I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is, this code gives me a very green image as a result. I am sure the problem is with the dimensions of view, but I was unable to find how to apply maxpool to just one image so I couldn't fix it. The dimension of the image I'm considering is 512x512. The arguments for view make no sense for me right now, it's just the only number that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Width, Height, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in format Channels, Width, Height
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white you would need shape [1, 1, 512, 512] (single channel only), you can't leave/squeeze those dimensions, they always have to be there for any torch.nn.Module!
To transform tensor into image again you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
I'm studying deep learning. Trained an image classification algorithm. The problem is, however, that to train images I used:
test_image = image.load_img('some.png', target_size = (64, 64))
test_image = image.img_to_array(test_image)
While for actual application I use:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
But I found that those give a different ndarray (different data):
Last entries from load_image:
[ 64. 71. 66.]
[ 64. 71. 66.]
[ 62. 69. 67.]]]
Last entries from cv2.imread:
[ 15 23 27]
[ 16 24 28]
[ 14 24 28]]]
, so the system is not working. Is there a way to match results of one to another?
OpenCV reads images in BGR format whereas in keras, it is represented in RGB. To get the OpenCV version to correspond to the order we expect (RGB), simply reverse the channels:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
test_image = test_image[...,::-1] # Added
The last line reverses the channels to be in RGB order. You can then feed this into your keras model.
Another point I'd like to add is that cv2.imread usually reads in images in uint8 precision. Examining the output of your keras loaded image, you can see that the data is in floating point precision so you may also want to convert to a floating-point representation, such as float32:
import numpy as np
# ...
# ...
test_image = test_image[...,::-1].astype(np.float32)
As a final point, depending on how you trained your model it's usually customary to normalize the image pixel values to a [0,1] range. If you did this with your keras model, make sure you divide your values by 255 in your image read in through OpenCV:
import numpy as np
# ...
# ...
test_image = (test_image[...,::-1].astype(np.float32)) / 255.0
Recently, I came across the same issue. I tried to convert the color channel and resize the image with OpenCV. However, PIL and OpenCV have very different ways of image resizing.
Here is the exact solution to this problem.
This is the function that takes image file path , convert to targeted size and prepares for the Keras model -
import cv2
import keras
import numpy as np
from keras.preprocessing import image
from PIL import Image
def prepare_image (file):
im_resized = image.load_img(file, target_size = (224,224))
img_array = image.img_to_array(im_resized)
image_array_expanded = np.expand_dims(img_array, axis = 0)
return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
PIL_image = prepare_image ("lena.png")
If you have an OpenCV image then the function will be like this -
def prepare_image2 (img):
# convert the color from BGR to RGB then convert to PIL array
cvt_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
im_pil = Image.fromarray(cvt_image)
# resize the array (image) then PIL image
im_resized = im_pil.resize((224, 224))
img_array = image.img_to_array(im_resized)
image_array_expanded = np.expand_dims(img_array, axis = 0)
return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
img = cv2.imread("lena.png")
cv2_image = prepare_image2 (img)
# finally check if it is working
np.array_equal(PIL_image, cv2_image)
>> True
Besides CV2 using the BGR format and Keras (using PIL as a backend) using the RGB format, there are also significant differences in the resize methods of CV2 and PIL using the same parameters.
Multiple references can be found on the internet but the general idea is that there are subtle differences in pixel coordinate systems used in the two resize algorithms and also potential issues with different methods of casting to float as an intermediate step in the interpolation algo. End result is a visually similar image but one that is slightly shifted/perturbed between versions.
A perfect example of an adversarial attack that can cause huge differences in accuracy despite small input differences.