Maxpool of an image in pytorch - pytorch

I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is, this code gives me a very green image as a result. I am sure the problem is with the dimensions of view, but I was unable to find how to apply maxpool to just one image so I couldn't fix it. The dimension of the image I'm considering is 512x512. The arguments for view make no sense for me right now, it's just the only number that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:

Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Width, Height, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in format Channels, Width, Height
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white you would need shape [1, 1, 512, 512] (single channel only), you can't leave/squeeze those dimensions, they always have to be there for any torch.nn.Module!
To transform tensor into image again you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()

Related

Problem applying binary mask to an RGB image with numpy

I'm trying to apply a binary mask to an RGB image with numpy
I found this https://stackoverflow.com/a/26843467/4628384 but I don't have permissions to write a comment yet.
Anyway I'm running into a problem; Any help much appreciated.
def masktoRGB(self,image,image_mask):
# create mask with same dimensions as image
mask = np.zeros_like(image)
# copy your image_mask to all dimensions (i.e. colors) of your image
for i in range(3):
mask[:,:,i] = image_mask.copy()
# apply the mask to your image
# tried to swap axes, not a solution
#image = image.swapaxes(0,1)
#this gives the error:
masked_image = image[mask]
print(mask.shape)
print(image.shape)
print(image_mask.shape)
return masked_image
this gives me:
IndexError: index 213 is out of bounds for axis 0 with size 212
print output:
(188, 212, 3)
(188, 212, 3)
(188, 212)
image and image_mask are the same image, except the first is RGB and the second is mode L
Try to use broadcasting and multiplication:
image * image_mask[..., None]
I assume that the type of image_mask is bool that maps to numbers 0 and 1. Therefore pairwise multiplication of image and mask will set masked values to zero.
Similar efect can be achieved with np.where() or & operator.
The issue is that shapes of image and image_mask are not compatible. Numpy will first add extra dimensions at the head of shape until both tensor have the same shape. Thus image_mask is reshaped from (188, 212) to (1,188, 212). This new shape is not compatible with the shape of image
(188, 212, 3).
The trick is to reshape image_mask using fancy indexing. You can use None as index to add a dummy dimension of size 1 at the end of shape. Operation image_mask[..., None] will reshape it from (188, 212) to (188, 212,1).
Broadcast rules tells that dimensions of size 1 are expanded by repeating all values along broadcast dimension. Thus numoy will automatically reshape tensir from (188, 212,1) to (188, 212,3). Operation is very fast because no copy is created.
Now bith tensors can be multiplied producing desired result.

Add extra dimension to an axes

I have a batch of segmentation masks of shape [5,1,100,100] (batch_size x dims x ht x wd) which I have to display in tensorboardX with an RGB image batch [5,3,100,100]. I want to add two dummy dimensions to the second axes of the segmentation mask to make it [5,3,100,100] so there will not be any dimension mismatch error when I pass it to torch.utils.make_grid. I have tried unsqueeze, expand and view but I am not able to do it. Any suggestions?
You can use expand, repeat, or repeat_interleave:
import torch
x = torch.randn((5, 1, 100, 100))
x1_3channels = x.expand(-1, 3, -1, -1)
x2_3channels = x.repeat(1, 3, 1, 1)
x3_3channels = x.repeat_interleave(3, dim=1)
print(x1_3channels.shape) # torch.Size([5, 3, 100, 100])
print(x2_3channels.shape) # torch.Size([5, 3, 100, 100])
print(x3_3channels.shape) # torch.Size([5, 3, 100, 100])
Note that, as stated in the docs:
torch.expand():
Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory.
torch.repeat():
Unlike expand(), this function copies the tensor’s data.

Keras Image Preprocessing

My training images are downscaled versions of their associated HR image. Thus, the input and the output images aren't the same dimension. For now, I'm using a hand-crafted sample of 13 images, but eventually I would like to be able to use my 500-ish HR (high-resolution) images dataset. This dataset, however, does not have images of the same dimension, so I'm guessing I'll have to crop them in order to obtain a uniform dimension.
I currently have this code set up: it takes a bunch of 512x512x3 images and applies a few transformations to augment the data (flips). I thus obtain a basic set of 39 images in their HR form, and then I downscale them by a factor of 4, thus obtaining my trainset which consits of 39 images of dimension 128x128x3.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.image as mpimg
import skimage
from skimage import transform
from constants import data_path
from constants import img_width
from constants import img_height
from model import setUpModel
def setUpImages():
train = []
finalTest = []
sample_amnt = 11
max_amnt = 13
# Extracting images (512x512)
for i in range(sample_amnt):
train.append(mpimg.imread(data_path + str(i) + '.jpg'))
for i in range(max_amnt-sample_amnt):
finalTest.append(mpimg.imread(data_path + str(i+sample_amnt) + '.jpg'))
# # TODO: https://keras.io/preprocessing/image/
# ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False,
# samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0,
# width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0,
# zoom_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False,
# vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None,
# validation_split=0.0, dtype=None)
# Augmenting data
trainData = dataAugmentation(train)
testData = dataAugmentation(finalTest)
setUpData(trainData, testData)
def setUpData(trainData, testData):
# print(type(trainData)) # <class 'numpy.ndarray'>
# print(len(trainData)) # 64
# print(type(trainData[0])) # <class 'numpy.ndarray'>
# print(trainData[0].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2-1].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2].shape) # (350, 350, 3)
# print(trainData[len(trainData)-1].shape) # (350, 350, 3)
# TODO: substract mean of all images to all images
# Separating the training data
Y_train = trainData[:len(trainData)//2] # First half is the unaltered data
X_train = trainData[len(trainData)//2:] # Second half is the deteriorated data
# Separating the testing data
Y_test = testData[:len(testData)//2] # First half is the unaltered data
X_test = testData[len(testData)//2:] # Second half is the deteriorated data
# Adjusting shapes for Keras input # TODO: make into a function ?
X_train = np.array([x for x in X_train])
Y_train = np.array([x for x in Y_train])
Y_test = np.array([x for x in Y_test])
X_test = np.array([x for x in X_test])
# # Sanity check: display four images (2x HR/LR)
# plt.figure(figsize=(10, 10))
# for i in range(2):
# plt.subplot(2, 2, i + 1)
# plt.imshow(Y_train[i], cmap=plt.cm.binary)
# for i in range(2):
# plt.subplot(2, 2, i + 1 + 2)
# plt.imshow(X_train[i], cmap=plt.cm.binary)
# plt.show()
setUpModel(X_train, Y_train, X_test, Y_test)
# TODO: possibly remove once Keras Preprocessing is integrated?
def dataAugmentation(dataToAugment):
print("Starting to augment data")
arrayToFill = []
# faster computation with values between 0 and 1 ?
dataToAugment = np.divide(dataToAugment, 255.)
# TODO: switch from RGB channels to CbCrY
# # TODO: Try GrayScale
# trainingData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(350, 350, 1) for x in trainingData])
# validateData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(1400, 1400, 1) for x in validateData])
# adding the normal images (8)
for i in range(len(dataToAugment)):
arrayToFill.append(dataToAugment[i])
# vertical axis flip (-> 16)
for i in range(len(arrayToFill)):
arrayToFill.append(np.fliplr(arrayToFill[i]))
# horizontal axis flip (-> 32)
for i in range(len(arrayToFill)):
arrayToFill.append(np.flipud(arrayToFill[i]))
# downsizing by scale of 4 (-> 64 images of 128x128x3)
for i in range(len(arrayToFill)):
arrayToFill.append(skimage.transform.resize(
arrayToFill[i],
(img_width/4, img_height/4),
mode='reflect',
anti_aliasing=True))
# # Sanity check: display the images
# plt.figure(figsize=(10, 10))
# for i in range(64):
# plt.subplot(8, 8, i + 1)
# plt.imshow(arrayToFill[i], cmap=plt.cm.binary)
# plt.show()
return np.array(arrayToFill)
My question is: in my case, can I use the Preprocessing tool that Keras offers? I would ideally like to be able to input my varying sized images of high quality, crop them (not downsize them) to 512x512x3, and data augment them through flips and whatnot. Substracting the mean would also be part of what I'd like to achieve. That set would represent my validation set.
Reusing the validation set, I want to downscale by a factor of 4 all the images, and that would generate my training set.
Those two sets could then be split appropriately to obtain, ultimately, the famous X_train Y_train X_test Y_test.
I'm just hesitant about throwing out all the work I've done so far to preprocess my mini sample, but I'm thinking if it can all be done with a single built-in function, maybe I should give that a go.
This is my first ML project, hence me not understanding very well Keras, and the documentation isn't always the clearest. I'm thinking that the fact that I'm working with a X and Y that are different in size, maybe this function doesn't apply to my project.
Thank you! :)
Yes you can use keras preprocessing function. Below some snippets to help you...
def cropping_function(x):
...
return cropped_image
X_image_gen = ImageDataGenerator(preprocessing_function = cropping_function,
horizontal_flip = True,
vertical_flip=True)
X_train_flow = X_image_gen.flow(X_train, batch_size = 16, seed = 1)
Y_image_gen = ImageDataGenerator(horizontal_flip = True,
vertical_flip=True)
Y_train_flow = Y_image_gen.flow(y_train, batch_size = 16, seed = 1)
train_flow = zip(X_train_flow,Y_train_flow)
model.fit_generator(train_flow)
Christof Henkel's suggestion is very clean and nice. I would just like to offer another way to do it using imgaug, a convenient way to augment images in lots of different ways. It's usefull if you want more implemented augmentations or if you ever need to use some ML library other than Keras.
It unfortunatly doesn't have a way to make crops that way but it allows implementing custom functions. Here is an example function for generating random crops of a set size from an image that's at least as big as the chosen crop size:
from imgaug import augmenters as iaa
def random_crop(images, random_state, parents, hooks):
crop_h, crop_w = 128, 128
new_images = []
for img in images:
if (img.shape[0] >= crop_h) and (img.shape[1] >= crop_w):
rand_h = np.random.randint(0, img.shape[0]-crop_h)
rand_w = np.random.randint(0, img.shape[1]-crop_w)
new_images.append(img[rand_h:rand_h+crop_h, rand_w:rand_w+crop_w])
else:
new_images.append(np.zeros((crop_h, crop_w, 3)))
return np.array(new_images)
def keypoints_dummy(keypoints_on_images, random_state, parents, hooks):
return keypoints_on_images
cropper = iaa.Lambda(func_images=random_crop, func_keypoints=keypoints_dummy)
You can then combine this function with any other builtin imgaug function, for example the flip functions that you're already using like this:
seq = iaa.Sequential([cropper, iaa.Fliplr(0.5), iaa.Flipud(0.5)])
This function could then generate lots of different crops from each image. An example image with some possible results (note that it would result in actual (128, 128, 3) images, they are just merged into one image here for visualization):
Your image set could then be generated by:
crops_per_image = 10
images = [skimage.io.imread(path) for path in glob.glob('train_data/*.jpg')]
augs = np.array([seq.augment_image(img)/255 for img in images for _ in range(crops_per_image)])
It would also be simple to add new functions to be applied to the images, for example the remove mean functions you mentioned.
Here's another way performing random and center crop before resizing using native ImageDataGenerator and flow_from_directory. You can add it as preprocess_crop.py module into your project.
It first resizes image preserving aspect ratio and then performs crop. Resized image size is based on crop_fraction which is hardcoded but can be changed. See crop_fraction = 0.875 line where 0.875 appears to be the most common, e.g. 224px crop from 256px image.
Note that the implementation has been done by monkey patching keras_preprocessing.image.utils.loag_img function as I couldn't find any other way to perform crop before resizing without rewriting many other classes above.
Due to these limitations, the cropping method is enumerated into the interpolation field. Methods are delimited by : where the first part is interpolation and second is crop e.g. lanczos:random. Supported crop methods are none, center, random. When no crop method is specified, none is assumed.
How to use it
Just drop the preprocess_crop.py into your project to enable cropping. The example below shows how you can use random cropping for the training and center cropping for validation:
import preprocess_crop
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input
#...
# Training with random crop
train_datagen = ImageDataGenerator(
rotation_range=20,
channel_shift_range=20,
horizontal_flip=True,
preprocessing_function=preprocess_input
)
train_img_generator = train_datagen.flow_from_directory(
train_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:random', # <--------- random crop
shuffle = True
)
# Validation with center crop
validate_datagen = ImageDataGenerator(
preprocessing_function=preprocess_input
)
validate_img_generator = validate_datagen.flow_from_directory(
validate_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:center', # <--------- center crop
shuffle = False
)
Here's preprocess_crop.py file to include with your project:
import random
import keras_preprocessing.image
def load_and_crop_img(path, grayscale=False, color_mode='rgb', target_size=None,
interpolation='nearest'):
"""Wraps keras_preprocessing.image.utils.loag_img() and adds cropping.
Cropping method enumarated in interpolation
# Arguments
path: Path to image file.
color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
The desired image format.
target_size: Either `None` (default to original size)
or tuple of ints `(img_height, img_width)`.
interpolation: Interpolation and crop methods used to resample and crop the image
if the target size is different from that of the loaded image.
Methods are delimited by ":" where first part is interpolation and second is crop
e.g. "lanczos:random".
Supported interpolation methods are "nearest", "bilinear", "bicubic", "lanczos",
"box", "hamming" By default, "nearest" is used.
Supported crop methods are "none", "center", "random".
# Returns
A PIL Image instance.
# Raises
ImportError: if PIL is not available.
ValueError: if interpolation method is not supported.
"""
# Decode interpolation string. Allowed Crop methods: none, center, random
interpolation, crop = interpolation.split(":") if ":" in interpolation else (interpolation, "none")
if crop == "none":
return keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=target_size,
interpolation=interpolation)
# Load original size image using Keras
img = keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=None,
interpolation=interpolation)
# Crop fraction of total image
crop_fraction = 0.875
target_width = target_size[1]
target_height = target_size[0]
if target_size is not None:
if img.size != (target_width, target_height):
if crop not in ["center", "random"]:
raise ValueError('Invalid crop method {} specified.', crop)
if interpolation not in keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS:
raise ValueError(
'Invalid interpolation method {} specified. Supported '
'methods are {}'.format(interpolation,
", ".join(keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS.keys())))
resample = keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS[interpolation]
width, height = img.size
# Resize keeping aspect ratio
# result shold be no smaller than the targer size, include crop fraction overhead
target_size_before_crop = (target_width/crop_fraction, target_height/crop_fraction)
ratio = max(target_size_before_crop[0] / width, target_size_before_crop[1] / height)
target_size_before_crop_keep_ratio = int(width * ratio), int(height * ratio)
img = img.resize(target_size_before_crop_keep_ratio, resample=resample)
width, height = img.size
if crop == "center":
left_corner = int(round(width/2)) - int(round(target_width/2))
top_corner = int(round(height/2)) - int(round(target_height/2))
return img.crop((left_corner, top_corner, left_corner + target_width, top_corner + target_height))
elif crop == "random":
left_shift = random.randint(0, int((width - target_width)))
down_shift = random.randint(0, int((height - target_height)))
return img.crop((left_shift, down_shift, target_width + left_shift, target_height + down_shift))
return img
# Monkey patch
keras_preprocessing.image.iterator.load_img = load_and_crop_img

Does tensorflow have shape variables?

Often, I'd like to work with variable-size data, e.g. the number of samples.
To get this data into tensorflow, I use python variables (e.g. "num_samples=2000") to define shapes.
This means I have to re-create a new graph for each number of samples.
Setting validate_shape=False is not an option to me.
Is there a Tensorflow-way of having dimension sizes as variables?
tf.placeholder() allows you to create tensors which will be filled only at runtime ; and it allows to define tensors with variable-size dimensions using None in their shape.
tf.shape() gives you the dynamic size of a tensor, itself as a tensor (actually as a tf.TensorShape, which you can use e.g. to dynamically generate other tensors). See tf.TensorShape for more detailed explanations.
An example to hopefully make things clearer:
import tensorflow as tf
import numpy as np
# Creating a placeholder for 3-channel images with undefined batche size, height and width:
images = tf.placeholder(tf.float32, shape=(None, None, None, 3))
# Dynamically obtaining the actual shape of the images:
images_shape = tf.shape(images)
# Demonstrating how this shape can be use to dynamically create other tensors:
ones = tf.ones(images_shape, dtype=images.dtype)
images_plus1 = images + ones
with tf.Session() as sess:
for i in range(2):
# Generating a random number of images with a random HxW:
num_images = np.random.randint(1, 10)
height, width = np.random.randint(10, 20), np.random.randint(10, 20)
images_zero = np.zeros((num_images, height, width, 3), dtype=np.float32)
# Running our TF operation, feeding the placeholder with the actual images:
res = sess.run(images_plus1, feed_dict={images: images_zero})
print("Shape: {} ; Pixel Val: {}".format(res.shape, res[0, 0, 0]))
# > Shape: (6, 14, 13, 3) ; Pixel Val: [1. 1. 1.]
# > Shape: (8, 11, 15, 3) ; Pixel Val: [1. 1. 1.]
# ^ As you can see, we run the same graph each time with a different number of
# images / different shape

Theano Reshaping

I am unable to clearly comprehend theano's reshape. I have an image matrix of shape:
[batch_size, stack1_size, stack2_size, height, width]
, where there are stack2_size stacks of images, each having stack1_size of channels. I now want to convert them into the following shape:
[batch_size, stack1_size*stack2_size, 1 , height, width]
such that all the stacks will be combined together into one stack of all channels. I am not sure if reshape will do this for me. I see that reshape seems to not lexicographically order the pixels if they are mixed in dimensions in the middle. I have been trying to achieve this with a combination of dimshuffle,reshape and concatenate, but to no avail. I would appreciate some help.
Thanks.
Theano reshape works just like numpy reshape with its default order, i.e. 'C':
‘C’ means to read / write the elements using C-like index order, with
the last axis index changing fastest, back to the first axis index
changing slowest.
Here's an example showing that the image pixels remain in the same order after a reshape via either numpy or Theano.
import numpy
import theano
import theano.tensor
def main():
batch_size = 2
stack1_size = 3
stack2_size = 4
height = 5
width = 6
data = numpy.arange(batch_size * stack1_size * stack2_size * height * width).reshape(
(batch_size, stack1_size, stack2_size, height, width))
reshaped_data = data.reshape([batch_size, stack1_size * stack2_size, 1, height, width])
print data[0, 0, 0]
print reshaped_data[0, 0, 0]
x = theano.tensor.TensorType('int64', (False,) * 5)()
reshaped_x = x.reshape((x.shape[0], x.shape[1] * x.shape[2], 1, x.shape[3], x.shape[4]))
f = theano.function(inputs=[x], outputs=reshaped_x)
print f(data)[0, 0, 0]
main()

Resources