While I know how to convert a single color image (32,32,3) to grayscale using CV2:
img = cv2.cvtColor( img, cv2.COLOR_RGB2GRAY )
I need to convert a whole batch of 60,000 images stored in a 4D numpy array (60000,32,32,3). How can I achieve that?
Let's say your 4D array of images is called img_stack with shape (60000,32,32,3).
You could do:
gray_stack = np.empty_like(img_stack[..., 0])
for i in range(img_stack.shape[0]):
    gray_stack[i] = cv2.cvtColor(img_stack[i], cv2.COLOR_RGB2GRAY)
Resulting shape is (60000,32,32).
Or you could do:
gray_stack = np.empty_like(img_stack[..., :1])
for i in range(img_stack.shape[0]):
    gray_stack[i, :, :, 0] = cv2.cvtColor(img_stack[i], cv2.COLOR_RGB2GRAY)
Resulting shape is (60000,32,32,1).
Bonus Tensorflow solution:
gray_stack = tf.image.rgb_to_grayscale(img_stack, name=None)
Resulting shape will be (60000,32,32,1).
The above OpenCV solutions might actually perform faster.
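If the Python-level loop over 60,000 images ever becomes a bottleneck, another option (a sketch, not benchmarked here, assuming img_stack is a uint8 array) is to fold the batch dimension into the image height so cv2.cvtColor is called only once; this works because RGB-to-gray is a purely per-pixel operation, so reshaping does not change the result:
n, h, w, c = img_stack.shape
flat = img_stack.reshape(n * h, w, c)  # treat the whole batch as one tall image
gray_stack = cv2.cvtColor(flat, cv2.COLOR_RGB2GRAY).reshape(n, h, w)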
One more option using numpy:
grayscale_imgs = np.dot(img_stack, [0.299, 0.587, 0.114])
grayscale_imgs.shape # => (60000, 32, 32)
These weights are the standard ITU-R BT.601 luma coefficients, the same ones OpenCV uses for its RGB-to-grayscale conversion.
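Note that np.dot returns float64 values; if you need the same uint8 output that cv2.cvtColor produces, round and cast (a minimal sketch; individual pixels may still differ by about 1 from OpenCV because of its fixed-point arithmetic):
grayscale_imgs = np.round(grayscale_imgs).astype(np.uint8)  # back to uint8, shape (60000, 32, 32)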
I know how to use the ImageDataGenerator to augment my data by translating, flipping, rotating, shearing, etc. The question is: let's say I have both a training image and the corresponding segmentation image, and I would like to augment both of them. For example, if I rotated a training image by 45 degrees, then I would also like to rotate the segmentation image by 45 degrees. In essence, I want to apply the identical set of transforms to two data sets. Is that possible to do with ImageDataGenerator, or do I have to write all the augmentation functions from scratch? Thanks very much in advance.
You can use augmentations in tf.data.Dataset.map and return the image twice. I don't know of any way to do this with ImageDataGenerator.
import tensorflow as tf
import matplotlib.pyplot as plt
from skimage import data
cats = tf.concat([data.chelsea()[None, ...] for i in range(24)], axis=0)
test = tf.data.Dataset.from_tensor_slices(cats)
def augment(image):
    image = tf.cast(x=image, dtype=tf.float32)
    image = tf.divide(x=image, y=tf.constant(255.))
    image = tf.image.random_hue(image=image, max_delta=5e-1)
    image = tf.image.random_brightness(image=image, max_delta=2e-1)
    return image, image
test = test.batch(1).map(augment)
fig = plt.figure()
plt.subplots_adjust(wspace=.1, hspace=.2)
images = next(iter(test.take(1)))
for index, image in enumerate(images):
    ax = plt.subplot(1, 2, index + 1)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(tf.clip_by_value(tf.squeeze(image), clip_value_min=0, clip_value_max=1))
plt.show()
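The map above returns the same color-augmented image twice; for the image-plus-mask case in the question, a common trick is to stack the image and its mask along the channel axis so that a single random geometric transform is applied to both, then split them again. A rough sketch (it assumes the image and mask tensors share height, width, and dtype, and that images and masks are placeholders for your own arrays):
def augment_pair(image, mask):
    # Stack along the channel axis so one random flip hits both tensors identically
    combined = tf.concat([image, mask], axis=-1)
    combined = tf.image.random_flip_left_right(combined)
    return combined[..., :3], combined[..., 3:]  # first 3 channels: image, rest: mask
paired = tf.data.Dataset.from_tensor_slices((images, masks)).map(augment_pair)
Note that this only makes sense for geometric transforms; color augmentations such as random_hue should be applied to the image slice alone.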
I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is that this code gives me a very green image as a result. I am sure the problem is with the dimensions of view, but I was unable to find out how to apply maxpool to just one image, so I couldn't fix it. The dimension of the image I'm considering is 512x512. The arguments for view make no sense to me right now; it's just the only number that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in Channels, Height, Width format,
# so we have to switch the dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white, you would need shape [1, 1, 512, 512] (a single channel only); you can't drop or squeeze those dimensions, as they always have to be there for any torch.nn.Module.
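For that single-channel case, a minimal sketch (reusing the pooling layer defined above) of how you might build the tensor from a (512, 512) array:
gray_np = np.random.randint(low=0, high=255, size=(512, 512))
gray_tensor = torch.from_numpy(gray_np).float()
# Add channel and batch dimensions: [512, 512] -> [1, 1, 512, 512]
ready_gray = gray_tensor.unsqueeze(0).unsqueeze(0)
pooled_gray = pooling(ready_gray)  # shape [1, 1, 510, 510]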
To transform the tensor back into an image you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
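The question also mentions average and "min" pooling; a small sketch along the same lines (PyTorch has no MinPool2d, but min pooling can be expressed as a negated max pool):
avg_pooling = torch.nn.AvgPool2d(kernel_size=3, stride=1)
avg_img = avg_pooling(ready_tensor_img.float())
# "Min pooling": negate, max-pool, negate back
min_img = -pooling(-ready_tensor_img.float())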
I'm doing some image machine learning with Keras, and when I feed a single picture (converted to a numpy array) into my model, it returns a 4D numpy array (the predicted picture).
I want to convert that array to an image using Image.fromarray from the PIL library,
but Image.fromarray only accepts a 2D or 3D array.
My predicted picture's array shape is (1, 256, 256, 3), where the 1 is the number of samples,
so the 1 is useless for the image. I want to convert it to (256, 256, 3) without damaging the image data. What should I do? Thanks for your time.
The 1 is not useless data; it is a singleton dimension. You can simply drop it, and the size of the data won't change.
You can do that with numpy.squeeze.
Also, make sure that your data is in the right format; for Image.fromarray this is uint8.
Example:
import numpy as np
from PIL import Image
data = np.ones((1, 16, 16, 3))
for i in range(16):
    data[0, i, i, 1] = 0.0
print("size: %s, type: %s"%(data.shape, data.dtype))
# size: (1, 16, 16, 3), type: float64
data_img = (data.squeeze()*255).astype(np.uint8)
print("size: %s, type: %s"%(data_img.shape, data_img.dtype))
# size: (16, 16, 3), type: uint8
img = Image.fromarray(data_img, mode='RGB')
img.show()
I am using the MNIST dataset to train a capsule network in Keras.
After training, I want to display an image from the MNIST dataset. For loading images, mnist.load_data() is used. The data is stored as (x_train, y_train), (x_test, y_test).
Now, for visualizing an image, my code is as follows:
img_path = x_test[1]
print(img_path.shape)
plt.imshow(img_path)
plt.show()
The code gives output as follows:
(28, 28, 1)
and the error on plt.imshow(img_path) as follows:
TypeError: Invalid dimensions for image data
How can I show the image in PNG format? Help!
As per the comment of @sdcbr, using np.squeeze removes the unnecessary dimension. If the image is 2-dimensional, imshow works fine. If the image has 3 dimensions, you have to remove the extra dimension of size 1. For higher-dimensional data you will have to reduce it to 2 dimensions, so np.squeeze may need to be applied multiple times (or you may use some other dimensionality-reduction function for higher-dimensional data).
import numpy as np
import matplotlib.pyplot as plt
img_path = x_test[1]
print(img_path.shape)
if len(img_path.shape) == 3:
    plt.imshow(np.squeeze(img_path))
elif len(img_path.shape) == 2:
    plt.imshow(img_path)
else:
    print("Higher dimensional data")
Example:
plt.imshow(test_images[0])
TypeError: Invalid shape (28, 28, 1) for image data
Correction:
plt.imshow(tf.squeeze(test_images[0]))
[Output: an image of the digit 7]
You can use tf.squeeze for removing dimensions of size 1 from the shape of a tensor.
plt.imshow(tf.squeeze(x_test[1]))
See the TF 2.0 documentation for tf.squeeze for a full example.
matplotlib.pyplot.imshow() does not support images of shape (h, w, 1). Just remove the last dimension by reshaping the image to (h, w): newimage = img.reshape(h, w).
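For example, a tiny sketch with a dummy (h, w, 1) array:
import numpy as np
import matplotlib.pyplot as plt
img = np.random.rand(28, 28, 1)               # shape (h, w, 1)
plt.imshow(img.reshape(28, 28), cmap='gray')  # drop the trailing channel dimension
plt.show()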
I'm studying deep learning and trained an image classification algorithm. The problem, however, is that to load the training images I used:
test_image = image.load_img('some.png', target_size = (64, 64))
test_image = image.img_to_array(test_image)
While for actual application I use:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
But I found that those give a different ndarray (different data):
Last entries from load_image:
[ 64. 71. 66.]
[ 64. 71. 66.]
[ 62. 69. 67.]]]
Last entries from cv2.imread:
[ 15 23 27]
[ 16 24 28]
[ 14 24 28]]]
So the system is not working. Is there a way to make the results of one match the other?
OpenCV reads images in BGR format, whereas Keras (which loads images via PIL) represents them in RGB. To get the OpenCV version to correspond to the order we expect (RGB), simply reverse the channels:
test_image = cv2.imread('trick.png')
test_image = cv2.resize(test_image, (64, 64))
test_image = test_image[...,::-1] # Added
The last line reverses the channels to be in RGB order. You can then feed this into your keras model.
Another point I'd like to add is that cv2.imread usually reads in images in uint8 precision. Examining the output of your keras loaded image, you can see that the data is in floating point precision so you may also want to convert to a floating-point representation, such as float32:
import numpy as np
# ...
# ...
test_image = test_image[...,::-1].astype(np.float32)
As a final point, depending on how you trained your model it's usually customary to normalize the image pixel values to a [0,1] range. If you did this with your keras model, make sure you divide your values by 255 in your image read in through OpenCV:
import numpy as np
# ...
# ...
test_image = (test_image[...,::-1].astype(np.float32)) / 255.0
Recently, I came across the same issue. I tried to convert the color channels and resize the image with OpenCV. However, PIL and OpenCV have very different ways of resizing images.
Here is the exact solution to this problem.
This function takes an image file path, converts the image to the target size, and prepares it for the Keras model:
import cv2
import keras
import numpy as np
from keras.preprocessing import image
from PIL import Image
def prepare_image(file):
    im_resized = image.load_img(file, target_size=(224, 224))
    img_array = image.img_to_array(im_resized)
    image_array_expanded = np.expand_dims(img_array, axis=0)
    return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
PIL_image = prepare_image("lena.png")
If you have an OpenCV image, then the function looks like this:
def prepare_image2(img):
    # convert the color from BGR to RGB, then convert to a PIL image
    cvt_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    im_pil = Image.fromarray(cvt_image)
    # resize the PIL image to the target size
    im_resized = im_pil.resize((224, 224))
    img_array = image.img_to_array(im_resized)
    image_array_expanded = np.expand_dims(img_array, axis=0)
    return keras.applications.mobilenet.preprocess_input(image_array_expanded)
# execute the function
img = cv2.imread("lena.png")
cv2_image = prepare_image2(img)
# finally check if it is working
np.array_equal(PIL_image, cv2_image)
>> True
Besides CV2 using the BGR format and Keras (using PIL as a backend) using the RGB format, there are also significant differences in the resize methods of CV2 and PIL, even with the same parameters.
Multiple references can be found on the internet, but the general idea is that there are subtle differences in the pixel coordinate systems used by the two resize algorithms, as well as potential issues with the different ways of casting to float as an intermediate step in the interpolation algorithm. The end result is a visually similar image, but one that is slightly shifted/perturbed between the two versions.
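As a quick illustration (a sketch, not an exact reproduction of either library's defaults), resizing the same RGB array with nominally the same bilinear filter in both libraries typically does not give byte-identical results:
import cv2
import numpy as np
from PIL import Image
rgb = cv2.cvtColor(cv2.imread("lena.png"), cv2.COLOR_BGR2RGB)  # file name reused from the answer above
cv_small = cv2.resize(rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
pil_small = np.array(Image.fromarray(rgb).resize((224, 224), Image.BILINEAR))
print(np.abs(cv_small.astype(np.int16) - pil_small.astype(np.int16)).mean())  # usually > 0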
It is a perfect example of how, much like in an adversarial attack, small input differences can cause huge differences in accuracy.