I am currently working on an image segmentation problem. As part of preprocessing, I'm trying to create mask values for 2 classes [0, 1]. However, saving the processed tensor as an image and loading it back produces different mask values. My current guess is that, under the hood, PIL is normalizing the pixel values.
If so, how do I stop it from doing that?
Below is a simple example demonstrating the issue.
import numpy as np
import torch
from PIL import Image

# Build a binary mask (torch.Tensor(250, 250, 3) is uninitialized memory,
# so the comparison just yields an arbitrary 0/1 pattern)
tensor_img = torch.where(torch.Tensor(250, 250, 3) > 0, 1, 0)
img_arr = tensor_img.numpy().astype(np.uint8)
np.unique(img_arr, return_counts=True)
(array([0, 1], dtype=uint8), array([148148, 39352]))
img = Image.fromarray(img_arr)
img.save("tmp.jpg")
# read the saved image back
img = np.array(Image.open("tmp.jpg"))
torch.tensor(img).unique(return_counts=True)
(tensor([0, 1], dtype=torch.uint8), tensor([62288, 212]))
For this simple case (only 2 classes), you need to save to PNG rather than JPEG, since JPEG is a lossy compression format and PNG is lossless.
tensor_img = torch.where(torch.Tensor(250,250,3) > 0, 1, 0)
img_arr = tensor_img.numpy().astype(np.uint8)
np.unique(img_arr, return_counts=True)
(array([0, 1], dtype=uint8), array([159189, 28311]))
img = Image.fromarray(img_arr)
img.save("tmp.png")
#read saved image
img = np.array(Image.open("tmp.png"))
torch.tensor(img).unique(return_counts=True)
(tensor([0, 1], dtype=torch.uint8), tensor([159189, 28311]))
For more classes it is preferable to work with a color map (a paletted PNG), so each class index maps to a distinct color.
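As an illustration, a minimal sketch of saving a multi-class mask as a paletted PNG; the class count and colors here are made up for the example:
import numpy as np
from PIL import Image

# Hypothetical 3-class mask with values 0, 1, 2
mask = np.random.randint(0, 3, size=(250, 250), dtype=np.uint8)
img = Image.fromarray(mask, mode="P")
# One RGB triple per class index: black, red, green (example colors)
img.putpalette([0, 0, 0, 255, 0, 0, 0, 255, 0])
img.save("mask.png")
# Loading it back returns the class indices, not the palette colors
restored = np.array(Image.open("mask.png"))
np.unique(restored, return_counts=True)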
Related
I have a grayscale image, and I would like to split it into pixels and determine the gray level of each pixel in the image.
I need the array in the following form: (pixel X coordinate, pixel Y coordinate, gray level 0-255).
1,1,25;
1,2,36;
1,3,50;
.
.
.
50,60,96;
.
.
.
If the image is 500 by 600 pixels, then the last entry should be (500, 600, gray level).
Could you please tell me how I can get such an array of data from an image? What do I need to do? Which libraries should I use? If someone has solved such a problem, please give an example. Thank you very much!
If you already have an image file, you can read it like this:
from PIL import Image
img = Image.open('/path/to/image.png')
To get this as an array:
import numpy as np
ima = np.asarray(img)
If it's really an 8-bit greyscale image, you can convert it on load with Image.open('image.png').convert('L'); but if the file was read as RGB you can always just take the red channel with ima[:, :, 0], since for a greyscale image all the channels are equal.
Now you can stack these grey levels with the coordinates:
h, w, _ = ima.shape
# Column (x) and row (y) index for every pixel
x, y = np.meshgrid(np.arange(w), np.arange(h))
np.dstack([x, y, ima[..., 0]])
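If you want the flat list of (x, y, value) rows from the question rather than an (h, w, 3) array, you can reshape the result; a small sketch building on the arrays above:
rows = np.dstack([x, y, ima[..., 0]]).reshape(-1, 3)
# rows[0] is (0, 0, value), rows[1] is (1, 0, value), ...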
I would do this:
# random data
np.random.seed(10)
img = np.random.randint(0,256, (500,600))
# coordinates
# np.meshgrid is also a good (better) choice
x, y = np.where(np.ones_like(img))
# put them together
out = np.stack([x,y, img.ravel()], axis=1)
Output:
array([[  0,   0,   9],
       [  0,   1, 125],
       [  0,   2, 228],
       ...,
       [499, 597, 111],
       [499, 598, 128],
       [499, 599,   8]])
I'm trying to display a local image, loaded with OpenCV v4.5.4.
My problem is that the function cv2.cvtColor() does not seem to do anything (the original and RGB images look the same, see below).
Thanks for your time!
import cv2
import matplotlib.pyplot as plt

original_image = cv2.imread('../Movie_Poster_Dataset/2015/tt1365050.jpg')
rgb_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)
rows = 1
columns = 2
fig = plt.figure(figsize=(10, 7))
# Add two subplots
fig.add_subplot(rows, columns, 1)
plt.imshow(original_image)
plt.axis('off')
fig.add_subplot(rows, columns, 2)
plt.imshow(rgb_image)
plt.axis('off')
plt.show()
Original image:
My result
You can swap the red and blue color channels directly:
original_image = cv2.imread('../Movie_Poster_Dataset/2015/tt1365050.jpg')
rgb_image = original_image.copy()
rgb_image[:, :, [0, 2]] = original_image[:, :, [2, 0]]
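For reference, matplotlib's imshow expects RGB order while cv2.imread returns BGR, so only the swapped copy displays with correct colors. A short usage sketch:
import matplotlib.pyplot as plt

plt.imshow(rgb_image)
plt.axis('off')
plt.show()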
I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3,stride = 1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is, this code gives me a very green image as a result. I am sure the problem is with the dimensions in view, but I was unable to find how to apply maxpool to just one image, so I couldn't fix it. The image I'm considering is 512x512. The arguments to view make no sense to me right now; it's just the only shape that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=256, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in format Channels, Height, Width
# We have to switch their dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white, you would need shape [1, 1, 512, 512] (a single channel only); you can't drop/squeeze those dimensions, as they always have to be there for any torch.nn.Module.
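For example, a greyscale array could be brought to that shape like this (a sketch, assuming a (512, 512) numpy array):
grey_img = np.random.randint(low=0, high=256, size=(512, 512))
grey_tensor = torch.from_numpy(grey_img).unsqueeze(0).unsqueeze(0)
grey_tensor.shape # Shape [1, 1, 512, 512]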
To transform tensor into image again you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Move channels back to the last dimension
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
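To actually display the pooled result, as asked in the question, a sketch with matplotlib could look like this (for the single-channel case you would drop the channel axis first):
import matplotlib.pyplot as plt

plt.imshow(final_image.astype(np.uint8))
plt.axis('off')
plt.show()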
I have written a function that takes two images of equal size and returns a combined image of the same size, such that all black pixels (where the BGR value is [0, 0, 0]) of the first image will be replaced by pixels of the second image.
My code looks like this:
def combine(img1, img2):
    retImage = np.zeros((img1.shape[0], img1.shape[1], 3), dtype=np.uint8)
    for x in range(img1.shape[0]):
        for y in range(img1.shape[1]):
            if 0 not in img1[x][y]:
                retImage[x][y] = img1[x][y]
            else:
                retImage[x][y] = img2[x][y]
    return retImage
Obviously this is very slow, especially since I'm processing several 4k images in sequence. Is there a more efficient way to do this (preferably using OpenCV functions, like thresholding/masks)?
The following code does what you want with NumPy operations, which should be a lot more efficient than Python loops:
pixel_has_zero = np.any(img1 == 0, axis=2, keepdims=True)
retImage = np.where(pixel_has_zero, img2, img1)
This code is assuming that img1 and img2 are the same size. If that's not the case, you'll need to slice img2 beforehand.
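If you specifically want to replace only fully black pixels ([0, 0, 0]) as described in the question text (the loop above actually replaces any pixel with a zero in some channel), a sketch using OpenCV's inRange for the mask could look like this; the file names are placeholders:
import cv2
import numpy as np

img1 = cv2.imread('foreground.png')
img2 = cv2.imread('background.png')
# 255 where img1 is pure black, 0 elsewhere
black_mask = cv2.inRange(img1, (0, 0, 0), (0, 0, 0))
# Start from img1 and copy img2 over only where the mask is set
combined = img1.copy()
combined[black_mask > 0] = img2[black_mask > 0]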
img = cv2.imread('Ball.0.jpg', 0)
img = np.array(img)
print(img.size)
output: 7500
but I want the image shape to be [50, 50, 3]; how do I do this?
The output is 50*50*3=7500
I want it in the format [50, 50, 3] for RGB.
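A minimal sketch of the difference between .size and .shape, assuming Ball.0.jpg is a 50x50 color image (note that passing 0 to cv2.imread loads it as grayscale, so the color flag is used here):
import cv2

img = cv2.imread('Ball.0.jpg', cv2.IMREAD_COLOR)
print(img.shape)  # (50, 50, 3) -> height, width, channels
print(img.size)   # 7500 -> total number of elements, 50 * 50 * 3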