I am trying to prepare the masks for image segmentation with PyTorch, and I have three questions about data preparation.
What is the appropriate file format for saving a binary mask in general? PNG? JPEG?
Does the mask need to be square, such as (224x224), rather than a rectangle such as (224x448)?
Are the mask values preserved when the size is converted from a rectangle to a square?
For example, the original mask image size is (600x900) and it is binary [0,1]. However, when I applied
import torchvision.transforms as transforms

transforms.Compose([
    transforms.Resize((300, 300)),
    transforms.ToTensor(),
])
to the mask, the output contained values other than 0 and 1, such as 0.01, 0.0156, 0.22..., because of the interpolation used when resizing.
I applied the code below to convert the mask back to binary: if a value is at most 0.3 it becomes 0, otherwise 1.
def __getitem__(self, idx):
    img, mask = self.load_data(idx)
    if self.img_transforms is not None:
        img = self.img_transforms(img)
    if self.mask_transforms is not None:
        mask = self.mask_transforms(mask)
    mask = torch.where(mask <= 0.3, 0, 1)
    return img, mask
but I wonder whether this is a common approach and whether it is efficient.
PNG, because it is lossless by design.
It depends. It is more convenient to use a standard resolution such as (224x224); I would start with that.
Use resize with nearest-neighbour interpolation, so that no new values are introduced: transforms.Resize((300, 300), interpolation=InterpolationMode.NEAREST)
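For example, a minimal sketch of a mask pipeline along those lines (the variable name and the 300x300 size are just assumptions carried over from your example); PILToTensor keeps the original integer values instead of rescaling to [0, 1]:

import torchvision.transforms as transforms
from torchvision.transforms import InterpolationMode

# Nearest-neighbour resize cannot introduce new values, so a {0, 1} mask stays {0, 1}.
mask_transforms = transforms.Compose([
    transforms.Resize((300, 300), interpolation=InterpolationMode.NEAREST),
    transforms.PILToTensor(),  # keeps integer values, no division by 255
])

In __getitem__ you could then call mask_transforms(mask).long() and skip the thresholding step entirely.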
I was wondering whether I can translate this opencv-python method into Pillow, as I am forced to do further processing in Pillow.
A workaround I thought about would be to just save the result with OpenCV and load it afterwards with Pillow, but I am looking for a cleaner solution, because I am using the remove_background() method's output as input for each frame of a GIF. Thus, I would read and write images N * GIF_frames_count times for no reason.
The method I want to convert from opencv-python to Pillow:
import cv2
import numpy as np

def remove_background(path):
    img = cv2.imread(path)
    # Convert to gray
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Threshold input image as mask
    mask = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY)[1]
    # Negate mask
    mask = 255 - mask
    # Apply morphology to remove isolated extraneous noise
    # Use border constant of black since foreground touches the edges
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Anti-alias the mask -- blur then stretch
    # Blur alpha channel
    mask = cv2.GaussianBlur(mask, (0, 0), sigmaX=2, sigmaY=2, borderType=cv2.BORDER_DEFAULT)
    # Linear stretch so that 127.5 goes to 0, but 255 stays 255
    mask = (2 * (mask.astype(np.float32)) - 255.0).clip(0, 255).astype(np.uint8)
    # Put mask into alpha channel
    result = img.copy()
    result = cv2.cvtColor(result, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = mask
    return result
Code taken from: how to remove background of images in python
Rather than rewriting all the code using PIL equivalents, you could adopt the "if it ain't broke, don't fix it" maxim and simply convert the NumPy array that the existing code produces into a PIL Image that you can use for your subsequent purposes.
That is described in this answer, which I'll paraphrase as:
# Make "PIL Image" from Numpy array
pi = Image.fromarray(na)
Note that the linked answer refers to scikit-image (which uses RGB ordering, like PIL) rather than OpenCV, so there is the added wrinkle that you will also need to reorder the channels from BGRA to RGBA; the last couple of lines will then look like:
...
...
result = cv2.cvtColor(result, cv2.COLOR_BGR2RGBA)
result[:, :, 3] = mask
pi = Image.fromarray(result)
When I pass an image and a mask to albumentations.Normalize(mean, std), how would I go about incorporating this?
Should I just add it manually in the dataset?
Grateful for any tips you have!
Edited:
Normalization works for three-channel images. If your mask is a grayscale image, then you probably need to stack it (image = np.stack((img,)*3, axis=-1)) to make a three-channel image and then apply albumentations' normalization. The official function behind A.Normalize() is the following, which deals with RGB images:
def normalize(img, mean, std, max_pixel_value=255.0):
    mean = np.array(mean, dtype=np.float32)
    mean *= max_pixel_value
    std = np.array(std, dtype=np.float32)
    std *= max_pixel_value
    denominator = np.reciprocal(std, dtype=np.float32)
    img = img.astype(np.float32)
    img -= mean
    img *= denominator
    return img
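As a rough illustration of the stacking idea above (the array names are just placeholders, not from the docs), a single-channel mask could be expanded to three channels before calling that function:

import numpy as np

gray_mask = np.random.randint(0, 2, size=(224, 224), dtype=np.uint8)  # placeholder single-channel mask
mask_3ch = np.stack((gray_mask,) * 3, axis=-1)                        # (H, W) -> (H, W, 3)
out = normalize(mask_3ch, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))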
According to Albumentations' docs, you can make a composition of transforms and use it within a PyTorch dataset.
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose(
    [
        A.SmallestMaxSize(max_size=160),
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.05, rotate_limit=15, p=0.5),
        A.RandomCrop(height=128, width=128),
        A.RGBShift(r_shift_limit=15, g_shift_limit=15, b_shift_limit=15, p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
train_dataset = CatsVsDogsDataset(images_filepaths=train_images_filepaths, transform=train_transform)
But I am not really sure whether normalizing the mask image is the right way to go.
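For what it's worth, albumentations applies pixel-level transforms such as Normalize only to the image argument, not to the mask, while spatial transforms are applied to both. Here is a minimal sketch (the class and path names are my own assumptions, not from the docs) of feeding both through such a composition inside a dataset:

import numpy as np
import torch
from PIL import Image

class SegmentationDataset(torch.utils.data.Dataset):
    def __init__(self, image_paths, mask_paths, transform=None):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.array(Image.open(self.mask_paths[idx]))
        if self.transform is not None:
            augmented = self.transform(image=image, mask=mask)  # Normalize touches image only
            image, mask = augmented["image"], augmented["mask"]
        return image, mask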
For a task that involves regression, I need to train my models to generate density maps from RGB images. To augment my dataset, I decided to flip all the images horizontally. For that, I also have to flip the ground-truth images, and I did so.
dataset_for_augmentation.listDataset(
    train_list,
    shuffle=True,
    transform=transforms.Compose([
        transforms.RandomHorizontalFlip(p=1),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]),
    target_transform=transforms.Compose([
        transforms.RandomHorizontalFlip(p=1),
        transforms.ToTensor()
    ]),
    train=True,
    resize=4,
    batch_size=args.batch_size,
    num_workers=args.workers),
But here is the problem: for some reason, PyTorch's transforms.RandomHorizontalFlip accepts only PIL images (NumPy arrays are not allowed) as input. So I decided to convert to a PIL Image.
img_path = self.lines[index]
img, target = load_data(img_path, self.train, resize=self.resize)

if type(target[0][0]) is np.float64:
    target = np.float32(target)

img = Image.fromarray(img)
target = Image.fromarray(target)

if self.transform is not None:
    img = self.transform(img)
    target = self.target_transform(target)

return img, target
And yes, this operation needs an enormous amount of time. Considering that it has to be carried out for thousands of images, 23 seconds per batch (it should have been under half a second at most) is not tolerable.
2019-11-01 16:29:02,497 - INFO - Epoch: [0][0/152] Time 27.095 (27.095) Data 23.150 (23.150) Loss 93.7401 (93.7401)
I would appreciate any suggestions to speed up my augmentation process
You don't need to change the DataLoader to do that. You can use ToPILImage():
transform=transforms.Compose([
    transforms.ToPILImage(),  # check mode assumption in the documentation
    transforms.RandomHorizontalFlip(p=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
Anyway, I would avoid converting to PIL; it seems completely unnecessary. If you want to flip all images, why not do that using NumPy only?
img_path = self.lines[index]
img, target = load_data(img_path, self.train, resize=self.resize)

if type(target[0][0]) is np.float64:
    target = np.float32(target)

# assuming width axis=1 -- see my comment below
img = np.flip(img, axis=1)
target = np.flip(target, axis=1)

if self.transform is not None:
    img = self.transform(img)
    target = self.target_transform(target)

return img, target
And remove the transforms.RandomHorizontalFlip(p=1) from the Compose. As ToTensor(...) also handles ndarray, you are good to go.
Note: I am assuming the width axis is equal to 1, since ToTensor expects it to be there.
From the docs:
Converts a PIL Image or numpy.ndarray (H x W x C) ...
More of an addition to @Berriel's answer.
Horizontal Flip
You are using transforms.RandomHorizontalFlip(p=1) for both X and y images. With p=1 both will be transformed exactly the same, but you are missing the point of data augmentation, as the network will only ever see flipped images (rather than only the originals). You should go for a probability lower than 1 and higher than 0 (usually 0.5) to get high variability in the versions of the images.
If that were the case (p=0.5), you could be more than certain that at some point X would get flipped while y would not, because the two Compose pipelines draw their random decisions independently.
I would advise using the albumentations library and its albumentations.augmentations.transforms.HorizontalFlip to perform the flip on both images the same way, as in the sketch below.
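A minimal sketch of that idea (the array names are placeholders, not your actual variables): passing both arrays in one call guarantees they receive the same random decision.

import albumentations as A
import numpy as np

paired_flip = A.Compose([A.HorizontalFlip(p=0.5)])

img = np.zeros((128, 128, 3), dtype=np.uint8)     # placeholder RGB image
density = np.zeros((128, 128), dtype=np.float32)  # placeholder density map

augmented = paired_flip(image=img, mask=density)  # same flip (or no flip) for both
img_aug, density_aug = augmented["image"], augmented["mask"]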
Normalization
You can find normalization with ImageNet means and stds already set up there as well.
Caching
Furthermore, to speed things up you could use the torchdata third-party library (disclaimer: I'm the author). In your case you could transform the image from PIL to Tensor, Normalize with albumentations, cache the images on disk (or, even better, in RAM) after those transformations with torchdata, and finally apply your remaining transformations. This way you would only need to apply HorizontalFlip to your image and target after the initial epoch; the previous steps would be pre-calculated.
I am trying to extract the rectangles in an image. After extraction, I am getting two contours for each detected shape, i.e. the inner and outer boundary of each shape, but I need only one contour per shape.
I tried converting the image to binary, applying dilation, and extracting each contour. Here I am getting two contours for each shape, but I need only one contour for each shape. How can I get only one contour per shape?
import cv2
import numpy as np

img = cv2.imread("target2.jpg", 0)
img = cv2.resize(img, (1280, 720))  # resizing the image as it is large
_, thr1 = cv2.threshold(img, 220, 255, cv2.THRESH_BINARY_INV)  # converting to binary
kernal = np.ones((2, 2), np.uint8)  # creating a small kernel
dilation = cv2.dilate(thr1, kernal, iterations=1)  # dilating the small pixels in the image
contours, hireracy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)  # finding the contours in the image

count = 0  # counting the rectangles
for i, contour in enumerate(contours):
    approx = cv2.approxPolyDP(contour, 0.01 * cv2.arcLength(contour, True), True)  # approximate the contour (tolerance = 1% of arc length)
    if len(approx) == 4:  # if the polygon has four vertices then it is a rectangle
        X, Y, W, H = cv2.boundingRect(approx)
        aspectratio = float(W) / H
        if aspectratio >= 1.5:
            count = count + 1
            cv2.drawContours(img, [approx], 0, (0, 255, 0), 5)
            x = approx.ravel()[0]
            y = approx.ravel()[1]
            cv2.putText(img, "rectangle" + str(i), (x, y), cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 255, 0))

print(count)
cv2.imshow("image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
I have written a function that takes two images of equal size and returns a combined image of the same size, such that all black pixels of the first image (where the BGR value is [0, 0, 0]) are replaced by the corresponding pixels of the second image.
My code looks like this:
import numpy as np

def combine(img1, img2):
    retImage = np.zeros((img1.shape[0], img1.shape[1], 3), dtype=np.uint8)
    for x in range(img1.shape[0]):
        for y in range(img1.shape[1]):
            if 0 not in img1[x][y]:
                retImage[x][y] = img1[x][y]
            else:
                retImage[x][y] = img2[x][y]
    return retImage
Obviously this is very slow, especially since I'm processing several 4k images in sequence. Is there a more efficient way to do this (preferably using OpenCV functions, like thresholding/masks)?
The following code does what you want with NumPy operations, which should be a lot more efficient than Python loops:
pixel_has_zero = np.any(img1 == 0, axis=2, keepdims=True)
retImage = np.where(pixel_has_zero, img2, img1)
This code is assuming that img1 and img2 are the same size. If that's not the case, you'll need to slice img2 beforehand.
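For completeness, a small sketch wrapping those two lines back into your combine() signature (same assumption of equal sizes):

import numpy as np

def combine(img1, img2):
    # Use img2 wherever any channel of img1 is zero, matching the original loop's behaviour.
    pixel_has_zero = np.any(img1 == 0, axis=2, keepdims=True)
    return np.where(pixel_has_zero, img2, img1)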