I'm trying to apply a binary mask to an RGB image with numpy
I found this https://stackoverflow.com/a/26843467/4628384 but I don't have permissions to write a comment yet.
Anyway, I'm running into a problem; any help is much appreciated.
def masktoRGB(self, image, image_mask):
    # create mask with same dimensions as image
    mask = np.zeros_like(image)
    # copy your image_mask to all dimensions (i.e. colors) of your image
    for i in range(3):
        mask[:,:,i] = image_mask.copy()
    # apply the mask to your image
    # tried to swap axes, not a solution
    #image = image.swapaxes(0,1)
    # this line gives the error:
    masked_image = image[mask]
    print(mask.shape)
    print(image.shape)
    print(image_mask.shape)
    return masked_image
this gives me:
IndexError: index 213 is out of bounds for axis 0 with size 212
print output:
(188, 212, 3)
(188, 212, 3)
(188, 212)
image and image_mask are the same image, except the first is RGB and the second is mode L
Try to use broadcasting and multiplication:
image * image_mask[..., None]
I assume that image_mask has type bool, which maps to the numbers 0 and 1, so pairwise multiplication of the image and the mask sets masked values to zero.
A similar effect can be achieved with np.where() or the & operator.
The issue is that the shapes of image and image_mask are not compatible. Numpy first adds extra dimensions at the head of the shape until both tensors have the same number of dimensions, so image_mask is reshaped from (188, 212) to (1, 188, 212). This new shape is not compatible with the shape of image, (188, 212, 3).
The trick is to reshape image_mask using fancy indexing. You can use None as an index to add a dummy dimension of size 1 at the end of the shape. The operation image_mask[..., None] reshapes it from (188, 212) to (188, 212, 1).
The broadcasting rules say that dimensions of size 1 are expanded by repeating all values along the broadcast dimension, so numpy automatically expands the tensor from (188, 212, 1) to (188, 212, 3). The operation is very fast because no copy is created.
Now both tensors can be multiplied, producing the desired result.
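A minimal sketch of the whole thing (the shapes and random contents here are made up to match the question):
import numpy as np

# hypothetical RGB image and 2-D boolean mask matching the question's shapes
image = np.random.randint(0, 256, size=(188, 212, 3), dtype=np.uint8)
image_mask = np.random.rand(188, 212) > 0.5

# the trailing None makes (188, 212) broadcast against (188, 212, 3)
masked = image * image_mask[..., None]
print(masked.shape)  # (188, 212, 3)

# the same result with np.where: keep pixels where the mask is True, else 0
masked_w = np.where(image_mask[..., None], image, 0)
print((masked == masked_w).all())  # True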
Convolution for a grayscale image is straightforward. You have a filter of shape nxnx1 and convolve the input image to extract whatever features you desire.
I also understand how convolution would work for an RGB image. The filter would have a shape of nxnx3. However, would all 3 'layers' in the filter hold the same kernel? For example, if the 0th layer held a particular map, would layers 1 and 2 also hold the exact same values? I am asking in regard to convolutional neural networks, not conventional image processing. I understand that the weights of each filter are learned and are randomized initially; am I correct in thinking that each layer would have different randomized values?
Would all 3 'layers' in the filter hold the same kernel?
The short answer is no. The longer answer is that there isn't a kernel per layer, but instead just one kernel which handles all input and output layers at once.
The code below shows step by step how one would calculate each convolution manually, and from this we can see that at a high level the calculation goes like this:
take a patch from the batch of images (BatchSize x 3x3x3 in your case)
flatten it to [BatchSize, 27]
matrix multiply it by the reshaped kernel [27, output_filters]
add in the bias of shape [output_filters]
All the colors are processed at once using matrix multiplication with the kernel matrix. If we think about the kernel matrix, we can see that the values in the kernel matrix that are used to generate the first filter are in the first column, and the values to generate the second filter are in the second column. So, indeed, the values are different and not reused, but they are not stored or applied separately.
The code walkthrough
import tensorflow as tf
import numpy as np
# Define a 3x3 kernel that after convolution will create an image with 2 filters (channels)
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=3)
# Let's create a random input image
starting_image = np.array( np.random.rand(1,4,4,3), dtype=np.float32)
# and process it
result = conv_layer(starting_image)
weight, bias = conv_layer.get_weights()
print('size of weight', weight.shape)
print('size of bias', bias.shape)
size of weight (3, 3, 3, 2)
size of bias (2,)
# The output of the convolution of the 4x4x3 image input
# is a 2x2x2 output (because we don't have padding)
result.numpy()
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
# Now let's see how we can recreate this using the weights
# The way convolution is done is to extract a patch
# the size of the kernel (3x3 in this case)
# We will use the first patch, the first three rows and columns and all the colors
patch = starting_image[0,:3,:3,:]
print('patch.shape' , patch.shape)
# Then we flatten the patch
flat_patch = np.reshape( patch, [1,-1] )
print('New shape is', flat_patch.shape)
patch.shape (3, 3, 3)
New shape is (1, 27)
# next we take the weight and reshape it to be [-1,filters]
flat_weight = np.reshape( weight, [-1,2] )
print('flat_weight shape is ',flat_weight.shape)
flat_weight shape is (27, 2)
# we have the patch of shape [1,27] and the weight of [27,2]
# doing a matrix multiplication of the two shapes [1,27]*[27,2] = a shape of [1,2]
# which is the output we want, 2 filter outputs for this patch
output_for_patch = np.matmul(flat_patch,flat_weight)
# but we haven't added the bias yet, so lets do that
output_for_patch = output_for_patch + bias
# Finally, we can see that our manual calculation matches
# what Conv2D does exactly for the first patch
output_for_patch
array([[-0.34940773, -0.64269245]], dtype=float32)
If we compare this to the full convolution above, we can see that this is exactly the first patch
array([[[[-0.34940776, -0.6426925 ],
[-0.81834394, -0.16166998]],
[[-0.37515935, -0.28143463],
[-0.60084903, -0.5310158 ]]]], dtype=float32)
We would repeat this process for each patch. If we want to optimize this code some more, instead of passing only one image patch at a time [1, 27], we can pass [batch_number, 27] patches at a time, and the kernel will process them all at once, returning [batch_number, filter_size].
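A rough sketch of that batched step (the array contents and the batch_number name here are illustrative, not taken from the code above):
import numpy as np

batch_number, output_filters = 32, 2

# hypothetical batch of flattened 3x3x3 patches, plus a flattened
# kernel and bias in the same layout as flat_weight and bias above
patches = np.random.rand(batch_number, 27).astype(np.float32)
flat_weight = np.random.rand(27, output_filters).astype(np.float32)
bias = np.random.rand(output_filters).astype(np.float32)

# one matrix multiplication processes every patch in the batch at once
outputs = np.matmul(patches, flat_weight) + bias
print(outputs.shape)  # (32, 2)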
I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3, stride=1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512, 510)))
The issue is, this code gives me a very green image as a result. I am sure the problem is with the dimensions of the view, but I was unable to find out how to apply maxpool to just one image, so I couldn't fix it. The image I'm considering is 512x512. The arguments for view make no sense to me right now; it's just the only pair of numbers that gives a result...
If, for example, I give 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in Channels, Height, Width format;
# we have to switch the dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white you would need shape [1, 1, 512, 512] (single channel only). You can't drop/squeeze those dimensions; they always have to be there for any torch.nn.Module!
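A sketch of that single-channel case (the random array just stands in for a real grayscale image):
import numpy as np
import torch

gray_numpy = np.random.randint(low=0, high=255, size=(512, 512))
gray_tensor = torch.from_numpy(gray_numpy).float()

# add the channel and batch dimensions: [512, 512] -> [1, 1, 512, 512]
ready_gray = gray_tensor.unsqueeze(0).unsqueeze(0)
ready_gray.shape  # Shape [1, 1, 512, 512]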
To transform the tensor back into an image you could use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
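To actually display the pooled result, something like this should work (a sketch continuing the snippet above; casting to uint8 keeps matplotlib happy with integer RGB data):
import matplotlib.pyplot as plt

# final_image has shape [510, 510, 3] after 3x3 pooling with stride 1
plt.imshow(final_image.astype('uint8'))
plt.show()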
I am trying to extract the rectangles in an image. After extraction I am getting two contours for each shape that is detected, i.e. the lower bound and upper bound of each shape, but I need only one contour per shape.
I tried converting the image to binary, applied dilation to it, and extracted each contour. Here I am getting two contours for each shape, but I need only one contour for each shape; how can I get only one contour per shape?
import cv2
import numpy as np

img = cv2.imread("target2.jpg", 0)
img = cv2.resize(img, (1280, 720))  # resizing image as it is large in size
_, thr1 = cv2.threshold(img, 220, 255, cv2.THRESH_BINARY_INV)  # converting to binary
kernel = np.ones((2, 2), np.uint8)  # creating a small kernel
dilation = cv2.dilate(thr1, kernel, iterations=1)  # dilating the small pixels in the image
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)  # finding the contours in the image
count = 0  # counting the rectangles
for i, contour in enumerate(contours):
    # approximate the contour to a polygon; 0.01*arcLength sets the precision
    approx = cv2.approxPolyDP(contour, 0.01 * cv2.arcLength(contour, True), True)
    if len(approx) == 4:  # if the polygon has four vertices, it's a rectangle
        X, Y, W, H = cv2.boundingRect(approx)
        aspectratio = float(W) / H
        if aspectratio >= 1.5:
            count = count + 1
            cv2.drawContours(img, [approx], 0, (0, 255, 0), 5)
            x = approx.ravel()[0]
            y = approx.ravel()[1]
            cv2.putText(img, "rectangle" + str(i), (x, y), cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 255, 0))
print(count)
cv2.imshow("image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
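As a hedged guess at the cause: after thresholding, each rectangle's drawn outline has both an outer and an inner boundary, and cv2.RETR_TREE returns both. Retrieving only the outermost contours may give one contour per shape (a sketch continuing the code above):
# retrieve only the outermost contour of each shape instead of the full tree
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)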
I am trying to mask two images with each other. The first one is a regular RGB image and the second one is a one-channel binary image.
I stacked the binary image so it would be 3-channel like the first image and then used bitwise_and. I still get the error:
(-209:Sizes of input arguments do not match) The operation is neither 'array op array' (where arrays have the same size and type), nor 'array op scalar', nor 'scalar op array' in function 'cv::binary_op'
im = cv2.imread(image_name) # shape : (1280, 960, 3)
h, w = im.shape[:2]
mask = predict[0] #output of some network with shape of (256, 256, 1) and type of numpy.ndarray
mask = cv2.resize(mask, (w, h)) # resizing mask to the same shape as the input
mask = np.stack((mask,)*3, axis=-1) # make it 3 channel, shape : (1280, 960, 3)
output = cv2.bitwise_and(im, mask)
I examined the shapes of the mask and the input right before computing output and they were the same size and type. Does anyone know where the problem is?
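One hedged guess: cv2.bitwise_and requires matching dtypes as well as shapes, and a network output is typically float32 while cv2.imread returns uint8. A sketch of casting the mask before combining (continuing the snippet above, and assuming the mask values lie in [0, 1]):
# scale and cast the float mask to uint8 so it matches the image dtype
mask = (mask * 255).astype(np.uint8)
output = cv2.bitwise_and(im, mask)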
I was making a histogram using a numpy array in Python with OpenCV. The code is as follows:
# finding histogram of an image
import numpy as np
import cv2

img = cv2.imread("cr7.jpg")
gry_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
a = np.zeros((1, 256), dtype=np.uint8)
# finding how many times a particular pixel intensity repeats
for x in range(0, 183):  # size of gry_img is (184, 275)
    for y in range(0, 274):
        g = gry_img[x, y]
        a[g] = a[g] + 1
print(a)
Error is as follows:
IndexError: index 150 is out of bounds for axis 0 with size 1
Since you haven't supplied the image, I can only guess, but it seems you've made a mistake with the dimensions of the image. Alternatively, the issue is entirely with the shape of your results array a.
The code you have is rather fragile, and here is a cleaner way to interact with images. I use an image from OpenCV's data directory: aero1.jpg.
The code here resolves both potential issues identified above, whichever one it was:
import cv2
import numpy as np

fname = 'aero1.jpg'
im = cv2.imread(fname)
gry_img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gry_img.shape
>>> (480, 640)
# note that the image is 640 pixels wide by 480 tall;
# the numpy array shows the number of rows first:
# rows are in y / columns are in x

# NOTE the results array `a` need only be 1-dimensional, not 2d (1x256),
# and needs a dtype wide enough to count more than 255 pixels per bin
a = np.zeros((256, ), dtype=np.int64)

# iterating over all pixels, whatever the shape of the image.
height, width = gry_img.shape
for x in range(width):
    for y in range(height):
        g = gry_img[y, x]  # NOTE y, x not x, y
        a[g] += 1
But note that you could also achieve this easily with the numpy function np.histogram (docs), with slightly careful handling of the bin edges.
histb, bin_edges = np.histogram(gry_img.reshape(-1), bins=range(0, 257))
# check that we arrived at the same result as iterating manually:
(a == histb).all()
>>> True
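As a side note, np.bincount computes the same counts even more directly (a small sketch in the same session):
histc = np.bincount(gry_img.ravel(), minlength=256)
# this should again match the manual iteration:
(a == histc).all()
>>> True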