OpenCV and Matplotlib show an image in different ways - python-3.x

I am trying to plot an image after some processing, and I get three different images using the three options below. The image is the result of applying the Sobel filter twice to a road-lane image.
sample_image.jpg
The three plotting methods are shown in the Python code below.
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('sample_image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gaussian = cv2.GaussianBlur(gray, (3, 3), 0)
sobely = cv2.Sobel(gaussian, cv2.CV_64F, 1, 0, ksize=5)  # first derivative (dx=1, dy=0)
sobelyy = cv2.Sobel(sobely, cv2.CV_64F, 1, 0, ksize=5)   # Sobel applied a second time
# method 1
cv2.imshow('sobelyy', sobelyy)
# method 2
cv2.imwrite('filtered_img1.JPG', sobelyy)
s_img = cv2.imread('filtered_img1.JPG')
cv2.imshow('s_img', s_img)
cv2.waitKey(0)
# method 3
plt.figure()
plt.imshow(sobelyy, cmap='gray')
plt.title('Filtered sobelyy image, B(x,y)'), plt.xticks([]), plt.yticks([])
plt.show()
The images I get are:
method 1
method 2
method 3
The image I want to get is the one obtained in method 3.
Why are the images shown in different ways?
How can I save the output image so it looks like the result of method 3?
Thank you in advance!

Why are the images shown in different ways?
OpenCV and Matplotlib make different assumptions about the data they display: colour images are BGR in OpenCV and RGB in Matplotlib, and for floating-point images cv2.imshow assumes the range [0, 1], whereas plt.imshow rescales the array to its own min and max before applying the chosen colormap. That's why the same array can look different in each, even though it's actually the same data.
As for your first two methods: those should actually look the same, and they do when I try out your code.
How can I save the output image so it looks like the result of method 3?
Matplotlib has a built-in function to write plotted figures to disk; just use:
plt.savefig('your_filename.png')
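Note that plt.savefig saves the whole figure, so call it before plt.show(). If you want only the filtered array on disk, rendered the way method 3 shows it, here is a minimal sketch (assuming the sobelyy array from your code; the filenames are placeholders):
import cv2
import numpy as np
import matplotlib.pyplot as plt

# option 1: let matplotlib normalize the floats and apply the gray colormap
plt.imsave('filtered_img_plt.png', sobelyy, cmap='gray')

# option 2: rescale the float64 Sobel output to 0-255 yourself, then save with OpenCV
norm = cv2.normalize(sobelyy, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite('filtered_img_cv.png', norm)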

Related

How to detect an object in an image rather than on the screen with pyautogui?

I am using the pyautogui.locateOnScreen() function to locate elements in Chrome, get their x, y coordinates, and click them. But at some point I need to take a screenshot of a part of the screen and search for the object I want within this screenshot, then get its coordinates. Is it possible to do this with pyautogui?
My example code:
import pyautogui

coord_one = pyautogui.locateOnScreen("first_image.png", confidence=0.95)
scshoot = pyautogui.screenshot(region=coord_one)
coord_two = ...  # search the second image in scshoot and, if it is detected, get its coordinates
If it is not possible with pyautogui, can you advise the easiest and smartest way?
Thanks in advance.
I don't believe there is a built-in, direct way to do what you need, but the opencv-python library does the job.
The following code sample assumes you have a screen capture you just took, "capture.png", and you want to find "logo.png" in that capture, which you know is a subsection of "capture.png".
Minimal example
"""Get bounding box of cropped image from original image."""
import cv2 as cv
import numpy as np
img_rgb = cv.imread(r'res/original.png')
# the cropped image, expected to be smaller
target_img = cv.imread(r'res/crop.png')
_, w, h = target_img.shape[::-1]
res = cv.matchTemplate(img_rgb,target_img,cv.TM_CCOEFF_NORMED)
# with the method used, the date in res are top left pixel coords
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
top_left = max_loc
# if we add to it the width and height of the target, then we get the bbox.
bottom_right = (top_left[0] + w, top_left[1] + h)
cv.rectangle(img_rgb,top_left, bottom_right, 255, 2)
cv.imshow('', img_rgb)
MatchTemplate
From the docs, matchTemplate "simply slides the template image over the input image (as in 2D convolution) and compares the template and patch of input image under the template image." Under the hood, this offers methods such as squared difference to compare the images represented as arrays.
See more
For a more in-depth explanation, check the OpenCV docs, as the code is based entirely on their example.
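To tie this back to pyautogui: the screenshot it returns is a PIL image, which you can convert to an OpenCV array and feed straight into matchTemplate. A rough sketch under the question's setup ("second_image.png" is an assumed filename for the template you want to find):
import cv2 as cv
import numpy as np
import pyautogui

region = pyautogui.locateOnScreen("first_image.png", confidence=0.95)
shot = pyautogui.screenshot(region=region)  # PIL image of that region
capture = cv.cvtColor(np.array(shot), cv.COLOR_RGB2BGR)  # PIL is RGB, OpenCV wants BGR
target = cv.imread("second_image.png")
res = cv.matchTemplate(capture, target, cv.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv.minMaxLoc(res)
print(max_loc)  # top-left corner of the best match, relative to the screenshot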

Change width of image in Opencv using Numpy

I'm making a Python file that adds color on top of the Canny filter output in OpenCV (the Canny result is grayscale). I do this change from grayscale to color using the code provided below. My problem is that when I apply the concatenate method to add the color back, it cuts the width of the screen to a third, as I show in the two screenshots of before and after the color is added. The code snippet shown is only the transformation from grayscale to colored images.
What I've tried:
Tried using numpy.tile: this wasn't the wisest attempt, as it just repeated the same third of the screen twice more and didn't expand it to take up the whole screen as I had hoped.
Tried changing the image so that the slice from the index at 1/3 of the screen covers the entire screen.
Tried setting the blank column indices to None.
Image without the color added
Image with the color added
My code:
def convert_pixels(image, color):
    rows, cols = image.shape
    concat = np.zeros(image.shape)
    image = np.concatenate((image, concat), axis=1)
    image = np.concatenate((image, concat), axis=1)
    image = image.reshape(rows, cols, 3)
    index = image.nonzero()
    # TODO: turn color into a constantly changing color wheel or shifting colors
    for i in zip(index[0], index[1], index[2]):
        color.next_color()
        image[i[0]][i[1]] = color.color
    # TODO: fix this issue below:
    # image[:, int(cols/3):cols] = None  # turns the right (glitched) side into None
    return image, color
In short, you're using concatenate on the wrong axis. axis=1 is the "columns" axis, so you're just putting two blocks of zeros next to the image in the x direction. Since you want a three-channel image, I would just initialize color_image with three channels and leave the original grayscale image alone:
def convert_pixels(image, color):
    rows, cols = image.shape
    color_image = np.zeros((rows, cols, 3), dtype=np.uint8)
    idx = image.nonzero()
    for i in zip(*idx):
        color_image[i] = color.color
    return color_image, color
I've changed the indexing to match. I can't check this exactly since I don't know what your color object is, but I can confirm this works in terms of correctly shaping and indexing the new image.
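To see why the original reshape squeezes the picture into a third of the width, it helps to trace the shapes on a small array. A sketch with a hypothetical 4x6 grayscale image:
import numpy as np

image = np.zeros((4, 6))  # rows=4, cols=6
zeros = np.zeros_like(image)
wide = np.concatenate((image, zeros, zeros), axis=1)
print(wide.shape)  # (4, 18): three copies laid side by side, not stacked as channels
packed = wide.reshape(4, 6, 3)
print(packed.shape)  # (4, 6, 3), but every 3 consecutive pixels of a row were
# packed into one "RGB" pixel, so the real data lands in the left third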

I can't generate a word cloud with some images

I just started with the wordcloud module in Python 3.7, and I'm using the code below to generate word clouds from a dictionary. I'm trying to use different masks, but it only works for some images: in two cases it worked, with images of 831x816 and 1000x808. Does this have to do with the size of the image? Or is it because the images are kind of blurry? Or what is it?
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud

our_mask = np.array(Image.open('twitter.png'))
twitter_cloud = WordCloud(background_color='white', mask=our_mask)
twitter_cloud.generate_from_frequencies(frequencies)  # frequencies: my dict of word counts
twitter_cloud.to_file("twitter_cloud.jpg")
plt.imshow(twitter_cloud)
plt.axis('off')
plt.show()
How can I fix this?
I had a similar problem with a black-and-white image I used. What fixed it for me was cropping the image more closely around the black drawing, so there was no unnecessary bulk of white area at the edges.
Some images need adjusting before they can be used as a mask. Note that only pure-white values (255) in the mask count as mask-out; all other values are mask-in. The problem is that in some images the white areas are not actually stored as 255, so the mask's np.array doesn't match what WordCloud expects. To solve this, the following can be done:
1. Create the mask object (please try this with your own image, as I couldn't upload one):
import numpy as np
from PIL import Image
from wordcloud import WordCloud

mask = np.array(Image.open("filepath/picture.png"))
print(mask)
If the printed value for the white areas is 255, then it is okay. But if it is 0 (or some other value), we have to change it to 255.
2. In that case, the code for changing the values:
2-1. Create a function for the transformation (here our value = 0):
def transform_zeros(val):
    if val == 0:
        return 255
    else:
        return val
2-2. Create an np.array with the same shape:
maskable_image = np.ndarray((mask.shape[0], mask.shape[1]), np.int32)
2-3. Transform:
for i in range(len(mask)):
    maskable_image[i] = list(map(transform_zeros, mask[i]))
3. Check:
print(maskable_image)
Then you can use this array as your mask:
mask = maskable_image
All of this is copied and interpreted from this link, so check it if you find my explanation unclear; I've just provided the solution and don't understand that much about an image's color arrays and their transformation.
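As a side note, the row-by-row loop above can be replaced with a single vectorized NumPy call. A minimal sketch, under the same assumption that the white areas are stored as 0:
import numpy as np
from PIL import Image

mask = np.array(Image.open("filepath/picture.png"))
# replace every 0 with 255 in one step instead of looping over rows
maskable_image = np.where(mask == 0, 255, mask).astype(np.int32)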

How to adaptively split an image into regions and set a different text orientation for each one?

Input-Sample
I am trying to pre-process my images in order to improve the OCR quality. However, I am stuck with a problem.
The images I am dealing with contain different text orientations within the same image (two pages, the first vertical and the second horizontal, scanned into the same image).
The text direction is automatically detected for the first part; nevertheless, the rest of the text from the other page is completely messed up.
I was thinking of creating a zonal template to detect the regions of interest, but I don't know how.
Or automatically detect the border, split the image adaptively, and then rotate the split part to achieve the required result.
I could base the split on a fixed pixel height, but that is not constant either.
from tesserocr import PyTessBaseAPI, RIL
import cv2
from PIL import Image

with PyTessBaseAPI() as api:
    filePath = r'sample.jpg'
    img = Image.open(filePath)
    api.SetImage(img)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()

cimg = cv2.imread(filePath)
for _, box, _, _ in boxes:
    x, y, w, h = box['x'], box['y'], box['w'], box['h']
    crop_img = cimg[y:y+h, x:x+w]
    cv2.imshow("cropped", crop_img)
    cv2.waitKey(0)
output image
As you can see, I can apply orientation detection, but I won't get any meaningful text out of such an image.
Try the Tesseract API method GetComponentImages, and then DetectOrientationScript on each component image.
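A rough sketch of that idea with tesserocr; the page segmentation mode, the block-level iteration, and the sign of the corrective rotation are assumptions you may need to adapt to your scans:
from tesserocr import PyTessBaseAPI, PSM, RIL
from PIL import Image

with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    img = Image.open('sample.jpg')
    api.SetImage(img)
    blocks = api.GetComponentImages(RIL.BLOCK, True)
    for _, box, _, _ in blocks:
        # crop each block and ask Tesseract how its text is oriented
        crop = img.crop((box['x'], box['y'],
                         box['x'] + box['w'], box['y'] + box['h']))
        api.SetImage(crop)
        osd = api.DetectOrientationScript()  # dict with 'orient_deg', 'orient_conf', ...
        if osd and osd.get('orient_deg'):
            # rotate the block upright before OCR (flip the sign if needed)
            crop = crop.rotate(osd['orient_deg'], expand=True)
            api.SetImage(crop)
        print(api.GetUTF8Text())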

Creating a greyscale image with a Matrix in python

I'm Marius, a first-year maths student.
We have received a team assignment in which we have to implement a Fourier transformation, and we chose to try to encode the transformation of an image to a JPEG image.
To simplify the problem for ourselves, we chose to do it only for greyscale pictures.
This is my code so far:
from PIL import Image
import numpy as np
import sympy as sp

# gather information about the image first, no calculations yet
img = Image.open('mario.png')
img = img.convert('L')  # convert to monochrome picture
img.show()  # opens the picture
pixels = list(img.getdata())
print(pixels)  # to see if we got the pixel numeric values correct
grootte = list(img.size)
print(len(pixels))  # to check if the amount of pixels is correct
kolommen, rijen = img.size
print("the number of columns is", kolommen, "the number of rows is", rijen)
# end of the information-gathering part

pixelMatrix = []
while pixels != []:
    pixelMatrix.append(pixels[:kolommen])
    pixels = pixels[kolommen:]
print(pixelMatrix)
pixelMatrix = np.array(pixelMatrix)
print(pixelMatrix.shape)
Now the problem shows itself in the last three lines. I want to convert the matrix of values back into an image, with the matrix pixelMatrix as its values.
I've tried many things, but this seems to be the most obvious way:
im2 = Image.new('L',(kolommen,rijen))
im2.putdata(pixels)
im2.show()
When I use this, it just gives me a black image of the correct dimensions.
Any ideas on how to get back the original picture, starting from the values in my matrix pixelMatrix?
Post Scriptum: We still have to implement the transformation itself, but that would be useless unless we are sure we can convert a matrix back into a greyscaled image.
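For what it's worth, a minimal sketch of rebuilding the image directly from pixelMatrix; note that the while loop above empties pixels, so im2.putdata(pixels) receives an empty list, which would explain the black image:
import numpy as np
from PIL import Image

# pixelMatrix already holds the grayscale values row by row,
# so it can be turned back into a PIL image directly
im2 = Image.fromarray(pixelMatrix.astype(np.uint8), mode='L')
im2.show()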
