How to detect an object in an image rather than screen with pyautogui? - python-3.x

I am using pyautogui.locateOnScreen() function to locate elements in chrome and get their x,y coordinates and click them. But at some point I need to take a screenshot of a part of the screen and search for the object I want in this screenshot. Then I get coordinates of it. Is it possible to do it with pyautogui?
My example code:
coord_one = pyautogui.locateOnScreen("first_image.png", confidence=0.95)
scshoot = pyautogui.screenshot(region=coord_one)
coord_two = # search the second image in scshoot and, if it can be detected, get its coordinates
If it is not possible with pyautogui, can you advise the easiest and smartest way?
Thanks in advance.

I don't believe there is a built-in direct way to do what you need, but the python-opencv library does the job.
The following code sample assumes you have a screen capture you just took, "capture.png", and you want to find "logo.png" in that capture, which you know is a subsection of "capture.png".
Minimal example
"""Get bounding box of cropped image from original image."""
import cv2 as cv
import numpy as np
img_rgb = cv.imread(r'res/original.png')
# the cropped image, expected to be smaller
target_img = cv.imread(r'res/crop.png')
_, w, h = target_img.shape[::-1]
res = cv.matchTemplate(img_rgb,target_img,cv.TM_CCOEFF_NORMED)
# with the method used, the date in res are top left pixel coords
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
top_left = max_loc
# if we add to it the width and height of the target, then we get the bbox.
bottom_right = (top_left[0] + w, top_left[1] + h)
cv.rectangle(img_rgb,top_left, bottom_right, 255, 2)
cv.imshow('', img_rgb)
MatchTemplate
From the docs, MatchTemplate "simply slides the template image over the input image (as in 2D convolution) and compares the template and patch of input image under the template image." Under the hood, this offers methods such as square difference to compare the images represented as arrays.
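One detail worth noting: for the square-difference methods the best match is the minimum of res, not the maximum. A small sketch reusing the variables from the example above:
method = cv.TM_SQDIFF_NORMED
res = cv.matchTemplate(img_rgb, target_img, method)
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
# for TM_SQDIFF / TM_SQDIFF_NORMED, lower scores mean better matches
top_left = min_loc if method in (cv.TM_SQDIFF, cv.TM_SQDIFF_NORMED) else max_loc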
See more
For a more in-depth explanation, check the opencv docs, as the code is entirely based on their example.

Related

opencv python alternative for pyautogui.locatecenteronscreen(), just for an image instead of screen

I was wondering how you would use opencv (cv2) in Python to make an alternative to the pyautogui.locatecenteronscreen() function, just using an image instead of the screen.
I will try using an example.
Maybe a user-defined function, locateCenterOfTemplate("Path/to/template.png").
Since I'm using a screenshot as the original image here, it will of course behave the same as pyautogui's version, but for my main purpose it won't.
import cv2
import pyautogui

pyautogui.screenshot("Path/to/original_image.png")

def locateCenterOfTemplate(image, template, accuracy=100, region=None):
    # region=None would mean the whole screen; I don't know how to do this either
    pass

temp = locateCenterOfTemplate("Path/to/original_image.png", "Path/to/template.png")
# now the variable "temp" holds the position of the center of the template
# inside the source image
pyautogui.click(temp)
Basically, I would like to have template matching with region, confidence, and both template and original image as a function :)
Thanks :D
If you load the image and template using cv2.imread(path), you can use cv2.matchTemplate. A while back I used this code to match all templates on a screen with a confidence higher than a threshold. You can use debug=True to draw a box around the found templates in red (cv2 uses BGR).
import cv2
import numpy as np

def match_all(image, template, threshold=0.8, debug=False, color=(0, 0, 255)):
    """ Match all template occurrences which have a higher likelihood than the threshold """
    height, width = template.shape[:2]  # note: shape is (rows, cols), i.e. (height, width)
    match_probability = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    match_locations = np.where(match_probability >= threshold)
    # Add the match rectangle to the screen
    locations = []
    for x, y in zip(*match_locations[::-1]):
        locations.append(((x, x + width), (y, y + height)))
        if debug:
            cv2.rectangle(image, (x, y), (x + width, y + height), color, 1)
    return locations
It will return a list of bounding boxes for the areas that match. If you only want to return the highest match, you should adjust the match_locations line to:
match_location = np.unravel_index(match_probability.argmax(), match_probability.shape)  # note: (y, x) order
Alternatively if you are OK to use another library, you can take a look at Multi-Template-Matching, which returns a pandas DataFrame with the template name, bounding box and score.
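For completeness, a minimal usage sketch of match_all (the file names are placeholders):
screen = cv2.imread("screen.png")   # hypothetical screenshot saved to disk
button = cv2.imread("button.png")   # hypothetical template image
boxes = match_all(screen, button, threshold=0.9, debug=True)
print(boxes)
cv2.imshow("matches", screen)       # debug=True drew the rectangles onto `screen`
cv2.waitKey(0)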

How to adaptively split an image into regions and set a different text orientation for each one?

Input-Sample
I am trying to pre-process my images in order to improve the OCR quality. However, I am stuck with a problem.
The images I am dealing with contain different text orientations within the same image (two pages, the first oriented vertically and the second horizontally, scanned into the same image).
The text direction is automatically detected for the first part; nevertheless, the rest of the text from the other page comes out completely messed up.
I was thinking of creating a zonal template to detect the regions of interest, but I don't know how.
Alternatively, I could automatically detect the border, split the image adaptively, and then flip the split part to achieve the required result.
I could set the split at a fixed pixel height, but that is not constant either.
from tesserocr import PyTessBaseAPI, RIL
import cv2
from PIL import Image

with PyTessBaseAPI() as api:
    filePath = r'sample.jpg'
    img = Image.open(filePath)
    api.SetImage(img)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()

# crop the first detected component out of the original image
cimg = cv2.imread(filePath)
box = boxes[0][1]
x, y, w, h = box['x'], box['y'], box['w'], box['h']
crop_img = cimg[y:y+h, x:x+w]
cv2.imshow("cropped", crop_img)
cv2.waitKey(0)
output image
As you can see, I can apply orientation detection, but I won't get any meaningful text out of such an image.
Try the Tesseract API method GetComponentImages and then DetectOrientationScript on each component image.
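A rough sketch of that suggestion (hedged: it assumes your tesserocr build exposes DetectOrientationScript, which recent versions do, and that sample.jpg is the scan in question):
from tesserocr import PyTessBaseAPI, RIL, PSM
from PIL import Image

with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    api.SetImage(Image.open('sample.jpg'))
    for im, box, _, _ in api.GetComponentImages(RIL.BLOCK, True):
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        osd = api.DetectOrientationScript()  # dict with orientation/script info, or None
        if osd:
            print(box, osd['orient_deg'], osd['script_name'])
Components whose orient_deg differs can then be rotated before running the actual OCR pass.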

Creating a greyscale image with a Matrix in python

I'm Marius, a maths student in my first year.
We have received a team assignment where we have to implement a Fourier transformation, and we chose to try to encode the transformation of an image to a JPEG image.
To simplify the problem for ourselves, we chose to do it only for greyscale pictures.
This is my code so far:
from PIL import Image
import numpy as np
import sympy as sp

# ALL INFORMATION, NO CALCULATIONS
img = Image.open('mario.png')
img = img.convert('L')  # convert to monochrome picture
img.show()  # opens the picture
pixels = list(img.getdata())
print(pixels)  # to see if we got the pixel numeric values correct
grootte = list(img.size)
print(len(pixels))  # to check if the amount of pixels is correct
kolommen, rijen = img.size
print("the number of columns is", kolommen, "the number of rows is", rijen)
# information only up to this point

pixelMatrix = []
while pixels != []:
    pixelMatrix.append(pixels[:kolommen])
    pixels = pixels[kolommen:]
print(pixelMatrix)
pixelMatrix = np.array(pixelMatrix)
print(pixelMatrix.shape)
Now the problem shows itself in the last three lines. I want to convert the matrix of values back into an image, with the matrix pixelMatrix as its data.
I've tried many things, but this seems to be the most obvious way:
im2 = Image.new('L',(kolommen,rijen))
im2.putdata(pixels)
im2.show()
When I use this, it just gives me a black image of the correct dimensions.
Any ideas on how to get back the original picture, starting from the values in my matrix pixelMatrix?
Post Scriptum: We still have to implement the transformation itself, but that would be useless unless we are sure we can convert a matrix back into a greyscaled image.
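One likely culprit, judging only from the snippet: the while loop above consumes pixels, so im2.putdata(pixels) receives an empty list and the new image stays black. A minimal sketch that rebuilds the image straight from the matrix instead (assuming pixelMatrix is the 2-D array built above):
im2 = Image.fromarray(pixelMatrix.astype(np.uint8), mode='L')  # matrix back to a greyscale image
im2.show()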

Detect Color of particular area of Image Nodejs OpenCV

I'm trying to write code to detect the color of a particular area of an image.
So far, what I have come across is that we can do this using OpenCV, but I still haven't found a tutorial that helps with this.
I want to do this with JavaScript, but I can also use Python OpenCV to get the results.
Can anyone please help by sharing any useful link, or explain how I can detect the color of a particular area in an image?
For example:
The box in red will show a different color. I need to figure out which color it is showing.
What I have tried:
I have tried OpenCV Canny edge images; though I managed to separate the area with them, detecting the color of that particular Canny area is still a challenge.
I also tried the inRange method from OpenCV, which works well:
# find the colors within the specified boundaries and apply
# the mask
mask = cv2.inRange(image, lower, upper)
output = cv2.bitwise_and(image, image, mask = mask)
# show the images
cv2.imshow("images", np.hstack([image, output]))
It works well and extracts the color area from the image. But is there any callback that responds if the image has a particular color, so that it can all be done automatically?
So I am assuming here that you already know the location of the rect, which is going to change dynamically, and you need to find the single most dominant color in the desired ROI. There are a lot of ways to do this: one is to take the average of all the pixels in the ROI; another is to count all the distinct pixel values in the given ROI, with some tolerance.
Method 1:
import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227)  # left, top, right, bottom
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
print(cv2.mean(cropped_img))
>>> (53.430516018839604, 41.05708814243569, 244.54991977640907, 0.0)
Method 2:
To find out the various dominant clusters in the given image you can use cv2.kmeans() as:
import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227)  # left, top, right, bottom
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
Z = cropped_img.reshape((-1, 3))
Z = np.float32(Z)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Sort the cluster centers by their frequencies and print the most common one:
print(center[sorted(range(K), key=lambda x: np.count_nonzero(label == [x]), reverse=True)[0]])
>>> [ 52.96525192  40.93861389 245.02325439]
@Prateek... nice to have the question narrowed down to the core. The code you provided does not address the issue at hand and remains just a question. I'll hint you towards a direction, but you have to code it yourself.
Steps that guide you towards a scripted result (a rough sketch follows below):
1) In your script, add two pixel lists (past and current) to store values (pixel type + occurrence).
2) Introduce a while loop with a true/stop condition (linked to step 3) so the check becomes a dynamic process.
3) Write a GUI with a flashy warning banner.
4) Compare the past pixel list with the current pixel list for a serious state change (threshold).
5) If the delta at step 4 meets the threshold, throw the alert from step 3.
When you've written the code and enjoyed the trouble of tracking the tracebacks, edit your question, update it with the code, and reshape it (I can help with that if you want). Then we can pick it up from there. Does that sound like a plan?
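A bare-bones sketch of that loop (every name, region, and threshold here is hypothetical, and the GUI banner is reduced to a print):
import time
import cv2
import numpy as np
import pyautogui

def grab_region(region):
    # hypothetical helper: screenshot the ROI and convert it to a BGR array
    shot = pyautogui.screenshot(region=region)
    return cv2.cvtColor(np.array(shot), cv2.COLOR_RGB2BGR)

prev_hist = None
while True:
    frame = grab_region((356, 88, 139, 139))  # hypothetical (left, top, width, height)
    hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])
    if prev_hist is not None:
        delta = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
        if delta > 0.3:  # hypothetical threshold for a "serious" state change
            print("color state changed")  # stand-in for the flashy warning banner
    prev_hist = hist
    time.sleep(1)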
I am not sure why you need a callback in this situation, but maybe this is what you mean?
def test_color(image, lower, upper):
    mask = cv2.inRange(image, lower, upper)
    return np.any(mask == 255)
Explanations:
cv2.inRange() will return 255 when a pixel is in range (lower, upper) and 0 otherwise (see docs)
Use np.any() to check whether any element in the mask is actually 255
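For example (the BGR bounds below are placeholders, not values taken from the question):
import cv2
import numpy as np

image = cv2.imread("path/to/img.jpg")
lower = np.array([0, 0, 200])    # hypothetical lower BGR bound for "red-ish"
upper = np.array([80, 80, 255])  # hypothetical upper BGR bound
if test_color(image, lower, upper):
    print("the target color is present in the image")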

Mangled images in VTK from ITK

I am reading an image with SimpleITK, but I get these results in VTK. Any help?
I am not sure where things are going wrong here.
Please see the image here.
####
CODE
import vtk
import SimpleITK as sitk

def sitk2vtk(img):
    size = list(img.GetSize())
    origin = list(img.GetOrigin())
    spacing = list(img.GetSpacing())
    sitktype = img.GetPixelID()
    vtktype = pixelmap[sitktype]  # pixelmap: sitk-to-VTK pixel type lookup, defined elsewhere
    ncomp = img.GetNumberOfComponentsPerPixel()
    # there doesn't seem to be a way to specify the image orientation in VTK

    # convert the SimpleITK image to a numpy array
    i2 = sitk.GetArrayFromImage(img)
    i2_string = i2.tostring()

    # send the numpy array to VTK with a vtkImageImport object
    dataImporter = vtk.vtkImageImport()
    dataImporter.CopyImportVoidPointer(i2_string, len(i2_string))
    dataImporter.SetDataScalarType(vtktype)
    dataImporter.SetNumberOfScalarComponents(ncomp)

    # VTK expects 3-dimensional parameters
    if len(size) == 2:
        size.append(1)
    if len(origin) == 2:
        origin.append(0.0)
    if len(spacing) == 2:
        spacing.append(spacing[0])

    # Set the new VTK image's parameters
    dataImporter.SetDataExtent(0, size[0]-1, 0, size[1]-1, 0, size[2]-1)
    dataImporter.SetWholeExtent(0, size[0]-1, 0, size[1]-1, 0, size[2]-1)
    dataImporter.SetDataOrigin(origin)
    dataImporter.SetDataSpacing(spacing)
    dataImporter.Update()

    vtk_image = dataImporter.GetOutput()
    return vtk_image
###
END CODE
You are ignoring two things:
There is an order change when you perform GetArrayFromImage:
The order of index and dimensions needs careful attention during conversion. A quote from the SimpleITK Notebooks at http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/01_Image_Basics.html:
ITK's Image class does not have a bracket operator. It has a GetPixel which takes an ITK Index object as an argument, which is an array ordered as (x,y,z). This is the convention that SimpleITK's Image class uses for the GetPixel method as well.
While in numpy, an array is indexed in the opposite order (z,y,x).
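Concretely (a sketch assuming a 3-D SimpleITK image img and valid indices x, y, z):
arr = sitk.GetArrayFromImage(img)    # arr.shape == (size_z, size_y, size_x)
assert img[x, y, z] == arr[z, y, x]  # SimpleITK indexes (x, y, z); numpy indexes (z, y, x)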
There is a change of coordinates between ITK and VTK image representations. Historically, in computer graphics there is a tendency to align the camera in such a way that the positive Y axis is pointing down. This results in a change of coordinates between ITK and VTK images.
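If the vertical flip is what is mangling your render, one hedged option is to pass the imported image through vtkImageFlip on the Y axis (a sketch; verify the axis choice against your data):
flip = vtk.vtkImageFlip()
flip.SetFilteredAxis(1)       # 0 = X, 1 = Y, 2 = Z
flip.SetInputData(vtk_image)  # the output of sitk2vtk() above
flip.Update()
vtk_image_flipped = flip.GetOutput()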
