Python: PyAutoGui click location is off by a few pixels when using an image to locate - python-3.x

Goal of the program: open a web browser tab to YouTube, use a saved image of the "YouTube" button on the YouTube home screen to move the mouse to that location, and click once it is there.
Issue: The mouse moves to a location that is off by a few pixels (-29 x, -35 y) when performing the click() step. The coordinates are correct at the time of locateCenterOnScreen but are different when it does click().
What I've tried: I had the program print the coordinates of the picture when it is located, and at that point the coordinates are correct. I used a mouse position program to narrow down how much it is off by.
My Question: What is causing the position of the click() to be offset by these few pixels and how do I fix it?
import pyautogui as auto
import webbrowser
import time
site = "https://www.youtube.com/"
webbrowser.open_new_tab(site)
time.sleep(5)
x, y = auto.locateCenterOnScreen('test.png')
print(x)
print(y)
try:
    auto.click(x,y)
except:
    print("Not Found")

I ended up re-taking the picture I used for the program to locate and it works now. I'm unsure why the original one did not work as intended though.

Probably your window was resized, so the width and height of the image you were looking for also changed.
I recommend using:
import pygetwindow
win = pygetwindow.getWindowsWithTitle('windowname')[0]
win.size = (1600, 900)
to resize the window to a fixed size before locating the image.
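The manual check described in the question (compare where the image was found with where the pointer actually ends up) could be scripted roughly as follows. This is only a sketch: `offset` and `measure_click_offset` are hypothetical helper names, and `pyautogui` is imported lazily so the arithmetic can be tried without a display.

```python
def offset(expected, actual):
    """Return (dx, dy) from the expected point to the point actually reached."""
    return (actual[0] - expected[0], actual[1] - expected[1])


def measure_click_offset(image_path):
    """Locate the image, move the pointer there, and report how far off it landed.

    Hypothetical helper: returns None when the image is not found.
    """
    import pyautogui  # imported here so offset() can be used without a display

    # Depending on the PyAutoGUI version, a failed search returns None
    # or raises pyautogui.ImageNotFoundException instead.
    target = pyautogui.locateCenterOnScreen(image_path)
    if target is None:
        return None
    pyautogui.moveTo(target[0], target[1])
    landed = pyautogui.position()  # where the pointer really is now
    return offset(target, landed)
```

A result of (0, 0) from `measure_click_offset('test.png')` would suggest the move itself is accurate and the reference image is the more likely culprit.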

Related

How to save pygame window as a image? (python3) (PyGame)

I'm making a pixel editor / a trash version of ms paint in python with pygame, and I want to save the window (canvas?) as a png or jpg. I've seen pygame.image.save, but that only saves a certain surface as an image, I want to save the entire window.
Give the following a try:
pygame.image.save(window, "screenshot.png")
Use pygame.image.save(), which requires PyGame 1.8 or later. If you give it the base-level window surface, it will indeed save the entire window content.
For example:
pygame.image.save( window, 'surface.png' )
The image type is determined by the filename suffix.

Matplotlib Text artist - how to get size? (not using pyplot)

Background
I've moved some code to use matplotlib only, instead of pyplot (The reason is it's generating png files in multi-process with futures, and pyplot isn't thread/process safe that way).
So I'm creating my figure and axis via matplotlib.Figure(), not via pyplot.
I've got some code that draws a text 'table' on a figure, with one side right justified and the other left. In order to keep the spacing between the two sides constant, I was previously using get_window_extent() on my left-side text artist to figure out the location of the right hand side:
# draw left text at figure normalised coordinates, right justified
txt1 = figure.text(x, y, left_str,
                   ha='right', color=left_color, fontsize=fontsize)
# get the bounding box of that text - this is in display coords
bbox = txt1.get_window_extent(renderer=figure.canvas.get_renderer())
# get x location for right hand side offset from left's bbox
trans = figure.transFigure.inverted()
xr, _ = trans.transform((bbox.x1 + spacing, bbox.y0))
# draw right side text, using lhs's y value
figure.text(xr, y, right_str,
            ha='left', color=right_color, fontsize=fontsize)
Problem
My problem is now that I'm not using pyplot, the above code fails to work, because figure.canvas.get_renderer() fails to return a renderer, as I haven't set one, and am simply using Figure.savefig(path) to save my file when I'm done.
So is there a way to find out the bounding box size of an Artist in a Figure without having a renderer set?
From looking at legend, which allows you to use a bounding box line with variable text size, I'm presuming there is, but I can't find how.
I've had a look at https://matplotlib.org/3.1.3/tutorials/intermediate/artists.html, and also tried matplotlib.artist.getp(txt1) on the above, but didn't find any seemingly helpful properties.
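One approach that may work here, offered as a sketch rather than a confirmed answer, is to attach an Agg canvas to the figure explicitly so that a renderer is available without pyplot. `FigureCanvasAgg` is Matplotlib's non-interactive raster canvas; the strings, coordinates and the 10-pixel `spacing` below are placeholders.

```python
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

figure = Figure()
canvas = FigureCanvasAgg(figure)  # attaches itself as figure.canvas
renderer = canvas.get_renderer()

# Placeholder values: x and y are figure coordinates, spacing is in pixels
x, y, spacing = 0.5, 0.5, 10

# Draw the left text, right justified, and measure it in display coordinates
txt1 = figure.text(x, y, "left side", ha="right")
bbox = txt1.get_window_extent(renderer=renderer)

# Convert the pixel position just right of the left text back to figure coords
xr, _ = figure.transFigure.inverted().transform((bbox.x1 + spacing, bbox.y0))
figure.text(xr, y, "right side", ha="left")
```

The extent is measured at the figure's current dpi, so saving with a different dpi would change the pixel spacing.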

Tkinter fullscreen leaves border

I am trying to make an image slide show application using Tkinter and Pillow. I would like the image to go full screen, so currently my code looks like this (I think these are all the important bits, ask me if you need to see more):
canvas = Canvas(root, width=screenwidth, height=screenheight, bg='black')  # screenwidth and screenheight are previously assigned (and checked) variables containing the screen dimensions.
image = image.resize((resizew, resizeh), Image.ANTIALIAS)  # Image.ANTIALIAS was removed in Pillow 10; use Image.LANCZOS there.
imagesprite = canvas.create_image(midx, midy, image=photo)  # the resized image was converted to a Tkinter PhotoImage previously; midx and midy are half the screen dimensions.
The problem:
No matter what settings I change, there is always some form of grey bar around the edge of the window. I have tried changing the window size, changing the canvas size, and setting the window geometry manually using root.geometry, to no avail. However, some combinations of settings lead to fewer bars; I have seen between 1 and 3. Pictures of the output in its current state are attached. There are no errors in the shell, nor (currently) is there a border on the left of the image.
[1]: https://i.stack.imgur.com/1DLfg.jpg
You need to set highlightthickness=0 when creating the canvas:
canvas = Canvas(root, width=screenwidth, height=screenheight, bg='black', highlightthickness=0)
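The question does not show how resizew, resizeh, midx and midy are computed. One plausible way, assuming the image should keep its aspect ratio and be centred, is sketched below; `fit_to_screen` and `screen_center` are hypothetical names, not part of Tkinter or Pillow.

```python
def fit_to_screen(img_w, img_h, screen_w, screen_h):
    """Return the largest (width, height) that fits the screen and keeps the aspect ratio."""
    scale = min(screen_w / img_w, screen_h / img_h)
    # round() avoids losing a pixel to floating-point error; never go below 1 pixel
    return (max(1, round(img_w * scale)), max(1, round(img_h * scale)))


def screen_center(screen_w, screen_h):
    """Return the canvas coordinates at which create_image should anchor the image."""
    return (screen_w // 2, screen_h // 2)


resizew, resizeh = fit_to_screen(4000, 3000, 1920, 1080)
midx, midy = screen_center(1920, 1080)
print(resizew, resizeh, midx, midy)
```

With highlightthickness=0 on the canvas, an image sized this way would be letterboxed only by the canvas background colour rather than by the focus highlight border.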

Store Click Coordinates Using Matplotlib

I am trying to display an image, click on it somewhere, then store those coordinates into a variable. However, I am unable to do so. I can print the click coordinates no problem, but I cannot figure out a way to actually store those coordinates. The matplotlib documentation has some tutorials on how to use "fig.canvas.mpl_connect" in general, but none of the routines cover storing the click coordinates, which is what I want to do. There are some tutorials on StackExchange, as well as other websites, but they seem to be for outdated versions of python and/or matplotlib.
Here is my simple code right now:
import matplotlib.pyplot as plt
x = 0
def onclick(event):
    print(event.xdata)
    print(event.ydata)
    global x
    x = event.xdata
fig, ax = plt.subplots(figsize=(8,8))
plt.show()
cid = fig.canvas.mpl_connect('button_press_event', onclick)
print(x)
Upon running this code, it immediately prints '0', THEN displays the image. When I then click on that figure, I get coordinates printed to the console. I have tried putting a pause command before the print statement, but it just waits to print '0', then displays the image. Essentially, I need it to display the image so I can click it, THEN print the coordinates of my click.
Any help would be appreciated. I am also open to another method of obtaining the click coordinates, if one exists. Thank you.
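A possible rearrangement, sketched under the assumption that plt.show() blocks until the window is closed (as it does in a plain script; an interactive IDE session may behave differently): connect the callback before calling show(), collect clicks in a list, and read the list afterwards. `clicks` and `collect_clicks` are hypothetical names.

```python
clicks = []  # filled by the callback; read it after the window is closed


def onclick(event):
    """Store the data coordinates of a click inside the axes."""
    # xdata/ydata are None when the click lands outside the axes
    if event.xdata is None or event.ydata is None:
        return
    clicks.append((event.xdata, event.ydata))


def collect_clicks():
    """Show an empty figure and return every click made before it was closed."""
    import matplotlib.pyplot as plt  # imported lazily so onclick can be tried headless

    fig, ax = plt.subplots(figsize=(8, 8))
    fig.canvas.mpl_connect('button_press_event', onclick)  # connect BEFORE show()
    plt.show()  # assumed to block until the window is closed
    return clicks
```

Calling `print(collect_clicks())` in a script would then print the stored coordinates once the figure window is closed.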

Move to searched text on active screen with pyautogui

I am trying to make a program that searches for a text on a web-page, then places the mouse cursor on the highlighted text after it has been found. Is this possible using pyautogui? If so, how. If not, are there any other alternatives to do this?
Example code below:
import webbrowser
import time
import pyautogui
var = 'Filtered Questions'
webbrowser.open('https://stackexchange.com/')
time.sleep(2)
pyautogui.hotkey('ctrl', 'f')
pyautogui.typewrite(var)
#code to place mouse cursor to the occurrence of var
I would prefer to not use the pyautogui.moveTo() or pyautogui.moveRel() because the text I am searching for on the website is not static. The position of the searched text varies when the web page loads. Any help would be highly appreciated.
If you use Chrome or Chromium as the browser, there is a much easier and more stable approach using ONLY pyautogui:
1) Perform Ctrl + F with pyautogui
2) Perform Ctrl + Enter to 'click' on the search result / open the link related to the result
With other browsers you have to check whether these keyboard shortcuts also exist.
Yes, you can do that, but you additionally need Tesseract (and the Python-module pytesseract) for text recognition and PIL for taking screenshots.
Then perform the following steps:
1) Open the page
2) Open and perform the search (ctrl+f with pyautogui) - the view changes to the first result
3) Take a screenshot (with PIL)
4) Convert the image to text and data (with Tesseract) and find the text and the position
5) Use pyautogui to move the mouse and click on it
Here is the needed code for getting the image and the related data:
import time
from PIL import ImageGrab # screenshot
import pytesseract
from pytesseract import Output
pytesseract.pytesseract.tesseract_cmd = (r"C:\...\AppData\Local\Programs\Tesseract-OCR\tesseract") # needed for Windows as OS
screen = ImageGrab.grab() # screenshot
cap = screen.convert('L') # make grayscale
data=pytesseract.image_to_boxes(cap,output_type=Output.DICT)
print(data)
In data you find all required information you need to move the mouse and click on the text.
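Turning that data into a point to click could look roughly like this. The sketch assumes the dict holds parallel lists keyed 'char', 'left', 'bottom', 'right' and 'top', and that the boxes follow Tesseract's box-file convention of measuring y from the bottom edge of the image; `char_click_points` is a hypothetical helper and the sample values are made up.

```python
def char_click_points(data, image_height):
    """Return (char, x, y) tuples with y measured from the top, as screen coordinates are."""
    points = []
    columns = zip(data["char"], data["left"], data["bottom"], data["right"], data["top"])
    for ch, left, bottom, right, top in columns:
        x = (left + right) // 2
        # Flip the vertical axis: box coordinates count y upward from the bottom edge
        y = image_height - (bottom + top) // 2
        points.append((ch, x, y))
    return points


# Made-up sample in the assumed layout, for a 600-pixel-high screenshot
sample = {
    "char": ["O", "K"],
    "left": [10, 30],
    "bottom": [100, 100],
    "right": [20, 40],
    "top": [120, 120],
}
print(char_click_points(sample, image_height=600))
```

Each (x, y) could then be passed to pyautogui.moveTo or pyautogui.click, provided the screenshot covers the whole primary screen at the same scale as the mouse coordinates.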
The downside of this approach is the resource-consuming OCR step, which takes a few seconds on slower machines.
I stumbled upon this question while researching the topic. Basically the answer is no. Two major points:
1) Pyautogui has the option of searching using images. Using this you could, for example, screenshot all the text you want to find, save each as an individual image file, then use those to search for it dynamically and move the mouse there/click/do whatever you need to. However, as explained in the docs, each search takes 1-2 seconds, which is rather impractical.
2) In some cases, but not always, using ctrl+f on a website and searching for the text will scroll so that the result is in the (vertical) middle of the page. However, that relies on some heavy assumptions about where the text to search is. If it's at the top of the page you obviously won't be able to use that method, and the same goes if it's at the bottom.
If you're trying to automate clicks and have links with distinguishable names, my advice would be to parse the source code and follow the link programmatically. Otherwise you're probably better off with an automation suite like Blue Prism.
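A minimal sketch of the "parse the source" idea, using only the standard library: collect the targets of links whose visible text contains the search string. `LinkFinder` and `find_links` are hypothetical names, and the HTML snippet is made up; a real page would first have to be downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkFinder(HTMLParser):
    """Collect absolute URLs of <a> elements whose text contains wanted_text."""

    def __init__(self, wanted_text, base_url=""):
        super().__init__()
        self.wanted = wanted_text.lower()
        self.base_url = base_url
        self.matches = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace before comparing, case-insensitively
            label = " ".join("".join(self._text).split())
            if self.wanted in label.lower():
                self.matches.append(urljoin(self.base_url, self._href))
            self._href = None


def find_links(html, wanted_text, base_url=""):
    finder = LinkFinder(wanted_text, base_url)
    finder.feed(html)
    return finder.matches


sample = '<p><a href="/filters">Filtered   Questions</a> <a href="/about">About</a></p>'
print(find_links(sample, "Filtered Questions", "https://stackexchange.com/"))
```

A matching URL could then be opened with webbrowser.open instead of moving the mouse. Pages that build their links with JavaScript would not be covered by this approach.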
pyautogui is for controlling mouse and keyboard and for automating other GUI applications. If your need is to find a text on a webpage, you may look for better options that are intended for scraping webpages. For instance: Selenium
If you are a newcomer looking for how to find a string of text anywhere on your screen and stumbled upon this old question through a Google search, you can use the following snippet, which I have used in my own projects (it takes a raw string of text as input and, if the text is found on the screen, returns the coordinates; if not, it returns None):
import pyautogui
import pytesseract
import cv2
import numpy as np
# In case you're on Windows and pytesseract doesn't
# find your installation (Happened to me)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
def find_coordinates_text(text, lang='eng'):
    # Take a screenshot of the main screen
    screenshot = pyautogui.screenshot()
    # Convert the screenshot to grayscale
    img = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2GRAY)
    # Find the provided text (text) on the grayscale screenshot
    # using the provided language (lang; Tesseract uses codes such as 'eng')
    data = pytesseract.image_to_data(img, lang=lang, output_type='data.frame')
    # Find the coordinates of the provided text (text);
    # image_to_data yields one row per word, so this matches single words only
    try:
        match = data[data['text'] == text]
        x, y = match['left'].iloc[0], match['top'].iloc[0]
    except IndexError:
        # The text was not found on the screen
        return None
    # Text was found, return the coordinates
    return (x, y)
Usage:
text_to_find = 'Filtered Questions'
coordinates = find_coordinates_text(text_to_find)
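Because image_to_data reports one entry per word, a two-word string such as 'Filtered Questions' needs to be matched across consecutive entries. The sketch below does that on the dict form of the output (Output.DICT), assuming parallel lists keyed 'text', 'left', 'top', 'width' and 'height'; `find_phrase` is a hypothetical helper, the sample data is made up, and it does not check that the words sit on the same line.

```python
def find_phrase(data, phrase):
    """Return the centre (x, y) of the first run of consecutive OCR words equal to phrase, or None."""
    wanted = phrase.split()
    if not wanted:
        return None
    # Keep only entries that carry a word; structural entries have empty text
    rows = [i for i, t in enumerate(data["text"]) if str(t).strip()]
    for start in range(len(rows) - len(wanted) + 1):
        window = rows[start:start + len(wanted)]
        if [str(data["text"][i]).strip() for i in window] != wanted:
            continue
        # Bounding box that encloses every word of the phrase
        left = min(data["left"][i] for i in window)
        top = min(data["top"][i] for i in window)
        right = max(data["left"][i] + data["width"][i] for i in window)
        bottom = max(data["top"][i] + data["height"][i] for i in window)
        return ((left + right) // 2, (top + bottom) // 2)
    return None


# Made-up sample in the assumed layout
sample = {
    "text": ["", "Filtered", "Questions", "Tags"],
    "left": [0, 100, 170, 300],
    "top": [0, 50, 50, 50],
    "width": [0, 60, 80, 40],
    "height": [0, 20, 20, 20],
}
print(find_phrase(sample, "Filtered Questions"))
```

Returning the centre rather than the top-left corner makes the result a safer target for a click.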
