Move to searched text on active screen with pyautogui - python-3.x

I am trying to make a program that searches for a text on a web page, then places the mouse cursor on the highlighted text after it has been found. Is this possible using pyautogui? If so, how? If not, are there any other alternatives to do this?
Example code below:
import time
import webbrowser
import pyautogui

var = 'Filtered Questions'
webbrowser.open('https://stackexchange.com/')
time.sleep(2)
pyautogui.hotkey('ctrl', 'f')
pyautogui.typewrite(var)
# code to place the mouse cursor on the occurrence of var
I would prefer not to use pyautogui.moveTo() or pyautogui.moveRel(), because the text I am searching for on the website is not static; its position varies when the web page loads. Any help would be highly appreciated.

When you use Chrome or Chromium as the browser, there is a much easier and much more stable approach using ONLY pyautogui:
Perform Ctrl + F with pyautogui
Perform Ctrl + Enter to 'click' on the search result / open the link related to the result
With other browsers you have to check whether equivalent keyboard shortcuts exist.

Yes, you can do that, but you additionally need Tesseract (and the Python module pytesseract) for text recognition, and PIL for taking screenshots.
Then perform the following steps:
Open the page
Open and perform the search (ctrl+f with pyautogui) - the view changes to the first result
Take a screenshot (with PIL)
Convert the image to text and data (with Tesseract) and find the text and the position
Use pyautogui to move the mouse and click on it
Here is the needed code for getting the image and the related data:
import time
from PIL import ImageGrab  # screenshot
import pytesseract
from pytesseract import Output

# needed on Windows if tesseract.exe is not on PATH
pytesseract.pytesseract.tesseract_cmd = r"C:\...\AppData\Local\Programs\Tesseract-OCR\tesseract"

screen = ImageGrab.grab()   # take the screenshot
cap = screen.convert('L')   # convert to grayscale
data = pytesseract.image_to_boxes(cap, output_type=Output.DICT)
print(data)
In data you find all the information you need to move the mouse and click on the text.
The downside of this approach is the resource-consuming OCR step, which takes a few seconds on slower machines.
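As a sketch of the final move-and-click step: image_to_boxes returns character-level boxes measured from the bottom-left corner of the image, so for locating whole words it is often easier to use pytesseract.image_to_data(..., output_type=Output.DICT), whose word-level boxes use a top-left origin matching pyautogui's screen coordinates. The helper below is a minimal sketch of turning that dict into a click position; find_word_center and the hand-made sample dict are mine, not part of the original answer:

```python
def find_word_center(data, word):
    """Return the on-screen center (x, y) of the first occurrence of `word`
    in a dict as returned by pytesseract.image_to_data(..., output_type=Output.DICT)."""
    for i, text in enumerate(data['text']):
        if text == word:
            x = data['left'][i] + data['width'][i] // 2
            y = data['top'][i] + data['height'][i] // 2
            return (x, y)
    return None

# Hand-made sample dict in the image_to_data layout (not real OCR output)
sample = {
    'text':   ['', 'Filtered', 'Questions'],
    'left':   [0, 100, 210],
    'top':    [0, 50, 50],
    'width':  [0, 100, 120],
    'height': [0, 20, 20],
}
print(find_word_center(sample, 'Filtered'))   # → (150, 60)
```

With real OCR data, the returned coordinates can then be passed to pyautogui.moveTo(x, y) and pyautogui.click().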

I stumbled upon this question while researching the topic. Basically the answer is no. Two major points:
1) Pyautogui has the option of searching using images. You could, for example, screenshot the text you want to find, save the screenshots as individual image files, then use them to locate the text dynamically and move the mouse there/click/do whatever you need to. However, as explained in the docs, each search takes 1-2 seconds, which is rather impractical.
2) In some cases, but not always, using ctrl+f on a website and searching for the text will scroll the page so that the result is vertically centered. However, that relies on heavy assumptions about where the searched text is. If it's at the top of the page you obviously won't be able to use that method, and the same goes for the bottom.
If you're trying to automate clicks and the links have distinguishable names, my advice would be to parse the page source and follow the link programmatically. Otherwise you're probably better off with an automation suite like Blue Prism.
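A minimal sketch of the "parse the source" approach using only the standard library; the LinkFinder class and the inline sample HTML are hypothetical, and in practice you would fetch the real page source with urllib.request or requests first:

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect (link text, href) pairs from an HTML document."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.links = []   # list of (text, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None

# Hypothetical page source; fetch the real one with urllib.request
page = '<a href="/questions">Filtered Questions</a> <a href="/tags">Tags</a>'
finder = LinkFinder()
finder.feed(page)
target = dict(finder.links)['Filtered Questions']
print(target)   # → /questions
```

Once you have the href, you can open it directly with webbrowser.open() instead of simulating a mouse click at all.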

pyautogui is for controlling the mouse and keyboard and for automating other GUI applications. If you need to find text on a webpage, you may want to look at better options that are intended for driving or scraping webpages, for instance Selenium.

If you are a newcomer looking for how to find a string of text anywhere on your screen and stumbled upon this old question through a Google search, you can use the following snippet, which I have used in my own projects. It takes a raw string of text as input; if the text is found on the screen it returns the coordinates, and if not it returns None:
import pyautogui
import pytesseract
import cv2
import numpy as np

# In case you're on Windows and pytesseract doesn't
# find your installation (happened to me)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def find_coordinates_text(text, lang='eng'):
    # Take a screenshot of the main screen
    screenshot = pyautogui.screenshot()
    # Convert the screenshot to grayscale
    img = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2GRAY)
    # Run OCR on the grayscale screenshot using the provided
    # Tesseract language code (lang), e.g. 'eng' for English
    data = pytesseract.image_to_data(img, lang=lang, output_type='data.frame')
    # Find the coordinates of the provided text (text)
    try:
        x = data[data['text'] == text]['left'].iloc[0]
        y = data[data['text'] == text]['top'].iloc[0]
    except IndexError:
        # The text was not found on the screen
        return None
    # Text was found, return the coordinates
    return (x, y)
Usage:
text_to_find = 'Filtered Questions'
coordinates = find_coordinates_text(text_to_find)

Related

How to save pygame window as a image? (python3) (PyGame)

I'm making a pixel editor / a trash version of MS Paint in Python with pygame, and I want to save the window (canvas?) as a PNG or JPG. I've seen pygame.image.save, but that only saves a certain surface as an image; I want to save the entire window.
Give the following a try:
pygame.image.save(window, "screenshot.png")
Use pygame.image.save(), which requires PyGame 1.8 or later. If you give it the base-level window surface, it will indeed save the entire window content.
For example:
pygame.image.save( window, 'surface.png' )
The image type is determined by the filename suffix.

showing PIL ImageGrab as numpy Array in Kivy Window

I am trying to capture a part of my screen with PIL.ImageGrab and then convert it to a numpy.array, as you can see below.
np.array(ImageGrab.grab(bbox=(0, 0, 720, 480)))
What I can't figure out is, how to get this data into a Kivy window or a widget?
All my Widgets are in separate classes so I would like to have one Screen class with all the code necessary for this part of my Window.
I have already tried methods like this one, but I don't know how to implement it since Kivy confuses me a lot sometimes.
If my idea just isn't realizable, I guess you could save each captured frame in .jpg format and then load it into Kivy. But I imagine this is not the most efficient way.
Thanks a lot in advance! Dominik

Python: PyAutoGui click location is off by a few pixels when using an image to locate

Goal of the program: open a web browser tab to YouTube, use a saved image of the "YouTube" button on the YouTube home screen to move the mouse to that location, and click there.
Issue: The mouse moves to a location that is off by a few pixels (-29 x, -35 y) when performing the click() step. The coordinates are correct at the time of locateCenterOnScreen but are different when it does click().
What I've tried: I had the program print out the coordinates of the picture when it takes its location, and at that point in time the coordinates are correct. I used a mouse position program to narrow down how much it's off by.
My Question: What is causing the position of the click() to be offset by these few pixels and how do I fix it?
import pyautogui as auto
import webbrowser
import time

site = "https://www.youtube.com/"
webbrowser.open_new_tab(site)
time.sleep(5)

x, y = auto.locateCenterOnScreen('test.png')
print(x)
print(y)
try:
    auto.click(x, y)
except Exception:
    print("Not Found")
I ended up re-taking the picture I used for the program to locate and it works now. I'm unsure why the original one did not work as intended though.
Probably your window was resized, so the width and height of the image you were looking for also changed.
I recommend using:
import pygetwindow

win = pygetwindow.getWindowsWithTitle('windowname')[0]
win.size = (1600, 900)
to resize the window.

Store Click Coordinates Using Matplotlib

I am trying to display an image, click on it somewhere, then store those coordinates into a variable. However, I am unable to do so. I can print the click coordinates no problem, but I cannot figure out a way to actually store those coordinates. The matplotlib documentation has some tutorials on how to use "fig.canvas.mpl_connect" in general, but none of the routines cover storing the click coordinates, which is what I want to do. There are some tutorials on StackExchange, as well as other websites, but they seem to be for outdated versions of Python and/or matplotlib.
Here is my simple code right now:
import matplotlib.pyplot as plt

x = 0

def onclick(event):
    print(event.xdata)
    print(event.ydata)
    global x
    x = event.xdata

fig, ax = plt.subplots(figsize=(8, 8))
plt.show()
cid = fig.canvas.mpl_connect('button_press_event', onclick)
print(x)
Upon running this code, it immediately prints '0', THEN displays the image. When I then click on that figure, I get coordinates printed to the console. I have tried putting a pause command before the print statement, but it just waits to print '0', then displays the image. Essentially, I need it to display the image so I can click it, THEN print the coordinates of my click.
Any help would be appreciated. I am also open to another method of obtaining the click coordinates, if one exists. Thank you.
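One way to get this behaviour is to connect the handler before plt.show() and append every click to a list, so the coordinates are available after the window is closed (plt.ginput(n) is also a ready-made alternative when you only need a fixed number of clicks). A minimal sketch; the clicks list and on_click name are my own:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop this line for interactive use
import matplotlib.pyplot as plt

clicks = []  # every click lands here as an (x, y) tuple

def on_click(event):
    # xdata/ydata are in data coordinates; they are None outside the axes
    if event.xdata is not None:
        clicks.append((event.xdata, event.ydata))

fig, ax = plt.subplots(figsize=(8, 8))
# Connect BEFORE plt.show(): show() blocks until the window is closed,
# so anything placed after it runs too late to catch clicks.
cid = fig.canvas.mpl_connect('button_press_event', on_click)
# plt.show()  # uncomment for interactive use
# After the window is closed, all coordinates are available in `clicks`.
```

Because the handler only appends to a list, no global variable is needed, and an arbitrary number of clicks can be collected.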

Creating a Retro-themed loading screen in Python 3.x

I have an idea to include a loading screen, just for a fun project I'm working on, prior to starting the actual program in Python.
Back in the 80s, we had loading screens on the ZX Spectrum, and other 8-bit computers, that displayed an image one line at a time, then coloured the image in - something to look at while the game loaded. You can find an example at: https://youtu.be/MtBoRp_cSxQ
What I'd like to do is part-replicate this feature. I'd like to be able to take a JPG or PNG and have Python code load it one line at a time, much the same way as in the video linked above. I don't mind if it can't be coloured in after, or that there are no funky raster bars in the borders. Just an image 'loading' in one line at a time (if you see what I mean).
I can achieve much the same effect in the console with some basic code, using a text file with ASCII art, but it'd be great with an actual image.
Any help will be greatly appreciated.
As always, thanks in advance.
Dave.
import os
import time

def splash_screen(seconds):
    splash = open("test.txt", 'r')
    for line in splash:
        print(line)
        time.sleep(seconds)
    splash.close()

# Main Code Start
os.system('cls' if os.name == 'nt' else 'clear')
splash_screen(.5)
username = input("Type your username:")
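For the image version, the same idea can be sketched with PIL alone: start from a blank (black) canvas and paste one extra row of the source image per frame, then hand each frame to whatever display library you prefer (pygame, tkinter, ...). The reveal_rows name and the generated demo image are my own, not a complete loading-screen implementation:

```python
from PIL import Image

def reveal_rows(image, step=1):
    """Yield copies of `image` revealed `step` rows at a time from the top,
    with the rest left black - like an 8-bit tape-loading screen."""
    blank = Image.new(image.mode, image.size)  # all-black canvas
    for y in range(step, image.height + step, step):
        frame = blank.copy()
        band = image.crop((0, 0, image.width, min(y, image.height)))
        frame.paste(band, (0, 0))
        yield frame

# Demo with a generated 4x4 red square instead of a real JPG/PNG
src = Image.new('RGB', (4, 4), (255, 0, 0))
frames = list(reveal_rows(src))
print(len(frames))   # → 4 (one frame per revealed row)
```

Each yielded frame can then be drawn to the screen (for example blitted with pygame) with a short time.sleep between frames to mimic the original effect.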
