Python GUI Automation - python-3.x

I've been looking around for a few weeks trying to find a library to help fit my needs.
I'm looking for a way to create what would effectively be a secondary virtual keyboard and mouse cursor. It doesn't necessarily have to be visible, but the idea is that my normal cursor and keyboard stay free for me to use, while the secondary virtual keyboard and mouse handle automation in another window.
I'm not entirely sure where to begin on this journey... Any ideas, stackoverflow community?
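One commonly suggested approach for this kind of background automation is to post input messages directly to the target window with pywin32, so the real cursor and keyboard stay free. Below is a minimal sketch, assuming the target window can be found by title (the title, control class, and coordinates are placeholders); note that many applications, games in particular, ignore posted messages and only accept real input:

    # Sketch: drive another window without touching the real mouse/keyboard.
    # Requires pywin32. The window title and coordinates are placeholders.
    import win32api
    import win32con
    import win32gui

    hwnd = win32gui.FindWindow(None, "Untitled - Notepad")   # placeholder title
    edit = win32gui.FindWindowEx(hwnd, 0, "Edit", None)      # many apps need the child control

    # "Type" a character into the window without using the real keyboard.
    win32api.PostMessage(edit, win32con.WM_CHAR, ord("A"), 0)

    # "Click" at client coordinates (50, 50) without moving the real cursor.
    lparam = win32api.MAKELONG(50, 50)
    win32api.PostMessage(edit, win32con.WM_LBUTTONDOWN, win32con.MK_LBUTTON, lparam)
    win32api.PostMessage(edit, win32con.WM_LBUTTONUP, 0, lparam)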

Related

How can I access the monitor image data?

1. The problem I've encountered
Hi, I'm currently building a desktop application with Electron.js. I need a feature that takes a screenshot (including the mouse cursor), but I don't know how to do this.
I think the reason I can't solve this problem is that I have no knowledge of operating systems. I take "taking a screenshot" to mean "getting the image data displayed on the computer monitor", but how can I access that?
2. What I've tried or considered
At first I tried Electron.BrowserWindow.capturePage(), but its result didn't meet my needs, for two reasons: 1) my application has a transparent background, and any transparent area comes out black in the screenshot; 2) the mouse cursor is not captured.
I'm also aware of APIs such as the Screen Capture API and the Media Capture and Streams API (in web browsers), and perhaps I can give them a try, since Electron.js is built on the Chromium browser and browsers implement those APIs.
However, those APIs deal in media streams (i.e. video), which doesn't suit my case. It is probably possible to pull a single frame out of a media stream somehow, but that seems like overkill when all I want is a single screenshot.
Since Electron.js also uses Node.js, it should also be possible to call the Windows API (perhaps via a foreign function interface?) or to invoke child_process.exec() in order to take a screenshot.
3. The question I would like to ask
How can I access the monitor image data, so that I can implement a screenshot feature that meets my requirements (see-through and including the mouse cursor), using as few third-party libraries as possible?
What computes the final image data that is displayed on my monitor? It seems to be the job of my graphics card, since my monitor and graphics card are connected by a cable.
4. Miscellaneous curiosities (not much related to the question)
...Another curiosity is how, why, and where the transparent area ends up being rendered as #000000.
It is also interesting that some programs do not allow screenshots of their contents--the area where those programs sit just appears black. How do the developers of such programs implement that?
Thank you for reading my question.
After some internet searching, I found it difficult to access the display data directly (specifically, the video RAM on my graphics card). So I decided on a workaround--as the well-known aphorism goes, 'all roads lead to Rome'.
Which means:
See-through screenshots can be achieved either by using the native screenshot feature (the PrintScreen key) or by using a script that captures the entire screen.
Screenshots with the mouse cursor can be achieved by overlaying a mouse-cursor image at the coordinates where the cursor is located.
However, in my case I do not actually need to save screenshots as files, so I think it is enough to draw a custom cursor image, hide the original cursor, make the custom image follow the mouse, and take the screenshot with a manual key press. (I think it is also feasible to take a screenshot with the PrintScreen key, read the screenshot data from the clipboard, and do some image processing to add a cursor effect.)
※ I saw code that simulates the key press (SendKey()) in order to take the screenshot, which I think is a good approach because no manual key press is needed.
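A minimal Python sketch of the "capture the screen, then overlay a cursor image" idea, assuming Windows with Pillow and pywin32 installed and a prepared cursor.png (the file name is just an example; a precise version would also subtract the cursor's hot-spot offset):

    # Sketch: full-screen capture with a cursor image pasted on top.
    import win32api
    from PIL import Image, ImageGrab

    def screenshot_with_cursor(cursor_png="cursor.png"):
        screen = ImageGrab.grab()                 # capture the whole screen
        cx, cy = win32api.GetCursorPos()          # current cursor position in pixels
        cursor = Image.open(cursor_png).convert("RGBA")
        # Use the cursor's own alpha channel as the paste mask so its
        # transparent parts stay see-through.
        screen.paste(cursor, (cx, cy), cursor)
        return screen

    if __name__ == "__main__":
        screenshot_with_cursor().save("screenshot_with_cursor.png")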
Anyone interested in this topic may find the following links helpful (the order does not reflect importance):
Keywords mentioned: GetDC(), BitBlt(), CAPTUREBLT flag, GDI
What is the best way to take screenshots of a Window with C++ in Windows?
How can I take a screenshot in a windows application?
Keywords mentioned: DirectX, buffer
Fastest method of screen capturing on Windows
How to save backbuffer to file in DirectX 10?
Keywords mentioned: mouse cursor, cursor image, hot spot
Capture screen shot with mouse cursor
C# - Capturing the Mouse cursor image
Python - Take screenshot including mouse cursor
Keywords mentioned: PowerShell, CopyFromScreen()
How can I do a screen capture in Windows PowerShell?
Capture screenshot of active window?
Q/A about accessing video memory
Access the whole video memory (keywords: DRM)
Access the whole video memory through OpenGL programming (keywords: raw video memory, video driver)
API to get the graphics or video memory (keywords: graphics RAM)
direct data write to video memory
Direct video buffer access
How to write data directly into video memory?
Is direct video card access possible? (No API)

Convert mouse position in pixels into row and column (or PS position) in a mainframe emulator

I need to convert mouse coordinates into a PS position (row and column) on a mainframe emulator.
I'm using WHLLAPI to connect to and automate the emulator. I need to identify the underlying field when the user moves the mouse over, or clicks on, a field on the emulator screen. To identify a field I need its row and column (its PS position), so I need to convert the mouse position (in pixels) into the emulator's row and column. However, there is no API in WHLLAPI that provides this functionality.
I used the WHLLAPI functions QueryWindowCoordinates and WindowStatus to get the emulator window coordinates and window handle, and passed that handle to the Windows API ScreenToClient to get the mouse position relative to the emulator window. But I'm unable to translate those coordinates into emulator rows and columns; I have tried several algorithms without getting consistent results, and I need the translation to be precise.
The WHLLAPI documentation says the WindowStatus API returns the font sizes for x and y, but I'm unable to retrieve any value from the Rumba emulator. To get the font height and width I also tried the Windows API GetTextMetrics, but that was not much help either.
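For illustration, the straightforward conversion looks roughly like the sketch below, assuming a fixed 24x80 presentation space and that the text area exactly fills the emulator's client rectangle. Real emulators add toolbars, margins, and variable fonts, which is precisely why this simple math tends to give inconsistent results:

    # Sketch: naive pixel-to-PS-position conversion, assuming a 24x80 screen
    # whose text area fills the whole client rectangle of the emulator window.
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    user32.GetClientRect.argtypes = [wintypes.HWND, ctypes.POINTER(wintypes.RECT)]

    def pixel_to_row_col(hwnd, x, y, rows=24, cols=80):
        rect = wintypes.RECT()
        user32.GetClientRect(hwnd, ctypes.byref(rect))
        cell_w = (rect.right - rect.left) / cols   # width of one character cell
        cell_h = (rect.bottom - rect.top) / rows   # height of one character cell
        col = int(x // cell_w) + 1                 # 1-based column
        row = int(y // cell_h) + 1                 # 1-based row
        return row, col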
IBM Personal Communications for Windows offers a "Get Mouse Input" DDE function which returns PS position data (row, column) when the user clicks the mouse. There's another DDE function, Set Mouse Intercept Condition, to establish which mouse click(s) (left, right, middle, single, double, etc.) should be intercepted. I don't see a direct way to capture mere mouse movements using the DDE functions, but it might be possible (if you're very careful in your Windows programming) if you generate simulated, rate limited mouse clicks and only when the mouse pointer is moved within the emulator window.
Perhaps Rumba offers similar functions? Rumba evidently has some DDE functions, but I haven't found any DDE function reference for Rumba publicly available online.
One possible caveat is that the DDE functions are 32-bit (and 16-bit functions are also still supported since there's still some 16-bit code running on 32-bit Windows). You can use the 32-bit functions if you're doing 64-bit Windows programming, but of course you'll need to know how to do that if you don't already. Another caveat is that you probably ought to test whatever you're doing for user accessibility, for example with screen reading tools that aid vision impaired users.
Another possible approach is to embed the whole emulator within your own "wrapper" application since that might give you more programming power and control. IBM offers both ActiveX/OLE-style embedding and Java-style embedding (their "Host Access Class Libraries," a.k.a. HACL). Rumba might offer something broadly similar.
And yet another possible approach is to shift the interactions with these applications toward APIs and in favor of brand new, more portable user interfaces, usually Web and mobile interfaces. There are myriad ways to do that. If you still need terminal (3270)-driven automation -- maybe because the application source code is lost or it's otherwise really hard to create useful APIs for it? -- there are a variety of ways to shift that automation into the backend. For example, CICS Transaction Server for z/OS comes with multiple terminal automation technologies as standard included features. Look for references to "3270 bridge" and "FEPI" in IBM's Knowledge Center for CICS to explore that range of choices.

Edit Windows 10 start menu programmatically

To begin with, I understand that Microsoft offers no way to programmatically alter the (modern) Start menu - on purpose.
Nevertheless, I'm looking for a way to do it anyway. I might use it to build a tool that syncs the Start menu between devices, or that automatically places often-used items into thematically sorted groups (office, games, tools). The reason is that I have multiple devices and really suck at manually managing the Start menu - so I mostly just use search or the alphabetical list.
So, does anybody know how to programmatically add, remove, or edit tiles? I could imagine solutions such as:
Using undocumented APIs (can you still call it an API if it is not documented?)
Directly editing the tile database (e.g. TileDataLayer) - the downside is that it appears to be an undocumented binary format, and you'd have to restart the shell for changes to take effect.
Hooking DLLs or poking around in memory - yikes - but no worse than what other "desktop modding" tools like WindowBlinds do.
Using accessibility APIs, or faking mouse/keyboard input - this would most probably work, but it would be a bit spooky to watch the cursor move around, and it seems even more fragile than the other options.
I searched a bit and think there is probably no solution available right now, but you can see this as a challenge to come up with one :-)
As you say, there isn't a way to do this.
As an alternative, did you know that you can easily find apps to launch by pressing the Windows key and then typing the name of the app? This is how I launch anything that isn't pinned to my taskbar. The device I'm on, the order of items in a list, and what's pinned where all become irrelevant when working this way.

How to control mouse movement in Visual C++?

I am doing a project in which a regular monitor can be converted into a touchscreen.
For this purpose I have designed a grid of IR sensors and installed them in a frame that can be placed around the screen. That covers the hardware.
What I want to do is control the mouse using the grid, so that when the user moves a finger inside the frame, the mouse moves on the screen, giving the effect of a touchscreen. I hope I have explained the problem clearly. I am using Windows and MS Visual C++.
If there is a suggestion involving something other than Visual C++, please let me know.
Thank you.
You can use the SetCursorPos function.
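The call itself is the Win32 SetCursorPos(x, y) function; a minimal sketch of driving it from Python via ctypes (in Visual C++ the call is identical, minus the ctypes plumbing):

    # Sketch: move the Windows cursor to absolute screen coordinates.
    import ctypes

    def move_cursor(x: int, y: int) -> None:
        # SetCursorPos expects screen coordinates in pixels.
        if not ctypes.windll.user32.SetCursorPos(x, y):
            raise ctypes.WinError()

    move_cursor(100, 200)  # jump the cursor to (100, 200)

In the touchscreen scenario, the IR grid readings would be scaled to the screen resolution before each call, and SendInput is usually preferred over SetCursorPos when applications need to receive genuine mouse-move events.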

Map keyboard for common paste operation

I want to program a key on my keyboard to paste certain static text when pressed.
For instance, I'd like to program the F12 key so, when pressed, it pastes my email address every time. Is there an easy way to do this?
There are some excellent free tools that let you remap keys and create macros. On Windows, you can use AutoHotKey; I've used it before and it can be quite handy for this sort of thing.
It's a very powerful tool and can be a little intimidating at first, but it's worth taking the time to get to know it. The Quick Start guide is quite helpful.
If you instead mean the ability to do this within your own program, using C++ or .NET or something similar, please give some more details.
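If you would rather do this from Python than AutoHotKey, the third-party keyboard package offers a similar hotkey mechanism. A rough sketch (the address is a placeholder, and this types the text as keystrokes rather than going through the clipboard, so the clipboard contents are left alone):

    # Sketch: make F12 type a fixed piece of text into the focused window.
    # Requires the third-party "keyboard" package (pip install keyboard).
    import keyboard

    EMAIL = "me@example.com"  # placeholder address

    # suppress=True swallows the original F12 key press so only the text appears.
    keyboard.add_hotkey("f12", lambda: keyboard.write(EMAIL), suppress=True)

    keyboard.wait()  # keep the script running so the hotkey stays active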
