Fastest way to get X/Y position of text or text-based shape in screenshot? - linux

I'm trying to create a script for Linux that will detect where the text cursor is. This should take at most 1 second. The best approach seems to be to programmatically insert some text via xdotool, take a screenshot with some other utility, figure out the position of that text, and then remove the inserted text using xdotool again.
I tried inserting a random string (like <-- CURSOR HERE). Using Tesseract 4, it takes about 20 seconds to find the position of the string, although the result is very precise in terms of pixel coordinates. I was not able to use whitelisting (in version 4 of Tesseract) to narrow the results to specific letters or digits only, which I assume would speed up processing.
I don't know what font the user will be using, but every font has dashes and slashes, so I could create some sort of shape (for instance, |/\|/\|/\|/\|), and use some library to detect that shape. What would be a good choice?
I don't care about what's on the rest of the screen: it could be more text, images, etc. I only need to know where my random string is (<-- CURSOR HERE, |/\|/\|/\|/\|, or anything else you can think of) and get its X/Y position in pixels.
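For reference, the insert / locate / remove workflow described in the question can be sketched roughly as follows. This is only a sketch under assumptions: the function names are hypothetical, removing the marker with repeated BackSpace key presses is an assumed approach (it only works if the marker is typed as plain characters at the caret), and the `locate` callable (screenshot plus detection) is deliberately left to the caller because the question is precisely about how to implement it.

```python
import subprocess

MARKER = "<-- CURSOR HERE"  # marker string taken from the question


def type_marker_cmd(marker=MARKER):
    # xdotool's "type" command sends the string as keystrokes;
    # --delay 0 removes the pause between simulated key presses.
    return ["xdotool", "type", "--delay", "0", marker]


def erase_marker_cmd(marker=MARKER):
    # Assumption: one BackSpace per typed character removes the marker again.
    return ["xdotool", "key"] + ["BackSpace"] * len(marker)


def with_marker(locate, marker=MARKER, run=subprocess.run):
    """Type the marker, call locate() (hypothetical: screenshot + search),
    then erase the marker even if locate() raises."""
    run(type_marker_cmd(marker), check=True)
    try:
        return locate()
    finally:
        run(erase_marker_cmd(marker), check=True)
```

Passing `run` as a parameter makes it possible to dry-run the sequence without touching the keyboard, for example by supplying a function that only records the commands it receives.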

Related

Is there any way to draw large characters using ncursesw?

I want to create a main page for a program I'm trying to write. Basically, I want to print the title at the center in a large font, with the menu and the rest after that. What I'm looking for is any character set in ncursesw, or just anything else, that will help me draw large characters for the title more precisely.
For this (in ncurses), I tried the A_REVERSE attribute and printed the respective places white, something like this:
[image: Large text]
But other letters like z, g and t are hard to draw and they don't look good either.

Partial ligature selection with DirectWrite

Using the HitTestTextPosition-style APIs of IDWriteTextLayout, I did not manage to properly handle text positions inside "ti", "ffi", or other ligatures with fonts like Calibri. It always returns the position after or before the ligature, not inside it like t|i or f|f|i.
What is the recommended way to do caret movement inside ligatures with the DirectWrite API?
There... is no "inside" position if you have GSUB replacements turned on?
OpenType GSUB ligatures are single glyph replacements for codepoint sequences, rather than being "several glyphs, smushed together". They are literally distinct, single glyphs, with single bounding boxes, and a single left and right side bearing for cursor placement/alignment. If you have the text A + E and the font has a ligature replacement that turns it into Æ then with ligatures enabled there really are only two cursor positions in that code sequence: |Æ and Æ|. You can't place the cursor "in the middle", because there is no "middle"; it's a single, atomic, indivisible element.
The same goes for f. ligatures like ff, fi, fl, ffi, ffl, or ſt: these are single glyphs once shaped with GSUB turned on. This is in fact what's supposed to happen: having GSUB ligatures enabled means you expressly want text to be presented—for all intents and purposes—as having atomic glyphs for many-to-one substitutions, like turning the full phrase "صلى الله عليه وعلى آله وسلم‎", as well as variations of that, into the single glyph ﷺ.
If you want to work with the base codepoint sequences (so that if you have a text with f + f + i it doesn't turn that into ﬃ) you will need to load the font with the liga OpenType feature disabled.
The text editors I know of use the simple hack of (1) dividing the width of the glyph cluster by the number of code points within the cluster (excluding any zero-width combining marks), rather than using the GDEF caret positioning information. This includes even Word, which you can tell if you look closely enough. It's not precise, but since it's simple and close enough at ordinary reading sizes, it's what many do.
(2) I've heard that some may (but don't know which) also use the original glyph advances of the unshaped characters (pre-ligation) and scale them proportionally to the ligature cluster width.
(3) Some text editors may use the GDEF table, but I never knew of any for sure (possibly Adobe InDesign?).
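As a rough illustration of methods 1 and 2, here is a minimal sketch that computes caret x-offsets inside a single ligature cluster. The function names are hypothetical, the cluster width and the unshaped advances are assumed to have been obtained from the layout or font elsewhere, and `unicodedata.combining` is used only as an approximation for detecting zero-width combining marks.

```python
import unicodedata


def caret_offsets_even(cluster_text, cluster_width):
    """Method 1: split the cluster width evenly among the code points,
    ignoring combining marks (approximated via a nonzero combining class)."""
    n = sum(1 for ch in cluster_text if unicodedata.combining(ch) == 0)
    if n == 0:
        return [0.0, float(cluster_width)]
    step = cluster_width / n
    return [i * step for i in range(n + 1)]


def caret_offsets_scaled(unshaped_advances, cluster_width):
    """Method 2: scale the pre-ligation advances proportionally so that
    they sum to the width of the shaped ligature cluster."""
    total = sum(unshaped_advances)
    if total == 0:
        return [0.0, float(cluster_width)]
    offsets = [0.0]
    x = 0.0
    for adv in unshaped_advances:
        x += adv * cluster_width / total
        offsets.append(x)
    return offsets


# Three code points in a 12-unit-wide ligature: carets at 0, 4, 8, 12.
print(caret_offsets_even("ffi", 12.0))
# Unshaped advances 6, 6, 4 squeezed into an 8-unit-wide ligature.
print(caret_offsets_scaled([6, 6, 4], 8.0))
```

Each returned list holds the candidate caret positions relative to the left edge of the cluster, including both outer edges.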
The most challenging aspect of using methods 2 or 3 with IDWriteTextLayout is that accessing the corresponding IDWriteFontFace in that run requires quite the indirection because the specific IDWriteFontFace used (after resolving font family name+WWS+variable font axes) is stored in the layout but not publicly accessible via any "getter" API. The only way you can extract them is by "drawing" the glyph runs via IDWriteTextLayout::Draw into a user-defined IDWriteTextRenderer interface to record all the DWRITE_GLYPH_RUN::fontFace's. Then you could call IDWriteFontFace::GetDesignGlyphAdvances on the code points or IDWriteFontFace::TryGetFontTable to read the OpenType GDEF table (which is complex to read). It's a lot of work, and that's because...
The official PadWrite example has the same issue
IDWriteTextLayout was designed for displaying text rather than editing it. It has some functionality for hit-testing which is useful if you want to display an underlined link in a paragraph and test for it being clicked (in which case the ligature would be whole anyway within a word), or if you want to draw some decorations around some text, but it wasn't really intended for the full editing experience, which includes caret navigation. It was always intended that actual text editing engines (e.g. those used in Word, PowerPoint, OpenOffice, ...) would call the lower-level APIs, which they do.
The PadWrite sample I wrote is a little misleading because although it supports basic editing, that was just so you can play around with the formatting and see how things worked. It had a long way to go before it could really be an interactive editor. For one (the big one), it completely recreated the IDWriteTextLayout each edit, which is why the sample only presented a few paragraphs of text, because a full editor with several pages of text would want to incrementally update the text. I don't work on that team anymore, but I've thought of creating a DWrite helper library on GitHub to fill in some hindsight gaps, and if I ever did, I'd probably just ... use method 1 :b.

How can I draw to an XY position in Emacs?

I wanted to allow the Emacs cursor to move around freely outside of actual text (similar to virtualedit=all in Vim).
"Oh," I thought, "I'll just keep track of a virtual cursor and draw it to the screen myself."
But it turns out the actual native C drawing routines (such as draw_glyphs) seem to refer back to the buffer contents to decide what to draw (I could be wrong though).
My next idea was to make a giant overlay of all spaces so I'd have complete freedom where to put stuff. But an overlay only goes over ranges of actual text, so again, this does not seem to give me what I'm looking for.
Is this a reasonable goal without hacking the C code?
I believe the writeable area of a window is intrinsically limited to the buffer with which it is associated, i.e. you have to draw in an area where buffer content exists.
(One example of this limitation is the impossibility of drawing a vertical guide line in the 80th column to help the user identify long lines; currently the best possible implementation of such a feature is to highlight the "overflow" of each too-long line.)
You can do the same as what artist-mode does without adding spaces to the buffer:
when trying to place the cursor after the end of the line, just use an overlay with an after-string property which adds the spaces in the display without modifying the buffer.
Have a look at "artist-mode" (M-x artist-mode RET) - it allows you to draw in Emacs.
From the function documentation: "Artist lets you draw lines, squares, rectangles and poly-lines, ellipses and circles with your mouse and/or keyboard."
You can look at popup.el from the auto-complete package, which can pop up tooltips and menus and such at any position, including positions outside the contents of the buffer. Maybe that will show you how you can do it.

LaTeX \includegraphics and textline

Ok, I am beat. I tried a few things but I am unable to make this happen. I need some help now.
I want to be able to have some text and a picture side by side (only one line, thus no need for wrapping or other fun; the picture is small enough to fit in a text line):
This is a text <temp.jpg placed center to the textline>
Problem is, when I use
This is a text \includegraphics{temp.jpg}
the picture's baseline is aligned with the text baseline. I want the picture's (vertical) center to be aligned with the text baseline. How can I make this possible?
This is a text $\vcenter{\hbox{\includegraphics{temp.jpg}}}$
It sounds like you want \raisebox (see the raisebox section of the LaTeX wikibook), with a negative argument. Use dimensions ex (the notional height of an 'x' in the current font) or \baselineskip (the size between text baselines) as your units.
If you want to do more complicated things, such as move the graphics box down by half its height, you can, but it gets fiddly. If the graphic size isn't unpredictable, you're probably better off tuning this by hand anyway.
In my opinion, the simplest answer is \raisebox{-0.5\totalheight}{<your graphic here>}:
This is a text \raisebox{-0.5\totalheight}{\includegraphics{temp.jpg}}
Explanation:
\raisebox vertically shifts the whole text/picture given as its second argument. The first argument is the vertical shift as a length. The command provides the length \totalheight, which is, as the name suggests, the total height of the text/picture that you want to raise. The factor -0.5 lowers it by exactly half of that length (as the question demands). For aesthetic adjustments, just modify the factor's value.
By the way, with this method there is no need to get into math mode as in @AlexeiMalistov's answer, and no need for the double command \vcenter + \hbox.
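The arithmetic behind the -0.5\totalheight shift can be written out as a short worked example. The symbols h (height above the baseline) and d (depth below it) are introduced here only for illustration; for a plain, unrotated \includegraphics box the depth is typically zero.

```latex
\begin{align*}
\text{total height}       &= h + d \\
\text{top after shift}    &= h - \tfrac{1}{2}(h + d) = \tfrac{1}{2}(h - d) \\
\text{bottom after shift} &= -d - \tfrac{1}{2}(h + d) = -\tfrac{1}{2}(h + 3d)
\end{align*}
```

With d = 0 the box extends from -h/2 to +h/2 around the baseline, so the picture's vertical center sits on the text baseline.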

How to display conjuncted letters [Bengali Language] using LWUIT in mobile?

I have been trying to develop a simple J2ME application using LWUIT in Bengali language. However, because of heavy usage of vowels as conjuncted letters in Bengali language, I am facing some problems with LWUIT.
For example let us say, “X” is a consonant letter and “#” works as a vowel in Bengali; now they are combined together when needed becoming a conjuncted format “X#”.
Using LWUIT, when I add such vowels and try to display them as the conjuncted format with a consonant in a real application, they are combined with their previous letter (which is in a consecutive order) as defined in the charset. Although interestingly, in the LWUIT designer display/preview, the characters appear correctly.
For details, kindly download this document here (http://dibbaa.com/lwuit/doc/lwuit.doc) and see the real-life examples.
I would appreciate it if anybody could help me out with this. Just let me know how I can set up the LWUIT framework so that it doesn’t combine the letters as they are defined in the charset by consecutive order while painting them.
I have used LWUIT version 1.3 and font “KarnaphuliP.ttf” for my application.
Thanks
I know nothing about Bengali, and I cannot find the font you mentioned. But I managed to get an alternative font for recreating the problem: "Bengali-Progoty.TTF" (which, unfortunately, is not a bit similar to yours). You can get the font here: Bengali-Progoty.TTF.
Those vowels are special in that their width is zero, and I bet their origin point is the top-right point instead of the top-left. This way, vowels can be drawn on top of the characters preceding them.
When the LWUIT designer generates a bitmap font, it draws every character (by which I mean Unicode character) onto a big bitmap, calculates the width of the current character, adds that width to the current offset, and draws the next character. As a vowel has a width of zero, it will be combined into the last non-vowel character preceding it.
To solve this problem, you can either switch to a Unicode font (Bengali has a place in Unicode), or you can stick to the current font and do some customization work on the font generation process.
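The packing behaviour described above can be modelled with a small toy sketch. This is not LWUIT code: the function name, the width table, and the `zero_width_pad` parameter are all hypothetical, and the model only mirrors the description in this answer (advance the offset by each character's width, so a zero-width vowel ends up inside the previous character's cell). The optional padding corresponds to the idea of drawing a vowel with a preceding space, mentioned in the customization steps of this answer.

```python
def pack_glyphs(chars, widths, zero_width_pad=0):
    """Toy model of packing glyphs left to right onto one bitmap row.

    Returns a list of (text, x_start, x_end) cells. With zero_width_pad == 0,
    a zero-width character merges into the preceding cell; with a positive
    pad, it gets a cell of its own of that width.
    """
    x = 0
    cells = []
    for ch in chars:
        w = widths[ch]
        if w == 0 and zero_width_pad > 0:
            w = zero_width_pad  # give the vowel its own space
        if w == 0 and cells:
            # Zero advance: the ink is drawn over the previous character,
            # so both end up in the same cell of the bitmap.
            text, start, end = cells[-1]
            cells[-1] = (text + ch, start, end)
        else:
            cells.append((ch, x, x + w))
            x += w
    return cells


widths = {"X": 10, "#": 0, "Y": 8}  # "#" stands for a zero-width vowel
print(pack_glyphs("X#Y", widths))                    # vowel merged into "X"
print(pack_glyphs("X#Y", widths, zero_width_pad=6))  # vowel in its own cell
```

Here "X" and "#" follow the placeholder notation used in the question, where "X" is a consonant and "#" is a vowel sign.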
1 Create your own class overriding the EditorFont class in lwuit's editor.jar.
2 Override EditorFont#getBitmapFont() method, do your own drawing of every character. You can test if any character is a vowel, and if so, draw it with a preceding space.
3 Override the FontTask Ant task provided in lwuit's editor.jar.
4 Override the FontTask#addToResources() method, insert your own EditorFont instance instead of the original one.
5 Override the LWUITTask class, add an AddXXX method to support your overridden FontTask.
6 Build a resource using ant, and use your own version of LWUITTask and FontTask instead of the original version.
7 As vowels have become regular characters, they will take up the same space as other characters and cannot be drawn on top of other characters any more. You have to draw them on top of other characters manually. The com.sun.lwuit.CustomFont class may have to be overridden in order to draw these vowels correctly.
Given the complexity introduced, I highly recommend switching to a Unicode font. But as I have said, I know nothing about Bengali and cannot tell if it is adequate to use a Unicode font. Maybe you have to do it the hard way after all.
Good luck.
I think it may be better for you to implement your own virtual keyboard. There is a sample in the LWUIT developer guide and demo. I am sure that if you invest time in that, it will come in handy later on.
