Word Bounding Boxes of Azure OCR results are shifted to the left? - azure

I am using the Azure OCR form recognizer to perform OCR. When I draw the line bounding boxes, it works great, but when I use the word bounding boxes, they are slightly shifted to the left.
For example, the line bounding boxes (ignore the red box) would look like this: [image: lineocr]
But when I draw the word bounding boxes from the same OCR results, the result is shifted as follows: [image: wordocr]
Would anyone happen to know a solution for this problem, or maybe a nice workaround?
I have tried shifting the box by a certain percentage of the width of the bounding box but I would prefer to get the correct bounding box. The line bounding boxes have correct edges and I would expect the words to have them as well.

Related

Generating bounding boxes if I have the text labels and co-ordinates of bounding boxes

I am trying to implement the Convolutional character network or CharNet model
https://github.com/MalongTech/research-charnet
I want to draw the bounding boxes on the images, but in the results I have only the coordinates and the character labels,
like the example above. How can I generate the bounding boxes? Please help me out here.
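A minimal sketch of one way to turn such results into drawable rectangles, assuming each detection is a flat list of four corner points (x1, y1, ..., x4, y4) plus a label. That layout is an assumption about the output, not something confirmed by the CharNet repository, and `to_rect` and the sample values are hypothetical.

```python
def to_rect(corners):
    """Return (left, top, right, bottom) enclosing a flat [x1, y1, ..., x4, y4] list."""
    xs = corners[0::2]  # every even position is an x coordinate
    ys = corners[1::2]  # every odd position is a y coordinate
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical detections: (eight corner coordinates, character label).
detections = [
    ([10, 20, 30, 22, 29, 40, 9, 38], "A"),
    ([35, 21, 52, 21, 52, 41, 35, 41], "B"),
]

boxes = [(label, to_rect(corners)) for corners, label in detections]
for label, rect in boxes:
    print(label, rect)
```

The resulting (left, top, right, bottom) tuples can then be handed to whatever drawing routine is in use; Pillow's `ImageDraw.rectangle`, for instance, accepts a sequence in that order.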
Thanks in advance

Bounding box for multiple SVG paths

I have some SVG paths which represent equations. There will be multiple on the canvas. I would like to pass all of these paths and get back grouped paths where each group represents a single equation.
Assumption: there is no overlap between the equations' bounding boxes.
After processing, I should be able to apply a bounding box to each group to check the result. The end result should be:
I already know how to apply a bounding box to a set of points. I am specifically struggling with how to determine which set of points or paths should go into a single group.
For example, I would not want it to give me a bounding box for the "y", "=", "m", etc. separately; that would be of little use.
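One possible approach, sketched under assumptions: compute a bounding box per path first, then merge boxes that lie within some distance of each other into one group. The `gap` threshold is hypothetical and would need to be smaller than the spacing between equations but larger than the spacing between symbols inside one equation. The function names are illustrative only.

```python
def boxes_close(a, b, gap):
    """True if boxes a and b, each (left, top, right, bottom), are within `gap` on both axes."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])

def group_boxes(boxes, gap):
    """Group box indices that are transitively linked by proximity (union-find)."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_close(boxes[i], boxes[j], gap):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def merged_box(boxes, indices):
    """Bounding box that encloses every box in one group."""
    return (min(boxes[i][0] for i in indices), min(boxes[i][1] for i in indices),
            max(boxes[i][2] for i in indices), max(boxes[i][3] for i in indices))

# Hypothetical per-path boxes: "y", "=", "m" on one line, two symbols far below.
path_boxes = [(0, 0, 10, 10), (14, 2, 24, 8), (28, 0, 40, 10),
              (0, 100, 10, 110), (15, 100, 25, 110)]
groups = group_boxes(path_boxes, gap=5)
print(groups)                                       # [[0, 1, 2], [3, 4]]
print([merged_box(path_boxes, g) for g in groups])  # [(0, 0, 40, 10), (0, 100, 25, 110)]
```

The pairwise comparison is quadratic in the number of paths, which is usually acceptable for a canvas holding a handful of equations.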

Grouping bounding boxes and Separating them - PYTHON

I have this output image of an Arabic text. I want to group the small bounding boxes into the bigger ones, and then separate the overlapping boxes.
I have no clue how to start.
This is an example of what I want to do
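A rough sketch of the first step only (attaching small boxes to bigger ones), assuming boxes are (left, top, right, bottom) tuples and that a small box belongs to the large box containing its centre. The `min_area` cut-off between "small" and "large" is a hypothetical parameter; if the small marks sit outside the large boxes, a nearest-box rule would be needed instead.

```python
def area(box):
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)

def contains_center(outer, inner):
    """True if the centre point of `inner` lies inside `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]

def attach_small_boxes(boxes, min_area):
    """Map each large box index to the small box indices whose centre it contains."""
    large = [i for i, b in enumerate(boxes) if area(b) >= min_area]
    small = [i for i, b in enumerate(boxes) if area(b) < min_area]
    groups = {i: [] for i in large}
    orphans = []  # small boxes that no large box claims
    for s in small:
        owner = next((l for l in large if contains_center(boxes[l], boxes[s])), None)
        if owner is None:
            orphans.append(s)
        else:
            groups[owner].append(s)
    return groups, orphans

# Hypothetical boxes: one word-sized box, one mark inside it, one stray mark.
sample = [(0, 0, 100, 50), (10, 10, 15, 15), (200, 200, 204, 204)]
print(attach_small_boxes(sample, min_area=100))  # ({0: [1]}, [2])
```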

Getting the coordinates of the "text" bounding box of a grayscaled picture by using command line in linux

Just what the title says.
Strictly speaking, what I define as the "text" bounding box of a grayscale image is a set of 4 coordinates (x, y, x+width, y+height) defining the rectangle that contains the maximum number of non-white pixels and, at the same time, the fewest possible white pixels (without changing the maximum number of non-white pixels). I put "text" in quotation marks because images do not actually contain text; they only contain coloured pixels.
Having installed ImageMagick on Ubuntu, I type the following command in the terminal: $ convert input.png -trim output.png, and I get:
Open the two images in new tabs in your web browser and you will see the difference between them, and also what I define as the "text" bounding box.
output.png actually has the width and height that I am looking for. I do not know how to get the x and y coordinates.
The answer provided here (1) for PDF pages does not meet my criteria, since the "text" bounding box that gs gives me has big white margins (and, as far as I can understand, what gs defines as the "text" bounding box of a PDF is something different from my definition of the "text" bounding box of a picture).
I don't understand all the words in your description, and I think a diagram would help, but if you just want to know what -trim would do as your sample code implies:
identify -format "%@" image.png
200x100+10+20
So, for your image, you get
identify -format "%@" paper.png
406x620+38+68
which means that your box is 38 pixels to the right of the top left corner and 68 pixels down from the top left corner, and it is 406 pixels wide and 620 pixels tall.
And if I draw in that rectangle in red, I get:
convert paper.png -stroke red -fill none -draw "rectangle 38,68 444,688" result.png
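The corner coordinates in the rectangle above come from adding the width and height to the offsets. A small sketch of that arithmetic, assuming the geometry string always has the form WxH+X+Y with non-negative offsets (`parse_geometry` is a hypothetical helper, not part of ImageMagick):

```python
import re

def parse_geometry(geometry):
    """Parse a 'WxH+X+Y' string into (x, y, width, height)."""
    match = re.fullmatch(r"(\d+)x(\d+)\+(\d+)\+(\d+)", geometry.strip())
    if match is None:
        raise ValueError(f"unexpected geometry string: {geometry!r}")
    width, height, x, y = map(int, match.groups())
    return x, y, width, height

x, y, width, height = parse_geometry("406x620+38+68")
print(x, y, x + width, y + height)  # 38 68 444 688
```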
An alternative way of getting the same result but using convert in place of identify is:
convert -format "%@" paper.png info:
406x620+38+68
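For intuition about what such a box represents, here is a sketch of the computation on a plain 2D list of grayscale values. It assumes the background is exactly white (255); the real -trim works from the image's edge colour and can tolerate near matches via -fuzz, so this is an illustration rather than a reimplementation. `trim_box` is a hypothetical name.

```python
def trim_box(pixels, white=255):
    """Return (x, y, width, height) of the tightest box around non-white pixels, or None."""
    # Rows and columns that contain at least one non-white pixel.
    rows = [r for r, row in enumerate(pixels) if any(v != white for v in row)]
    if not rows:
        return None  # the image is entirely white
    cols = [c for c in range(len(pixels[0])) if any(row[c] != white for row in pixels)]
    x, y = cols[0], rows[0]
    return x, y, cols[-1] - x + 1, rows[-1] - y + 1

image = [
    [255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255,  10, 200, 255],
    [255, 255, 255, 255, 255],
]
print(trim_box(image))  # (2, 1, 2, 2)
```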
Images don't have a 'text bounding box', because obviously there is no text.
The images in the PDF file may themselves contain white pixels; if they are scanned from books, they almost certainly will. These pixels count towards the bounding box of the image, because they are white, not transparent, and will obscure anything drawn beneath them.
It's also rather nonsensical to define a 'text bounding box' as 'an area in that picture that has no white margins and only text'. If it's in an image then there is no text, only image samples which define pixels. That's a picture of text, not actually text. To differentiate between areas of an image containing text and areas containing non-text you will need OCR software; nothing else is going to do this, because only OCR software is capable of detecting the difference between text and non-text.

itext multiline text on diagonal

How can I add multi-line text on a diagonal with iText? That is, if the text is too long for the first diagonal (the longest one), it should continue on the next diagonal below or above, and so on, so that all the text is visible.
I have already calculated the text angle for the diagonal and use PdfContentByte to stamp the text, but if my text is longer than the diagonal, the words that don't fit are not shown. I think I have to work something out mathematically. I also saw something using setSimpleColumn and chunks, but that shows my text aligned horizontally.
Does anyone have any ideas? Thanks. Code examples would of course be appreciated.
Don't shoot me if I'm wrong, but based on your description I think you're talking about 'irregular columns'. See http://itextpdf.com/examples/iia.php?id=67
This type of column isn't a rectangle. Basically, you define coordinates for the left border (can be a diagonal line) and coordinates for the right border. Then you pour text between these two lines.
If that's not what you meant, maybe you want to write text diagonally. In that case, you can still use ColumnText, but you need to change the coordinate system, so that the text isn't written in horizontal lines from left to right, but in diagonal lines from top to bottom (or bottom to top). Changing the coordinate system is done with the concatCTM method.
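To make the coordinate-system change concrete, here is a sketch of the arithmetic behind such a transformation, written in Python rather than Java. It assumes the six values are given in the usual PDF matrix order a, b, c, d, e, f, and that the page is 595 x 842 points (an assumed A4 size); `rotation_ctm` and `apply_ctm` are hypothetical helpers, not iText methods.

```python
import math

def rotation_ctm(angle_degrees, origin_x=0.0, origin_y=0.0):
    """Return (a, b, c, d, e, f): axes rotated by the angle, origin moved to (origin_x, origin_y)."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return cos_t, sin_t, -sin_t, cos_t, origin_x, origin_y

def apply_ctm(ctm, x, y):
    """Map a point from the rotated coordinate system back to page coordinates."""
    a, b, c, d, e, f = ctm
    return a * x + c * y + e, b * x + d * y + f

# Angle of the bottom-left to top-right diagonal of the assumed page size.
page_width, page_height = 595.0, 842.0
diagonal_angle = math.degrees(math.atan2(page_height, page_width))
diagonal_length = math.hypot(page_width, page_height)

ctm = rotation_ctm(diagonal_angle)
# A point at the far end of the rotated x axis lands on the top-right page corner.
print(apply_ctm(ctm, diagonal_length, 0.0))
```

In the rotated system, each line of text runs along the x axis, so successive lines could be placed by stepping y up or down by the leading, with a shorter available width the further a line sits from the main diagonal.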
