How do I get the bounding boxes of individual letters from an Azure OCR result?

I am using Azure OCR to analyze some images. For my purposes I need to know the location of each letter. I have managed to get the bounding boxes around each line, so that part works perfectly, but now I need to 'zoom in' further and get a bounding box around each letter. The lowest level I see in the API is the bounding box property of each Word. Does anyone know how to do this?
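For reference, here is a minimal sketch of what I can already get at word level, assuming the usual regions → lines → words shape of the OCR JSON with boundingBox given as a "left,top,width,height" string (the type names are just for illustration):

```typescript
// Walk the OCR result down to word level and parse each word's bounding box.
// Assumes boundingBox is a comma-separated "left,top,width,height" string.
interface OcrWord { boundingBox: string; text: string; }
interface OcrLine { boundingBox: string; words: OcrWord[]; }
interface OcrRegion { boundingBox: string; lines: OcrLine[]; }
interface OcrResult { regions: OcrRegion[]; }

function wordBoxes(result: OcrResult) {
  return result.regions.flatMap(region =>
    region.lines.flatMap(line =>
      line.words.map(word => {
        const [left, top, width, height] = word.boundingBox.split(",").map(Number);
        return { text: word.text, left, top, width, height };
      })
    )
  );
}
```

There is nothing below words in that response, which is why I'm stuck at the word level.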

Related

Replace real world text using ARKit and Vision(?)

I'm trying to build a translator app that can replace foreign text in real time, but after exploring possible approaches I've got a bit cornered.
Even though I was able to extract word images using Vision, I couldn't replace them in place in the ARKit scene. Then I tried using ARReferenceImage and image tracking, but that needs to know the physical width of the target image, which I cannot guarantee, as the text could be on any surface from a book to a billboard.
Am I missing something? What would you guys suggest?

How do I get the color of the text?

I've been using the Microsoft OCR API and I'm getting the text from the images, but I would like to know if the text is in a specific color or has a specific background color.
For example, given the following image, I would like to know if there is text in red:
[example image]
I thought that this line:
string requestParameters = "language=unk&detectOrientation=true";
would let me specify the parameters I'd like to receive from the image, so that I could ask for the color of a line of words. So I added a visual feature like this:
string requestParameters = "visualFeatures=Color,language=unk&detectOrientation=true";
But this did not solve the problem.
Also: Can I mix the uriBase link from the image analysis and the one from the OCR?
There is currently no way to retrieve the color information and OCR results in a single call.
You could try using the bounding boxes returned from OCR to crop the original image, and then send the crop to the analyze endpoint with visualFeatures=Color to get the color information for the detected text.
According to the documentation, the possible request parameters of this API are:
language, detectOrientation
and the returned metadata has these entities:
orientation, language, regions, lines, words, boundingBox, text
It should be possible to combine the OCR algorithm with another one of the Computer Vision algorithms to detect the dominant colors in the text regions that OCR identified.
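For example, a rough sketch of that combination in the browser, assuming the v3.2 analyze REST endpoint and an OCR boundingBox in "left,top,width,height" form; the resource URL, key, and image element are placeholders:

```typescript
// Crop the region OCR reported for a line/word, then ask the analyze endpoint
// for its color information. Endpoint version, URL and key are placeholders.
async function colorOfRegion(img: HTMLImageElement, boundingBox: string) {
  const [left, top, width, height] = boundingBox.split(",").map(Number);

  // Crop the original image to the OCR bounding box using a canvas.
  const canvas = document.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  canvas.getContext("2d")!.drawImage(img, left, top, width, height, 0, 0, width, height);
  const crop = await new Promise<Blob>(resolve =>
    canvas.toBlob(blob => resolve(blob!), "image/png")
  );

  // Send only the crop, asking for the Color visual feature.
  const response = await fetch(
    "https://<your-resource>.cognitiveservices.azure.com/vision/v3.2/analyze?visualFeatures=Color",
    {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": "<your-key>",
        "Content-Type": "application/octet-stream",
      },
      body: crop,
    }
  );
  const analysis = await response.json();
  return analysis.color; // dominant foreground/background and accent colors for the crop
}
```

Note that the crop is in the same pixel space as the OCR coordinates, so no extra scaling is needed as long as you crop the same image you sent to OCR.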

Image map with links to other tabs

I have an image for a homepage screen. The top part of the image, when clicked, should lead to the second tab, the left-hand side of the image, when clicked, goes to the third tab, and so on.
Basically geotagging an image, so that I can make areas of the image clickable, leading to different tabs.
I tried implementing this using a map chart where I added an image layer and added this image. Some solutions asked me to add a marker layer with x,y coordinates, but I'm unsure how to proceed with my image.
Kindly help with any alternative solution.
It sounds like you want an image map. "Geotagging" is when geographic info like latitude and longitude is added to an image.
Your best bet is to use a text area with a table filled with image-type action controls. If you have Photoshop, you can use a technique called Image Slicing to prepare your images.
FYI, this is probably not a simple task, especially if you don't know much about HTML. You may want to consider a different navigation scheme.
If you update your question with more detail about the end result you are trying to achieve, maybe someone can share a more fitting solution. http://mywiki.wooledge.org/XyProblem

Detect collision between svg path and svg text

Suppose I have an SVG path and a piece of text. I want to figure out where they intersect. I'm not really sure where to start, because the SVG path's getBBox() function does not help.
Where should I start?
You have the text bounding box via getBBox(). Unfortunately, as you may have already discovered, that is not a tight bounding box of the glyphs. It includes the full descender and ascender heights of the font. However it should get you a reasonable approximation.
The next step is to determine where the path hits the bounding box. Getting a perfect mathematical solution is very hard, but there are iterative approaches that are much easier and give good results.
Path elements have a couple of DOM functions that can help: getTotalLength() and getPointAtLength(). You can step along the path from 0 to the path length, calling getPointAtLength(), until the point returned is inside the text bbox.
If you want to be more accurate and determine which character in the text touches the line, there are some DOM functions on SVG text elements that should be useful. For instance, `getExtentOfChar(n)` returns the bounds of the nth character in the text.
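Putting that together, here's a rough sketch of the sampling approach; the #line and #label ids and the step size are made up for the example:

```typescript
// Step along the path and report the first sample point that falls inside
// the text's (loose) bounding box, then narrow it down to a character.
const path = document.querySelector<SVGPathElement>("#line")!;
const text = document.querySelector<SVGTextElement>("#label")!;

const box = text.getBBox();           // includes ascender/descender space
const total = path.getTotalLength();
const step = 2;                       // smaller step = more accurate, slower

let hit: DOMPoint | null = null;
for (let len = 0; len <= total; len += step) {
  const p = path.getPointAtLength(len);
  if (p.x >= box.x && p.x <= box.x + box.width &&
      p.y >= box.y && p.y <= box.y + box.height) {
    hit = p;
    break;
  }
}

if (hit) {
  // Optional refinement: getExtentOfChar(i) gives per-glyph bounds.
  for (let i = 0; i < text.getNumberOfChars(); i++) {
    const g = text.getExtentOfChar(i);
    if (hit.x >= g.x && hit.x <= g.x + g.width &&
        hit.y >= g.y && hit.y <= g.y + g.height) {
      console.log(`path first touches character ${i} at (${hit.x}, ${hit.y})`);
      break;
    }
  }
}
```

The loop variable also tells you roughly how far along the path the intersection happens, which can be handy if you need the crossing point itself.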

Incorrect coordinates retrieved from image using ABBYY OCR SDK

I'm trying to process an image with the ABBYY OCR SDK using the sample code from this question, but I'm not able to get the coordinates right for a specific word, say "OCR", on the screenshot below.
I want to draw an overlay (a yellow rectangle over the word "OCR"), but sometimes the rectangle is placed very far away from the actual word.
The XML you get is synthesised according to this schema.
For each recognized character it will contain an instance of charParams element as shown in the answer you linked to. The element will contain the coordinates in page pixels - the same XML also contains a page element:
<page width="..." height="..." resolution="..." originalCoords="...">
where the image width and height are stored. So l and r for each charParams element are in the range 0..width-1 of the corresponding page, and t and b are in the range 0..height-1 of the corresponding page.
It's also worth mentioning explicitly that all coordinates are in pixels, not in any physical unit, so they are resolution-agnostic. This is why, whenever you try to highlight anything on an image, you have to take zoom into account: the image will likely not always be displayed as-is by your device software but will be downscaled, so you have to map page coordinates onto your zoomed-out image coordinates and highlight accordingly.
Have you checked the DPI of the original image? Also check the documentation to make sure the OCR engine is using the same DPI and not returning coordinates in points or some other measurement system.
It could also be that the rectangle you are drawing in iOS is based not on pixels but on some other measurement system.
You just need to work through the process, testing as you go, and work out where the problem is coming from. It is most likely a uniform scaling issue, with the distance from the actual word proportional to the distance of the word from the top left of the page.
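As a sketch of that scaling step (the displayed-size values are whatever your view actually reports; the names here are illustrative):

```typescript
// Map ABBYY charParams page-pixel coordinates (l, t, r, b) onto a scaled
// preview of the page. pageWidth/pageHeight come from the <page> element.
interface CharBox { l: number; t: number; r: number; b: number; }

function mapToDisplayed(
  box: CharBox,
  pageWidth: number, pageHeight: number,
  displayedWidth: number, displayedHeight: number
) {
  const scaleX = displayedWidth / pageWidth;
  const scaleY = displayedHeight / pageHeight;
  return {
    x: box.l * scaleX,
    y: box.t * scaleY,
    width: (box.r - box.l) * scaleX,
    height: (box.b - box.t) * scaleY,
  };
}

// Example: a box at l=480, t=120, r=640, b=160 on a 2480x3508 page shown in a
// 620x877 preview maps to (120, 30) with size 40x10.
const rect = mapToDisplayed({ l: 480, t: 120, r: 640, b: 160 }, 2480, 3508, 620, 877);
```

If the overlay drifts further the farther the word is from the top left, a missing scale factor like this (or the points-vs-pixels factor on iOS) is the most likely cause.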
