Azure OCR unable to detect Roman numerals "I" and "II"

I have this image:
I am using the Azure Computer Vision API v2.0, a combination of the Recognize Text API (POST) and the Get Recognize Text Operation Result API (GET), as described in https://westcentralus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/587f2c6a154055056008f200, to detect the text characters in the image.
Currently it is able to detect all the characters except the Roman numerals I and II.
Can someone help?
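For reference, the two-step flow (POST the image, then poll the GET operation endpoint until the status is terminal) can be sketched roughly as below. The region in the endpoint, the subscription key, and the image URL are placeholders; substitute your own.

```python
import json
import time
import urllib.request

# Placeholders -- substitute your own region and key.
ENDPOINT = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0"
SUBSCRIPTION_KEY = "<your-key>"

def operation_id(operation_location):
    """Extract the operation ID from the Operation-Location header URL."""
    return operation_location.rstrip("/").rsplit("/", 1)[-1]

def recognize_text(image_url):
    """POST the image URL, then poll the GET endpoint until a result is ready."""
    body = json.dumps({"url": image_url}).encode()
    req = urllib.request.Request(
        ENDPOINT + "/recognizeText?mode=Printed",
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # The POST returns no body; the URL to poll is in this header.
        location = resp.headers["Operation-Location"]

    while True:
        poll = urllib.request.Request(
            location, headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY}
        )
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result["status"] in ("Succeeded", "Failed"):
            return result
        time.sleep(1)  # still Running / NotStarted
```

Once the flow itself is confirmed working, misreads of individual glyphs like "I" are a recognition-quality issue rather than an API-usage one.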

Related

A way to translate images?

In Python I am attempting to translate Arabic characters within an image. I can provide the 'source' language (Arabic) and the 'destination' language (English). Is there a free Python library or API that I can use for this? I.e. one that provides a service like https://translate.google.com and allows for cloud image translation: uploading images containing untranslated characters and downloading images with the translated characters drawn back into the image.

Alternatively, is there a library to do this locally on my system, i.e. detect the Arabic characters in an image containing Arabic text, extract them, pass them to a cloud translation service (e.g. Google Translate), and then modify the image by replacing the Arabic characters with the newly translated English characters?

So, my goal is to replace the Arabic characters within an image with the English characters that are the translation of the original, extracted Arabic characters. I know Yandex (https://translate.yandex.com/ocr) allows for this, but you must pay for their translation API. How could I do this?
While I'm not sure whether Arabic is supported, there are libraries like OpenCV for Python and pytesseract to extract text from an image. Then you can use another library like translate to finish the process from there: https://pypi.org/project/translate/
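A minimal sketch of the extract-then-translate half of that pipeline, assuming the pytesseract, Pillow, and translate packages plus Tesseract's Arabic (`ara`) language data are installed. Drawing the translated text back into the image (erasing the original region and rendering new text) would be a separate OpenCV/Pillow step not shown here.

```python
def extract_text(image_path, lang="ara"):
    """OCR the image with Tesseract (requires the `ara` traineddata)."""
    from PIL import Image   # pip install pillow
    import pytesseract      # pip install pytesseract
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)

def clean_lines(text):
    """Drop the empty and whitespace-only lines OCR output often contains."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def translate_lines(text, src="ar", dest="en"):
    """Translate each non-empty OCR line with the `translate` package."""
    from translate import Translator  # pip install translate
    translator = Translator(from_lang=src, to_lang=dest)
    return [translator.translate(line) for line in clean_lines(text)]
```

Note that the free backends used by the translate package are rate-limited, so batching lines (rather than translating word by word) is usually the better design.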

How to do isolated word recognition using pocketsphinx

I tried to follow this link for isolated word recognition
Speech to text for single word
But when I provide -keyphrase, it returns the result (the keyphrase word) even when I give a wrong keyphrase.
Is this expected?
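This is the usual behaviour when no detection threshold is set: keyword spotting then matches the keyphrase against almost any audio. A sketch of thresholded keyword spotting, using the API names from the older pocketsphinx-python wrapper (assumed here; newer pocketsphinx releases expose a different API). The threshold value `1e-20` is illustrative and needs tuning per phrase.

```python
def kws_file_contents(keyphrases):
    """Build a pocketsphinx keyphrase list: one phrase per line, each with
    a detection threshold. Without a sensible threshold the decoder tends
    to 'detect' the phrase in almost any utterance."""
    return "\n".join(f"{phrase} /{threshold}/"
                     for phrase, threshold in keyphrases) + "\n"

def spot_keywords():
    # Assumes `pip install pocketsphinx` (the old pocketsphinx-python API).
    from pocketsphinx import LiveSpeech
    with open("kws.list", "w") as f:
        f.write(kws_file_contents([("hello computer", "1e-20")]))
    # lm=False disables the language model so only keyword spotting runs.
    for phrase in LiveSpeech(lm=False, kws="kws.list"):
        print("detected:", phrase)
```

Lower thresholds (e.g. `1e-40`) make detection stricter; raise or lower per phrase until false positives stop.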

Pound sign (£) getting wrongly identified by Azure RecognizeText API in Cognitive Services

I have many cases of pictures of texts where one can find a pound sign (£) but the sign is NEVER correctly recognized by Azure Cognitive Services RecognizeText API, as far as I tested. Other symbols, like the dollar sign ($) for example, are identified without problems.
I made tests with screenshots of texts containing £, since these should be easy for the OCR tool to convert, and again the pound sign is not correctly identified (it becomes an f, a 2, a 1, a $, etc.).
I am suspecting that the pound sign is not included in the set of characters that the tool supports, although I couldn't find a specific mention of that in the documentation (only that the tool is experimental and is optimized for English).
Has anyone been able to correctly convert a £ using the tool, or does anyone know FOR SURE (possibly through documentation) that £ is not included in its character set?
Thanks!

Capital letters and punctuation mark in Python

I am working on an application which has to write text on the screen based on input coming from a keyboard implemented in another Android app.
The server on the PC is written in Python, and I am currently using the keyboard package to deal with key events.
Here is the sample code:
import keyboard
keyboard.write('a')
keyboard.write('b')
keyboard.write('1')
keyboard.write('!')
keyboard.write('\"')
Here is the result:
ab112
As you can see, I have issues with punctuation marks that need a SHIFT + key combination.
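One possible workaround (a sketch, not the package's documented fix) is to map the problematic characters to explicit key combinations for your layout and send those with keyboard.send() instead of keyboard.write(). The table below assumes a layout where '!' is SHIFT+1 and '"' is SHIFT+2, which matches the `ab112` output above; adjust it to your own keyboard.

```python
# Hypothetical per-layout table: character -> explicit key combination.
SHIFTED = {"!": "shift+1", '"': "shift+2"}

def combo_for(char):
    """Return the key combination for a character, or the character itself
    when no explicit mapping is needed."""
    return SHIFTED.get(char, char)

def type_text(text):
    import keyboard  # pip install keyboard
    for char in text:
        # send() presses and releases the whole combination at once.
        keyboard.send(combo_for(char))
```

This sidesteps write()'s layout guessing at the cost of maintaining the mapping yourself.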

Connect textboxes in PDF automatically

I have a document in Fraktur font and performed OCR with tesseract (the language is deu-frak). It took me about 10 days (24 h a day) to convert these 23 issues (each about 400 pages).
The result is a searchable PDF with the original image embedded and the invisible text on top:
Now, I've removed the image with Master PDF Editor and changed the text type from "invisible" to "Full text". It turned out that some words weren't recognized by tesseract as words, so each letter is positioned separately:
Notice that "kommen" was recognized as a word, but "fruchtbaren" only as a sequence of characters. This makes it impossible to find "fruchtbaren" with the text search, and when changing the font size the letters overlap or create ugly gaps.
I'm using Linux and am looking for a command-line tool that allows me to script the processing of all 23 PDF documents.
Is it possible to connect text boxes that are within a minimum distance of each other? Even connecting whole lines would be great.
Thanks.
Probably not what you want to hear, but I'd go back and experiment with pre-processing, Tesseract parameters, etc on a small representative sample until you get the initial OCR as good as possible (including word segmentation) and then re-run the OCR with your new settings. If you still find that you need some type of post-processing, I'd, again, build and refine the entire pipeline on a small sample before running your full dataset.
On the surface, it looks like something Tesseract could do a better job at, provided you're giving it clean images with enough scan resolution.
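The re-run could be scripted along these lines, assuming the original page images (here `*.tif`; the directory layout and extension are assumptions) are still available, since Tesseract takes images rather than finished PDFs. Experimenting with `--psm` (page segmentation mode) on a small sample is the knob most likely to change how letters are grouped into words.

```python
import subprocess
from pathlib import Path

def tesseract_cmd(image, out_base, lang="deu-frak", psm=3):
    """Build a Tesseract command line producing a searchable PDF;
    --psm controls page segmentation, which affects word grouping."""
    return ["tesseract", str(image), str(out_base),
            "-l", lang, "--psm", str(psm), "pdf"]

def ocr_directory(src_dir, out_dir, **kwargs):
    """OCR every page image in src_dir into per-page PDFs in out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(src_dir).glob("*.tif")):
        out_base = Path(out_dir) / image.stem  # tesseract appends .pdf
        subprocess.run(tesseract_cmd(image, out_base, **kwargs), check=True)
```

Running this over one representative issue with a couple of `--psm` values, then checking whether "fruchtbaren" becomes searchable, is much cheaper than post-processing all 23 volumes.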
