I scoured the internet but could not find an answer to this really simple question.
When I use Canny edge detection on my image, the text shows up only as outlines traced by the edges. For some reason, Tesseract OCR does not recognize the text in that form, although it does recognize the same text when it is rendered normally rather than outlined.
What do I have to do to get Tesseract to recognize the outlined text? I tried thinning the image first, implementing a thinning algorithm from scratch, but that just seemed to make the Canny edge detection worse.
I googled and even asked ChatGPT, but I'm unable to find a solution and hope to get some guidance here.
First, I have to mention that I'm not a programmer, but rather a beginner.
Below is a short description of what I'm trying to achieve and what I've done so far.
I gather data and create a circular visualization using Circos, which produces SVG and PNG images. Unfortunately, the PNG doesn't give me the option of searching for text and making replacements, although I can use the PNGs to successfully produce an MPEG movie using FFmpeg. Therefore I need to use the SVG output to apply the desired changes.
So I tried to use CairoSVG to render the SVG file to a PNG image, but it does not render emojis by default. As far as I understand, this is because they are not part of the SVG specification and CairoSVG only supports features defined in that specification: the emojis are stored as Unicode characters and are not natively supported in SVG.
Next I tried PIL (Python Imaging Library), as it supports Unicode characters, including emojis, when converting images to and from various formats. Unfortunately, PIL seems to be designed primarily for creating and manipulating raster images and has no built-in support for reading or converting SVG files.
So now my questions are:
Would FFmpeg give me the desired results if I compile it with the --enable-librsvg option, so that it can convert a sequence of SVG images to a video? I'm not sure whether it renders emojis correctly, and I'd like to spare myself the hassle, as I'm pretty sure I'd struggle to compile it on my Mac running Ventura.
Are there maybe other ways to solve this problem?
Many thanks in advance for your help or any hints :-)
Have a nice weekend, all, and take care.
NB: an example of the circular visualization can be found here: animated graph, and the static version: annotated graph.
Problem solved: I used the html2image Python module, which converts the SVG (including the embedded emojis) nicely to a PNG image, and then used those images to create an MP4 video using FFmpeg.
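For anyone following the same route, here is a rough sketch of how those two steps might be scripted. The directory names, the frame_%04d.png naming pattern, the image size, the frame rate and the libx264/yuv420p encoder settings are all my assumptions, not details from the post above. html2image also needs a Chrome or Chromium install to do the rendering.

```python
import subprocess
from pathlib import Path


def render_svg_frames(svg_paths, out_dir, size=(1000, 1000)):
    """Render each SVG to a numbered PNG with html2image (drives Chrome/Chromium)."""
    # Imported lazily so the rest of the module works without the package.
    from html2image import Html2Image  # third-party: pip install html2image

    hti = Html2Image(output_path=str(out_dir))
    for index, svg in enumerate(svg_paths):
        # other_file lets html2image screenshot a non-HTML file such as an SVG.
        hti.screenshot(other_file=str(svg),
                       save_as=f"frame_{index:04d}.png",
                       size=size)


def build_ffmpeg_command(frame_dir, output="circos.mp4", fps=2):
    """Build the FFmpeg argument list that joins frame_0000.png, ... into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),                          # input frames per second
        "-i", str(Path(frame_dir) / "frame_%04d.png"),   # numbered image sequence
        "-c:v", "libx264",                               # assumed encoder choice
        "-pix_fmt", "yuv420p",                           # widely playable pixel format
        output,
    ]


def main():
    """Hypothetical end-to-end run: ./svg/*.svg -> ./frames/*.png -> circos.mp4."""
    frames = Path("frames")
    frames.mkdir(exist_ok=True)
    render_svg_frames(sorted(Path("svg").glob("*.svg")), frames)
    subprocess.run(build_ffmpeg_command(frames), check=True)
```

Calling main() would run the whole pipeline. Building the FFmpeg command as a list keeps it easy to inspect or tweak before anything is executed.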
I am trying to extract only the highlighted text from an image using the pytesseract module in Python.
The issue is that I am unable to extract just the highlighted part: the whole image gets converted to text, and I have no idea how to extract a specific part based on the background colour.
The best way to achieve this is to crop the image and send only the part you need to Tesseract; this will also improve performance.
There is a related discussion that may help -> Select part of text that was extracted using the Tesseract OCR
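As a minimal sketch of the cropping idea, the function below finds the bounding box of all pixels close to a highlight colour. It assumes the image has already been loaded as rows of (r, g, b) tuples and that the highlight is roughly yellow. The target colour and the tolerance are guesses that would need tuning for a real image.

```python
def highlight_bbox(pixels, target=(255, 255, 0), tolerance=60):
    """Return (left, top, right, bottom) around pixels close to the target colour.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    right and bottom are exclusive, like the box passed to Pillow's Image.crop.
    Returns None when no pixel is close enough to the target colour.
    """
    xs, ys = [], []
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            if (abs(r - target[0]) <= tolerance
                    and abs(g - target[1]) <= tolerance
                    and abs(b - target[2]) <= tolerance):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```

With Pillow, the resulting box could be passed to Image.crop and the cropped image to pytesseract.image_to_string. Dark text pixels inside the highlight do not match the colour themselves, but they still fall inside the box because the highlight surrounds them.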
I am trying to isolate shadows from this image and remove them:
I am doing this because the shadow is problematic for my edge detection algorithm.
What should I do to remove the shadow? I haven't done this before, so I do not even know where to start.
I wasn't able to find anything in similar questions on SO that helps with my task.
I have the image in both PNG and JPG format, so I am not even sure which format to start with.
That's a very interesting question. One option you can try is to divide the RGB values in the image by the grayscale intensity of the image. There is apparently another method explained here: https://onlinelibrary.wiley.com/doi/full/10.1002/col.21889.
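A rough sketch of the first suggestion, dividing each channel by the pixel's grayscale intensity, could look like this. Taking the grayscale value as the plain mean of R, G and B is my assumption, since the answer does not say which conversion to use. The approach also assumes a shadow darkens all three channels by about the same factor, which is only approximately true in real photos.

```python
def normalize_by_intensity(pixels, eps=1e-6):
    """Divide every RGB channel by the pixel's grayscale intensity.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    eps avoids division by zero for pure black pixels.
    The result keeps the colour ratios and discards overall brightness.
    """
    result = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            gray = (r + g + b) / 3.0 + eps  # assumed grayscale: channel mean
            new_row.append((r / gray, g / gray, b / gray))
        result.append(new_row)
    return result
```

A surface pixel and the same surface in shadow at half the brightness, for example (200, 100, 60) and (100, 50, 30), end up with practically identical values, so the shadow boundary should be much weaker in the normalized image.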
I'm trying to build a translator app that can replace foreign text in real time, but after exploring possible approaches I've got a bit cornered.
Even though I was able to extract word images using Vision, I couldn't replace them in place in the ARKit scene. Then I tried using ARReferenceImage and image tracking, but that needs to know the physical width of the target image, which I cannot guarantee, as the text could be on any surface from a book to a billboard.
Am I missing something? What would you guys suggest?
I've been using the Microsoft OCR API and I'm getting the text from the images, but I would like to know whether the text is in a specific color or has a specific background color.
For example, I have the following image and I would like to know whether it contains text in red
i.e. image
I thought that this line:
string requestParameters = "language=unk&detectOrientation=true";
would let me specify what I'd like to receive from the image, for example the color of a line of words. So I added a visual feature like this:
string requestParameters = "visualFeatures=Color,language=unk&detectOrientation=true";
But this did not solve the problem.
Also: Can I mix the uriBase link from the image analysis and the one from the OCR?
There is currently no way to retrieve the color information and OCR results in a single call.
You could try using the bounding boxes returned from OCR to crop the original image, and then send the crop to the analyze endpoint with visualFeatures=color to get the color information for the detected text.
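To illustrate the cropping step, here is a small sketch that walks an OCR response and turns each word's bounding box into crop coordinates. It assumes the response nests regions, lines and words, and that each boundingBox is a "left,top,width,height" string. Check the actual JSON you receive before relying on that.

```python
def word_boxes(ocr_result):
    """Yield (text, (left, top, right, bottom)) for every word in an OCR response.

    Assumes each boundingBox is a "left,top,width,height" string.
    """
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                left, top, width, height = (
                    int(value) for value in word["boundingBox"].split(",")
                )
                # Convert width/height into right/bottom crop coordinates.
                yield word["text"], (left, top, left + width, top + height)
```

Each box can then be used to crop the original image before sending the crop to the analyze endpoint.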
According to the documentation, the possible request parameters of this API are:
language, detectOrientation
and the returned metadata has these entities:
orientation, language, regions, lines, words, boundingBox, text
It should be possible to combine the OCR algorithm with one of the other computer vision algorithms to detect the dominant colors in the text regions that the OCR identified.
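If a remote call per region is too heavy, a simple local check might be enough for the "is there red text" case. The sketch below is my own simplification, not one of the computer vision algorithms mentioned above. It counts the pixels inside a text box whose red channel clearly dominates, and the thresholds are guesses.

```python
def red_fraction(pixels, box, min_red=150, margin=60):
    """Return the fraction of pixels inside box that look clearly red.

    pixels is a list of rows of (r, g, b) tuples.
    box is (left, top, right, bottom) with right and bottom exclusive.
    A pixel counts as red when r >= min_red and r exceeds both g and b by margin.
    """
    left, top, right, bottom = box
    total = 0
    red = 0
    for row in pixels[top:bottom]:
        for r, g, b in row[left:right]:
            total += 1
            if r >= min_red and r - max(g, b) >= margin:
                red += 1
    return red / total if total else 0.0
```

A text region could then be flagged as red when the fraction is above some small value. Text strokes usually cover only part of the box, so the cut-off should be well below 1.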