I am trying to extract only the highlighted text from an image using the pytesseract module in Python.
The issue is that I am unable to extract just the highlighted part: the whole image is converted to text, and I have no idea how to extract a specific part based on its background colour.
The best way to achieve this is to crop the image and send only the part you need; it will also improve performance.
There is a related discussion that may help: Select part of text that was extracted using the Tesseract OCR
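If the highlight is a distinct colour (for example a yellow marker), one way to find the crop automatically is to build a colour mask in HSV space, take the bounding box of the masked pixels, and run pytesseract only on that region. A minimal sketch, assuming a yellow highlight; the HSV range and the file name `highlighted.png` are assumptions you would tune for your image:

```python
import cv2
import numpy as np
import pytesseract

img = cv2.imread("highlighted.png")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Assumed HSV range for a yellow highlighter; tune for your colour.
lower = np.array([20, 80, 80])
upper = np.array([35, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# Bounding box of all highlighted pixels.
coords = cv2.findNonZero(mask)
x, y, w, h = cv2.boundingRect(coords)

# OCR only the highlighted crop.
crop = img[y:y + h, x:x + w]
print(pytesseract.image_to_string(crop))
```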
I have an image and am trying to separate the background image from the text.
For the text I have used pytesseract, and it gives me all the data. Now my aim is to translate this text and place it back on the image.
For that I need the background image and the position of the text, so I know where to put the translated text back.
I need some help or pointers, as I have been trying to use OpenCV for this but have had no luck yet.
Thanks
-Megha
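One possible pipeline for this, sketched below under some assumptions: use pytesseract's `image_to_data` to get word bounding boxes, remove the original text by inpainting those regions with OpenCV to recover an approximate background, and then draw the translated strings back at the saved positions. The file name and the `translate` placeholder are assumptions; you would plug in your own translation step.

```python
import cv2
import numpy as np
import pytesseract
from pytesseract import Output

img = cv2.imread("page.png")
data = pytesseract.image_to_data(img, output_type=Output.DICT)

# Build a mask covering every detected word, then inpaint it away
# to recover an approximate background image.
mask = np.zeros(img.shape[:2], dtype=np.uint8)
boxes = []
for i, text in enumerate(data["text"]):
    if text.strip():
        x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
        mask[y:y + h, x:x + w] = 255
        boxes.append((text, x, y, w, h))

background = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

# Placeholder translation step -- replace with a real translator.
def translate(word):
    return word

# Draw the translated words back at their original positions.
for text, x, y, w, h in boxes:
    cv2.putText(background, translate(text), (x, y + h),
                cv2.FONT_HERSHEY_SIMPLEX, h / 30.0, (0, 0, 0), 2)

cv2.imwrite("translated.png", background)
```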
I have been working with PyTesseract OCR, converting PDFs to JPEG in order to OCR the images. Part of the image has a black background with white text, which Tesseract is unable to identify, whereas all other parts of my image are read perfectly well. Is there a way to change the part of the image that has a black background? I tried a few SO resources, but they don't seem to help.
I am using Python 3, OpenCV 4 and PyTesseract.
OpenCV has a bitwise-not function which correctly inverts the image.
You can put a mask / freeze on the rest of the image (the part that is already correct) and use something like this:
imageWithMask = cv2.bitwise_not(imageWithMask)
Alternatively, you can perform the operation on a copy of the image and only copy over the parts / pixels / regions you need.
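A minimal sketch of that second variant, assuming you know the rectangle that holds the white-on-black region; the coordinates and file name are assumptions to adjust for your page:

```python
import cv2
import pytesseract

img = cv2.imread("page.jpg")

# Assumed coordinates of the black-background region; adjust for your page.
x, y, w, h = 100, 400, 600, 150

# Invert only that region on a copy, leaving the rest untouched.
fixed = img.copy()
fixed[y:y + h, x:x + w] = cv2.bitwise_not(img[y:y + h, x:x + w])

print(pytesseract.image_to_string(fixed))
```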
I was trying to convert an image PDF to a text PDF with Tesseract OCR. In between, I need to check for a cover page and remove it from the result. Is it possible in Tesseract OCR itself to identify the cover page based on its specific properties (matching the cover page text), or do I have to take the whole output of the Tesseract OCR result and apply my own logic to scan the PDF and remove the cover page? I am quite confused, and any help will be appreciated.
There's no way for Tesseract to do that; you should remove the page beforehand and then hand the PDF image to OCR.
There's a good answer on how to do that at https://stackoverflow.com/a/11541587/9740486
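The linked answer splits PDFs with a Python PDF library; a minimal sketch of dropping the cover with pypdf, assuming the cover is always the first page and assuming the file name `scan.pdf`:

```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("scan.pdf")
writer = PdfWriter()

# Skip page 0 (the assumed cover page), keep the rest.
for page in reader.pages[1:]:
    writer.add_page(page)

with open("scan_no_cover.pdf", "wb") as f:
    writer.write(f)
```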
I'm trying to build a translator app which would be able to replace foreign text in real time, but after exploring possible approaches I got a bit cornered.
Even though I was able to extract word images using Vision, I couldn't replace them in place in the ARKit scene. Then I tried using ARReferenceImage and image tracking, but that needs to know the physical width of the target image, which I cannot guarantee, as the text could be on any surface from a book to a billboard.
Am I missing something? What would you guys suggest?
I've been using the Microsoft OCR API and I'm getting the text from the images, but I would like to know whether the text is in a specific color or has a specific background color.
For example, in the following image I would like to know if there is text in red:
[image]
I thought that this line:
string requestParameters = "language=unk&detectOrientation=true";
would help me establish the parameters I'd like to receive from the image, so that I could find out the color of a line of words. So I added a visual feature like this:
string requestParameters = "visualFeatures=Color,language=unk&detectOrientation=true";
But this did not solve the problem.
Also: Can I mix the uriBase link from the image analysis and the one from the OCR?
There is currently no way to retrieve the color information and OCR results in a single call.
You could try using the bounding boxes returned by OCR to crop the original image, and then send the crop to the analyze endpoint with visualFeatures=Color to get the color information for the detected text.
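A minimal sketch of that two-step flow in Python, assuming the v2.0 REST endpoints; the region, key, and file name are assumptions, and the OCR response's boundingBox is the comma-separated "x,y,w,h" string that the OCR endpoint returns:

```python
import requests
from PIL import Image

# Assumed endpoint region and key -- replace with your own.
base = "https://westus.api.cognitive.microsoft.com/vision/v2.0"
headers = {"Ocp-Apim-Subscription-Key": "YOUR_KEY",
           "Content-Type": "application/octet-stream"}

with open("sign.png", "rb") as f:
    image_bytes = f.read()

# Step 1: OCR to get the region bounding boxes.
ocr = requests.post(base + "/ocr",
                    params={"language": "unk", "detectOrientation": "true"},
                    headers=headers, data=image_bytes).json()

img = Image.open("sign.png")
for region in ocr.get("regions", []):
    # boundingBox is an "x,y,w,h" string in the OCR response.
    x, y, w, h = map(int, region["boundingBox"].split(","))
    img.crop((x, y, x + w, y + h)).save("crop.png")

    # Step 2: send the crop to the analyze endpoint for color info.
    with open("crop.png", "rb") as f:
        color = requests.post(base + "/analyze",
                              params={"visualFeatures": "Color"},
                              headers=headers, data=f.read()).json()
    print(region["boundingBox"], color.get("color"))
```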
According to the documentation, the possible request parameters of this API are:
language, detectOrientation
and the returned metadata has these entities:
orientation, language, regions, lines, words, boundingBox, text
It should be possible to combine the OCR algorithm with another one of the computer vision algorithms to detect the dominant colors in the text regions that the OCR identified.
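If you prefer to avoid a second API call, you can also test for red locally once you have the OCR bounding boxes; a minimal sketch with OpenCV, assuming a crop of one text region already saved to disk and a tunable threshold on the fraction of red pixels:

```python
import cv2
import numpy as np

def has_red_text(crop_bgr, min_fraction=0.05):
    """Return True if a noticeable fraction of the crop's pixels are red."""
    hsv = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two hue ranges.
    mask1 = cv2.inRange(hsv, np.array([0, 100, 100]), np.array([10, 255, 255]))
    mask2 = cv2.inRange(hsv, np.array([170, 100, 100]), np.array([180, 255, 255]))
    red = cv2.countNonZero(mask1 | mask2)
    return red / float(hsv.shape[0] * hsv.shape[1]) >= min_fraction

crop = cv2.imread("crop.png")
print(has_red_text(crop))
```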