My goal is to pre-process an image (extracted from a video) for OCR.
Text is always black, like this example:
I tried frame averaging and an HSV mask:
cv2.accumulateWeighted(frame,avg2,0.005)
#res2 = cv2.convertScaleAbs(avg2)
# Convert BGR to HSV
hsv = cv2.cvtColor(imgray, cv2.COLOR_BGR2HSV)
# define range of black color in HSV
lower_val = np.array([0,0,0])
upper_val = np.array([179,255,127])
# Threshold the HSV image to get only black colors
mask = cv2.inRange(hsv, lower_val, upper_val)
# invert mask to get black symbols on white background
mask_inv = cv2.bitwise_not(mask)
cv2.imshow("Mask", mask)
But the results are not good enough.
I'm looking for a possible workaround.
Thanks
For images like these, where the text instances cannot be separated easily, Tesseract won't give good results. Tesseract is a good option if you want to extract text from documents, papers, PDFs, etc., where the text instances are clear.
For your problem, I would suggest using separate text detection and text recognition models. For text detection, you can use a state-of-the-art model such as the EAST text detector, which can locate text in difficult images. It generates bounding boxes around the text in the image, and these boxes can then be passed to a text recognition model, which performs the actual recognition.
For text detection: EAST or any other recent model
For text recognition: CRNN-based models
Please try to implement the above models; I am sure they will perform much better than what you are getting from Tesseract :)
BR!
This has color defect.
This has crack defect.
This has scratch defect.
This has imprinting defect.
input_img = cv2.resize(input_img, (500, 500), interpolation=cv2.INTER_LINEAR)
gray_i_image = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
blur_image = cv2.blur(gray_i_image, (3, 3))
This is what I have so far: I resize the image, convert it to grayscale, and then blur it to reduce noise. After that I don't know what to do.
I want an output image in which the defective area is highlighted with a rectangle. I know I have to use contours for that, but I don't know how.
I have an image that contains some text on a colored background. I want to extract the text using Tesseract, but first I want to replace the colored background with white and make the text itself black to increase detection accuracy.
I was trying to use Canny edge detection:
import cv2
import numpy as np
image=cv2.imread('tt.png')
cv2.imshow('input image',image)
cv2.waitKey(0)
gray=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
edged=cv2.Canny(gray,30,200)
edged = cv2.bitwise_not(edged)
cv2.imshow('canny edges',edged)
cv2.waitKey(0)
That worked fine for replacing the colored background with white, but it made the text white with black outlines (see the images below).
So is there any way to make the whole text black?
or
Is there another way I can do that?
before Canny detection
after Canny detection
Edit
The image may have mixed background colors, like:
input image
You can do this simply with THRESH_BINARY_INV. Here is the code:
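#include <opencv2/opencv.hpp>

int main() {
    cv::namedWindow("Original_Image", cv::WINDOW_FREERATIO);
    cv::namedWindow("Result", cv::WINDOW_FREERATIO);
    cv::Mat originalImg = cv::imread("BCQqn.png");
    if (originalImg.empty()) return 1;  // image could not be loaded
    cv::Mat gray;
    cv::cvtColor(originalImg, gray, cv::COLOR_BGR2GRAY);
    // Pixels brighter than 130 become black (0); all others become white (255).
    cv::threshold(gray, gray, 130, 255, cv::THRESH_BINARY_INV);
    cv::imshow("Original_Image", originalImg);
    cv::imshow("Result", gray);
    cv::waitKey();
    return 0;
}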
And this is the result:
You can play with the threshold value (130 in the above example).
Note: The code is in C++; if you are using Python, you can follow the same steps.
Good Luck!!
I have been working with PyTesseract and converting PDFs to JPEG in order to OCR the images. Part of the image has a black background with white text, which Tesseract is unable to read, whereas all other parts of my image are read perfectly well. Is there a way to change just the part of the image that has a black background? I tried a few SO resources, but they don't seem to help.
I am using Python 3, OpenCV 4 and PyTesseract.
OpenCV has a bitwise_not function which correctly inverts the image.
You can put a mask on the rest of the image (the part that is already correct) and use something like this:
imageWithMask = cv2.bitwise_not(imageWithMask)
Alternatively, you can perform the operation on a copy of the image and copy over only the parts / pixels / regions you need.
I've been using the Microsoft OCR API and I'm getting the text from the images, but I would like to know whether the text is in a specific color or has a specific background color.
For example, I have the following image and I would like to know if there is text in red:
i.e. image
I thought that this line:
string requestParameters = "language=unk&detectOrientation=true";
would let me specify which parameters I'd like to receive for the image, such as the color of a line of words. So I added a visual feature like this:
string requestParameters = "visualFeatures=Color,language=unk&detectOrientation=true";
But this did not solve the problem.
Also: Can I mix the uriBase link from the image analysis and the one from the OCR?
There is currently no way to retrieve the color information and OCR results in a single call.
You could try using the bounding boxes returned from OCR to crop the original image, and then send the crop to the analyze endpoint with visualFeatures=color to get the color information for the detected text.
According to the documentation, the possible request parameters of this API are:
language, detectOrientation
and the returned metadata has these entities:
orientation, language, regions, lines, words, boundingBox, text
It should be possible to combine the OCR algorithm with one of the other computer vision algorithms to detect the dominant colors in the text regions that the OCR identified.
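A rough sketch of the cropping idea, done locally with NumPy rather than through the analyze endpoint (that swap is only for illustration, it is not something the API does for you). It assumes each boundingBox arrives as a "left,top,width,height" string and that the image is already loaded as an RGB array. The function names and the red thresholds are made up.

```python
import numpy as np

def crop_bounding_box(image, bounding_box):
    """Crop an H x W x 3 array using a 'left,top,width,height' string."""
    x, y, w, h = (int(v) for v in bounding_box.split(","))
    return image[y:y + h, x:x + w]

def contains_red(crop_rgb, min_fraction=0.05):
    """True if at least min_fraction of the pixels are strongly red."""
    r = crop_rgb[..., 0].astype(int)
    g = crop_rgb[..., 1].astype(int)
    b = crop_rgb[..., 2].astype(int)
    red = (r > 150) & (g < 100) & (b < 100)  # assumed, hand-picked thresholds
    return bool(red.mean() >= min_fraction)

# Synthetic RGB image: white page with one red stripe where the "text" is.
image = np.full((100, 200, 3), 255, dtype=np.uint8)
image[20:40, 50:150] = (220, 20, 20)
crop = crop_bounding_box(image, "50,20,100,20")
has_red = contains_red(crop)
```

The same crop could instead be encoded and sent to the analyze endpoint as suggested above; the local check just avoids a second request per region.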
I'm trying to use optical character recognition (OCR) to read text printed on digital video (DV) tapes. I'm using cropped still frames from the video for the OCR process. The text is white, but there are color artifacts (maybe composite color artifacts) so that the white text has color bleeding onto it (see example below). The colors look to be in magenta-cyan-yellow colorspace, maybe?
OCR results would likely be improved if I could remove/filter those colors to leave only white on the text. Then I can create a binary black/white image. I can do this now, but I suspect results will improve if I can remove colors from the white text before OCR, and this will hopefully help separate the white text from the background image.
Are there any ways, preferably using ImageMagick, to filter out those colors from the white text? I'm not sure of the best way to approach this since there are multiple colors bleeding, and the background changes in each frame. Currently using ImageMagick version 6.9.2-3 Q16 x64 on Windows 7.
Sample full-frame image:
Sample of cropped region with text (note color-bleed and white text blending into background):
I would suggest leveraging ImageMagick's FX & Morphology Dilate to preprocess the image. But to be honest, it'll take a bit of trial & error to find a solution that works for you. I would also recommend that whatever solution you develop handles errors gracefully (i.e. if an OCR attempt is unsuccessful, emit a warning, advance the video to the next I-frame & repeat).
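The graceful error handling could be sketched in Python roughly like this. Everything here is hypothetical: frames stands for whatever iterable of already-extracted frames you have (selecting I-frames is not shown), and ocr is any callable you supply that returns the recognized text or raises on failure.

```python
import warnings

def first_readable_text(frames, ocr, min_chars=1):
    """Return (index, text) for the first frame with usable OCR output, else None."""
    for index, frame in enumerate(frames):
        try:
            text = ocr(frame).strip()
        except Exception as exc:
            # OCR blew up on this frame: warn and move on to the next one.
            warnings.warn(f"OCR failed on frame {index}: {exc}")
            continue
        if len(text) >= min_chars:
            return index, text
        warnings.warn(f"No text found on frame {index}; trying the next one")
    return None
```

The caller then decides what to do when None comes back, for example logging the tape as unreadable instead of crashing.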
Fx Preprocessing
The -fx operator allows you to apply a user-defined mathematical expression. A quick Google search for chroma keys and other tolerance methods might be helpful. But for many OCR techniques, it's common to reduce the colors to a uniform grayscale.
convert aaA7b.png -fx 'intensity' intensity.png
Morphology Preprocessing
Morphology allows common & custom kernels to alter surrounding pixels. As video scanlines + other artifacts are distorting the text, I would recommend exploring Dilate, but there are many other techniques listed in the Usage documents.
Diamond
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Diamond:1 diamond.png
Square
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Square:1 square.png
Plus
convert aaA7b.png -fx 'intensity' \
-morphology Dilate Plus:1 plus.png
Custom
And if you need something more exact, create your own kernel by supplying the following format size: row1 row2 ... rowN. In this example, I'm creating a 3x3 kernel with a single vertical line to offset the video scanlines.
convert aaA7b.png -fx 'intensity' \
-morphology Dilate \
'3x3: nan,1,nan nan,1,nan nan,1,nan' user_defined.png
But YMMV. Also take a look at Fred's TextCleaner script. The -deskew & -sharpen operators will help reduce the noise.
I think there's a saying "You can't make steak from a hamburger." or something like that. At some point the background will wash out the text in the foreground, and time is better spent creating a solution that acknowledges this.