Python pytesseract extract number from various images - python-3.x

I have various types of images like these:
As you can see, they are all fairly similar, but I cannot reliably extract the number from them.
So far my code is the following:
lower = np.array([250,200,90], dtype="uint8")
upper = np.array([255,204,99], dtype="uint8")
mask = cv2.inRange(img, lower, upper)
res = cv2.bitwise_and(img, img, mask=mask)
data = image_to_string(res, lang="eng", config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
numbers = int(''.join(re.findall(r'\d+', data)))
I tried tweaking the psm parameter (6, 8 and 13). Each value works for some of these examples, but none works for all of them, and I have no idea how to get around this.
Another solution proposed is:
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
erd = cv2.erode(gry, None, iterations=1)
thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
bnt = cv2.bitwise_not(thr)
However, on the first picture, bnt gives:
And then pytesseract reads 460.
Any ideas, please?

My approach:
Upsample
Erosion
Simple-thresholding
Bitwise-not
Upsampling is required for accurate recognition. Doubling the image size makes the digits readable.
Erosion is a morphological operation that removes pixels at the boundaries of bright regions. It thins the strokes of the digits, which makes them easier to detect.
Thresholding (binary and inverse binary) helps to reveal the features.
Bitwise-not inverts every pixel value, swapping dark and light.
You can learn more methods by reading Improving the quality of the output.
Erosion
Threshold
Bitwise-not
Update
The first image is easy to read, since it does not require any pre-processing. Please read How to Improve Quality of Tesseract.
Result:
1460
720
3250
3146
2681
1470
Code:
import cv2
import pytesseract
img_lst = ["oqWjd.png", "YZDt1.png", "MUShJ.png", "kbK4m.png", "POIK2.png", "4W3R4.png"]
for i, img_nm in enumerate(img_lst):
    img = cv2.imread(img_nm)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    (h, w) = gry.shape[:2]
    if i == 0:
        # The first image is readable without pre-processing
        thr = gry
    else:
        # Upsample by a factor of two, then erode
        gry = cv2.resize(gry, (w * 2, h * 2))
        erd = cv2.erode(gry, None, iterations=1)
        if i == len(img_lst)-1:
            # The last image needs inverse-binary thresholding
            thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        else:
            thr = cv2.threshold(erd, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    bnt = cv2.bitwise_not(thr)
    txt = pytesseract.image_to_string(bnt, config="--psm 6 digits")
    print("".join([t for t in txt if t.isalnum()]))
    cv2.imshow("bnt", bnt)
    cv2.waitKey(0)
If you want to keep the comma in the result, change the print("".join([t for t in txt if t.isalnum()])) line to print(txt).
Note that for one image (the last one in the list, per the code above) the threshold method changes from binary to inverse binary. Binary thresholding does not work accurately on every image, so you need to switch methods for some of them.

Related

Python Extract number from simple Image

I have the following image
lower = np.array([175, 125, 45], dtype="uint8")
upper = np.array([255, 255, 255], dtype="uint8")
mask = cv2.inRange(image, lower, upper)
img = cv2.bitwise_and(image, image, mask=mask)
plt.figure()
plt.imshow(img)
plt.axis('off')
plt.show()
Now if I try to convert it to grayscale like this:
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
I get this:
And I would like to extract the number on it.
The suggestion:
gray = 255 - gray
emp = np.full_like(gray, 255)
emp -= gray
emp[emp==0] = 255
emp[emp<100] = 0
gauss = cv2.GaussianBlur(emp, (3,3), 1)
gauss[gauss<220] = 0
plt.imshow(gauss)
gives the image:
Then using pytesseract on any of the images:
data = pytesseract.image_to_string(img, config='outputbase digits')
gives:
'\x0c'
Another suggested solution is:
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
thr = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV)[1]
txt = pytesseract.image_to_string(thr)
plt.imshow(thr)
And this gives
'\x0c'
Not very satisfying... Does anyone have a better solution, please?
Thanks!
I have a two-step solution:
Apply thresholding.
Set the psm mode to 7.
When you apply thresholding to the image:
Thresholding is the simplest method of revealing the features of the image.
Now, when we read from the output image:
txt = image_to_string(thr, config="--psm 7")
print(txt)
The result will be:
| 1,625 |
Now, why do we set the page segmentation mode (psm) to 7?
Because treating the image as a single text line gives an accurate result here.
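When it is not obvious which mode suits an image, one option is to try a few modes in order and keep the first result that contains a digit. The sketch below is only an illustration: the function name is hypothetical, the order of modes is an assumption, and the OCR function is passed in as a parameter so the logic can be exercised without Tesseract installed. The first result containing a digit is not guaranteed to be the correct one.

```python
import re

def first_text_with_digit(image, ocr, psm_modes=(7, 6, 8, 13)):
    """Try several page segmentation modes; return (text, psm) for the first hit.

    `ocr` is any callable shaped like pytesseract.image_to_string:
    ocr(image, config=...) -> str.
    Modes: 7 = single text line, 6 = uniform block of text,
    8 = single word, 13 = raw line.
    """
    for psm in psm_modes:
        text = ocr(image, config=f"--psm {psm}")
        if re.search(r"\d", text):
            return text.strip(), psm
    return None, None
```

With the real library this would be called as first_text_with_digit(thr, pytesseract.image_to_string).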
But we have to clean up the result, since the current result is | 1,625 |.
We should remove the | characters:
print("".join([t for t in txt if t != '|']))
Result:
1,625
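If the goal is an integer rather than a cleaned string, a regular expression can do the filtering and the conversion in one place. This is a minimal sketch with a hypothetical helper name; it assumes commas are thousands separators and that the first number in the text is the one wanted.

```python
import re

def parse_ocr_number(text):
    """Return the first number in OCR output as an int, or None.

    Accepts digits with optional comma thousands separators, e.g. "| 1,625 |".
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))

print(parse_ocr_number("| 1,625 |\n\x0c"))  # 1625
```

Returning None for text without digits (such as the '\x0c' output shown in the question) avoids the ValueError that int('') would raise.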
Code:
import cv2
from pytesseract import image_to_string
img = cv2.imread("LZ3vi.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV)[1]
txt = image_to_string(thr, config="--psm 7")
print("".join([t for t in txt if t != '|']).strip())
Update
How do you get this clean black-and-white image from my original image?
In three steps:
Read the image using OpenCV's imread function
img = cv2.imread("LZ3vi.png")
Note that imread returns the image in BGR channel order, not RGB.
Convert the image to grayscale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Result will be:
Apply threshold
thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY_INV)[1]
Result will be:
If you are wondering about thresholding, read about simple thresholding.
All my filters, grayscale... get weird colored images
The reason is that when you display a grayscale image using pyplot, you need to set the color map (cmap) to gray:
plt.imshow(img, cmap='gray')
You can read about the other color maps here.
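Both display pitfalls mentioned in this answer (BGR channel order and the missing gray color map) can be handled by one small helper. This is a sketch with a hypothetical function name. It assumes the array came from OpenCV, so 2-D arrays are 8-bit grayscale and 3-channel arrays are in BGR order.

```python
import numpy as np

def prepare_for_pyplot(img):
    """Return (array, imshow_kwargs) suitable for plt.imshow(array, **kwargs)."""
    if img.ndim == 2:
        # Single channel: without cmap, pyplot applies its default colour map.
        return img, {"cmap": "gray", "vmin": 0, "vmax": 255}
    # Reverse the last axis to turn BGR into RGB.
    return img[..., ::-1], {}

# A single pure-blue pixel in BGR order becomes (0, 0, 255) in RGB order.
bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)
rgb, rgb_kwargs = prepare_for_pyplot(bgr)
gray, gray_kwargs = prepare_for_pyplot(np.zeros((2, 2), dtype=np.uint8))
```

Fixing vmin and vmax keeps pyplot from stretching the contrast of an image that does not span the full 0 to 255 range.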
Two issues blocked pytesseract from detecting your number:
The white rectangle around the number (inverting and filling is the solution).
The noise in the shape of the digits (Gaussian smoothing dealt with that).
The solution that AlexAlex proposed works perfectly if it is followed by a Gaussian filter:
Output: 1,625
import numpy as np
import pytesseract
import cv2
BGR = cv2.imread('11.png')
RGB = cv2.cvtColor(BGR, cv2.COLOR_BGR2RGB)
lower = np.array([175, 125, 45], dtype="uint8")
upper = np.array([255, 255, 255], dtype="uint8")
mask = cv2.inRange(RGB, lower, upper)
img = cv2.bitwise_and(RGB, RGB, mask=mask)
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
gray = 255 - gray
emp = np.full_like(gray, 255)
emp -= gray
emp[emp==0] = 255
emp[emp<100] = 0
gauss = cv2.GaussianBlur(emp, (3,3), 1)
gauss[gauss<220] = 0
text = pytesseract.image_to_string(gauss, config='outputbase digits')
print(text)

Difficulty reading text with pytesseract

I need to read the highest temperature on thermographic images, as shown below:
IR_1544_INFRA.jpg
IR_1546_INFRA.jpg
IR_1560_INFRA.jpg
IR_1564_INFRA.jpg
I used the following code; this gave the best result.
I also tried several other approaches, such as blurring, grayscale conversion, and binarization, but they all failed.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\User\AppData\Local\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold
entrada = cv2.imread('IR_1546_INFRA.jpg')
image = entrada[40:65, 277:319]
#image = cv2.imread('IR_1546_INFRA.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = 255 - cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
In the first image, I found this.
In the second image, I found this.
The image layout is always the same, that is, the temperature is always in the same place, so I cropped the image to isolate only the number. I would like to get 97.7 here and 85.2 here.
My code needs to detect this temperature in every image and generate a list ordered from highest to lowest.
What would you recommend to improve the accuracy of pytesseract on these images?
Note 1: When I analyze the entire image (without cropping), it returns data that is not even present.
Note 2: In some images, even when the number is binarized, pytesseract (image_to_string) does not return any data.
Thank you all and sorry for the typos, writing in english is still a challenge for me.
Because your images all share the same layout, you can crop the area you want and process only that. The processing is simple: convert to grayscale, threshold, invert, resize, and then run the OCR. You can see it in my code below. It works on all your attached images.
import cv2
import pytesseract
import os
image_path = "temperature"
for nama_file in sorted(os.listdir(image_path)):
    print(nama_file)
    img = cv2.imread(os.path.join(image_path, nama_file))
    # Crop the fixed region that contains the temperature
    crop = img[43:62, 278:319]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    # Pixels brighter than 200 become white; bitwise_not then inverts the result
    thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.bitwise_not(thresh)
    # Double the size before running the OCR
    double = cv2.resize(thresh, None, fx=2, fy=2)
    custom_config = r'-l eng --oem 3 --psm 7 -c tessedit_char_whitelist="1234567890." '
    text = pytesseract.image_to_string(double, config=custom_config)
    print("detected: " + text)
    cv2.imshow("img", img)
    cv2.imshow("double", double)
    cv2.waitKey(0)
cv2.destroyAllWindows()
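The question also asks for a list ordered from the highest to the lowest temperature, which the loop above does not produce. A minimal sketch of that last step follows, assuming the detected text of each file has been collected into a dictionary. The function name, file names, and readings in the usage lines are made up for illustration.

```python
import re

def rank_temperatures(readings):
    """Sort {filename: raw OCR text} by temperature, highest first.

    Files whose text contains no number such as "97.7" are skipped.
    """
    parsed = []
    for name, text in readings.items():
        match = re.search(r"\d+(?:\.\d+)?", text)
        if match:
            parsed.append((name, float(match.group())))
    return sorted(parsed, key=lambda item: item[1], reverse=True)

# Hypothetical OCR outputs, including one failed read.
ranking = rank_temperatures({"a.jpg": "85.2\n\x0c", "b.jpg": "97.7\n", "c.jpg": "\x0c"})
print(ranking)  # [('b.jpg', 97.7), ('a.jpg', 85.2)]
```

Skipping unreadable files silently is a design choice; logging them instead would make OCR failures easier to spot.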

Crop the rectangular paper from the image

From the discussion: Crop exactly document paper from image
I'm trying to extract the white paper from the image. I'm using the following code, which does not crop an exact rectangle.
import cv2
import numpy as np
def crop_image(image):
    image = cv2.imread(image)
    # convert to grayscale image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # threshold
    thresh = cv2.threshold(gray, 190, 255, cv2.THRESH_BINARY)[1]
    # apply morphology
    kernel = np.ones((7, 7), np.uint8)
    morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    kernel = np.ones((9, 9), np.uint8)
    morph = cv2.morphologyEx(morph, cv2.MORPH_ERODE, kernel)
    # Get Largest contour
    contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    area_thresh = 0
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > area_thresh:
            area_thresh = area
            big_contour = cnt
    # get bounding box
    x, y, w, h = cv2.boundingRect(big_contour)
    # draw filled contour on black background
    mask = np.zeros_like(gray)
    mask = cv2.merge([mask, mask, mask])
    cv2.drawContours(mask, [big_contour], -1, (255, 255, 255), cv2.FILLED)
    # apply mask to input
    result = image.copy()
    result = cv2.bitwise_and(result, mask)
    # crop result
    img_result = result[y:y+h, x:x+w]
    filename = generate_filename()
    cv2.imwrite(filename, img_result)
    logger.info('Successfully saved cropped file : %s' % filename)
    return img_result, filename
I'm able to extract the paper, but the result is not a rectangular image.
Here is the image I'm attaching, and here is what I get after cropping it.
I want a rectangular image of the paper.
Please help me with this.
Thanks in advance
The first problem I can see is that the threshold value is not low enough, so the bottom part of the paper is not captured correctly (it is too dark to pass the threshold).
The second problem, as far as I can understand, is fitting a rectangle to the paper. What you need is a perspective warp.
You can find more information on how to do that in this post from PyImageSearch.

Pupil Detection in eye images using python

I need to mark the pupil in an eye image like this one. I have written this code:
img_name='6.jpg'
image = cv2.imread(img_name)
image_copy_new=cv2.imread(img_name)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
retval, thresholded = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY_INV)
plt.imshow(thresholded,cmap="gray")
This produces output like this:
Then I searched for the contours in the image and tried to keep only the most circular one with this code:
contours, hierarchy = cv2.findContours(thresholded, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
image_copy = np.zeros_like(image) # create a new empty image
for cnt in contours:
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    (x, y, w, h) = cv2.boundingRect(cnt)
    ar = w / float(h)
    if w*h > 20 and 0.9 < ar < 1.1: # filtering condition
        cv2.drawContours(image, [cnt], 0, 255, -1)
This produces great results in some cases where the eyes face forward, but in other cases (like this one) it fails completely. I have tried many other things, such as the Hough transform and different morphological operations, but I have not been able to solve this problem.
The images show only eyes and not the whole face; otherwise dlib's face detection would have worked.
A case where this code works is:
Thanks for taking the time to help me out.
Adding some blurring, erosion and dilation may help. Eroding will remove very small features, like the noise around the eyelashes, and dilation will bring any surviving points back up to size. By tweaking the erosion and dilation sizes, you should be able to get rid of most of the noise and make that center pupil look much better.
Here's an example of how I would do this:
gray = cv2.cvtColor(frame_in, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
thresh = cv2.threshold(blurred, 30, 255, cv2.THRESH_BINARY)[1]
erosion_size = 10
dilate_size = 8
thresh = cv2.erode(thresh, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (erosion_size, erosion_size)))
thresh = cv2.dilate(thresh, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (dilate_size, dilate_size)))

How can i count segments in an image in python?

I am new to image processing and python. You might've seen my amateur codes on this site in the last couple of days.
I am trying to count the number of trees using aerial images. This is my code:
from PIL import Image
import cv2
import numpy as np
from skimage import io, filters, measure
from scipy import ndimage
img = Image.open("D:\\Texture analysis\\K-2.jpg")
row, col = img.size
hsvimg = img.convert('HSV')
hsvimg.mode = 'RGB'
hsvimg.save('newImage2.jpg')
npHSI = np.asarray(hsvimg) #Convert HSI Image to np image
blur = cv2.GaussianBlur(npHSI, (45, 45), 5)
assert isinstance(blur, np.ndarray) ##############################
assert len(blur.shape) == 3 #Convert np Image to HSI Image
assert blur.shape[2] == 3 ##############################
hsiBlur = Image.fromarray(blur, 'RGB')
hsiBlur.save('hsiBlur.jpg') #Save the blurred image
## Read
img = cv2.imread("D:\\Texture analysis\\hsiBlur.jpg")
## convert to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
#Threshold the image and segment the trees
mask = cv2.inRange(hsv, (36, 25, 25), (70, 255,255))
imask = mask>0
green = np.zeros_like(img, np.uint8)
green[imask] = img[imask]
## save
cv2.imwrite("green.png", green)
#Count the number of trees
im = io.imread('green.png', as_grey=True)
val = filters.threshold_otsu(im)
drops = ndimage.binary_fill_holes(im < val)
labels = measure.label(drops)
print(labels.max())
Original image:
HSI image with gaussian filter:
Segmented image:
The last part of the code returns 7, which is a wrong output. The value should be above 50. How can I properly count the number of green segments in the final segmented image?
EDIT
I converted green.png to binary and applied erosion with a 3x3 filter and iterated it 7 times to remove the noise.
This is what I did at the end. I followed this stackoverflow link
##save
cv2.imwrite("green.png", green)
#Convert to grayscale
gray = np.dot(green[...,:3], [0.299, 0.587, 0.114])
cv2.imwrite("grayScale.jpg", gray)
#Binarize the grayscale image
ret,bin_img = cv2.threshold(gray,127,255,cv2.THRESH_BINARY)
cv2.imwrite("bin_img.jpg", bin_img)
#Erosion to remove the noise
kernel = np.ones((3, 3),np.uint8)
erosion = cv2.erode(gray, kernel, iterations = 7)
cv2.imwrite("erosion.jpg", erosion)
#Count the number of trees
finalImage = cv2.imread('erosion.jpg')
finalImage = cv2.cvtColor(finalImage, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(finalImage, 127, 255, 1)
im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    cv2.drawContours(finalImage,[cnt],0,(0,0,255),1)
Saurav mentioned in his answer that the size of "contours" will give you the count. However, print(contour.size()) gives an error and print(contour) just prints a long 2D array. How can I get the size of the contour?
PS. I didn't upload the grayscale, binary and eroded images because I felt that the images were already taking too much space. I can still upload them if anyone wants them.
I found 52 trees with this script:
from PIL import Image, ImageDraw, ImageFont
image = Image.open('04uX3.jpg')
pixels = image.load()
size = image.size
draw = ImageDraw.Draw(image)
font = ImageFont.truetype('arial', 60)
i = 1
for x in range(0, size[0], 100):
    for y in range(0, size[1], 100):
        if pixels[x, y][1] > 200:
            draw.text((x, y), str(i), (255, 0, 0), font=font)
            i += 1
image.save('result.png')
You can see that some trees weren't detected and some non-trees were detected, so this is a very rough estimate.
