pytesseract not recognizing numbers in picture using OCR - python-3.x

I'm trying to use Python-tesseract to extract the digits from this (picture) using optical character recognition (OCR). For some reason pytesseract won't recognize the digits and I don't fully understand why (distance between the numbers?).
Can someone assist me in understanding how to properly extract the digits from this image?
The code below doesn't print anything:
from PIL import Image
import pytesseract
im = Image.open("sudo.png")
text = pytesseract.image_to_string(im)
print(text)

A little bit of pre-processing and using ROIs to specify where the words are will help. By default, OCR uses page layout analysis to determine blocks of text. In this case, the image doesn't look like a normal page of text (like a PDF article for example).
To make it easier for OCR, first you can find the location of the words using regionprops and then pass the location of the words (as bounding boxes) to the OCR function. See the code below and results. They look accurate. You may have to play around more with the pre-processing to make this robust for a collection of different images. But hopefully, this gives you an idea on how to proceed:
capture = imread('Captura.PNG');
% Increase image size by 3x
my_image = imresize(capture, 3);
% Localize words
BW = imbinarize(rgb2gray(my_image));
BW1 = imdilate(BW,strel('disk',6));
s = regionprops(BW1,'BoundingBox');
bboxes = vertcat(s(:).BoundingBox);
% Sort boxes by image height
[~,ord] = sort(bboxes(:,2));
bboxes = bboxes(ord,:);
% Pre-process image to make letters thicker
BW = imdilate(BW,strel('disk',1));
% Call OCR and pass in location of words. Also, set TextLayout to 'word'
ocrResults = ocr(BW,bboxes,'CharacterSet','.0123456789','TextLayout','word');
words = {ocrResults(:).Text}';
words = deblank(words)
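The code above is MATLAB. For the Python setup in the question, a rough counterpart of the same idea (restrict the character set and skip full page layout analysis) is sketched below. It assumes pytesseract and Pillow are installed and Tesseract is on the PATH. The helper names, the fixed threshold of 128, and the choice of page segmentation mode 6 are illustrative assumptions, not part of the original answer. The whitelist option may be ignored by the LSTM engine in Tesseract 4.0.

```python
def build_digit_config(psm=6, whitelist="0123456789."):
    """Build a Tesseract config string that limits output to the given characters."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def ocr_digits(image_path, scale=3, psm=6):
    """Upscale the image, binarize it, and OCR it with a digit-only whitelist."""
    # Imported here so the config helper can be used without the OCR stack installed.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path).convert("L")          # grayscale
    img = img.resize((img.width * scale, img.height * scale))
    img = img.point(lambda p: 255 if p > 128 else 0)   # crude fixed threshold (assumption)
    return pytesseract.image_to_string(img, config=build_digit_config(psm))
```

Calling `ocr_digits("sudo.png")` would return whatever Tesseract recognizes. For a grid of widely spaced digits it is worth trying other `--psm` values too, for example 11 (sparse text).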


OpenCV Nodejs prepare image to OCR TesseractJS, remove dots

I'm trying to read data with Tesseract from an image captured by a webcam. Here is an example of the image used:
I'm working on a Node.js server, and I tried many techniques in Jimp, including invert/grayscale, sharpening, and filtering specific colors (yellow, blue, ...). In the end I built a separate Docker container using opencv4nodejs and applied a few techniques to extract text from that image.
I mostly need the big text (the small text is not necessary, and it is not sharp in this image anyway). So I applied this:
const src = cv.imread('./970f5b45-9f24-41d5-91f0-ef3f8b9d8914.jpeg');
let src2 = src.cvtColor(cv.COLOR_BGR2GRAY)
let dst = src2.adaptiveThreshold(255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 12, 2);
let dst2 = dst.morphologyEx(cv.MORPH_OPEN)
After that I get this result, which is almost ready to be read by OCR. The problem is the large number of dots in the image. Is there any way to remove those dots while keeping the text readable, in OpenCV or with another technique?
The result right now is:
Is it possible to extract just the text from that result? If I feed this result to Tesseract, it takes a really long time to extract the text, and the output contains a huge number of weird characters (probably because of the dots/shapes).

Convert unknown labels to Yolov5

I own a dataset of images with unknown label format, which is:
angry_actor_104.jpg 0 28 113 226 141 22.9362 0
It indicates an image as follows:
image_name face_id_in_image face_box_top face_box_left face_box_right face_box_bottom face_box_cofidence expression_label
My question is: How can this be converted into the yolov5 format?
I have been looking this up for a long time and hope someone can help.
Thank you very much in advance.
Since the format is unknown you are unlikely to find existing code to completely handle the transformation but I can share some tips to get started.
The annotations file alone does not have enough information to be converted to YOLO format, because the conversion also requires the dimensions of the images. If all of your images have the same dimensions it is easier, but if they differ you will need additional code to read the dimensions of each image. I will explain why below.
When you are done, you will need to arrange the images and labels in a specific directory structure like this, with one txt file per image:
This is the shape that you want to get the annotation files into.
face_id_in_image x_center_image y_center_image width height
There is a clear description of what the values mean here
Now you need to do some math to calculate the values.
width = (face_box_right - face_box_left)/image_width
height = (face_box_bottom - face_box_top)/image_height
x_center_image = face_box_left/image_width + (width/2)
y_center_image = face_box_top/image_height + (height/2)
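The formulas above can be sketched in Python as follows. This is a minimal sketch that assumes every annotation line has exactly the eight whitespace-separated fields shown in the question. The names `FaceAnnotation`, `parse_annotation`, and `to_yolo_line` are hypothetical, and the image width and height must be supplied by the caller (for example by opening each image and reading its size).

```python
from dataclasses import dataclass


@dataclass
class FaceAnnotation:
    image_name: str
    face_id: int
    top: float
    left: float
    right: float
    bottom: float
    confidence: float
    expression_label: int


def parse_annotation(line):
    """Parse one whitespace-separated line of the source annotation file."""
    name, face_id, top, left, right, bottom, conf, label = line.split()
    return FaceAnnotation(name, int(face_id), float(top), float(left),
                          float(right), float(bottom), float(conf), int(label))


def to_yolo_line(ann, image_width, image_height):
    """Return 'first_column x_center y_center width height', normalized to 0..1."""
    width = (ann.right - ann.left) / image_width
    height = (ann.bottom - ann.top) / image_height
    x_center = ann.left / image_width + width / 2
    y_center = ann.top / image_height + height / 2
    # The first column follows the layout above (face_id_in_image). YOLOv5 reads
    # it as a class index, so expression_label may be the better choice.
    return f"{ann.face_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For the sample line from the question and a made-up image size of 400x200, `to_yolo_line(parse_annotation("angry_actor_104.jpg 0 28 113 226 141 22.9362 0"), 400, 200)` gives `0 0.423750 0.422500 0.282500 0.565000`.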
I have some bits of code that may help you with reading the text file and saving the text files here.
If you are able to share your exact files I may be able to identify some shortcut to transform them.

How to differentiate Passport and PAN card Scanned images in python

The goal is to identify whether the input scanned image is a passport or a PAN card using OpenCV.
I have used the structural_similarity (compare_ssim) method of skimage to compare the input scanned image with template images of a passport and a PAN card.
But in both cases I got a low score.
Here is the code that I have tried:
# structural_similarity was called skimage.measure.compare_ssim in scikit-image < 0.16
from skimage.metrics import structural_similarity as ssim
import matplotlib.pyplot as plt
import numpy as np
import cv2
# Load both images as grayscale
img1 = cv2.imread('PAN_Template.jpg', 0)
img2 = cv2.imread('PAN_Sample1.jpg', 0)
def prepare_img(im):
    # Resize to a common size so the two images can be compared
    size = 300, 200
    im = cv2.resize(im, size)
    return im
img1 = prepare_img(img1)
img2 = prepare_img(img2)
def compare_images(imageA, imageB):
    s = ssim(imageA, imageB)
    return s
score = compare_images(img1, img2)
Comparing the PAN card template with a passport, I got an SSIM score of 0.12,
and comparing the PAN card template with a PAN card, the score was 0.20.
Since both scores were very close, I was not able to distinguish between them through the code.
If anyone has another solution or approach, please help.
Here is a sample image
PAN Scanned Image
You can also compare 2 images by the mean square error (MSE) of those 2 images.
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared difference between the two images;
    # NOTE: the two images must have the same dimension
    err = np.sum((imageA.astype("float") - imageB.astype("float")) ** 2)
    err /= float(imageA.shape[0] * imageA.shape[1])
    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err
As per my understanding, PAN card and passport images contain different text data, so I believe OCR can solve this problem.
All you need to do is extract the text data from the images using any OCR library, such as Tesseract, and look for a few predefined keywords in the text data to differentiate the images.
Here is a simple Python script showing the image pre-processing and OCR using the pytesseract module:
import cv2
import pytesseract
from PIL import Image
img = cv2.imread("D:/pan.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Fixed threshold at 127 to get a binary image
ret,th1 = cv2.threshold(gray,127,255,cv2.THRESH_BINARY)
cv2.imwrite('filterImg.png', th1)
pilImg = Image.open('filterImg.png')
text = pytesseract.image_to_string(pilImg)
Below is the binary image used for OCR:
I got the below string data after doing the OCR on the above image:
esraax fram EP aca ae
wrtterterad sg
Permanent Account Number. Card ‘yf
PEF vom ; ae
Reviavs /Father's Name. e.
Though this text data contains noise, I believe it is more than enough to get the job done.
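The keyword lookup described above could be sketched like this. Only "Permanent Account Number" comes from the OCR output shown here; the other keywords, the `KEYWORDS` table, and the function names are illustrative assumptions that should be adjusted against real OCR results.

```python
import re

# Keyword lists are illustrative assumptions; extend them from real OCR output.
KEYWORDS = {
    "PAN": ["permanent account number", "income tax department"],
    "PASSPORT": ["passport", "republic of india"],
}


def normalize(text):
    """Lower-case the text and collapse punctuation/whitespace so noisy OCR still matches."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def classify_document(ocr_text):
    """Return the label whose keywords occur most often, or None if nothing matches."""
    cleaned = normalize(ocr_text)
    scores = {label: sum(kw in cleaned for kw in kws) for label, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Passing the string returned by `pytesseract.image_to_string` to `classify_document` yields `"PAN"`, `"PASSPORT"`, or `None` when no keyword is found.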
Another OCR solution is to use the TextCleaner ImageMagick script from Fred's Scripts. A tutorial that explains how to install and use it (on Windows) is available here.
Script used:
C:/cygwin64/bin/textcleaner -g -e normalize -f 20 -o 20 -s 20 C:/Users/Link/Desktop/id.png C:/Users/Link/Desktop/out.png
I applied OCR on this with Tesseract (I am using version 4) and that's the result:
wort cra teat ears -
Permanent Account Number Card
TT aa
far aT ary /Father's Name
Wa RT /Date of Birth den. +
06/01/1997 genge / Signature
Code for OCR:
import cv2
from PIL import Image
import tesserocr as tr
number_ok = cv2.imread("C:\\Users\\Link\\Desktop\\id.png")
blur = cv2.medianBlur(number_ok, 1)
cv2.imshow('ocr', blur)
pil_img = Image.fromarray(cv2.cvtColor(blur, cv2.COLOR_BGR2RGB))
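api = tr.PyTessBaseAPI()
try:
    # The image must be handed to the API before any recognition call
    api.SetImage(pil_img)
    boxes = api.GetComponentImages(tr.RIL.TEXTLINE, True)
    text = api.GetUTF8Text()
finally:
    api.End()
print(text)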
Now, this doesn't answer your question (passport or PAN card), but it's a good starting point.
Doing OCR might be a solution for this type of image classification, but it might fail on blurry or poorly exposed images, and it might be slower than newer deep learning methods.
You can use object detection (TensorFlow or any other library) to train on two separate classes of image, i.e. PAN and passport. Fine-tuning a pre-trained model doesn't need much data either. And as per my understanding, PAN cards and passports have different background colors, so I guess it will be quite accurate.
Tensorflow Object Detection: Link
Nowadays OpenCV also supports object detection without installing any additional libraries (e.g. TensorFlow, Caffe). You can refer to this article for YOLO-based object detection in OpenCV.
We can use:
Histogram Comparison - The simplest and fastest method; it gives the similarity between the histograms of two images.
Template Matching - Searches for the location of a template image; with it we can find smaller image parts in a bigger one (like some common patterns in a PAN card).
Feature Matching - Features extracted from one image are recognised in another image even if that image is rotated or skewed.
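As an illustration of the first option, here is a dependency-free sketch of histogram comparison by correlation. It assumes each image is available as a flat sequence of 8-bit gray values. The function names and the choice of 32 bins are assumptions. With OpenCV, `cv2.calcHist` and `cv2.compareHist` with `cv2.HISTCMP_CORREL` would play the same roles.

```python
import math


def gray_histogram(pixels, bins=32):
    """Count 8-bit gray values into `bins` equal-width buckets, normalized to sum to 1."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def histogram_correlation(h1, h2):
    """Pearson correlation between two histograms: 1.0 means identical shape."""
    m1, m2 = sum(h1) / len(h1), sum(h2) / len(h2)
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    return num / den if den else 0.0
```

A scan would be assigned to whichever template histogram it correlates with most strongly. Since the documents reportedly differ in background color, per-channel color histograms may separate them better than gray values.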

Align the Images properly

Hi, I am trying to get only the handwritten data from an image. For that I took an empty image and a filled one, and I am using ImageChops.difference to get the data out of it.
The problem right now is the alignment of the images: the two are not equally aligned in terms of depth, so the results are not correct.
from PIL import Image, ImageChops
def compare_images(path_one, path_two, diff_save_location):
    """
    Compares two images and saves a diff image, if there
    is a difference
    @param: path_one: The path to the first image
    @param: path_two: The path to the second image
    """
    image_one = Image.open(path_one).convert('LA')
    image_two = Image.open(path_two).convert('LA')
    diff = ImageChops.difference(image_one, image_two)
    if diff.getbbox():
        diff.save(diff_save_location)
if __name__ == '__main__':
This is the result which I got.
the result which I am looking for:
Can anyone help me with this.
This site may be helpful: . The main idea is to first detect keypoints in both images using SIFT, SURF, or another algorithm; then match the keypoints from the empty image with the keypoints from the handwritten image to get a homography matrix; then use this matrix to align the two images.
After image alignment, post processing may be needed due to illumination or noise.
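The keypoint, matching, and homography steps described above might look like the sketch below. It uses ORB rather than SIFT or SURF (a substitution, chosen because ORB ships with the standard OpenCV build), and it assumes both inputs are grayscale NumPy arrays. The function names, the feature count, and the fraction of matches kept are assumptions to tune.

```python
def matches_to_keep(total, keep_fraction=0.2, minimum=4):
    """How many of the best matches to keep; a homography needs at least 4 point pairs."""
    return min(total, max(minimum, int(total * keep_fraction)))


def align_to_template(filled_gray, template_gray, max_features=2000, keep_fraction=0.2):
    """Warp the filled form so that it lines up with the empty template."""
    # Imported here so the helper above can be used without OpenCV installed.
    import cv2
    import numpy as np

    # 1. Detect keypoints and descriptors in both images.
    orb = cv2.ORB_create(max_features)
    kp_filled, des_filled = orb.detectAndCompute(filled_gray, None)
    kp_tmpl, des_tmpl = orb.detectAndCompute(template_gray, None)

    # 2. Match descriptors and keep only the closest matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_filled, des_tmpl), key=lambda m: m.distance)
    matches = matches[:matches_to_keep(len(matches), keep_fraction)]

    # 3. Estimate the homography from the matched point pairs (RANSAC rejects outliers).
    src = np.float32([kp_filled[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_tmpl[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # 4. Warp the filled image into the template's coordinate frame.
    height, width = template_gray.shape[:2]
    return cv2.warpPerspective(filled_gray, homography, (width, height))
```

The aligned array can then be converted back with `Image.fromarray` and passed to `ImageChops.difference` together with the empty form.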

Google Earth Engine - RGB image export from ImageCollection Python API

I encounter some problems with the Google Earth Engine python API to generate a RGB image based on an ImageCollection.
Basically to transform the ImageCollection into an Image, I apply a median reduction. After this reduction, I apply the visualize function where I need to define the different variables like the min and max. The problem is that these two values are image dependent.
dataset = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')\
    .filterBounds(ee.Geometry.Polygon([[39.05789266, 13.59051553],
                                       [39.11335033, 13.59051553],
                                       [39.11335033, 13.64477783],
                                       [39.05789266, 13.64477783],
                                       [39.05789266, 13.59051553]]))\
    .filterDate('2016-01-01', '2016-12-31')\
    .select(['B4', 'B3', 'B2'])
reduction = dataset.reduce('median')\
    .visualize(bands=['B4_median', 'B3_median', 'B2_median'],
Thus for each different image I need to work out these two values, which can change slightly. Since the number of images I need to generate is huge, it is impossible to do that manually. I do not know how to overcome this problem and I cannot find any answer to it. One idea would be to find the minimum and maximum values of the image, but I did not find any function that does this in the JavaScript or Python API.
I hope that someone will be able to help me.
You can use img.reduceRegion() to get image statistics for the region you want, for each image to export. You then pass the results of the region reduction into the visualization function. Here is an example:
geom = ee.Geometry.Polygon([[39.05789266, 13.59051553],
                            [39.11335033, 13.59051553],
                            [39.11335033, 13.64477783],
                            [39.05789266, 13.64477783],
                            [39.05789266, 13.59051553]])
dataset = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')\
.filterDate('2016-01-01', '2016-12-31')\
.select(['B4', 'B3', 'B2'])
reduction = dataset.median()
stats = reduction.reduceRegion(reducer=ee.Reducer.minMax(),geometry=geom,scale=100,bestEffort=True)
statDict = stats.getInfo()
prettyImg = reduction.visualize(bands=['B4', 'B3', 'B2'],
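One way to turn the region statistics into visualization parameters is sketched below. It assumes the dictionary returned by `stats.getInfo()` uses the `<band>_min` / `<band>_max` keys produced by `ee.Reducer.minMax()`; the helper name `vis_params_from_stats` is hypothetical.

```python
def vis_params_from_stats(stat_dict, bands):
    """Build per-band min/max visualization parameters from a minMax() result dict."""
    return {
        "bands": list(bands),
        "min": [stat_dict[f"{b}_min"] for b in bands],
        "max": [stat_dict[f"{b}_max"] for b in bands],
    }
```

The result can be unpacked into the call, for example `reduction.visualize(**vis_params_from_stats(statDict, ['B4', 'B3', 'B2']))`, so each exported image gets its own stretch.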
Using this approach, I get an output image like this:
I hope this helps!
