How to make tesseract work for different kinds of images? - python-3.x

I am working on a digit recognition task using Tesseract and OpenCV. I got it working, but my solution is specific to a particular image: if I change the image I no longer get correct results, because I have to adjust the threshold value for each image. The steps I took were:
Crop the image to an appropriate region
Convert the image to grayscale
Apply a Gaussian blur
Apply an appropriate threshold
Pass the image through Tesseract
So, my question is: how can I make my code generic, i.e. usable on different images without updating the code?
While working on this image I processed it as:
imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imgBlur = cv2.GaussianBlur(imggray, (5, 5), 0)
imgDil = cv2.dilate(imgBlur, np.ones((5, 5), np.uint8), iterations=1)
imgEro = cv2.erode(imgDil, np.ones((5, 5), np.uint8), iterations=2)
ret, imgthresh = cv2.threshold(imgEro, 28, 255, cv2.THRESH_BINARY)
And for this image:
imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imgBlur = cv2.GaussianBlur(imggray, (5, 5), 0)
imgDil = cv2.dilate(imgBlur, np.ones((5, 5), np.uint8), iterations=0)
imgEro = cv2.erode(imgDil, np.ones((5, 5), np.uint8), iterations=0)
ret, imgthresh = cv2.threshold(imgEro, 37, 255, cv2.THRESH_BINARY)
I had to change the number of iterations and the minimum threshold to obtain proper results. What can I do so that I do not have to change these values for every image?
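One possible direction, sketched below purely as an illustration (the input file name is a placeholder, and this is not a tested answer), is to let OpenCV choose the threshold itself with Otsu's method, or compute it locally with adaptive thresholding, so no fixed per-image value is needed:
import cv2
import numpy as np

img = cv2.imread("digits.jpg")  # placeholder input image
imggray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imgBlur = cv2.GaussianBlur(imggray, (5, 5), 0)

# Otsu's method derives the threshold from the histogram, so the fixed value 0 is ignored
_, imgOtsu = cv2.threshold(imgBlur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding computes a local threshold per neighbourhood,
# which can cope with lighting that varies between (and within) images
imgAdapt = cv2.adaptiveThreshold(imgBlur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)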

Related

Convert unknown labels to Yolov5

I have a dataset of images with an unknown label format, which looks like this:
angry_actor_104.jpg 0 28 113 226 141 22.9362 0
It indicates an image as follows:
image_name face_id_in_image face_box_top face_box_left face_box_right face_box_bottom face_box_confidence expression_label
My question is: How can this be converted into the yolov5 format?
I have been looking this up for a long time and hope someone can help.
Thank you very much in advance.
Since the format is unknown, you are unlikely to find existing code that completely handles the transformation, but I can share some tips to get started.
The annotations file alone does not have enough information to be converted to YOLO format, because YOLO also needs the dimensions of each image. If all of your images share the same dimensions this is easier, but if they differ you will need additional code to extract each image's dimensions. I will explain why below.
When you are done you will need to get the images and labels into a specific directory structure like this, with one txt file per image:
/images/actor1.jpg
/images/actor2.jpg
/labels/actor1.txt
/labels/actor2.txt
This is the shape that you want to get the annotation files into.
face_id_in_image x_center_image y_center_image width height
There is a clear description of what the values mean here https://stackoverflow.com/a/66563144/5183735.
Now you need to do some math to calculate the values.
width = (face_box_right - face_box_left)/image_width
height = (face_box_bottom - face_box_top)/image_height
x_center_image = face_box_left/image_width + (width/2)
y_center_image = face_box_top/image_height + (height/2)
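As an illustration only (not part of the original answer; the images directory and the use of PIL to read image sizes are assumptions), converting one annotation line could look like this:
from PIL import Image

def convert_line(line, images_dir="images"):
    # Expected input order (from the question):
    # image_name face_id top left right bottom confidence expression_label
    parts = line.split()
    image_name, face_id = parts[0], parts[1]
    top, left, right, bottom = map(float, parts[2:6])

    # YOLO values are normalised by the image dimensions
    with Image.open(f"{images_dir}/{image_name}") as im:
        image_width, image_height = im.size

    width = (right - left) / image_width
    height = (bottom - top) / image_height
    x_center = left / image_width + width / 2
    y_center = top / image_height + height / 2
    return f"{face_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"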
I have some bits of code that may help you with reading the text file and saving the text files here.
https://github.com/pylabel-project/pylabel/blob/main/pylabel/exporter.py and https://github.com/pylabel-project/pylabel/blob/main/pylabel/importer.py.
If you are able to share your exact files I may be able to identify some shortcut to transform them.

Compare images to find whether they are visibly the same

I need to find out whether two images are the same even if they have different resolutions and sizes. A pixel-to-pixel comparison is not needed; I need to check that all the images, text, colors, etc. in one image are also the same in the other.
I tried different Python packages to compare them, but they all require the same resolution. One of my images is a screenshot taken on a Mac and the other on Ubuntu. Even though both show the same HTML, the contrast and resolution differences between the machines cause the images to differ when compared.
So far I have tried perceptual diff, image hashing, etc.:
• PDIFFER – Python wrapper for perceptualdiff tool (https://pypi.org/project/pdiffer/)
Problem - pip install pdiffer failed to install on Macs, for both the latest and older versions
• NEEDLE - Installed needle (https://needle.readthedocs.io/en/latest/). This one has an option to specify the comparison engine as perceptualdiff/ImageMagick instead of the default PIL.
Problem Found - It requires saving a baseline image first and then running assertions against it. It works when a baseline is saved and then compared, but I didn't find anything that would compare the screenshots with existing images.
• OPENCV – Histogram-based comparison. This converts the images to grayscale, builds histograms, and compares the histograms. Returns a value between -1 and 1 (-1 means not similar at all and 1 means highly similar) (https://www.pyimagesearch.com/2014/07/14/3-ways-compare-histograms-using-opencv-python/ )
Findings - I converted two images into histograms and compared them, which returned a value of 0.8 (somewhat similar).
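For reference, a minimal sketch of that histogram comparison (file names reused from the imagehash snippet below as placeholders) could look like this:
import cv2

img1 = cv2.imread("result.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("not-found-02.png", cv2.IMREAD_GRAYSCALE)

# Normalised grayscale histograms, so differing resolutions do not matter
hist1 = cv2.calcHist([img1], [0], None, [256], [0, 256])
hist2 = cv2.calcHist([img2], [0], None, [256], [0, 256])
cv2.normalize(hist1, hist1)
cv2.normalize(hist2, hist2)

# Correlation: 1 means identical histograms, -1 means completely dissimilar
print(cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL))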
Below is the code I tried using imagehash:
from PIL import Image
import imagehash

image_one = 'result.png'
img = Image.open(image_one)
image_one_hash = imagehash.whash(img)
print(image_one_hash)

image_two = 'not-found-02.png'
img2 = Image.open(image_two)
image_two_hash = imagehash.whash(img2)
print(image_two_hash)

similarity = image_one_hash - image_two_hash
print(similarity)
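One thing that might help with the resolution mismatch (my own assumption, not something established in the thread) is to resize both screenshots to a common size before hashing:
from PIL import Image
import imagehash

def hash_distance(path_one, path_two, size=(1024, 768)):
    # Bring both screenshots to the same dimensions before hashing,
    # so resolution differences alone do not inflate the distance
    img1 = Image.open(path_one).convert("L").resize(size)
    img2 = Image.open(path_two).convert("L").resize(size)
    return imagehash.whash(img1) - imagehash.whash(img2)

print(hash_distance("result.png", "not-found-02.png"))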

Is there a way to ignore EXIF orientation data when loading an image with PIL?

I'm getting some unwanted rotation when loading images using PIL. I'm loading image samples and their binary mask, so this is causing issues. I'm attempting to convert the code to use openCV instead, but this is proving sticky. I haven't seen any arguments in the documentation under Image.load(), but I'm hoping there's a workaround I just haven't found...
There is, but I haven't written it all up. Basically, if you load an image with EXIF "Orientation" field set, you can get that parameter.
First, a quick test: take this image from the PIL GitHub source, Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg, and run jhead on it; you can see the EXIF orientation is 6:
jhead /Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg
File name : /Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg
File size : 4951 bytes
File date : 2020:04:24 14:00:09
Resolution : 128 x 128
Orientation : rotate 90 <--- see here
JPEG Quality : 75
Now do that in PIL:
from PIL import Image
# Load that image
im = Image.open('/Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg')
# Get all EXIF data
e = im.getexif()
# Specifically get orientation
e.get(0x0112)
# prints 6
Now click on the source and you can work out how your image has been rotated and undo it.
Or, you could be completely unprofessional ;-) and create a function called SneakilyRemoveOrientationWhileNooneIsLooking(filename) and shell out (subprocess) to exiftool and remove the orientation with:
exiftool -Orientation= image.jpg
The author's "much simpler solution" described in the comment above is misleading, so I just want to clear that up.
Pillow does not automatically apply EXIF orientation transformation when reading an image. However, it has a method to do so: PIL.ImageOps.exif_transpose(image)
OpenCV automatically applies EXIF orientation when reading an image. You can disable this behavior by using the IMREAD_IGNORE_ORIENTATION flag.
I believe the author's true intention was to apply the EXIF orientation rather than ignore it, which is exactly what his solution accomplished.
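A short sketch of both behaviours, using the test image mentioned above:
from PIL import Image, ImageOps
import cv2

# Pillow: orientation is NOT applied on open; apply it explicitly if desired
pil_img = Image.open("hopper_orientation_6.jpg")
pil_upright = ImageOps.exif_transpose(pil_img)

# OpenCV: orientation IS applied by default; add the flag to ignore it
cv_raw = cv2.imread("hopper_orientation_6.jpg",
                    cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)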

Problems Converting Numpy/OpenCV Array Image into a Wand Image

I'm currently trying to perform a Polar to Cartesian Coordinate Image transformation, to display a raw sonar image into a 'fan-display'.
Initially I have a NumPy array image of type np.float64, which can be seen below:
After doing some searching, I came across this StackOverflow post Inverse transform an image from Polar to Cartesian in OpenCV with a very similar problem, in which the poster seemed to have solved his/her issue by using the Python Wand library (http://docs.wand-py.org/en/0.5.9/index.html), specifically using their set of Distortion functions.
However, when I tried to read the image in with Wand, I instead ended up with the image below, which seems smaller than the original. The odd thing is that img.size still reports the same dimensions as the original image's shape.
The code for this transformation can be seen below:
print(raw_img.shape)  #=> (369, 256)
wand_img = Image.from_array(raw_img.astype(np.uint8), channel_map="I")
display(wand_img)
print("Current image size", wand_img.size)  #=> "Current image size (369, 256)"
This is definitely quite problematic, as Wand will then produce the wrong 'fan image'. Is anybody familiar with this kind of problem with the Wand library, and if so, what is the recommended way to fix it?
If this issue isn't resolved soon, my backup plan is to use OpenCV's cv::remap function (https://docs.opencv.org/4.1.2/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121). The problem with this is that I'm not sure which mapping arrays (i.e. map_x and map_y) to use to perform the polar-to-Cartesian transformation, as using a mapping matrix that implements the transformation equations below:
r = polar_distances(raw_img)
x = r * cos(theta)
y = r * sin(theta)
didn't seem to work and instead threw out errors from OpenCV as well.
Any kind of help and insight into this issue is greatly appreciated. Thank you!
- NickS
EDIT: I've tried another example image as well, and it still shows a similar problem. First, I imported the image into Python using OpenCV with these lines of code:
import matplotlib.pyplot as plt
from wand.image import Image
from wand.display import display
import cv2
img = cv2.imread("Test_Img.jpg")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.figure()
plt.imshow(img_rgb)
plt.show()
which showed the following display as a result:
However, as I continued and tried to open the img_rgb object with Wand, using the code below:
wand_img = Image.from_array(img_rgb)
display(wand_img)
I'm getting the following result instead.
I tried to open the image using wand.image.Image() on the file directly, which is able to display the image correctly when using display() function, so I believe that there isn't anything wrong with the wand library installation on the system.
Is there a step I'm missing that is required to convert the NumPy array into a Wand Image? If so, what would it be and what is the suggested method to do so?
Please keep in mind that the NumPy-to-Wand conversion is crucial here: the raw sonar images are stored as binary data, so NumPy is required to turn them into proper images.
Is there a step I'm missing that is required to convert the NumPy array into a Wand Image?
No, but there is a bug in Wand's Numpy implementation in Wand 0.5.x. The shape of OpenCV's ndarray is (ROWS, COLUMNS, CHANNELS), but Wand's ndarray is (WIDTH, HEIGHT, CHANNELS). I believe this has been fixed for the future 0.6.x releases.
If so, what would it be and what is the suggested method to do so?
Swap the values in img_rgb.shape before passing to Wand.
img_rgb.shape = (img_rgb.shape[1], img_rgb.shape[0], img_rgb.shape[2],)
with Image.from_array(img_rgb) as img:
    display(img)

Align the Images properly

Hi, I am trying to extract only the handwritten data from an image. For that I took an empty form image and a filled one, and then I am using ImageChops.difference to get the data out of it.
The problem right now is with the alignment of the images; the two are not equally aligned in terms of depth, so the results are not correct.
from PIL import Image, ImageChops

def compare_images(path_one, path_two, diff_save_location):
    """
    Compares two images and saves a diff image if there
    is a difference.
    :param path_one: The path to the first image
    :param path_two: The path to the second image
    """
    image_one = Image.open(path_one).convert('LA')
    image_two = Image.open(path_two).convert('LA')
    diff = ImageChops.difference(image_one, image_two)
    if diff.getbbox():
        diff.convert('RGB').save(diff_save_location)

if __name__ == '__main__':
    compare_images('images/blank.jpg',
                   'images/filled.jpg',
                   'images/diff.jpg')
This is the result which I got.
The result which I am looking for:
Can anyone help me with this?
Thanks.
This site may be helpful: https://www.learnopencv.com/image-alignment-feature-based-using-opencv-c-python/ . The main idea is to first detect keypoints using SIFT, SURF or other algorithms in both images; then match the keypoints from the empty image with the keypoints from the handwritten image to get a homography matrix; then use this matrix to align the two images.
After image alignment, post processing may be needed due to illumination or noise.
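A minimal sketch of that approach, using ORB (which ships with the default OpenCV build) in place of SIFT/SURF and the file names from the question, might look like this:
import cv2
import numpy as np

blank = cv2.imread("images/blank.jpg", cv2.IMREAD_GRAYSCALE)
filled = cv2.imread("images/filled.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and descriptors in both images
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(filled, None)
kp2, des2 = orb.detectAndCompute(blank, None)

# Match descriptors and keep the strongest matches
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

# Estimate a homography from the matched keypoints
src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

# Warp the filled form onto the blank form before taking the difference
aligned = cv2.warpPerspective(filled, H, (blank.shape[1], blank.shape[0]))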
