Python 3.x - Slightly Precise Optical Character Recognition. What should I use? - python-3.x

import time
# cv2.cvtColor takes a numpy ndarray as an argument
import numpy as nm
import pytesseract
# importing OpenCV
import cv2
from PIL import ImageGrab, Image
bboxes = [(1469, 1014, 1495, 1029)]
def imToString():
# Path of tesseract executable
pytesseract.pytesseract.tesseract_cmd = 'D:\Program Files (x86)\Tesseract-OCR' + chr(92) + 'tesseract.exe'
while (True):
for box in bboxes:
# ImageGrab-To capture the screen image in a loop.
# Bbox used to capture a specific area.
cap = ImageGrab.grab(bbox=box)
# Converted the image to monochrome for it to be easily
# read by the OCR and obtained the output String.
tesstr = pytesseract.image_to_string(
cv2.cvtColor(nm.array(cap), cv2.COLOR_BGR2GRAY), lang='eng', config='digits') # ,lang='eng')
cap.show()
#input()
time.sleep(5)
print(tesstr)
# Calling the function
imToString()
It captures an image like this:
It isn't always two digits it can be one or three digits too.
Pytesseract returns values like: asi and oli
So, which Image To Text (OCR) Algorithm should I use for this problem? And, how to use that? I need a very precise value in this example it's 53 so the output should be around 50.

Related

pytesseract unable to recognize complex math formula from image

I am using pytesseract module in python, pytesseract recognizes text from image but it dosen't work on images that contain complex math formulas like under-root, derivation, integration math problem or equation.
code 2.py
# Import modules
from PIL import Image
import pytesseract
import cv2
# Include tesseract executable in your path
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Create an image object of PIL library
image = Image.open('23.jpg')
# img = cv2.imread('123.jpg')
# pass image into pytesseract module
# pytesseract is trained in many languages
image_to_text = pytesseract.image_to_string(image, lang='eng+equ')
image_to_text1 = pytesseract.image_to_string(image)
# Print the text
print(image_to_text)
# print(image_to_text1)
# workon digits
Output:
242/33
2x
2x+3X
2X+3x=4
2x?-3x +1=0
(x-1)(x+1) =x2-1
(x+2)/((x+3)(x-4))
7-4=3
V(x/2) =3
2xx—343=6x—3 (x#3)
Jeeta =e* +e
dy 2
S=2?-3
dz ¥
dy = (a? — 3)dx
Input image
To work with MATH language you should install the proper language for tesseract. In your case it is 'equ' from https://github.com/tesseract-ocr/tessdata/raw/3.04.00/equ.traineddata . The full list of available languages is described at https://tesseract-ocr.github.io/tessdoc/Data-Files
I'm not familiar with tesseract language install for windows. But there is a documentation at https://github.com/tesseract-ocr/tesseract/wiki :
If you want to use another language, download the appropriate training
data, unpack it using 7-zip, and copy the .traineddata file into the
'tessdata' directory, probably C:\Program Files\Tesseract-OCR\tessdata
And at first try to process your image with cli only ( without pyhton ), because cli has a full list of the options to tune.
I used this Code and it worked!
import re
import cv2
import pytesseract as tess
path = (r"C:\Users\10\AppData\Local\Tesseract-OCR\tesseract.exe")
tess.pytesseract.tesseract_cmd = path
png = "Downloads/m.png"
text = tess.image_to_string(png)
text.replace(" ", "")
pattern = re.compile("([0-9][=+-/*])")
equations = [x for x in text if bool(re.match(pattern, x))]
print(re.findall(r'(.*)', str(text))[0])

Error Opening PGM file with PIL and SKIMAGE

I have following Image file:
Image
I used PIL and Skimage to open it but I get following errors
First with PIL (tried with and without trucate option):
Code:
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
img = Image.open("image_output.pgm")
Erorr:
OSError: cannot identify image file 'image_output.pgm'
And with Skimage:
Code:
from skimage import io
img = io.imread("image_output.pgm")
Error:
OSError: cannot identify image file <_io.BufferedReader name='image_output.pgm'>
I can open the file with GUI applications like system photo viewer and Matlab.
How can I diagnose what is wrong with image? I compared the byte data with other PGM files which I can open in Python, but could not identify the difference.
Thanks.
Your file is P2 type PGM, which means it is in ASCII - you can view it in a normal text editor. It seems neither PIL, nor skimage want to read that, but are happy to read the corresponding P5 type which is identical except it is written in binary, rather than ASCII.
There are a few options...
1) You could use OpenCV to read it:
import cv2
im = cv2.imread('a.pgm')
2) You could convert it to P5 with ImageMagick and then read the output.pgm file with skimage or PIL:
magick input.pgm output.pgm
3) If adding OpenCV, or ImageMagick as a dependency is a real pain for you, it is possible to read a PGM image yourself:
#!/usr/bin/env python3
import re
import numpy as np
# Open image file, slurp the lot
with open('input.pgm') as f:
s = f.read()
# Find anything that looks like numbers
# Technically, there could be comments that should be ignored
l=re.findall(r'[0-9P]+',s)
# List "l" will contain: P5, width, height, 255, pixel1, pixel2, pixel3...
# Technically, if l[3]>255, you should change the type of the Numpy array to uint16, but that is not the case
w, h = int(l[1]), int(l[2])
# Make Numpy image from data
ni = np.array(l[4:],dtype=np.uint8).reshape((h,w))

how to show binary image data in python?

i can show the image using image.open, but how do i display from the binary data?
trying to use plot gets: ValueError: x and y can be no greater than 2-D, but have shapes (64,) and (64, 64, 3). this makes sense as that is what the result is supposed to be, but how do i display it?
import pathlib
import glob
from os.path import join
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
def parse(image): # my like ings, but with .png instead of .jpeg.
image_string = tf.io.read_file(image)
image = tf.image.decode_png(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize(image, [64, 64])
return image
root = "in/flower_photos/tulips"
path = join(root,"*.jpg")
files = sorted(glob.glob(path))
file=files[0]
image = Image.open(file)
image.show()
binary=parse(file)
print(type(binary))
# how do i see this?
#plt.plot(binary) # does not seem to work
#plt.show() # does not seem to work
found a nice pillow tutorial.
from matplotlib import image
from matplotlib import pyplot
from PIL import Image
# load the image
filename='Sydney-Opera-House.jpg'
im = Image.open(filename)
# summarize some details about the image
print(im.format)
print(im.mode)
print(im.size)
# show the image
#image.show()
# load image as pixel array
data = image.imread(filename)
# summarize shape of the pixel array
print(data.dtype)
print(data.shape)
# display the array of pixels as an image
pyplot.imshow(data)
pyplot.show()

getting error in path line, . guide please

I am training my system for texture analysis, using local binary pattern. here I am training images. taken code from somewhere. I am getting the error in defining the path of images.
# OpenCV bindings
import cv2
# To performing path manipulations
import os
# Local Binary Pattern function
from skimage.feature import local_binary_pattern
# To calculate a normalized histogram
from scipy.stats import itemfreq
from sklearn.preprocessing import normalize
# Utility package -- use pip install cvutils to install
import cvutils
# To read class from file
import csv
#Store the path of training images in train_images
train_images = cvutils.imlist ("'C:\Users\Babar\MATLAB\isp\training
images\fire-image1.jpg',
'C:\Users\Babar\MATLAB\isp\training images\fire-image2.jpg',
'C:\Users\Babar\MATLAB\isp\training images\fire-image3.jpg'")
# Dictionary containing image paths as keys and corresponding label as
value
train_dic = {'fire-image1':0,'fire-image2':0,'fire-image3':0}
with open('C:\Users\Babar\MATLAB\isp\class_train.txt', 'rb') as csvfile:
reader = csv.reader(csvfile, delimiter=' ')
for row in reader:
train_dic[row[0]] = int(row[1])

Why is the plot in librosa different?

I am currently trying using librosa to perform stfft, such that the parameter resembles a stfft process from a different framework (Kaldi).
The audio file is fash-b-an251
Kaldi does it using a sample frequency of 16 KHz, window_size = 400 (25ms), hop_length=160 (10ms).
The spectrogram extracted from this looks like this:
I then tried to do the same using librosa:
import numpy as np
import sys
import librosa
import os
import scipy
import matplotlib.pyplot as plt
from matplotlib import cm
# Input parameter
# relative_path_to_file
if len(sys.argv) < 1:
print "Missing Arguments!"
print "python spectogram_librosa.py path_to_audio_file"
sys.exit()
path = sys.argv[1]
abs_path = os.path.abspath(path)
spectogram_dnn = "/home/user/dnn/spectogram"
if not os.path.exists(spectogram_dnn):
print "spectogram_dnn folder didn't exist!"
os.makedirs(spectogram_dnn)
print "Created!"
y,sr = librosa.load(abs_path,sr=16000)
D = librosa.logamplitude(np.abs(librosa.core.stft(y, win_length=400, hop_length=160, window=scipy.signal.hanning,center=False)), ref_power=np.max)
librosa.display.specshow(D,sr=16000,hop_length=160, x_axis='time', y_axis='log', cmap=cm.jet)
plt.colorbar(format='%+2.0f dB')
plt.title('Log power spectrogram')
plt.show()
raw_input()
sys.exit()
Which is basically taken from here:
In which i've modified the stfft function such that it fits my parameters..
Problems is that is creates an entirely different plot..
So.. What am I doing wrong in librosa?.. Why is this plot so much different, from the one created in kaldi.
Am I missing something?
It has to do with the Hz scale. The one in the first image is linear while the one in the second image is logarithmic. You can fix it by either changing the scale in either of the images to match the other.

Resources