Python 3.6 OpenCV real-time capturing and processing

Python 3.6 OpenCV real-time capturing and processing - python-3.x

I am in the process of making a Real-Time feedback program using Python and OpenCV: a webcam will be observing a process and generating feedbacks based on what is happening.
Here is my code:
points = get_points_xml()
rect = cv2.boundingRect(np.array(points))
x, y, w, h = rect
cap = cv2.VideoCapture(0)
while (True):
ret, frame = cap.read()
cropped = frame[y: y + h, x: x + w]
ycb = cv2.cvtColor(cropped.copy(), cv2.COLOR_BGR2YCrCb)
y, cr, br = cv2.split(ycb)
blur = cv2.blur(y, (5, 5))
size = blur.size
ret, thresh = cv2.threshold(blur, 60, 255, cv2.THRESH_BINARY)
saturated, area_sat = get_saturated(thresh)
print(saturated, "pixels", area_sat, "cm2")
ret, thresh = cv2.threshold(blur, 80, 140, cv2.THRESH_BINARY_INV)
empty, area_empty = get_saturated(thresh)
print(empty, "pixels", area_empty, "cm2")
unsaturated = size - saturated - empty
area_unsat = unsaturated/cm2
print(unsaturated, "pixels", area_unsat, "cm2")
if (cv2.waitKey(1) & 0xFF == ord('q')):
break
cap.release()
cv2.destroyAllWindows()
Executing the line
cropped = frame[y: y + h, x: x + w]
results in the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
This is weird, because in the 'first round' of the while loop my program is working fine, then in the 'second round' of the while loop it prompts me the above mentioned error.
What could be causing this?

You should provide any relevant information when you ask a question.
A possible answer to your question could be as follows :
Change this line in your code:
cropped = frame[y: y + h, x: x + w]
To
cropped = frame[int(round(y)):int(round(y + h)), int(round(x)):int(round(x + w))]
In this way, you can make sure the values you used to crop the frame are all integers.

Related

Python OpenCv2 place image over face found

I am loading several images will go over my face and I am having difficulty getting the image to go over the square for face created. I have looked at a many resources , but for some reason I am receiving an error when attempting to follow their method.
Every time I do so , I receive an error
ValueError: could not broadcast input array from shape (334,334,3) into shape (234,234,3)
I think the images might be too large, however I tried to resize them to see if they will fit to no avail.
here is my code:
import cv2
import sys
import logging as log
import datetime as dt
from time import sleep
import os
import random
from timeit import default_timer as timer
cascPath = "haarcascade_frontalface_default.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
#log.basicConfig(filename='webcam.log',level=log.INFO)
video_capture = cv2.VideoCapture(0)
anterior = 0
#s_img = cv2.imread("my.jpg")
increment = 0
for filename in os.listdir("Faces/"):
if filename.endswith(".png"):
FullFile = (os.path.join("Faces/", filename))
#ret, frame = video_capture.read()
frame = cv2.imread(FullFile, cv2.IMREAD_UNCHANGED)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = faceCascade.detectMultiScale( gray,scaleFactor=1.1, minNeighbors=5, minSize=(30, 30) )
edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 9)
for (x, y, w, h) in faces:
roi_color = frame[y:( y ) + ( h ), x:x + w]
status = cv2.imwrite('export/faces_detected'+ str( increment ) +'.png', roi_color)
increment = increment + 1
else:
continue
masks = []
for filename in os.listdir("export/"):
if filename.endswith(".png"):
FullFile = (os.path.join("export/", filename))
s_img = cv2.imread(FullFile)
masks.append(s_img)
Start = timer()
End = timer()
MasksSize = len(masks)
nrand = random.randint(0, MasksSize -1 )
increment = 0
while True:
if not video_capture.isOpened():
print('Unable to load camera.')
sleep(5)
pass
# Capture frame-by-frame
ret, frame = video_capture.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = faceCascade.detectMultiScale(
gray,
scaleFactor=1.1,
minNeighbors=5,
minSize=(30, 30)
)
edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 9)
# Draw a rectangle around the faces
for (x, y, w, h) in faces:
if (End - Start) > 3:
Start = timer()
End = timer()
nrand = random.randint(0, MasksSize -1 )
# -75 and +20 added to fit my face
cv2.rectangle(frame, (x, y - 75), (x+w, y+h+20), (0, 255, 0), 2)
s_img = masks[nrand]
increment = increment + 1
#maskresize = cv2.resize(s_img, (150, 150))
#frame[y:y+s_img.shape[0] , x:x+s_img.shape[1]] = s_img # problem occurs here with
# ValueError: could not broadcast input array from shape (334,334,3) into shape (234,234,3)
# I assume I am inserting somethign too big?
End = timer()
if anterior != len(faces):
anterior = len(faces)
#log.info("faces: "+str(len(faces))+" at "+str(dt.datetime.now()))
# Display the resulting frame
cv2.imshow('Video', frame)
#cv2.imshow('Video', cartoon)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# Display the resulting frame
cv2.imshow('Video', frame)
# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()

In the following line,
frame[y:y+s_img.shape[0] , x:x+s_img.shape[1]] = s_img
you are trying to attempt to assign s_img to frame[y:y+s_img.shape[0] , x:x+s_img.shape[1]] which are of different shapes.
You can check the shapes of the two by printing the shape (it will be the same as the shapes mentioned in the error).
Try reshaping s_img to the same shape and then try to assign.
Refer to this link:https://www.geeksforgeeks.org/image-resizing-using-opencv-python/
I used this function to resize the image to scale.
def image_resize(image, width = None, height = None, inter = cv2.INTER_AREA):
# initialize the dimensions of the image to be resized and
# grab the image size
dim = None
(h, w) = image.shape[:2]
# if both the width and height are None, then return the
# original image
if width is None and height is None:
return image
# check to see if the width is None
if width is None:
# calculate the ratio of the height and construct the
# dimensions
r = height / float(h)
dim = (int(w * r), height)
# otherwise, the height is None
else:
# calculate the ratio of the width and construct the
# dimensions
r = width / float(w)
dim = (width, int(h * r))
# resize the image
resized = cv2.resize(image, dim, interpolation = inter)
# return the resized image
return resized
Then later on called
r= image_resize(s_img, height = h, width=w)
frame[y:y+r.shape[0] , x:x+r.shape[1]] = r
Answer taken from here too:
Resize an image without distortion OpenCV

Delay in output video stream when using YOLOV3 Detection using OpenCV

This is my Code of Mask Detection using YOLOV3 weights created by me. Whenever I run my Program, I experience a delay in my output Video of detection. This is the code please have a look.
import cv2
import numpy as np
net = cv2.dnn.readNet("yolov3_custom_final.weights", "yolov3_custom.cfg")
with open("obj.name", "r") as f:
classes = f.read().splitlines()
cap = cv2.VideoCapture(0 + cv2.CAP_DSHOW)
while True:
ret, img = cap.read()
height, weight, _ = img.shape
blob = cv2.dnn.blobFromImage(img, 1 / 255, (416, 416), (0, 0, 0), swapRB=True, crop=False)
net.setInput(blob)
output = net.getUnconnectedOutLayersNames()
layers = net.forward(output)
box = []
confidences = []
class_ids = []
for out in layers:
for detection in out:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > 0.3:
centre_x = int(detection[0] * weight)
centre_y = int(detection[1] * height)
w = int(detection[2] * weight)
h = int(detection[3] * height)
x = int(centre_x - w / 2)
y = int(centre_y - h / 2)
box.append([x, y, w, h])
confidences.append(float(confidence))
class_ids.append(class_id)
indexes = np.array(cv2.dnn.NMSBoxes(box, confidences, 0.5, 0.4))
font = cv2.FONT_HERSHEY_PLAIN
colors = np.random.uniform(0, 255, size=(len(box), 3))
for i in indexes.flatten():
x, y, w, h = box[i]
label = str(classes[class_ids[i]])
confidence = str(round(confidences[i], 2))
color = colors[i]
cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
cv2.putText(img, label + "I" + confidence, (x, y + 20), font, 2, (255, 255, 255), 2)
cv2.imshow("Final", img)
if cv2.waitKey(1) & 0xff == ord("q"):
break
cap.release()
cv2.destroyAllWindows()
Can someone Please help me in this Issue or suggest a way to reduce the Lag in my Output videostream ?

As I have done some research over the Time, I have a found a Possible answer to this question. As I'm running my YOLO model in my local system which has no GPU, This is the factor that is causing a delay in the Output as it Processes a frame and takes another frame after completion.

What is the best way to extract text contained within a table in a pdf using python?

I'm constructing a program to extract text from a pdf, put it in a structured format, and send it off to a database. I have roughly 1,400 individual pdfs that all follow a similar format, but nuances in the verbiage and plan designs that the documents summarize make it tricky.
I've played around with a couple different pdf readers in python including tabula-py and pdfminer but none of them are quite getting to what I'd like to do. Tabula reads in all of the text very well, however it pulls everything as it explicitly lays horizontally, excluding the fact that some of the text is wrapped in a box. For example, if you open up the sample SBC I have attached where it reads "What is the overall deductible?" Tabula will read in "What is the overall $500/Individual or..." skipping the fact that the word "deductible" is really part of the first sentence. (Note the files I'm working with are pdfs but I've attached a jpeg because I couldn't figure out how to attach a pdf.)
import tabula
df = tabula.read_pdf(*filepath*, pandas_options={'header': None))
print(df.iloc[0][0])
print(df)
In the end, I'd really like to be able to parse out the text within each box so that I can better identify what values belong to deductible, out-of-pocket limts, copays/coinsurance, etc. I thought possibly some sort of OCR would allow me to recognize which parts of the PDF are contained in the blue rectangles and then pull the string from there, but I really don't know where to start with that.Sample SBC

#jpnadas In this case the code you copied from my answer in this post isn't really suitable because it addresses the case when a table doesn't have surrounding grid. That algorithm looks for repeating blocks of texts and tries to find a pattern that resembles a table heuristically.
But in this particular case the table does have the grid and by taking this advantage we can achieve a lot more accurate result.
The strategy is the following:
Increase image gamma to make the grid darker
Get rid of colour and apply Otsu thresholding
Find long vertical an horizontal lines in the image and create a mask from it using erode and dilate functions
Find the cell blocks in the mask using findContours function.
Find table objects
5.1 The rest can be as in the post about finding a table without the
grid: find table structure heuristically
5.2 Alternative approach could be using hierarchy returned by the findContours function. This approach is even more accurate and
allows to find multiple tables on a single image.
Having cell coordinates it's easy to extract certain cell image from the original image:
cell_image = image[cell_y:cell_y + cell_h, cell_x:cell_x + cell_w]
Apply OCR to each cell_image.
BUT! I consider the OpenCV approach as a last resort when you're not able to read the PDF's contents: for instance in case when a PDF contains raster image inside.
If it's a vector-based PDF and its contents are readable it makes more sense to find the table inside contents and just read the text from it instead of doing heavy 'OCR lifting'.
Here's the code for reference for more accurate table recognition:
import os
import imutils
import numpy as np
import argparse
import cv2
def gamma_correction(image, gamma = 1.0):
look_up_table = np.empty((1,256), np.uint8)
for i in range(256):
look_up_table[0,i] = np.clip(pow(i / 255.0, gamma) * 255.0, 0, 255)
result = cv2.LUT(image, look_up_table)
return result
def pre_process_image(image):
# Let's get rid of color first
# Applying gamma to make the table lines darker
gamma = gamma_correction(image, 2)
# Getting rid of color
gray = cv2.cvtColor(gamma, cv2.COLOR_BGR2GRAY)
# Then apply Otsu threshold to reveal important areas
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# inverting the thresholded image
return ~thresh
def get_horizontal_lines_mask(image, horizontal_size=100):
horizontal = image.copy()
horizontal_structure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontal_size, 1))
horizontal = cv2.erode(horizontal, horizontal_structure, anchor=(-1, -1), iterations=1)
horizontal = cv2.dilate(horizontal, horizontal_structure, anchor=(-1, -1), iterations=1)
return horizontal
def get_vertical_lines_mask(image, vertical_size=100):
vertical = image.copy()
vertical_structure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, vertical_size))
vertical = cv2.erode(vertical, vertical_structure, anchor=(-1, -1), iterations=1)
vertical = cv2.dilate(vertical, vertical_structure, anchor=(-1, -1), iterations=1)
return vertical
def make_lines_mask(preprocessed, min_horizontal_line_size=100, min_vertical_line_size=100):
hor = get_horizontal_lines_mask(preprocessed, min_horizontal_line_size)
ver = get_vertical_lines_mask(preprocessed, min_vertical_line_size)
mask = np.zeros((preprocessed.shape[0], preprocessed.shape[1], 1), dtype=np.uint8)
mask = cv2.bitwise_or(mask, hor)
mask = cv2.bitwise_or(mask, ver)
return ~mask
def find_cell_boxes(mask):
# Looking for the text spots contours
# OpenCV 3
# img, contours, hierarchy = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# OpenCV 4
contours = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = imutils.grab_contours(contours)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
image_width = mask.shape[1]
# Getting the texts bounding boxes based on the text size assumptions
boxes = []
for contour in contours:
box = cv2.boundingRect(contour)
w = box[2]
# Excluding the page box shape but adding smaller boxes
if w < 0.95 * image_width:
boxes.append(box)
return boxes
def find_table_in_boxes(boxes, cell_threshold=10, min_columns=2):
rows = {}
cols = {}
# Clustering the bounding boxes by their positions
for box in boxes:
(x, y, w, h) = box
col_key = x // cell_threshold
row_key = y // cell_threshold
cols[row_key] = [box] if col_key not in cols else cols[col_key] + [box]
rows[row_key] = [box] if row_key not in rows else rows[row_key] + [box]
# Filtering out the clusters having less than 2 cols
table_cells = list(filter(lambda r: len(r) >= min_columns, rows.values()))
# Sorting the row cells by x coord
table_cells = [list(sorted(tb)) for tb in table_cells]
# Sorting rows by the y coord
table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))
return table_cells
def build_vertical_lines(table_cells):
if table_cells is None or len(table_cells) <= 0:
return [], []
max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]
max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
max_y = max_last_row_height_box[1] + max_last_row_height_box[3]
hor_lines = []
ver_lines = []
for box in table_cells:
x = box[0][0]
y = box[0][1]
hor_lines.append((x, y, max_x, y))
for box in table_cells[0]:
x = box[0]
y = box[1]
ver_lines.append((x, y, x, max_y))
(x, y, w, h) = table_cells[0][-1]
ver_lines.append((max_x, y, max_x, max_y))
(x, y, w, h) = table_cells[0][0]
hor_lines.append((x, max_y, max_x, max_y))
return hor_lines, ver_lines
if __name__ == "__main__":
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to images directory")
args = vars(ap.parse_args())
in_file = args["image"]
filename_base = in_file.replace(os.path.splitext(in_file)[1], "")
img = cv2.imread(in_file)
pre_processed = pre_process_image(img)
# Visualizing pre-processed image
cv2.imwrite(filename_base + ".pre.png", pre_processed)
lines_mask = make_lines_mask(pre_processed, min_horizontal_line_size=1800, min_vertical_line_size=500)
# Visualizing table lines mask
cv2.imwrite(filename_base + ".mask.png", lines_mask)
cell_boxes = find_cell_boxes(lines_mask)
cells = find_table_in_boxes(cell_boxes)
# apply OCR to each cell rect here
# the cells array contains cell coordinates in tuples (x, y, w, h)
hor_lines, ver_lines = build_vertical_lines(cells)
# Visualize the table lines
vis = img.copy()
for line in hor_lines:
[x1, y1, x2, y2] = line
cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
for line in ver_lines:
[x1, y1, x2, y2] = line
cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
cv2.imwrite(filename_base + ".result.png", vis)
Some parameters are hard-coded:
page size threshold - 0.95
min horizontal line size - 1800 px
min vertical line size - 500 px
You can provide them as configurable parameters or make them relative to image size.
Results:

I think that the best way to do what you need is to find and isolate the cells in the file and then apply OCR to each individual cell.
There are a number of solutions in SO for that, I got the code from this answer and played around a little with the parameters to get the output below (not perfect yet, but you can tweak it a little bit yourself).
import os
import cv2
import imutils
# This only works if there's only one table on a page
# Important parameters:
# - morph_size
# - min_text_height_limit
# - max_text_height_limit
# - cell_threshold
# - min_columns
def pre_process_image(img, save_in_file, morph_size=(23, 23)):
# get rid of the color
pre = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu threshold
pre = cv2.threshold(pre, 250, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# dilate the text to make it solid spot
cpy = pre.copy()
struct = cv2.getStructuringElement(cv2.MORPH_RECT, morph_size)
cpy = cv2.dilate(~cpy, struct, anchor=(-1, -1), iterations=1)
pre = ~cpy
if save_in_file is not None:
cv2.imwrite(save_in_file, pre)
return pre
def find_text_boxes(pre, min_text_height_limit=20, max_text_height_limit=120):
# Looking for the text spots contours
contours, _ = cv2.findContours(pre, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# Getting the texts bounding boxes based on the text size assumptions
boxes = []
for contour in contours:
box = cv2.boundingRect(contour)
h = box[3]
if min_text_height_limit < h < max_text_height_limit:
boxes.append(box)
return boxes
def find_table_in_boxes(boxes, cell_threshold=100, min_columns=3):
rows = {}
cols = {}
# Clustering the bounding boxes by their positions
for box in boxes:
(x, y, w, h) = box
col_key = x // cell_threshold
row_key = y // cell_threshold
cols[row_key] = [box] if col_key not in cols else cols[col_key] + [box]
rows[row_key] = [box] if row_key not in rows else rows[row_key] + [box]
# Filtering out the clusters having less than 2 cols
table_cells = list(filter(lambda r: len(r) >= min_columns, rows.values()))
# Sorting the row cells by x coord
table_cells = [list(sorted(tb)) for tb in table_cells]
# Sorting rows by the y coord
table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))
return table_cells
def build_lines(table_cells):
if table_cells is None or len(table_cells) <= 0:
return [], []
max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]
max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
max_y = max_last_row_height_box[1] + max_last_row_height_box[3]
hor_lines = []
ver_lines = []
for box in table_cells:
x = box[0][0]
y = box[0][1]
hor_lines.append((x, y, max_x, y))
for box in table_cells[0]:
x = box[0]
y = box[1]
ver_lines.append((x, y, x, max_y))
(x, y, w, h) = table_cells[0][-1]
ver_lines.append((max_x, y, max_x, max_y))
(x, y, w, h) = table_cells[0][0]
hor_lines.append((x, max_y, max_x, max_y))
return hor_lines, ver_lines
if __name__ == "__main__":
in_file = os.path.join(".", "test.jpg")
pre_file = os.path.join(".", "pre.png")
out_file = os.path.join(".", "out.png")
img = cv2.imread(os.path.join(in_file))
pre_processed = pre_process_image(img, pre_file)
text_boxes = find_text_boxes(pre_processed)
cells = find_table_in_boxes(text_boxes)
hor_lines, ver_lines = build_lines(cells)
# Visualize the result
vis = img.copy()
# for box in text_boxes:
# (x, y, w, h) = box
# cv2.rectangle(vis, (x, y), (x + w - 2, y + h - 2), (0, 255, 0), 1)
for line in hor_lines:
[x1, y1, x2, y2] = line
cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
for line in ver_lines:
[x1, y1, x2, y2] = line
cv2.line(vis, (x1, y1), (x2, y2), (0, 0, 255), 1)
cv2.imwrite(out_file, vis)

unable to extract white color text in tesseract

I am developing an app to extract text from Images using OpenCV and Tesseract. I stuck in a place when Images contain text in White color and in case of Images having mix of White and some other color.
I am easily able to extract text if it is not white color, but it dosen't work in case of Text is White. I am using Tesseract for extracting data from text.
Below is my code for image processing before passing to tesseract:
image = cv2.imread(imgPath)
filename = getFileName()
img2gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
#cv2.imwrite('python_scripts/temp/img2gray_' + filename, img2gray)
""" below removing shadow from image """
rgb_planes = cv2.split(img2gray)
result_planes = []
result_norm_planes = []
for plane in rgb_planes:
dilated_img = cv2.dilate(plane, np.ones((1, 1), np.uint8))
bg_img = cv2.medianBlur(dilated_img, 21)
diff_img = 255 - cv2.absdiff(plane, bg_img)
norm_img = diff_img
norm_img = cv2.normalize(diff_img, norm_img, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1)
result_planes.append(diff_img)
result_norm_planes.append(norm_img)
result = cv2.merge(result_planes)
result_norm = cv2.merge(result_norm_planes)
img2gray = result
origImage = image.copy()
_, new_img = cv2.threshold(img2gray, 80, 255, cv2.THRESH_BINARY) # for black text , cv.THRESH_BINARY_INV
# to manipulate the orientation of dilution , large x means horizonatally dilating more, large y means vertically dilating more
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (13, 13))
dilated = cv2.dilate(new_img, kernel, iterations=9) # dilate , more the iteration more the dilation
cv2.imwrite('python_scripts/temp/dilated_' + filename, dilated)
_, contours, _ = cv2.findContours(dilated, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE) # get contours
(imgH, imgW) = img2gray.shape[:2]
#logger.info("imgH: " + str(imgH) + ", imgW: " + str(imgW))
ctr = 0
contoursSort = {}
for contour in contours:
# to sort contours from top to bottom
[x, y, w, h] = cv2.boundingRect(contour)
contoursSort[y] = contour
orderedContours = collections.OrderedDict(sorted(contoursSort.items()))
for key, contour in orderedContours.items():
# get rectangle bounding contour
(x, y, w, h) = cv2.boundingRect(contour)
# Don't plot small false positives that aren't text
if w < 35 and h < 35:
continue
length = x + w
breadth = y + h
area = length * breadth
cv2.rectangle(origImage, (x, y), (x + w, y + h), (255, 255, 0), 2)
# don't consider contour which is touching the border
if x != 0 and y != 0 and x != imgW and y != imgH:
#logger.info("x: " + str(x) + ", y: " + str(y) + ", length: " + str(length)
# + ", breadth: " + str(breadth) + ", area: " + str(area))
croppedImg = origImage[y:(y + h), x:(x + w)]
croppedGray = cv2.cvtColor(croppedImg, cv2.COLOR_BGR2GRAY)
ctr = ctr + 1
th3 = cv2.adaptiveThreshold(croppedGray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,75,10)
#th3_height, th3_width = th3.shape[:2]
#th3 = cv2.resize(th3,(2*th3_width, 2*th3_height), interpolation = cv2.INTER_CUBIC)
th3 = cv2.GaussianBlur(th3,(5,5),0)
tessImgPath = 'python_scripts/temp/th3_' + str(ctr) + "_" + filename
cv2.imwrite(tessImgPath, th3)
tessText = runTesseract(tessImgPath)
os.remove(tessImgPath)
logger.info("tess text: " + str(tessText))
Please guide me how can i process image through OpenCV so that both
white and other color text can be extracted in single shot.
One way which i am trying now is to get the portion of white color in image, if it is less than other colors then do bitwise_xor of image and then pass it to Tesseract.
Thanks in advance

I am also working on text-extraction where tesseract failed to extract text with multiple colors.
It is not issue with white color. I tried extraction on your images. Tesseract gives the exact output for second image as the text contains only single color.
But the result for first image is not so good as half of the text is in black and another is in white with gray outline. so try this to get full white and black image:
import cv2
import numpy as np
# read image
img = cv2.imread("XDJFq.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 250, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 11)
# write results to disk
cv2.imwrite("Full_white.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
So, It makes the gray outline of white text to black, after this extract text from it.
The final result is
Coming together is a
BEGINNING...
. ‘keeping together is
PROGRESS...
working together is
SUCCESS...

Count the number of faces using dlib's facial landmark detector using OpenCV

Basically the code I'm writing needs to add a string in front of values being calculated every frame
For example:
Face 1: (eye aspect ratio), (mouth aspect ratio)
Face 2: (eye // //), (// // //)
... and so on, depending on how many faces are found using dlib's predictor.
The String needed here is "Face1: " |||| "Face2: " ... the other values are already found
Here's the code:
# Start capturing WebCam
#cap = cv2.VideoCapture(0)
framecount = 0
while True:
ret, frame = cap.read()
framecount += 1
frame = imutils.resize(frame, width = 1500)
if ret:
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
rects = detector(gray, 0)
for rect in rects:
x = rect.left()
y = rect.top()
x1 = rect.right() - x
y1 = rect.bottom() - y
landmarks = np.matrix([[p.x, p.y] for p in predictor(frame, rect).parts()])
left_eye = landmarks[LEFT_EYE_POINTS]
right_eye = landmarks[RIGHT_EYE_POINTS]
inner_mouth = landmarks[MOUTH_INNER_POINTS]
outer_mouth = landmarks[MOUTH_OUTLINE_POINTS]
left_eye_hull = cv2.convexHull(left_eye)
right_eye_hull = cv2.convexHull(right_eye)
inner_mouth_hull = cv2.convexHull(inner_mouth)
outer_mouth_hull = cv2.convexHull(outer_mouth)
# drawing the contours on frame
cv2.drawContours(frame, [left_eye_hull], -1, (0, 255, 0), 1)
cv2.drawContours(frame, [right_eye_hull], -1, (0, 255, 0), 1)
# Eye aspect ratio and Mouth aspect ratio
ear_left = eye_aspect_ratio(left_eye)
ear_right = eye_aspect_ratio(right_eye)
MAR = mouth_aspect_ratio(inner_mouth)
## This part below:
print(framecount," {:.2f}".format(ear_left), " {:.2f}".format(MAR))
print("Number of faces: ", len(rects))
The first print function will print out the frame of video and display the Eye aspect ratio and Mouth aspect ratio. And if there is more than one face, it will repeat the frame number more than once but will have different values.
So, it does what I need it to do, however I want to add a name before each of these values.
Second print function displays the number of faces present in a video, in my case its number 3 for three faces.
Example of output:
Output example
Example of output with Enumerate:
Output example enum

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Python 3.6 OpenCV real-time capturing and processing - python-3.x

Related

Python OpenCv2 place image over face found

Delay in output video stream when using YOLOV3 Detection using OpenCV

What is the best way to extract text contained within a table in a pdf using python?

unable to extract white color text in tesseract

Count the number of faces using dlib's facial landmark detector using OpenCV

Categories

Resources