Hand-filled character-per-box form
I want to automate a process in which I receive hand-filled, character-per-box type forms in image format and need to extract the text from them. A box surrounds each letter, and I have to extract all the text from the form image.
You can select contours by size, fit a rotated rectangle, and apply the inverse affine transform.
import cv2
import numpy as np

img = cv2.imread('4YAry.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# convert to binary image
thresh = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)[1]
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if abs(w - 345) < 10:  # box width is 345 px
        # fit a rotated rectangle and warp it back to an axis-aligned image
        rect = cv2.minAreaRect(cnt)
        box = cv2.boxPoints(rect)
        srcTri = np.array([box[1], box[0], box[2]]).astype(np.float32)
        dstTri = np.array([[0, 0], [0, rect[1][1]], [rect[1][0], 0]]).astype(np.float32)
        warp_mat = cv2.getAffineTransform(srcTri, dstTri)
        warp_dst = cv2.warpAffine(img, warp_mat, (int(rect[1][0]), int(rect[1][1])))
        # overwrite the box borders with white rectangles
        N = 14
        s = 0.99 * warp_dst.shape[1] / N  # tune rectangle positions
        for i in range(N):
            warp_dst = cv2.rectangle(warp_dst, (2 + int(i * s), 2),
                                     (2 + int((i + 1) * s), warp_dst.shape[0] - 3),
                                     (255, 255, 255), 2)
        cv2.imwrite('chars.png', warp_dst)
Using, for instance, the Hough transform, detect the top and bottom edges and the vertical separations. Validate the separations by checking that they run from top to bottom. The horizontal lines will be more reliable and accurate; you can use their direction for deskewing if necessary.
After doing that, you will have missing separations and false ones. Using some heuristics, try to find the correct pitch and detect the false positives and false negatives. Then you can extract the content of the individual boxes, or erase the edges.
This process cannot be perfect; some characters will be damaged.
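A minimal sketch of the line-detection step, assuming a form image named 'form.jpg' (a placeholder) and thresholds/tolerances that would need tuning for the real scans:

import cv2
import numpy as np

img = cv2.imread('form.jpg')  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# probabilistic Hough transform: returns line segments as (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)

horizontal, vertical = [], []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 5 or abs(angle) > 175:   # near-horizontal: top/bottom edges
            horizontal.append((x1, y1, x2, y2))
        elif 85 < abs(angle) < 95:               # near-vertical: candidate separations
            vertical.append((x1, y1, x2, y2))

# next: keep only vertical candidates that span most of the box height,
# estimate the pitch from the gaps between their x positions, and fill in
# missing separations / discard false ones before cutting out each character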
From https://www.pyimagesearch.com/2018/07/19/opencv-tutorial-a-guide-to-learn-opencv/
I'm able to extract the contours and write them out as files.
For example, I have a photo with some scribbled text: "in there".
I've been able to extract the letters as separate files, but I want these letter files to have the same width and height. For example, the widths of "i" and "r" will differ. In that case I want to append (any black/white pixels) to the right of the "i" image so its width becomes the same as that of "r".
How can I do this in Python? I just want to increase the size of the photo (not resize its contents).
My code looks something like this:
# find contours (i.e., outlines) of the foreground objects in the
# thresholded image
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
                        cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
output = image.copy()
ROI_number = 0
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    ROI = image[y:y+h, x:x+w]
    file = 'ROI_{}.png'.format(ROI_number)
    cv2.imwrite(file, ROI)
    ROI_number += 1  # advance so each ROI gets its own file
Here are a couple of other ways to do that in Python/OpenCV, using cv2.copyMakeBorder() to extend the border to the right by 50 pixels. The first simply extends the border by replication. The second extends it with the mean (average) blue background color, using a mask to select only the blue background pixels.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('i.png')
# get mask of background pixels (for result2b only)
lowcolor = (232,221,163)
highcolor = (252,241,183)
mask = cv2.inRange(img, lowcolor, highcolor)
# get average color of background using mask on img (for result2b only)
mean = cv2.mean(img, mask)[0:3]
color = (mean[0],mean[1],mean[2])
# extend image to the right by 50 pixels
result = img.copy()
result2a = cv2.copyMakeBorder(result, 0,0,0,50, cv2.BORDER_REPLICATE)
result2b = cv2.copyMakeBorder(result, 0,0,0,50, cv2.BORDER_CONSTANT, value=color)
# view result
cv2.imshow("img", img)
cv2.imshow("mask", mask)
cv2.imshow("result2a", result2a)
cv2.imshow("result2b", result2b)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save result
cv2.imwrite("i_extended2a.jpg", result2a)
cv2.imwrite("i_extended2b.jpg", result2b)
Replicated Result:
Average Background Color Result:
In Python/OpenCV/Numpy you create a new image of the size and background color you want. Then you use numpy slicing to insert the old image into the new one. For example:
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('i.png')
ht, wd, cc= img.shape
# create new image of desired size (extended by 50 pixels in width) and desired color
ww = wd+50
hh = ht
color = (242,231,173)
result = np.full((hh,ww,cc), color, dtype=np.uint8)
# copy img into the new image at offsets yy=0, xx=0
yy=0
xx=0
result[yy:yy+ht, xx:xx+wd] = img
# view result
cv2.imshow("result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
# save result
cv2.imwrite("i_extended.jpg", result)
I just started learning TensorFlow and am working on the basic classification tutorial from the official documentation.
Basic Classification Tutorial
Below is the relevant piece of code:
def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array[i], true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
Below are sample results on the test data.
Case 1:
Case 2:
Even though the model predicted the class with 100% confidence, why did it display the result in red?
The predicted label does not show any other classes.
There is a bug in the code.
"i" will for predictions_array, take the first array element(which is an array). This part is ok, but it is always going to be index 0 you need for the prediction array. Two ways to fix this: either pass it as "predictions_array[0]" when calling like I did below. Or modify the function to include "predictions_array = predictions_array[0]"
Since "i" must be 0 for the predictions array it will in the original code always check test_labels[0]. This will then for all cases when you predict something else than 9 give a red (since it thinks its a wrong prediction). Therefore passing i as the index of the test image will give you the correct label.
Suggestion of a modified function:
def plot_value_array(i, predictions_array, true_label):
    print(true_label)
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')
And the modified call, where 1 is the index of the image I am testing in this case (make this a variable so you don't have to type it twice when testing).
In other words: since img = test_images[1], I have to pass 1 to the function.
plot_value_array(1, predictions_single[0], test_labels)
plt.xticks(range(10), class_names, rotation=45)
plt.show()
Can anyone please help me with this? When I use this function it works for red and yellow signal images but not for green: the green signal images come out all black. Any idea what's wrong? Surprisingly, if I use BGR2HSV instead, it shows the green signal images but the other two are black. I'm using Matplotlib to load the images, so I assume they are RGB by default.
def mask(rgb_image):
    hsv_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
    ## mask of red color range 1
    red_mask1 = cv2.inRange(hsv_image, (0,20,0), (10,255,255))
    ## mask of red color range 2
    red_mask2 = cv2.inRange(hsv_image, (170,20,0), (180,255,255))
    ## mask of green
    green_mask = cv2.inRange(hsv_image, (40,0,0), (80,255,255))
    ## mask of yellow
    yellow_mask = cv2.inRange(hsv_image, (10,30,100), (30,255,255))
    ## final mask
    mask1 = cv2.bitwise_or(red_mask1, red_mask2)
    mask2 = cv2.bitwise_or(mask1, yellow_mask)
    mask3 = cv2.bitwise_or(mask2, green_mask)
    target = cv2.bitwise_and(rgb_image, rgb_image, mask=mask3)
    plt.imshow(target)
Code used to read image:
def load_dataset(image_dir):
    im_list = []
    image_types = ["red", "yellow", "green"]
    for im_type in image_types:
        for file in glob.glob(os.path.join(image_dir, im_type, "*")):
            im = mpimg.imread(file)
            if im is not None:
                im_list.append((im, im_type))
    return im_list
It worked: my range for green was incorrect. It should be lower (80,20,20) and upper (170,255,255).
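For reference, a minimal sketch of the corrected range as a standalone mask (the helper name and standalone form are mine, not from the original post; it assumes RGB input as loaded above):

import cv2

def green_mask(rgb_image):
    # corrected green range reported above; OpenCV hue runs 0-179
    hsv_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
    return cv2.inRange(hsv_image, (80, 20, 20), (170, 255, 255))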
There is an example here of how to create a multi-colored text title.
However, I want to apply this to a plot that already has a figure in it.
For example, if I apply it to this (the same code as in the example, minus a few extras and with another figure)...:
plt.rcdefaults()
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import transforms

fig = plt.figure(figsize=(4,3), dpi=300)

def rainbow_text(x,y,ls,lc,**kw):
    t = plt.gca().transData
    fig = plt.gcf()
    plt.show()

    #horizontal version
    for s,c in zip(ls,lc):
        text = plt.text(x,y," "+s+" ",color=c, transform=t, **kw)
        text.draw(fig.canvas.get_renderer())
        ex = text.get_window_extent()
        t = transforms.offset_copy(text._transform, x=ex.width, units='dots')

plt.figure()

rainbow_text(0.5,0.5,"all unicorns poop rainbows ! ! !".split(),
             ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black'],
             size=40)
...the result is 2 plots with the title enlarged.
This sort of makes sense to me because I'm using plt. two times.
But how do I integrate it so that it only refers to the first instance of plt. in creating the title?
Also, about this line:
t = transforms.offset_copy(text._transform, x=ex.width, units='dots')
I notice it can alter the spacing between words, but when I play with the values of x, results are not predictable (spacing is inconsistent between words).
How can I meaningfully adjust that value?
And finally, where it says "units='dots'", what are the other options? Are 'dots' 1/72nd of an inch (and is that the default for Matplotlib?)?
How can I convert units from dots to inches?
Thanks in advance!
In fact the bounding box of the text comes in units unlike the ones used, for example, in a scatter plot. Text is a different kind of object that gets redrawn if you resize the window or change the aspect ratio. With a stabilized window you can ask for the coordinates of the bounding box in plot units and build your colored text that way:
a = "all unicorns poop rainbows ! ! !".split()
c = ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black']

f = plt.figure(figsize=(4,3), dpi=120)
ax = f.add_subplot(111)
r = f.canvas.get_renderer()

space = 0.1
w = 0.5
counter = 0
for i in a:
    t = ax.text(w, 1.2, a[counter],color=c[counter],fontsize=12,ha='left')
    transf = ax.transData.inverted()
    bb = t.get_window_extent(renderer=f.canvas.renderer)
    bb = bb.transformed(transf)
    w = w + bb.xmax-bb.xmin + space
    counter = counter + 1

plt.ylim(0.5,2.5)
plt.xlim(0.6,1.6)
plt.show()
This results in:
This, however, is still not ideal, since you need to keep controlling the size of your plot axes to obtain the correct spaces between words. This is somewhat arbitrary, but if you build your program with that control in place it is feasible to use plot units to achieve your intended purpose.
ORIGINAL POST:
plt. is just the call to the library. In truth you are creating an instance of plt.figure in the global scope (so it can be seen locally in the function). Because of this you are overwriting the figure, since you use the same name for the variable (so it is just one single instance in the end). To solve this, try controlling the names of your figure instances. For example:
import matplotlib.pyplot as plt
#%matplotlib inline
from matplotlib import transforms

fig = plt.figure(figsize=(4,3), dpi=300)
#plt.show(fig)

def rainbow_text(x,y,ls,lc,**kw):
    t = plt.gca().transData
    figlocal = plt.gcf()

    #horizontal version
    for s,c in zip(ls,lc):
        text = plt.text(x,y," "+s+" ",color=c, transform=t, **kw)
        text.draw(figlocal.canvas.get_renderer())
        ex = text.get_window_extent()
        t = transforms.offset_copy(text._transform, x=ex.width, units='dots')

    plt.show(figlocal) #plt.show((figlocal,fig))

#plt.figure()
rainbow_text(0.5,0.5,"all unicorns poop rainbows ! ! !".split(),
             ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black'],
             size=40,)
I've commented out several instructions, but notice that I give a different name to the figure local to the function (figlocal). Also notice that in my examples of show I control directly which figure should be shown.
As for your other questions, notice you can use other units, as described in the function documentation:
Return a new transform with an added offset.
args:
trans is any transform
kwargs:
fig is the current figure; it can be None if units are 'dots'
x, y give the offset
units is 'inches', 'points' or 'dots'
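As a quick illustration (my own sketch, not part of the original answer), the same offset_copy call works in inches or points instead of dots:

from matplotlib import transforms
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# 'points' are 1/72 inch; 'dots' are display dots (pixels), so they scale with the figure dpi
shifted = transforms.offset_copy(ax.transData, fig=fig, x=0.5, y=0.0, units='inches')
ax.text(0.2, 0.5, 'offset by half an inch', transform=shifted)
plt.show()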
EDIT: Apparently there is some kind of problem with the extents of the bounding box for text, which does not give the correct width of the word, and thus the space between words is not stable. My advice is to use the LaTeX functionality of Matplotlib to write the colors in the same string (so there is only one call to plt.text). You can do it like this:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('pgf')
from matplotlib import rc
rc('text',usetex=True)
rc('text.latex', preamble=r'\usepackage{color}')

a = "all unicorns poop rainbows ! ! !".split()
c = ['red', 'orange', 'brown', 'green', 'blue', 'purple', 'black']

st = ''
for i in range(len(a)):
    st = st + r'\textcolor{'+c[i]+'}{'+a[i]+'}'

plt.text(0.5,0.5,st)
plt.show()
This, however, is not an ideal solution. The reason is that you need to have LaTeX installed, including the necessary packages (notice I'm using the color package). Take a look at Yann's answer to this question: Partial coloring of text in matplotlib
@armatita: I think your answer actually does what I need. I thought I needed display coordinates instead, but it looks like I can just use axis 1 coordinates, if that's what this is (I'm planning on using multiple axes via subplot2grid). Here's an example:
import matplotlib.pyplot as plt
%matplotlib inline
dpi=300
f_width=4
f_height=3
f = plt.figure(figsize=(f_width,f_height), dpi=dpi)
ax1 = plt.subplot2grid((100,115), (0,0), rowspan=95, colspan=25)
ax2 = plt.subplot2grid((100,115), (0,30), rowspan=95, colspan=20)
ax3 = plt.subplot2grid((100,115), (0,55), rowspan=95, colspan=35)
ax4 = plt.subplot2grid((100,115), (0,95), rowspan=95, colspan=20)
r = f.canvas.get_renderer()
t = ax1.text(.5, 1.1, 'a lot of text here',fontsize=12,ha='left')
space=0.1
w=.5
transf = ax1.transData.inverted()
bb = t.get_window_extent(renderer=f.canvas.renderer)
bb = bb.transformed(transf)
e = ax1.text(.5+bb.width+space, 1.1, 'text',fontsize=12,ha='left')
print(bb)
plt.show()
I'm not sure what you mean about controlling the axis size, though. Are you referring to using the code in different environments or to exporting the image at different sizes? I plan on using the image in the same environment and at the same size (per instance of this approach), so I think it will be okay. Does my logic make sense? I have a weak grasp of what's really going on, so I hope so. I would use it with a function (via splitting the text) like you did, but there are cases where I need to split on other characters (e.g. when a word in parentheses should be colored, but not the parentheses). Maybe I can just put a delimiter in there, like ','? I think I need a different form of .split(), because it didn't work when I tried it.
At any rate, if I can implement this across all of my charts, it will save me countless hours. Thank you so much!
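For what it's worth, a plain str.split() splits on whitespace, while str.split(',') splits on the comma delimiter, which is what the example below relies on (the string here is just an illustration):

s = "Group 1 ,vs. ,Group 2 (,sub1,) and (,sub2,)"
print(s.split())     # whitespace: ['Group', '1', ',vs.', ',Group', '2', '(,sub1,)', 'and', '(,sub2,)']
print(s.split(','))  # commas: ['Group 1 ', 'vs. ', 'Group 2 (', 'sub1', ') and (', 'sub2', ')']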
Here is an example where there are 2 plots and 2 instances of using the function for posterity:
import matplotlib.pyplot as plt
%matplotlib inline

dpi=300
f_width=4
f_height=3
f = plt.figure(figsize=(f_width,f_height), dpi=dpi)
ax1 = plt.subplot2grid((100,60), (0,0), rowspan=95, colspan=30)
ax2 = plt.subplot2grid((100,60), (0,30), rowspan=95, colspan=30)

f=f #Name for figure
string = str("Group 1 ,vs. ,Group 2 (,sub1,) and (,sub2,)").split(',')
color = ['black','red','black','green','black','blue','black']
xpos = .5
ypos = 1.2
axis=ax1
#No need to include space if included between delimiters above
#space = 0.1

def colortext(f,string,color,xpos,ypos,axis):
    #f=figure object name (i.e. fig, f, figure)
    r = f.canvas.get_renderer()
    counter = 0
    for i in string:
        t = axis.text(xpos, ypos, string[counter],color=color[counter],fontsize=12,ha='left')
        transf = axis.transData.inverted()
        bb = t.get_window_extent(renderer=f.canvas.renderer)
        bb = bb.transformed(transf)
        xpos = xpos + bb.xmax-bb.xmin
        counter = counter + 1

colortext(f,string,color,xpos,ypos,axis)

string2 = str("Group 1 part 2 ,vs. ,Group 2 (,sub1,) and (,sub2,)").split(',')
ypos2=1.1
colortext(f,string2,color,xpos,ypos2,axis)

plt.show()