Need help finding 2 separate contours instead of a combined contour in MICR code - python-3.x

I am running OCR on bank cheques using the pyimagesearch tutorial to detect the MICR code. The code in the tutorial detects group contours & character contours from a reference image containing the MICR symbols.
In the tutorial, when finding the contours for the symbol below,
the code uses a built-in Python iterator to iterate over the contours (here 3 separate contours), which are combined to give a single character for recognition purposes.
But in the cheque dataset that I use, the symbol appears at a low resolution.
The actual bottom of the cheque is:
which causes the iterator to treat contour-2 & contour-3 as a single contour. Because of this, the iterator also consumes the character following the above symbol (here '0') and prepares an incorrect template to match against the reference symbols. You can see the code below for better understanding.
I know that noise in the image is a factor here, but is it possible to reduce the noise and still find the exact contours needed to detect the symbol?
I tried noise reduction techniques like cv2.fastNlMeansDenoising & cv2.GaussianBlur before the cv2.findContours step, but contours 2 & 3 are still detected as a single contour instead of 2 separate contours.
I also tried altering the cv2.findContours parameters.
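For reference, this is roughly the preprocessing I tried before the contour step (a minimal sketch; the filename, the parameter values, and the OpenCV 4.x findContours return signature are assumptions, not my exact pipeline):
import cv2

# sketch of the denoise -> blur -> threshold -> findContours attempt
image = cv2.imread('cheque_bottom.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename
denoised = cv2.fastNlMeansDenoising(image, None, 10)
blurred = cv2.GaussianBlur(denoised, (5, 5), 0)
_, thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# OpenCV 4.x returns (contours, hierarchy); 3.x returns three values
cnts, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Even with this, contours 2 & 3 of the symbol come out merged.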
Below is the working code where the characters are iterated for better understanding of python builtin iterator:
def extract_digits_and_symbols(image, charCnts, minW=5, minH=10):
    # grab the internal Python iterator for the list of character
    # contours, then initialize the character ROI and location
    # lists, respectively
    charIter = charCnts.__iter__()
    rois = []
    locs = []

    # keep looping over the character contours until we reach the end
    # of the list
    while True:
        try:
            # grab the next character contour from the list, compute
            # its bounding box, and initialize the ROI
            c = next(charIter)
            (cX, cY, cW, cH) = cv2.boundingRect(c)
            roi = None

            # check to see if the width and height are sufficiently
            # large, indicating that we have found a digit
            if cW >= minW and cH >= minH:
                # extract the ROI
                roi = image[cY:cY + cH, cX:cX + cW]
                rois.append(roi)
                cv2.imshow('roi', roi)
                cv2.waitKey(0)
                locs.append((cX, cY, cX + cW, cY + cH))

            # otherwise, we are examining one of the special symbols
            else:
                # MICR symbols include three separate parts, so we
                # need to grab the next two parts from our iterator,
                # followed by initializing the bounding box
                # coordinates for the symbol
                parts = [c, next(charIter), next(charIter)]
                (sXA, sYA, sXB, sYB) = (np.inf, np.inf, -np.inf, -np.inf)

                # loop over the parts
                for p in parts:
                    # compute the bounding box for the part, then
                    # update our bookkeeping variables
                    # c = next(charIter)
                    # (cX, cY, cW, cH) = cv2.boundingRect(c)
                    # roi = image[cY:cY+cH, cX:cX+cW]
                    # cv2.imshow('symbol', roi)
                    # cv2.waitKey(0)
                    # roi = None
                    (pX, pY, pW, pH) = cv2.boundingRect(p)
                    sXA = min(sXA, pX)
                    sYA = min(sYA, pY)
                    sXB = max(sXB, pX + pW)
                    sYB = max(sYB, pY + pH)

                # extract the ROI
                roi = image[sYA:sYB, sXA:sXB]
                cv2.imshow('symbol', roi)
                cv2.waitKey(0)
                rois.append(roi)
                locs.append((sXA, sYA, sXB, sYB))

        # we have reached the end of the iterator; gracefully break
        # from the loop
        except StopIteration:
            break

    # return a tuple of the ROIs and locations
    return (rois, locs)
Edit: contours 2 & 3, not contours 1 & 2.

Try to find the right threshold value instead of using cv2.THRESH_OTSU. It seems it should be possible to find a suitable threshold from the provided example. If you can't find a threshold value that works for all images, you can try morphological closing on the threshold result with a structuring element that is 1 pixel wide.
Edit (steps):
For the threshold, you need to find an appropriate value by hand; in your image a threshold value of 100 seems to work:
i = cv.imread('image.png')
g = cv.cvtColor(i, cv.COLOR_BGR2GRAY)
_, tt = cv.threshold(g, 100, 255, cv.THRESH_BINARY_INV)
As for the closing variant:
_, t = cv.threshold(g, 0,255,cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
kernel = np.ones((12,1), np.uint8)
c = cv.morphologyEx(t, cv.MORPH_OPEN, kernel)
Note that I used import cv2 as cv. I also used opening instead of closing, since in the example the colors were inverted during thresholding.
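As a minimal follow-up sketch, the cleaned binary image c from above could then be fed straight to findContours (the OpenCV 4.x return signature is assumed here):
# external contours on the cleaned binary image 'c'
# (OpenCV 4.x returns (contours, hierarchy); 3.x returns three values)
cnts, _ = cv.findContours(c, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
# sort left-to-right by the x coordinate of each bounding box
cnts = sorted(cnts, key=lambda cnt: cv.boundingRect(cnt)[0])
# the transit symbol should now contribute 3 separate contours
print(len(cnts))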

Related

Optimizing the access and changing of value of SLIC superpixels

I'm trying to do semantic segmentation using a SLIC variant and want to create a mask for the original image where each segment is colored (according to its class) based on the available point-based annotations. If there is no point-based annotation in a segment, it is left as 0.
I currently have x, y points and their associated labels for an image, and a (slow) method that finds and colors the desired segments. I'm familiar with vectorization and the 'pythonic' way of doing things, but I can't seem to speed up this last for-loop and would love some advice or references on optimization. Thanks.
# Point-based annotations
annotation = pd.read_csv("a_dataframe.csv")  # [X, Y, Label]
color_label = {'class 1': 25, 'class 2': 50, 'class 3': 75}

# Uses CPU to create a single segmented image with the current params
slic = SlicAvx2(num_components=n_segments, compactness=n_compactness)
segmented_image = slic.iterate(cv2.cvtColor(each_image, cv2.COLOR_RGB2LAB))

# Finds the segments of interest and records their IDs
X = np.array(annotation.iloc[:, 0], dtype='uint8')
Y = np.array(annotation.iloc[:, 1], dtype='uint8')
L = np.array(annotation.iloc[:, 2], dtype='str')  # Labels
DS = segmented_image[X, Y]  # Desired Segments

# Empty mask; marks the segments of interest with the classes of the points in them
mask = np.zeros(each_image.shape[:2], dtype="uint8")

# Would ideally like to find a faster way of doing this
for (index, segVal) in enumerate(DS):
    mask[segmented_image == segVal] = color_label.get(L[index])
Here is essentially what I would like to replace that loop with:
[mask[segmented_image == s] for i, s in enumerate(DS)]
but I'm not able to assign the X, Y locations the appropriate label in mask. I thought it would be something similar to this:
[mask[segmented_image == s] for i, s in enumerate(DS)] = color_label.get(L[i])
but it appears that I'm trying to assign a color value to the lists I'm generating...
Are you looking for ind2rgb?
A way to take an indexed map (one index per SLIC segment, possibly the same index for multiple regions) and convert it to an RGB image based on a map from index to color.
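A minimal numpy sketch of that idea using the variables from the question (assuming the SLIC segment IDs are the usual small non-negative integers):
import numpy as np

# lookup table: segment ID -> gray value, 0 for unlabeled segments
lut = np.zeros(segmented_image.max() + 1, dtype=np.uint8)
lut[DS] = [color_label.get(l, 0) for l in L]

# one fancy-indexing pass over the whole label map replaces the per-segment loop
mask = lut[segmented_image]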

Looping program causes index # is out of bounds for axis #

I'm pretty new to Python and OpenCV, but I have a simple test program in mind using cv2 and random to make sure I understand how these libraries work.
I'm trying to create a program that effectively generates colored "snow", similar to what an old fashioned television shows when it has no signal.
Basically, I generate a random color with random.randint(-1,256) to get a value between 0 and 255. I do that three times and store each result in a different variable, randB/G/R. Then I do it twice more for the coordinates randX/Y, using img.shape to get the width and height for the max values.
I don't think my variables are being interpreted as strings. If I quickly break the loop and print my variables, no errors are shown. If I remove the randX and randY variables and specify fixed coordinates or a range of [X1:Y1, X2:Y2] it doesn't crash.
import cv2
import numpy as np
import random

img = cv2.imread('jake_twitch.png', cv2.IMREAD_COLOR)
height, width, channels = img.shape

while True:
    randB = random.randint(-1, 256)
    randG = random.randint(-1, 256)
    randR = random.randint(0, 256)
    randX = random.randint(0, width)
    randY = random.randint(0, height)
    img[randX, randY] = [randB, randG, randR]
    cv2.imshow('Snow', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.imwrite('Snow.png', img)
cv2.destroyAllWindows()
I would expect my code to run indefinitely, coloring pixels random colors within a "box" defined by the width and height variables from img.shape.
It seems to start doing that, but if the program runs for more than about a second, it crashes and spits out this error:
"IndexError: index 702 is out of bounds for axis 1 with size 702"
Your image is width pixels wide and height pixels high - but the corresponding indexes run from 0..width-1 and 0..height-1.
The randint function returns inclusive limits - so
random.randint(0,width)
might give you width ... which is 1 too big:
random.randint(a, b)
Return a random integer N such that a <= N <= b. Alias for randrange(a, b+1).
Use
randX = (random.randint(0,width-1))
randY = (random.randint(0,height-1))
instead.
Or change it to use random.randrange(0, width) or random.choice(range(width)) - both omit the upper limit value.

How to make space for stitching multiple images in OpenCV - Python3 [duplicate]

I'm trying to stitch 2 images together. I use template matching to find 3 sets of points which I pass to cv2.getAffineTransform() to get a warp matrix, which I then pass to cv2.warpAffine() to align my images.
However, when I join my images, the majority of the affine-transformed image isn't shown. I've tried different techniques to select points, changed the order of the arguments, etc., but I can only ever get a thin sliver of the transformed image to show.
Could somebody tell me whether my approach is a valid one and suggest where I might be making an error? Any guesses as to what could be causing the problem would be greatly appreciated. Thanks in advance.
This is the final result that I get. Here are the original images (1, 2) and the code that I use:
EDIT: Here are the contents of the variable trans:
array([[ 1.00768049e+00, -3.76690353e-17, -3.13824885e+00],
[ 4.84461775e-03, 1.30769231e+00, 9.61912797e+02]])
And here are the points passed to cv2.getAffineTransform: unified_pair1
array([[ 671., 1024.],
[ 15., 979.],
[ 15., 962.]], dtype=float32)
unified_pair2
array([[ 669., 45.],
[ 18., 13.],
[ 18., 0.]], dtype=float32)
import cv2
import numpy as np

def showimage(image, name="No name given"):
    cv2.imshow(name, image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    return

image_a = cv2.imread('image_a.png')
image_b = cv2.imread('image_b.png')

def get_roi(image):
    roi = cv2.selectROI(image)  # spacebar to confirm selection
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    crop = image_a[int(roi[1]):int(roi[1]+roi[3]), int(roi[0]):int(roi[0]+roi[2])]
    return crop

temp_1 = get_roi(image_a)
temp_2 = get_roi(image_a)
temp_3 = get_roi(image_a)

def find_template(template, search_image_a, search_image_b):
    ccnorm_im_a = cv2.matchTemplate(search_image_a, template, cv2.TM_CCORR_NORMED)
    template_loc_a = np.where(ccnorm_im_a == ccnorm_im_a.max())
    ccnorm_im_b = cv2.matchTemplate(search_image_b, template, cv2.TM_CCORR_NORMED)
    template_loc_b = np.where(ccnorm_im_b == ccnorm_im_b.max())
    return template_loc_a, template_loc_b

coord_a1, coord_b1 = find_template(temp_1, image_a, image_b)
coord_a2, coord_b2 = find_template(temp_2, image_a, image_b)
coord_a3, coord_b3 = find_template(temp_3, image_a, image_b)

def unnest_list(coords_list):
    coords_list = [a[0] for a in coords_list]
    return coords_list

coord_a1 = unnest_list(coord_a1)
coord_b1 = unnest_list(coord_b1)
coord_a2 = unnest_list(coord_a2)
coord_b2 = unnest_list(coord_b2)
coord_a3 = unnest_list(coord_a3)
coord_b3 = unnest_list(coord_b3)

def unify_coords(coords1, coords2, coords3):
    unified = []
    unified.extend([coords1, coords2, coords3])
    return unified

# Create 2 lists containing 3 pairs of coordinates
unified_pair1 = unify_coords(coord_a1, coord_a2, coord_a3)
unified_pair2 = unify_coords(coord_b1, coord_b2, coord_b3)

# Convert elements of the lists to numpy arrays with data type float32
unified_pair1 = np.asarray(unified_pair1, dtype=np.float32)
unified_pair2 = np.asarray(unified_pair2, dtype=np.float32)

# Get the affine transformation
trans = cv2.getAffineTransform(unified_pair1, unified_pair2)

# Apply the affine transformation to the original image
result = cv2.warpAffine(image_a, trans, (image_a.shape[1] + image_b.shape[1], image_a.shape[0]))
result[0:image_b.shape[0], image_b.shape[1]:] = image_b

showimage(result)
cv2.imwrite('result.png', result)
Sources: Approach based on advice received here, this tutorial and this example from the docs.
July 12 Edit:
This post inspired GitHub repos providing functions to accomplish this task; one for a padded warpAffine() and another for a padded warpPerspective(). Check out the Python version or the C++ version.
Transformations shift the location of pixels
What any transformation does is take your point coordinates (x, y) and map them to new locations (x', y'):
[s*x']   [h1 h2 h3] [x]
[s*y'] = [h4 h5 h6] [y]
[ s  ]   [h7 h8  1] [1]
where s is some scaling factor. You must divide the new coordinates by the scale factor to get back the proper pixel locations (x', y'). Technically, this is only true of homographies---(3, 3) transformation matrices---you don't need to scale for affine transformations (you don't even need to use homogeneous coordinates...but it's better to keep this discussion general).
Then the actual pixel values are moved to those new locations, and the color values are interpolated to fit the new pixel grid. So during this process, these new locations get recorded at some point. We'll need those locations to see where the pixels actually move to, relative to the other image. Let's start with an easy example and see where points are mapped.
Suppose your transformation matrix simply shifts pixels to the left by ten pixels. Translation is handled by the last column; the first row is the translation in x and second row is the translation in y. So we would have an identity matrix, but with -10 in the first row, third column. Where would the pixel (0,0) be mapped? Hopefully, (-10,0) if logic makes any sense. And in fact, it does:
transf = np.array([[1., 0., -10.], [0., 1., 0.], [0., 0., 1.]])
homg_pt = np.array([0, 0, 1])
new_homg_pt = transf.dot(homg_pt)
new_homg_pt /= new_homg_pt[2]
# new_homg_pt = [-10.   0.   1.]
Perfect! So we can figure out where all points map with a little linear algebra. We will need to get all the (x, y) points and put them into a big array so that every single point is in its own column. Let's pretend our image is only 4x4.
h, w = src.shape[:2] # 4, 4
indY, indX = np.indices((h,w)) # similar to meshgrid/mgrid
lin_homg_pts = np.stack((indX.ravel(), indY.ravel(), np.ones(indY.size)))
lin_homg_pts now holds every homogeneous point:
[[ 0. 1. 2. 3. 0. 1. 2. 3. 0. 1. 2. 3. 0. 1. 2. 3.]
[ 0. 0. 0. 0. 1. 1. 1. 1. 2. 2. 2. 2. 3. 3. 3. 3.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
Then we can do matrix multiplication to get the mapped value of every point. For simplicity, let's stick with the previous homography.
trans_lin_homg_pts = transf.dot(lin_homg_pts)
trans_lin_homg_pts /= trans_lin_homg_pts[2,:]
And now we have the transformed points:
[[-10. -9. -8. -7. -10. -9. -8. -7. -10. -9. -8. -7. -10. -9. -8. -7.]
[ 0. 0. 0. 0. 1. 1. 1. 1. 2. 2. 2. 2. 3. 3. 3. 3.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
As we can see, everything is working as expected: we have shifted the x-values only, by -10.
Pixels can be shifted outside of your image bounds
Notice that these pixel locations are negative---they're outside of the image bounds. If we do something a little more complex and rotate the image by 45 degrees, we'll get some pixel values way outside our original bounds. We don't care about every pixel value though, we just need to know how far the farthest pixels are that are outside the original image pixel locations, so that we can pad the original image that far out, before displaying the warped image on it.
theta = 45*np.pi/180
transf = np.array([
    [ np.cos(theta), np.sin(theta), 0],
    [-np.sin(theta), np.cos(theta), 0],
    [0., 0., 1.]])
print(transf)
trans_lin_homg_pts = transf.dot(lin_homg_pts)
minX = np.min(trans_lin_homg_pts[0,:])
minY = np.min(trans_lin_homg_pts[1,:])
maxX = np.max(trans_lin_homg_pts[0,:])
maxY = np.max(trans_lin_homg_pts[1,:])
# minX: 0.0, minY: -2.12132034356, maxX: 4.24264068712, maxY: 2.12132034356,
So we see that we can get pixel locations well outside our original image, both in the negative and positive directions. The minimum x value doesn't change because when a homography applies a rotation, it does it from the top-left corner. Now one thing to note here is that I've applied the transformation to all pixels in the image. But this is really unnecessary; you can simply warp the four corner points and see where they land.
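For example, a quick sketch of that corner-only shortcut (using the image corners, as the padded-warp function further down does):
# warp just the four corners instead of every pixel
corner_pts = np.array([[0, w, w, 0],
                       [0, 0, h, h],
                       [1., 1., 1., 1.]])
warped_corners = transf.dot(corner_pts)
warped_corners /= warped_corners[2, :]
minX, maxX = warped_corners[0].min(), warped_corners[0].max()
minY, maxY = warped_corners[1].min(), warped_corners[1].max()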
Padding the destination image
Note that when you call cv2.warpAffine() you have to input the destination size. These transformed pixel values reference that size. So if a pixel gets mapped to (-10, 0), it won't show up in the destination image. That means that we'll have to make another homography with translations which shift all pixel locations to be positive, and then we can pad the image matrix to compensate for our shift. We'll also have to pad the original image on the bottom and the right if the homography moves points to positions bigger than the image.
In the recent example, the min x value is the same, so we need no horizontal shift. However, the min y value has dropped by about two pixels, so we need to shift the image two pixels down. First, let's create the padded destination image.
pad_sz = list(src.shape) # in case three channel
pad_sz[0] = np.round(np.maximum(pad_sz[0], maxY) - np.minimum(0, minY)).astype(int)
pad_sz[1] = np.round(np.maximum(pad_sz[1], maxX) - np.minimum(0, minX)).astype(int)
dst_pad = np.zeros(pad_sz, dtype=np.uint8)
# pad_sz = [6, 4, 3]
As we can see, the height increased from the original by two pixels to account for that shift.
Add translation to the transformation to shift all pixel locations to positive
Now, we need to create a new homography matrix to translate the warped image by the same amount that we shifted by. And to apply both transformations---the original and this new shift---we have to compose the two homographies (for an affine transformation, you can simply add the translation, but not for a homography). Additionally, we need to divide by the last entry to make sure the scales are still proper (again, only for homographies):
anchorX, anchorY = 0, 0
transl_transf = np.eye(3, 3)
if minX < 0:
    anchorX = np.round(-minX).astype(int)
    transl_transf[0, 2] += anchorX
if minY < 0:
    anchorY = np.round(-minY).astype(int)
    transl_transf[1, 2] += anchorY
new_transf = transl_transf.dot(transf)
new_transf /= new_transf[2, 2]
I also created here the anchor points for where we will place the destination image into the padded matrix; it's shifted by the same amount the homography will shift the image. So let's place the destination image inside the padded matrix:
dst_pad[anchorY:anchorY+dst_sz[0], anchorX:anchorX+dst_sz[1]] = dst
Warp with the new transformation into the padded image
All we have left to do is apply the new transformation to the source image (with the padded destination size), and then we can overlay the two images.
warped = cv2.warpPerspective(src, new_transf, (pad_sz[1],pad_sz[0]))
alpha = 0.3
beta = 1 - alpha
blended = cv2.addWeighted(warped, alpha, dst_pad, beta, 1.0)
Putting it all together
Let's create a function for this, since we were creating quite a few variables we don't need at the end here. For inputs we need the source image, the destination image, and the original homography. And for outputs we simply want the padded destination image and the warped image. Note that in the examples we used a 3x3 homography, so we'd better make sure we send in 3x3 transforms instead of 2x3 affine or Euclidean warps. You can just add the row [0, 0, 1] to the bottom of any affine warp and you'll be fine.
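That conversion is a one-liner; here, affine stands for any 2x3 matrix, e.g. the output of cv2.getAffineTransform:
# promote a 2x3 affine warp to a 3x3 homography by appending [0, 0, 1]
transf_3x3 = np.vstack([affine, [0, 0, 1]])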
def warpPerspectivePadded(src, dst, transf):
    src_h, src_w = src.shape[:2]
    lin_homg_pts = np.array([[0, src_w, src_w, 0], [0, 0, src_h, src_h], [1, 1, 1, 1]])

    trans_lin_homg_pts = transf.dot(lin_homg_pts)
    trans_lin_homg_pts /= trans_lin_homg_pts[2, :]

    minX = np.min(trans_lin_homg_pts[0, :])
    minY = np.min(trans_lin_homg_pts[1, :])
    maxX = np.max(trans_lin_homg_pts[0, :])
    maxY = np.max(trans_lin_homg_pts[1, :])

    # calculate the needed padding and create a blank image to place dst within
    dst_sz = list(dst.shape)
    pad_sz = dst_sz.copy()  # to get the same number of channels
    pad_sz[0] = np.round(np.maximum(dst_sz[0], maxY) - np.minimum(0, minY)).astype(int)
    pad_sz[1] = np.round(np.maximum(dst_sz[1], maxX) - np.minimum(0, minX)).astype(int)
    dst_pad = np.zeros(pad_sz, dtype=np.uint8)

    # add translation to the transformation matrix to shift to positive values
    anchorX, anchorY = 0, 0
    transl_transf = np.eye(3, 3)
    if minX < 0:
        anchorX = np.round(-minX).astype(int)
        transl_transf[0, 2] += anchorX
    if minY < 0:
        anchorY = np.round(-minY).astype(int)
        transl_transf[1, 2] += anchorY
    new_transf = transl_transf.dot(transf)
    new_transf /= new_transf[2, 2]

    dst_pad[anchorY:anchorY+dst_sz[0], anchorX:anchorX+dst_sz[1]] = dst

    warped = cv2.warpPerspective(src, new_transf, (pad_sz[1], pad_sz[0]))

    return dst_pad, warped
Example of running the function
Finally, we can call this function with some real images and homographies and see how it pans out. I'll borrow the example from LearnOpenCV:
src = cv2.imread('book2.jpg')
pts_src = np.array([[141, 131], [480, 159], [493, 630],[64, 601]], dtype=np.float32)
dst = cv2.imread('book1.jpg')
pts_dst = np.array([[318, 256],[534, 372],[316, 670],[73, 473]], dtype=np.float32)
transf = cv2.getPerspectiveTransform(pts_src, pts_dst)
dst_pad, warped = warpPerspectivePadded(src, dst, transf)
alpha = 0.5
beta = 1 - alpha
blended = cv2.addWeighted(warped, alpha, dst_pad, beta, 1.0)
cv2.imshow("Blended Warped Image", blended)
cv2.waitKey(0)
And we end up with this padded warped image:
(image: the padded and warped result)
as opposed to the typical cut off warp you would normally get.

Use Pillow (PIL fork) for chroma key [duplicate]

I'm writing a script to chroma key (green screen) and composite some videos using Python and PIL (pillow). I can key the 720p images, but there's some leftover green spill. Understandable, but I'm writing a routine to remove that spill... however, I'm struggling with how long it's taking. I can probably get better speeds using numpy tricks, but I'm not that familiar with them. Any ideas?
Here's my despill routine. It takes a PIL image and a sensitivity number, but I've been leaving that at 1 so far... it's been working well. I'm coming in at just over 4 seconds per 720p frame to remove the spill. For comparison, the chroma key routine runs in about 2 seconds per frame.
def despill(img, sensitivity=1):
    """
    Blue limits green.
    """
    start = time.time()
    print '\t[*] Starting despill'
    width, height = img.size
    num_channels = len(img.getbands())
    out = Image.new("RGBA", img.size, color=0)
    for j in range(height):
        for i in range(width):
            #r,g,b,a = data[j,i]
            r, g, b, a = img.getpixel((i, j))
            if g > (b * sensitivity):
                out_g = (b * sensitivity)
            else:
                out_g = g
            # end if
            out.putpixel((i, j), (r, out_g, b, a))
        # end for
    # end for
    out.show()
    print '\t[+] done.'
    print '\t[!] Took: %0.1f seconds' % (time.time() - start)
    exit()
    return out
# end despill
Instead of putpixel, I tried to write the output pixel values to a numpy array then convert the array to a PIL image, but that was averaging just over 5 seconds...so this was faster somehow. I know putpixel isn't the snappiest option but I'm at a loss...
putpixel is slow, and loops like that are even slower, since they are run by the Python interpreter, which is slow as hell. The usual solution is to immediately convert the image to a numpy array and solve the problem with vectorized operations on it, which run in heavily optimized C code. In your case I would do something like:
arr = np.array(img)
g = arr[:,:,1]
bs = arr[:,:,2]*sensitivity
cond = g>bs
arr[:,:,1] = cond*bs + (~cond)*g
out = Image.fromarray(arr)
(it may not be correct and I'm sure it can be optimized way better, this is just a sketch)
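A slightly tidier variant of the same idea (an untested sketch; since the despill rule amounts to clamping green to blue*sensitivity, np.minimum expresses it directly, and working in float avoids uint8 overflow in the multiplication):
import numpy as np
from PIL import Image

def despill_np(img, sensitivity=1):
    # clamp the green channel to blue * sensitivity, leave the other channels alone
    arr = np.asarray(img).astype(np.float32)
    arr[:, :, 1] = np.minimum(arr[:, :, 1], arr[:, :, 2] * sensitivity)
    return Image.fromarray(arr.astype(np.uint8))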

Partitioning images based on their white space

I have lots of images of three objects with a white background separated by white space. For example,
Is it possible to split this image (and ones like it) into three images automatically? It would be great if this also worked from the command line.
As @ypnos said, you want to collapse the rows by summation or averaging. That will leave you with a vector the width of the image. Next, clip everything below a high threshold, remembering that high numbers correspond to high brightness. This will select the white space:
Then you simply cluster the remaining indices and select the middle two clusters (since the outer two belong to the bordering white space). In python this looks like so:
import sklearn.cluster, PIL.Image, numpy, sys, os.path
# import matplotlib.pyplot as plt

def split(fn, thresh=200):
    img = PIL.Image.open(fn)
    dat = numpy.array(img.convert(mode='L'))
    h, w = dat.shape
    dat = dat.mean(axis=0)
    # plt.plot(dat * (dat > thresh))

    path, fname = os.path.split(fn)
    fname = os.path.basename(fn)
    base, ext = os.path.splitext(fname)

    guesses = numpy.matrix(numpy.linspace(0, len(dat), 4)).T
    km = sklearn.cluster.KMeans(n_clusters=4, init=guesses)
    km.fit(numpy.matrix(numpy.nonzero(dat > thresh)).T)
    c1, c2 = map(int, km.cluster_centers_[[1, 2]])

    img.crop((0, 0, c1, h)).save(path + '/' + base + '_1' + ext)
    img.crop((c1, 0, c2, h)).save(path + '/' + base + '_2' + ext)
    img.crop((c2, 0, w, h)).save(path + '/' + base + '_3' + ext)

if __name__ == "__main__":
    split(sys.argv[1], int(sys.argv[2]))
One shortcoming of this method is that it may stumble on images that have bright objects (failing to properly identify the white space), or whose objects are not separated by a clean vertical line (e.g., overlapping in the composite). In such cases line detection, which is not constrained to vertical lines, would work better. I leave implementing that to someone else.
You need to sum up every column in the image and compare the sum with the theoretical sum for a column in which every pixel is white (i.e., the number of rows times 255). Add all columns that match the criterion to a list of indices. In case there is not always a fully clean line between the objects (e.g. due to compression artifacts), you can use a slightly lower threshold instead of the full-white sum.
Now go through your list of indices. Remove all adjacent indices that start at the first column. Also remove all adjacent indices that end at the far right of the image. Create groups of indices that are adjacent to each other. In each group count the number of indices and calculate the mean index.
Now take the two largest groups; their mean indices are the positions at which to crop.
You can do this in a rather small Python script with OpenCV, or as a C++ OpenCV program; a rough sketch of the idea follows.
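Here is an untested sketch of that approach (the filename and the slightly relaxed 250-per-pixel threshold are placeholder assumptions):
import cv2
import numpy as np

img = cv2.imread('composite.png', cv2.IMREAD_GRAYSCALE)  # placeholder filename
h, w = img.shape

# columns whose pixels are (almost) all white; 250 is a slightly relaxed threshold
col_sums = img.sum(axis=0, dtype=np.int64)
white_cols = np.flatnonzero(col_sums >= h * 250)

# group adjacent indices, then drop the groups touching the left/right borders
groups = np.split(white_cols, np.where(np.diff(white_cols) > 1)[0] + 1)
inner = [g for g in groups if len(g) and g[0] != 0 and g[-1] != w - 1]

# the two largest inner groups mark the gaps; their mean indices are the crop positions
inner.sort(key=len, reverse=True)
c1, c2 = sorted(int(g.mean()) for g in inner[:2])
parts = [img[:, :c1], img[:, c1:c2], img[:, c2:]]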
