Context and examples of symptoms
I am using a neural network to do super-resolution (increase the resolution of images). However, since an image can be big, I need to segment it in multiple smaller images and make predictions on each one of those separately before merging the result back together.
Here are examples of what this gives me:
Example 1: you can see a subtle vertical line passing through the shoulder of the skier in the output picture.
Example 2: once you start seeing them, you'll notice that the subtle lines are forming squares throughout the whole image (remnants of the way I segmented the image for individual predictions).
Example 3: you can clearly see the vertical line crossing the lake.
Source of the problem
Basically, my network makes poor predictions along the edges, which I believe is normal since there is less "surrounding" information.
Source code
import numpy as np
import matplotlib.pyplot as plt
import skimage.io
from keras.models import load_model
from constants import verbosity, save_dir, overlap, \
model_name, tests_path, input_width, input_height
from utils import float_im
def predict(args):
model = load_model(save_dir + '/' + args.model)
image = skimage.io.imread(tests_path + args.image)[:, :, :3] # removing possible extra channels (Alpha)
print("Image shape:", image.shape)
predictions = []
images = []
crops = seq_crop(image) # crops into multiple sub-parts the image based on 'input_' constants
for i in range(len(crops)): # amount of vertical crops
for j in range(len(crops[0])): # amount of horizontal crops
current_image = crops[i][j]
images.append(current_image)
print("Moving on to predictions. Amount:", len(images))
for p in range(len(images)):
if p%3 == 0 and verbosity == 2:
print("--prediction #", p)
# Hack because GPU can only handle one image at a time
input_img = (np.expand_dims(images[p], 0)) # Add the image to a batch where it's the only member
predictions.append(model.predict(input_img)[0]) # returns a list of lists, one for each image in the batch
return predictions, image, crops
def show_pred_output(input, pred):
plt.figure(figsize=(20, 20))
plt.suptitle("Results")
plt.subplot(1, 2, 1)
plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.subplot(1, 2, 2)
plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.show()
# adapted from https://stackoverflow.com/a/52463034/9768291
def seq_crop(img):
"""
To crop the whole image in a list of sub-images of the same size.
Size comes from "input_" variables in the 'constants' (Evaluation).
Padding with 0 the Bottom and Right image.
:param img: input image
:return: list of sub-images with defined size
"""
width_shape = ceildiv(img.shape[1], input_width)
height_shape = ceildiv(img.shape[0], input_height)
sub_images = [] # will contain all the cropped sub-parts of the image
for j in range(height_shape):
horizontal = []
for i in range(width_shape):
horizontal.append(crop_precise(img, i*input_width, j*input_height, input_width, input_height))
sub_images.append(horizontal)
return sub_images
def crop_precise(img, coord_x, coord_y, width_length, height_length):
"""
To crop a precise portion of an image.
When trying to crop outside of the boundaries, the input to padded with zeros.
:param img: image to crop
:param coord_x: width coordinate (top left point)
:param coord_y: height coordinate (top left point)
:param width_length: width of the cropped portion starting from coord_x
:param height_length: height of the cropped portion starting from coord_y
:return: the cropped part of the image
"""
tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]
return float_im(tmp_img) # From [0,255] to [0.,1.]
# from https://stackoverflow.com/a/17511341/9768291
def ceildiv(a, b):
return -(-a // b)
# adapted from https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):
# unflatten predictions
def nest(data, template):
data = iter(data)
return [[next(data) for _ in row] for row in template]
if len(crops) != 0:
predictions = nest(predictions, crops)
H = np.cumsum([x[0].shape[0] for x in predictions])
W = np.cumsum([x.shape[1] for x in predictions[0]])
D = predictions[0][0]
recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
for d, s in zip(np.split(rd, W[:-1], 1), rs):
d[...] = s
return recon
if __name__ == '__main__':
print(" - ", args)
preds, original, crops = predict(args) # returns the predictions along with the original
enhanced = reconstruct(preds, crops) # reconstructs the enhanced image from predictions
plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)
show_pred_output(original, enhanced)
The question (what I want)
There are many obvious naive approaches to solving this problem, but I'm convinced there must be a very concise way of doing it: how do I add an overlap_amount variable which would allow me to make overlapped predictions, thus discarding the "edge parts" of each sub-image ("segments") and replacing it with the result of the predictions on the segments surrounding it (since they would not contain "edge-predictions")?
I, of course, want to minimize the amount of "useless" predictions (pixels to be discarded). It might also be worth noting that the input segments produce an output segment which is 4 times bigger (i.e. if it was a 20x20 pixels image, you now get a 80x80 pixels image as output).
I solved a similiar problem by moving inference into the CPU. It was much, much slower but at least in my case solved the patch border problems better than overlapping ROI voting- or discarding based approaches I also tested.
Assuming you are using the Tensorflow backend:
from tensorflow.python import device
with device('cpu:0')
prediction = model.predict(...)
Of course assuming that you have enough RAM to fit your model. Comment below if that is not the case and I'll check out if there's something in my code that could be used here.
Solved it through a naive approach. It could be much better, but at least this works.
The process
Basically, it takes the initial image, then adds a padding around it, then crops it in multiple sub-images which are all lined up into an array. The crops are done so that all images overlap their surrounding neighbours as well.
Then, each image is fed into the network and the predictions are collected (4x on the resolution of the image, basically, in this case). When reconstructing the image, each prediction is taken individually and it's edge is cropped out (since it contains errors). The cropping is done so that the gluing of all the predictions ends up with no overlap, and only the middle parts of the predictions coming from the neural network are stuck together.
Finally, the surrounding padding is removed.
Result
No more line! :D
Code
import numpy as np
import matplotlib.pyplot as plt
import skimage.io
from keras.models import load_model
from constants import verbosity, save_dir, overlap, \
model_name, tests_path, input_width, input_height, scale_fact
from utils import float_im
def predict(args):
"""
Super-resolution on the input image using the model.
:param args:
:return:
'predictions' contains an array of every single cropped sub-image once enhanced (the outputs of the model).
'image' is the original image, untouched.
'crops' is the array of every single cropped sub-image that will be used as input to the model.
"""
model = load_model(save_dir + '/' + args.model)
image = skimage.io.imread(tests_path + args.image)[:, :, :3] # removing possible extra channels (Alpha)
print("Image shape:", image.shape)
predictions = []
images = []
# Padding and cropping the image
overlap_pad = (overlap, overlap) # padding tuple
pad_width = (overlap_pad, overlap_pad, (0, 0)) # assumes color channel as last
padded_image = np.pad(image, pad_width, 'constant') # padding the border
crops = seq_crop(padded_image) # crops into multiple sub-parts the image based on 'input_' constants
# Arranging the divided image into a single-dimension array of sub-images
for i in range(len(crops)): # amount of vertical crops
for j in range(len(crops[0])): # amount of horizontal crops
current_image = crops[i][j]
images.append(current_image)
print("Moving on to predictions. Amount:", len(images))
upscaled_overlap = overlap * 2
for p in range(len(images)):
if p % 3 == 0 and verbosity == 2:
print("--prediction #", p)
# Hack due to some GPUs that can only handle one image at a time
input_img = (np.expand_dims(images[p], 0)) # Add the image to a batch where it's the only member
pred = model.predict(input_img)[0] # returns a list of lists, one for each image in the batch
# Cropping the useless parts of the overlapped predictions (to prevent the repeated erroneous edge-prediction)
pred = pred[upscaled_overlap:pred.shape[0]-upscaled_overlap, upscaled_overlap:pred.shape[1]-upscaled_overlap]
predictions.append(pred)
return predictions, image, crops
def show_pred_output(input, pred):
plt.figure(figsize=(20, 20))
plt.suptitle("Results")
plt.subplot(1, 2, 1)
plt.title("Input : " + str(input.shape[1]) + "x" + str(input.shape[0]))
plt.imshow(input, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.subplot(1, 2, 2)
plt.title("Output : " + str(pred.shape[1]) + "x" + str(pred.shape[0]))
plt.imshow(pred, cmap=plt.cm.binary).axes.get_xaxis().set_visible(False)
plt.show()
# adapted from https://stackoverflow.com/a/52463034/9768291
def seq_crop(img):
"""
To crop the whole image in a list of sub-images of the same size.
Size comes from "input_" variables in the 'constants' (Evaluation).
Padding with 0 the Bottom and Right image.
:param img: input image
:return: list of sub-images with defined size (as per 'constants')
"""
sub_images = [] # will contain all the cropped sub-parts of the image
j, shifted_height = 0, 0
while shifted_height < (img.shape[0] - input_height):
horizontal = []
shifted_height = j * (input_height - overlap)
i, shifted_width = 0, 0
while shifted_width < (img.shape[1] - input_width):
shifted_width = i * (input_width - overlap)
horizontal.append(crop_precise(img,
shifted_width,
shifted_height,
input_width,
input_height))
i += 1
sub_images.append(horizontal)
j += 1
return sub_images
def crop_precise(img, coord_x, coord_y, width_length, height_length):
"""
To crop a precise portion of an image.
When trying to crop outside of the boundaries, the input to padded with zeros.
:param img: image to crop
:param coord_x: width coordinate (top left point)
:param coord_y: height coordinate (top left point)
:param width_length: width of the cropped portion starting from coord_x (toward right)
:param height_length: height of the cropped portion starting from coord_y (toward bottom)
:return: the cropped part of the image
"""
tmp_img = img[coord_y:coord_y + height_length, coord_x:coord_x + width_length]
return float_im(tmp_img) # From [0,255] to [0.,1.]
# adapted from https://stackoverflow.com/a/52733370/9768291
def reconstruct(predictions, crops):
"""
Used to reconstruct a whole image from an array of mini-predictions.
The image had to be split in sub-images because the GPU's memory
couldn't handle the prediction on a whole image.
:param predictions: an array of upsampled images, from left to right, top to bottom.
:param crops: 2D array of the cropped images
:return: the reconstructed image as a whole
"""
# unflatten predictions
def nest(data, template):
data = iter(data)
return [[next(data) for _ in row] for row in template]
if len(crops) != 0:
predictions = nest(predictions, crops)
# At this point "predictions" is a 3D image of the individual outputs
H = np.cumsum([x[0].shape[0] for x in predictions])
W = np.cumsum([x.shape[1] for x in predictions[0]])
D = predictions[0][0]
recon = np.empty((H[-1], W[-1], D.shape[2]), D.dtype)
for rd, rs in zip(np.split(recon, H[:-1], 0), predictions):
for d, s in zip(np.split(rd, W[:-1], 1), rs):
d[...] = s
# Removing the pad from the reconstruction
tmp_overlap = overlap * (scale_fact - 1) # using "-2" leaves the outer edge-prediction error
return recon[tmp_overlap:recon.shape[0]-tmp_overlap, tmp_overlap:recon.shape[1]-tmp_overlap]
if __name__ == '__main__':
print(" - ", args)
preds, original, crops = predict(args) # returns the predictions along with the original
enhanced = reconstruct(preds, crops) # reconstructs the enhanced image from predictions
# Save and display the result
plt.imsave('output/' + args.save, enhanced, cmap=plt.cm.gray)
show_pred_output(original, enhanced)
Constants and extra bits
verbosity = 2
input_width = 64
input_height = 64
overlap = 16
scale_fact = 4
def float_im(img):
return np.divide(img, 255.)
Alternative
A possibly better alternative which you might want to consider if you run into the same kind of problem as me; it's the same basic idea, but more polished and perfected.
Related
I am new to OpenCV and trying to see if I can find a way to detect vertical text for the image attached.
In this case on row 3 , I would like to get the bounding box around Original Cost and the amount below ($200,000.00).
Similarly I would like to get the bounding box around Amount Existing Liens and the associated amount below. I then would use this data to send to an OCR engine to read text. Traditional OCR engines go line by line and extract and loses the context.
Here is what I have tried so far -
import cv2
import numpy as np
img = cv2.imread('Test3.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,100,100,apertureSize = 3)
cv2.imshow('edges',edges)
cv2.waitKey(0)
minLineLength = 20
maxLineGap = 10
lines = cv2.HoughLinesP(edges,1,np.pi/180,15,minLineLength=minLineLength,maxLineGap=maxLineGap)
for x in range(0, len(lines)):
for x1,y1,x2,y2 in lines[x]:
cv2.line(img,(x1,y1),(x2,y2),(0,255,0),2)
cv2.imshow('hough',img)
cv2.waitKey(0)
Here is my solution based on Kanan Vyas and Adrian Rosenbrock
It's probably not as "canonical" as you'd wish.
But it seems to work (more or less...) with the image you provided.
Just a word of CAUTION: The code looks within the directory from which it is running, for a folder named "Cropped" where cropped images will be stored. So, don't run it in a directory which already contains a folder named "Cropped" because it deletes everything in this folder at each run. Understood? If you're unsure run it in a separate folder.
The code:
# Import required packages
import cv2
import numpy as np
import pathlib
###################################################################################################################################
# https://www.pyimagesearch.com/2015/04/20/sorting-contours-using-python-and-opencv/
###################################################################################################################################
def sort_contours(cnts, method="left-to-right"):
# initialize the reverse flag and sort index
reverse = False
i = 0
# handle if we need to sort in reverse
if method == "right-to-left" or method == "bottom-to-top":
reverse = True
# handle if we are sorting against the y-coordinate rather than
# the x-coordinate of the bounding box
if method == "top-to-bottom" or method == "bottom-to-top":
i = 1
# construct the list of bounding boxes and sort them from top to
# bottom
boundingBoxes = [cv2.boundingRect(c) for c in cnts]
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
key=lambda b:b[1][i], reverse=reverse))
# return the list of sorted contours and bounding boxes
return (cnts, boundingBoxes)
###################################################################################################################################
# https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26 (with a few modifications)
###################################################################################################################################
def box_extraction(img_for_box_extraction_path, cropped_dir_path):
img = cv2.imread(img_for_box_extraction_path, 0) # Read the image
(thresh, img_bin) = cv2.threshold(img, 128, 255,
cv2.THRESH_BINARY | cv2.THRESH_OTSU) # Thresholding the image
img_bin = 255-img_bin # Invert the imagecv2.imwrite("Image_bin.jpg",img_bin)
# Defining a kernel length
kernel_length = np.array(img).shape[1]//200
# A verticle kernel of (1 X kernel_length), which will detect all the verticle lines from the image.
verticle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, kernel_length))
# A horizontal kernel of (kernel_length X 1), which will help to detect all the horizontal line from the image.
hori_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_length, 1))
# A kernel of (3 X 3) ones.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))# Morphological operation to detect verticle lines from an image
img_temp1 = cv2.erode(img_bin, verticle_kernel, iterations=3)
verticle_lines_img = cv2.dilate(img_temp1, verticle_kernel, iterations=3)
#cv2.imwrite("verticle_lines.jpg",verticle_lines_img)# Morphological operation to detect horizontal lines from an image
img_temp2 = cv2.erode(img_bin, hori_kernel, iterations=3)
horizontal_lines_img = cv2.dilate(img_temp2, hori_kernel, iterations=3)
#cv2.imwrite("horizontal_lines.jpg",horizontal_lines_img)# Weighting parameters, this will decide the quantity of an image to be added to make a new image.
alpha = 0.5
beta = 1.0 - alpha
# This function helps to add two image with specific weight parameter to get a third image as summation of two image.
img_final_bin = cv2.addWeighted(verticle_lines_img, alpha, horizontal_lines_img, beta, 0.0)
img_final_bin = cv2.erode(~img_final_bin, kernel, iterations=2)
(thresh, img_final_bin) = cv2.threshold(img_final_bin, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)# For Debugging
# Enable this line to see verticle and horizontal lines in the image which is used to find boxes
#cv2.imwrite("img_final_bin.jpg",img_final_bin)
# Find contours for image, which will detect all the boxes
contours, hierarchy = cv2.findContours(
img_final_bin, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Sort all the contours by top to bottom.
(contours, boundingBoxes) = sort_contours(contours, method="top-to-bottom")
idx = 0
for c in contours:
# Returns the location and width,height for every contour
x, y, w, h = cv2.boundingRect(c)# If the box height is greater then 20, widht is >80, then only save it as a box in "cropped/" folder.
if (w > 50 and h > 20):# and w > 3*h:
idx += 1
new_img = img[y:y+h, x:x+w]
cv2.imwrite(cropped_dir_path+str(x)+'_'+str(y) + '.png', new_img)
###########################################################################################################################################################
def prepare_cropped_folder():
p=pathlib.Path('./Cropped')
if p.exists(): # Cropped folder non empty. Let's clean up
files = [x for x in p.glob('*.*') if x.is_file()]
for f in files:
f.unlink()
else:
p.mkdir()
###########################################################################################################################################################
# MAIN
###########################################################################################################################################################
prepare_cropped_folder()
# Read image from which text needs to be extracted
img = cv2.imread("dkesg.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Performing OTSU threshold
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
thresh1=255-thresh1
bin_y=np.zeros(thresh1.shape[0])
for x in range(0,len(bin_y)):
bin_y[x]=sum(thresh1[x,:])
bin_y=bin_y/max(bin_y)
ry=np.where(bin_y>0.995)[0]
for i in range(0,len(ry)):
cv2.line(img, (0, ry[i]), (thresh1.shape[1], ry[i]), (0, 0, 0), 1)
# We need to draw abox around the picture with a white border in order for box_detection to work
cv2.line(img,(0,0),(0,img.shape[0]-1),(255,255,255),2)
cv2.line(img,(img.shape[1]-1,0),(img.shape[1]-1,img.shape[0]-1),(255,255,255),2)
cv2.line(img,(0,0),(img.shape[1]-1,0),(255,255,255),2)
cv2.line(img,(0,img.shape[0]-1),(img.shape[1]-1,img.shape[0]-1),(255,255,255),2)
cv2.line(img,(0,0),(0,img.shape[0]-1),(0,0,0),1)
cv2.line(img,(img.shape[1]-3,0),(img.shape[1]-3,img.shape[0]-1),(0,0,0),1)
cv2.line(img,(0,0),(img.shape[1]-1,0),(0,0,0),1)
cv2.line(img,(0,img.shape[0]-2),(img.shape[1]-1,img.shape[0]-2),(0,0,0),1)
cv2.imwrite('out.png',img)
box_extraction("out.png", "./Cropped/")
Now... It puts the cropped regions in the Cropped folder. They are named as x_y.png with (x,y) the position on the original image.
Here are two examples of the outputs
and
Now, in a terminal. I used pytesseract on these two images.
The results are the following:
1)
Original Cost
$200,000.00
2)
Amount Existing Liens
$494,215.00
As you can see, pytesseract got the amount wrong in the second case... So, be careful.
Best regards,
Stéphane
I assume the bounding box is fix (rectangle that able to fit in "Original Amount and the amount below). You can use text detection to detect the "Original Amount" and "Amount Existing Liens" using OCR and crop out the image based on the detected location for further OCR on the amount. You can refer this link for text detection
Try to divide the image into different cells using the lines in the image.
For example, first divide the input into rows by detecting the horizontal lines. This can be done by using cv.HoughLinesP and checking for each line if the difference between y-coordinate of the begin and end point is smaller than a certain threshold abs(y2 - y1) < 10. If you have a horizontal line, it's a separator for a new row. You can use the y-coordinates of this line to split the input horizontally.
Next, for the row you're interested in, divide the region into columns using the same technique, but now make sure the difference between the x-coordinates of the begin and end point are smaller than a certain threshold, since you're now looking for the vertical lines.
You can now crop the image to different cells using the y-coordinates of the horizontal lines and the x-coordinates of the vertical lines. Pass these cropped regions one by one to the OCR engine and you'll have for each cell the corresponding text.
My training images are downscaled versions of their associated HR image. Thus, the input and the output images aren't the same dimension. For now, I'm using a hand-crafted sample of 13 images, but eventually I would like to be able to use my 500-ish HR (high-resolution) images dataset. This dataset, however, does not have images of the same dimension, so I'm guessing I'll have to crop them in order to obtain a uniform dimension.
I currently have this code set up: it takes a bunch of 512x512x3 images and applies a few transformations to augment the data (flips). I thus obtain a basic set of 39 images in their HR form, and then I downscale them by a factor of 4, thus obtaining my trainset which consits of 39 images of dimension 128x128x3.
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.image as mpimg
import skimage
from skimage import transform
from constants import data_path
from constants import img_width
from constants import img_height
from model import setUpModel
def setUpImages():
train = []
finalTest = []
sample_amnt = 11
max_amnt = 13
# Extracting images (512x512)
for i in range(sample_amnt):
train.append(mpimg.imread(data_path + str(i) + '.jpg'))
for i in range(max_amnt-sample_amnt):
finalTest.append(mpimg.imread(data_path + str(i+sample_amnt) + '.jpg'))
# # TODO: https://keras.io/preprocessing/image/
# ImageDataGenerator(featurewise_center=False, samplewise_center=False, featurewise_std_normalization=False,
# samplewise_std_normalization=False, zca_whitening=False, zca_epsilon=1e-06, rotation_range=0,
# width_shift_range=0.0, height_shift_range=0.0, brightness_range=None, shear_range=0.0,
# zoom_range=0.0, channel_shift_range=0.0, fill_mode='nearest', cval=0.0, horizontal_flip=False,
# vertical_flip=False, rescale=None, preprocessing_function=None, data_format=None,
# validation_split=0.0, dtype=None)
# Augmenting data
trainData = dataAugmentation(train)
testData = dataAugmentation(finalTest)
setUpData(trainData, testData)
def setUpData(trainData, testData):
# print(type(trainData)) # <class 'numpy.ndarray'>
# print(len(trainData)) # 64
# print(type(trainData[0])) # <class 'numpy.ndarray'>
# print(trainData[0].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2-1].shape) # (1400, 1400, 3)
# print(trainData[len(trainData)//2].shape) # (350, 350, 3)
# print(trainData[len(trainData)-1].shape) # (350, 350, 3)
# TODO: substract mean of all images to all images
# Separating the training data
Y_train = trainData[:len(trainData)//2] # First half is the unaltered data
X_train = trainData[len(trainData)//2:] # Second half is the deteriorated data
# Separating the testing data
Y_test = testData[:len(testData)//2] # First half is the unaltered data
X_test = testData[len(testData)//2:] # Second half is the deteriorated data
# Adjusting shapes for Keras input # TODO: make into a function ?
X_train = np.array([x for x in X_train])
Y_train = np.array([x for x in Y_train])
Y_test = np.array([x for x in Y_test])
X_test = np.array([x for x in X_test])
# # Sanity check: display four images (2x HR/LR)
# plt.figure(figsize=(10, 10))
# for i in range(2):
# plt.subplot(2, 2, i + 1)
# plt.imshow(Y_train[i], cmap=plt.cm.binary)
# for i in range(2):
# plt.subplot(2, 2, i + 1 + 2)
# plt.imshow(X_train[i], cmap=plt.cm.binary)
# plt.show()
setUpModel(X_train, Y_train, X_test, Y_test)
# TODO: possibly remove once Keras Preprocessing is integrated?
def dataAugmentation(dataToAugment):
print("Starting to augment data")
arrayToFill = []
# faster computation with values between 0 and 1 ?
dataToAugment = np.divide(dataToAugment, 255.)
# TODO: switch from RGB channels to CbCrY
# # TODO: Try GrayScale
# trainingData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(350, 350, 1) for x in trainingData])
# validateData = np.array(
# [(cv2.cvtColor(np.uint8(x * 255), cv2.COLOR_BGR2GRAY) / 255).reshape(1400, 1400, 1) for x in validateData])
# adding the normal images (8)
for i in range(len(dataToAugment)):
arrayToFill.append(dataToAugment[i])
# vertical axis flip (-> 16)
for i in range(len(arrayToFill)):
arrayToFill.append(np.fliplr(arrayToFill[i]))
# horizontal axis flip (-> 32)
for i in range(len(arrayToFill)):
arrayToFill.append(np.flipud(arrayToFill[i]))
# downsizing by scale of 4 (-> 64 images of 128x128x3)
for i in range(len(arrayToFill)):
arrayToFill.append(skimage.transform.resize(
arrayToFill[i],
(img_width/4, img_height/4),
mode='reflect',
anti_aliasing=True))
# # Sanity check: display the images
# plt.figure(figsize=(10, 10))
# for i in range(64):
# plt.subplot(8, 8, i + 1)
# plt.imshow(arrayToFill[i], cmap=plt.cm.binary)
# plt.show()
return np.array(arrayToFill)
My question is: in my case, can I use the Preprocessing tool that Keras offers? I would ideally like to be able to input my varying sized images of high quality, crop them (not downsize them) to 512x512x3, and data augment them through flips and whatnot. Substracting the mean would also be part of what I'd like to achieve. That set would represent my validation set.
Reusing the validation set, I want to downscale by a factor of 4 all the images, and that would generate my training set.
Those two sets could then be split appropriately to obtain, ultimately, the famous X_train Y_train X_test Y_test.
I'm just hesitant about throwing out all the work I've done so far to preprocess my mini sample, but I'm thinking if it can all be done with a single built-in function, maybe I should give that a go.
This is my first ML project, hence me not understanding very well Keras, and the documentation isn't always the clearest. I'm thinking that the fact that I'm working with a X and Y that are different in size, maybe this function doesn't apply to my project.
Thank you! :)
Yes you can use keras preprocessing function. Below some snippets to help you...
def cropping_function(x):
...
return cropped_image
X_image_gen = ImageDataGenerator(preprocessing_function = cropping_function,
horizontal_flip = True,
vertical_flip=True)
X_train_flow = X_image_gen.flow(X_train, batch_size = 16, seed = 1)
Y_image_gen = ImageDataGenerator(horizontal_flip = True,
vertical_flip=True)
Y_train_flow = Y_image_gen.flow(y_train, batch_size = 16, seed = 1)
train_flow = zip(X_train_flow,Y_train_flow)
model.fit_generator(train_flow)
Christof Henkel's suggestion is very clean and nice. I would just like to offer another way to do it using imgaug, a convenient way to augment images in lots of different ways. It's usefull if you want more implemented augmentations or if you ever need to use some ML library other than Keras.
It unfortunatly doesn't have a way to make crops that way but it allows implementing custom functions. Here is an example function for generating random crops of a set size from an image that's at least as big as the chosen crop size:
from imgaug import augmenters as iaa
def random_crop(images, random_state, parents, hooks):
crop_h, crop_w = 128, 128
new_images = []
for img in images:
if (img.shape[0] >= crop_h) and (img.shape[1] >= crop_w):
rand_h = np.random.randint(0, img.shape[0]-crop_h)
rand_w = np.random.randint(0, img.shape[1]-crop_w)
new_images.append(img[rand_h:rand_h+crop_h, rand_w:rand_w+crop_w])
else:
new_images.append(np.zeros((crop_h, crop_w, 3)))
return np.array(new_images)
def keypoints_dummy(keypoints_on_images, random_state, parents, hooks):
return keypoints_on_images
cropper = iaa.Lambda(func_images=random_crop, func_keypoints=keypoints_dummy)
You can then combine this function with any other builtin imgaug function, for example the flip functions that you're already using like this:
seq = iaa.Sequential([cropper, iaa.Fliplr(0.5), iaa.Flipud(0.5)])
This function could then generate lots of different crops from each image. An example image with some possible results (note that it would result in actual (128, 128, 3) images, they are just merged into one image here for visualization):
Your image set could then be generated by:
crops_per_image = 10
images = [skimage.io.imread(path) for path in glob.glob('train_data/*.jpg')]
augs = np.array([seq.augment_image(img)/255 for img in images for _ in range(crops_per_image)])
It would also be simple to add new functions to be applied to the images, for example the remove mean functions you mentioned.
Here's another way performing random and center crop before resizing using native ImageDataGenerator and flow_from_directory. You can add it as preprocess_crop.py module into your project.
It first resizes image preserving aspect ratio and then performs crop. Resized image size is based on crop_fraction which is hardcoded but can be changed. See crop_fraction = 0.875 line where 0.875 appears to be the most common, e.g. 224px crop from 256px image.
Note that the implementation has been done by monkey patching keras_preprocessing.image.utils.loag_img function as I couldn't find any other way to perform crop before resizing without rewriting many other classes above.
Due to these limitations, the cropping method is enumerated into the interpolation field. Methods are delimited by : where the first part is interpolation and second is crop e.g. lanczos:random. Supported crop methods are none, center, random. When no crop method is specified, none is assumed.
How to use it
Just drop the preprocess_crop.py into your project to enable cropping. The example below shows how you can use random cropping for the training and center cropping for validation:
import preprocess_crop
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input
#...
# Training with random crop
train_datagen = ImageDataGenerator(
rotation_range=20,
channel_shift_range=20,
horizontal_flip=True,
preprocessing_function=preprocess_input
)
train_img_generator = train_datagen.flow_from_directory(
train_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:random', # <--------- random crop
shuffle = True
)
# Validation with center crop
validate_datagen = ImageDataGenerator(
preprocessing_function=preprocess_input
)
validate_img_generator = validate_datagen.flow_from_directory(
validate_dir,
target_size = (IMG_SIZE, IMG_SIZE),
batch_size = BATCH_SIZE,
class_mode = 'categorical',
interpolation = 'lanczos:center', # <--------- center crop
shuffle = False
)
Here's preprocess_crop.py file to include with your project:
import random
import keras_preprocessing.image
def load_and_crop_img(path, grayscale=False, color_mode='rgb', target_size=None,
interpolation='nearest'):
"""Wraps keras_preprocessing.image.utils.loag_img() and adds cropping.
Cropping method enumarated in interpolation
# Arguments
path: Path to image file.
color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
The desired image format.
target_size: Either `None` (default to original size)
or tuple of ints `(img_height, img_width)`.
interpolation: Interpolation and crop methods used to resample and crop the image
if the target size is different from that of the loaded image.
Methods are delimited by ":" where first part is interpolation and second is crop
e.g. "lanczos:random".
Supported interpolation methods are "nearest", "bilinear", "bicubic", "lanczos",
"box", "hamming" By default, "nearest" is used.
Supported crop methods are "none", "center", "random".
# Returns
A PIL Image instance.
# Raises
ImportError: if PIL is not available.
ValueError: if interpolation method is not supported.
"""
# Decode interpolation string. Allowed Crop methods: none, center, random
interpolation, crop = interpolation.split(":") if ":" in interpolation else (interpolation, "none")
if crop == "none":
return keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=target_size,
interpolation=interpolation)
# Load original size image using Keras
img = keras_preprocessing.image.utils.load_img(path,
grayscale=grayscale,
color_mode=color_mode,
target_size=None,
interpolation=interpolation)
# Crop fraction of total image
crop_fraction = 0.875
target_width = target_size[1]
target_height = target_size[0]
if target_size is not None:
if img.size != (target_width, target_height):
if crop not in ["center", "random"]:
raise ValueError('Invalid crop method {} specified.', crop)
if interpolation not in keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS:
raise ValueError(
'Invalid interpolation method {} specified. Supported '
'methods are {}'.format(interpolation,
", ".join(keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS.keys())))
resample = keras_preprocessing.image.utils._PIL_INTERPOLATION_METHODS[interpolation]
width, height = img.size
# Resize keeping aspect ratio
# result shold be no smaller than the targer size, include crop fraction overhead
target_size_before_crop = (target_width/crop_fraction, target_height/crop_fraction)
ratio = max(target_size_before_crop[0] / width, target_size_before_crop[1] / height)
target_size_before_crop_keep_ratio = int(width * ratio), int(height * ratio)
img = img.resize(target_size_before_crop_keep_ratio, resample=resample)
width, height = img.size
if crop == "center":
left_corner = int(round(width/2)) - int(round(target_width/2))
top_corner = int(round(height/2)) - int(round(target_height/2))
return img.crop((left_corner, top_corner, left_corner + target_width, top_corner + target_height))
elif crop == "random":
left_shift = random.randint(0, int((width - target_width)))
down_shift = random.randint(0, int((height - target_height)))
return img.crop((left_shift, down_shift, target_width + left_shift, target_height + down_shift))
return img
# Monkey patch
keras_preprocessing.image.iterator.load_img = load_and_crop_img
So After some image preprocessing I have gotten an image which holds 5 contours
(The image was resized for posting here in stackoverflow):
I'd like to remove all "islands" except for the actual letter,
So at first I tried using cv2.erode and cv2.dilate with all kinds of kernels sizes and it didn't do the job, so I decided to remove by masking all contours except the largest one by this:
_, cnts, _ = cv2.findContours(original, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
I would expect according to the given image there would be 5 contours
areas = []
for contour in cnts:
area = cv2.contourArea(contour)
areas.append(area)
relevant_indexes = list(range(1, len(cnts)))
relevant_indexes.remove(areas.index(max(areas)))
mask = numpy.zeros(eroded.shape).astype(eroded.dtype)
color = 255
for i in relevant_indexes:
cv2.fillPoly(mask, cnts[i], color)
cv2.imwrite("mask.png", mask)
// Trying to mask out the noise
result = cv2.bitwise_xor(orifinal, mask)
cv2.imwrite("result.png", result)
But the mask I get is:
it's not what I would expect, and the left down contour is missing,
can someone PLEASE explain me what am I missing here? And what would be the correct approach for getting rid of those "isolated islands"?
Thank you all!
p.s
The original photo I'm working on:
Solution:
It sounds like you want to mask out the largest connected component (cv-speak for "island").
Here's an opencv/python script to do that:
#!/usr/bin/env python
import cv2
import numpy as np
import console
# load image in grayscale
img = cv2.imread("img.png", 0)
# get all connected components
_, output, stats, _ = cv2.connectedComponentsWithStats(img, connectivity=4)
# get a list of areas for each group label
group_areas = stats[cv2.CC_STAT_AREA]
# get the id of the group with the largest area (ignoring 0, which is the background id)
max_group_id = np.argmax(group_areas[1:]) + 1
# get max_group_id mask and save it as an image
max_group_id_mask = (output == max_group_id).astype(np.uint8) * 255
cv2.imwrite("output.png", max_group_id_mask)
Result:
Here's the result of the above script on your sample image:
i am loading the cifar-10 data set , the methods adds the data to tensor array , so to access the data i used .eval() with session , on a normal tf constant it return the value , but on the labels and the train set which are tf array it wont
1- i am using docker tensorflow-jupyter
2- it uses python 3
3- the batch file must be added to data folder
i am using the first batch [data_batch_1.bin]from this file
http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
As notebook:
https://drive.google.com/open?id=0B_AFMME1kY1obkk1YmJHcjV0ODA
The code[As in tensorflow site but modified to read 1 patch] [check the last 7 lines for the data loading] :
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import urllib
import tensorflow as tf
from six.moves import xrange # pylint: disable=redefined-builtin
# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 5000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 1000
IMAGE_SIZE = 32
def _generate_image_and_label_batch(image, label, min_queue_examples,
batch_size, shuffle):
"""Construct a queued batch of images and labels.
Args:
image: 3-D Tensor of [height, width, 3] of type.float32.
label: 1-D Tensor of type.int32
min_queue_examples: int32, minimum number of samples to retain
in the queue that provides of batches of examples.
batch_size: Number of images per batch.
shuffle: boolean indicating whether to use a shuffling queue.
Returns:
images: Images. 4D tensor of [batch_size, height, width, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
# Create a queue that shuffles the examples, and then
# read 'batch_size' images + labels from the example queue.
num_preprocess_threads = 2
if shuffle:
images, label_batch = tf.train.shuffle_batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size,
min_after_dequeue=min_queue_examples)
else:
images, label_batch = tf.train.batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size)
# Display the training images in the visualizer.
tf.image_summary('images', images)
return images, tf.reshape(label_batch, [batch_size])
def read_cifar10(filename_queue):
"""Reads and parses examples from CIFAR10 data files.
Recommendation: if you want N-way read parallelism, call this function
N times. This will give you N independent Readers reading different
files & positions within those files, which will give better mixing of
examples.
Args:
filename_queue: A queue of strings with the filenames to read from.
Returns:
An object representing a single example, with the following fields:
height: number of rows in the result (32)
width: number of columns in the result (32)
depth: number of color channels in the result (3)
key: a scalar string Tensor describing the filename & record number
for this example.
label: an int32 Tensor with the label in the range 0..9.
uint8image: a [height, width, depth] uint8 Tensor with the image data
"""
class CIFAR10Record(object):
pass
result = CIFAR10Record()
# Dimensions of the images in the CIFAR-10 dataset.
# See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
# input format.
label_bytes = 1 # 2 for CIFAR-100
result.height = 32
result.width = 32
result.depth = 3
image_bytes = result.height * result.width * result.depth
# Every record consists of a label followed by the image, with a
# fixed number of bytes for each.
record_bytes = label_bytes + image_bytes
# Read a record, getting filenames from the filename_queue. No
# header or footer in the CIFAR-10 format, so we leave header_bytes
# and footer_bytes at their default of 0.
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
result.key, value = reader.read(filename_queue)
# Convert from a string to a vector of uint8 that is record_bytes long.
record_bytes = tf.decode_raw(value, tf.uint8)
# The first bytes represent the label, which we convert from uint8->int32.
result.label = tf.cast(
tf.slice(record_bytes, [0], [label_bytes]), tf.int32)
# The remaining bytes after the label represent the image, which we reshape
# from [depth * height * width] to [depth, height, width].
depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
[result.depth, result.height, result.width])
# Convert from [depth, height, width] to [height, width, depth].
result.uint8image = tf.transpose(depth_major, [1, 2, 0])
return result
def inputs(eval_data, data_dir, batch_size):
"""Construct input for CIFAR evaluation using the Reader ops.
Args:
eval_data: bool, indicating if one should use the train or eval data set.
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
filenames=[];
filenames.append(os.path.join(data_dir, 'data_batch_1.bin') )
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
print(filenames)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for evaluation.
# Crop the central [height, width] of the image.
resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
width, height)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_whitening(resized_image)
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples_in_queue = 0.4
min_queue_examples = int(num_examples_per_epoch *
min_fraction_of_examples_in_queue)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=False)
sess = tf.InteractiveSession()
train_data,train_labels = inputs(False,"data",6000)
print (train_data,train_labels)
train_data=train_data.eval()
train_labels=train_labels.eval()
print(train_data)
print(train_labels)
sess.close()
You must call tf.train.start_queue_runners(sess) before you call train_data.eval() or train_labels.eval().
This is a(n unfortunate) consequence of how TensorFlow input pipelines are implemented: the tf.train.string_input_producer(), tf.train.shuffle_batch(), and tf.train.batch() functions internally create queues that buffer records between different stages in the input pipeline. The tf.train.start_queue_runners() call tells TensorFlow to start fetching records into these buffers; without calling it the buffers remain empty and eval() hangs indefinitely.
I have an image 315x581. I want to crop it in 28x28 from top left to bottom right, then I need to save each 28x28 image in folder.
I could crop just one image from y1=0 to y2=28 and x1=0 to x2=28.
First problem is: I used cv2.imwrite("cropped.jpg", cropped) to save this small image, but It doesn't save it, provided that it works some line above.
Second problem is: How can I write a code which it keeps on cropping the image in 28x28 from left to right and top to bottom and save each subimage.
I used for loop, but I don't know how to complete it.
Thank you so much for any help.
Here this is my code,
import cv2
import numpy as np
from PIL import Image
import PIL.Image
import os
import gzip
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
#%%
image1LL='C:/Users/Tala/Documents/PythonProjects/Poster-OpenCV-MaskXray/CHNCXR_0001_0_LL.jpg'
mask1LL='C:/Users/Tala/Documents/PythonProjects/Poster-OpenCV-MaskXray/CHNCXR_0001_0_threshLL.jpg'
#finalsSave='C:/Users/Tala/Documents/PythonProjects/Poster-OpenCV-MaskXray/Xray Result'
# load the image
img = cv2.imread(image1LL,0)
mask = cv2.imread(mask1LL,0)
# combine foreground+background
final1LL = cv2.bitwise_and(img,img,mask = mask)
cv2.imshow('final1LL',final1LL)
cv2.waitKey(100)
final1LL.size
final1LL.shape
# Save the image
cv2.imwrite('final1LL.jpg',final1LL)
# crop the image using array slices -- it's a NumPy array
# after all!
y1=0
x1=0
for y2 in range(0,580,28):
for x2 in range(0,314,28):
cropped = final1LL[0:28, 0:28]
cv2.imshow('cropped', cropped)
cv2.waitKey(100)
cv2.imwrite("cropped.jpg", cropped)
Your approach is good, but there is some fine tuning required. The following code will help you:
import cv2
filename = 'p1.jpg'
img = cv2.imread(filename, 1)
interval = 100
stride = 100
count = 0
print img.shape
for i in range(0, img.shape[0], interval):
for j in range(0, img.shape[1], interval):
print j
cropped_img = img[j:j + stride, i:i + stride] #--- Notice this part where you have to add the stride as well ---
count += 1
cv2.imwrite('cropped_image_' + str(count) + '_.jpg', cropped_img) #--- Also take note of how you would save all the cropped images by incrementing the count variable ---
cv2.waitKey()
My result:
Original image:
Some of the cropped images:
Cropped image 1
Cropped image 2
Cropped image 3
If you are using it in PyTorch as a deep learning framework, then this task would be quite easy and can be done without the need for any other external image processing libraries such as OpenCV. The below code will convert a single image into a stack of multiple images in a form of PyTorch tensor. If you want to use only images then you need to remove the line "transforms.ToTensor()" and save the "tens" variable in the code as an image using matplotlib.
Note: Here bird image is used with dimension 32 x 32 x 3, crop images 5x5x3 with stride =1.
image = Image.open('bird.png')
tensreal = trans(image)
trans = transforms.Compose([transforms.Resize(32),
transforms.ToTensor(),
])
stride = 1
crop_height = 5
crop_width = 5
img_height = 32
img_width = 32
tens_list = []
for i in range(0,img_width-crop_width,stride):
for j in range(0,img_height-crop_height ,stride):
tens = trans(image)
tens1 = tens[:, j:j+crop_height, i:i+crop_width]
tens_list.append(tens1)
all_tens = torch.stack(tens_list)
print(all_tens.size())