I would like to remove gridlines from a scanned document using Python to make them easier to read.
Here is a snippet of what we're working with:
As you can see, there are inconsistencies in the grid, and to make matters worse the scanning isn't always square. Five example documents can be found here.
I am open to whatever methods you may suggest, but OpenCV and pypdf might be a good place to start before breaking out more involved machine learning techniques.
This post addresses a similar question, but does not have a solution. The user posted the following code snippet, which may be of interest (to be honest I have not tested it; I am just putting it here for your convenience).
import cv2
import numpy as np

def rmv_lines(Image_Path):
    img = cv2.imread(Image_Path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    minLineLength, maxLineGap = 100, 15
    # Pass the limits as keyword arguments; positionally they would be taken
    # as the (unused) `lines` output argument instead.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 100,
                            minLineLength=minLineLength, maxLineGap=maxLineGap)
    for line in lines:
        for x1, y1, x2, y2 in line:
            # if x1 != x2 and y1 != y2:
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 4)  # paint the segment white
    return cv2.imwrite('removed.jpg', img)
I would prefer the final documents be in pdf format if possible.
(disclaimer: I am the author of pText, the library being used in this answer)
I can help you part of the way (extracting the images from the PDF).
Start by loading the Document.
You'll see that I'm passing an extra parameter in the PDF.loads method.
SimpleImageExtraction acts like an EventListener for PDF instructions. Whenever it encounters an instruction that would render an image, it intercepts the instruction and stores the image.
with open(file, "rb") as pdf_file_handle:
    l = SimpleImageExtraction()
    doc = PDF.loads(pdf_file_handle, [l])
Now that we have loaded the Document and SimpleImageExtraction has had a chance to work its magic, we can output the images. In this example I'm just going to store them.
for i, img in enumerate(l.get_images_per_page(0)):
    output_file = "image_" + str(i) + ".jpg"
    with open(output_file, "wb") as image_file_handle:
        img.save(image_file_handle)
You can obtain pText either from GitHub or from PyPI.
There are a ton more examples; check them out to find out more about working with images.
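For the gridline removal step itself (which is outside pText), a rough, untested sketch of one common OpenCV approach is to isolate long horizontal and vertical runs with morphological opening and paint them out; the kernel sizes and file names below are just placeholders:

import cv2
import numpy as np

img = cv2.imread("image_0.jpg")  # one of the images extracted above
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Binarise so that ink (text and gridlines) is white on black
binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 15, -2)

# Long, thin kernels respond mostly to horizontal / vertical gridlines
horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
vert_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
horiz = cv2.morphologyEx(binary, cv2.MORPH_OPEN, horiz_kernel)
vert = cv2.morphologyEx(binary, cv2.MORPH_OPEN, vert_kernel)

# Paint the detected line pixels white in the original image
img[(horiz > 0) | (vert > 0)] = 255
cv2.imwrite("image_0_nolines.jpg", img)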
Related
I've been struggling with this challenge for the best part of today; I've managed to get to a good point using previous posts and other resources.
I'm trying to convert a PIL.Image to a QPixmap so that I can display it using a QGraphicsScene in my PyQt GUI. However, when the picture is displayed, the colours have changed. Has anyone ever experienced this issue?
The code I use for this is below.
self.graphicsScene.clear()
im = Image.open('Penguins.jpg')
im = im.convert("RGBA")
data = im.tobytes("raw","RGBA")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_ARGB32)
pix = QtGui.QPixmap.fromImage(qim)
self.graphicsScene.addPixmap(pix)
self.graphicsView.fitInView(QtCore.QRectF(0,0,im.size[0], im.size[1]), QtCore.Qt.KeepAspectRatio)
self.graphicsScene.update()
I'm on Windows 7 64-bit, using Python 3.4 with PyQt4 and Pillow 3.1.0. The results I'm getting can be seen below.
Original picture
Picture displayed in GUI
Thanks in advance :).
In your PIL image the last band is the alpha channel, whereas in the Qt image the alpha channel is the first (RGBA vs. ARGB). There may be other ways of permuting the bands, but the easiest way seems to be to use the ImageQt class.
from PIL.ImageQt import ImageQt
qim = ImageQt(im)
pix = QtGui.QPixmap.fromImage(qim)
I don't know why, but ImageQt crashed on my system (Win10, Python 3, Qt5).
So I went in another direction and tried a solution found on GitHub.
That code doesn't crash, but gives the effect shown in the first post.
My solution is to separate the RGB picture into its individual colors and reassemble it as BGR or BGRA before converting it to a QPixmap:
def pil2pixmap(self, im):
    if im.mode == "RGB":
        r, g, b = im.split()
        im = Image.merge("RGB", (b, g, r))
    elif im.mode == "RGBA":
        r, g, b, a = im.split()
        im = Image.merge("RGBA", (b, g, r, a))
    elif im.mode == "L":
        im = im.convert("RGBA")
    # Convert the image to RGBA if that has not happened already
    im2 = im.convert("RGBA")
    data = im2.tobytes("raw", "RGBA")
    qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_ARGB32)
    pixmap = QtGui.QPixmap.fromImage(qim)
    return pixmap
I've tested RGB, and PIL stores the data in the layout Qt expects for Format_RGB888:
im = im.convert("RGB")
data = im.tobytes("raw","RGB")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_RGB888)
I haven't tested it, but I assume that for RGBA it will be the equivalent format, Format_RGBA8888:
im = im.convert("RGBA")
data = im.tobytes("raw","RGBA")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_RGBA8888)
@titusjan's answer didn't work for me. Both @Michael and @Jordan have solutions that worked. A simpler version of @Michael's is just to redefine how the bytes are written for the image. So this works for me:
im2 = im.convert("RGBA")
data = im2.tobytes("raw", "BGRA")
qim = QtGui.QImage(data, im.width, im.height, QtGui.QImage.Format_ARGB32)
pixmap = QtGui.QPixmap.fromImage(qim)
The only difference is that I swapped the byte order in the encoding, i.e. 'BGRA' instead of 'RGBA'.
This may be useful:
Creates an ImageQt object from a PIL Image object. This class is a subclass of QtGui.QImage, which means that you can pass the resulting objects directly to PyQt4/5 API functions and methods.
This operation is currently supported for mode 1, L, P, RGB, and RGBA images. To handle other modes, you need to convert the image first.
https://pillow.readthedocs.io/en/stable/reference/ImageQt.html
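A minimal, untested sketch of that convert-then-ImageQt flow (assuming PyQt5 and Pillow; 'Penguins.jpg' is just the example file from the question):

from PIL import Image
from PIL.ImageQt import ImageQt
from PyQt5 import QtGui

im = Image.open('Penguins.jpg')
if im.mode not in ("1", "L", "P", "RGB", "RGBA"):
    im = im.convert("RGBA")  # convert unsupported modes first
# inside a running Qt application (QPixmap needs one)
pix = QtGui.QPixmap.fromImage(ImageQt(im))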
One problem that many of the existing answers run into is that Qt seems to have an undocumented implicit assumption that, by default, image lines need to start on a 32-bit boundary. For images with an alpha channel that is automatically the case, but for RGB images whose width is not divisible by 4 it is not, and the resulting QImage typically looks grey and skewed, or it can crash.
The easiest solution is to use the bytesPerLine parameter of the QImage constructor to tell it explicitly where each new line starts; with that, RGB works fine (no clue why it doesn't do that automatically):
im = im.convert("RGB")
data = im.tobytes("raw", "RGB")
qi = QImage(data, im.size[0], im.size[1], im.size[0]*3, QImage.Format.Format_RGB888)
pix = QPixmap.fromImage(qi)
Another possible reason for crashes is the data retention. QImage does not make a copy of or add a reference to the data, it assumes the data is valid until the QImage is destroyed. For this specific answer which immediately transforms the QImage into a QPixmap it shouldn't matter, as the QPixmap keeps a copy of the data, but if for whatever reason you hang on to the QImage, you also need to keep a reference to the data around.
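A small illustrative sketch of keeping that reference around (assuming PyQt5; the class and attribute names are made up):

from PIL import Image
from PyQt5.QtGui import QImage

class ImageHolder:
    # Keeps the raw byte buffer alive together with the QImage built from it,
    # because QImage does not copy the buffer it is given.
    def __init__(self, path):
        im = Image.open(path).convert("RGB")
        self.data = im.tobytes("raw", "RGB")
        self.qimage = QImage(self.data, im.size[0], im.size[1],
                             im.size[0] * 3, QImage.Format_RGB888)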
I'm currently doing a remote internship where I get to code a visualization tool with D3.js, but that is not the part where I have a problem.
To set the context: I have some files called episodes, which contain data about the path of a robot, whether it succeeded or failed, and the different points in Cartesian coordinates.
(BTW I'm French, I apologize in advance for any grammar issues.)
So I have a small Python program that interprets the data contained in these .p files; here's the code:
import pickle
import matplotlib.pyplot as plt  # PyQt or tkinter is required
import numpy as np
import cv2
# This script aligns the true position to the position given by ORB-SLAM.
# Load episode by id. This loads the dictionary containing all information about an episode.
trajectory_dir = "ORBSlam/"
episode_id = 0
episode = pickle.load(open( trajectory_dir+"episodeStats"+str(episode_id)+".p", "rb" ))
#Extract useful data from the dictionary
pose_env = episode["pose_env"]
images_RGB = episode["rgb"]
images_depth = episode["depth"]
actions_orb = episode["orb_action"]
actions_best = episode["best_action"]
goal_distances = episode["goal_distance"]
success = episode["success"]
#Save observations into images.
for i in range(len(images_RGB)):
    cv2.imwrite(trajectory_dir + "RGBs" + str(episode_id) + "/" + str(i) + ".png", images_RGB[i])
    cv2.imwrite(trajectory_dir + "Depths" + str(episode_id) + "/" + str(i) + ".png", images_depth[i] * 255)
#Display 2D trajectories.
x_env = []
y_env = []
for i in range(len(pose_env)):
    # add x, y coordinates of the translation
    x_env.append(pose_env[i][0, 3])
    y_env.append(pose_env[i][2, 3])
plt.plot(x_env,y_env)
plt.axis('equal')
plt.show()
The problem is that the loop that is supposed to output the PNG images produces no output in the folder, and since it fails silently I don't know what the error is. I created both folders (RGBs and Depths) in the ORBSlam folder, so do you think it's something about write permissions or something like that? (I'm working on macOS.)
Thanks in advance for all the responses.
EDIT: I found why I got no output images: I simply forgot to create a folder named after episode_id (basically 0, 1, 2, ...) inside the Depths and RGBs folders. My bad, it was a silly mistake.
Solved: a folder was missing, so there was nowhere to write the output.
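For reference, a small untested sketch of how the loop could create those folders itself and surface failures (cv2.imwrite returns False rather than raising when the directory does not exist); trajectory_dir, episode_id, images_RGB and images_depth are the variables from the script above:

import os
import cv2

rgb_dir = trajectory_dir + "RGBs" + str(episode_id)
depth_dir = trajectory_dir + "Depths" + str(episode_id)
os.makedirs(rgb_dir, exist_ok=True)   # create the output folders if missing
os.makedirs(depth_dir, exist_ok=True)

for i in range(len(images_RGB)):
    ok_rgb = cv2.imwrite(rgb_dir + "/" + str(i) + ".png", images_RGB[i])
    ok_depth = cv2.imwrite(depth_dir + "/" + str(i) + ".png", images_depth[i] * 255)
    if not (ok_rgb and ok_depth):
        print("failed to write frame", i)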
I have a bunch of medical images in DICOM that I want to correct for bias field inhomogeneity using SimpleITK in Python. The workflow is straightforward: I want to (1) open the DICOM image, (2) create a binary mask of the object in the image, (3) apply N4 bias field correction to the masked image, and (4) write back the corrected image in DICOM format. Note that no spatial transformation is applied to the image, only an intensity transformation, so I should be able to copy all spatial information and all metadata (except for the date/time of creation and the instance number) from the original to the corrected image.
I have written this function to achieve my goal:
import time
from pathlib import PurePath
from random import randint

import SimpleITK as sitk

def n4_dcm_correction(dcm_in_file):
    # 0008|0012 Instance Creation Date, 0008|0013 Instance Creation Time,
    # 0020|0013 Instance Number -- these are set explicitly instead of copied.
    metadata_to_set = ["0008|0012", "0008|0013", "0020|0013"]
    filepath = PurePath(dcm_in_file)
    root_dir = str(filepath.parent)
    file_name = filepath.stem
    dcm_reader = sitk.ImageFileReader()
    dcm_reader.SetFileName(dcm_in_file)
    dcm_reader.LoadPrivateTagsOn()
    inputImage = dcm_reader.Execute()
    metadata_to_copy = [k for k in inputImage.GetMetaDataKeys() if k not in metadata_to_set]
    maskImage = sitk.OtsuThreshold(inputImage, 0, 1, 200)
    filledImage = sitk.BinaryFillhole(maskImage)
    floatImage = sitk.Cast(inputImage, sitk.sitkFloat32)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()
    output = corrector.Execute(floatImage, filledImage)
    output.CopyInformation(inputImage)
    for k in metadata_to_copy:
        print("key is: {}; value is {}".format(k, inputImage.GetMetaData(k)))
        output.SetMetaData(k, inputImage.GetMetaData(k))
    output.SetMetaData("0008|0012", time.strftime("%Y%m%d"))
    output.SetMetaData("0008|0013", time.strftime("%H%M%S"))
    output.SetMetaData("0008|0013", str(float(inputImage.GetMetaData("0008|0013")) + randint(1, 999)))
    out_file = "{}/{}_biascorrected.dcm".format(root_dir, file_name)
    writer = sitk.ImageFileWriter()
    writer.KeepOriginalImageUIDOn()
    writer.SetFileName(out_file)
    writer.Execute(sitk.Cast(output, sitk.sitkUInt16))
    return
n4_dcm_correction("/path/to/my/dcm/image.dcm")
While the bias correction part works (the bias is removed), the writing part is a mess. I would expect my output DICOM to have exactly the same metadata as the original one; however, the tags are all missing, notably the patient name, the protocol name and the manufacturer. Similarly, something is very wrong with the spatial information: if I try to convert the DICOM to NIfTI format with dcm2niix, the directions are reversed, superior is down and inferior is up, forward is back and backward is front. What step am I missing?
I suspect you are working with an MRI series, not a single file. This example likely does what you want: read-modify-write a volume stored in a set of files.
If the example did not resolve your issue, please post to the ITK Discourse, which is the primary location for ITK/SimpleITK-related discussions.
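For reference, a rough, untested sketch (not the linked example itself) of reading the whole series with SimpleITK before correcting; the directory path is a placeholder:

import SimpleITK as sitk

# Read the whole DICOM series as a single 3D volume instead of one slice at a time.
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames("/path/to/dcm/series")
reader.SetFileNames(dicom_names)
reader.MetaDataDictionaryArrayUpdateOn()  # keep per-slice metadata for writing back
reader.LoadPrivateTagsOn()
volume = reader.Execute()

# N4 correction on the whole volume, with consistent 3D geometry.
mask = sitk.OtsuThreshold(volume, 0, 1, 200)
corrector = sitk.N4BiasFieldCorrectionImageFilter()
corrected = corrector.Execute(sitk.Cast(volume, sitk.sitkFloat32), mask)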
I am currently working on a program that requires me to read DICOM files and display them correctly. After extracting the pixel array from the DICOM file, I ran it through both the imshow function from matplotlib and cv2. To my surprise, they yield vastly different images: one has color while the other does not, and one shows more detail than the other. I'm confused as to why this is happening. I found Difference between plt.show and cv2.imshow? and tried converting the pixels to BGR instead of RGB, which is what cv2 uses, but this changes nothing. I am wondering why these two frameworks show the same pixel buffer so differently. Below is my code and an image to show the outcomes.
import cv2
import os
import pydicom
import numpy as np
import matplotlib.pyplot as plt
inputdir = 'datasets/dicom/98890234/20030505/CT/CT2/'
outdir = 'datasets/dicom/pngs/'
test_list = [ f for f in os.listdir(inputdir)]
for f in test_list[:1]:  # remove "[:1]" to convert all images
    ds = pydicom.dcmread(inputdir + f)
    img = np.array(ds.pixel_array, dtype=np.uint8)  # get image array
    rows, cols = img.shape
    cannyImg = cv2.Canny(img, cols, rows)
    cv2.imshow('thing', cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    cv2.imshow('thingCanny', cannyImg)
    plt.imshow(ds.pixel_array)
    plt.show()
    cv2.waitKey()
Using the cmap parameter with imshow() might solve the issue. Try this:
plt.imshow(arr, cmap='gray', vmin=0, vmax=255)
Refer to the docs for more info.
Not an answer but too long for a comment. I think the root cause of your problems is in the initialization of the array already:
img = np.array(ds.pixel_array, dtype = np.uint8)
uint8 is presumably not what you have in the DICOM file: first, because it looks like a CT image, which is usually stored with 10+ bits per pixel, and second, because the artifacts you are facing look very familiar to me. These kinds of artifacts (dense bones displayed in black, gradient effects) usually occur when >8-bit pixel data is interpreted as 8-bit.
BTW: To me, both renderings look obviously incorrect.
Sorry for not being a Python expert and only being able to tell what is wrong, without being able to tell you exactly how to get it right.
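Roughly, though, the idea would be to keep the full bit depth and only window the values down to 8-bit for display; a minimal, untested sketch (the file path is a placeholder):

import numpy as np
import pydicom
import matplotlib.pyplot as plt

ds = pydicom.dcmread("path/to/slice.dcm")
pixels = ds.pixel_array.astype(np.float32)  # keep the original 10+ bpp values

# Map the full value range to 0..255 for display only; casting the raw data
# to uint8 directly is what produces the artifacts described above.
lo, hi = pixels.min(), pixels.max()
display = ((pixels - lo) / max(hi - lo, 1.0) * 255.0).astype(np.uint8)

plt.imshow(display, cmap="gray", vmin=0, vmax=255)
plt.show()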
I'm trying to write code to detect the color of a particular area of an image.
So far I have found that this can be done using OpenCV, but I still haven't found a tutorial that helps with it.
I want to do this with JavaScript, but I can also use Python OpenCV to get the results.
Can anyone share a useful link or explain how I can detect the color of a particular area of an image?
For example:
The box in red will show a different color. I need to figure out which color it is showing.
What I have tried:
I have tried OpenCV Canny edge images; although I managed to separate the area using them, detecting the color of that particular area is still a challenge.
Also, I tried the inRange method from OpenCV, which works well:
# find the colors within the specified boundaries and apply
# the mask
mask = cv2.inRange(image, lower, upper)
output = cv2.bitwise_and(image, image, mask = mask)
# show the images
cv2.imshow("images", np.hstack([image, output]))
It works well and extracts the color area from the image. But is there any callback that responds if the image contains a particular color, so that it can all be done automatically?
So I am assuming here that you already know the location of the rect, which is going to be changed dynamically, and that you need to find the single most dominant color in the desired ROI. There are a lot of ways to do this: one is to take the average of all the pixels in the ROI, another is to count all the distinct pixel values in the given ROI, with some tolerance difference.
Method 1:
import cv2
import numpy as np
img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227)  # left, top, right, bottom
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
print(cv2.mean(cropped_img))
>>> (53.430516018839604, 41.05708814243569, 244.54991977640907, 0.0)
Method 2:
To find the various dominant color clusters in the given image, you can use cv2.kmeans():
import cv2
import numpy as np
img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227)
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
Z = cropped_img.reshape((-1, 3))
Z = np.float32(Z)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Sort all the colors, as per their frequencies, as:
print(center[sorted(range(K), key=lambda x: np.count_nonzero(label == [x]), reverse=True)[0]])
>>> [ 52.96525192 40.93861389 245.02325439]
@Prateek... nice to have the question narrowed down to the core. The code you provided does not address the issue at hand and remains just a question. I'll hint you towards a direction, but you have to code it yourself.
steps that guide you towards a scripting result:
1) In your script, add two pixel lists (past & current) to store values (pixel type + occurrence).
2) Introduce a while-loop with an action true/stop statement (link to "3") for looping, because then it becomes a dynamic process.
3) Write a GUI with a flashy warning banner.
4) Compare the past pixel list with the current pixel list for a serious state change (threshold).
5) If the delta state change at "4" meets the threshold, throw the alert ("3").
When you've written the code and enjoyed the trouble of tracking the tracebacks... then edit your question, update it with the code, and reshape your question (I can help with that if you want). Then we can pick it up from there. Does that sound like a plan?
I am not sure why you need a callback in this situation, but maybe this is what you mean?
import cv2
import numpy as np

def test_color(image, lower, upper):
    mask = cv2.inRange(image, lower, upper)
    return np.any(mask == 255)
Explanations:
cv2.inRange() will return 255 when a pixel is in the range (lower, upper), and 0 otherwise (see docs)
Use np.any() to check if any element in the mask is actually 255
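A hypothetical usage, with made-up BGR bounds for a reddish range:

import cv2
import numpy as np

image = cv2.imread("path/to/img.jpg")            # placeholder path
lower = np.array([0, 0, 150], dtype=np.uint8)    # assumed BGR lower bound
upper = np.array([80, 80, 255], dtype=np.uint8)  # assumed BGR upper bound

if test_color(image, lower, upper):
    print("the image contains pixels in the target color range")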