I've been struggling with this challenge for the best part of today, and I've managed to get to a good point using previous posts and other resources.
I'm trying to convert a PIL.Image to a QPixmap so that I can display it using a QGraphicsScene in my PyQt GUI. However, when the picture is displayed the colours have changed. Has anyone experienced this issue?
The code I use for this is below.
self.graphicsScene.clear()
im = Image.open('Penguins.jpg')
im = im.convert("RGBA")
data = im.tobytes("raw","RGBA")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_ARGB32)
pix = QtGui.QPixmap.fromImage(qim)
self.graphicsScene.addPixmap(pix)
self.graphicsView.fitInView(QtCore.QRectF(0,0,im.size[0], im.size[1]), QtCore.Qt.KeepAspectRatio)
self.graphicsScene.update()
I'm on Windows 7 64-bit, using Python 3.4 with PyQt4 and Pillow 3.1.0. The results I'm getting can be seen below.
Original picture
Picture displayed in GUI
Thanks in advance :).
In your PIL image the last band is the alpha channel, whereas in the Qt image the alpha channel is the first (RGBA vs. ARGB). There may be other ways of permuting the bands, but the easiest way seems to be to use the ImageQt class.
from PIL.ImageQt import ImageQt
qim = ImageQt(im)
pix = QtGui.QPixmap.fromImage(qim)
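To make the band-order point concrete, here is a small standard-library sketch (no Qt or Pillow needed). It assumes a little-endian machine, which is what a typical x86 Windows setup is. Qt describes Format_ARGB32 as a 32-bit 0xAARRGGBB value, so on such a machine the bytes of one pixel sit in memory as B, G, R, A. The helper name argb32_memory_bytes is made up for illustration.

```python
import struct

def argb32_memory_bytes(r, g, b, a):
    """Return the 4 bytes of one 0xAARRGGBB pixel as laid out on a little-endian machine."""
    value = (a << 24) | (r << 16) | (g << 8) | b
    return struct.pack("<I", value)

# An opaque, mostly red pixel.
r, g, b, a = 200, 30, 10, 255
print(list(argb32_memory_bytes(r, g, b, a)))  # [10, 30, 200, 255], i.e. B, G, R, A

# Pillow's tobytes("raw", "RGBA") emits R, G, B, A instead, so if those bytes
# are read as Format_ARGB32 the red and blue channels end up swapped.
```

This would explain why the picture in the question keeps its shapes but shows the wrong colours.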
I don't know why, but ImageQt crashed on my system (Win10, Python 3, Qt5).
So I went in another direction and tried a solution found on GitHub.
That code doesn't crash, but it gives the effect shown in the first post.
My solution is to split the RGB picture into its colour bands and reassemble them as BGR or BGRA before converting it to a pixmap:
def pil2pixmap(self, im):
    if im.mode == "RGB":
        r, g, b = im.split()
        im = Image.merge("RGB", (b, g, r))
    elif im.mode == "RGBA":
        r, g, b, a = im.split()
        im = Image.merge("RGBA", (b, g, r, a))
    elif im.mode == "L":
        im = im.convert("RGBA")
    # Convert the image to RGBA if that has not happened already
    im2 = im.convert("RGBA")
    data = im2.tobytes("raw", "RGBA")
    qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_ARGB32)
    pixmap = QtGui.QPixmap.fromImage(qim)
    return pixmap
I've tested RGB, and PIL saves data with the qt format Format_RGB888:
im = im.convert("RGB")
data = im.tobytes("raw","RGB")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_RGB888)
I haven't tested it, but I assume that for RGBA it will be the equivalent format, Format_RGBA8888:
im = im.convert("RGBA")
data = im.tobytes("raw","RGBA")
qim = QtGui.QImage(data, im.size[0], im.size[1], QtGui.QImage.Format_RGBA8888)
@titusjan's answer didn't work for me. Both @Michael and @Jordan have solutions that worked. A simpler version of @Michael's is just to change the order in which the bytes are written for the image. So this works for me:
im2 = im.convert("RGBA")
data = im2.tobytes("raw", "BGRA")
qim = QtGui.QImage(data, im.width, im.height, QtGui.QImage.Format_ARGB32)
pixmap = QtGui.QPixmap.fromImage(qim)
The only difference is that I swapped the order for the encoding, e.g. to 'BGRA' instead of 'RGBA'.
This may be useful:
Creates an ImageQt object from a PIL Image object. This class is a subclass of QtGui.QImage, which means that you can pass the resulting objects directly to PyQt4/5 API functions and methods.
This operation is currently supported for mode 1, L, P, RGB, and RGBA images. To handle other modes, you need to convert the image first.
https://pillow.readthedocs.io/en/stable/reference/ImageQt.html
One problem that many of the existing answers run into is that Qt assumes by default that each image line starts on a 32-bit boundary. For images with an alpha channel that is automatically the case, but for RGB images whose width is not divisible by 4 it is not, and the resulting QImage typically looks grey and skewed, or it could crash.
The easiest solution is to use the bytesPerLine parameter of the QImage constructor to tell it explicitly where each line starts; with that, RGB works fine (no clue why it doesn't do that automatically):
im = im.convert("RGB")
data = im.tobytes("raw", "RGB")
qi = QImage(data, im.size[0], im.size[1], im.size[0]*3, QImage.Format.Format_RGB888)
pix = QPixmap.fromImage(qi)
Another possible reason for crashes is the data retention. QImage does not make a copy of or add a reference to the data, it assumes the data is valid until the QImage is destroyed. For this specific answer which immediately transforms the QImage into a QPixmap it shouldn't matter, as the QPixmap keeps a copy of the data, but if for whatever reason you hang on to the QImage, you also need to keep a reference to the data around.
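The line-alignment issue can be illustrated without Qt. The following is only a sketch under the assumption of 4-byte (32-bit) line alignment; aligned_stride and pad_rows are hypothetical helper names, not part of Qt or Pillow. It shows how many bytes per line an aligned reader expects, and what padding the rows yourself would look like as an alternative to passing bytesPerLine.

```python
def aligned_stride(width, bytes_per_pixel, alignment=4):
    """Bytes per line after rounding up to a multiple of `alignment`."""
    row_bytes = width * bytes_per_pixel
    return (row_bytes + alignment - 1) // alignment * alignment

def pad_rows(data, width, height, bytes_per_pixel):
    """Copy tightly packed rows into a buffer whose rows are 4-byte aligned."""
    row_bytes = width * bytes_per_pixel
    stride = aligned_stride(width, bytes_per_pixel)
    padding = bytes(stride - row_bytes)
    rows = (data[y * row_bytes:(y + 1) * row_bytes] for y in range(height))
    return b"".join(row + padding for row in rows), stride

# A 3x2 RGB image: 9 bytes per tightly packed row, 12 once aligned.
packed = bytes(range(18))
padded, stride = pad_rows(packed, width=3, height=2, bytes_per_pixel=3)
print(stride, len(padded))  # 12 24
```

With a width of 3, a reader that expects 12 bytes per line but is handed 9 starts every following line 3 bytes too early, which matches the skewed look described above. Passing im.size[0]*3 as bytesPerLine avoids the extra copy.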
Related
I am using pyautogui.locateOnScreen() function to locate elements in chrome and get their x,y coordinates and click them. But at some point I need to take a screenshot of a part of the screen and search for the object I want in this screenshot. Then I get coordinates of it. Is it possible to do it with pyautogui?
My example code:
coord_one = pyautogui.locateOnScreen("first_image.png",confidence=0.95)
scshoot = pyautogui.screenshot(region=coord_one)
coord_two = # search second image in scshoot and if it can be detected get coordinates of it.
If it is not possible with pyautogui, can you advise the easiest and smartest way?
Thanks in advance.
I don't believe there is a built-in direct way to do what you need but the python-opencv library does the job.
The following code sample assumes you have a screen capture you just took (res/original.png in the code) and you want to find a smaller image (res/crop.png) in it, which you know is a subsection of the capture.
Minimal example
"""Get bounding box of cropped image from original image."""
import cv2 as cv
import numpy as np
img_rgb = cv.imread(r'res/original.png')
# the cropped image, expected to be smaller
target_img = cv.imread(r'res/crop.png')
_, w, h = target_img.shape[::-1]
res = cv.matchTemplate(img_rgb,target_img,cv.TM_CCOEFF_NORMED)
# with the method used, the data in res are top left pixel coords
min_val, max_val, min_loc, max_loc = cv.minMaxLoc(res)
top_left = max_loc
# if we add to it the width and height of the target, then we get the bbox.
bottom_right = (top_left[0] + w, top_left[1] + h)
cv.rectangle(img_rgb,top_left, bottom_right, 255, 2)
cv.imshow('', img_rgb)
# keep the window open until a key is pressed
cv.waitKey(0)
MatchTemplate
From the docs, MatchTemplate "simply slides the template image over the input image (as in 2D convolution) and compares the template and patch of input image under the template image." Under the hood, this offers methods such as square difference to compare the images represented as arrays.
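To illustrate the sliding idea, here is a deliberately naive pure-Python sketch using the sum of squared differences on small grayscale grids. It is not how OpenCV implements matchTemplate, and match_template_ssd is a made-up name. Note that with squared differences the best match is the minimum, whereas with TM_CCOEFF_NORMED used above it is the maximum.

```python
def match_template_ssd(image, template):
    """Return ((x, y), score) of the best match using sum of squared differences."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_loc, best_score = None, None
    # Slide the template over every position where it fits entirely.
    for y in range(img_h - tpl_h + 1):
        for x in range(img_w - tpl_w + 1):
            score = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(tpl_h)
                for dx in range(tpl_w)
            )
            if best_score is None or score < best_score:
                best_loc, best_score = (x, y), score
    return best_loc, best_score

image = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
]
template = [[9, 8], [7, 6]]
top_left, score = match_template_ssd(image, template)
print(top_left, score)  # (1, 1) 0

# If `image` was itself a screenshot of a screen region starting at
# (region_left, region_top), add that offset to get screen coordinates.
region_left, region_top = 100, 50  # hypothetical region origin
screen_xy = (top_left[0] + region_left, top_left[1] + region_top)
print(screen_xy)  # (101, 51)
```

The last few lines address the original question: a location found inside a cropped screenshot is relative to that crop, so the region's origin has to be added back.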
See more
For a more in-depth explanation, check the opencv docs as the code is entirely based off their example.
I would like to remove gridlines from a scanned document using Python to make them easier to read.
Here is a snippet of what we're working with:
As you can see, there are inconsistencies in the grid, and to make matters worse the scanning isn't always square. Five example documents can be found here.
I am open to whatever methods you may suggest for this, but OpenCV and pypdf might be a good place to start before breaking out more involved machine learning techniques.
This post addresses a similar question, but does not have a solution. The user posted the following code snippet, which may be of interest (to be honest I have not tested it; I am just putting it here for your convenience).
import cv2
import numpy as np
def rmv_lines(Image_Path):
    img = cv2.imread(Image_Path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150, apertureSize=3)
    minLineLength, maxLineGap = 100, 15
    # pass these by keyword: the fifth positional argument of HoughLinesP is `lines`
    lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100, minLineLength=minLineLength, maxLineGap=maxLineGap)
    for x in range(0, len(lines)):
        for x1, y1, x2, y2 in lines[x]:
            #if x1 != x2 and y1 != y2:
            # paint over each detected line segment in white
            cv2.line(img, (x1, y1), (x2, y2), (255, 255, 255), 4)
    return cv2.imwrite('removed.jpg', img)
I would prefer the final documents be in pdf format if possible.
(disclaimer: I am the author of pText, the library being used in this answer)
I can help you part of the way (extracting the images from the PDF).
Start by loading the Document.
You'll see that I'm passing an extra parameter in the PDF.loads method.
SimpleImageExtraction acts like an EventListener for PDF instructions. Whenever it encounters an instruction that would render an image, it intercepts the instruction and stores the image.
with open(file, "rb") as pdf_file_handle:
l = SimpleImageExtraction()
doc = PDF.loads(pdf_file_handle, [l])
Now that we have loaded the Document, and SimpleImageExtraction should have had a chance to work its magic, we can output the images. In this example I'm just going to store them.
for i, img in enumerate(l.get_images_per_page(0)):
output_file = "image_" + str(i) + ".jpg"
with open(output_file, "wb") as image_file_handle:
img.save(image_file_handle)
You can obtain pText either on GitHub, or using PyPi
There are a ton more examples, check them out to find out more about working with images.
I am currently working on a program that requires me to read DICOM files and display them correctly. After extracting the pixel array from the DICOM file, I ran it through the imshow functions of both matplotlib and cv2. To my surprise, they yield vastly different images. One has colour while the other has none, and one shows more detail than the other. I'm confused as to why this is happening. I found "Difference between plt.show and cv2.imshow?" and tried converting the pixels from BGR (which cv2 uses) to RGB, but this changes nothing. I am wondering why these two frameworks show the same pixel buffer so differently. Below is my code and an image to show the outcomes.
import cv2
import os
import pydicom
import numpy as np
import matplotlib.pyplot as plt
inputdir = 'datasets/dicom/98890234/20030505/CT/CT2/'
outdir = 'datasets/dicom/pngs/'
test_list = [ f for f in os.listdir(inputdir)]
for f in test_list[:1]: # remove "[:1]" to convert all images
    ds = pydicom.dcmread(inputdir + f)
    img = np.array(ds.pixel_array, dtype = np.uint8) # get image array
    rows,cols = img.shape
    cannyImg = cv2.Canny(img, cols, rows)
    cv2.imshow('thing',cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    cv2.imshow('thingCanny', cannyImg)
    plt.imshow(ds.pixel_array)
    plt.show()
    cv2.waitKey()
Using the cmap parameter with imshow() might solve the issue. Try this:
plt.imshow(arr, cmap='gray', vmin=0, vmax=255)
Refer to the docs for more info.
Not an answer but too long for a comment. I think the root cause of your problems is in the initialization of the array already:
img = np.array(ds.pixel_array, dtype = np.uint8)
uint8 is presumably not what you have in the DICOM file. First, because it looks like a CT image, which is usually stored with 10+ bpp, and second, because the artifacts you are facing look very familiar to me. This kind of artifact (dense bones displayed in black, gradient effects) usually occurs if >8-bit pixel data is interpreted as 8-bit.
BTW: To me, both renderings look obviously incorrect.
Sorry for not being a python expert and just being able to tell what is wrong but unable to tell how to get it right.
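As a rough illustration of the 8-bit point above, here is a pure-Python sketch with made-up sample values (not taken from the asker's file). It assumes that converting a wider integer array to uint8 effectively keeps only the low 8 bits, and it contrasts that with scaling a chosen value range down to 0..255. The function names are hypothetical, and real CT data usually needs further handling (for example the rescale slope and intercept stored in the file) that is not covered here.

```python
def wrap_to_uint8(value):
    """What a plain cast to an 8-bit unsigned type effectively does: keep the low 8 bits."""
    return value & 0xFF

def window_to_uint8(value, low, high):
    """Clamp `value` to [low, high] and scale that range linearly to 0..255."""
    clamped = min(max(value, low), high)
    return round((clamped - low) * 255 / (high - low))

# Hypothetical stored values from a 12-bit image (0..4095), dark to bright.
samples = [0, 1100, 2310]
print([wrap_to_uint8(v) for v in samples])             # [0, 76, 6]
print([window_to_uint8(v, 0, 4095) for v in samples])  # [0, 68, 144]
```

In the wrapped version the brightest sample comes out almost black (6), which is the "dense bones displayed in black" effect; in the scaled version the ordering of brightness is preserved.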
I'm trying to write code to detect the color of a particular area of an image.
So far I have found that this can be done with OpenCV, but I still haven't found a tutorial that covers it.
I want to do this with JavaScript, but I can also use Python OpenCV to get the results.
Can anyone please share a useful link or explain how I can detect the colour of a particular area in an image?
For eg.
The box in red will show a different color. I need to figure out which color it is showing.
What I have tried:
I have tried OpenCV Canny edge images. Although I can separate the area that way, detecting the colour of that particular area is still a challenge.
I also tried the inRange method from OpenCV, which works perfectly:
# find the colors within the specified boundaries and apply
# the mask
mask = cv2.inRange(image, lower, upper)
output = cv2.bitwise_and(image, image, mask = mask)
# show the images
cv2.imshow("images", np.hstack([image, output]))
It works well and extracts the coloured area from the image. But is there any callback that fires if the image contains a particular colour, so that it can all be done automatically?
I am assuming here that you already know the location of the rectangle whose colour changes dynamically, and that you need to find the single most dominant colour in that ROI. There are several ways to do this: one is to take the average of all the pixels in the ROI; another is to count the distinct pixel values in the ROI, with some tolerance.
Method 1:
import cv2
import numpy as np
img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227) # left, top, right, bottom
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
print(cv2.mean(cropped_img))
>>> (53.430516018839604, 41.05708814243569, 244.54991977640907, 0.0)
Method 2:
To find out the various dominant clusters in the given image you can use cv2.kmeans() as:
import cv2
import numpy as np
img = cv2.imread("path/to/img.jpg")
region_of_interest = (356, 88, 495, 227)
cropped_img = img[region_of_interest[1]:region_of_interest[3], region_of_interest[0]:region_of_interest[2]]
Z = cropped_img.reshape((-1, 3))
Z = np.float32(Z)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4
ret, label, center = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Sort all the colors, as per their frequencies, as:
print(center[sorted(range(K), key=lambda x: np.count_nonzero(label == [x]), reverse=True)[0]])
>>> [ 52.96525192 40.93861389 245.02325439]
@Prateek... nice to have the question narrowed down to the core. The code you provided does not address the issue at hand, so it remains just a question. I'll point you in a direction, but you have to code it yourself.
steps that guide you towards a scripting result:
1) In your script add two pixel lists (past and current) to store values (pixel type + occurrence).
2) Introduce a while loop with a continue/stop condition (linked to "3") so that the check becomes a dynamic process.
3) Write a GUI with a flashy warning banner.
4) Compare the past pixel list with the current pixel list for a serious state change (threshold).
5) If the state change from "4" meets the threshold, throw the alert ("3").
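A minimal sketch of steps 1, 4 and 5 might look like the following. It is only one possible reading of the outline: the function names, the change measure and the 0.25 threshold are all assumptions, and the while loop (step 2) and the GUI banner (step 3) are left out.

```python
from collections import Counter

def pixel_counts(pixels):
    """Step 1: map each pixel value to how often it occurs."""
    return Counter(pixels)

def state_change(past, current):
    """Step 4: for equally sized frames, the fraction of pixels that moved to another value."""
    total = max(sum(past.values()), sum(current.values()), 1)
    values = set(past) | set(current)
    changed = sum(abs(past[v] - current[v]) for v in values)
    # Every moved pixel is counted twice (once leaving, once arriving).
    return changed / (2 * total)

def should_alert(past, current, threshold=0.25):
    """Step 5: report whether the change reaches the threshold."""
    return state_change(past, current) >= threshold

# Two tiny "frames" of (B, G, R) tuples; half of the pixels turn red.
frame_a = [(0, 0, 0)] * 4
frame_b = [(0, 0, 0)] * 2 + [(0, 0, 255)] * 2
past, current = pixel_counts(frame_a), pixel_counts(frame_b)
print(state_change(past, current), should_alert(past, current))  # 0.5 True
```

In a real script the frames would come from the cropped region of each captured image, flattened into a list of pixel tuples.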
When you've written the code and enjoyed the trouble of tracking down the tracebacks, edit your question, update it with the code and reshape it (I can help with that if you want). Then we can pick it up from there. Does that sound like a plan?
I am not sure why you need callback in this situation, but maybe this is what you mean?
def test_color(image, lower, upper):
mask = cv2.inRange(image, lower, upper)
return np.any(mask == 255)
Explanations:
cv2.inRange() will return 255 when pixel is in range (lower, upper), 0 otherwise (see docs)
Use np.any() to check if any element in the mask is actually 255
I am reading an image with SimpleITK, but I get these results in VTK. Any help?
I am not sure where things are going wrong here.
Please see image here.
####
CODE
def sitk2vtk(img):
    size = list(img.GetSize())
    origin = list(img.GetOrigin())
    spacing = list(img.GetSpacing())
    sitktype = img.GetPixelID()
    vtktype = pixelmap[sitktype]
    ncomp = img.GetNumberOfComponentsPerPixel()
    # there doesn't seem to be a way to specify the image orientation in VTK
    # convert the SimpleITK image to a numpy array
    i2 = sitk.GetArrayFromImage(img)
    #import pylab
    #i2 = reshape(i2, size)
    # tobytes() replaces the deprecated tostring()
    i2_string = i2.tobytes()
    # send the numpy array to VTK with a vtkImageImport object
    dataImporter = vtk.vtkImageImport()
    dataImporter.CopyImportVoidPointer( i2_string, len(i2_string) )
    dataImporter.SetDataScalarType(vtktype)
    dataImporter.SetNumberOfScalarComponents(ncomp)
    # VTK expects 3-dimensional parameters
    if len(size) == 2:
        size.append(1)
    if len(origin) == 2:
        origin.append(0.0)
    if len(spacing) == 2:
        spacing.append(spacing[0])
    # Set the new VTK image's parameters
    #
    dataImporter.SetDataExtent (0, size[0]-1, 0, size[1]-1, 0, size[2]-1)
    dataImporter.SetWholeExtent(0, size[0]-1, 0, size[1]-1, 0, size[2]-1)
    dataImporter.SetDataOrigin(origin)
    dataImporter.SetDataSpacing(spacing)
    dataImporter.Update()
    vtk_image = dataImporter.GetOutput()
    return vtk_image
###
END CODE
You are ignoring two things:
There is an order change when you perform GetArrayFromImage:
The order of index and dimensions need careful attention during conversion. Quote from SimpleITK Notebooks at http://insightsoftwareconsortium.github.io/SimpleITK-Notebooks/01_Image_Basics.html:
ITK's Image class does not have a bracket operator. It has a GetPixel which takes an ITK Index object as an argument, which is an array ordered as (x,y,z). This is the convention that SimpleITK's Image class uses for the GetPixel method as well.
While in numpy, an array is indexed in the opposite order (z,y,x).
There is a change of coordinates between ITK and VTK image representations. Historically, in computer graphics there is a tendency to align the camera in such a way that the positive Y axis is pointing down. This results in a change of coordinates between ITK and VTK images.
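The index bookkeeping behind both points can be sketched in plain Python. This is an illustration only: flat_offset and flip_y are made-up helpers, the size is arbitrary, and whether a vertical flip is actually required for a given pipeline is taken from the explanation above rather than verified here.

```python
def flat_offset(x, y, z, size):
    """Offset of pixel (x, y, z) in a buffer where x varies fastest, then y, then z."""
    size_x, size_y, _ = size
    return x + size_x * (y + size_y * z)

def flip_y(y, size):
    """Row index after mirroring the image along the y axis."""
    return size[1] - 1 - y

size = (4, 3, 2)            # SimpleITK-style size: (x, y, z)
numpy_shape = size[::-1]    # the matching numpy array has shape (z, y, x)
print(numpy_shape)                               # (2, 3, 4)
print(flat_offset(1, 2, 0, size))                # 9
print(flat_offset(1, flip_y(2, size), 0, size))  # 1
```

So the pixel that SimpleITK addresses as (x, y, z) is array[z, y, x] in numpy, and a mirrored y axis moves it to a different row of the same slice.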