I get the UserWarning 'C:\Users...\Anaconda3\lib\site-packages\openpyxl\reader\drawings.py:58: UserWarning: wmf image format is not supported so the image is being dropped warn(msg)'
How can I suppress this warning message?
If you only need to read the file, the warning can be suppressed by setting the read_only keyword argument to True:
wb = load_workbook(filename=<your file>, read_only=True)
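If read-only mode is not an option, another route is Python's standard warnings module. The sketch below is only an illustration under assumptions: load_quietly is a hypothetical helper name, and the message pattern is copied from the warning text quoted in the question. It ignores that one UserWarning while the wrapped call runs and leaves all other warnings alone.

```python
import warnings


def load_quietly(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) while ignoring the 'wmf ... dropped' UserWarning.

    `loader` is any callable, for example openpyxl's load_workbook.
    """
    with warnings.catch_warnings():
        # The message is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="wmf image format is not supported",
            category=UserWarning,
        )
        return loader(*args, **kwargs)
```

Usage would look like wb = load_quietly(load_workbook, "book.xlsx"), where "book.xlsx" stands in for your own file. The filter is removed again as soon as the call returns.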
stovfl answered this question:
To add WMF read or write support to your application, use PIL.WmfImagePlugin.register_handler() to register a WMF handler. You have to patch openpyxl as well, because dropping WMF images is hardcoded; see OpenPyXL - find_images.
Another answer:
Pillow supports WMF image files (it can identify them everywhere, and read them on Windows):
https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#wmf-emf
By default, Image.open() will load the image at 72 dpi.
So, if you want to modify an Excel file that contains WMF image(s), you need to convert each WMF image to a supported format (e.g. "png") when it is loaded in the _import_image(img) function of
"C:\Users\%username%\AppData\Local\Programs\Python\Python310\Lib\site-packages\openpyxl\drawing\image.py":
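# Needs `from io import BytesIO` at the top of image.py if it is not already there.
def _import_image(img):
    if not PILImage:
        raise ImportError('You must install Pillow to fetch image objects')

    if not isinstance(img, PILImage.Image):
        img = PILImage.open(img)

    # This is the part you have to add (Start)
    try:
        # img.format can be None, so guard before calling lower()
        if (img.format or "").lower() == "wmf":
            fp = BytesIO()
            img.save(fp, format="png")  # convert to PNG in memory
            fp.seek(0)  # rewind explicitly before reopening
            img = PILImage.open(fp)
    except Exception:
        pass  # conversion failed: fall back to the unconverted image
    # This is the part you have to add (End)

    return img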
This allows you to modify an Excel file that contains WMF image(s).
But note that the quality of the image might be visibly different compared to original image(s), since image format conversion takes place.
Related
I am using Keras OCR and PyTesseract and was wondering if it is possible to use PDF files as the image input.
If not, does anyone have a suggestion as to how to convert a very massive PDF file into PNG or another acceptable format?
Thank you!
No, as far as I know PyTesseract works only with images. You'll need to convert your pdf to images first.
By "very massive PDF" I'm assuming you mean a PDF with lots of pages. This is not an issue. You can use the pdf2image library (see its documentation). The convert_from_path function has an output_folder argument that lets you specify the folder where all the generated images will be saved:
Output directory for the generated files, should be seen more as a
“working directory” than an output folder. The converted images will
be written there to save system memory.
You can then feed them to PyTesseract one by one instead of the PDF. If you don't keep the list of images returned by convert_from_path, you don't risk filling up your memory.
Otherwise, if you are willing to keep everything in memory you can use the returned pages directly, like so:
pages = convert_from_path(pdf_path)
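A middle ground between the two options above is to convert the PDF a few pages at a time. The helper below is a hypothetical sketch (page_batches is not part of pdf2image); it only computes 1-based, inclusive page ranges, on the assumption that each pair is then passed as the first_page and last_page arguments of convert_from_path.

```python
def page_batches(total_pages, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) pairs covering all pages."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    for first in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        yield first, min(first + batch_size - 1, total_pages)
```

For each (first, last) pair you would call something like convert_from_path(pdf_path, first_page=first, last_page=last), run OCR on those pages, and let them go out of scope before the next batch. The total page count has to come from somewhere else; as far as I know, pdf2image's pdfinfo_from_path(pdf_path)["Pages"] provides it.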
For example, here is my code (Python 3.9, macOS Big Sur):
import cv2
import pytesseract
from pdf2image import convert_from_path
from fonctions_images import *  # author's own helper module (provides del_lines, presumably custom_config too)

path = '/Users/yves/documents_1/'
fichier = path + 'TOUTOU.pdf'
pytesseract.pytesseract.tesseract_cmd = "/usr/local/bin/tesseract"

# Render every page of the PDF at 500 dpi, in grayscale
images = convert_from_path(fichier, 500, transparent=True, grayscale=True, poppler_path='/usr/local/Cellar/poppler/21.12.0/bin')
for image in images:
    image.save(path + "image.png", format="png")
    img = cv2.imread(path + "image.png")  # load the page into memory
    img = del_lines(path, img)  # remove the lines
    img = cv2.imread(path + "img_final_bin_1.png")  # presumably written by del_lines
    # OCR a cropped region of the page, in French
    d = pytesseract.image_to_data(img[3820:4050, 2340:4000], lang='fra', config=custom_config, output_type='data.frame')
Currently, I can open an image normally with very short code like this:
from PIL import Image
x = Image.open("Example.png")
x.show()
But when I tried a GIF instead of a PNG, it showed the file but did not load the frames of the GIF. Is there any way to load them?
My current code:
from PIL import Image
a = Image.open("x.gif").convert("RGBA")  # if I don't convert it to RGBA, I get an error
a.show()
Refer to Reading Sequences in the documentation:
from PIL import Image

with Image.open("animation.gif") as im:
    im.seek(1)  # skip to the second frame

    try:
        while 1:
            im.seek(im.tell() + 1)
            # do something to im
    except EOFError:
        pass  # end of sequence
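The seek loop above can be wrapped in a small generator so that every frame, including the first one, is visited. This is a sketch rather than part of Pillow: iter_frames is a hypothetical helper, and it only assumes that the object has a seek() method that raises EOFError past the last frame, which is how Pillow images behave in the snippet above.

```python
def iter_frames(im):
    """Yield each frame index in turn, leaving `im` positioned on that frame."""
    index = 0
    while True:
        try:
            im.seek(index)  # move to the requested frame
        except EOFError:
            return  # no more frames
        yield index
        index += 1
```

With a real GIF you could write: for i in iter_frames(im): im.convert("RGBA").save(f"frame_{i}.png"). Pillow also ships PIL.ImageSequence.Iterator, which serves the same purpose.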
I'm getting some unwanted rotation when loading images using PIL. I'm loading image samples and their binary masks, so this is causing issues. I'm attempting to convert the code to use OpenCV instead, but this is proving sticky. I haven't seen any relevant arguments in the documentation under Image.load(), but I'm hoping there's a workaround I just haven't found...
There is, but I haven't written it all up. Basically, if you load an image with EXIF "Orientation" field set, you can get that parameter.
First, a quick test: take this image from the PIL GitHub source, Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg, and run jhead on it. You can see the EXIF orientation is 6:
jhead /Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg
File name : /Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg
File size : 4951 bytes
File date : 2020:04:24 14:00:09
Resolution : 128 x 128
Orientation : rotate 90 <--- see here
JPEG Quality : 75
Now do that in PIL:
from PIL import Image
# Load that image
im = Image.open('/Users/mark/StackOverflow/PillowBuild/Pillow-7.1.2/Tests/images/hopper_orientation_6.jpg')
# Get all EXIF data
e = im.getexif()
# Specifically get orientation
e.get(0x0112)
# prints 6
Now click on the source and you can work out how your image has been rotated and undo it.
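To make the "work out how it has been rotated" step concrete, here is a lookup table from the EXIF Orientation value to the name of the Pillow transpose operation that undoes it. To the best of my knowledge it mirrors the table used by PIL.ImageOps.exif_transpose, but treat it as a sketch; ORIENTATION_FIX and fix_for_orientation are hypothetical names.

```python
# EXIF Orientation (tag 0x0112) -> name of the transpose operation that undoes it.
ORIENTATION_FIX = {
    1: None,               # already upright
    2: "FLIP_LEFT_RIGHT",
    3: "ROTATE_180",
    4: "FLIP_TOP_BOTTOM",
    5: "TRANSPOSE",
    6: "ROTATE_270",       # Pillow angles are counter-clockwise, so this is 90 degrees clockwise
    7: "TRANSVERSE",
    8: "ROTATE_90",
}


def fix_for_orientation(orientation):
    """Return the transpose operation name for an orientation value, or None."""
    return ORIENTATION_FIX.get(orientation)
```

For the test image above, fix_for_orientation(e.get(0x0112)) gives "ROTATE_270", which on recent Pillow versions you could apply with im.transpose(getattr(Image.Transpose, name)).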
Or, you could be completely unprofessional ;-) and create a function called SneakilyRemoveOrientationWhileNooneIsLooking(filename) and shell out (subprocess) to exiftool and remove the orientation with:
exiftool -Orientation= image.jpg
The author's "much simpler solution" detailed in the comment above is misleading, so I just want to clear that up.
Pillow does not automatically apply EXIF orientation transformation when reading an image. However, it has a method to do so: PIL.ImageOps.exif_transpose(image)
OpenCV automatically applies EXIF orientation when reading an image. You can disable this behavior by using the IMREAD_IGNORE_ORIENTATION flag.
I believe the author's true intention was to apply the EXIF orientation rather than ignore it, which is exactly what his solution accomplished.
I am opening an image file using the Pillow (PIL) library and saving it again under a different name. But when I save the image under the new name, my original 300 DPI file becomes a 72 DPI file. I tried adding dpi=(300, 300), but still no success.
See the code:
from PIL import Image
image = Image.open('image-1.jpg')
image.save('image-2.jpg' , dpi=(300, 300))
My original file(image-1.jpg)
https://www.dropbox.com/s/x7xj6hyoemv3t94/image_info_1.jpg?raw=1
My copied file(image-2.jpg)
https://www.dropbox.com/s/dpcnkfozefobopn/image_info_2.jpg?raw=1
Notice how they still have the same image size: 8.45.
Thanks to @HansHirse for explaining that the metadata (i.e. the EXIF information) was missing. I saved the image with the EXIF info and it worked:
from PIL import Image
image = Image.open('image-1.jpg')
exif = image.info['exif']
image.save('image-2.jpg' , exif=exif)
I am writing a Python 3 CLI tool to fix the creation dates of photos in a library (see here).
I use Pillow to load and save the image and piexif to handle exif data retrieval/modification.
The problem I have is that I only want to change the EXIF data in the pictures and not recompress the whole image. It seems that Pillow save can't do that.
My question is:
Is there a better EXIF library I could use to work only with the EXIF data (so far I have tried py3exiv2, pexif and piexif)?
If not, is there a way to tell Pillow to change only the EXIF data of the image, without recompressing it when saving?
Thanks !
Here is the code I use to change the creation date so far:
# Get original exif data
try:
    exif_dict = piexif.load(obj.path)
except (KeyError, piexif._exceptions.InvalidImageDataError):
    logger.debug('No exif data for {}'.format(obj.path))
    return

# Change creation date in exif_dict
date = obj.decided_stamp.strftime('%Y:%m:%d %H:%M:%S').encode('ascii')
try:
    exif_dict['Exif'][EXIF_TAKE_TIME_ORIG] = date
except (KeyError, piexif._exceptions.InvalidImageDataError):
    return
exif_bytes = piexif.dump(exif_dict)

# Save new exif
im = Image.open(obj.path)
im.save(obj.path, 'jpeg', exif=exif_bytes)
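In your case, I don't think you need Pillow at all. piexif.insert() writes the EXIF bytes into the existing JPEG file, so the image itself is not re-encoded: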
exif_bytes = piexif.dump(exif_dict)
piexif.insert(exif_bytes, obj.path)
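For reference, the timestamp that the question stores in exif_dict uses the 'YYYY:MM:DD HH:MM:SS' layout, encoded as ASCII bytes. A tiny standalone version of that formatting step is sketched below; exif_timestamp is a hypothetical helper name, and the format string is the one from the question's code.

```python
from datetime import datetime


def exif_timestamp(moment):
    """Format a datetime as EXIF-style 'YYYY:MM:DD HH:MM:SS' ASCII bytes."""
    return moment.strftime("%Y:%m:%d %H:%M:%S").encode("ascii")
```

The result is what gets assigned to exif_dict['Exif'][EXIF_TAKE_TIME_ORIG] before calling piexif.dump and piexif.insert.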