python-pptx image file size - jpeg

I'm trying to create a .pptx file with Python.
The main task is to load many pictures (.tif, .jpg) into the .pptx.
Here is the code:
from pptx import Presentation
from pptx.util import Inches

prs = Presentation()
current_slide = prs.slides.add_slide(prs.slide_layouts[0])
pic = current_slide.shapes.add_picture(img_path, Inches(0), Inches(1), Inches(10), Inches(5))
prs.save(self.output_dir + '/' + self.ppt_txt.get() + '.pptx')
The original images total 1.2 GB, and the .pptx file produced by Python is 1 GB.
But a .pptx created by dragging and dropping the same images is only 10 MB.
What makes this difference?
How do I reduce the file size in the code?
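The likely cause: PowerPoint recompresses and downsamples pictures when you drag and drop them (its default "compress pictures" behaviour), while python-pptx embeds your original files byte for byte. A sketch of one workaround, assuming Pillow is installed: shrink and re-encode each picture before handing it to add_picture (the max_px and quality values are just starting points):

```python
from io import BytesIO
from PIL import Image

def recompress(img_path, max_px=1920, quality=70):
    """Downscale an image and re-encode it as a JPEG held in memory."""
    img = Image.open(img_path)
    img.thumbnail((max_px, max_px))    # shrinks in place, keeping aspect ratio
    if img.mode != "RGB":              # JPEG cannot store alpha or palette modes
        img = img.convert("RGB")
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return buf
```

add_picture also accepts a file-like object, so the compressed stream can be passed straight in: `current_slide.shapes.add_picture(recompress(img_path), Inches(0), Inches(1), Inches(10), Inches(5))`.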


Running a Pillow alpha_composite on two files results in a huge file

I have two files I open via these lines:
image0 = Image.open(infile0).convert("RGBA")
image1 = Image.open(infile1).convert("RGBA").resize(image0.size)
image0 is an 8 MB JPG.
image1 is a tiny 4.3K file that is an overlay for the first one, so it is resized to match image0's size.
After I then run the alpha_composite command:
result = Image.alpha_composite(image0, image1)
result.save(outfile)
The outfile ends up being a huge 23-43 MB PNG file. This seems way too large (I also do this once a minute and create time-lapses using FFmpeg, which take too long because of the large input files). I want the file to be as detailed as possible, but I would like it to be a tad smaller and more manageable in size.
I had some success using this command:
result2 = result.resize((round(result.size[0]*0.20), round(result.size[1]*0.20)))
then doing a result2.save(thumbnail_path), though at that file size the quality seems to have suffered some, so I'm not sure if I am doing it correctly or if there is a better way in Pillow?
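One likely explanation: PNG is lossless, so a photo-sized RGBA composite stays large no matter what. Re-encoding the composite as JPEG (after dropping the alpha channel, which JPEG can't store) usually shrinks it dramatically without having to resize at all. A minimal sketch, using a noisy synthetic image as a stand-in for the real photo:

```python
import os
import tempfile
from PIL import Image

w, h = 640, 480
# a noisy stand-in for the photo (random bytes interpreted as RGBA pixels)
base = Image.frombytes("RGBA", (w, h), os.urandom(w * h * 4))
overlay = Image.new("RGBA", (w, h), (255, 0, 0, 64))   # translucent red tint

result = Image.alpha_composite(base, overlay)

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "frame.png")
jpg_path = os.path.join(out_dir, "frame.jpg")

result.save(png_path)                             # lossless: stays large
result.convert("RGB").save(jpg_path, quality=85)  # lossy: much smaller
```

FFmpeg accepts JPEG frames directly, and quality=85 keeps artifacts mostly invisible; treat the exact quality value as something to tune.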

Why is my function not working in my Python script on the first file?

I have a Python script that walks through folders and, if it finds a TIFF image, calls my function to convert it to a JPEG and resize it. It creates the JPEG, but it doesn't resize the first image it finds in the folders; after the first image, the next ones are converted properly. The code looks fine and nothing in the function looks like an issue to me. Can anyone offer assistance on the reason?
Here is my main script:
for root, subdirs, files in os.walk(src):
    # Looping through all of the files in the Approved folders
    for file in files:
        if file != 'Thumbs.db' and file != '.DS_Store':
            if minutesOld >= minutes:  # <-- We'll only copy the file if it's five minutes old or older.
                if file.lower().endswith('.tif'):  # <-- If file is a TIFF file and there are no errors yet
                    try:
                        image_convert(filepath, '.jpg', 'RGB', 2500, 2500)
                    except:
                        error
Here is my function code:
def image_convert(filepath, imageType, colorMode, height, width):
    imwrite(filepath[:-4] + imageType, imread(filepath)[:, :, :3].copy())  # <-- using the imagecodecs library function of imread, make a copy in memory of the TIFF file.
    # The :3 on the end of the numpy array is stripping the alpha channel from the TIFF file if it has one so it can be easily converted to a JPEG file.
    # Once the copy is made the imwrite function is creating a JPEG file from the TIFF file.
    # The [:-4] is stripping off the .tif extension from the file and the + '.jpg' is adding the .jpg extension to the newly created JPEG file.
    img = Image.open(filepath[:-4] + imageType)  # <-- Using the Image.open function from the Pillow library, we are getting the newly created JPEG file and opening it.
    img = img.convert(colorMode)  # <-- Using the convert function we are making sure to convert the JPEG file to RGB color mode.
    img = img.resize((height, width))  # <-- Using the resize function we are resizing the JPEG to 2500 x 2500
    return img
The reason this wasn't working correctly is that it was missing the line of code that saves the resized image. This is the revised code that works:
def image_convert(filepath, imageType, colorMode, height, width):
    imwrite(filepath[:-4] + imageType, imread(filepath)[:, :, :3].copy())  # <-- using the imagecodecs library function of imread, make a copy in memory of the TIFF file.
    # The :3 on the end of the numpy array is stripping the alpha channel from the TIFF file if it has one so it can be easily converted to a JPEG file.
    # Once the copy is made the imwrite function is creating a JPEG file from the TIFF file.
    # The [:-4] is stripping off the .tif extension from the file and the + '.jpg' is adding the .jpg extension to the newly created JPEG file.
    img = Image.open(filepath[:-4] + imageType)  # <-- Using the Image.open function from the Pillow library, we are getting the newly created JPEG file and opening it.
    img = img.convert(colorMode)  # <-- Using the convert function we are making sure to convert the JPEG file to RGB color mode.
    imageResize = img.resize((height, width))  # <-- Using the resize function we are resizing the JPEG to 2500 x 2500
    imageResize.save(filepath[:-4] + imageType)  # <-- Using the save function, we are saving the newly sized JPEG file over the original JPEG file initially created.
    return imageResize
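As a side note, the imagecodecs round trip may not be needed at all: Pillow can usually open TIFFs directly, and convert('RGB') drops any alpha channel on its own. A sketch of the same function using Pillow alone (assuming Pillow supports the TIFF compression used in your files):

```python
from PIL import Image

def image_convert(filepath, image_type=".jpg", color_mode="RGB", size=(2500, 2500)):
    """Convert a TIFF to JPEG and resize it, using Pillow alone."""
    out_path = filepath[:-4] + image_type          # swap .tif for .jpg
    img = Image.open(filepath).convert(color_mode) # convert() also drops any alpha channel
    img = img.resize(size)
    img.save(out_path)  # without an explicit save(), the resized image never reaches disk
    return img
```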

How do I save my files at 300 dpi using Pillow(PIL)?

I'm opening an image file using the Pillow (PIL) library and saving it again under a different name. But when I save the image under the different name, it takes my original 300 DPI file and makes it a 72 DPI file. I tried adding dpi=(300, 300), but still no success.
See the code:
from PIL import Image
image = Image.open('image-1.jpg')
image.save('image-2.jpg' , dpi=(300, 300))
My original file (image-1.jpg):
https://www.dropbox.com/s/x7xj6hyoemv3t94/image_info_1.jpg?raw=1
My copied file (image-2.jpg):
https://www.dropbox.com/s/dpcnkfozefobopn/image_info_2.jpg?raw=1
Notice how they still have the same image size: 8.45.
Thanks to @HansHirse explaining that the metadata (the EXIF information) was missing, I saved the image with the EXIF info and it worked:
from PIL import Image
image = Image.open('image-1.jpg')
exif = image.info['exif']
image.save('image-2.jpg' , exif=exif)
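For completeness, the two fixes can be combined: Pillow writes the dpi argument into the JFIF header, but many viewers read the resolution from EXIF instead, so passing both is the safest bet. A small sketch (the helper name is mine):

```python
from PIL import Image

def save_with_dpi(src, dst, dpi=(300, 300)):
    """Re-save a JPEG at a given DPI, preserving its EXIF block if present."""
    img = Image.open(src)
    exif = img.info.get("exif", b"")  # empty bytes if the file carries no EXIF data
    img.save(dst, dpi=dpi, exif=exif)
```

Usage: `save_with_dpi('image-1.jpg', 'image-2.jpg')`.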

Google Colab is so slow while reading images from Google Drive

I have my own dataset for a deep learning project. I uploaded it to Google Drive and linked it to a Colab notebook. But Colab can read only 2-3 images per second, where my computer can read dozens. (I used imread to read the images.)
There is no speed problem with the Keras model-compiling process, only with reading images from Google Drive. Does anybody know a solution? Someone suffered from this problem too, but it's still unsolved: Google Colab very slow reading data (images) from Google Drive. (I know this is kind of a duplicate of the question in the link, but I reposted it because it is still unsolved. I hope this is not a violation of Stack Overflow rules.)
Edit: The code piece that I use for reading images:
def getDataset(path, classes, pixel=32, rate=0.8):
    X = []
    Y = []
    i = 0
    # getting images:
    for root, _, files in os.walk(path):
        for file in files:
            imagePath = os.path.join(root, file)
            className = os.path.basename(root)
            try:
                image = Image.open(imagePath)
                image = np.asarray(image)
                image = np.array(Image.fromarray(image.astype('uint8')).resize((pixel, pixel)))
                image = image if len(image.shape) == 3 else color.gray2rgb(image)
                X.append(image)
                Y.append(classes[className])
            except:
                print(file, "could not be opened")
    X = np.asarray(X, dtype=np.float32)
    Y = np.asarray(Y, dtype=np.int16).reshape(1, -1)
    return shuffleDataset(X, Y, rate)
I'd like to provide a more detailed answer about what unzipping the files actually looks like. This is the best way to speed up reading data because unzipping the file into the VM disk is SO much faster than reading each file individually from Drive.
Let's say you have the desired images or data in your local machine in a folder Data. Compress Data to get Data.zip and upload it to Drive.
Now, mount your drive and run the following command:
!unzip "/content/drive/My Drive/path/to/Data.zip" -d "/content"
Simply amend all your image paths to go through /content/Data, and reading your images will be much faster.
I recommend uploading your files to GitHub and then cloning the repository in Colab. It reduced my training time from 1 hour to 3 minutes.
Upload zip files to Drive and unzip them after transferring to Colab. Per-file copy overhead is cumbersome, so you shouldn't copy masses of small files; copy a single zip and unzip it instead.
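The same unzip step can also be done from Python with the standard-library zipfile module, which avoids shelling out. A sketch (the Colab paths are illustrative; mount Drive first as usual):

```python
import zipfile

def extract_dataset(zip_path, dest="/content"):
    """Extract an uploaded archive onto the VM's fast local disk."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

# In Colab, after drive.mount('/content/drive'):
# extract_dataset("/content/drive/My Drive/path/to/Data.zip")
```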

Why can OpenCV not find a writer for the specified extension?

So, I wrote a small program to clone masks for 30k+ images. Every image and mask sample there is converted to JPEG format. However, when I try to start the program, it creates some masks and then stops, throwing the error written in the title. It is quite strange that OpenCV can't create a JPEG image, since that is its default format.
The question is: how do I make OpenCV actually save those newly created masks?
Here is the code:
folders = ["162", "204", "260", "1093", "3297", "5020", "10066", "10870", "10917", "11160", "11331", "17218", "19106", "19306", "19388"]
for folder in folders:
    print(folder)
    names = os.listdir(folder)
    os.chdir("%s/masks" % folder)
    image = cv2.imread("%s.jpeg" % folder)
    for name in names:
        print(name)
        cv2.imwrite(img=image, filename=name)
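A likely cause: os.listdir(folder) also returns the masks subdirectory itself, and cv2.imwrite raises exactly this error when the filename has no recognised image extension (here, the bare name "masks"). The relative os.chdir also breaks after the first folder, since the later folder names no longer resolve from the new working directory. A sketch that filters the names and builds full paths instead of changing directory (the helper name is mine, and the folder layout is assumed from the snippet):

```python
import os

def mask_targets(folder, ext=".jpeg"):
    """Pair the folder's source image with destination paths for its masks,
    skipping entries without an image extension (e.g. the masks/ directory)."""
    src = os.path.join(folder, os.path.basename(folder) + ext)
    names = [n for n in os.listdir(folder) if n.lower().endswith(ext)]
    return src, [os.path.join(folder, "masks", n) for n in names]

# then, without any os.chdir:
# for folder in folders:
#     src, dests = mask_targets(folder)
#     image = cv2.imread(src)
#     for dest in dests:
#         cv2.imwrite(dest, image)
```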
