os.listdir reads images randomly making bounding box training difficult - keras

The os.listdir(path) call reads images from a folder in an arbitrary order. I have saved a CSV file with bounding box information for the images in the folder sequentially. I assumed os.listdir would read the images sequentially so that my CSV file could also be read sequentially during training.
I have tried sorted(os.listdir(...)) but it didn't help. I could not find any other function or code to read the images sequentially from a folder. I named the images frame1.jpg, frame2.jpg, etc.
PATH = os.getcwd()
# Define data path
data_path = PATH + '/frames'
data_dir_list = sorted(os.listdir(data_path))
print(data_dir_list)

img_data_list = []
for dataset in data_dir_list:
    img_list = sorted(os.listdir(data_path + '/' + dataset))
    print('Loaded the images of dataset-' + '{}\n'.format(dataset))
    for img in sorted(img_list):
        input_img = cv2.imread(data_path + '/' + dataset + '/' + img)
        input_img = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
        input_img1 = input_img
        # input_img_resize = cv2.resize(input_img, (512, 512))
        img_data_list.append(input_img1)

img_data = np.array(img_data_list)
img_data = img_data.astype('float32')
img_data /= 255

As per the Python docs, os.listdir() returns filenames in an arbitrary order. It just maps to an underlying operating-system call that returns filenames in whatever order is presumably most efficient given the filesystem design.
It's just a standard list of strings, so sorted() would work in the way you're using it. Are the sequential numbers in your filenames correctly padded for this to work with the more than 10 images you're presumably using? What's the random order you're seeing from sorted(os.listdir(...))?
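One concrete answer to the padding question: a plain sorted() compares strings lexicographically, so frame10.jpg sorts before frame2.jpg. A minimal sketch of a "natural" sort key that avoids renaming the files (the filenames here are just examples):

```python
import re

def natural_key(s):
    # Split "frame10.jpg" into ['frame', '10', '.jpg'] and compare the
    # numeric chunks as integers instead of strings.
    return [int(t) if t.isdigit() else t for t in re.split(r'(\d+)', s)]

names = ["frame1.jpg", "frame10.jpg", "frame2.jpg"]
print(sorted(names))                   # ['frame1.jpg', 'frame10.jpg', 'frame2.jpg']
print(sorted(names, key=natural_key))  # ['frame1.jpg', 'frame2.jpg', 'frame10.jpg']
```

Alternatively, zero-padding the names at creation time (frame001.jpg, frame002.jpg, …) makes plain sorted() sufficient.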


Magick convert through subprocess, Converting tiff images to pdf increases the size by 20 times

I tried using -density but it didn't help. The original TIFF image is 459 KB, but when it gets converted to PDF the size changes to 8446 KB.
commands = ['magick', 'convert']
commands.extend(waiting_list["images"][2:])
commands.append('-adjoin')
commands.append(combinedFormPathOutput)
process = Popen(commands, stdout=PIPE, stderr=PIPE, shell=True)
process.communicate()
https://drive.google.com/file/d/14V3vKRcyyEx1U23nVC13DDyxGAYOpH-6/view?usp=sharing
It's not the above code but the PIL code below that is causing the image size to increase:
images = []
filepath = 'Multi_document_Tiff.tiff'
image = Image.open(filepath)
if filepath.endswith('.tiff'):
    imagepath = filepath.replace('.tiff', '.pdf')
for i, page in enumerate(ImageSequence.Iterator(image)):
    page = page.convert("RGB")
    images.append(page)
if len(images) == 1:
    images[0].save(imagepath)
else:
    images[0].save(imagepath, save_all=True, append_images=images[1:])
image.close()
When I run
convert Multi_document_Tiff.tiff -adjoin Multi_document.pdf
I get a 473881 bytes PDF that contains the 10 pages of the TIFF. If I run
convert Multi_document_Tiff.tiff Multi_document_Tiff.tiff Multi_document_Tiff.tiff -adjoin Multi_document.pdf
I get a 1420906 bytes PDF that contains 30 pages (three copies of your TIFF).
So obviously if you pass several input files to IM it will coalesce them in the output file.
Your code does:
commands.extend(waiting_list["images"][2:])
So it seems it is passing a list of files to IM, and the output should be the accumulation of all these files, which can be a lot bigger than the size of the first file.
So:
did you check the content of the output PDF?
did you check the list of files which is actually passed?
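One more thing worth checking in the snippet from the question: passing an argument list together with shell=True is fragile; on POSIX, only the first element would actually be run through the shell. A sketch of the safer pattern, with the magick call replaced by a harmless Python subprocess purely so it runs anywhere:

```python
import subprocess
import sys

# Stand-in for ['magick', 'convert', ...]: pass the argument list directly
# and leave shell=False (the default), so every argument reaches the program.
commands = [sys.executable, '-c', 'print("3 pages written")']
proc = subprocess.run(commands, capture_output=True, text=True)
print(proc.stdout.strip())  # 3 pages written
```

Printing the list before running it (or proc.stderr afterwards) is also a quick way to verify exactly which files are being passed to ImageMagick.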

Automate cropping with Pillow and python?

So I have a folder with 500+ images that need to be cropped. I have searched and managed to create this cut-and-paste script. But, for some reason, it doesn't save the new image!? The terminal just sits still, no errors, no nothing.
from PIL import Image  # import the Python Image processing Library
import os  # To read the folder

directory_in_str = "/Users/hora/Downloads/Etik"
directory = os.fsencode(directory_in_str)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".png"):
        image = os.path.join(directory_in_str, filename)
        imageObject = Image.open(image)  # Create an Image object from an Image
        cropped = imageObject.crop((1025, 85, 2340, 2040))  # Crop the iceberg portion (top left x, top left y, bottom right x, bottom right y)
        cropped.save("{}".format(filename + "_cropped"), 'png')  # Save the cropped portion
        continue
    else:
        continue
I'm searching in a specific folder, and the cropped image should be saved as filename_cropped.png. But that's not strictly necessary; I have backups if something goes sideways.
The expected result:
Loop through a folder
Crop all images ending with .png
Save the cropped image with the previous filename but with the suffix _cropped, i.e. FILENAME_cropped.png
Done
Two issues regarding this line:
cropped.save("{}".format(filename+"_cropped"), 'png')
Your filename still contains the file extension.
You don't add a (new) file extension yourself.
Both issues result in some string xxx.png_cropped for your new file.
My suggestion to modify your code:
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".png"):
        image = os.path.join(directory_in_str, filename)
        filename, file_extension = os.path.splitext(filename)  # <-- After reading, filename can be overwritten
        imageObject = Image.open(image)
        cropped = imageObject.crop((1025, 85, 2340, 2040))
        cropped.save("{}".format(filename + "_cropped.png"), 'png')  # <-- Explicitly add .png to your filename
        continue
    else:
        continue
Hope that helps!
Add the directory path when saving the file:
directory_in_str = "/Users/hora/Downloads/Etik/"
cropped.save("{}".format(directory_in_str+filename+"_cropped"),'png')
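Combining both answers, the output path can also be built explicitly with os.path.splitext and os.path.join, which handles both the extension and the directory in one go (the folder is the one from the question; the filename is just an example):

```python
import os

directory_in_str = "/Users/hora/Downloads/Etik"
filename = "label_01.png"  # hypothetical input name for illustration

# Strip the extension, append the suffix, then re-attach the extension
# and the directory, so "label_01.png" becomes ".../label_01_cropped.png".
stem, ext = os.path.splitext(filename)
out_path = os.path.join(directory_in_str, stem + "_cropped" + ext)
print(out_path)  # /Users/hora/Downloads/Etik/label_01_cropped.png
```

This avoids the xxx.png_cropped problem entirely, whatever the extension is.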

How to change the output directory where the new images go

I have a small query I'm hoping someone can help me with, in Python 3. I am resizing a dataset of 10000 images to 1000x1000 before doing any PyTorch analysis with it. How do I change my code to save the output images to a new folder I have created, ''train_resized'', instead of the same folder as the original files, as it does now when I run it? Thanks
# Testing dataset
from PIL import Image
import os, sys

path = r'G:\My Drive\CATSVDOGS2.0\test1\\'
dirs = os.listdir(path)

def resize():
    for item in dirs:
        if os.path.isfile(path + item):
            im = Image.open(path + item)
            f, e = os.path.splitext(path + item)
            imResize = im.resize((1000, 1000), Image.ANTIALIAS)
            imResize.save(f + ' resized.jpg', 'JPEG', quality=90)

resize()
In your line
imResize.save(f + ' resized.jpg', 'JPEG', quality=90)
you're setting the path when using the variable f, as f uses the path variable you defined. A quick way to set the path is to do something like:
imResize.save('G:\\My Drive\\Path\\To\\Folder\\' + item + ' resized.jpg', 'JPEG', quality=90)
of course specify the path to be whatever you want. Untested as I don't have Python installed on my work machine, but that is the general gist.
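To make the answer concrete, the path manipulation can be sketched with os.path.join so the separators stay correct; the folder name is the train_resized one from the question, and the filename is just an example:

```python
import os

item = 'cat.0.jpg'         # hypothetical input filename
out_dir = 'train_resized'  # the new output folder from the question

# Keep the original stem, add the " resized" suffix, and point the
# result at the new directory instead of the source folder.
stem, _ = os.path.splitext(item)
out_path = os.path.join(out_dir, stem + ' resized.jpg')
print(out_path)
```

In the resize() loop this out_path would simply replace the f + ' resized.jpg' argument to imResize.save().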

Read numpy data from GZip file over the network

I am attempting to download the MNIST dataset and decode it without writing it to disk (mostly for fun).
request_stream = urlopen('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz')
zip_file = GzipFile(fileobj=request_stream, mode='rb')
with zip_file as fd:
    magic, numberOfItems = struct.unpack('>ii', fd.read(8))
    rows, cols = struct.unpack('>II', fd.read(8))
    images = np.fromfile(fd, dtype='uint8')  # < here be dragons
    images = images.reshape((numberOfItems, rows, cols))
    return images
This code fails with OSError: obtaining file position failed, an error that seems to be ungoogleable. What could the problem be?
The problem seems to be that what gzip and similar modules provide aren't real file objects (unsurprisingly), while numpy's np.fromfile attempts to read through an actual FILE* pointer, so this cannot work.
If it's ok to read the entire file into memory (which it might not be), then this can be worked around by reading all non-header information into a bytearray and deserializing from that:
rows, cols = struct.unpack('>II', fd.read(8))
b = bytearray(fd.read())
images = np.frombuffer(b, dtype='uint8')
images = images.reshape((numberOfItems, rows, cols))
return images
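The same pattern works end to end. Here is a self-contained sketch that fakes a tiny IDX-style payload in memory (2 images of 3x4) instead of downloading MNIST, then parses it with read() plus np.frombuffer exactly as above:

```python
import gzip
import io
import struct
import numpy as np

# Build a fake IDX payload: 16-byte big-endian header (magic, count,
# rows, cols) followed by 2*3*4 = 24 pixel bytes, then gzip it in memory.
raw = struct.pack('>iiII', 2051, 2, 3, 4) + bytes(range(24))
stream = io.BytesIO(gzip.compress(raw))

with gzip.GzipFile(fileobj=stream, mode='rb') as fd:
    magic, n = struct.unpack('>ii', fd.read(8))
    rows, cols = struct.unpack('>II', fd.read(8))
    # read() the rest into memory and deserialize from the buffer,
    # avoiding np.fromfile on a non-seekable stream.
    images = np.frombuffer(fd.read(), dtype='uint8').reshape((n, rows, cols))

print(images.shape)  # (2, 3, 4)
```

Swapping the BytesIO for the urlopen() response gives the original networked version.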

processing multiple images in sequence in opencv python

I am trying to build code in Python for which I need to process at least 50 images. How should I read the images one by one and process them? Is it possible using a loop, and do I need to create a separate database for this, or will just saving all the images in a separate folder do?
I have written some code that may satisfy your requirement.
import glob
import os, sys
import cv2

## Get all the png images in PATH_TO_IMAGES
imgnames = sorted(glob.glob("/PATH_TO_IMAGES/*.png"))

for imgname in imgnames:
    ## Your core processing code
    res = process(imgname)
    ## rename and write back to the disk
    # name, ext = os.path.splitext(imgname)
    # imgname2 = name + "_res" + ext
    imgname2 = "_res".join(os.path.splitext(imgname))
    cv2.imwrite(imgname2, res)
The task consists of the following steps:
1. Have the images in a directory, e.g. foo/
2. Get the list of all images in the foo/ directory
3. Loop over the list of images:
   3.1. img = cv2.imread(images[i], 0)
   3.2. ProcessImage(img)  # Run an arbitrary function on the image
   3.3. filename = 'test' + str(i) + '.png'
   3.4. cv2.imwrite(filename, img)
4. End of the loop
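The steps above can be sketched as a runnable loop. This version uses a temporary directory with empty placeholder files so it runs without OpenCV; in real code the commented lines would do the actual cv2.imread / ProcessImage / cv2.imwrite work:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as foo:
    # Step 1: a directory with some images (empty placeholders here).
    for n in ('b.png', 'a.png'):
        open(os.path.join(foo, n), 'wb').close()

    # Step 2: list the images, sorted for a stable processing order.
    images = sorted(f for f in os.listdir(foo) if f.endswith('.png'))

    # Step 3: loop, process, and pick an output name per image.
    results = []
    for i, name in enumerate(images):
        # img = cv2.imread(os.path.join(foo, name), 0)
        # img = ProcessImage(img)
        filename = 'test' + str(i) + '.png'
        # cv2.imwrite(os.path.join(foo, filename), img)
        results.append((name, filename))

print(results)  # [('a.png', 'test0.png'), ('b.png', 'test1.png')]
```

No database is needed: a plain folder plus a sorted listing is enough to process the 50 images one by one.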
