Loading image data uploaded to Drive into a NumPy array on Google Colab - python-3.x

I am working on a deep learning project (image segmentation) and decided to move my work to Google Colab. I uploaded the notebook and the data, then used the following code to mount the drive:
from google.colab import drive
drive.mount('/content/mydrive')
The data consists of two folders: one contains the images (input data, in .jpg format) and the other contains their masks (ground truth, in .png format), with 2600 images each. I use the following code to load them:
import glob
import numpy as np
from PIL import Image

filelist_trainx = sorted(glob.glob('drive/My Drive/Data/Trainx/*.jpg'), key=numericalSort)
X_train = np.array([np.array(Image.open(fname)) for fname in filelist_trainx])
filelist_trainy = sorted(glob.glob('drive/My Drive/Data/Trainy/*.png'), key=numericalSort)
Y_train = np.array([np.array(Image.open(fname)) for fname in filelist_trainy])
Loading X_train takes hardly any time, but loading Y_train takes so long that I end up interrupting the execution of the cell. Does anyone know why this happens, considering that both folders contain data of the same dimensions and small size (18 MB in total)? Here is a sample of the images.
Data sample
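One workaround worth trying (a hedged sketch, not a confirmed diagnosis): reading thousands of small files one by one through the mounted Drive is often the bottleneck, so copying the folder to Colab's local disk first and globbing from there can speed loading up considerably. The paths below are assumptions based on the mount point in the question; adjust them to your own layout.
!cp -r "/content/mydrive/My Drive/Data/Trainy" /content/Trainy

import glob
import numpy as np
from PIL import Image

# numericalSort is the custom sort key already defined in the notebook
filelist_trainy = sorted(glob.glob('/content/Trainy/*.png'), key=numericalSort)
Y_train = np.array([np.array(Image.open(fname)) for fname in filelist_trainy])
print(Y_train.shape, Y_train.dtype)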

Related

How to download part of open_images_v4

In short, open_images_v4 is over 600 GB of data. I want to create a machine learning model that detects balloons. I have most of the code ready, but I don't have the dataset. I want to use open_images_v4, but it is too heavy: because I use Google Colab, I have limited storage. I don't need all 600 GB of data, only the balloon images and labels.
I tried to come up with a way to get only the balloons. Here is the code:
import tensorflow_datasets as tfds

filters = {
    'label_filter': '/m/01b6bk'
}
dataset, info = tfds.load('open_images_v4', with_info=True, as_supervised=True,
                          download_and_prepare_kwargs=filters)
But it returns an error: TypeError: download_and_prepare() got an unexpected keyword argument 'label_filter'
Can I download only the balloons so it won't take up that much space?
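For reference, a minimal sketch of one partial workaround: tfds.load has no label_filter argument, and filtering after loading still requires downloading and preparing the full dataset first, so it does not solve the storage problem by itself. The feature keys below ('objects', 'label') and the use of the MID string '/m/01b6bk' as a class name are assumptions; check info.features for your TFDS version before relying on them.
import tensorflow as tf
import tensorflow_datasets as tfds

ds, info = tfds.load('open_images_v4', split='train', with_info=True)

# Assumption: per-image annotations sit under example['objects']['label'] as class indices,
# and the class names are the MID strings; inspect info.features to confirm.
balloon_id = info.features['objects']['label'].str2int('/m/01b6bk')

def has_balloon(example):
    # keep only examples whose object labels contain the balloon class
    return tf.reduce_any(tf.equal(example['objects']['label'], balloon_id))

balloon_ds = ds.filter(has_balloon)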

Google Colab Pro runs out of RAM (35 GB) while reading a WSI tiff file using OpenSlide

I'm trying to pre-process WSI images using PyTorch. I have stored the WSI images in Google Drive, mounted it to my Google Colab Pro account (which has 35 GB of RAM), and am trying to read them. To read the images, I'm using OpenSlide. The images are in .tiff format.
wsi = openslide.open_slide(path)
The WSI images average around 100k x 100k pixels and were taken from the MICCAI 2020 pathology challenge. When I run the code, Colab runs out of RAM and restarts the session. As far as I can tell, the RAM runs out while OpenSlide is trying to read the WSI image from the given path. Can anyone identify the issue and suggest a solution?
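As a hedged sketch rather than a confirmed diagnosis: openslide.open_slide is lazy and should not load the full pixel data on its own; memory usually blows up when the whole slide is converted to an image or array in one go. Reading fixed-size tiles or a downsampled pyramid level keeps memory bounded. The tile size below is an arbitrary assumption.
import openslide

slide = openslide.open_slide(path)
print(slide.level_count, slide.level_dimensions)   # inspect the available pyramid levels

# Read a single 4096 x 4096 tile at full resolution instead of the whole slide
tile = slide.read_region(location=(0, 0), level=0, size=(4096, 4096)).convert('RGB')

# Or work on the lowest-resolution level, which usually fits in RAM easily
lowest = slide.level_count - 1
thumb = slide.read_region((0, 0), lowest, slide.level_dimensions[lowest]).convert('RGB')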

Flow a huge number of images from memory to a Keras generator

I am trying to train a Keras model with a very large number of images and labels. I want to use model.fit_generator and somehow flow the input images and labels from memory, because we prepare all the data in memory after each image is loaded. The problem is that we have plenty of large images that we then clip into smaller pieces and feed to the model in that form. We need a for loop inside a while loop.
Something like this:
def data_generator(files, labels, batch_size):
    while True:                               # loop forever so Keras can run many epochs
        for file in files:                    # let's say there are 500 files (images)
            image = ReadImage(file)           # user-defined loader
            X = prepareImage(image)           # here it is cut and prepared into the required shape
            Y = labels
            for batch_start in range(0, len(X), batch_size):
                batch_end = batch_start + batch_size
                yield X[batch_start:batch_end], Y[batch_start:batch_end]
After it yields the last batch for the first image, we need to load the next image in the for loop, prepare the data, and yield again within the same epoch. For the second epoch we need all the images again. The problem is that we prepare everything in memory: from one image we create millions of training samples before moving on to the next image. We cannot write all the data to disk and use flow_from_directory, since that would require far too much disk space.
Any hints?
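For what it's worth, a hedged usage sketch of how a generator like the one above plugs into fit_generator. The batches-per-image count is a made-up placeholder that would have to come from your own clipping logic, and model is assumed to be an already compiled Keras model.
train_gen = data_generator(files, labels, batch_size=32)

batches_per_image = 1000                 # assumption: how many batches one large image yields
steps = len(files) * batches_per_image   # total batches the generator produces per epoch

model.fit_generator(train_gen, steps_per_epoch=steps, epochs=10)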

How to increase the resolution of output images using the TensorFlow Object Detection API?

I have trained my own model using TensorFlow (https://github.com/tensorflow/models/tree/master/research/object_detection) to identify objects in images, and I am testing it using the Google object detection API.
My question is that the way Google coded the IPython notebook, it outputs images of roughly 200 KB to 300 KB; here is the link to the notebook: https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
How do I output images at their original size (which is 15 MB)? (I am running this code on my local machine.) I've tried changing the Helper Code section of the notebook, but it didn't work. Is there anything I am missing here?
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
In the detection part of the IPython notebook, I changed the image size to
IMAGE_SIZE = (120, 80)
and it did the trick.
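For context, a hedged sketch of why that works, assuming the notebook uses IMAGE_SIZE as the matplotlib figure size (as the tutorial does at the time of writing): matplotlib interprets figsize in inches, so a larger figsize, or a higher dpi when saving, produces a larger output image. The image path below is a hypothetical example.
from PIL import Image
import matplotlib.pyplot as plt

image = Image.open('test_images/image1.jpg')    # hypothetical test image path
image_np = load_image_into_numpy_array(image)   # helper defined above

IMAGE_SIZE = (120, 80)                          # width, height in inches for matplotlib
plt.figure(figsize=IMAGE_SIZE)
plt.imshow(image_np)
plt.savefig('detections.png', dpi=100)          # output pixels = figsize * dpi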

Python saving 13-bit depth .mat data as TIFF converts it to 16-bit

I am importing depth data from a Kinect V2, saved as .mat files, into my Python 3.5 code using scipy.io.loadmat. When I print out the .mat data I get a uint16 array with values ranging from 0 to 8192. This is expected, as the Kinect V2 gives 13-bit depth image data. Now, when I save this as a TIFF file using
cv2.imwrite('depth_mat.tif', depth_arr)
and read it back using
depth_im = tifffile.imread('depth_mat.tif')
the range of values is scaled up. In my original .mat file the maximum value is 7995, and after saving and reading the .tif file the maximum value becomes 63728. This throws off my calculations for mapping Kinect depth to actual distance in the real world. Any insight about this would help me a lot.
I have to do some image processing in between, so it is necessary to preserve the original values. Also, if tifffile.imsave() is used instead of cv2.imwrite() to save the data, the image appears entirely dark.
I am using Python 3.5 on a 64-bit Windows machine.
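A hedged round-trip check, assuming the depth array sits under a key named 'depth' in the .mat file (the real key and filename depend on how the data was exported from MATLAB): it verifies whether tifffile preserves the raw 13-bit values unchanged. The all-dark appearance in viewers may simply be the small 13-bit values being displayed against a full 16-bit range rather than an actual change of the data.
import numpy as np
import scipy.io
import tifffile

mat = scipy.io.loadmat('kinect_depth.mat')          # hypothetical filename
depth_arr = mat['depth'].astype(np.uint16)          # 'depth' is an assumed key
print(depth_arr.dtype, depth_arr.max())             # expect uint16, max around 7995

tifffile.imwrite('depth_check.tif', depth_arr)      # writes raw 16-bit values, no rescaling
depth_im = tifffile.imread('depth_check.tif')
print(depth_im.max(), np.array_equal(depth_arr, depth_im))   # values should round-trip exactly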
