How to load images from Google Cloud Storage with keras.preprocessing - keras

I am writing machine learning code that can be trained locally or in the cloud. I am using keras.preprocessing to load images, which under the hood uses PIL. It works fine for local files, but understandably it doesn't handle Google Cloud Storage paths like "gs://...".
from keras.preprocessing import image
image.load_img("gs://myapp-some-bucket/123.png")
Gives this error:
File ".../lib/python2.7/site-packages/keras/preprocessing/image.py", line 320, in load_img
    img = pil_image.open(path)
File ".../lib/python2.7/site-packages/PIL/Image.py", line 2530, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'gs://myapp-some-bucket/123.png'
What is the correct way of doing this? I ultimately need a folder of images to be a single numpy array (images decoded and grayscale).

I found a replacement for keras.preprocessing.image.load_img that understands GCS. I also included more code to read the whole folder and turn every image in the folder into a single numpy array for training...
import os
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

train_files_dir = "gs://myapp-some-bucket"
filelist = gfile.ListDirectory(train_files_dir)
sess = tf.Session()
with sess.as_default():
    # decode every PNG in the bucket and stack the results into a single array
    x = np.array([tf.image.decode_png(tf.read_file(os.path.join(train_files_dir, filename))).eval()
                  for filename in filelist])

Load image:
sess = tf.Session()
image_path = 'gs://xxxxxxx.jpg'
image = tf.read_file(image_path)
image = tf.image.decode_jpeg(image)
image_array = sess.run(image)
Save image:
job_dir = 'gs://xxxxxxxx'
image = tf.image.encode_jpeg(image_array)
file_name = 'xxx.jpg'
write_op = tf.write_file(os.path.join(job_dir, file_name), image)
sess.run(write_op)
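If you would rather keep using PIL (which is what keras.preprocessing wraps), another option is to open the GCS object with TensorFlow's file abstraction and hand the file object to PIL yourself. A minimal sketch, assuming tf.io.gfile is available in your TensorFlow version and reusing the bucket path from the question:
import numpy as np
import tensorflow as tf
from PIL import Image

# tf.io.gfile understands gs:// paths; PIL then decodes the bytes.
# .convert("L") gives the grayscale array the question asks for.
with tf.io.gfile.GFile("gs://myapp-some-bucket/123.png", "rb") as f:
    img = Image.open(f).convert("L")
img_array = np.array(img)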

Related

OSError: Unable to open file (file signature not found)

I am currently doing an assignment on deep learning by downloading the assignment files from GitHub.
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
%matplotlib inline
You are given a dataset ("data.h5") containing:
- a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- a test set of m_test images labeled as cat or non-cat
- each image is of shape (num_px, num_px, 3), where 3 is for the 3 channels (RGB); thus each image is square (height = num_px and width = num_px)
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
I ran the setup.sh file too but the error doesn't seem to go away.
lr_utils.py file:
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
Kindly help!
I solved the issue by downloading uncorrupted .h5 files and putting them in the folder datasets/ in the same directory.
The files you downloaded are corrupted. You can visit https://github.com/abdur75648/Deep-Learning-Specialization-Coursera to download the uncorrupted files.
You can download uncorrupted files from here: https://www.kaggle.com/datasets/muhammeddalkran/catvnoncat and replace them in the directory containing the corrupted files.
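If you want to verify that a downloaded .h5 file really is corrupted before replacing it, you can check its header yourself: every valid HDF5 file starts with the same 8-byte signature, which is exactly what the "file signature not found" error is complaining about. A minimal sketch, assuming the datasets/ paths used by lr_utils.py:
# A valid HDF5 file begins with the signature \x89HDF\r\n\x1a\n;
# if these bytes are missing, the file is corrupted (often an HTML error page).
with open('datasets/train_catvnoncat.h5', 'rb') as f:
    print(f.read(8) == b'\x89HDF\r\n\x1a\n')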

Google colab stuck on downloading Fashion_Mnist dataset

I have tried running my code on Google Colab for Fashion-MNIST, but it gets stuck downloading the dataset. I also switched between the hardware accelerators, but still nothing. Is there any workaround for this problem?
For Google Colab
At the top write !pip install mnist. Use import mnist.
Then simply store the images and labels:
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
That's it!!!
You can download it from the github repository.
Put the downloaded files (from the readme links) in a directory in your current path called data/fashion/, then you can use their loader.
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    import os
    import gzip
    import numpy as np

    labels_path = os.path.join(path, '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte.gz' % kind)

    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8, offset=8)

    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8, offset=16).reshape(len(labels), 784)

    return images, labels
X_train, y_train = load_mnist('data/fashion', kind='train')
X_test, y_test = load_mnist('data/fashion', kind='t10k')
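As a quick sanity check of the download, each returned row is a flattened 28x28 grayscale image, so you can reshape one and plot it, for example:
import matplotlib.pyplot as plt

# Reshape the first flat 784-vector back to 28x28 and display it.
plt.imshow(X_train[0].reshape(28, 28), cmap='gray')
plt.title('label: {}'.format(y_train[0]))
plt.show()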
The other option would be to use the torchvision FMNIST dataset.
Edit
You can also use:
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/fashion', source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
Edit 2
Here is the code for downloading the files (it could be improved with some try/except handling):
import os
import requests

path = 'data/fashion'

def download_fmnist(path):
    DEFAULT_SOURCE_URL = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/'
    files = dict(
        TRAIN_IMAGES='train-images-idx3-ubyte.gz',
        TRAIN_LABELS='train-labels-idx1-ubyte.gz',
        TEST_IMAGES='t10k-images-idx3-ubyte.gz',
        TEST_LABELS='t10k-labels-idx1-ubyte.gz')

    # create the target folder (including parent directories) if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)

    for f in files:
        filepath = os.path.join(path, files[f])
        if not os.path.exists(filepath):
            url = DEFAULT_SOURCE_URL + files[f]
            r = requests.get(url, allow_redirects=True)
            open(filepath, 'wb').write(r.content)
            print('Successfully downloaded', f)

download_fmnist(path)
The command keras.datasets.fashion_mnist.load_data() returns a tuple of numpy arrays: (xtrain, ytrain) and (xtest, ytest).
The dataset is not extracted into your working directory this way, which is why the command cd fashion-mnist/ raises an error: no such directory is created. The Fashion-MNIST data itself was loaded correctly into (xtrain, ytrain) and (xtest, ytest) in your code.
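For reference, a minimal sketch of that call (assuming a TensorFlow/Keras version that bundles the dataset):
from tensorflow import keras

# Downloads to the Keras cache on first call, then returns the arrays directly.
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)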

Decompress nifti medical image in gz format using python

I want to decompress a batch of nii.gz files in Python so that they can be processed in sitk later on. When I decompress a single file manually by right-clicking the file and choosing 'Extract..', the file is correctly interpreted by sitk (I do sitk.ReadImage(unzipped)). But when I try to decompress it in Python using the following code:
with gzip.open(segmentation_zipped, "rb") as f:
    bindata = f.read()
segmentation_unzipped = os.path.join(segmentation_zipped.replace(".gz", ""))
with gzip.open(segmentation_unzipped, "wb") as f:
    f.write(bindata)
I get an error when sitk tries to read the file:
RuntimeError: Exception thrown in SimpleITK ReadImage: C:\d\VS14-Win64-pkg\SimpleITK\Code\IO\src\sitkImageReaderBase.cxx:82:
sitk::ERROR: Unable to determine ImageIO reader for "E:\BraTS19_2013_10_1_seg.nii"
Also when trying to do it a little differently:
input = gzip.GzipFile(segmentation_zipped, 'rb')
s = input.read()
input.close()
segmentation_unzipped = os.path.join(segmentation_zipped.replace(".gz", ""))
output = open(segmentation_unzipped, 'wb')
output.write(s)
output.close()
I get:
RuntimeError: Exception thrown in SimpleITK ReadImage: C:\d\VS14-Win64-pkg\SimpleITK-build\ITK\Modules\IO\PNG\src\itkPNGImageIO.cxx:101:
itk::ERROR: PNGImageIO(0000022E3AF2C0C0): PNGImageIO failed to read header for file:
Reason: fread read only 0 instead of 8
Can anyone help?
There is no need to unzip the NIfTI images; libraries such as NiBabel can handle them without decompression.
#==================================
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
#==================================
# load image (4D) [X, Y, Z_slice, time]
nii_img = nib.load('path_to_file.nii.gz')
nii_data = nii_img.get_fdata()

# derive the grid size from the volume itself
number_of_slices = nii_data.shape[2]
number_of_frames = nii_data.shape[3]

fig, ax = plt.subplots(number_of_frames, number_of_slices, constrained_layout=True)
fig.canvas.set_window_title('4D Nifti Image')
fig.suptitle('4D_Nifti 10 slices 30 time Frames', fontsize=16)
#-------------------------------------------------------------------------------
mng = plt.get_current_fig_manager()
mng.full_screen_toggle()

for slice in range(number_of_slices):
    # if your data is 4D; otherwise remove the inner loop
    for frame in range(number_of_frames):
        ax[frame, slice].imshow(nii_data[:, :, slice, frame], cmap='gray', interpolation=None)
        ax[frame, slice].set_title("layer {} / frame {}".format(slice, frame))
        ax[frame, slice].axis('off')
plt.show()
Or you can use SimpleITK as follows (it can also read the compressed .nii.gz directly):
import SimpleITK as sitk
import numpy as np
# A path to a T1-weighted brain .nii image:
t1_fn = 'path_to_file.nii'
# Read the .nii image containing the volume with SimpleITK:
sitk_t1 = sitk.ReadImage(t1_fn)
# and access the numpy array:
t1 = sitk.GetArrayFromImage(sitk_t1)
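If you do still want the decompressed .nii on disk, note that the first snippet in the question re-compresses the data on the way out, because the output file is also opened with gzip.open(..., "wb"). A minimal sketch using only the standard library, writing the output with a plain open (the path is just an example based on the file name in the question):
import gzip
import shutil

segmentation_zipped = "E:\\BraTS19_2013_10_1_seg.nii.gz"  # example path
segmentation_unzipped = segmentation_zipped.replace(".gz", "")

# Read gzip-compressed bytes and write them out uncompressed.
with gzip.open(segmentation_zipped, "rb") as f_in, open(segmentation_unzipped, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)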

Load datasets and store it in another file using opencv

How can I read all images from a dataset and store them in another location using OpenCV?
You can use glob to read the files in a folder.
import glob
import cv2

for file in glob.glob('source/*.png'):
    img = cv2.imread(file)
    filename = 'destination/' + file.split('source\\')[1]
    cv2.imwrite(filename, img)
Python's split function is used to obtain the image name, which is then written to the destination folder.
NOTE: If the folders are not in the current working directory, please specify the absolute path.
import os
import cv2

SOURCE_FOLDER = "a"
DESTINATION_FOLDER = "b"

for image_file_name in os.listdir(SOURCE_FOLDER):
    # get full path to image file
    image_path = os.path.join(SOURCE_FOLDER, image_file_name)
    # read image
    img = cv2.imread(image_path)
    # store image in another folder
    image_write_path = os.path.join(DESTINATION_FOLDER, image_file_name)
    cv2.imwrite(image_write_path, img)
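One caveat for both snippets: cv2.imwrite fails silently (it simply returns False) if the destination folder does not exist, so it can help to create it first, e.g.:
import os

# Create the destination folder (and any missing parents) before writing.
os.makedirs(DESTINATION_FOLDER, exist_ok=True)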

Writing figure to Google Cloud Storage instead of local drive

I want to upload a figure made with matplotlib to GCS.
Current code:
from tensorflow.gfile import MakeDirs, Open
import numpy as np
import matplotlib.pyplot as plt
import datetime

_LOGDIR = "{date:%Y%m%d-%H%M%S}".format(date=datetime.datetime.now())
_PATH_LOGDIR = 'gs://{0}/logs/{1}'.format('skin_cancer_mnist', _LOGDIR)
MakeDirs(_PATH_LOGDIR)

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig.savefig("{0}/accuracy_loss_graph.png".format(path_logdir))
    plt.close()

saving_figure(_PATH_LOGDIR)
This fails with:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/backends/backend_agg.py", line 512, in print_png
    filename_or_obj = open(filename_or_obj, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'gs://skin_cancer_mnist/logs/20190116-195604/accuracy_loss_graph.png'
(The directory exists, I checked)
I could change the source code of matplotlib to use tf.gfile.Open for writing, but there should be a better option...
Joan's 2nd option didn't work for me; here is a solution that did:
from google.cloud import storage
import io

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()

    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')

    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')
    blob.upload_from_file(buf, content_type='image/png', rewind=True)
You cannot directly upload a file to Google Cloud Storage using the Python open function (which is what matplotlib.pyplot.savefig uses behind the scenes).
Instead, you should use the Cloud Storage Client Library for Python. Check this documentation for details on how the library is used. It lets you manipulate files and upload/download them to GCS, among other things.
You will have to import this library in order to use it; you can install it by running pip install google-cloud-storage and import it as from google.cloud import storage.
Also, since plt.figure is an object, and not the actual .png image that you want to upload, you cannot upload it directly to Google Cloud Storage either.
However, you can do either of the following:
Option 1: Save the image locally, and then upload it to Google Cloud Storage:
Using your code:
from google.cloud import storage

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig.savefig("your_local_path/accuracy_loss_graph.png")
    plt.close()

    # init GCS client and upload the file
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    # This defines the path where the file will be stored in the bucket
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')
    blob.upload_from_filename(filename="your_local_path/accuracy_loss_graph.png")
Option 2: Save the image result from the figure to a variable, then upload it to GCS as a string (of bytes):
I have found the following StackOverflow answer that seems to save the figure image into a .png byte string; however, I haven't tried it myself.
Again, based on your code:
from google.cloud import storage
import io
import urllib, base64

def saving_figure(path_logdir):
    data = np.arange(0, 21, 2)
    fig = plt.figure(figsize=(20, 10))
    plt.plot(data)
    fig_to_upload = plt.gcf()

    # Save figure image to a bytes buffer
    buf = io.BytesIO()
    fig_to_upload.savefig(buf, format='png')
    buf.seek(0)
    image_as_a_string = base64.b64encode(buf.read())

    # init GCS client and upload buffer contents
    client = storage.Client()
    bucket = client.get_bucket('skin_cancer_mnist')
    # This defines the path where the file will be stored in the bucket
    blob = bucket.blob('logs/20190116-195604/accuracy_loss_graph.png')
    your_file_contents = blob.upload_from_string(image_as_a_string, content_type='image/png')
Edit: Both options assume that the environment you are running the script from has the Cloud SDK installed and an authenticated Google Cloud account activated (if you haven't set this up, you can check this documentation, which explains how to do it).
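Outside of Google Cloud (e.g. on your local machine), the storage client typically picks up credentials from a service-account key file. One common way, sketched here with a hypothetical key path, is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it before creating the client:
import os
from google.cloud import storage

# Hypothetical path; replace with your own service-account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
client = storage.Client()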

Resources