Google colab stuck on downloading Fashion_Mnist dataset - python-3.x

I have tried running my code in Google Colab for the Fashion-MNIST dataset, but it gets stuck on downloading the data. I also switched between the hardware accelerators, but still nothing. Is there any workaround for this problem?

For Google Colab
At the top of your notebook, run !pip install mnist, then import mnist.
Then simply store the images and labels:
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
That's it!!!

You can download it from the Fashion-MNIST GitHub repository.
Put the downloaded files (linked from the README) in a directory called data/fashion/ under your current path; then you can use their loader.
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`."""
    import os
    import gzip
    import numpy as np

    labels_path = os.path.join(path, '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte.gz' % kind)

    # Labels: skip the 8-byte header, then one uint8 label per example
    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8, offset=8)

    # Images: skip the 16-byte header, then 784 uint8 pixels per example
    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)

    return images, labels
X_train, y_train = load_mnist('data/fashion', kind='train')
X_test, y_test = load_mnist('data/fashion', kind='t10k')
The other option would be to use the torchvision FashionMNIST dataset.
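For example, a minimal sketch of the torchvision route (assuming torchvision is available in the Colab runtime; the ./data root directory is an arbitrary choice):
import torch
from torchvision import datasets, transforms

# Downloads the Fashion-MNIST files into ./data on first use
train_ds = datasets.FashionMNIST(root='./data', train=True, download=True,
                                 transform=transforms.ToTensor())
test_ds = datasets.FashionMNIST(root='./data', train=False, download=True,
                                transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(train_ds, batch_size=64, shuffle=True)
images, labels = next(iter(train_loader))  # images: (64, 1, 28, 28), labels: (64,)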
Edit
You can also use (TensorFlow 1.x only, since the tensorflow.examples.tutorials module was removed in TensorFlow 2):
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/fashion', source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
Edit 2
Here is the code for downloading the files (it can be improved with some try/except handling; see the sketch after the code):
import os
import requests

path = 'data/fashion'

def download_fmnist(path):
    DEFAULT_SOURCE_URL = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/'
    files = dict(
        TRAIN_IMAGES='train-images-idx3-ubyte.gz',
        TRAIN_LABELS='train-labels-idx1-ubyte.gz',
        TEST_IMAGES='t10k-images-idx3-ubyte.gz',
        TEST_LABELS='t10k-labels-idx1-ubyte.gz')

    # makedirs creates the intermediate 'data' directory as well
    if not os.path.exists(path):
        os.makedirs(path)

    for f in files:
        filepath = os.path.join(path, files[f])
        if not os.path.exists(filepath):
            url = DEFAULT_SOURCE_URL + files[f]
            r = requests.get(url, allow_redirects=True)
            with open(filepath, 'wb') as out:
                out.write(r.content)
            print('Successfully downloaded', f)

download_fmnist(path)
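A minimal sketch of the error handling mentioned above (the helper name and the requests.RequestException catch are my own choices, not part of the original answer):
import os
import requests

def download_file(url, filepath):
    """Download url to filepath, skipping existing files and reporting HTTP errors."""
    if os.path.exists(filepath):
        return
    try:
        r = requests.get(url, allow_redirects=True, timeout=30)
        r.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as exc:
        print('Failed to download', url, ':', exc)
        return
    with open(filepath, 'wb') as out:
        out.write(r.content)
    print('Successfully downloaded', filepath)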

The call keras.datasets.fashion_mnist.load_data() returns two tuples of NumPy arrays: (x_train, y_train) and (x_test, y_test).
The dataset is not extracted into a fashion-mnist/ directory under your working directory this way (Keras caches the downloaded files under ~/.keras/datasets instead), which is why the command cd fashion-mnist/ raises an error: no such directory was created. The Fashion-MNIST data was nevertheless loaded correctly into (x_train, y_train) and (x_test, y_test) in your code.
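For completeness, a minimal sketch of that route (this is the standard tf.keras API, nothing custom):
from tensorflow import keras

# Downloads once and caches under ~/.keras/datasets
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)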

Related

OSError: Unable to open file (file signature not found)

I am currently doing a deep learning assignment, working with assignment files downloaded from GitHub.
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
%matplotlib inline
You are given a dataset ("data.h5") containing: - a training set of m_train images labeled as cat (y=1) or non-cat (y=0) - a test set of m_test images labeled as cat or non-cat - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
I ran the setup.sh file too but the error doesn't seem to go away.
lr_utils.py file:
import numpy as np
import h5py

def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
Kindly help!
I solved the issue by downloading uncorrupted .h5 files and putting them in the folder datasets/ in the same directory.
The files you downloaded are corrupted. You can visit https://github.com/abdur75648/Deep-Learning-Specialization-Coursera to download the uncorrupted files.
You can also download uncorrupted files from here:
https://www.kaggle.com/datasets/muhammeddalkran/catvnoncat
and use them to replace the corrupted files in the datasets/ directory.
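To check whether the files you have are actually valid HDF5 before re-running the notebook, a small sketch (the paths assume the datasets/ layout used by lr_utils.py):
import h5py

for path in ('datasets/train_catvnoncat.h5', 'datasets/test_catvnoncat.h5'):
    if not h5py.is_hdf5(path):  # checks the HDF5 file signature
        print(path, 'is missing or corrupted')
    else:
        with h5py.File(path, 'r') as f:
            print(path, 'OK, keys:', list(f.keys()))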

Decompress nifti medical image in gz format using python

I want to decompress a batch of nii.gz files in Python so that they can be processed in SimpleITK later on. When I decompress a single file manually by right-clicking the file and choosing 'Extract..', the file is then correctly interpreted by sitk (I do sitk.ReadImage(unzipped)). But when I try to decompress it in Python using the following code:
with gzip.open(segmentation_zipped, "rb") as f:
    bindata = f.read()
segmentation_unzipped = os.path.join(segmentation_zipped.replace(".gz", ""))
with gzip.open(segmentation_unzipped, "wb") as f:
    f.write(bindata)
I get error when sitk tries to read the file:
RuntimeError: Exception thrown in SimpleITK ReadImage: C:\d\VS14-Win64-pkg\SimpleITK\Code\IO\src\sitkImageReaderBase.cxx:82:
sitk::ERROR: Unable to determine ImageIO reader for "E:\BraTS19_2013_10_1_seg.nii"
Also when trying to do it a little differently:
input = gzip.GzipFile(segmentation_zipped, 'rb')
s = input.read()
input.close()
segmentation_unzipped = os.path.join(segmentation_zipped.replace(".gz", ""))
output = open(segmentation_unzipped, 'wb')
output.write(s)
output.close()
I get:
RuntimeError: Exception thrown in SimpleITK ReadImage: C:\d\VS14-Win64-pkg\SimpleITK-build\ITK\Modules\IO\PNG\src\itkPNGImageIO.cxx:101:
itk::ERROR: PNGImageIO(0000022E3AF2C0C0): PNGImageIO failed to read header for file:
Reason: fread read only 0 instead of 8
Can anyone help?
No need to unzip the NIfTI images; libraries such as Nibabel can handle them without decompression.
#==================================
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt
#==================================
# load image (4D) [X, Y, Z_slice, time]
nii_img = nib.load('path_to_file.nii.gz')
nii_data = nii_img.get_fdata()

number_of_slices = nii_data.shape[2]  # Z slices
number_of_frames = nii_data.shape[3]  # time frames (only for 4D data)

fig, ax = plt.subplots(number_of_frames, number_of_slices, constrained_layout=True)
fig.canvas.set_window_title('4D Nifti Image')
fig.suptitle('4D_Nifti 10 slices 30 time Frames', fontsize=16)
#-------------------------------------------------------------------------------
mng = plt.get_current_fig_manager()
mng.full_screen_toggle()
for slice in range(number_of_slices):
    # if your data is 4D; otherwise remove the inner loop
    for frame in range(number_of_frames):
        ax[frame, slice].imshow(nii_data[:, :, slice, frame], cmap='gray', interpolation=None)
        ax[frame, slice].set_title("layer {} / frame {}".format(slice, frame))
        ax[frame, slice].axis('off')
plt.show()
Or you can use SimpleITK as follows:
import SimpleITK as sitk
import numpy as np
# A path to a T1-weighted brain .nii image:
t1_fn = 'path_to_file.nii'
# Read the .nii image containing the volume with SimpleITK:
sitk_t1 = sitk.ReadImage(t1_fn)
# and access the numpy array:
t1 = sitk.GetArrayFromImage(sitk_t1)
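Note that, as far as I know, SimpleITK's NIfTI reader also accepts gzip-compressed files directly, so the manual decompression step can usually be skipped there as well; a minimal sketch (the path is a placeholder):
import SimpleITK as sitk

# ReadImage handles .nii.gz via the NIfTI ImageIO, no manual gunzip needed
sitk_img = sitk.ReadImage('path_to_file.nii.gz')
volume = sitk.GetArrayFromImage(sitk_img)  # numpy array, axes ordered [z, y, x]
print(volume.shape)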

LightGBM: loading from json

I am trying to load a LightGBM.Booster from a JSON file pointer, and can't find an example online.
import json
import lightgbm
import numpy as np

X_train = np.arange(0, 200).reshape((100, 2))
y_train = np.tile([0, 1], 50)
tr_dataset = lightgbm.Dataset(X_train, label=y_train)
booster = lightgbm.train({}, train_set=tr_dataset)
model_json = booster.dump_model()

with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)

with open('model.json') as f2:
    model_json = json.load(f2)
How can I create a lightGBM booster from f2 or model_json? This snippet only shows dumping to JSON. model_from_string might help but seems to require an instance of the booster, which I won't have before loading.
There's no method for creating a Booster directly from JSON: nothing in the source code or documentation, and there's no GitHub issue about it either.
Because of it, I just load models from a text file via
gbm.save_model('model.txt') # gbm is trained Booster instance
# ...
bst = lgb.Booster(model_file='model.txt')
or use pickle to dump and load models:
import pickle
pickle.dump(gbm, open('model.pkl', 'wb'))
# ...
gbm = pickle.load(open('model.pkl', 'rb'))
Unfortunately, pickle files are not human-readable (or, at least, these files are not as clear as JSON). But it's better than nothing.
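If you want to avoid the intermediate file altogether, a minimal sketch using the string-based round trip (this relies on Booster.model_to_string() and the model_str argument, which use LightGBM's text model format rather than the JSON dump):
model_str = gbm.model_to_string()          # gbm is the trained Booster instance
# ... persist model_str wherever you like (database, blob storage, etc.) ...
bst = lightgbm.Booster(model_str=model_str)
preds = bst.predict(X_train)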

Trying to generate a Keras model with my own data instead of cifar10

I have followed this example:
https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
and had an issue with the following line(line #51):
((trainX, trainY), (testX, testY)) = cifar10.load_data()
as I would like to train it on my own data.
Is there any simple way to generate this kind of output without digging deep into cifar10's implementation?
I am pretty sure this is something people have already done, but I cannot find a sample/tutorial/example.
Thanks..
Assume your images are in .jpg format, your labels are in a CSV file called label.csv, and you have separated the data into two folders, a train folder and a test folder.
Then you can do the following to get x_train:
import cv2    # library for reading images
import numpy as np
import glob   # library for listing files in a folder

x_train = []
for file in glob.glob("train/*.jpg"):
    im = cv2.imread(file)  # read each image from the folder
    # note: all images must share the same shape for np.array to produce a
    # proper 4D array; resize them here (e.g. with cv2.resize) if they differ
    x_train.append(im)
x_train = np.array(x_train)
And you can do the following to get y_train:
import csv

y_train = []
with open('train/label.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        y_train.append([int(row[0])])  # convert the string to int (otherwise the CSV data is read as strings)
y_train = np.array(y_train)
You can do the same for your test folder; just change the folder name and the variable names, as sketched below.
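For instance, a minimal sketch for the test set (the test/*.jpg and test/label.csv paths are assumptions mirroring the layout above):
import csv
import glob
import cv2
import numpy as np

x_test = []
for file in glob.glob("test/*.jpg"):
    x_test.append(cv2.imread(file))
x_test = np.array(x_test)

y_test = []
with open('test/label.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        y_test.append([int(row[0])])
y_test = np.array(y_test)

# Now the data matches the shape of the cifar10.load_data() call in the tutorial:
((trainX, trainY), (testX, testY)) = ((x_train, y_train), (x_test, y_test))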

How to load images from Google Cloud Storage with keras.preprocessing

I am writing machine learning code that can be trained locally or in the cloud. I am using keras.preprocessing to load images, which under the hood uses PIL. It works fine for local files but, understandably, does not handle Google Cloud Storage paths like "gs://...".
from keras.preprocessing import image
image.load_img("gs://myapp-some-bucket/123.png")
Gives this error:
  File ".../lib/python2.7/site-packages/keras/preprocessing/image.py", line 320, in load_img
    img = pil_image.open(path)
  File ".../lib/python2.7/site-packages/PIL/Image.py", line 2530, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: 'gs://myapp-some-bucket/123.png'
What is the correct way of doing this? I ultimately need a folder of images to be a single numpy array (images decoded and grayscale).
Found a replacement for keras.preprocessing.image.load_img that understands GCS. I also included more code to read the whole folder and turn every image in it into a single numpy array for training...
import os
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

train_files_dir = "gs://myapp-some-bucket"
filelist = gfile.ListDirectory(train_files_dir)

sess = tf.Session()
with sess.as_default():
    # decode every PNG in the bucket and stack the results into one array
    x = np.array([np.array(tf.image.decode_png(
            tf.read_file(os.path.join(train_files_dir, filename))).eval())
        for filename in filelist])
Load image:
image_path = 'gs://xxxxxxx.jpg'
image = tf.read_file(image_path)
image = tf.image.decode_jpeg(image)
image_array = sess.run(image)
Save image:
job_dir = 'gs://xxxxxxxx'
image = tf.image.encode_jpeg(image_array)
file_name = 'xxx.jpg'
write_op = tf.write_file(os.path.join(job_dir, file_name), image)
sess.run(write_op)
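If you specifically want to keep using PIL (as keras.preprocessing does under the hood), a minimal sketch that opens the GCS object through TensorFlow's file API and converts it to a grayscale numpy array (the bucket path is a placeholder; gfile.GFile is the TF 1.x file wrapper that understands gs:// URLs):
import numpy as np
from PIL import Image
from tensorflow.python.platform import gfile

with gfile.GFile('gs://myapp-some-bucket/123.png', 'rb') as f:
    img = Image.open(f).convert('L')  # 'L' = 8-bit grayscale; convert forces the read
img_array = np.array(img)
print(img_array.shape)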
