How To Import The MNIST Dataset From Local Directory Using PyTorch - python-3.x

I am writing code for the well-known MNIST database of handwritten digits in PyTorch. I downloaded the training and test sets (from the main website), including the label files. The files have the format t10k-images-idx3-ubyte.gz and, after extraction, t10k-images-idx3-ubyte. My dataset folder looks like:
MNIST
  Data
    train-images-idx3-ubyte.gz
    train-labels-idx1-ubyte.gz
    t10k-images-idx3-ubyte.gz
    t10k-labels-idx1-ubyte.gz
Now, I wrote code to load the data like below:
def load_dataset():
    data_path = "/home/MNIST/Data/"
    xy_trainPT = torchvision.datasets.ImageFolder(
        root=data_path, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        xy_trainPT, batch_size=64, num_workers=0, shuffle=True
    )
    return train_loader
My code raises the error: Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
How can I solve this problem? I also want to check that my images are loaded correctly (just a figure containing the first 5 images from the dataset).

Read this: Extract images from .idx3-ubyte file or GZIP via Python
Update
You can import the data using this format:
xy_trainPT = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()]),
)
Now, what happens with download=True: first, your code checks whether the root directory (the path you gave) already contains the dataset.
If it does not, the dataset will be downloaded from the web.
If it does, your code will use the existing dataset and will not download it from the internet.
You can check this yourself: first give a path without any dataset (the data will be downloaded from the internet), then give a path that already contains the dataset (the data will not be downloaded), as in the sketch below.
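A minimal sketch of that check (the path /tmp/mnist_fresh is just an example):

import torchvision

# First call with an empty folder: the dataset is downloaded from the internet.
ds1 = torchvision.datasets.MNIST(root="/tmp/mnist_fresh", train=True, download=True)

# Second call with the same folder: the existing files are found and reused,
# nothing is downloaded again.
ds2 = torchvision.datasets.MNIST(root="/tmp/mnist_fresh", train=True, download=True)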

Welcome to Stack Overflow!
The MNIST dataset is not stored as images, but in a binary format (as indicated by the ubyte extension). Therefore, ImageFolder is not the type of dataset you want. Instead, you need to use the MNIST dataset class. It can even download the data for you if you haven't done so already :)
It is a dataset class, so just instantiate it with the proper root path, then pass it as the parameter of your dataloader, and everything should work just fine.
If you want to check the images, just index the dataset (or take a batch from the dataloader) and save the result as a PNG file (you may need to convert the tensor to a NumPy array first), as in the sketch below.
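A minimal sketch of that check for the first five images (assumes matplotlib is installed; the root path and the output file name are just examples):

import matplotlib.pyplot as plt
import torchvision

mnist = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/", train=True, download=True,
    transform=torchvision.transforms.ToTensor(),
)

fig, axes = plt.subplots(1, 5)
for i, ax in enumerate(axes):
    img, label = mnist[i]                      # img is a 1x28x28 tensor
    ax.imshow(img.squeeze().numpy(), cmap="gray")
    ax.set_title(str(label))
    ax.axis("off")
fig.savefig("first_five_mnist.png")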

Related

Splitting an image dataset

I have an image dataset (a folder of jpg images). I would like to split it randomly: 70% for train and 30% for test.
So I wrote this simple script:
from sklearn.model_selection import train_test_split
path = ".\dataset"
output_split = train_test_split(path, path, test_size=0.2)
But I don't find anything in the folder "output_split".
So where is the output of the split (train and test) stored?
I recommend using tf.data via image_dataset_from_directory in your project. You can reference the documentation here:
https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory
And this link for an example of how to use the function:
https://keras.io/api/preprocessing/image/
Please remember to use the same seed for both the training and the testing/validation subsets.
Please note that no changes will be made to your local files.
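A hedged sketch of such a 70/30 split with image_dataset_from_directory, assuming the images are organised in one sub-folder per class (the directory name, image size and seed are placeholder values):

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.3,
    subset="training",
    seed=123,                 # use the same seed for both subsets
    image_size=(180, 180),
    batch_size=32,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.3,
    subset="validation",
    seed=123,
    image_size=(180, 180),
    batch_size=32,
)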

Unable to save Keras model in Databricks

I am saving a Keras model with
model.save('model.h5')
in Databricks, but the model is not being saved.
I have also tried saving to /tmp/model.h5 as mentioned here,
but the model is still not saved.
The saving cell executes, but when I load the model it says no model.h5 file is available.
When I do dbfs_model_path = 'dbfs:/FileStore/models/model.h5' followed by dbutils.fs.cp('file:/tmp/model.h5', dbfs_model_path),
or try loading the model with
tf.keras.models.load_model("file:/tmp/model.h5")
I get the error message java.io.FileNotFoundException: File file:/tmp/model.h5 does not exist
The problem is that Keras is designed to work only with local files, so it doesn't understand URIs, such as dbfs:/, or file:/. So you need to use local paths for saving & loading operations, and then copy files to/from DBFS (unfortunately /dbfs doesn't play well with Keras because of the way it works).
The following code works just fine. Note that dbfs:/ or file:/ are used only in the calls to the dbutils.fs commands - Keras stuff uses the names of local files.
create model & save locally as /tmp/model-full.h5:
from tensorflow.keras.applications import InceptionV3
model = InceptionV3(weights="imagenet")
model.save('/tmp/model-full.h5')
copy data to DBFS as dbfs:/tmp/model-full.h5 and check it:
dbutils.fs.cp("file:/tmp/model-full.h5", "dbfs:/tmp/model-full.h5")
display(dbutils.fs.ls("/tmp/model-full.h5"))
copy file from DBFS as /tmp/model-full2.h5 & load it:
dbutils.fs.cp("dbfs:/tmp/model-full.h5", "file:/tmp/model-full2.h5")
from tensorflow import keras
model2 = keras.models.load_model("/tmp/model-full2.h5")

Where could I find training.pt / test.pt

In the PyTorch docs for MNIST I read:
root (string): Root directory of dataset where MNIST/processed/training.pt
and MNIST/processed/test.pt exist.
Where could I find these two files, training.pt and test.pt? And what is their format?
Assuming PyTorch 1.x+, the constructor of torchvision.datasets.MNIST follows this signature:
torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)
The easiest way to get the dataset is to set download=True; that way it will automatically download and store training.pt and test.pt. Assuming a local install, it will by default store them somewhere like .local/lib/python3.6/site-packages/torchvision/, although you don't have to worry about that.
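If you only want the data and don't care where the files end up, a minimal sketch (the root path ./data is just an example):

from torchvision import datasets, transforms

train_set = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST(root="./data", train=False, download=True,
                          transform=transforms.ToTensor())

print(len(train_set), len(test_set))   # 60000 10000
img, label = train_set[0]
print(img.shape, label)                # torch.Size([1, 28, 28]) 5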

Tensorflow object detection API tfrecord

I'm new to TensorFlow's TFRecord format, so I'm studying the TensorFlow Object Detection API code:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md
but I can't find the code that loads the TFRecord files.
I think they use the .config file to load the TFRecord, because I found this in the config file:
tf_record_input_reader {
  input_path: "/path/to/train_dataset.record-?????-of-00010"
}
Can anyone help?
Have you converted your dataset to TFRecord format yet?
If so, you should have a path which contains your training dataset sharded into several record files, with the format
<path_to_training_data>/<train_dataset>.record-?????-of-xxxxx
Where <path_to_training_data> is the above-mentioned path to your training dataset, <train_dataset> is the file name you gave each file, xxxxx is the number of record files created (e.g. 00010), and ????? should be left as is; it is used as a pattern that matches all the record files.
Once you've replaced <path_to_training_data>, <train_dataset> and xxxxx to the correct values of your dataset, TF OD API should handle everything else (finding all records, reading them, etc.).
Note that there's usually a tf_record_input_reader for both the training dataset and the eval dataset (validation/test), and each should have its own corresponding values (path, dataset name, number of files).
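If you want to see how such a sharded pattern is expanded outside the OD API, a small sketch in plain TensorFlow (the path is the placeholder from the config, not a real file):

import tensorflow as tf

# Expand the ?????-of-00010 shard pattern into a list of matching files,
# then read them back as a single TFRecord dataset.
files = tf.io.matching_files("/path/to/train_dataset.record-?????-of-00010")
dataset = tf.data.TFRecordDataset(files)
for raw_example in dataset.take(1):
    example = tf.train.Example.FromString(raw_example.numpy())
    print(example.features.feature.keys())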

external dataset learning in python for machine learning

Hi, I want to classify a dataset using a Naive Bayes classifier. For that I want to use an external dataset which I downloaded from Google. This dataset contains two folders, one for positive reviews and one for negative reviews, and each folder contains 1000 .txt files. How do I import these files into my code as a training dataset in Python? I am new to machine learning, so I have very little idea about this. Please help me out.
You can use os.listdir from the os module (https://docs.python.org/2/library/os.html), e.g.:
import os

fileList = os.listdir('train_directory')
for file in fileList:
    with open(os.path.join('train_directory', file)) as f:
        text = f.read()  # add the content of the file to your dataset
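For the full classification task, a hedged sketch using scikit-learn's load_files, assuming the two folders sit under a common parent such as reviews/pos/*.txt and reviews/neg/*.txt (the folder names are placeholders):

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# load_files uses the sub-folder names (pos/neg) as the class labels
data = load_files("reviews", encoding="utf-8", decode_error="ignore")
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = MultinomialNB().fit(X_train_vec, y_train)
print(clf.score(X_test_vec, y_test))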
