How do I load EMNIST-letters dataset with fastai? - python-3.x

I've been following the fastai course on machine learning. Got up to lesson four and thought I'd use what I've learned to create a model that predicts hand-written letters. The code they used to load their training dataset is as follows:
pets1 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")
This works when you have image files, but the dataset files I have are:
emnist-letters-train-images-idx3-ubyte
emnist-letters-train-labels-idx1-ubyte
emnist-letters-test-images-idx3-ubyte
emnist-letters-test-labels-idx1-ubyte
I could extract all the images from those files but is there a way I can load the ubyte files into my program? The files have the same format as the MNIST digits dataset.
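One way to read the files without extracting images first is to parse the IDX container directly. The following is a minimal sketch, assuming the files are uncompressed and follow the standard MNIST IDX layout (two zero bytes, a type code, a dimension count, then big-endian 32-bit dimension sizes) with unsigned-byte data. read_idx is a hypothetical helper name, not part of fastai.

```python
import struct


def read_idx(path):
    """Parse an uncompressed IDX file that stores unsigned bytes.

    Returns (shape, data): a tuple of dimension sizes and the raw payload.
    """
    with open(path, "rb") as f:
        # Header: two zero bytes, one type code, one dimension count.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an IDX file of unsigned bytes")
        # One big-endian 32-bit size per dimension.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()

    expected = 1
    for size in shape:
        expected *= size
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return shape, data
```

With NumPy installed, np.frombuffer(data, dtype=np.uint8).reshape(shape) turns the result into an array, which could then be converted to tensors and handed to a fastai or PyTorch data loader.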

Related

"Delete" classes from images so I can use them as unlabeled Dataset

I have a dataset that is already labeled with specific class names, and it is saved on my computer as:
Train Dataset :
-5_1
-5_2
-5_3
etc...
Where the subfolders (5_1, 5_2, etc.) are the classes of the images. I want to use semi-supervised training, where both labeled and unlabeled images must be used, but I don't know how to "erase" the classes from my dataset in order to make the images unlabeled and load them into my CNN.
For the labeled images I use datasets.ImageFolder() and DataLoader() to load them for training.
Thanks for the help!
PS1: I thought of saving them in a different folder named "Unlabeled", but I am sure the folder name would then be used as a new class, which would ruin the predictions in training as well as in testing.
PS2: At the moment I can't use any other existing dataset such as CIFAR or MNIST, where unlabeled data is already available.
I tried to create my own dataset as a new class, but I am confused about the point at which I must delete the classes.
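One possible approach, sketched under assumptions: leave the files where they are and read them through a dataset class that returns only the image, so no folder name is ever turned into a class. PyTorch's DataLoader accepts any map-style object that implements __len__ and __getitem__. UnlabeledImages, pil_loader and the loader parameter are hypothetical names, and the default loader assumes Pillow is installed.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def pil_loader(path):
    # Imported lazily so the class itself has no hard dependency on Pillow.
    from PIL import Image
    with Image.open(path) as img:
        return img.convert("RGB")


class UnlabeledImages:
    """Map-style dataset that yields images without any class label."""

    def __init__(self, root, loader=pil_loader, transform=None):
        # Collect every image under root, ignoring which subfolder it is in.
        self.paths = sorted(
            p for p in Path(root).rglob("*")
            if p.suffix.lower() in IMAGE_SUFFIXES
        )
        self.loader = loader
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        image = self.loader(self.paths[index])
        if self.transform is not None:
            image = self.transform(image)
        return image  # no label is returned
```

An instance can be passed to DataLoader like any other dataset, provided the transform converts each image to a tensor so that batches can be collated. The labeled subset can keep using datasets.ImageFolder.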

How shall I make stream buffer dataset from many files for train on pytorch?

I want to train a CNN model in PyTorch from several pickle files. Each file is about 5 GB and has the form {'features': 3D matrix, 'labels': a list}.
I can't load them all into memory at once, so I want to load them piece by piece, but I don't know how to do that.
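A minimal sketch of one way to do this, assuming each pickle file holds a dict whose 'features' and 'labels' entries have the same length: a generator that loads one file at a time and yields individual samples. iter_samples is a hypothetical name. In PyTorch, the __iter__ method of a torch.utils.data.IterableDataset subclass could delegate to such a generator.

```python
import pickle


def _load_chunk(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def iter_samples(paths):
    """Yield (feature, label) pairs, loading one pickle file at a time."""
    for path in paths:
        chunk = _load_chunk(path)
        # Iterating over the first axis of 'features' pairs each sample
        # with the label at the same position.
        yield from zip(chunk["features"], chunk["labels"])
        # Release this file's data before the next file is loaded.
        chunk = None
```

Because samples arrive file by file, shuffling is limited to whatever is done within each file or by shuffling the list of paths between epochs.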

Can I combine the results from a model.fit that are in a .h5 file

I am using Keras to create a facial recognition program, but I have hit a problem: my computer can't process all the data at once, so I split the data into smaller parts and train on each part until it is close to 100% accuracy. The model is trained on different data each time, e.g. first on happy, sad and confused faces, and then on another set with angry, lonely and amazed faces. The two datasets are run at different times, and each run produces an .h5 file. How can I combine these two or more files into a single file? I am guessing that the model may have to be retrained from the .h5 files to produce a single .h5 file, but I do not know. Does anyone know how to combine two trained models saved in .h5 files?
The code below shows where I train and save the model before changing the data and rerunning the code.
model.fit(train_generator, steps_per_epoch=int(train / batch_size), epochs=epochs, callbacks=[checkpoint, lr_scheduler])

how to use Keras model to predict image?

I have finished training and saved the model in .hdf5 format.
The neural network I use is a siamese convolutional neural network.
During validation, the predicted image is a random image from my test folder.
I use this for testing:
from glob import glob

# Build {alphabet_dir: {character_dir: [image paths]}} from the TEST folder.
test_alphabets = glob('{}/TEST/*'.format(dataset_dirname))
testset = {}
for alph in test_alphabets:
    dirs = glob('{}/*'.format(alph))
    alphabet = {}
    for dirname in dirs:
        alphabet[dirname] = glob('{}/*'.format(dirname))
    testset[alph] = alphabet
then, display the result with
display_validation_test(siamese_model1, testset)
the result is like this
How do I run the test on an image of my choice and then display the matching image, using the .h5 model from earlier?
First, create your model (a keras.Model or keras.Sequential instance) with the same architecture as the one you trained.
Then load the weights from the .h5 file: model.load_weights('your_weight_file.h5')
Read your image(s). For a single image, make sure to add a batch dimension of size 1.
Finally, call predict: prediction = model.predict(images)
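The last two steps can be sketched as follows. This assumes NumPy and a model object exposing Keras's predict method. predict_single is a hypothetical helper, and building the model and calling load_weights are left out because they depend on the original architecture.

```python
import numpy as np


def predict_single(model, image):
    """Run a model on one image by adding the batch dimension first."""
    image = np.asarray(image, dtype="float32")
    batch = np.expand_dims(image, axis=0)  # e.g. (H, W, C) -> (1, H, W, C)
    return model.predict(batch)[0]         # drop the batch dimension again
```

A siamese model with two inputs would instead take a list of two such batches, one per branch.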

Loading Training Images using Keras

To train a model using Keras, should I load all the images I have into an array to create something like
x_train, y_train
or is there a better way to read the images on the fly while training? I am not looking for the ImageDataGenerator class, since my output is an array of points, not classes based on directory names.
I managed to get my data csv file to contain the array of points and image file name in 9 columns as follows:
x1 x2 ..... x8 Image_file_name
You can use this data with ImageDataGenerator. You incorrectly assume that it needs folders for classes, but that only applies to flow_from_directory. The method flow_from_dataframe allows you to load data from a Pandas dataframe, from where you can load your data, for example:
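import pandas as pd
from keras.preprocessing.image import ImageDataGenerator
idg = ImageDataGenerator(...)
df = pd.read_csv('your_data.csv')
generator = idg.flow_from_dataframe(dataframe=df, directory='image folder',
                                    x_col='filename_column',
                                    y_col=['col1', 'col2', ..., 'coln'],
                                    class_mode='other')  # newer Keras releases call this mode 'raw'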
This generator reads rows from the dataframe, loads the image whose filename is given in the x_col column from directory, and uses the same row to build the targets, which in this case are a NumPy array of the values in the y_col columns. More information about this method can be found in the Keras documentation.
Loading the entire dataset into memory as an array is not a great idea, because memory consumption could get out of control, so you should use a generator. ImageDataGenerator and flow_from_dataframe are a great way of loading images in Keras. Since you don't want to use ImageDataGenerator (can you mention why?), you can create your own generator function that loads chunks of images into memory. If you load your data with a generator, make sure you use the fit_generator and predict_generator functions (in recent TensorFlow releases, fit and predict accept generators directly).
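A custom generator along those lines might look like this. It is a sketch, assuming NumPy, a list of (filename, targets) rows read from the CSV, and a caller-supplied load_image function that returns an array of fixed shape per file. batch_generator and its parameters are hypothetical names.

```python
import numpy as np


def batch_generator(rows, batch_size, load_image):
    """Yield (images, targets) batches forever, loading images lazily.

    rows: list of (filename, targets) pairs, e.g. parsed from the CSV.
    load_image: callable mapping a filename to an array of fixed shape.
    """
    while True:  # Keras-style generators are expected to loop indefinitely
        for start in range(0, len(rows), batch_size):
            chunk = rows[start:start + batch_size]
            # Images are read from disk only when their batch is requested.
            images = np.stack([load_image(name) for name, _ in chunk])
            targets = np.asarray([t for _, t in chunk], dtype="float32")
            yield images, targets
```

Only one batch of images is in memory at a time. Because the generator never ends, steps_per_epoch tells Keras how many batches make up an epoch.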
To load unlabeled data you can do the following hack:
datagen = ImageDataGenerator()
test_data = datagen.flow_from_directory('.', classes=['directory_where_images_are_stored'])
For more information check out link [1].
[1] https://kylewbanks.com/blog/loading-unlabeled-images-with-imagedatagenerator-flowfromdirectory-keras
