To train a model using Keras, should I load all the images I have into an array to create something like
x_train, y_train
Or is there a better way to read the images on the fly while training? I am not looking at the ImageDataGenerator class, since my output is an array of points, not classes based on directory names.
I managed to get my data CSV file to contain the array of points and the image file name in 9 columns as follows:
x1 x2 ..... x8 Image_file_name
You can use this data with ImageDataGenerator. You are incorrectly assuming that it needs folders for classes; that only applies to flow_from_directory. The flow_from_dataframe method lets you load data straight from a Pandas dataframe, for example:
import pandas as pd
from keras.preprocessing.image import ImageDataGenerator

idg = ImageDataGenerator(...)
df = pd.read_csv('your_data.csv')
generator = idg.flow_from_dataframe(dataframe=df, directory='image folder',
                                    x_col='filename_column',
                                    y_col=['col1', 'col2', ..., 'coln'],
                                    class_mode='other')
This generator will read data from the dataframe, load the image whose file name is given by x_col from directory, and use the corresponding row to build the targets, which in this case will be a numpy array of the values of the columns in y_col. More information about this method can be found in the Keras documentation.
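Once the generator is set up, you can pass it straight to training. A minimal sketch, assuming a small regression-style model with one output per coordinate column and the default target_size of (256, 256); the layer sizes and epoch count here are placeholders:
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Illustrative model only: 8 outputs, one per point coordinate in the CSV
model = Sequential([
    Flatten(input_shape=(256, 256, 3)),
    Dense(64, activation='relu'),
    Dense(8)
])
model.compile(optimizer='adam', loss='mse')

# fit_generator consumes the batches produced by flow_from_dataframe
model.fit_generator(generator, steps_per_epoch=len(generator), epochs=10)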
Loading the entire dataset into memory as an array is not a great idea because memory consumption can get out of control, so you should use a generator. ImageDataGenerator and flow_from_dataframe are a great way of loading images in Keras. Since you don't want to use ImageDataGenerator (can you mention why?), you can create your own generator function that loads chunks of images into memory; a rough sketch follows. If you load your data with a generator, make sure you use the fit_generator and predict_generator functions.
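This is only a sketch of such a generator, assuming the CSV layout from the question (eight coordinate columns plus an Image_file_name column) and a folder of images; the paths, column names, and sizes are placeholders:
import numpy as np
import pandas as pd
from keras.preprocessing.image import load_img, img_to_array

def csv_image_generator(csv_path, image_dir, batch_size=32, target_size=(256, 256)):
    # Yield (images, targets) batches indefinitely from a CSV of
    # point coordinates and image file names.
    df = pd.read_csv(csv_path)
    target_cols = [c for c in df.columns if c != 'Image_file_name']
    while True:
        # Reshuffle once per pass so batches differ between epochs
        df = df.sample(frac=1).reset_index(drop=True)
        for start in range(0, len(df), batch_size):
            chunk = df.iloc[start:start + batch_size]
            images = np.stack([
                img_to_array(load_img(image_dir + '/' + name, target_size=target_size))
                for name in chunk['Image_file_name']
            ]) / 255.0
            targets = chunk[target_cols].values.astype('float32')
            yield images, targets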
To load unlabeled data you can do the following hack:
datagen = ImageDataGenerator()
test_data = datagen.flow_from_directory('.', classes=['directory_where_images_are_stored'])
For more information check out link [1].
[1] https://kylewbanks.com/blog/loading-unlabeled-images-with-imagedatagenerator-flowfromdirectory-keras
I am importing the MNIST dataset as train_data_MNIST = torchvision.datasets.MNIST(root=path+"MNIST", train=True, transform=transforms, download=True) and I am trying to make a smaller dataset from MNIST, let's say the first 10,000 images and corresponding labels. I know this can be handled with torch.utils.data.Subset. But what I want is a torchvision.datasets object (if I directly apply torch.utils.data.Subset to the train_data_MNIST listed above, the result is an object of the torch.utils.data.Subset class).
Is there any way I can use a fraction of the original MNIST dataset to create a new dataset (not a subset)?
Thanks in advance.
What about modifying data and targets directly? For example:
import torchvision

dataset = torchvision.datasets.MNIST(root=path + "MNIST", train=True, transform=transforms, download=True)
# Keep only the first 10,000 images and their labels
dataset.data = dataset.data[:10000]
dataset.targets = dataset.targets[:10000]
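A quick check that the truncated dataset still behaves like a regular torchvision dataset (the batch size and shuffle flag below are just for illustration):
import torch

loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
print(len(dataset))                  # 10000
images, labels = next(iter(loader))  # e.g. shapes [64, 1, 28, 28] and [64] with ToTensor()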
from tensorflow.keras.preprocessing.image import ImageDataGenerator
With ImageDataGenerator's flow_from_directory method, can we also reshape images?
e.g. we have color images in 10 classes in 10 folders, and we are providing the path of that directory, let's say train:
gen = ImageDataGenerator(rescale=1./255, width_shift_range=0.05, height_shift_range=0.05)
train_imgs = gen.flow_from_directory(
    '/content/data/train',
    target_size=(10, 10),
    batch_size=1,
    class_mode='categorical')
Now my model takes an input of shape 300, and I want to define training data from this train_imgs, whose images are 10x10x3.
Is there any library, method, or option available to convert this data generator into a matrix in which each column is an image vector?
Generally the best option in these cases is to add a Reshape layer to the start of your model: layers.Reshape((300,), input_shape=(10, 10, 3)). You can also use layers.Reshape((-1,), input_shape=(10, 10, 3)), and it will automatically figure out the correct output length.
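A minimal sketch of a model wired this way, feeding the generator defined above; the layer sizes after the Reshape and the epoch count are placeholders:
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Reshape((300,), input_shape=(10, 10, 3)),  # 10*10*3 = 300 flat features
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')            # 10 classes, matching class_mode='categorical'
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_imgs, epochs=5)  # the flow_from_directory generator from above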
I have a huge dataset. My usual approach when I deal with such a dataset is to split it into multiple tiny datasets using numpy archives and use a generator to deal with them. Are there any other alternatives to this? I also wanted to incorporate random run-time image augmentations with the Keras image preprocessing module, which is also a generator-type function. How do I streamline these two generator processes?
The link for the Keras Image augmentation module is below.
https://keras.io/preprocessing/image/
My current data flow generator is as follows:
import os
import numpy as np

def dat_loader(path, batch_size):
    # Walk the archive folder and yield (images, truth) batches forever
    while True:
        for dirpath, subdirs, files in os.walk(path):
            for fname in files:
                file_path = os.path.join(dirpath, fname)
                archive = np.load(file_path)
                img = archive['images']
                truth = archive['truth']
                del archive
                num_batches = len(truth) // batch_size
                img = np.array_split(img, num_batches)
                truth = np.array_split(truth, num_batches)
                while truth:
                    batch_img = img.pop()
                    batch_truth = truth.pop()
                    yield batch_img, batch_truth
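One possible way to fold the Keras runtime augmentation mentioned above into this loader is to wrap it and apply ImageDataGenerator.random_transform to each image just before yielding; this is only a sketch, and the augmentation parameters are illustrative:
from keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=15, horizontal_flip=True)

def augmented_loader(path, batch_size):
    for batch_img, batch_truth in dat_loader(path, batch_size):
        # Apply an independent random transform to every image in the batch
        batch_img = np.array([aug.random_transform(img) for img in batch_img])
        yield batch_img, batch_truth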
One way of handling really large datasets is to use memory-mapped files that dynamically load the required data at runtime. NumPy has memmap, which creates an array mapped to a large file on disk; the file can be massive (I once had one for a pre-processed version of offline Wikipedia and it was okay) but doesn't necessarily live in your RAM. Any changes get flushed back to the file when needed or when the object is garbage collected. Here is an example:
import numpy as np

# Create or load a memory-mapped array; it can contain your huge dataset
nrows, ncols = 1000000, 100
f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))

# Use it like a normal array, but it will be slower as it might
# access the disk along the way.
for i in range(ncols):
    f[:, i] = np.random.rand(nrows)
This example is adapted from the online tutorial. Note that this is just a potential solution; for your dataset and usage there might be better alternatives.
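If you then want to stream batches from that file during training without loading it all, you can reopen the memmap read-only and slice it; a small sketch reusing nrows, ncols and the file name from the example above (the batch size is arbitrary):
# Reopen the same file read-only; data is only pulled from disk when sliced
data = np.memmap('memmapped.dat', dtype=np.float32, mode='r', shape=(nrows, ncols))

def memmap_batches(arr, batch_size=256):
    # Yield consecutive row batches as regular in-memory arrays
    for start in range(0, arr.shape[0], batch_size):
        yield np.asarray(arr[start:start + batch_size])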
I was actually looking through the load_data() function in Python that returns X_train, X_test, Y_train and Y_test, as in this link. As you can see, it is for the CIFAR10 and CIFAR100 datasets, which return the above-mentioned values as uint8 arrays.
I wanted to know: is there some other function like this for loading datasets stored locally on our system?
If so, please help me with its usage, and if not, please suggest some alternative.
Thanks in advance.
load_data() is not a part of Python itself; rather, it is defined in the keras.datasets.cifar10 module. To load the CIFAR dataset (or any other dataset), there can be many methods, depending on how the dataset is packaged/formatted. Usually, the pandas module can be used for loading/saving/manipulating table-like data.
For CIFAR data, here is another example: loading an image from cifar-10 dataset
Here the author uses the pickle module to unpack the dataset, and then the PIL and numpy modules to load and manipulate individual images.
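If you just want something with the same (x_train, y_train), (x_test, y_test) signature for images stored locally, a small helper along these lines can work; this is only a sketch, assuming one sub-folder per class, and all names and sizes are placeholders:
import os
import numpy as np
from PIL import Image

def load_local_data(root, size=(32, 32), test_fraction=0.2):
    # Return (x_train, y_train), (x_test, y_test) as uint8 arrays,
    # with one integer label per class sub-folder under root
    images, labels = [], []
    for label, class_dir in enumerate(sorted(os.listdir(root))):
        class_path = os.path.join(root, class_dir)
        for fname in os.listdir(class_path):
            img = Image.open(os.path.join(class_path, fname)).resize(size)
            images.append(np.asarray(img, dtype=np.uint8))
            labels.append(label)
    x = np.stack(images)
    y = np.array(labels, dtype=np.uint8)
    split = int(len(x) * (1 - test_fraction))
    return (x[:split], y[:split]), (x[split:], y[split:])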
I am trying to perform classification on a satellite image using the libSVM library. What I want is to display the classified image and save it, not only to get the accuracy results on my terminal. I have extracted the pixel values from the training datasets (shown below) and I used the script csv2libsvm (https://github.com/zygmuntz/phraug/blob/master/csv2libsvm.py) to bring my data into the right format for libsvm. There are 4 different classes in the image to be classified. My satellite image and the training data are displayed below.
fig 1: Image to be classified with training data.
The steps I followed are based on the following tutorial https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.
Split the training and testing data (70% training and 30% testing).
subset.py datasets 12000 training.tr testing.te
Train the model
svm-train training.tr
Make predictions
svm-predict testing.te training.tr.model classification_output
The accuracy of this classification was 95%, which was great.
What I am really interested in now is displaying the classified image. So, I need to construct my classified image out of the CSV file which contains the classified labels. This is where the problem arises, and I do not know how to do it. What I have done (and what did not work) is the following:
I imported the CSV file produced by libSVM into Python using the csv module.
I tried to reshape the CSV data into the shape of my image.
My code is shown below:
import csv
import numpy as np
from osgeo import gdal

with open('/home/io/Desktop/training/1TESTING/libSVM_classification/classification_results', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    results = []
    for row in reader:
        results.append(row)

results_arr = np.asarray(results, dtype=int)
predicted = results_arr[results_arr > 0]  # here are my predicted labels

# load the image to be classified and read the projection system
raster_dataset = gdal.Open(sea_ice, gdal.GA_ReadOnly)
geo_transform = raster_dataset.GetGeoTransform()
proj = raster_dataset.GetProjectionRef()

# loop over all bands of the image and append them
bands_data = []
for b in range(1, raster_dataset.RasterCount + 1):
    band = raster_dataset.GetRasterBand(b)
    bands_data.append(band.ReadAsArray())

bands_data = np.dstack(bands_data)
row, col, n_bands = bands_data.shape

# get the classified labels from libsvm and reshape them into the initial image
# in order to display it using matplotlib
class_prediction = predicted.reshape(bands_data[:, :, 0].shape)
The size of the image to be classified is (303 x 498) and the number of predicted labels produced by libsvm is 1807. Hence, when I try to reshape the libsvm results I get the following error:
ValueError: cannot reshape array of size 1807 into shape (303,498)
This error makes sense: I have 1807 rows and try to reshape them to match my initial image, which is obviously impossible.
So, how can I display my classified image? I achieved an accuracy of 95% but have not found a way of seeing the classification results. I thought that libsvm might have an option for exporting the classification results to TIFF, but it does not.
I would appreciate any help, thoughts, or tips.