Keras flow_from_directory class index

I used to do this manually, but now I am using flow_from_directory to train my network on my own data. I just have one question: when I call model.predict(), how can I tell whether index 0 in the predictions is for the category dogs and index 1 for the category cats?
The code I am using is the following:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_images_path,
    target_size=(64, 64),
    batch_size=batch_size)
validation_generator = test_datagen.flow_from_directory(
    validate_images_path,
    target_size=(64, 64),
    batch_size=batch_size)
early_stopping = keras.callbacks.EarlyStopping(monitor='val_acc', min_delta=0,
                                               patience=3, verbose=1, mode='auto')
history = model.fit_generator(
    train_generator,
    steps_per_epoch=1700,
    epochs=epochs,
    verbose=1,
    callbacks=[early_stopping],
    validation_data=validation_generator,
    validation_steps=196)
What I want to know is the pairing of images with their ground-truth labels.
Thank you

You can get the index of each class generated by the generator from its class_indices property:
print(validation_generator.class_indices)
Simple...
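For example, a minimal sketch (assuming a trained model and the validation_generator above; the variable names are illustrative) that inverts class_indices to map a predicted index back to its label name:
import numpy as np

# class_indices maps label name -> index, e.g. {'cats': 0, 'dogs': 1}
index_to_label = {v: k for k, v in validation_generator.class_indices.items()}

x_batch, y_batch = next(validation_generator)    # one batch of (images, labels)
preds = model.predict(x_batch)                   # shape: (batch_size, num_classes)
print([index_to_label[i] for i in np.argmax(preds, axis=1)])   # e.g. ['dogs', 'cats', ...]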

When you gather the data, you define that; there is no rule. But a simple way to check is:
- look at your first training image yourself: is it a cat or a dog?
- then look at the corresponding training Y (result/class/desired output): is it [0,1] or [1,0]?
This will answer your question.
For getting one sample from a generator, you can see this question: How to get one value from a generator in Python?
As defined in the Keras documentation, the generator output is a tuple of (inputs, targets); a minimal sketch of the check follows below.
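A hedged sketch of that check, assuming a train_generator like the one above (class_mode defaults to 'categorical', so each batch is a tuple of images and one-hot labels):
import matplotlib.pyplot as plt

x_batch, y_batch = next(train_generator)   # pull one batch from the generator
plt.imshow(x_batch[0])                     # look at the first image yourself
plt.show()
print(y_batch[0])                          # its one-hot label, e.g. [1. 0.] or [0. 1.]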

It's pretty simple. When you pre-process your data, just replace the class labels with specific integers (you can call them ids). Then, when you compute the loss or accuracy from the model's output, compare the prediction with the ground truth in terms of those integer labels (ids).
If you need the label text, you can recover it from the id (integer).
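For example, a minimal sketch of that id scheme (the label names here are hypothetical):
# Hypothetical label <-> id mapping built during preprocessing
label_to_id = {'cats': 0, 'dogs': 1}
id_to_label = {i: name for name, i in label_to_id.items()}

pred_id = 1                       # e.g. the argmax of the model's output
true_id = label_to_id['dogs']     # the ground truth as an integer id
print(pred_id == true_id)         # compare prediction and ground truth: True
print(id_to_label[pred_id])       # recover the label text: 'dogs'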

Related

How to fit an autoencoder using flow_from_directory

I'm building a basic autoencoder, using the Keras documentation here as a guide: https://blog.keras.io/building-autoencoders-in-keras.html.
I'm getting stuck switching it over to fit from a flow_from_directory generator; here's the one I've set up:
data_gen = tf.keras.preprocessing.image.ImageDataGenerator()
train_generator = data_gen.flow_from_directory(
    directory='train_images',
    target_size=(28, 28),
    color_mode="rgb",
    batch_size=128,
    shuffle=True,
    seed=42,
    class_mode=None,
)
I'm trying to fit the model (which is more or less the same as the one in the Keras documentation) using this code:
autoencoder.fit(train_generator, train_generator,
                epochs=500,
                shuffle=True)
However, the problem is that passing it in like this gives me this error:
ValueError: `y` argument is not supported
I think this is saying that I can't specify a y if my x comes from flow_from_directory, which makes sense, but how can I specify the labels to be the same as the data itself?
Putting the answer here in case it helps anyone else. As Frightera suggested in the comments, changing the class_mode to 'input' solved this problem:
data_gen = tf.keras.preprocessing.image.ImageDataGenerator()
train_generator = data_gen.flow_from_directory(
    directory='train_images',
    target_size=(28, 28),
    color_mode="rgb",
    batch_size=128,
    shuffle=True,
    seed=42,
    class_mode='input',
)
autoencoder.fit(train_generator,
                epochs=500,
                shuffle=True)
Try modifying your fit function:
autoencoder.fit(train_generator,
                epochs=500,
                shuffle=True)
You were passing train_generator twice. When you use the ImageDataGenerator.flow_from_directory function, it returns a DirectoryIterator yielding tuples of (x, y), where x is a NumPy array containing a batch of images and y is a NumPy array of the corresponding labels.
Please refer to the Keras docs:
https://keras.io/api/preprocessing/image/#flowfromdirectory-method

Array mismatch in Keras classifier model output layer

I am designing a classifier which takes 10 values - a signal (obtained by processing pixels of the MNIST dataset, normalized to 0-1) - as input, and outputs the class of the digit. The 10-valued signal is unique for each digit, so classification can be performed.
num_classes=10
y_train=to_categorical(y_train,num_classes)
y_test=to_categorical(y_test,num_classes)
# Shapes:
# x_train: (60000, 10, 1, 1)
# y_train: (60000, 10)
# x_test:  (10000, 10, 1, 1)
# y_test:  (10000, 10)
The code is given below:
input_img = Input(shape=(10, 1, 1))
x = Flatten()(input_img)
x = Dense(100, activation='relu')(x)
x = Dense(100, activation='relu')(x)
decoded = Dense(10, activation='softmax')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = autoencoder.fit(x_train, y_train,
                          epochs=30,
                          batch_size=32,
                          verbose=1,
                          shuffle=True,
                          validation_data=(x_test, y_test))
Please suggest what changes can be made.
I think you should probably use the tf.keras.losses.CategoricalCrossentropy loss function, because you encoded your targets as one-hot vectors using to_categorical. According to the docs:
Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss.
However, IMO, without reproducible code it's hard to give a specific answer.
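For illustration, a hedged sketch of matching the loss to the label encoding (assuming tf.keras and reusing the autoencoder model name from the question):
import tensorflow as tf

# One-hot targets (from to_categorical) -> CategoricalCrossentropy
autoencoder.compile(optimizer='adam',
                    loss=tf.keras.losses.CategoricalCrossentropy(),
                    metrics=['accuracy'])

# Alternatively, keep integer targets and use SparseCategoricalCrossentropy
# (and skip the to_categorical step entirely):
# autoencoder.compile(optimizer='adam',
#                     loss=tf.keras.losses.SparseCategoricalCrossentropy(),
#                     metrics=['accuracy'])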

Is this the right way to do data augmentation for training a model?

I'm new to Keras and deep learning. I have tried to use data augmentation for training my model, but I'm not sure if I'm doing it the right way. Can anyone confirm whether my approach is correct? Here is my code:
train_path = 'Digital_Mamo/OPTIMAM' # Relative Path
valid_path = 'Digital_Mamo/InBreast'
test_path = 'Digital_Mamo/BCDR'
valid_batches = ImageDataGenerator().flow_from_directory(valid_path, target_size=(224, 224),
                                                         classes=['Benign', 'Malignant'], batch_size=9)
test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224, 224),
                                                        classes=['Benign', 'Malignant'], batch_size=7)
datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                             height_shift_range=0.1, shear_range=0.15, zoom_range=0.1,
                             channel_shift_range=10., horizontal_flip=True)
train_batches = datagen.flow_from_directory(
    train_path,
    target_size=(224, 224),
    batch_size=10,
    classes=['Benign', 'Malignant'])
vgg16_model = load_model('Fetched_VGG.h5')
# transform the model to Sequential
model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)
model.summary()
# Freeze the layers (prevent their weights from being updated)
for layer in model.layers:
    layer.trainable = False
model.add(Dense(2, activation='softmax', name='predictions'))
### Compile the model
model.compile(Adam(lr=.0001), loss='categorical_crossentropy', metrics=['accuracy'])
# train the model
model.fit_generator(train_batches, steps_per_epoch=28, validation_data=valid_batches,
                    validation_steps=3, epochs=5, verbose=2)
# test
predictions = model.predict_generator(test_batches, steps=3, verbose=0)
Actually this is not correct: the way you coded this, you are applying data augmentation to the validation and test sets, and you should only apply augmentation to the training set.
You would need to create a second instance of ImageDataGenerator for the validation and test sets, without any augmentations set.
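A minimal sketch of that separation, reusing the paths and parameters from the question:
from keras.preprocessing.image import ImageDataGenerator

# Augmentation only on the training set
train_datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                                   height_shift_range=0.1, shear_range=0.15,
                                   zoom_range=0.1, channel_shift_range=10.,
                                   horizontal_flip=True)
# A second, plain instance for validation and test (no augmentations set)
plain_datagen = ImageDataGenerator()

train_batches = train_datagen.flow_from_directory(train_path, target_size=(224, 224),
                                                  classes=['Benign', 'Malignant'], batch_size=10)
valid_batches = plain_datagen.flow_from_directory(valid_path, target_size=(224, 224),
                                                  classes=['Benign', 'Malignant'], batch_size=9)
test_batches = plain_datagen.flow_from_directory(test_path, target_size=(224, 224),
                                                 classes=['Benign', 'Malignant'], batch_size=7)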
It's a correct way, but you can use steps_per_epoch=len(train_batches) and validation_steps=len(valid_batches) for an easier life when you add more data. Also, you could just fold the test set into the validation set, since using them separately won't help anything here.
Edit:
As @Matias pointed out, you shouldn't use augmentation on the validation set. So it's not totally wrong, as I said in the comment, but not really correct either.

Why is the validation accuracy constant at 20%?

I am trying to implement a 5-class animal classifier using Keras. I am building the CNN from scratch, and the weird thing is that the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders, and each of those contains 5 subfolders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizers, but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contiguous array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the output layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
                                                        classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                        target_size=(img_width, img_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        shuffle=False)
hist = History()
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    callbacks=[hist])
#Save the model weights; load later with:
#  from keras.models import load_model
#  model = load_model('models/basic_cnn_from_scratch_model.h5')
model.save('models/basic_cnn_from_scratch_model.h5')
print("Time taken to train the baseline model from scratch: ", datetime.now() - global_start)
Check the following for your data:
- Shuffle the training data well (I see shuffle=False everywhere).
- Properly normalize all data (the rescale=1./255 you are doing looks okay).
- Use a proper train/val split (you seem to be doing that too).
Suggestions for your model:
- Use multiple Conv2D layers followed by a final Dense layer; that is what works best for image classification problems. You can also look at popular tried-and-tested architectures, e.g. AlexNet.
- Change the optimizer to Adam and try different learning rates (a sketch of these changes follows below).
- Look at your training and validation loss graphs and check whether they behave as expected.
Also, I assume you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
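For illustration, a hedged sketch of the optimizer and shuffling changes, reusing the variable names from the question:
from keras.optimizers import Adam

# Adam with a modest learning rate instead of RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])

# Let the training generator shuffle between epochs (drop shuffle=False)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)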
It looks as if your output is always the same animal, hence the 20% accuracy. I highly recommend you check your test outputs to see if they are all the same.
Also, the network you posted has only two convolution layers and is very small; it is going to be hard for such a small architecture to do this task. What is the size of your pictures?
Hope it helps!
The model seems to be working now. I removed the shuffle=False attribute, corrected the input shape for the 2nd convolution layer, and changed the optimizer to Adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data, and there is a bit of overfitting in the model; I will have to use some aggressive dropout to reduce it. Thanks!

Errors while fine-tuning InceptionV3 in Keras

I am fine-tuning the InceptionV3 model using my self-defined dataset. Unfortunately, when using model.fit to train, I get the error below:
ValueError: Error when checking target: expected dense_6 to have shape (4,) but got array with shape (1,)
First, I load my own dataset as training_data, which contains pairs of images and their corresponding labels. Then I use the code below to convert them into arrays (img_new and label_new) that are compatible with Keras's expected inputs for both data and labels.
i = 0
for img, label in training_data:
    img_new[i, :, :, :] = img
    label_new[i, :] = label
    i = i + 1
Second, I fine-tune the Inception model as shown below.
InceptionV3_model = keras.applications.inception_v3.InceptionV3(include_top=False,
                                                                weights='imagenet',
                                                                input_tensor=None,
                                                                input_shape=None,
                                                                pooling=None,
                                                                classes=1000)
#InceptionV3_model.summary()
# add a global spatial average pooling layer
x = InceptionV3_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 4 classes
predictions = Dense(4, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=InceptionV3_model.input, outputs=predictions)
# Transfer learning: freeze the base layers, train only the new top layers
for layer in model.layers[:311]:
    layer.trainable = False
for layer in model.layers[311:]:
    layer.trainable = True
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.001, momentum=0.9), loss='categorical_crossentropy')
model.fit(x=X_train, y=y_train, batch_size=3, epochs=3, validation_split=0.2)
model.save_weights('first_try.h5')
Does anyone have any idea what is going wrong when training with model.fit?
Sincere thanks for your kind help.
The error was caused because my labels are integers: I had to compile with sparse_categorical_crossentropy, which is meant for integer labels, instead of categorical_crossentropy, which is used for one-hot encoded labels. A sketch of the fix is below.
Sincere thanks for the help from @Amir. :-)
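For illustration, a minimal sketch of that fix, reusing the compile and fit calls from the question:
from keras.optimizers import SGD

# sparse_categorical_crossentropy expects integer class ids as targets,
# so no one-hot encoding of y_train is needed
model.compile(optimizer=SGD(lr=0.001, momentum=0.9),
              loss='sparse_categorical_crossentropy')
model.fit(x=X_train, y=y_train, batch_size=3, epochs=3, validation_split=0.2)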
