Image Regression: Width of Squares - keras

I have a dataset with many pictures, and each picture shows a rectangle of a certain width. My task is to automatically detect the width of these rectangles from the images, so I have trained a CNN for image regression as in the code below.
However, this CNN gives me very bad results, i.e. MSE values in the range of 4,000,000 and very imprecise estimates of the actual widths. In my experiments I even used the training set as the test set for the time being, but even then the CNN doesn't seem to learn anything useful.
Do you have an idea what I could be doing wrong? Is it possible that I somehow distort the images while reading them in?
I'm rather new to machine learning, so I'm happy about any input you can give me! :-)
This is the model:
def create_model():
    model = Sequential()
    model.add(Convolution2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(64, activation="relu"))
    model.add(Dense(1))
    model.compile(loss="mse", optimizer="adam")
    return model
And this is the training code:
classifier = create_model()

# Get each image id and its corresponding square width
data = pd.read_csv('../data/data.csv')
id_width = data[['id', 'width']]

# Train the model
train_datagen = ImageDataGenerator()
training_set = train_datagen.flow_from_dataframe(dataframe=id_width, directory='../data/images',
                                                 x_col="id", y_col="width", has_ext=True,
                                                 class_mode="raw", target_size=(64, 64),
                                                 batch_size=32)
classifier.fit_generator(
    training_set,
    epochs=50,
    validation_data=training_set)
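One thing worth checking (an assumption on my part, not a confirmed diagnosis): neither the pixel values nor the width targets are scaled, so the network regresses raw 0-255 images against large width values, which by itself can make the MSE look huge and the optimization sluggish. A minimal sketch of rescaling both, reusing the id_width dataframe from the code above:

# Standardize the width targets and rescale the pixels to [0, 1]
width_mean = id_width['width'].mean()
width_std = id_width['width'].std()
id_width = id_width.assign(width_scaled=(id_width['width'] - width_mean) / width_std)

train_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_dataframe(dataframe=id_width, directory='../data/images',
                                                 x_col="id", y_col="width_scaled", has_ext=True,
                                                 class_mode="raw", target_size=(64, 64),
                                                 batch_size=32)

# After training, map predictions back to pixel widths:
# predicted_width = classifier.predict(images) * width_std + width_mean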

Related

GradCam applied to video sequence classification with TimeDistributed CNN and LSTM

After days of working on this I haven't found any reasonable way of doing it, so here I am.
I have a network that aims to predict the class of the next video, given the features of the current one. Each video is composed of 30 frames. The idea is to apply a feature extraction method to each frame, then feed the result into an LSTM + Dense layer to make the prediction.
Here the code:
video = Input(shape=(30,299,299,3))
inc = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
cnn_out = GlobalAveragePooling2D()(inc.output)
cnn = Model(inputs=inc.input, outputs=cnn_out)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(128, activation='relu', return_sequences=False, kernel_initializer=he_uniform(), bias_initializer='zeros', dropout=0.5)(encoded_frames)
hidden_layer = Dense(1024, activation='relu', kernel_initializer=he_uniform(), bias_initializer='zeros')(encoded_sequence)
outputs = Dense(4, activation="softmax", kernel_initializer=glorot_normal(), bias_initializer='zeros')(hidden_layer)
model = Model(inputs=[video], outputs=outputs)
adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
I would like to visualize the feature activations at the CNN stage for each image. That way, by looking at the saliency map for each input frame, I can understand which features are more important than others for this kind of prediction.
All the examples on the internet deal with just one CNN and one input image. Is there any way of doing this?
Any help is really appreciated, thanks!
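One possible route (a sketch under the assumption that the model runs on tf.keras / TensorFlow 2.x, where eager gradients are available; video_batch is a hypothetical preprocessed input of shape (1, 30, 299, 299, 3)): take the gradient of the predicted class score with respect to the input video tensor, which yields one saliency heat map per frame.

import tensorflow as tf

def per_frame_saliency(model, video_batch, class_idx=None):
    # video_batch: float array of shape (1, 30, 299, 299, 3)
    video = tf.convert_to_tensor(video_batch, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(video)                       # watch the input, not the weights
        preds = model(video, training=False)    # shape (1, 4)
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))
        score = preds[0, class_idx]
    grads = tape.gradient(score, video)         # same shape as the input video
    # Collapse the channel axis: one |gradient| heat map per frame
    saliency = tf.reduce_max(tf.abs(grads), axis=-1)[0]   # (30, 299, 299)
    return saliency.numpy()

# saliency = per_frame_saliency(model, video_batch)
# saliency[t] can then be overlaid on frame t to see which pixels drove the prediction.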

Error in compiling CNN LSTM neural network

I am trying to create a neural network with Keras that will have as input N multivariate time-series and as target output N time-series. I converted the time-series to a supervised problem with the window (or lag) method. As input I have a 4D matrix (samples, variables, sequence, lag) and as output a 2D matrix (samples, sequence). I found similar examples that used CNN+LSTM models, but I have difficulties applying them. In case it helps, I have train_X, train_y, test_X, test_y with dimensions (112, 5, 7998, 2), (112, 7998), (29, 5, 7998, 2) and (29, 7998) respectively.
I have tried applying and removing the TimeDistributed Keras wrapper on the CNN part only and on the whole model. The relevant part of the code is below.
model = Sequential()
model.add(TimeDistributed(Conv2D(filters=32, kernel_size=(1, 80), activation='relu', padding='same', input_shape=(train_X.shape[1], train_X.shape[2], train_X.shape[3]))))
model.add(TimeDistributed(MaxPool2D(pool_size=(1, 2),strides=1)))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(Flatten()))
model.add(TimeDistributed(LSTM(100, return_sequences=True)))
model.add(TimeDistributed(Dropout(0.2)))
model.add(TimeDistributed(Dense(units=1)))
model.compile(loss='mean_squared_error', optimizer='adam')
I get an index error.
IndexError: list index out of range
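One possible arrangement (a sketch under my assumptions, not a verified fix for this exact error): pass input_shape to the TimeDistributed wrapper rather than to the inner Conv2D, feed a 5D input (samples, timesteps, rows, cols, channels) by adding a channel axis and treating the 5 variables as timesteps, and leave the LSTM un-wrapped, since it already consumes the time axis. The Flatten is swapped for a GlobalAveragePooling2D here purely to keep the LSTM input size manageable.

import numpy as np
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPool2D, Dropout,
                          GlobalAveragePooling2D, LSTM, Dense)

# Hypothetical data with the question's shapes, plus an explicit channel axis:
# (112, 5, 7998, 2) -> (112, 5, 7998, 2, 1)
train_X = np.random.rand(112, 5, 7998, 2, 1).astype('float32')
train_y = np.random.rand(112, 7998).astype('float32')

model = Sequential()
model.add(TimeDistributed(Conv2D(filters=32, kernel_size=(1, 2),
                                 activation='relu', padding='same'),
                          input_shape=train_X.shape[1:]))   # shape goes on the wrapper
model.add(TimeDistributed(MaxPool2D(pool_size=(2, 1))))
model.add(TimeDistributed(Dropout(0.5)))
model.add(TimeDistributed(GlobalAveragePooling2D()))        # -> (batch, 5, 32)
model.add(LSTM(100))                                        # consumes the time axis itself
model.add(Dropout(0.2))
model.add(Dense(train_y.shape[1]))                          # 7998 outputs per sample
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()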

Why is the validation accuracy constant at 20%?

I am trying to implement a 5 class animal classifier using Keras. I am building the CNN from scratch and the weird thing is, the validation accuracy stays constant at 0.20 for all epochs. Any idea why this is happening? The dataset folder contains train, test and validation folders. And each of the folders contains 5 folders corresponding to the 5 classes. What am I doing wrong?
I have tried multiple optimizers, but the problem persists. I have included the code sample below.
import warnings
warnings.filterwarnings("ignore")
#First convolution layer
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Second convolution layer
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
#Flatten the outputs of the convolution layer into a 1D contiguous array
model.add(Flatten())
#Add a fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add another fully connected layer containing 256 neurons
model.add(Dense(256, activation='relu',kernel_initializer='he_normal'))
model.add(BatchNormalization())
#Add the output layer containing 5 neurons, because we have 5 categories
model.add(Dense(5, activation='softmax',kernel_initializer='glorot_uniform'))
optim=RMSprop(lr=1e-6)
model.compile(loss='categorical_crossentropy',optimizer=optim,metrics=['accuracy'])
model.summary()
#We will use the below code snippet for rescaling the images to 0-1 for all the train and test images
train_datagen = ImageDataGenerator(rescale=1./255)
#We won't augment the test data. We will just use ImageDataGenerator to rescale the images.
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=False)
validation_generator = test_datagen.flow_from_directory(validation_data_dir,
                                                        classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                        target_size=(img_width, img_height),
                                                        batch_size=batch_size,
                                                        class_mode='categorical',
                                                        shuffle=False)
hist=History()
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    callbacks=[hist])
#Save the model; reload it later with: from keras.models import load_model; model = load_model('models/basic_cnn_from_scratch_model.h5')
model.save('models/basic_cnn_from_scratch_model.h5')
print("Time taken to train the baseline model from scratch: ",datetime.now()-global_start)
Check the following for your data:
Shuffle the training data well (I see shuffle=False everywhere)
Properly normalize all data (I see you are doing rescale=1./255, maybe okay)
Proper train/val split (you seem to be doing that too)
Suggestions for your model:
Use multiple Conv2D layers followed by a final Dense. That's what works best for image classification problems. You can also look at popular architectures that are tried and tested, e.g. AlexNet.
You can change the optimizer to Adam and try different learning rates (see the sketch after these suggestions)
Have a look at your training and validation loss graphs and see if they look as expected
Also, I guess you corrected the shape of the 2nd Conv2D layer as mentioned in the comments.
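A minimal sketch of the shuffling and optimizer changes suggested above, reusing the variable names from the question (the 1e-3 learning rate is an assumption, not a tuned value):

from keras.optimizers import Adam

train_generator = train_datagen.flow_from_directory(train_data_dir,
                                                    classes=['frog', 'giraffe', 'horse', 'tiger', 'dog'],
                                                    target_size=(img_width, img_height),
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=True)   # shuffle the training data between epochs

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-3),   # instead of RMSprop(lr=1e-6)
              metrics=['accuracy'])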
It looks as if your output is always the same animal, thus you have a 20% accuracy. I highly recommend checking your test outputs to see if they are all the same.
Also, you said that you were building a CNN, but in the code snippet you posted I see only dense layers; it is going to be hard for a dense architecture to do this task, and it is very small. What is the size of your pictures?
Hope it helps!
The model seems to be working now. I have removed the shuffle=False attribute, corrected the input shape for the 2nd convolution layer, and changed the optimizer to Adam. I have reached a validation accuracy of almost 94%. However, I have not yet tested the model on unseen data. There is a bit of overfitting in the model, so I will have to use some aggressive dropout to reduce it. Thanks!
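For reference, a minimal sketch of the dropout idea mentioned above (the 0.5 rate and its placement are assumptions, not tested values): add a Dropout layer after each fully connected block of the model from the question.

from keras.layers import Dropout

model.add(Dense(256, activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))     # randomly drop half of the activations during training
model.add(Dense(256, activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax', kernel_initializer='glorot_uniform'))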

Keras multi-class prediction only returning 1 prediction with softmax and categorical_crossentropy

I'm using Keras and Tensorflow to train a model that predicts a matching font based on an image of some letters. My data folder contains a separate folder for each letter, with images of that letter in varying forms. My code for training the model looks like this:
LETTER_IMAGES_FOLDER = "datasets"
MODEL_FILENAME = "fonts_model.hdf5"
MODEL_LABELS_FILENAME = "model_labels.dat"
data = pd.read_csv('annotations.csv')
paths = list(data['Path'].values)
Y = list(data['Font'].values)
encoder = LabelEncoder()
encoder.fit(Y)
Y = encoder.transform(Y)
Y = np_utils.to_categorical(Y)
data = []
# loop over the input images
for image_file in paths:
    # Load the image and convert it to grayscale
    image = cv2.imread(image_file)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Add a third channel dimension to the image to make Keras happy
    image = np.expand_dims(image, axis=2)
    # Add the letter image and its label to our training data
    data.append(image)
data = np.array(data, dtype="float") / 255.0
train_x, test_x, train_y, test_y = model_selection.train_test_split(data,Y,test_size = 0.1, random_state = 0)
# Save the mapping from labels to one-hot encodings.
# We'll need this later when we use the model to decode what its predictions mean
with open(MODEL_LABELS_FILENAME, "wb") as f:
    pickle.dump(encoder, f)
# Build the neural network!
model = Sequential()
# First convolutional layer with max pooling
model.add(Conv2D(20, (5, 5), padding="same", input_shape=(100, 100, 1), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# Second convolutional layer with max pooling
model.add(Conv2D(50, (5, 5), padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(500, activation="relu"))
print (len(encoder.classes_))
model.add(Dense(len(encoder.classes_), activation="softmax"))
# Ask Keras to build the TensorFlow model behind the scenes
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# Train the neural network
model.fit(train_x, train_y, validation_data=(test_x, test_y), batch_size=32, epochs=2, verbose=1)
# Save the trained model to disk
model.save(MODEL_FILENAME)
Once the model has been created I'm predicting with it as follows:
predictions = model.predict(letter_image)
print(predictions)  # this has a length of 1
The problem is that "predictions" is always an array of size 1 and I'm not sure why. I'm using softmax, categorical_crossentropy and my Dense value is greater than 1 in the last layer. Could someone please tell me why I'm not getting the top n predictions here?
I've also tried sigmoid with binary_crossentropy but get the same result. I think there's something more to it that I'm missing.
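A hedged guess at what is going on (not confirmed by the post): model.predict returns one row per input sample, so for a single image the result has shape (1, num_classes), and len(predictions) is 1 because it counts the batch dimension, not the classes. The per-class scores live in predictions[0]; a sketch of extracting the top-n fonts with the fitted encoder (n = 5 is a hypothetical choice):

import numpy as np

predictions = model.predict(letter_image)      # shape (1, len(encoder.classes_))
scores = predictions[0]                        # softmax score per font class

n = 5                                          # hypothetical: top-5 fonts
top_n_idx = np.argsort(scores)[::-1][:n]       # indices of the n highest scores
top_n_fonts = encoder.inverse_transform(top_n_idx)
for font, idx in zip(top_n_fonts, top_n_idx):
    print(font, scores[idx])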

How to deal with increasing loss and low validation accuracy in training with VGGNet?

I am trying to create a model to predict the art style of a painting. To do so I am using the dataset that Kaggle provides for their competition named Painter by Numbers. Though there are 137 art styles in the dataset, I am using only three of them: Impressionism, Expressionism, and Surrealism. I have taken 3000 images from each class to train the model, and 300 images from each class, totaling 900 images, to validate the training.
I have planned to use a pre-trained VGGNet as the bottom layer of my model. I have trained the model on Google Colab. Now the issue is that as the model starts to learn, the loss keeps increasing and the validation accuracy stays near 0.33, which is no better than random guessing.
I created a model with a base layer of pre-trained VGGNet. I added some fully connected layers with 1024 neurons in the first two layers, 512 neurons in the third layer and 3 neurons in the last layer. The optimizer I used was SGD with a learning rate of 0.01, decay 1e-6, and momentum 0.9. My loss function is "categorical_crossentropy". Moreover, the input image shape was (100,100,3).
For training, I set samples_per_epoch to 100 and the number of epochs to 30. Below I have provided all the code.
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
input = Input(shape=(100,100,3), name='image_input')
output_vgg16_conv = model_vgg16_conv(input)
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense( 1024, activation='relu', name='fc1')(x)
x = Dense( 1024, activation='relu', name='fc2')(x)
x = Dense( 512, activation='relu', name='fc3')(x)
x = Dense( 3, activation='softmax', name='predictions')(x)
my_model = Model(input=input, output=x)
sgd = optimizers.SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
my_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.1, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory(train_root, target_size=(100,100), batch_size=32, class_mode='categorical')
test_set = test_datagen.flow_from_directory(test_root, target_size=(100,100), batch_size=32, class_mode='categorical')
my_model.fit_generator(training_set, samples_per_epoch=100, nb_epoch=30, validation_data=test_set, nb_val_samples=300)
This setup produces an ever-increasing loss (it even rises above 10) and a low validation accuracy. What can I do to improve the situation?
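A hedged sketch of adjustments that are commonly tried in this kind of transfer-learning setup (assumptions on my part, not a verified fix): freeze the pre-trained VGG16 base so only the new Dense head is trained at first, use a smaller learning rate, and raise samples_per_epoch so an epoch actually covers the 9000 training images rather than only 100 of them. The Keras-1-style fit_generator arguments from the question are kept as-is.

# Freeze the ImageNet-pretrained convolutional base, then recompile
for layer in model_vgg16_conv.layers:
    layer.trainable = False

sgd = optimizers.SGD(lr=1e-3, decay=1e-6, momentum=0.9, nesterov=True)
my_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

# One pass over all 9000 training and 900 validation images per epoch
my_model.fit_generator(training_set, samples_per_epoch=9000, nb_epoch=30,
                       validation_data=test_set, nb_val_samples=900)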
