I'm trying to build a CNN model to classify images, but whenever training is done and I try to feed it a single image (even one from the training dataset), it always misclassifies it.
Please take a look at the code I wrote below.
Thank you in advance.
First, I declared an Image Data Generator for both my training and testing sets:
train_datagen = ImageDataGenerator(rescale = 1./255, rotation_range=20, horizontal_flip = True,
validation_split=0.3)
test_datagen = ImageDataGenerator(rescale = 1./255,validation_split=0.3)
Then, I used the flow_from_directory() function to load the images:
train_generator = train_datagen.flow_from_directory(
data_dir,
shuffle=False,
subset='training',
target_size = (224, 224),
class_mode = 'categorical'
)
test_generator = test_datagen.flow_from_directory(
data_dir,
shuffle=False,
subset='validation',
target_size = (224, 224),
class_mode = 'categorical'
)
I then loaded a pretrained model and added a few layers to build my model:
pretrained_model = VGG16(weights="imagenet", include_top=False,
input_tensor=input_shape)
pretrained_model.trainable = False
model = tf.keras.Sequential([
pretrained_model,
Flatten(name="flatten"),
Dense(3, activation="softmax")
])
I then trained the model:
INIT_LR = 3e-4
EPOCHS = 15
opt = Adam(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer='Adam', metrics=["accuracy"])
H = model.fit(
train_generator,
validation_data=test_generator,
epochs=EPOCHS,
verbose= 1)
Then came the part to predict a single image:
I chose an image that was part of the training set, and I even overfitted the model to make sure the predictions should be correct, but it gave me wrong results for every image I fed to the model.
I tried the following ways:
image = image.load_img(url,target_size = (224, 224))
img = tf.keras.preprocessing.image.img_to_array(image)
img = np.array([img])
img = img.astype('float32') / 255.
img = tf.keras.applications.vgg16.preprocess_input(img)
This didn't work
image = cv2.imread(url)
image = cv2.normalize(image, None,beta=255, dtype=cv2.CV_32F)
image = cv2.resize(image, (224, 224))
image = np.expand_dims(image, axis=0)
This also didn't work; I tried many other ways to predict a single image, but none of them worked.
Finally, the only way that worked was to create an Image Data Generator and flow_from_directory for this single image, but I believe that's not how it should be done.
The call img = tf.keras.applications.vgg16.preprocess_input(img) assumes the original pixel values are in the range 0 to 255 and applies its own scaling internally. In the previous line of code
img = img.astype('float32') / 255.
you already rescaled the pixels, so the image ends up scaled twice. Remove one of the two steps, keeping whichever one matches the preprocessing used during training. Now, to predict a single image you need to expand the dimensions with
img = np.expand_dims(img, axis=0)
In your second attempt, be aware that cv2 reads images in BGR order. If your model was trained on RGB images, your predictions will be wrong. Use the code below to convert the image to RGB:
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
As a side note, you can use the function below, which scales images to the range -1 to +1, in place of tf.keras.applications.vgg16.preprocess_input(img), provided you apply the same scaling during training:
def scalar(img):
    return img / 127.5 - 1
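Putting the pieces together, here is a minimal sketch of one consistent single-image prediction path; it assumes the model from the question, which was trained with a generator that only applied rescale=1./255, and the image path is a placeholder:
import numpy as np
import tensorflow as tf

# Load the image in RGB at the training size
img = tf.keras.preprocessing.image.load_img('/path/to/image.jpg', target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)   # float32 array with values in 0-255

# Apply the same preprocessing as the training generator (rescale only), exactly once
img = img / 255.0

# Add the batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
img = np.expand_dims(img, axis=0)

pred = model.predict(img)
print(np.argmax(pred, axis=1))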
This answer could be one starting point:
Resnet50 produces different prediction when image loading and resizing is done with OpenCV
These are possible differences (short gist):
RGB vs BGR (OpenCV loads BGR)
The interpolation method used (INTER_LINEAR vs INTER_NEAREST).
img_to_array() transforms the data type into float32 rather than uint8 which is obtained by default when loading with OpenCV.
tf.keras.applications.vgg16.preprocess_input(img): this preprocessing function can actually differ from the manual preprocessing you wrote above. It is also worth noting that if you did not preprocess the training data in this same way (with preprocess_input()), poor results on the test set are to be expected, since the two preprocessings differ.
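For instance, a rough sketch of how the first three differences can be removed when loading with OpenCV (the file path is a placeholder, and this assumes the Keras side used load_img's default nearest-neighbour interpolation):
import cv2

img = cv2.imread('/path/to/image.jpg')        # uint8 array, channels in BGR order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # match the RGB order Keras uses
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_NEAREST)  # match load_img's default interpolation
img = img.astype('float32')                   # match the dtype img_to_array returns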
Hope these observations shed some light.
After training a model with ImageDataGenerator(rescale=1./255), do I need to rescale the image before predicting?
I thought it was necessary, but my experiment said no.
I trained a Resnet50 model which has 37 classes on the top layer.
The model was trained with an ImageDataGenerator like this:
datagen = ImageDataGenerator(rescale=1./255)
generator=datagen.flow_from_directory(
directory=os.path.join(os.getcwd(), data_folder),
target_size=(224,224),
batch_size=256,
classes=None,
class_mode='categorical')
history = model.fit_generator(generator, steps_per_epoch=generator.n / 256, epochs=10)
Accuracy reached 98% after 10 epochs on my training dataset.
The problem is, when I tried to predict each image in the TRAIN dataset, the prediction was wrong (the result was 33 whatever the input image was).
img_p = './data/pets/shiba_inu/shiba_inu_27.jpg'
img = cv2.imread(img_p, cv2.IMREAD_COLOR)
img = cv2.resize(img, (224,224))
img_arr = np.zeros((1,224,224,3))
img_arr[0, :, :, :] = img / 255.
pred = model.predict(img_arr)
yhat = np.argmax(pred, axis=1)
yhat is 5, but y is 33
When I replace this line
img_arr[0, :, :, :] = img / 255.
by this
img_arr[0, :, :, :] = img
yhat is exactly 33.
Someone might suggest using predict_generator() instead of predict(), but I want to understand what I did wrong here.
I found out what was wrong here.
I was using an ImageNet pretrained model, which does NOT rescale images by dividing them by 255. I have to use resnet50.preprocess_input before training/testing.
The preprocess_input function can be found here:
https://github.com/keras-team/keras-applications/blob/master/keras_applications/imagenet_utils.py
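As a rough illustration of that fix, the same preprocessing function can be wired into both the generator and the single-image path. This is only a sketch, assuming a tf.keras ResNet50-based model and a placeholder image path:
import numpy as np
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# Training generator: use preprocess_input instead of rescale=1./255
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Single-image prediction: apply exactly the same function
img = img_to_array(load_img('/path/to/image.jpg', target_size=(224, 224)))
img = preprocess_input(np.expand_dims(img, axis=0))
pred = model.predict(img)
print(np.argmax(pred, axis=1))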
You must apply every preprocessing step that you applied to the training data to any data you want to feed to your trained network. When you rescale the training images and then train, the network learns to take a matrix with entries between 0 and 1 and find the proper category; so if, after the training phase, you feed it an image without rescaling, you are giving it a matrix with entries between 0 and 255, which the network never learned how to handle.
If you are following exactly the same pre-processing as at training time, then look at the part of your code where you predict the class with yhat = np.argmax(pred, axis=1). My hunch is that there may be a mismatch between your classes and their indices. To check how your classes are indexed when you use flow_from_directory, use class_map = generator.class_indices; this returns a dictionary showing how your classes are mapped to indices.
Note: the reason I mention this is that I have faced a similar problem. Keras' flow_from_directory does not necessarily order classes the way you expect, so it is quite possible that the class you think of as 1 sits at index 10 while np.argmax returns the index, not the class itself.
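A small sketch of that check, using the generator and yhat from the question's code:
# Mapping from class name -> index assigned by flow_from_directory
class_map = generator.class_indices
print(class_map)   # e.g. {'abyssinian': 0, 'beagle': 1, ...}

# Invert it so an argmax index can be turned back into a class name
index_to_class = {v: k for k, v in class_map.items()}
print(index_to_class[int(yhat[0])])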
I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
    image_input = Input(shape=(2000, 1))
    num_classes = 5
    model = ResNet50(weights='imagenet', include_top=False)
    model.summary()
    last_layer = model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D()(last_layer)
    # add fully-connected & dropout layers
    x = Dense(512, activation='relu', name='fc-1')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu', name='fc-2')(x)
    x = Dropout(0.5)(x)
    # a softmax layer for 5 classes
    out = Dense(num_classes, activation='softmax', name='output_layer')(x)
    # this is the model we will train
    custom_resnet_model2 = Model(inputs=model.input, outputs=out)
    custom_resnet_model2.summary()
    for layer in custom_resnet_model2.layers[:-6]:
        layer.trainable = False
    custom_resnet_model2.layers[-1].trainable
    custom_resnet_model2.compile(loss='categorical_crossentropy',
                                 optimizer='adam', metrics=['accuracy'])
    clf = custom_resnet_model2.fit(X, y,
                                   batch_size=32, epochs=32, verbose=1,
                                   validation_data=(X, y))
    return clf
I am calling the function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match what ResNet expects: for ResNet, the input should be (n_samples, 224, 224, 3), but you have (785, 2000). From your question, you have 784 images, each an array of size 2000, which doesn't align with the original ResNet50 input shape of (224, 224, 3) no matter how you reshape it. That means you cannot use ResNet50 directly with your data. The only thing your code does is take the output of ResNet50 and add your own output layer to match your number of classes.
2. Then, what you can do.
If you insist on using the ResNet architecture, you will need to change the input layer rather than the output layer, and you will need to reshape your image data so it can go through the convolution layers. That means it cannot stay a (2000,) array; it needs a shape like (height, width, channels), just like the inputs ResNet and other architectures expect. Of course you will also need to change the output layer, as you already did, so that you are predicting your own classes. Try something like the line below (note that, per the docstring quoted underneath, a custom input_shape requires include_top=False):
model = ResNet50(input_shape=(200, 200, 3), include_top=False, weights='imagenet')
This way, you can specify a customized input image shape. You can check the GitHub code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.
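As a rough sketch of that approach (written with tf.keras; the input size, class count, and layer arrangement here are illustrative rather than taken from the question):
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Base network with a custom input shape; include_top must be False for this
base = ResNet50(weights='imagenet', include_top=False, input_shape=(200, 200, 3))

# New classification head for 5 classes
x = GlobalAveragePooling2D()(base.output)
out = Dense(5, activation='softmax')(x)
model = Model(inputs=base.input, outputs=out)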
I was modifying an LSTM network I had written so that it prints out the test error. The issue, I realized, is that the model I had defined depends on the batch size.
Specifically, the input is a tensor of shape [batch_size, time_steps, features]. The input goes through the LSTM cell, and I turn the output into a list of time_steps 2D tensors, each of shape [batch_size, hidden_units]. Each 2D tensor is then multiplied by a weight vector of shape [hidden_units] to yield a vector of shape [batch_size], to which a bias vector of shape [batch_size] is added.
In words, I give the model N sequences, and I expect it to output a scalar for each time step for each sequence. That is, the output is a list of N vectors, one for each time step.
For training, I give the model batches of size 13. For the test data, I feed the entire data set, which consists of over 400 examples. Thus, an error is raised, since the bias has fixed shape batch_size.
I haven't found a way to make its shape variable without raising an error.
I can add the complete code if requested. (I have added it below anyway.)
Thanks.
def basic_lstm(inputs, number_steps, number_features, number_hidden_units, batch_size):
    weights = {
        'out': tf.Variable(tf.random_normal([number_hidden_units, 1]))
    }
    biases = {
        'out': tf.Variable(tf.constant(0.1, shape=[batch_size, 1]))
    }
    lstm_cell = rnn.BasicLSTMCell(number_hidden_units)
    init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
    hidden_layer_outputs, states = tf.nn.dynamic_rnn(lstm_cell, inputs,
                                                     initial_state=init_state, dtype=tf.float32)
    results = tf.squeeze(tf.stack([tf.matmul(output, weights['out']) + biases['out']
                                   for output
                                   in tf.unstack(tf.transpose(hidden_layer_outputs, (1, 0, 2)))], axis=1))
    return results
You want the biases to have a shape of (batch_size,).
For example (using zeros instead of tf.constant, but it is the same idea), I was able to specify the shape as a single integer:
biases = tf.Variable(tf.zeros(10,dtype=tf.float32))
print(biases.shape)
prints:
(10,)
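As a variation on the above (not what the answer suggests, but another way to remove the batch-size dependence in the question's code), a bias of shape [1] broadcasts against the (batch_size, 1) matmul result for any batch size:
import tensorflow as tf

# Shape [1] instead of [batch_size, 1]: broadcasts over any batch
bias = tf.Variable(tf.constant(0.1, shape=[1]))
train_scores = tf.zeros([13, 1]) + bias    # works for a training batch of 13 ...
test_scores = tf.zeros([400, 1]) + bias    # ... and for a test set of 400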
My question is at the bottom, but first I will explain what I am attempting to achieve.
I have an example I am trying to implement with my own model. I am creating an adversarial image; in essence, I want to graph how the image's class scores change as the epsilon value changes.
So let's say my model has already been trained, and in this example I am using the following model...
x = tf.placeholder(tf.float32, shape=[None, 784])
...
...
# construct model
logits = tf.matmul(x, W) + b
pred = tf.nn.softmax(logits) # Softmax
Next, let us assume I extract an array of images of the number 2 from the MNIST data set and save it in the following variable...
# convert into a numpy array of shape [100, 784]
labels_of_2 = np.concatenate(labels_of_2, axis=0)
So now, in the example that I have, the next step is to try different epsilon values on every image...
# random epsilon values from -1.0 to 1.0
epsilon_res = 101
eps = np.linspace(-1.0, 1.0, epsilon_res).reshape((epsilon_res, 1))
labels = [str(i) for i in range(10)]
num_colors = 10
cmap = plt.get_cmap('hsv')
colors = [cmap(i) for i in np.linspace(0, 1, num_colors)]
# Create an empty array for our scores
scores = np.zeros((len(eps), 10))
for j in range(len(labels_of_2)):
    # Pick the image for this iteration
    x00 = labels_of_2[j].reshape((1, 784))
    # Calculate the sign of the derivative,
    # at the image and at the desired class
    # label
    sign = np.sign(im_derivative[j])
    # Calculate the new scores for each
    # adversarial image
    for i in range(len(eps)):
        x_fool = x00 + eps[i] * sign
        scores[i, :] = logits.eval({x: x_fool,
                                    keep_prob: 1.0})
Now we can graph the images using the following...
# Create a figure
plt.figure(figsize=(10, 8))
plt.title("Image {}".format(j))
# Loop through the score functions for each
# class label and plot them as a function of
# epsilon
for k in range(len(scores.T)):
    plt.plot(eps, scores[:, k],
             color=colors[k],
             marker='.',
             label=labels[k])
plt.legend(prop={'size':8})
plt.xlabel('Epsilon')
plt.ylabel('Class Score')
plt.grid('on')
For the first image the graph would look something like the following...
Now Here Is My Question
Let's say the model I trained used a batch_size of 100; in that case, the following line would not work...
scores[i, :] = logits.eval({x: x_fool,
keep_prob: 1.0})
In order for this to work, I would need to pass an array of 100 images to the model, but in this instance x_fool is just one image of size (1, 784).
I want to graph the effect of different epsilon values on the class scores for any one image, but how can I do so when I need to calculate the scores of 100 images at a time (since my model was trained with a batch_size of 100)?
You can choose not to fix a batch size by setting it to None. That way, any batch size can be used.
However, keep in mind that this could come with a moderate performance penalty.
This fixes it if you start again from scratch. If you start from an existing network trained with a batch size of 100, you can create a test network that is identical to your starting network except for the batch size; there you can set the batch size to 1, or again, to None.
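For illustration, a minimal sketch of a graph whose batch dimension is left as None, in the same TF1 style as the question (the weight and bias shapes are assumed, since they are not shown above):
import numpy as np
import tensorflow as tf

# Batch dimension left as None, so the same graph accepts 1 or 100 examples
x = tf.placeholder(tf.float32, shape=[None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    single = sess.run(logits, {x: np.zeros((1, 784), dtype=np.float32)})    # one image
    batch = sess.run(logits, {x: np.zeros((100, 784), dtype=np.float32)})   # a full batch
    print(single.shape, batch.shape)   # (1, 10) (100, 10)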
I realised the problem was not with the batch_size but with the format of the image I was attempting to pass to the model. As user1735003 pointed out, the batch_size does not matter.
The reason I could not pass the image to the model was because I was passing it as so...
x_fool = x00 + eps[i] * sign
scores[i, :] = logits.eval({x: x_fool})
The problem with this is that the shape of the image is simply (784,), whereas the placeholder needs to accept an array of images of shape [None, 784], so what needs to be done is to reshape the image.
x_fool = labels_of_2[0].reshape((1, 784)) + eps[i] * sign
scores[i, :] = logits.eval({x:x_fool})
Now my image has shape (1, 784), which the placeholder can accept.