Can Inception V3 work with image size 150x150x3? - keras

I saw this code in the official Keras documentation, and I have read that images need to be resized/scaled before being fed to the model. Can you please advise?
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.layers import Input
# input size
input_tensor = Input(shape=(150, 150, 3))
model = InceptionV3(input_tensor=input_tensor, weights='imagenet', include_top=True)

Inception V3 can work with images of almost any size (the Keras implementation requires at least 75x75), as long as they have 3 channels, because the ImageNet weights were trained on 3-channel images. The reason it can work with different sizes is that convolutions do not care about image size. You could also use it with grayscale images with some extra work, but I am not sure how that would affect performance. For this to work you need to set include_top=False; otherwise your image size must match the model's default input size of (299, 299, 3).
You can resize images with a Lambda layer. Let's say you have 1024x1024 images:
import tensorflow as tf

input_images = tf.keras.Input(shape=(1024, 1024, 3))
whatever_this_size = tf.keras.layers.Lambda(
    lambda x: tf.image.resize(x, (150, 150), method=tf.image.ResizeMethod.BILINEAR)
)(input_images)
model = InceptionV3(input_tensor=whatever_this_size, weights='imagenet', include_top=False)
If you are using the tf.data Dataset API, you can also do the following (the label y is passed through unchanged):
your_train_data = your_train_data.map(lambda x, y: (tf.image.resize(x, (150, 150)), y))
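Since include_top=False drops the classifier, you also need to add your own head on top of the base model. A minimal sketch, assuming a global-average-pooling head and a placeholder num_classes:
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3

num_classes = 10  # placeholder: set this to your own number of classes

base = InceptionV3(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
model = tf.keras.Model(inputs=base.input, outputs=outputs)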

Related

Predicting single image using Tensorflow not being accurate

I'm trying to build a CNN model to classify an image, but whenever training is done and I feed it a single image (from the training dataset), it always misclassifies it.
Please take a look at the code I wrote below.
Thank you in advance.
First, I declared an Image Data Generator for both my training and testing sets:
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=20, horizontal_flip=True,
                                   validation_split=0.3)
test_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.3)
Then, I used the flow_from_directory() function to load the images:
train_generator = train_datagen.flow_from_directory(
    data_dir,
    shuffle=False,
    subset='training',
    target_size=(224, 224),
    class_mode='categorical'
)
test_generator = test_datagen.flow_from_directory(
    data_dir,
    shuffle=False,
    subset='validation',
    target_size=(224, 224),
    class_mode='categorical'
)
I then loaded a pretrained model and added a few layers to build my model:
pretrained_model = VGG16(weights="imagenet", include_top=False,
                         input_tensor=input_shape)
pretrained_model.trainable = False
model = tf.keras.Sequential([
    pretrained_model,
    Flatten(name="flatten"),
    Dense(3, activation="softmax")
])
I then trained the model :
INIT_LR = 3e-4
EPOCHS = 15
opt = Adam(lr=INIT_LR)
model.compile(loss="categorical_crossentropy", optimizer='Adam', metrics=["accuracy"])
H = model.fit(
    train_generator,
    validation_data=test_generator,
    epochs=EPOCHS,
    verbose=1)
Then came the part to predict a single image:
I chose an image that was part of the training set, and I even overfitted the model to make sure the predictions should be correct, but it gave me wrong results for every image I fed to the model.
I tried the following ways:
image = image.load_img(url,target_size = (224, 224))
img = tf.keras.preprocessing.image.img_to_array(image)
img = np.array([img])
img = img.astype('float32') / 255.
img = tf.keras.applications.vgg16.preprocess_input(img)
This didn't work
image = cv2.imread(url)
image = cv2.normalize(image, None,beta=255, dtype=cv2.CV_32F)
image = cv2.resize(image, (224, 224))
image = np.expand_dims(image, axis=0)
This also didn't work, I also tried many other ways to predict a single image, but none worked.
Finally, the only way was that I had to create an Image Data Generator and Flow From Directory for this single image, and it worked, but I believe that's not how it should be done.
The code img = tf.keras.applications.vgg16.preprocess_input(img) expects pixel values in the 0 to 255 range: it converts the images from RGB to BGR and subtracts the ImageNet channel means. In the previous line of code
img = img.astype('float32') / 255.
you already rescaled the pixels, so preprocess_input no longer receives the range it expects; remove that line of code (and, in general, apply exactly the same preprocessing you used during training). Now to predict a single image you need to expand the dimensions with
img = np.expand_dims(img, axis=0)
In your second attempt, be aware that cv2 reads images in BGR order. If your model was trained on RGB images, then your predictions will be wrong. Use the code below to convert the image to RGB.
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
As a side note, if your model was trained on images scaled to the range -1 to +1, you can use the function below in place of tf.keras.applications.vgg16.preprocess_input(img); it rescales pixels from 0-255 to -1 to +1:
def scalar(img):
    return img / 127.5 - 1
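Putting these pieces together, a single-image prediction consistent with the above might look like this sketch (the file path is a placeholder, model is the trained model from the question, and it assumes the training images were preprocessed with vgg16.preprocess_input):
import numpy as np
import tensorflow as tf

img = tf.keras.preprocessing.image.load_img('some_image.jpg', target_size=(224, 224))  # placeholder path
img = tf.keras.preprocessing.image.img_to_array(img)      # float32 values in the 0-255 range
img = np.expand_dims(img, axis=0)                          # add the batch dimension
img = tf.keras.applications.vgg16.preprocess_input(img)    # no manual /255. before this call
probs = model.predict(img)
print(np.argmax(probs, axis=-1))                           # predicted class index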
This answer could be one starting point:
Resnet50 produces different prediction when image loading and resizing is done with OpenCV
These are possible differences (short gist):
RGB vs BGR (OpenCV loads BGR)
The interpolation method used (INTER_LINEAR vs INTER_NEAREST).
img_to_array() transforms the data type into float32 rather than uint8 which is obtained by default when loading with OpenCV.
tf.keras.applications.vgg16.preprocess_input(img): this preprocessing can differ from the manual preprocessing written above. It is also worth noting that if you did not preprocess the images this particular way (with preprocess_input()) during training, poor results at prediction time are to be expected, since the preprocessing pipelines differ.
Hope these observations shed some light.
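To illustrate those points, an OpenCV-based loading path that matches the usual Keras pipeline could look roughly like this sketch (the path is a placeholder, and the preprocessing must mirror whatever was actually used during training):
import cv2
import numpy as np
import tensorflow as tf

image = cv2.imread('some_image.jpg')                          # placeholder path; OpenCV loads BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)                # match the RGB order Keras loaders use
image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_LINEAR)
image = image.astype('float32')                               # img_to_array also yields float32
image = np.expand_dims(image, axis=0)
image = tf.keras.applications.vgg16.preprocess_input(image)   # only if this was also used in training
preds = model.predict(image)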

Unable to run model.predict() with image shape same as that which the model was trained on

I am trying to run inference on a ResNet model that I had designed and trained on google Colab, the link to the notebook can be found here. The dimension of the images that the model is trained on is (32, 32, 3). After training, I saved the model in the SavedModel format so that I could run Inference on my machine. The code I used is
import tensorflow as tf
import cv2 as cv
from resize import resize_to_fit
image = cv.imread('extracted_letter_images/001.png')
image_resized = resize_to_fit(image, 32, 32)
model = tf.keras.models.load_model('Model/CAPTCHA-Model')
model.predict(image_resized)
The resize_to_fit method resizes the image to 32x32px. The shape of the returned image is also (32, 32, 3). When the model.predict() function is called, the following error message is shown
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (32, 32, 3)
I have tried uninstalling and reinstalling Tensorflow as well as tf-nightly several times to no avail. I have even tried expanding the dimension of the image with this
image_resized = np.expand_dims(image_resized, axis=0)
This results in the image having dimensions (1, 32, 32, 3). When the above change is made the following error message is shown
2021-04-07 19:49:11.821261: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:180] None of the MLIR Optimization Passes are enabled (registered 2)
What I'm confused about is that the dimensions of the resized image and the dimensions of the images used to train the model are the same, but model.predict() does not seem to work.
In your ImageDataGenerators you used the preprocessing function tf.image.rgb_to_grayscale, which converted the images to 32 x 32 x 1, so you must apply the same transformation to the images you wish to predict. You also rescaled the training images to the range 0 to 1, so you must rescale the prediction images the same way. The code image_resized = np.expand_dims(image_resized, axis=0) is correct. Not sure if this will be an issue, but be aware that cv2 reads images as BGR, not RGB, so before you apply tf.image.rgb_to_grayscale, first convert the image to RGB with image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
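A sketch of that preprocessing for a single image, using plain cv2.resize as a stand-in for the resize_to_fit helper and assuming the training pipeline was exactly rgb_to_grayscale plus a 1/255 rescale:
import cv2
import tensorflow as tf

model = tf.keras.models.load_model('Model/CAPTCHA-Model')

image = cv2.imread('extracted_letter_images/001.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)        # cv2 loads BGR; convert before rgb_to_grayscale
image = cv2.resize(image, (32, 32))
image = tf.image.rgb_to_grayscale(image)              # shape (32, 32, 1), as in the training generator
image = tf.cast(image, tf.float32) / 255.0            # same rescaling as during training
image = tf.expand_dims(image, axis=0)                 # shape (1, 32, 32, 1): batch dimension
pred = model.predict(image)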

how to use pretrained models on low resolution images like tiny imagenet dataset

I downloaded the Tiny ImageNet dataset, which is a subset of the ImageNet dataset, and the size of its images is 64x64 pixels. I want to use models pretrained on the original ImageNet, like AlexNet and VGG, and feed the Tiny ImageNet images as input to the network. Is that possible?
As you may know, the resolution of the images in the original ImageNet is higher than in Tiny ImageNet. Will that cause a problem at inference time?
Thanks for your attention.
Generally, a CNN layer may be used for images of any size. The number of weights in the CNN layer does not depend on the image size but on the number and shapes of kernels. So, for instance:
Conv2D(16, (3, 3), padding="same",input_shape=(None, None, 3))
always has 16 (kernels) * 3 * 3 * 3 (channels) + 16 (biases) = 448 weights.
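A quick sanity check of that count (a minimal sketch):
import tensorflow as tf

layer_check = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), padding="same", input_shape=(None, None, 3))
])
print(layer_check.count_params())  # 448, independent of the (unspecified) image size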
The only problem is that a head of the network is typically a set of Dense layers that have a fixed number of inputs. So, if you just Flatten your network between Conv2D and Dense layers, the size of images must be fixed. But if you put for instance tf.keras.layers.GlobalAveragePooling2D layer, the size of images may be variable as this layer produces output that depends only on the number of kernels and not on the size of images.
If you use versions with heads (include_top parameter):
base_model = tf.keras.applications.VGG16(weights = 'imagenet', include_top = True)
or
base_model = tf.keras.applications.MobileNet(weights = 'imagenet', include_top = True)
you may check with base_model.summary() that they expect images with size (224,224,3).
But if you add include_top=False like here:
base_model = tf.keras.applications.VGG16(weights = 'imagenet', include_top = False)
the expected input_shape of the image is (None, None, 3). Such a network for an image of size (W, H, 3) produces an output of size (W/S, H/S, K) where K is the number of kernels in the last layer and S is the shrinkage factor of the specific network. For instance for VGG16 network S=32 and K=512, so for image of size (224,224,3) the output size is (7,7,512) and for image of size (512,512,3) the output is (16,16,512). Such an output is sometimes called the 'patch'.
So, if you want to build the network that uses some pretrained network and classifies images of any size, you may build it like this:
base_model = tf.keras.applications.ResNet50(weights = 'imagenet', include_top = False)
x = base_model.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(512, activation = 'relu')(x)
...
last_layer = tf.keras.layers.Dense(num_classes, activation = 'softmax')(x)
model = tf.keras.models.Model(inputs = base_model.input, outputs = last_layer)
Such a model may be fed with images of any size and produces a probability vector over num_classes classes. Of course, during training all images within one batch must have the same size, but across batches (and at inference time) you may use any image size.
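As a rough usage sketch (random arrays stand in for real, properly preprocessed images, and the model above is assumed to be completed and compiled), the same model accepts batches of different spatial sizes:
import numpy as np

small_batch = np.random.rand(2, 64, 64, 3).astype("float32")    # Tiny-ImageNet-sized inputs
large_batch = np.random.rand(2, 224, 224, 3).astype("float32")  # original ImageNet-sized inputs

print(model.predict(small_batch).shape)  # (2, num_classes)
print(model.predict(large_batch).shape)  # (2, num_classes)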

Merging same vgg16 model but with different inputs

I am working on a classification problem in a project. The specificity of my problem is that I have to use two different types of data to manage it. My classes are Car, Pedestrian, Truck and Cyclist. My dataset is composed of:
- Images coming from the camera: these are RGB images (example image omitted).
- Images obtained by projecting the Lidar point cloud (just 3D points) onto the 2D camera plane and encoding the pixels using depth and reflectance (example images omitted).
I already managed to use both modalities to perform the classification task by using the Concatenate function of the Keras API.
But what I would like to do is use a more powerful CNN, like VGG. I used a pretrained model and froze all layers except the last 4. I read the grayscale image as RGB because the pretrained VGG16 model needs a 3-channel input. Here is my code:
from keras.applications import VGG16
#Load the VGG model
#Camera Model
vgg_conv_C = VGG16(weights='imagenet', include_top=False, input_shape=(227, 227, 3))
#Depth Model
vgg_conv_D = VGG16(weights='imagenet', include_top=False, input_shape= (227, 227, 3))
for layer in vgg_conv_D.layers[:-4]:
    layer.trainable = False
for layer in vgg_conv_C.layers[:-4]:
    layer.trainable = False

mergedModel = Concatenate()([vgg_conv_C.output, vgg_conv_D.output])
mergedModel = Dense(units = 1024)(mergedModel)
mergedModel = BatchNormalization()(mergedModel)
mergedModel = Activation('relu')(mergedModel)
mergedModel = Dropout(0.5)(mergedModel)
mergedModel = Dense(units = 4, activation = 'softmax')(mergedModel)
fused_model = Model([vgg_conv_C.input, vgg_conv_D.input], mergedModel)
The last line gives the following error:
ValueError: The name "block1_conv1" is used 2 times in the model. All
layer names should be unique.
Does anyone know how to handle this? To keep it simple, I just want to use VGG16 on both types of images, get the feature vector for each modality, concatenate them, and add fully connected layers on top to predict the image's class. It works with non-pretrained models. I can provide the code if needed.
Try this
#Camera Model
vgg_conv_C = VGG16(weights='imagenet', include_top=False, input_shape=(227, 227, 3))
for layer in vgg_conv_C.layers:
    layer.name = layer.name + str('_C')

#Depth Model
vgg_conv_D = VGG16(weights='imagenet', include_top=False, input_shape=(227, 227, 3))
for layer in vgg_conv_D.layers:
    layer.name = layer.name + str('_D')
In this way, you'd still be able to use two identical pre-trained networks.
As mentioned in the error,
ValueError: The name "block1_conv1" is used 2 times in the model. All
layer names should be unique.
Therefore, either use a Siamese network (a single shared backbone applied to both inputs) or, if you use two separate CNNs, remember that layer names within a model must be unique. The simplest fix is to copy the network for the second branch and rename its layers.
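A minimal sketch of the shared-weights (Siamese-style) option, which sidesteps the duplicate-name problem because only one copy of VGG16 exists (layer freezing and preprocessing are omitted for brevity):
from keras.applications import VGG16
from keras.layers import Input, Concatenate, GlobalAveragePooling2D, Dense
from keras.models import Model

input_C = Input(shape=(227, 227, 3))   # camera image
input_D = Input(shape=(227, 227, 3))   # depth/reflectance image read as 3 channels

shared_vgg = VGG16(weights='imagenet', include_top=False, input_shape=(227, 227, 3))
feat_C = GlobalAveragePooling2D()(shared_vgg(input_C))
feat_D = GlobalAveragePooling2D()(shared_vgg(input_D))

merged = Concatenate()([feat_C, feat_D])
output = Dense(4, activation='softmax')(merged)
fused_model = Model([input_C, input_D], output)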
The solution from IStackoverflowAndIKnowThings gives me the error:
AttributeError: Can't set the attribute "name", likely because it conflicts with an existing read-only #property of the object. Please choose a different name.
The following worked for me (see this post):
..
for layer in vgg_conv_C.layers:
    layer._name = layer._name + str('_C')
..

Keras predict_generator and Image generator

How to use ImageDataGenerator and predict_generator on a single JPEG file in Keras?
I have a single JPEG and I want to predict the probability using a model trained with the model.fit_generator function.
If you just have a single .jpeg, you don't need to use the ImageDataGenerator. In the code below I'm assuming you trained your model with RGB images sized 150px x 150px.
from keras.preprocessing import image
import numpy as np

img = image.load_img(img_path, target_size=(150, 150))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)   # add the batch dimension
img_tensor /= 255.                                # assumes you rescaled by 1/255 during training
model.predict(img_tensor)
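Since the question asks for a probability: model.predict returns the class-probability vector, and if you also want the predicted class index you can take the argmax, e.g.:
probs = model.predict(img_tensor)
predicted_class = np.argmax(probs, axis=-1)   # index of the most probable class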
For more info, check out Francois Chollet's excellent IPython notebooks. Specifically, cell In [2] of https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.4-visualizing-what-convnets-learn.ipynb
In this section, he looks at the intermediate activation layers for an image that wasn't in his train_generator. He loads in a model he created in another Ipython notebook: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.2-using-convnets-with-small-datasets.ipynb
