Wrong input shape to neural network layer

I am trying to classify handwritten digits using the MNIST dataset to train my model. The model trained successfully and hit an accuracy of 98.9%. But when I try to input a custom image, it shows me the following error:
Error when checking : expected conv2d_4_input to have shape (None, 28, 28, 1) but got array with shape (1, 1, 28, 28)
This is the first convolutional layer, i.e. the input layer.
What can I do to resolve this issue?
This is my convolutional neural network:
conv_model = Sequential()
conv_model.add(Conv2D(filters, kernel_size[0], input_shape=(28, 28, 1)))
conv_model.add(Activation(act))
conv_model.add(Conv2D(filters, kernel_size[0]))
conv_model.add(Activation(act))
conv_model.add(MaxPool2D(pool_size=(2,2)))
conv_model.add(Dropout(0.25))
conv_model.add(Flatten())
conv_model.add(Dense(128))
conv_model.add(Activation(act))
conv_model.add(Dropout(0.5))
conv_model.add(Dense(10))
conv_model.add(Activation('softmax'))
#conv_model.summary()
Compilation details:
conv_model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
COMPLETE SOURCE CODE:
https://github.com/tanmay-edgelord/HandwrittenDigitRecognition
The image: (the custom digit image is not reproduced here)
If any further details are required, please comment.

The error message is pretty straightforward:
Your first layer is expecting data with shape (None, 28, 28, 1), where "None" can be any number (it's the batch size, how many examples you have).
Your data on the other hand has shape (1, 1, 28, 28).
This seems to be a common confusion: Keras puts the channels in the last dimension, while your data has the channels first.
Solution:
Just reshape your data to the correct format: (1, 28, 28, 1).
But are you trying to give that entire image to the model? If so, it won't work very well; the model expects individual 28 x 28 pixel images.
You will have to separate each digit into its own 28 x 28 image. You must also take into account the possibility of your image being inverted in terms of what is black and what is white: MNIST data usually has a black background (0 values) with a white digit (1 values).
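For concreteness, a minimal sketch of the reshape (assuming roi is the preprocessed image as a NumPy array in channels-first order):
import numpy as np

# Hypothetical channels-first input: (batch, channels, height, width)
roi = np.zeros((1, 1, 28, 28), dtype=np.float32)

# Move the channel axis to the end, giving the channels-last layout Keras expects:
roi2 = np.moveaxis(roi, 1, -1)       # shape (1, 28, 28, 1)
# Since the channel dimension is 1, a plain reshape produces the same result:
roi2 = roi.reshape(1, 28, 28, 1)

pred = conv_model.predict(roi2)      # one softmax vector of 10 class probabilities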

The problem was solved by passing the image to reshape with the correct input shape:
roi2 = roi.reshape(1,28,28,1)

Related

Unable to run model.predict() with image shape same as that which the model was trained on

I am trying to run inference on a ResNet model that I designed and trained on Google Colab; the link to the notebook can be found here. The model was trained on images of dimension (32, 32, 3). After training, I saved the model in the SavedModel format so that I could run inference on my machine. The code I used is:
import tensorflow as tf
import cv2 as cv
from resize import resize_to_fit
image = cv.imread('extracted_letter_images/001.png')
image_resized = resize_to_fit(image, 32, 32)
model = tf.keras.models.load_model('Model/CAPTCHA-Model')
model.predict(image_resized)
The resize_to_fit method resizes the image to 32 x 32 px. The shape of the returned image is also (32, 32, 3). When model.predict() is called, the following error message is shown:
ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (32, 32, 3)
I have tried uninstalling and reinstalling TensorFlow as well as tf-nightly several times, to no avail. I have even tried expanding the dimensions of the image with this:
image_resized = np.expand_dims(image_resized, axis=0)
This results in the image having dimensions (1, 32, 32, 3). When the above change is made, only the following message is shown:
2021-04-07 19:49:11.821261: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:180] None of the MLIR Optimization Passes are enabled (registered 2)
What I'm confused about is that the dimensions of the resized image and the dimensions of the images used to train the model are the same, but model.predict() does not seem to work.
In your ImageDataGenerators you used the preprocessing function tf.image.rgb_to_grayscale. This converted the images to 32 x 32 x 1, so you must apply the same transformation to the images you wish to predict. You also rescaled the training images to the range 0 to 1, so you must rescale the prediction images as well. The line image_resized = np.expand_dims(image_resized, axis=0) is correct. Not sure if this will be an issue, but be aware that cv2 reads images as BGR, not RGB, so before you apply tf.image.rgb_to_grayscale, first convert the image to RGB with image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).
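Putting that together, a sketch of an inference pipeline that mirrors the training-time preprocessing (the paths and the resize_to_fit helper come from the question; the exact rescaling should match whatever the ImageDataGenerator used):
import cv2 as cv
import tensorflow as tf
from resize import resize_to_fit  # the asker's own helper

model = tf.keras.models.load_model('Model/CAPTCHA-Model')

image = cv.imread('extracted_letter_images/001.png')  # cv2 reads BGR
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)          # convert to RGB first
image = resize_to_fit(image, 32, 32)                  # (32, 32, 3)
image = tf.cast(image, tf.float32) / 255.0            # rescale to [0, 1], as in training
image = tf.image.rgb_to_grayscale(image)              # (32, 32, 1), as in training
image = tf.expand_dims(image, axis=0)                 # add batch dim -> (1, 32, 32, 1)

probs = model.predict(image)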

Many to Many LSTM network

I'm building a many-to-many network in Keras, using an LSTM. I have sequences of varying length (labels always have the same length as the sequence they describe). To handle the varying length, and after searching other SO posts, I found padding + masking to be the best solution.
This is my model: (model definition not reproduced here)
So I have n (874) samples of max_len (24) padded sequences with 25 features each. But how do I handle my labels? Do I pad them too?
If I pad them in the same way as my X (with the same special value), I get this:
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
All is fine except I get the following error :
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 24 for '{{node binary_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast_1)' with input shapes: [1,24].
Searching this error leads to posts about removing return_sequences=True from my LSTM layer, except I don't want that, since each of my timesteps is labelled...
And if I don't pad them, they can't be converted to a tensor to be used by TensorFlow.
Edit:
Explanatory illustration of the architecture I want to achieve, courtesy of this answer: https://stackoverflow.com/a/52092176/7732923
Problem found:
X_train shape : (873, 24, 25)
y_train shape : (873, 24)
y_train contained 873 samples of length 24, with one label for each timestep, as I said. But, probably to allow for multi-label classification, each timestep's label must be contained in a list, so the right shape for y_train is:
y_train shape : (873, 24, 1)
So it was just a matter of wrapping each label in [] during preprocessing. The architecture is sound and works (how well it works is another beast, ahah).
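In code, the fix is a single expand_dims; below is a sketch of the whole setup (the layer sizes and padding value are assumptions, since the original model definition is not reproduced above):
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, TimeDistributed, Dense

special_value = -10.0                          # hypothetical padding value
y_train = np.expand_dims(y_train, axis=-1)     # (873, 24) -> (873, 24, 1)

model = Sequential()
model.add(Masking(mask_value=special_value, input_shape=(24, 25)))
model.add(LSTM(64, return_sequences=True))     # one output per timestep
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, y_train)                    # X: (873, 24, 25), y: (873, 24, 1)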

expected dense_20 to have shape (None, 18827) but got array with shape (316491, 1)

I am having some difficulty properly formatting my labels for Keras (TensorFlow backend). My model takes an embedding (a list of 128 numbers) as input and outputs one of 18827 distinct numbers (ranging from 1 to 20284), like so:
[0.0344733819366,...,0.153029859066] -> 11516
My training data consists of 316491 embedding-number pairs, so when I tried using keras.utils.to_categorical(training_out, num_classes=20284) to convert the number labels to one-hot vectors for categorical_crossentropy, I received a MemoryError. It seems that sparse_categorical_crossentropy would resolve this issue, since it only needs a single number instead of a large vector as each label. However, I am not sure how to format my labels correctly for this. Currently my model is:
self.brain = Sequential()
self.brain.add(Dense(1000, input_dim=128))
self.brain.add(Dense(20284, activation='softmax'))
self.brain.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
When I try to fit the model I get one of the following errors, depending on how I format the labels:
ValueError: Error when checking target: expected dense_22 to have shape (None, 18827) but got array with shape (1, 316491)
or
ValueError: Error when checking target: expected dense_20 to have shape (None, 18827) but got array with shape (316491, 1)
18827 is the number of distinct labels I have, but I don't think I specified that number anywhere in my code, so I don't know how or why it is the expected dimension for the labels, especially since each label isn't a vector.
I am unsure whether I correctly understand sparse_categorical_crossentropy, and, if I do, how to use it properly.
Answer (from the comments):
From the Keras documentation https://keras.io/losses/
Note: when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample). In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical.
In your case, you are not able to one-hot encode because of memory issues, so you need to use sparse_categorical_crossentropy instead.
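A sketch of that change (training_in, the epoch count, and the batch size are placeholders; the labels are shifted to the 0-based range the loss expects):
import numpy as np

# Integer class ids instead of one-hot vectors; the labels run from 1 to 20284,
# so shift them down by one for sparse_categorical_crossentropy.
labels = np.asarray(training_out) - 1          # shape (316491,)

self.brain.compile(optimizer='adadelta',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
self.brain.fit(training_in, labels, epochs=10, batch_size=128)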

Keras Input automatically adds an extra dimension

I am new to Keras, and searching has turned up nothing. Please save me!
Here is my problem:
print(input_shape)
X_input = Input(input_shape)
print(X_input)
results in:
(600, 64, 64, 3)
Tensor("input_5:0", shape=(?, 600, 64, 64, 3), dtype=float32)
It automatically adds one dimension, and I get the error:
ValueError: Input 0 is incompatible with layer conv0: expected ndim=4, found ndim=5
Your problem lies in the fact that in Keras the first dimension is the number of samples (I guess it's 600 in your case), and it is skipped when defining an input shape. So try:
X_input = Input(input_shape[1:])
in order to skip the sample dimension.
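A sketch of the difference, using the shape from the question:
from keras.layers import Input

input_shape = (600, 64, 64, 3)    # (samples, height, width, channels)

bad = Input(input_shape)          # Keras prepends a batch dim -> (?, 600, 64, 64, 3)
good = Input(input_shape[1:])     # per-sample shape only      -> (?, 64, 64, 3)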

How to correctly get layer weights from Conv2D in keras?

I have a Conv2D layer defined as:
Conv2D(96, kernel_size=(5, 5),
       activation='relu',
       input_shape=(image_rows, image_cols, 1),
       kernel_initializer=initializers.glorot_normal(seed),
       bias_initializer=initializers.glorot_uniform(seed),
       padding='same',
       name='conv_1')
This is the first layer in my network.
Input dimensions are 64 by 160, and the images have 1 channel.
I am trying to visualize weights from this convolutional layer but not sure how to get them.
Here is how I am doing this now:
1. Call
layer.get_weights()[0]
This returns an array of shape (5, 5, 1, 96). The 1 is because the images are 1-channel.
2. Take the 5 by 5 filters with
layer.get_weights()[0][:,:,:,j][:,:,0]
This is very ugly, but I am not sure how to simplify it; any comments are much appreciated.
I am also not sure about these 5 by 5 squares: are they actually filters?
If not, could anyone please tell me how to correctly grab the filters from the model?
I tried to display the weights like so, but only the first 25. I have the same question that you do: is this the filter, or something else? It doesn't seem to be the same kind of filter that is derived from deep belief networks or stacked RBMs.
Here are the untrained weights, visualized:
and here are the trained weights:
Strangely, there is no change after training! If you compare them, they are identical.
And then the DBN/RBM filters, layer 1 on top and layer 2 on the bottom:
If I set kernel_initializer="ones" then I get filters that look good, but the net loss never decreases, even after many trial-and-error changes:
Here is the code to display the 2D conv weights/filters.
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, Activation

ann = Sequential()
x = Conv2D(filters=64, kernel_size=(5, 5), input_shape=(32, 32, 3))
ann.add(x)
ann.add(Activation("relu"))
...
# Untrained weights: channel 0 of each of the 64 5x5 kernels.
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()

ann.fit(Xtrain, ytrain_indicator, epochs=5, batch_size=32)

# The same weights after training, for comparison.
x1w = x.get_weights()[0][:, :, 0, :]
for i in range(1, 26):
    plt.subplot(5, 5, i)
    plt.imshow(x1w[:, :, i], interpolation="nearest", cmap="gray")
plt.show()
---------------------------UPDATE------------------------
So I tried it again with a learning rate of 0.01 instead of 1e-6, and normalized the images between 0 and 1 instead of 0 and 255 by dividing them by 255.0. Now the convolution filters are changing, and the output of the first convolutional filter looks like this:
You'll notice the trained filter has changed (though not by much) with a reasonable learning rate:
Here is image seven of the CIFAR-10 test set:
And here is the output of the first convolution layer:
And if I take the last convolution layer (no dense layers in between) and feed its output to an untrained classifier, it is similar to classifying raw images in terms of accuracy; but if I train the convolution layers, the output of the last convolution layer increases the accuracy of the classifier (a random forest).
So I would conclude the convolution layers are indeed filters as well as weights.
In layer.get_weights()[0][:,:,:,:], the dimensions in [:,:,:,:] are, respectively: the x position of the weight, the y position of the weight, the n-th input channel to the corresponding conv layer (coming from the previous layer; note that if you get the weights of the first conv layer, this number is 1, because only one channel is driven into it), and the k-th filter (kernel) in the layer. So the array shape returned by layer.get_weights()[0] can be read as: a single input channel is driven into the layer, and 96 filters of size 5x5 are generated. If you want to reach one of the filters, say the 6th:
print(layer.get_weights()[0][:,:,:,6].squeeze())
However, if you need the filters of the 2nd conv layer (see the model diagram in the original answer), then notice that for each of the 32 input channels you will have 64 filters. If you want the weights of any of them, for example the weights of the 4th filter generated for the 8th input channel, you should type:
print(layer.get_weights()[0][:,:,8,4].squeeze())
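A quick sketch to sanity-check this indexing (assuming layer is the conv_1 layer from the question):
w = layer.get_weights()[0]
print(w.shape)                           # (5, 5, 1, 96): (kernel_h, kernel_w, in_channels, filters)

sixth_filter = w[:, :, :, 6].squeeze()   # one 5x5 kernel
print(sixth_filter.shape)                # (5, 5)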
