Conv2d wrong dimensions on Keras - python-3.x

I'm new to Keras and I'm trying to use convolutional autoencoders for image compression.
In particular I'm compressing images which are all of dimensions (365,929). As I'm working with numpy 2D arrays for the images, I add a dimension to make them tensors.
When feeding the network with the images with this code:
X,X_test=train_test_split(images,test_size=0.1)
# Adds 1D to each matrix, so to have a tensor.
X=np.array([np.expand_dims(i,axis=2) for i in X])
# X is (1036, 365, 929, 1) now
X_test=np.array([np.expand_dims(i,axis=2) for i in X_test])
inputs = Input(shape=(365, 929, 1))
h = Conv2D(4,(3,3),activation='relu',padding="same")(inputs)
encoded = MaxPooling2D(pool_size=2,padding="same")(h)
h = Conv2D(4,(3,3),activation='relu',padding="same")(encoded)
h = UpSampling2D((2,2))(h)
outputs = Conv2D(1,(3,3),activation='relu',padding="same")(h)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mse')
model.fit(X, X, batch_size=64, epochs=5, validation_split=.33)
I get the following error:
ValueError: Error when checking target: expected conv2d_3 to have shape (366, 930, 1) but got array with shape (365, 929, 1)
How can I solve this issue? How can I modify the CNN to take images with uneven dimensions?

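Your problem lies in UpSampling2D. MaxPooling2D with padding="same" rounds the odd dimensions up (365 to 183, 929 to 465), so upsampling by 2 produces 366 x 930 instead of 365 x 929. You can pad the image asymmetrically with zeros so that both dimensions are even, and then crop the output back to its original size.
To help with debugging, you can call model.summary() to check the output dimensions of every layer.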
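The size mismatch can be reproduced with plain arithmetic, without building the model. The sketch below assumes a single 2x2 pooling step with padding="same" followed by a 2x upsampling step, as in the question; the helper names are made up for illustration.

```python
import math


def pool_same(size, pool=2):
    # Pooling with padding="same" rounds up: ceil(size / pool).
    return math.ceil(size / pool)


def upsample(size, factor=2):
    # Upsampling repeats every row/column `factor` times.
    return size * factor


def trace_autoencoder(height, width):
    """Follow (height, width) through one pool -> upsample round trip."""
    pooled = (pool_same(height), pool_same(width))
    restored = (upsample(pooled[0]), upsample(pooled[1]))
    # Rows/columns to pad before the encoder and crop after the decoder.
    extra = (restored[0] - height, restored[1] - width)
    return pooled, restored, extra


pooled, restored, extra = trace_autoencoder(365, 929)
print(pooled)    # (183, 465)
print(restored)  # (366, 930)
print(extra)     # (1, 1)
```

Under these assumptions one extra row and one extra column are needed, which in Keras would correspond to something like ZeroPadding2D(((0, 1), (0, 1))) right after the input and Cropping2D(((0, 1), (0, 1))) after the last convolution.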

Related

why do we reshape grayscale image to (x,y,1)?

I noticed that when training a CNN on grayscale images, each image is reshaped to (x, y, 1). I thought this shouldn't be necessary, but when I try shape (x, y) I get an error:
ValueError: Input 0 of layer conv2d is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 28, 28]
As I understand it, the only reason we do this is that Keras is implemented this way. Or is there another reason?
The input shape of a Conv2D layer in Keras is (batch_size, rows, cols, channels). The layer therefore expects the number of channels as the last dimension, which is 1 for a grayscale image and 3 for an RGB image.
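As a small illustration, assuming a batch of ten 28 x 28 grayscale images stored in a NumPy array (the array contents are placeholders), adding the channel axis turns the 3-dimensional batch into the 4-dimensional one that Conv2D asks for:

```python
import numpy as np

# Ten grayscale images without a channel axis: (batch, rows, cols).
batch = np.zeros((10, 28, 28))

# Append a channel axis of size 1: (batch, rows, cols, channels).
with_channel = np.expand_dims(batch, axis=-1)

print(batch.ndim, with_channel.ndim)  # 3 4
print(with_channel.shape)             # (10, 28, 28, 1)
```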

How to solve "logits and labels must have the same first dimension" error

I'm trying out different Neural Network architectures for a word based NLP.
So far I've used bidirectional-, embedded- and models with GRU's guided by this tutorial: https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571 and it all worked out well.
When I tried using LSTM's however, I get an error saying:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
How can I solve this?
My source and target dataset consists of 7200 sample sentences. They are integer tokenized and embedded. The source dataset is post padded to match the length of the target dataset.
Here is my model and the relevant code:
lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=X.shape[1], input_shape=X.shape[1:]))
lstm_model.add(LSTM(128, return_sequences=False, dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add((Dense(target_vocab_size, activation='softmax')))
lstm_model.compile(optimizer=Adam(0.002), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = lstm_model.fit(X, Y, batch_size = 32, callbacks=CALLBACK, epochs = 100, validation_split = 0.25) #At this line the error is raised!
With the shapes:
X.shape = (7200, 147)
Y.shape = (7200, 147, 1)
src_vocab_size = 188
target_vocab_size = 186
I've looked at similar question on here already and tried adding a Reshape layer
simple_lstm_model.add(Reshape((-1,)))
but this only causes the following error:
"TypeError: __int__ returned non-int (type NoneType)"
It's really weird as I preprocess the dataset the same way for all models and it works just fine except for the above.
You should set return_sequences=True (and leave return_state=False) when constructing the LSTM layer.
In your snippet, the LSTM returns only its last output instead of the sequence of outputs for every input embedding. In theory, you could have spotted this from the error message:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
The logits should be three-dimensional: batch size × sequence length × number of classes. The sequence length is 147, and indeed 32 × 147 = 4704 (the number of your labels). This tells you that the sequence-length dimension has disappeared from the logits.
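The numbers in the error message can be checked with a few lines of shape bookkeeping. This is only a sketch: the two helper functions are hypothetical and model nothing more than how return_sequences changes the output shape and how many labels one batch contributes.

```python
def lstm_head_shape(batch, timesteps, n_classes, return_sequences):
    """Shape of the softmax output that follows the LSTM layer."""
    if return_sequences:
        # One prediction per timestep.
        return (batch, timesteps, n_classes)
    # Only the last timestep survives.
    return (batch, n_classes)


def n_sparse_labels(batch, timesteps):
    # Labels of shape (batch, timesteps, 1) flatten to batch * timesteps values.
    return batch * timesteps


batch, timesteps, n_classes = 32, 147, 186
print(lstm_head_shape(batch, timesteps, n_classes, False))  # (32, 186)
print(lstm_head_shape(batch, timesteps, n_classes, True))   # (32, 147, 186)
print(n_sparse_labels(batch, timesteps))                     # 4704
```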

Struggle with LSTM and RNN using Keras

I'm working on a speech recognition problem running on Colab using LSTM. The audio files were converted into spectrograms and then normalized. There are 6840 spectrograms in total and the shape of each one is (288, 864, 4).
I already tried a few examples with RNN and CNN and they worked, but when I try an example using a LSTM I get shape errors, every time either there is one more or one less dimension than expected. Here are some of these cases :
rnn = keras.Sequential()
rnn.add(keras.layers.SimpleRNN(500, input_shape = (864, 4)))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.LSTM(500, return_sequences = True))
rnn.add(keras.layers.Dropout(0.2))
rnn.add(keras.layers.Dense(212, activation = 'softmax'))
rnn.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',metrics = ['accuracy'])
rnn.fit(X_train, y_train, epochs = 5, validation_data=(X_test, y_test))
scores = rnn.evaluate(X_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', '%.2f' % (scores[1] * 100), '%')
The following error is raised on the first LSTM layer : ValueError: Input 0 of layer lstm_54 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 500]
If I remove the SimpleRNN line and feed the input directly to the first LSTM like this
rnn.add(keras.layers.LSTM(500, return_sequences = True, input_shape = (288, 864, 4)))
I get : ValueError: Input 0 of layer lstm_56 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 288, 864, 4]
I tried reshaping the images to (4, 288 * 864) and got the same error when trying to use the RNN layer, but with just the LSTM I got InvalidArgumentError: Incompatible shapes: [32] vs. [32,4].
No idea where the 32 came from, though.
One last thing, not really an issue but more of a request, is there any library that can resize images the simple way? 288x864 is too big for Colab, so I'll have to do it eventually to be able to load all 6840 images and feed it to the neural network. Right now I'm just using 100 samples to test.
Feel free to leave suggestions about other methods, cabalistic number of nodes/layers or anything like that.
An LSTM expects 3-dimensional input [n_samples, n_timesteps, n_features], so your first layer also needs return_sequences enabled:
rnn.add(keras.layers.SimpleRNN(500, return_sequences = True, input_shape = (864, 4)))
Next, the shapes after the Dense layer will not line up with your targets, because the last LSTM still returns the whole sequence, so remove return_sequences from the last LSTM layer:
rnn.add(keras.layers.LSTM(500))
If you still want to keep return_sequences = True on the last LSTM layer, you might want to wrap the Dense layer in a TimeDistributed layer.
I tried it on the following input and it seems to work:
X_train = np.random.rand(100, 864, 4)
y_train = np.random.rand(100, 1)
For resizing images, the PIL module from the Pillow package has plenty of image manipulation methods.
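To see why the original stack fails and the corrected one works, the per-layer shapes can be traced by hand. The function below is a hypothetical helper, not part of Keras; it assumes each recurrent layer needs a (timesteps, features) input and returns (timesteps, units) only when return_sequences is enabled.

```python
def trace_recurrent_stack(input_shape, layers):
    """Trace output shapes (batch axis omitted) through recurrent layers.

    input_shape: (timesteps, features)
    layers: list of (units, return_sequences) pairs
    """
    shape = tuple(input_shape)
    shapes = []
    for index, (units, return_sequences) in enumerate(layers):
        if len(shape) != 2:
            raise ValueError(
                f"layer {index} expects (timesteps, features), got {shape}"
            )
        timesteps = shape[0]
        shape = (timesteps, units) if return_sequences else (units,)
        shapes.append(shape)
    return shapes


# Corrected stack: every layer but the last returns the full sequence.
fixed = trace_recurrent_stack(
    (864, 4), [(500, True), (500, True), (500, True), (500, False)]
)
print(fixed[-1])  # (500,)

# Original stack: the SimpleRNN drops the time axis, so the next layer fails.
try:
    trace_recurrent_stack((864, 4), [(500, False), (500, True)])
except ValueError as err:
    print(err)
```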

Resnet with Custom Data

I am trying to modify Resnet50 with my custom data as follows:
X = [[1.85, 0.460,... -0.606] ... [0.229, 0.543,... 1.342]]
y = [2, 4, 0, ... 4, 2, 2]
X is a feature vector of length 2000 for 784 images. y is an array of size 784 containing the binary representation of labels.
Here is the code:
def __classifyRenet(self, X, y):
    image_input = Input(shape=(2000,1))
    num_classes = 5
    model = ResNet50(weights='imagenet',include_top=False)
    model.summary()
    last_layer = model.output
    # add a global spatial average pooling layer
    x = GlobalAveragePooling2D()(last_layer)
    # add fully-connected & dropout layers
    x = Dense(512, activation='relu',name='fc-1')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu',name='fc-2')(x)
    x = Dropout(0.5)(x)
    # a softmax layer for 5 classes
    out = Dense(num_classes, activation='softmax',name='output_layer')(x)
    # this is the model we will train
    custom_resnet_model2 = Model(inputs=model.input, outputs=out)
    custom_resnet_model2.summary()
    for layer in custom_resnet_model2.layers[:-6]:
        layer.trainable = False
    custom_resnet_model2.layers[-1].trainable
    custom_resnet_model2.compile(loss='categorical_crossentropy',
                                 optimizer='adam',metrics=['accuracy'])
    clf = custom_resnet_model2.fit(X, y,
                                   batch_size=32, epochs=32, verbose=1,
                                   validation_data=(X, y))
    return clf
I am calling to function as:
clf = self.__classifyRenet(X_train, y_train)
It is giving an error:
ValueError: Error when checking input: expected input_24 to have 4 dimensions, but got array with shape (785, 2000)
Please help. Thank you!
1. First, understand the error.
Your input does not match the input of ResNet. For ResNet50 the input should be (n_samples, 224, 224, 3), but you have (785, 2000). From your question, you have 784 images, each as an array of size 2000, which doesn't align with the original ResNet50 input shape of (224 x 224) no matter how you reshape it. That means you cannot use ResNet50 directly with your data. The only thing your code does is take the last layer of ResNet50 and add your own output layers to match the number of output classes.
2. Then, what you can do.
If you insist on using the ResNet architecture, you will need to change the input layer rather than only the output layer. You will also need to reshape your image data so that the convolution layers can use it. That means it cannot be a (2000,) array; it needs to be something like (height, width, channel), just as ResNet and other architectures expect. Of course, you will still need to change the output layer, as you did, so that you are predicting your own classes. Try something like:
model = ResNet50(input_shape=image_input_shape, include_top=False, weights='imagenet')
This way, you can specify a customized input image shape, where image_input_shape is a (height, width, 3) tuple. You can check the GitHub code for more information (https://github.com/keras-team/keras/blob/master/keras/applications/resnet50.py). Here's part of the docstring:
input_shape: optional shape tuple, only to be specified
if `include_top` is False (otherwise the input shape
has to be `(224, 224, 3)` (with `channels_last` data format)
or `(3, 224, 224)` (with `channels_first` data format).
It should have exactly 3 inputs channels,
and width and height should be no smaller than 197.
E.g. `(200, 200, 3)` would be one valid value.

1D Convolutional network with keras, error on input size

I'm trying to build a convolutional neural network for my dataset. My training dataset has 1209 examples of 800 features each.
Here's what part of the code looks like :
model = Sequential()
model.add(Conv1D(64, 3, activation='linear', input_shape=(1209, 800)))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='linear'))
model.compile(loss=loss_type, optimizer=optimizer_type, metrics=[metrics_type])
model.fit(X, Y, validation_data=(X2,Y2),epochs = nb_epochs,
batch_size = batch_size,shuffle=True)
When I compile this code, I get the following error :
Error when checking input: expected conv1d_25_input to have 3 dimensions,
but got array with shape (1209, 800)
So I add a dimension, here's what I do :
X = np.expand_dims(X, axis=0)
X2 = np.expand_dims(X2, axis=0)
And then I get this error :
ValueError: Input arrays should have the same number of samples as target arrays.
Found 1 input samples and 1209 target samples.
My training data has now a shape like this (1, 1209, 800), should it be something else ?
Thanks a lot for reading this.
Instead of expanding the dimensions on X at axis 0, you should expand on axis 2. Thus, rather than X = np.expand_dims(X, axis=0), you need X = np.expand_dims(X, axis=2).
Afterwards, the shape of X should be (1209, 800, 1), and you should then specify input_shape=(800, 1) in your first layer.
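The difference between the two axes is easy to check on a placeholder array of the same size as the training set. This is a sketch with zeros standing in for the real data; the final line assumes the Conv1D layer from the question, with kernel size 3 and the default "valid" padding.

```python
import numpy as np

# Placeholder for the training set: 1209 examples of 800 features each.
X = np.zeros((1209, 800))

wrong = np.expand_dims(X, axis=0)  # one sample with 1209 "timesteps"
right = np.expand_dims(X, axis=2)  # 1209 samples, 800 steps, 1 channel

print(wrong.shape)  # (1, 1209, 800)
print(right.shape)  # (1209, 800, 1)

# The per-sample shape is what goes into input_shape.
input_shape = right.shape[1:]
print(input_shape)  # (800, 1)

# Steps left after a kernel of size 3 with "valid" padding.
conv_steps = input_shape[0] - 3 + 1
print(conv_steps)   # 798
```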
