How to accelerate CNN training using fit generator

I have a dataset of 60000 images, which I split into training and validation sets (80/20), and I use ImageDataGenerator to load the images from disk in batches of 32. I am dealing with a multi-label classification task with 6000 classes (labels). To tackle this problem I am using the following CNN in Keras:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_27 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
max_pooling2d_27 (MaxPooling (None, 128, 128, 64) 0
_________________________________________________________________
conv2d_28 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
max_pooling2d_28 (MaxPooling (None, 64, 64, 128) 0
_________________________________________________________________
conv2d_29 (Conv2D) (None, 64, 64, 128) 147584
_________________________________________________________________
max_pooling2d_29 (MaxPooling (None, 32, 32, 128) 0
_________________________________________________________________
flatten_10 (Flatten) (None, 131072) 0
_________________________________________________________________
dense_20 (Dense) (None, 512) 67109376
_________________________________________________________________
dense_21 (Dense) (None, 5216) 2675808
=================================================================
Total params: 70,008,416
Trainable params: 70,008,416
Non-trainable params: 0
_________________________________________________________________
I am using fit_generator to train the model, with steps_per_epoch = total_training_samples/batch_size. However, training for 10 epochs takes far too long (more than a week), even though it is quite a simple model. I tried to simplify the architecture further by reducing the number of layers and neurons, but training still took too long and the results were bad. I know that the last decision layer (with 5216 neurons) is responsible for the huge number of parameters. What else could I change to make the model feasible to train?
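As a sanity check on where the parameters actually sit, the counts in the summary above can be reproduced by hand. The sketch below is plain Python arithmetic. It assumes 3x3 kernels and a 3-channel input (inferred from the 1,792 parameters of the first layer), and the global-average-pooling variant at the end is a hypothetical alternative for comparison, not something the original model uses.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # kernel x kernel x in_ch weights per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units


counts = {
    "conv2d_27": conv2d_params(3, 3, 64),
    "conv2d_28": conv2d_params(3, 64, 128),
    "conv2d_29": conv2d_params(3, 128, 128),
    "dense_20": dense_params(32 * 32 * 128, 512),  # Flatten gives 131072 inputs
    "dense_21": dense_params(512, 5216),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:10s} {n:>12,d}  {100 * n / total:5.1f}%")
print("total", f"{total:,d}")

# Hypothetical variant: global average pooling instead of Flatten leaves
# 128 features, so the first Dense layer would shrink accordingly.
gap_dense = dense_params(128, 512)
print("dense_20 after global average pooling:", f"{gap_dense:,d}")

# 80% of 60000 images in batches of 32
steps_per_epoch = (60000 * 80 // 100) // 32
print("steps_per_epoch:", steps_per_epoch)
```

Under these assumptions the totals match the summary, and the first Dense layer after Flatten (dense_20), rather than the 5216-unit output layer, accounts for roughly 96% of the parameters.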

Related

tensorflow model gives "graph disconnected" error

I am experimenting/fiddling/learning with some small ML problems.
I have a loaded model based on a pre-trained convolution base with some self-trained dense layers (for model details see below).
I wanted to try to apply some visualizations like activations and the Grad CAM Visualization (https://www.statworx.com/de/blog/erklaerbbarkeit-von-deep-learning-modellen-mit-grad-cam/) on the model. But I was not able to do so.
I tried to create a new model based on mine (like in the article) with
grad_model = tf.keras.models.Model(model.inputs,
[model.get_layer('vgg16').output,
model.output])
but this already fails with the error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_5_12:0", shape=(None, None, None, 3), dtype=float32) at layer "block1_conv1". The following previous layers were accessed without issue: []
I do not understand what this means. The model surely works (I can evaluate it and make predictions with it).
The call does not fail if I omit model.get_layer('vgg16').output from the outputs list, but of course this is required for the visualization.
What am I doing wrong?
In a model that I constructed and trained from scratch, I was able to create a similar model with the activations as outputs, but here I get these errors.
My model's details
The model was created with the following code and then trained and saved.
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
conv_base = keras.applications.vgg16.VGG16(
weights="vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5",
include_top=False)
conv_base.trainable = False
data_augmentation = keras.Sequential(
[
layers.experimental.preprocessing.RandomFlip("horizontal"),
layers.experimental.preprocessing.RandomRotation(0.1),
layers.experimental.preprocessing.RandomZoom(0.2),
]
)
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
optimizer="rmsprop",
metrics=["accuracy"])
later it was loaded:
model = keras.models.load_model("myModel.keras")
print(model.summary())
print(model.get_layer('sequential').summary())
print(model.get_layer('vgg16').summary())
output:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 180, 180, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 180, 180, 3) 0
_________________________________________________________________
vgg16 (Functional) (None, None, None, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 12800) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 3277056
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 257
=================================================================
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
_________________________________________________________________
None
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
random_flip (RandomFlip) (None, 180, 180, 3) 0
_________________________________________________________________
random_rotation (RandomRotat (None, 180, 180, 3) 0
_________________________________________________________________
random_zoom (RandomZoom) (None, 180, 180, 3) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
None
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, None, None, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) multiple 1792
_________________________________________________________________
block1_conv2 (Conv2D) multiple 36928
_________________________________________________________________
block1_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block2_conv1 (Conv2D) multiple 73856
_________________________________________________________________
block2_conv2 (Conv2D) multiple 147584
_________________________________________________________________
block2_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block3_conv1 (Conv2D) multiple 295168
_________________________________________________________________
block3_conv2 (Conv2D) multiple 590080
_________________________________________________________________
block3_conv3 (Conv2D) multiple 590080
_________________________________________________________________
block3_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block4_conv1 (Conv2D) multiple 1180160
_________________________________________________________________
block4_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block4_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block5_conv1 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) multiple 0
=================================================================
Total params: 14,714,688
Trainable params: 7,079,424
Non-trainable params: 7,635,264

You can achieve what you want in the following way. First, define your model as follows:
inputs = tf.keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs, training=True)
x = keras.applications.VGG16(input_tensor=x,
include_top=False,
weights=None)
x.trainable = False
x = layers.Flatten()(x.output)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, x)
for i, layer in enumerate(model.layers):
    print(i, layer.name, layer.output_shape, layer.trainable)
...
17 block5_conv2 (None, 11, 11, 512) False
18 block5_conv3 (None, 11, 11, 512) False
19 block5_pool (None, 5, 5, 512) False
20 flatten_2 (None, 12800) True
21 dense_4 (None, 256) True
22 dropout_2 (None, 256) True
23 dense_5 (None, 1) True
Now, build the grad-cam model with desired output layer as follows:
grad_model = keras.models.Model(
[model.inputs],
[model.get_layer('block5_pool').output,
model.output]
)
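Test:
import numpy as np
import tensorflow as tf

image = np.random.rand(1, 180, 180, 3).astype(np.float32)
with tf.GradientTape() as tape:
    # Forward pass: feature maps of block5_pool and the model prediction
    convOutputs, predictions = grad_model(tf.cast(image, tf.float32))
    loss = predictions[:, tf.argmax(predictions[0])]
# Gradient of the selected prediction with respect to the feature maps
grads = tape.gradient(loss, convOutputs)
print(grads)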
tf.Tensor(
[[[[ 9.8454033e-04 3.6991197e-03 ... -1.2012678e-02
-1.7934230e-03 2.2925171e-03]
[ 1.6165405e-03 -1.9513096e-03 ... -2.5789393e-03
1.2443252e-03 -1.3931725e-03]
[-2.0554627e-04 1.2232144e-03 ... 5.2324748e-03
3.1955825e-04 3.4566019e-03]
[ 2.3650150e-03 -2.5699558e-03 ... -2.4103196e-03
5.8940407e-03 5.3285398e-03]
...
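For completeness, here is a minimal sketch of how the two tensors above are usually combined into a Grad-CAM heatmap: average the gradients over the spatial axes to get one weight per channel, take the weighted sum of the feature maps, apply a ReLU and normalize. It uses NumPy with random stand-in arrays of the block5_pool shape (1, 5, 5, 512); the helper name grad_cam_heatmap is made up for this illustration and is not part of the answer above.

```python
import numpy as np


def grad_cam_heatmap(conv_outputs, grads):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    conv_outputs, grads: arrays of shape (1, H, W, C).
    Returns an (H, W) array scaled to the range [0, 1].
    """
    # One weight per channel: mean gradient over the spatial axes
    weights = grads[0].mean(axis=(0, 1))              # shape (C,)
    # Weighted sum of the feature maps over the channel axis
    cam = (conv_outputs[0] * weights).sum(axis=-1)    # shape (H, W)
    # Keep only positive evidence (ReLU), then normalize by the peak value
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Random stand-ins for convOutputs and grads (assumed shapes, not real data)
rng = np.random.default_rng(0)
conv_outputs = rng.random((1, 5, 5, 512)).astype(np.float32)
grads = rng.normal(size=(1, 5, 5, 512)).astype(np.float32)

heatmap = grad_cam_heatmap(conv_outputs, grads)
print(heatmap.shape)
```

With the tensors from the test above, `convOutputs.numpy()` and `grads.numpy()` could be passed in instead of the random arrays. The resulting 5x5 map would then still need to be resized to the 180x180 input before overlaying it on the image.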

How to display the layers of a pretrained model instead of a single entry in model.summary() output?

As the title says, I want to display the layers of a pretrained model instead of a single entry (see the vgg19 (Functional) entry below) in the model.summary() output.
Here is a sample model that is implemented using the Keras Sequential API:
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))
And here is the output of the model.summary() function call:
Model: "sequential_15"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg19 (Functional) (None, 512) 20024384
_________________________________________________________________
flatten_15 (Flatten) (None, 512) 0
_________________________________________________________________
dense_21 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_22 (Dense) (None, 10) 10010
=================================================================
Total params: 20,547,394
Trainable params: 523,010
Non-trainable params: 20,024,384
Edit: Here is the Functional API equivalent of the implemented Sequential API model - the result is the same:
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3), pooling='max', classes=10)
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
x = Flatten()(base_out)
x = Dense(1_000, activation='relu')(x)
m_outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=m_inputs, outputs=m_outputs)

Instead of using the Sequential, I tried using the Functional API i.e. the tf.keras.models.Model class, like,
import tensorflow as tf
base_model = tf.keras.applications.VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
x = tf.keras.layers.Flatten()( base_model.output )
x = tf.keras.layers.Dense(1_000, activation='relu')( x )
outputs = tf.keras.layers.Dense(10, activation='softmax')( x )
model = tf.keras.models.Model( base_model.input , outputs )
model.summary()
The output of the above snippet,
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 8, 8, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 4, 4, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_3 (Dense) (None, 10) 10010
=================================================================
Total params: 15,237,698
Trainable params: 15,237,698
Non-trainable params: 0
_________________________________________________________________

My understanding after going through the docs and running a few tests (with TF 2.5.0) is that when such a model is included in another model, Keras treats it as a "black box". It is neither a simple layer nor a tensor, but an object of the complex type tensorflow.python.keras.engine.functional.Functional.
I reckon this is the underlying reason why you cannot print it in detail as part of the model summary.
Now, if you'd just like to review the pre-trained model, have a sneak peek etc., you can simply run:
base_model.summary()
or after constructing your model (sequential or functional, doesn't matter at this point):
model.layers[i].summary() # i: the index of your pre-trained model
If you need to access the pre-trained model's layers, e.g. to use its weights separately, you can access them this way as well.
If you'd like to print the layers of your model as a whole, then you need to trick Keras into believing the "black box" is no stranger but just yet another KerasTensor. To do that, you can wrap the pre-trained model in another layer (in other words, connect them directly via the Functional API), which was suggested above and has worked fine for me.
x = tf.keras.layers.Flatten()( base_model.output )
I don't know if there is any specific reason that you'd like to pursue the new input route as in...
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
Whenever you place the pre-trained model in the middle of your new model, whether after a new Input layer or by adding it to a Sequential model, its inner layers disappear from the summary output.
Generating a new Input layer or just feeding the pre-trained model's output as input to the current model didn't make any difference for me in this case.
Hope this clarifies the topic a wee bit more, and helps.

This should do what you want to do
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
for layer in base_model.layers:
    layer.trainable = False
    model.add(layer)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))

Is it possible to implement switching between layers based on input in Keras?

Is it possible to implement a neural network with a structure like this in Keras?
The idea is the following:
As input, the model receives an integer i (labeled red) and some other data v (0.12345 in the picture). Next, there are several similar parallel layers. Depending on the value of i, v goes into the ith layer, bypassing the other layers, and the output of that layer then goes into the output layer.
In other words, all layers except the ith are ignored.

I think the easiest solution here, if I have understood your question correctly, is to separate your data sets based on your value of i.
So take your X_train and split it into X_train_1, X_train_2, etc. Likewise with your X_test, split it into X_test_1, X_test_2, etc.
from keras.models import Sequential, Model
from keras.layers import *
from keras.utils import plot_model
Then set up separate models:
model1 = Sequential()
model1.add(Conv2D(32, kernel_size=(3,3), activation="relu", input_shape=(24,24,3)))
model1.add(MaxPooling2D(pool_size=(2,2)))
model1.add(Flatten())
model1.add(Dropout(0.5))
model1.add(Dense(512, activation = "relu"))
model2 = Sequential()
model2.add(Conv2D(32, kernel_size=(3,3), activation="relu", input_shape=(24,24,3)))
model2.add(MaxPooling2D(pool_size=(2,2)))
model2.add(Flatten())
model2.add(Dropout(0.5))
model2.add(Dense(512, activation = "relu"))
You will want to use the functional API to combine them. I have used Concatenate(), other options are shown in the documentation here.
outputs = Concatenate()([model1.output,model2.output])
outputs = Dense(256, activation='relu')(outputs)
outputs = Dropout(.5)(outputs)
outputs = Dense(5, activation='softmax')(outputs)
Now configure your final model, specifying the inputs and outputs:
model = Model(inputs=[model1.inputs, model2.inputs], outputs=outputs)
Checking model.summary(), you can see how each layer is connected:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
conv2d_input (InputLayer) [(None, 24, 24, 3)] 0
__________________________________________________________________________________________________
conv2d_1_input (InputLayer) [(None, 24, 24, 3)] 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 22, 22, 32) 896 conv2d_input[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 22, 22, 32) 896 conv2d_1_input[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 11, 11, 32) 0 conv2d[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 11, 11, 32) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 3872) 0 max_pooling2d[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 3872) 0 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 3872) 0 flatten[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 3872) 0 flatten_1[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 512) 1982976 dropout[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 512) 1982976 dropout_1[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 1024) 0 dense[0][0]
dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 256) 262400 concatenate[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 256) 0 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 5) 1285 dropout_2[0][0]
==================================================================================================
Total params: 4,231,429
Trainable params: 4,231,429
Non-trainable params: 0
__________________________________________________________________________________________________
But it's easier to visualise the model with plot_model(model, to_file='image.png', show_shapes=True):
Then for training the model, you will need to feed in the different inputs, not forgetting your test (or validation) data:
model.fit([X_train_1, X_train_2], y_train, validation_data = ([X_test_1, X_test_2], y_val), ...)
NB: The sub-models (here model1, model2, etc.) don't have to have the same structure. They can have different sized layers, different numbers of layers, and different types of layers. This is also how you can include data sets with different types of features in your model.
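The approach above combines the sub-models rather than selecting one per sample. If the goal is literally that only the ith branch contributes, one common way to express this is to multiply the stacked branch outputs by a one-hot encoding of i and sum over the branch axis. The NumPy sketch below only illustrates that arithmetic. The branch weights, the sizes and the helper names all_branches and switched are made up for the illustration, and this is a swapped-in technique, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_branches, in_dim, out_dim = 3, 4, 2

# Hypothetical parallel branches: one weight matrix and one bias per branch
W = rng.normal(size=(n_branches, in_dim, out_dim))
b = rng.normal(size=(n_branches, out_dim))


def all_branches(v):
    # v: (batch, in_dim) -> outputs of every branch: (batch, n_branches, out_dim)
    return np.tanh(np.einsum("bi,kio->bko", v, W) + b)


def switched(v, i):
    # i: (batch,) integer index of the branch to use for each sample
    one_hot = np.eye(n_branches)[i]                   # (batch, n_branches)
    # Zero out every branch except the selected one, then sum over branches
    return (all_branches(v) * one_hot[:, :, None]).sum(axis=1)


v = rng.normal(size=(5, in_dim))
i = np.array([0, 2, 1, 1, 0])
out = switched(v, i)
print(out.shape)
```

Note that every branch is still evaluated here; the mask only zeroes out the unselected outputs, so a gradient would reach only the weights of the selected branch.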

KERAS: Pretrained a CNN+Dense model. How to freeze CNN weights and substitute Dense with LSTM?

I trained and loaded a CNN+Dense model:
# load model
cnn_model = load_model('my_cnn_model.h5')
cnn_model.summary()
The output is this (I have images dimension 2 X 3600):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 2, 3600, 32) 128
_________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 1800, 32) 3104
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 2, 600, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 2, 600, 64) 6208
_________________________________________________________________
conv2d_4 (Conv2D) (None, 2, 300, 64) 12352
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 100, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 2, 100, 128) 24704
_________________________________________________________________
conv2d_6 (Conv2D) (None, 2, 50, 128) 49280
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 16, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 4195328
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 3) 3075
=================================================================
Total params: 5,343,779
Trainable params: 5,343,779
Non-trainable params: 0
Now, what I want is to keep the weights up to the Flatten layer and replace the Dense layers with LSTM layers, training only the added LSTM part.
This is what I wrote:
# freeze model
base_model = cnn_model(input_shape=(2, 3600, 1))
#base_model = cnn_model
base_model.trainable = False
# Adding the first lstm layer
x = LSTM(1024,activation='relu',return_sequences='True')(base_model.output)
# Adding the second lstm layer
x = LSTM(1024, activation='relu',return_sequences='False')(x)
# Adding the output
output = Dense(3,activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
But I obtained:
base_model = cnn_model(input_shape=(2, 3600, 1))
TypeError: __call__() missing 1 required positional argument: 'inputs'
I know that ideally I have to add TimeDistributed around the Flatten layer, but I do not know how to do it.
Moreover, I'm not sure whether base_model.trainable = False does exactly what I want.
Can you please help me to do the job?
Thank you very much!

You can't feed the output of Flatten() directly into an LSTM: an LSTM expects a sequence for each sample, i.e. a 2-D (timesteps, features) array per sample, so you have to reshape your tensors.
You can take the output of the layer before Flatten (the last max-pooling layer). If this layer has index i in the model, take its output, reshape it as needed, and pass it to the LSTM.
before_flatten = base_model.layers[i].output # i is the index of the layer whose output you want to use
conv2lstm_reshape = Reshape((-1, 2))(before_flatten) # choose the target shape yourself: (timesteps, features)
# Adding the first lstm layer
x = LSTM(1024, activation='relu', return_sequences=True)(conv2lstm_reshape)
# Adding the second lstm layer (return_sequences must be a bool, not the string 'False')
x = LSTM(1024, activation='relu', return_sequences=False)(x)
# Adding the output, applied to the LSTM output x rather than to before_flatten
output = Dense(3, activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
model.summary()
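To make the reshape choice concrete, the sketch below uses NumPy arrays as stand-ins for the max_pooling2d_3 output of shape (None, 2, 16, 128) and shows two possible (timesteps, features) layouts. The batch size of 4 and the second layout (treating the 16 positions along the width axis as timesteps) are assumptions for illustration; which layout suits the data is a modelling decision.

```python
import numpy as np

batch = 4
# Stand-in for the max_pooling2d_3 output: (batch, 2, 16, 128)
features = np.zeros((batch, 2, 16, 128), dtype=np.float32)

# Same layout as Reshape((-1, 2)): 2048 timesteps with 2 features each
as_in_answer = features.reshape(batch, -1, 2)

# Alternative: 16 timesteps along the width axis, 2 * 128 features per step.
# The width axis is moved next to the batch axis before merging the rest.
per_width_step = features.transpose(0, 2, 1, 3).reshape(batch, 16, 2 * 128)

print(as_in_answer.shape)    # (4, 2048, 2)
print(per_width_step.shape)  # (4, 16, 256)
```

An LSTM over 2048 timesteps of 2 features is slow and hard to train, so a layout with fewer, richer timesteps may be the more practical starting point.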

Remove middle layers in the pre-trained VGG16 model in Keras

everyone,
I have a question about how to modify the pre-trained VGG16 network in Keras. I am trying to remove the max-pooling layers at the end of the last three convolutional blocks and add a batch normalization layer after each convolutional layer. At the same time, I want to keep the parameters. This means that the whole modification will include not only removing some middle layers and adding some new layers, but also connecting the modified layers to the remaining layers.
I'm still very new to Keras. The only way I could find is the one shown in
Removing then Inserting a New Middle Layer in a Keras Model
So the codes I edited are as below:
from keras import applications
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers.normalization import BatchNormalization
vgg_model = applications.VGG16(weights='imagenet',
include_top=False,
input_shape=(160, 80, 3))
# Disassemble layers
layers = [l for l in vgg_model.layers]
# Defining new convolutional layer.
# Important: the number of filters should be the same!
# Note: the receptive field of two 3x3 convolutions is 5x5.
layer_dict = dict([(layer.name, layer) for layer in vgg_model.layers])
x = layer_dict['block3_conv3'].output
for i in range(11, len(layers)-5):
    # layers[i].trainable = False
    x = layers[i](x)
for j in range(15, len(layers)-1):
    # layers[j].trainable = False
    x = layers[j](x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Conv2D(filters=128, kernel_size=(1, 1))(x)
x = BatchNormalization()(x)
x = Flatten()(x)
x = Dense(50, activation='softmax')(x)
custom_model = Model(inputs=vgg_model.input, outputs=x)
for layer in custom_model.layers[:16]:
    layer.trainable = False
custom_model.summary()
However, the output shape of the convolutional layers in block 4 and block 5 are multiple. I tried to correct it by adding a layer MaxPool2D(batch_size=(1,1), stride=none), but the output shape is still multiple. Just like this:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 160, 80, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 160, 80, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 160, 80, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 80, 40, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 80, 40, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 80, 40, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 40, 20, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 40, 20, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 40, 20, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 40, 20, 256) 590080
_________________________________________________________________
block4_conv1 (Conv2D) multiple 1180160
_________________________________________________________________
block4_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block4_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv1 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
conv2d_1 (Conv2D) (None, 40, 20, 128) 65664
_________________________________________________________________
batch_normalization_1 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
conv2d_2 (Conv2D) (None, 40, 20, 128) 16512
_________________________________________________________________
batch_normalization_2 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
conv2d_3 (Conv2D) (None, 40, 20, 128) 16512
_________________________________________________________________
batch_normalization_3 (Batch (None, 40, 20, 128) 512
_________________________________________________________________
flatten_1 (Flatten) (None, 102400) 0
_________________________________________________________________
dense_1 (Dense) (None, 50) 5120050
=================================================================
Total params: 19,934,962
Trainable params: 5,219,506
Non-trainable params: 14,715,456
_________________________________________________________________
Can anyone provide some suggestions about how to reach my goal?
Thanks very much.

The output shape is shown as "multiple" because these layers were called twice, so each of them has two output shapes.
You can see here that if calling layer.output_shape raises an AttributeError, the printed output shape is 'multiple'.
If you call custom_model.layers[10].output_shape, you will get this error:
AttributeError: The layer "block4_conv1 has multiple inbound nodes, with different output shapes. Hence the notion of "output shape" is ill-defined for the layer. Use `get_output_shape_at(node_index)` instead.
And if you then call custom_model.layers[10].get_output_shape_at(0), you will get the output shape corresponding to the initial network, and for custom_model.layers[10].get_output_shape_at(1), you will get the output shape that you are expecting.
Let me just say that I doubt this modification will do what you intend: if you remove the MaxPooling layer and apply the next layer (number 11) to the output that came before the MaxPooling layer, the learnt filters are "expecting" an image with half the resolution, so they probably won't work.
Imagine that one filter is "looking" for eyes and that eyes are usually 10 pixels wide: you will now need a 20-pixel-wide eye to trigger the same activation in the layer.
My example is obviously oversimplified and not accurate, but it shows that the original idea is flawed. You should either retrain the top of the model, keep the MaxPooling layers, or define a brand new model on top of layer block3_conv3.
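The resolution argument can be illustrated with a few lines of NumPy: a 2x2 max pooling halves the width of a feature, so layers trained after the pooling see features at half the size. The helper max_pool_2x2 and the 40x40 toy image are made up for this sketch and only mimic what a MaxPooling2D layer with a 2x2 window does.

```python
import numpy as np


def max_pool_2x2(img):
    # Non-overlapping 2x2 max pooling on an (H, W) array with even H and W
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


img = np.zeros((40, 40), dtype=np.float32)
img[10:30, 10:30] = 1.0               # a 20-pixel-wide "eye"

pooled = max_pool_2x2(img)

width_before = int(img[20].sum())     # width of the feature in the input
width_after = int(pooled[10].sum())   # width of the same feature after pooling
print(width_before, width_after)      # 20 10
```

Removing the pooling layer means the following filters receive the 20-pixel version of a pattern that they learnt at 10 pixels.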
