Multi-class semantic segmentation difficulty with RGB images - keras

Could anyone help me with this multi-class semantic segmentation problem? I have modified some code to accept RGB images and RGB labels as masks. I am using the following model:
Layer (type) Output Shape Param # Connected to
img (InputLayer) (None, 128, 128, 3) 0
conv2d_170 (Conv2D) (None, 128, 128, 16) 448 img[0][0]
batch_normalization_163 (BatchN (None, 128, 128, 16) 64 conv2d_170[0][0]
activation_163 (Activation) (None, 128, 128, 16) 0 batch_normalization_163[0][0]
conv2d_171 (Conv2D) (None, 128, 128, 16) 2320 activation_163[0][0]
batch_normalization_164 (BatchN (None, 128, 128, 16) 64 conv2d_171[0][0]
activation_164 (Activation) (None, 128, 128, 16) 0 batch_normalization_164[0][0]
max_pooling2d_37 (MaxPooling2D) (None, 64, 64, 16) 0 activation_164[0][0]
dropout_73 (Dropout) (None, 64, 64, 16) 0 max_pooling2d_37[0][0]
conv2d_172 (Conv2D) (None, 64, 64, 32) 4640 dropout_73[0][0]
batch_normalization_165 (BatchN (None, 64, 64, 32) 128 conv2d_172[0][0]
activation_165 (Activation) (None, 64, 64, 32) 0 batch_normalization_165[0][0]
conv2d_173 (Conv2D) (None, 64, 64, 32) 9248 activation_165[0][0]
batch_normalization_166 (BatchN (None, 64, 64, 32) 128 conv2d_173[0][0]
activation_166 (Activation) (None, 64, 64, 32) 0 batch_normalization_166[0][0]
max_pooling2d_38 (MaxPooling2D) (None, 32, 32, 32) 0 activation_166[0][0]
dropout_74 (Dropout) (None, 32, 32, 32) 0 max_pooling2d_38[0][0]
conv2d_174 (Conv2D) (None, 32, 32, 64) 18496 dropout_74[0][0]
batch_normalization_167 (BatchN (None, 32, 32, 64) 256 conv2d_174[0][0]
activation_167 (Activation) (None, 32, 32, 64) 0 batch_normalization_167[0][0]
conv2d_175 (Conv2D) (None, 32, 32, 64) 36928 activation_167[0][0]
batch_normalization_168 (BatchN (None, 32, 32, 64) 256 conv2d_175[0][0]
activation_168 (Activation) (None, 32, 32, 64) 0 batch_normalization_168[0][0]
max_pooling2d_39 (MaxPooling2D) (None, 16, 16, 64) 0 activation_168[0][0]
dropout_75 (Dropout) (None, 16, 16, 64) 0 max_pooling2d_39[0][0]
conv2d_176 (Conv2D) (None, 16, 16, 128) 73856 dropout_75[0][0]
batch_normalization_169 (BatchN (None, 16, 16, 128) 512 conv2d_176[0][0]
activation_169 (Activation) (None, 16, 16, 128) 0 batch_normalization_169[0][0]
conv2d_177 (Conv2D) (None, 16, 16, 128) 147584 activation_169[0][0]
batch_normalization_170 (BatchN (None, 16, 16, 128) 512 conv2d_177[0][0]
activation_170 (Activation) (None, 16, 16, 128) 0 batch_normalization_170[0][0]
max_pooling2d_40 (MaxPooling2D) (None, 8, 8, 128) 0 activation_170[0][0]
dropout_76 (Dropout) (None, 8, 8, 128) 0 max_pooling2d_40[0][0]
conv2d_178 (Conv2D) (None, 8, 8, 256) 295168 dropout_76[0][0]
batch_normalization_171 (BatchN (None, 8, 8, 256) 1024 conv2d_178[0][0]
activation_171 (Activation) (None, 8, 8, 256) 0 batch_normalization_171[0][0]
conv2d_179 (Conv2D) (None, 8, 8, 256) 590080 activation_171[0][0]
batch_normalization_172 (BatchN (None, 8, 8, 256) 1024 conv2d_179[0][0]
activation_172 (Activation) (None, 8, 8, 256) 0 batch_normalization_172[0][0]
conv2d_transpose_37 (Conv2DTran (None, 16, 16, 128) 295040 activation_172[0][0]
concatenate_37 (Concatenate) (None, 16, 16, 256) 0 conv2d_transpose_37[0][0]
dropout_77 (Dropout) (None, 16, 16, 256) 0 concatenate_37[0][0]
conv2d_180 (Conv2D) (None, 16, 16, 128) 295040 dropout_77[0][0]
batch_normalization_173 (BatchN (None, 16, 16, 128) 512 conv2d_180[0][0]
activation_173 (Activation) (None, 16, 16, 128) 0 batch_normalization_173[0][0]
conv2d_181 (Conv2D) (None, 16, 16, 128) 147584 activation_173[0][0]
batch_normalization_174 (BatchN (None, 16, 16, 128) 512 conv2d_181[0][0]
activation_174 (Activation) (None, 16, 16, 128) 0 batch_normalization_174[0][0]
conv2d_transpose_38 (Conv2DTran (None, 32, 32, 64) 73792 activation_174[0][0]
concatenate_38 (Concatenate) (None, 32, 32, 128) 0 conv2d_transpose_38[0][0]
dropout_78 (Dropout) (None, 32, 32, 128) 0 concatenate_38[0][0]
conv2d_182 (Conv2D) (None, 32, 32, 64) 73792 dropout_78[0][0]
batch_normalization_175 (BatchN (None, 32, 32, 64) 256 conv2d_182[0][0]
activation_175 (Activation) (None, 32, 32, 64) 0 batch_normalization_175[0][0]
conv2d_183 (Conv2D) (None, 32, 32, 64) 36928 activation_175[0][0]
batch_normalization_176 (BatchN (None, 32, 32, 64) 256 conv2d_183[0][0]
activation_176 (Activation) (None, 32, 32, 64) 0 batch_normalization_176[0][0]
conv2d_transpose_39 (Conv2DTran (None, 64, 64, 32) 18464 activation_176[0][0]
concatenate_39 (Concatenate) (None, 64, 64, 64) 0 conv2d_transpose_39[0][0]
dropout_79 (Dropout) (None, 64, 64, 64) 0 concatenate_39[0][0]
conv2d_184 (Conv2D) (None, 64, 64, 32) 18464 dropout_79[0][0]
batch_normalization_177 (BatchN (None, 64, 64, 32) 128 conv2d_184[0][0]
activation_177 (Activation) (None, 64, 64, 32) 0 batch_normalization_177[0][0]
conv2d_185 (Conv2D) (None, 64, 64, 32) 9248 activation_177[0][0]
batch_normalization_178 (BatchN (None, 64, 64, 32) 128 conv2d_185[0][0]
activation_178 (Activation) (None, 64, 64, 32) 0 batch_normalization_178[0][0]
conv2d_transpose_40 (Conv2DTran (None, 128, 128, 16) 4624 activation_178[0][0]
concatenate_40 (Concatenate) (None, 128, 128, 32) 0 conv2d_transpose_40[0][0]
dropout_80 (Dropout) (None, 128, 128, 32) 0 concatenate_40[0][0]
conv2d_186 (Conv2D) (None, 128, 128, 16) 4624 dropout_80[0][0]
batch_normalization_179 (BatchN (None, 128, 128, 16) 64 conv2d_186[0][0]
activation_179 (Activation) (None, 128, 128, 16) 0 batch_normalization_179[0][0]
conv2d_187 (Conv2D) (None, 128, 128, 16) 2320 activation_179[0][0]
batch_normalization_180 (BatchN (None, 128, 128, 16) 64 conv2d_187[0][0]
activation_180 (Activation) (None, 128, 128, 16) 0 batch_normalization_180[0][0]
conv2d_188 (Conv2D) (None, 128, 128, 1) 17 activation_180[0][0]
Total params: 2,164,593
Trainable params: 2,161,649
Non-trainable params: 2,944
As you can see, the input has 3 channels. Should the last layer have 1 channel or 11 channels? The dataset I am using has 11 classes, which are denoted by different RGB value combinations in the label images.

The last layer should have 11 channels, one for each of the 11 classes, at every pixel location. It is just like doing multi-class classification for each pixel location.
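The RGB label images also need to be converted into per-pixel class targets before they can be compared with an 11-channel output. Here is a minimal NumPy sketch of that conversion. The three-colour PALETTE is a made-up stand-in for the dataset's 11 colours, and rgb_mask_to_classes is a hypothetical helper name, not part of Keras.

```python
import numpy as np

# Hypothetical palette: one RGB colour per class (the real dataset would list 11).
PALETTE = np.array([
    [0, 0, 0],      # class 0
    [255, 0, 0],    # class 1
    [0, 255, 0],    # class 2
], dtype=np.uint8)


def rgb_mask_to_classes(mask, palette):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices."""
    # Compare every pixel with every palette colour: (H, W, n_classes) booleans.
    matches = (mask[:, :, None, :] == palette[None, None, :, :]).all(axis=-1)
    if not matches.any(axis=-1).all():
        raise ValueError("mask contains a colour that is not in the palette")
    return matches.argmax(axis=-1)


# Tiny 2x2 example mask
mask = np.array([[[0, 0, 0], [255, 0, 0]],
                 [[0, 255, 0], [255, 0, 0]]], dtype=np.uint8)

class_ids = rgb_mask_to_classes(mask, PALETTE)                 # shape (2, 2)
one_hot = np.eye(len(PALETTE), dtype=np.float32)[class_ids]    # shape (2, 2, 3)

print(class_ids)
print(one_hot.shape)
```

With 11 palette entries the one-hot target would have shape (H, W, 11), which lines up with a final layer of 11 channels. The integer class_ids array would instead pair with a sparse categorical loss.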


Keras Pretrained ResNet 101 V2: How to get filter size used?

I am using Keras' pretrained ResNet 101 V2 CNN model. I wanted to know the size of its filters. I tried checking my model's summary, but it doesn't tell me the size directly. Is it a 2x2x2 matrix, a 3x3x3 matrix, or something else?
A snippet of the model summary:
Layer (type) Output Shape Param # Connected to
input_3 (InputLayer) [(None, 255, 255, 3) 0
conv1_pad (ZeroPadding2D) (None, 261, 261, 3) 0 input_3[0][0]
conv1_conv (Conv2D) (None, 128, 128, 64) 9472 conv1_pad[0][0]
pool1_pad (ZeroPadding2D) (None, 130, 130, 64) 0 conv1_conv[0][0]
pool1_pool (MaxPooling2D) (None, 64, 64, 64) 0 pool1_pad[0][0]
conv2_block1_preact_bn (BatchNo (None, 64, 64, 64) 256 pool1_pool[0][0]
conv2_block1_preact_relu (Activ (None, 64, 64, 64) 0 conv2_block1_preact_bn[0][0]
conv2_block1_1_conv (Conv2D) (None, 64, 64, 64) 4096 conv2_block1_preact_relu[0][0]
conv2_block1_1_bn (BatchNormali (None, 64, 64, 64) 256 conv2_block1_1_conv[0][0]
conv2_block1_1_relu (Activation (None, 64, 64, 64) 0 conv2_block1_1_bn[0][0]
conv2_block1_2_pad (ZeroPadding (None, 66, 66, 64) 0 conv2_block1_1_relu[0][0]
conv2_block1_2_conv (Conv2D) (None, 64, 64, 64) 36864 conv2_block1_2_pad[0][0]
conv2_block1_2_bn (BatchNormali (None, 64, 64, 64) 256 conv2_block1_2_conv[0][0]
conv2_block1_2_relu (Activation (None, 64, 64, 64) 0 conv2_block1_2_bn[0][0]
conv2_block1_0_conv (Conv2D) (None, 64, 64, 256) 16640 conv2_block1_preact_relu[0][0]
conv2_block1_3_conv (Conv2D) (None, 64, 64, 256) 16640 conv2_block1_2_relu[0][0]
I am not sure if there is a predefined method for this. You can get the filter shape and filter count of a given layer from its weights, for example with print(model.layers[2].get_weights()[0].shape).
This gives the output,
(7, 7, 3, 64)
Printing model.layers[2].weights gives something like,
[<tf.Variable 'conv1_conv/kernel:0' shape=(7, 7, 3, 64) dtype=float32, numpy=
array([[[[ 2.04881709e-02, 1.74432080e-02, -1.19661177e-02, ...,
To get details for all the layers,
for i, layer in enumerate(model.layers):
    if layer.weights:
        print('-' * 30)
Partial output,
(7, 7, 3, 64)
(1, 1, 64, 64)
(3, 3, 64, 64)
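Each shape above reads as (kernel height, kernel width, input channels, number of filters). The kernel size can also be recovered from the Param # column of the summary alone. Below is a small sketch of that arithmetic in plain Python; conv2d_params and square_kernel_size are hypothetical helper names, and the second helper assumes a square kernel. The counts in the summary suggest that some of the convolutions inside the blocks have no bias term, which is why use_bias is a parameter.

```python
import math


def conv2d_params(kernel_shape, use_bias=True):
    """Parameter count of a Conv2D layer from its kernel shape (kh, kw, c_in, c_out)."""
    kh, kw, c_in, c_out = kernel_shape
    return kh * kw * c_in * c_out + (c_out if use_bias else 0)


def square_kernel_size(n_params, c_in, c_out, use_bias=True):
    """Recover k for a k x k kernel from a parameter count (assumes a square kernel)."""
    weights = n_params - (c_out if use_bias else 0)
    k = math.isqrt(weights // (c_in * c_out))
    if k * k * c_in * c_out != weights:
        raise ValueError("parameter count does not match a square kernel")
    return k


# conv1_conv: 3 input channels, 64 filters, 9472 parameters in the summary
print(conv2d_params((7, 7, 3, 64)))                        # 9472
print(square_kernel_size(9472, 3, 64))                     # 7
# conv2_block1_2_conv: 64 -> 64 channels, 36864 parameters, no bias term
print(square_kernel_size(36864, 64, 64, use_bias=False))   # 3
```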

Keras multiple output expected shape and got shape

I am trying to train a model that predicts a 128-d vector for face recognition. The input of the model is an image and the output is a 128-d vector (regression), which I get from the "face_recognition" library.
When I use the 128-d output as the training target, I get this error:
ValueError: Error when checking target: expected dense_24 to have shape (1,) but got array with shape (128,)
But when I try only one output, the fit function works.
The strange part is that the prediction shape is (1, 128), yet I can't train with a 128-d target.
Here is my model:
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Dense
from keras import models  # needed for models.Model below
import keras

def build_facereg_disc():
    # load the VGG16 convolutional base without its classifier head
    model = VGG16(include_top=False, input_shape=(64, 64, 3))
    # add new classifier layers
    flat1 = Flatten()(model.outputs)
    class1 = Dense(2048, activation='relu')(flat1)
    output = Dense(128, activation='relu')(class1)
    # define new model
    model = models.Model(inputs=model.inputs, outputs=output)
    # summarize
    model.summary()
    return model

facereg_disc = build_facereg_disc()
facereg_disc.compile(optimizer=keras.optimizers.Adam(), # Optimizer
# Loss function to minimize
# List of metrics to monitor
And summary:
Model: "model_27"
Layer (type) Output Shape Param #
input_20 (InputLayer) (None, 64, 64, 3) 0
block1_conv1 (Conv2D) (None, 64, 64, 64) 1792
block1_conv2 (Conv2D) (None, 64, 64, 64) 36928
block1_pool (MaxPooling2D) (None, 32, 32, 64) 0
block2_conv1 (Conv2D) (None, 32, 32, 128) 73856
block2_conv2 (Conv2D) (None, 32, 32, 128) 147584
block2_pool (MaxPooling2D) (None, 16, 16, 128) 0
block3_conv1 (Conv2D) (None, 16, 16, 256) 295168
block3_conv2 (Conv2D) (None, 16, 16, 256) 590080
block3_conv3 (Conv2D) (None, 16, 16, 256) 590080
block3_pool (MaxPooling2D) (None, 8, 8, 256) 0
block4_conv1 (Conv2D) (None, 8, 8, 512) 1180160
block4_conv2 (Conv2D) (None, 8, 8, 512) 2359808
block4_conv3 (Conv2D) (None, 8, 8, 512) 2359808
block4_pool (MaxPooling2D) (None, 4, 4, 512) 0
block5_conv1 (Conv2D) (None, 4, 4, 512) 2359808
block5_conv2 (Conv2D) (None, 4, 4, 512) 2359808
block5_conv3 (Conv2D) (None, 4, 4, 512) 2359808
block5_pool (MaxPooling2D) (None, 2, 2, 512) 0
flatten_10 (Flatten) (None, 2048) 0
dense_23 (Dense) (None, 2048) 4196352
dense_24 (Dense) (None, 128) 262272
Total params: 19,173,312
Trainable params: 19,173,312
Non-trainable params: 0
Here is preprocessing:
dir_data = "data_faces/img_align_celeba/"
Ntrain = 2000
Ntest = 100
nm_imgs = np.sort(os.listdir(dir_data))
## name of the jpg files for training set
nm_imgs_train = nm_imgs[:Ntrain]
## name of the jpg files for the testing data
nm_imgs_test = nm_imgs[Ntrain:Ntrain + Ntest]
img_shape = (64, 64, 3)
def get_npdata(nm_imgs_train):
    X_train = []
    for i, myid in enumerate(nm_imgs_train):
        image = load_img(dir_data + "/" + myid,
        image = img_to_array(image)/255.0
        # collect the normalised image
        X_train.append(image)
    X_train = np.array(X_train)
    return X_train
X_train = get_npdata(nm_imgs_train)
# X_train.shape = (2000, 64, 64, 3)
# y_train.shape = (2000, 128)
I sample batches like this:
idx = np.random.randint(0, X_train.shape[0], half_batch)
imgs = X_train[idx]
labels = y_train[idx]
reg_d_loss_real = facereg_disc.train_on_batch(imgs, labels)
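Your issue comes from your loss function. As explained in the docs, SparseCategoricalCrossentropy expects each sample in y_true to be a single integer encoding the class, which is why Keras expects a target of shape (1,). CategoricalCrossentropy expects one value per output unit (a one-hot encoded representation), which matches the shape of your (128,) targets.
So, switch to CategoricalCrossentropy and the shape error should go away. Note that since your targets are real-valued 128-d vectors rather than class labels, a regression loss such as mean squared error is probably a better fit for the task itself.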
However, to reproduce, I had to change:
flat1 = Flatten()(model.outputs)
to:
flat1 = Flatten()(model.outputs[0])
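To illustrate the difference in target shapes, here is a small NumPy sketch. The two loss functions are plain re-implementations written for this example, not the Keras classes, and the probabilities are made up. It shows that the sparse variant takes one integer per sample, while the categorical variant takes one vector per sample, and that both give the same value when the vector is the one-hot encoding of the integer.

```python
import numpy as np


def categorical_crossentropy(y_true_one_hot, y_pred):
    """Mean cross-entropy for one-hot targets of shape (N, C)."""
    return float(-np.mean(np.sum(y_true_one_hot * np.log(y_pred), axis=-1)))


def sparse_categorical_crossentropy(y_true_ids, y_pred):
    """Mean cross-entropy for integer targets of shape (N,)."""
    # Pick the predicted probability of the true class for every sample.
    picked = y_pred[np.arange(len(y_true_ids)), y_true_ids]
    return float(-np.mean(np.log(picked)))


y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])   # predicted probabilities, shape (2, 3)
y_sparse = np.array([0, 1])            # one integer per sample, shape (2,)
y_one_hot = np.eye(3)[y_sparse]        # one vector per sample, shape (2, 3)

loss_sparse = sparse_categorical_crossentropy(y_sparse, y_pred)
loss_categorical = categorical_crossentropy(y_one_hot, y_pred)

print(y_sparse.shape, y_one_hot.shape)
print(loss_sparse, loss_categorical)
```

A (2000, 128) target array has the second kind of shape, one vector per sample, which is why the sparse loss rejects it.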

Unet: Multi Class Image Segmentation

I have recently started learning about image segmentation and U-Net. I am trying to do multi-class image segmentation where I have 7 classes, the input is a (256, 256, 3) RGB image, and the output is a (256, 256, 1) grayscale image in which each intensity value corresponds to one class. I am doing a pixel-wise softmax. I am using sparse categorical cross-entropy to avoid one-hot encoding.
def soft1(x):
    # softmax over the class (last) axis
    return keras.activations.softmax(x, axis = -1)

def conv2d_block(input_tensor, n_filters, kernel_size = 3, batchnorm = True):
    # first convolution
    x = Conv2D(filters = n_filters, kernel_size = (kernel_size, kernel_size),
               kernel_initializer = 'he_normal', padding = 'same')(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    # second convolution; note that it is applied to input_tensor rather than x,
    # so the first Conv2D/BatchNormalization/Activation above never reaches the output
    x = Conv2D(filters = n_filters, kernel_size = (kernel_size, kernel_size),
               kernel_initializer = 'he_normal', padding = 'same')(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x

def get_unet(input_img, n_classes, n_filters = 16, dropout = 0.1, batchnorm = True):
    # Contracting Path
    c1 = conv2d_block(input_img, n_filters * 1, kernel_size = 3, batchnorm = batchnorm)
    p1 = MaxPooling2D((2, 2))(c1)
    p1 = Dropout(dropout)(p1)
    c2 = conv2d_block(p1, n_filters * 2, kernel_size = 3, batchnorm = batchnorm)
    p2 = MaxPooling2D((2, 2))(c2)
    p2 = Dropout(dropout)(p2)
    c3 = conv2d_block(p2, n_filters * 4, kernel_size = 3, batchnorm = batchnorm)
    p3 = MaxPooling2D((2, 2))(c3)
    p3 = Dropout(dropout)(p3)
    c4 = conv2d_block(p3, n_filters * 8, kernel_size = 3, batchnorm = batchnorm)
    p4 = MaxPooling2D((2, 2))(c4)
    p4 = Dropout(dropout)(p4)
    c5 = conv2d_block(p4, n_filters = n_filters * 16, kernel_size = 3, batchnorm = batchnorm)
    # Expansive Path
    u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides = (2, 2), padding = 'same')(c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(dropout)(u6)
    c6 = conv2d_block(u6, n_filters * 8, kernel_size = 3, batchnorm = batchnorm)
    u7 = Conv2DTranspose(n_filters * 4, (3, 3), strides = (2, 2), padding = 'same')(c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(dropout)(u7)
    c7 = conv2d_block(u7, n_filters * 4, kernel_size = 3, batchnorm = batchnorm)
    u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides = (2, 2), padding = 'same')(c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(dropout)(u8)
    c8 = conv2d_block(u8, n_filters * 2, kernel_size = 3, batchnorm = batchnorm)
    u9 = Conv2DTranspose(n_filters * 1, (3, 3), strides = (2, 2), padding = 'same')(c8)
    u9 = concatenate([u9, c1])
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters * 1, kernel_size = 3, batchnorm = batchnorm)
    # one 1x1 filter per class, then flatten the pixels and apply softmax over the classes
    outputs = Conv2D(n_classes, (1, 1))(c9)
    outputs = Reshape((image_height*image_width, 1, n_classes), input_shape = (image_height, image_width, n_classes))(outputs)
    outputs = Activation(soft1)(outputs)
    model = Model(inputs=[input_img], outputs=[outputs])
    return model
My Model Summary is:
Model: "model_2"
Layer (type) Output Shape Param # Connected to
input_12 (InputLayer) (None, 256, 256, 3) 0
conv2d_211 (Conv2D) (None, 256, 256, 16) 448 input_12[0][0]
batch_normalization_200 (BatchN (None, 256, 256, 16) 64 conv2d_211[0][0]
activation_204 (Activation) (None, 256, 256, 16) 0 batch_normalization_200[0][0]
max_pooling2d_45 (MaxPooling2D) (None, 128, 128, 16) 0 activation_204[0][0]
dropout_89 (Dropout) (None, 128, 128, 16) 0 max_pooling2d_45[0][0]
conv2d_213 (Conv2D) (None, 128, 128, 32) 4640 dropout_89[0][0]
batch_normalization_202 (BatchN (None, 128, 128, 32) 128 conv2d_213[0][0]
activation_206 (Activation) (None, 128, 128, 32) 0 batch_normalization_202[0][0]
max_pooling2d_46 (MaxPooling2D) (None, 64, 64, 32) 0 activation_206[0][0]
dropout_90 (Dropout) (None, 64, 64, 32) 0 max_pooling2d_46[0][0]
conv2d_215 (Conv2D) (None, 64, 64, 64) 18496 dropout_90[0][0]
batch_normalization_204 (BatchN (None, 64, 64, 64) 256 conv2d_215[0][0]
activation_208 (Activation) (None, 64, 64, 64) 0 batch_normalization_204[0][0]
max_pooling2d_47 (MaxPooling2D) (None, 32, 32, 64) 0 activation_208[0][0]
dropout_91 (Dropout) (None, 32, 32, 64) 0 max_pooling2d_47[0][0]
conv2d_217 (Conv2D) (None, 32, 32, 128) 73856 dropout_91[0][0]
batch_normalization_206 (BatchN (None, 32, 32, 128) 512 conv2d_217[0][0]
activation_210 (Activation) (None, 32, 32, 128) 0 batch_normalization_206[0][0]
max_pooling2d_48 (MaxPooling2D) (None, 16, 16, 128) 0 activation_210[0][0]
dropout_92 (Dropout) (None, 16, 16, 128) 0 max_pooling2d_48[0][0]
conv2d_219 (Conv2D) (None, 16, 16, 256) 295168 dropout_92[0][0]
batch_normalization_208 (BatchN (None, 16, 16, 256) 1024 conv2d_219[0][0]
activation_212 (Activation) (None, 16, 16, 256) 0 batch_normalization_208[0][0]
conv2d_transpose_45 (Conv2DTran (None, 32, 32, 128) 295040 activation_212[0][0]
concatenate_45 (Concatenate) (None, 32, 32, 256) 0 conv2d_transpose_45[0][0]
dropout_93 (Dropout) (None, 32, 32, 256) 0 concatenate_45[0][0]
conv2d_221 (Conv2D) (None, 32, 32, 128) 295040 dropout_93[0][0]
batch_normalization_210 (BatchN (None, 32, 32, 128) 512 conv2d_221[0][0]
activation_214 (Activation) (None, 32, 32, 128) 0 batch_normalization_210[0][0]
conv2d_transpose_46 (Conv2DTran (None, 64, 64, 64) 73792 activation_214[0][0]
concatenate_46 (Concatenate) (None, 64, 64, 128) 0 conv2d_transpose_46[0][0]
dropout_94 (Dropout) (None, 64, 64, 128) 0 concatenate_46[0][0]
conv2d_223 (Conv2D) (None, 64, 64, 64) 73792 dropout_94[0][0]
batch_normalization_212 (BatchN (None, 64, 64, 64) 256 conv2d_223[0][0]
activation_216 (Activation) (None, 64, 64, 64) 0 batch_normalization_212[0][0]
conv2d_transpose_47 (Conv2DTran (None, 128, 128, 32) 18464 activation_216[0][0]
concatenate_47 (Concatenate) (None, 128, 128, 64) 0 conv2d_transpose_47[0][0]
dropout_95 (Dropout) (None, 128, 128, 64) 0 concatenate_47[0][0]
conv2d_225 (Conv2D) (None, 128, 128, 32) 18464 dropout_95[0][0]
batch_normalization_214 (BatchN (None, 128, 128, 32) 128 conv2d_225[0][0]
activation_218 (Activation) (None, 128, 128, 32) 0 batch_normalization_214[0][0]
conv2d_transpose_48 (Conv2DTran (None, 256, 256, 16) 4624 activation_218[0][0]
concatenate_48 (Concatenate) (None, 256, 256, 32) 0 conv2d_transpose_48[0][0]
dropout_96 (Dropout) (None, 256, 256, 32) 0 concatenate_48[0][0]
conv2d_227 (Conv2D) (None, 256, 256, 16) 4624 dropout_96[0][0]
batch_normalization_216 (BatchN (None, 256, 256, 16) 64 conv2d_227[0][0]
activation_220 (Activation) (None, 256, 256, 16) 0 batch_normalization_216[0][0]
conv2d_228 (Conv2D) (None, 256, 256, 7) 119 activation_220[0][0]
reshape_12 (Reshape) (None, 65536, 1, 7) 0 conv2d_228[0][0]
activation_221 (Activation) (None, 65536, 1, 7) 0 reshape_12[0][0]
Total params: 1,179,511
Trainable params: 1,178,039
Non-trainable params: 1,472
Is my model right? Shouldn't the final output be (65536, 1, 1) as I am using softmax?
The model compiles and trains, but the Dice coefficient is very low.
Your model should end in (256, 256, 7).
That is 7 class scores per pixel, and this shape pairs with your label images of shape (256, 256, 1), where each pixel holds an integer class index. This works only with 'sparse_categorical_crossentropy' or a custom loss.
So, up to conv2d_228 the model seems fine (I didn't look in detail, though).
There is no need for anything that comes after this convolution.
You can put the softmax directly in conv2d_228 (as its activation) or directly after it.
y_train should be (256, 256, 1) for this.
Your output in fact represents each pixel of your image. For each pixel, you have an output of size 1x7. Since it goes through a softmax, the values of this representation lie between 0 and 1. The output therefore fires for the desired class at each pixel, which gives you the segmentation. If it were (65536, 1, 1), you would have not a categorical but a dense (single value per pixel) representation.
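The shapes involved can be checked with a short NumPy sketch. This is only an illustration under assumptions: the 4x4 size stands in for 256x256, the random logits stand in for the output of the last convolution, and softmax here is a hand-written helper rather than the Keras activation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


height, width, n_classes = 4, 4, 7                      # small stand-in for (256, 256, 7)
rng = np.random.default_rng(0)
logits = rng.normal(size=(height, width, n_classes))    # stand-in for the last Conv2D output

probs = softmax(logits, axis=-1)           # (4, 4, 7): class probabilities per pixel
pred_labels = probs.argmax(axis=-1)        # (4, 4): predicted class index per pixel
pred_mask = pred_labels[..., np.newaxis]   # (4, 4, 1): same layout as a sparse label mask

print(probs.shape, pred_labels.shape, pred_mask.shape)
```

The softmax works along the last axis without any reshape, and the argmax over that axis gives a mask with the same layout as the (256, 256, 1) labels.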

Expand Model layer

I have a pretrained model with this summary:
Layer (type) Output Shape Param #
vgg19 (Model) (None, 4, 4, 512) 20024384
flatten_1 (Flatten) (None, 8192) 0
dense_1 (Dense) (None, 1024) 8389632
dropout_1 (Dropout) (None, 1024) 0
dense_2 (Dense) (None, 1024) 1049600
dense_3 (Dense) (None, 5) 5125
I need a version with vgg19 expanded into its individual layers rather than shown as a single nested layer. Something like this:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 128, 128, 3) 0
block1_conv1 (Conv2D) (None, 128, 128, 64) 1792
block1_conv2 (Conv2D) (None, 128, 128, 64) 36928
block1_pool (MaxPooling2D) (None, 64, 64, 64) 0
block2_conv1 (Conv2D) (None, 64, 64, 128) 73856
** end of vgg19 **
flatten_1 (Flatten) (None, 8192) 0
dense_1 (Dense) (None, 1024) 8389632
dropout_1 (Dropout) (None, 1024) 0
dense_2 (Dense) (None, 1024) 1049600
dense_3 (Dense) (None, 5) 5125
I have tried copying it layer by layer, but I have run into lots of problems. Is there a way to accomplish this that also copies the weights?
I don't know how you implemented it, but here is how I did it. I hope it helps.
from keras.applications.vgg19 import VGG19
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout

# VGG19 convolutional base with ImageNet weights
model = VGG19(weights='imagenet', include_top=False, input_shape=(128,128,3))
# new classifier head attached to the output of the base
flatten_1 = Flatten()(model.output)
dense_1 = Dense(1024)(flatten_1)
dropout_1 = Dropout(0.2)(dense_1)
dense_2 = Dense(1024)(dropout_1)
dense_3 = Dense(5)(dense_2)
# building from model.input makes every VGG19 layer appear individually
model = Model(inputs=model.input, outputs=dense_3)
model.summary()
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 128, 128, 3) 0
block1_conv1 (Conv2D) (None, 128, 128, 64) 1792
block1_conv2 (Conv2D) (None, 128, 128, 64) 36928
block1_pool (MaxPooling2D) (None, 64, 64, 64) 0
block2_conv1 (Conv2D) (None, 64, 64, 128) 73856
block2_conv2 (Conv2D) (None, 64, 64, 128) 147584
block2_pool (MaxPooling2D) (None, 32, 32, 128) 0
block3_conv1 (Conv2D) (None, 32, 32, 256) 295168
block3_conv2 (Conv2D) (None, 32, 32, 256) 590080
block3_conv3 (Conv2D) (None, 32, 32, 256) 590080
block3_conv4 (Conv2D) (None, 32, 32, 256) 590080
block3_pool (MaxPooling2D) (None, 16, 16, 256) 0
block4_conv1 (Conv2D) (None, 16, 16, 512) 1180160
block4_conv2 (Conv2D) (None, 16, 16, 512) 2359808
block4_conv3 (Conv2D) (None, 16, 16, 512) 2359808
block4_conv4 (Conv2D) (None, 16, 16, 512) 2359808
block4_pool (MaxPooling2D) (None, 8, 8, 512) 0
block5_conv1 (Conv2D) (None, 8, 8, 512) 2359808
block5_conv2 (Conv2D) (None, 8, 8, 512) 2359808
block5_conv3 (Conv2D) (None, 8, 8, 512) 2359808
block5_conv4 (Conv2D) (None, 8, 8, 512) 2359808
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
flatten_1 (Flatten) (None, 8192) 0
dense_1 (Dense) (None, 1024) 8389632
dropout_1 (Dropout) (None, 1024) 0
dense_2 (Dense) (None, 1024) 1049600
dense_3 (Dense) (None, 5) 5125
Total params: 29,468,741
Trainable params: 29,468,741
Non-trainable params: 0
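As a quick sanity check, the parameter total of the expanded model can be reproduced from the numbers in the two summaries. This is only arithmetic in plain Python; dense_params is a hypothetical helper name.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


vgg19_base = 20_024_384      # parameter count of the nested vgg19 layer
flat_units = 4 * 4 * 512     # Flatten of the (4, 4, 512) feature map

head = [
    dense_params(flat_units, 1024),   # dense_1
    dense_params(1024, 1024),         # dense_2
    dense_params(1024, 5),            # dense_3
]

total = vgg19_base + sum(head)
print(flat_units, head, total)
```

The result matches the Total params line of the expanded summary, so both versions describe the same set of weights.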

Keras implementation of VGG19 net has 26 layers. How?

A VGG-19 network has 25 layers as shown here. But if I check the number of layers in Keras implementation, it shows 26 layers. How?
model = VGG19()
gives output
If you are confused, you can print out the structure of VGG19 directly with model.summary(). It shows a layer input_1 (InputLayer) as the input layer.
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
flatten (Flatten) (None, 25088) 0
fc1 (Dense) (None, 4096) 102764544
fc2 (Dense) (None, 4096) 16781312
predictions (Dense) (None, 1000) 4097000
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
If you want to get the output of the 1st FC layer, you should use model.layers[23] instead of model.layers[22]. In fact, you can print the output shapes directly and compare them with the output of model.summary().
(?, ?) # flatten (Flatten)
(?, 4096) # fc1 (Dense)
(?, 4096) # fc2 (Dense)
(?, 1000) # predictions (Dense)
In addition, you can get the 1st FC layer directly by its layer name 'fc1', for example with model.get_layer('fc1').
(?, 4096)
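The indices can be double-checked without loading the model. Here is a minimal sketch in plain Python, assuming the layer order shown in the summary above; layer_names and CONVS_PER_BLOCK are illustrative names, not Keras attributes.

```python
# Number of convolution layers in each of the five VGG19 blocks
CONVS_PER_BLOCK = [2, 2, 4, 4, 4]

# Rebuild the layer names in the order the summary lists them
layer_names = ["input_1"]
for block, n_convs in enumerate(CONVS_PER_BLOCK, start=1):
    layer_names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    layer_names.append(f"block{block}_pool")
layer_names += ["flatten", "fc1", "fc2", "predictions"]

print(len(layer_names))                                           # 26
print(layer_names.index("flatten"), layer_names.index("fc1"))     # 22 23
```

Because the input layer occupies index 0, every later layer sits one position further along than a count that starts at the first convolution would suggest.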
The 19 in VGG-19 refers to the layers with learnable weights. If you print the model summary, you get the following:
Layer (type) Output Shape Param #
input_1 (InputLayer) (None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
flatten (Flatten) (None, 25088) 0
fc1 (Dense) (None, 4096) 102764544
fc2 (Dense) (None, 4096) 16781312
predictions (Dense) (None, 1000) 4097000
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
Here you have 7 layers that don't have any learnable weights: one InputLayer, five MaxPooling2D layers and one Flatten layer. This is how you get 26 layers (19+1+5+1).
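The same count can be written out as a short tally in plain Python. This is a sketch that assumes the layer types listed in the summary; layer_types is an illustrative list, not something read from Keras.

```python
from collections import Counter

# Number of convolution layers in each of the five VGG19 blocks
CONVS_PER_BLOCK = [2, 2, 4, 4, 4]

# Layer types in the order the summary lists them
layer_types = ["InputLayer"]
for n_convs in CONVS_PER_BLOCK:
    layer_types += ["Conv2D"] * n_convs + ["MaxPooling2D"]
layer_types += ["Flatten", "Dense", "Dense", "Dense"]

counts = Counter(layer_types)
with_weights = counts["Conv2D"] + counts["Dense"]     # layers with learnable weights
without_weights = len(layer_types) - with_weights     # input, pooling and flatten layers

print(counts)
print(with_weights, without_weights, len(layer_types))   # 19 7 26
```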
