Multi-class semantic segmentation difficulty with RGB images - keras

Could anyone help me with this problem of multi-class semantic segmentation? I have modified some code to accept RGB images and RGB labels as masks. I am using the following model:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
img (InputLayer) (None, 128, 128, 3) 0
__________________________________________________________________________________________________
conv2d_170 (Conv2D) (None, 128, 128, 16) 448 img[0][0]
__________________________________________________________________________________________________
batch_normalization_163 (BatchN (None, 128, 128, 16) 64 conv2d_170[0][0]
__________________________________________________________________________________________________
activation_163 (Activation) (None, 128, 128, 16) 0 batch_normalization_163[0][0]
__________________________________________________________________________________________________
conv2d_171 (Conv2D) (None, 128, 128, 16) 2320 activation_163[0][0]
__________________________________________________________________________________________________
batch_normalization_164 (BatchN (None, 128, 128, 16) 64 conv2d_171[0][0]
__________________________________________________________________________________________________
activation_164 (Activation) (None, 128, 128, 16) 0 batch_normalization_164[0][0]
__________________________________________________________________________________________________
max_pooling2d_37 (MaxPooling2D) (None, 64, 64, 16) 0 activation_164[0][0]
__________________________________________________________________________________________________
dropout_73 (Dropout) (None, 64, 64, 16) 0 max_pooling2d_37[0][0]
__________________________________________________________________________________________________
conv2d_172 (Conv2D) (None, 64, 64, 32) 4640 dropout_73[0][0]
__________________________________________________________________________________________________
batch_normalization_165 (BatchN (None, 64, 64, 32) 128 conv2d_172[0][0]
__________________________________________________________________________________________________
activation_165 (Activation) (None, 64, 64, 32) 0 batch_normalization_165[0][0]
__________________________________________________________________________________________________
conv2d_173 (Conv2D) (None, 64, 64, 32) 9248 activation_165[0][0]
__________________________________________________________________________________________________
batch_normalization_166 (BatchN (None, 64, 64, 32) 128 conv2d_173[0][0]
__________________________________________________________________________________________________
activation_166 (Activation) (None, 64, 64, 32) 0 batch_normalization_166[0][0]
__________________________________________________________________________________________________
max_pooling2d_38 (MaxPooling2D) (None, 32, 32, 32) 0 activation_166[0][0]
__________________________________________________________________________________________________
dropout_74 (Dropout) (None, 32, 32, 32) 0 max_pooling2d_38[0][0]
__________________________________________________________________________________________________
conv2d_174 (Conv2D) (None, 32, 32, 64) 18496 dropout_74[0][0]
__________________________________________________________________________________________________
batch_normalization_167 (BatchN (None, 32, 32, 64) 256 conv2d_174[0][0]
__________________________________________________________________________________________________
activation_167 (Activation) (None, 32, 32, 64) 0 batch_normalization_167[0][0]
__________________________________________________________________________________________________
conv2d_175 (Conv2D) (None, 32, 32, 64) 36928 activation_167[0][0]
__________________________________________________________________________________________________
batch_normalization_168 (BatchN (None, 32, 32, 64) 256 conv2d_175[0][0]
__________________________________________________________________________________________________
activation_168 (Activation) (None, 32, 32, 64) 0 batch_normalization_168[0][0]
__________________________________________________________________________________________________
max_pooling2d_39 (MaxPooling2D) (None, 16, 16, 64) 0 activation_168[0][0]
__________________________________________________________________________________________________
dropout_75 (Dropout) (None, 16, 16, 64) 0 max_pooling2d_39[0][0]
__________________________________________________________________________________________________
conv2d_176 (Conv2D) (None, 16, 16, 128) 73856 dropout_75[0][0]
__________________________________________________________________________________________________
batch_normalization_169 (BatchN (None, 16, 16, 128) 512 conv2d_176[0][0]
__________________________________________________________________________________________________
activation_169 (Activation) (None, 16, 16, 128) 0 batch_normalization_169[0][0]
__________________________________________________________________________________________________
conv2d_177 (Conv2D) (None, 16, 16, 128) 147584 activation_169[0][0]
__________________________________________________________________________________________________
batch_normalization_170 (BatchN (None, 16, 16, 128) 512 conv2d_177[0][0]
__________________________________________________________________________________________________
activation_170 (Activation) (None, 16, 16, 128) 0 batch_normalization_170[0][0]
__________________________________________________________________________________________________
max_pooling2d_40 (MaxPooling2D) (None, 8, 8, 128) 0 activation_170[0][0]
__________________________________________________________________________________________________
dropout_76 (Dropout) (None, 8, 8, 128) 0 max_pooling2d_40[0][0]
__________________________________________________________________________________________________
conv2d_178 (Conv2D) (None, 8, 8, 256) 295168 dropout_76[0][0]
__________________________________________________________________________________________________
batch_normalization_171 (BatchN (None, 8, 8, 256) 1024 conv2d_178[0][0]
__________________________________________________________________________________________________
activation_171 (Activation) (None, 8, 8, 256) 0 batch_normalization_171[0][0]
__________________________________________________________________________________________________
conv2d_179 (Conv2D) (None, 8, 8, 256) 590080 activation_171[0][0]
__________________________________________________________________________________________________
batch_normalization_172 (BatchN (None, 8, 8, 256) 1024 conv2d_179[0][0]
__________________________________________________________________________________________________
activation_172 (Activation) (None, 8, 8, 256) 0 batch_normalization_172[0][0]
__________________________________________________________________________________________________
conv2d_transpose_37 (Conv2DTran (None, 16, 16, 128) 295040 activation_172[0][0]
__________________________________________________________________________________________________
concatenate_37 (Concatenate) (None, 16, 16, 256) 0 conv2d_transpose_37[0][0]
activation_170[0][0]
__________________________________________________________________________________________________
dropout_77 (Dropout) (None, 16, 16, 256) 0 concatenate_37[0][0]
__________________________________________________________________________________________________
conv2d_180 (Conv2D) (None, 16, 16, 128) 295040 dropout_77[0][0]
__________________________________________________________________________________________________
batch_normalization_173 (BatchN (None, 16, 16, 128) 512 conv2d_180[0][0]
__________________________________________________________________________________________________
activation_173 (Activation) (None, 16, 16, 128) 0 batch_normalization_173[0][0]
__________________________________________________________________________________________________
conv2d_181 (Conv2D) (None, 16, 16, 128) 147584 activation_173[0][0]
__________________________________________________________________________________________________
batch_normalization_174 (BatchN (None, 16, 16, 128) 512 conv2d_181[0][0]
__________________________________________________________________________________________________
activation_174 (Activation) (None, 16, 16, 128) 0 batch_normalization_174[0][0]
__________________________________________________________________________________________________
conv2d_transpose_38 (Conv2DTran (None, 32, 32, 64) 73792 activation_174[0][0]
__________________________________________________________________________________________________
concatenate_38 (Concatenate) (None, 32, 32, 128) 0 conv2d_transpose_38[0][0]
activation_168[0][0]
__________________________________________________________________________________________________
dropout_78 (Dropout) (None, 32, 32, 128) 0 concatenate_38[0][0]
__________________________________________________________________________________________________
conv2d_182 (Conv2D) (None, 32, 32, 64) 73792 dropout_78[0][0]
__________________________________________________________________________________________________
batch_normalization_175 (BatchN (None, 32, 32, 64) 256 conv2d_182[0][0]
__________________________________________________________________________________________________
activation_175 (Activation) (None, 32, 32, 64) 0 batch_normalization_175[0][0]
__________________________________________________________________________________________________
conv2d_183 (Conv2D) (None, 32, 32, 64) 36928 activation_175[0][0]
__________________________________________________________________________________________________
batch_normalization_176 (BatchN (None, 32, 32, 64) 256 conv2d_183[0][0]
__________________________________________________________________________________________________
activation_176 (Activation) (None, 32, 32, 64) 0 batch_normalization_176[0][0]
__________________________________________________________________________________________________
conv2d_transpose_39 (Conv2DTran (None, 64, 64, 32) 18464 activation_176[0][0]
__________________________________________________________________________________________________
concatenate_39 (Concatenate) (None, 64, 64, 64) 0 conv2d_transpose_39[0][0]
activation_166[0][0]
__________________________________________________________________________________________________
dropout_79 (Dropout) (None, 64, 64, 64) 0 concatenate_39[0][0]
__________________________________________________________________________________________________
conv2d_184 (Conv2D) (None, 64, 64, 32) 18464 dropout_79[0][0]
__________________________________________________________________________________________________
batch_normalization_177 (BatchN (None, 64, 64, 32) 128 conv2d_184[0][0]
__________________________________________________________________________________________________
activation_177 (Activation) (None, 64, 64, 32) 0 batch_normalization_177[0][0]
__________________________________________________________________________________________________
conv2d_185 (Conv2D) (None, 64, 64, 32) 9248 activation_177[0][0]
__________________________________________________________________________________________________
batch_normalization_178 (BatchN (None, 64, 64, 32) 128 conv2d_185[0][0]
__________________________________________________________________________________________________
activation_178 (Activation) (None, 64, 64, 32) 0 batch_normalization_178[0][0]
__________________________________________________________________________________________________
conv2d_transpose_40 (Conv2DTran (None, 128, 128, 16) 4624 activation_178[0][0]
__________________________________________________________________________________________________
concatenate_40 (Concatenate) (None, 128, 128, 32) 0 conv2d_transpose_40[0][0]
activation_164[0][0]
__________________________________________________________________________________________________
dropout_80 (Dropout) (None, 128, 128, 32) 0 concatenate_40[0][0]
__________________________________________________________________________________________________
conv2d_186 (Conv2D) (None, 128, 128, 16) 4624 dropout_80[0][0]
__________________________________________________________________________________________________
batch_normalization_179 (BatchN (None, 128, 128, 16) 64 conv2d_186[0][0]
__________________________________________________________________________________________________
activation_179 (Activation) (None, 128, 128, 16) 0 batch_normalization_179[0][0]
__________________________________________________________________________________________________
conv2d_187 (Conv2D) (None, 128, 128, 16) 2320 activation_179[0][0]
__________________________________________________________________________________________________
batch_normalization_180 (BatchN (None, 128, 128, 16) 64 conv2d_187[0][0]
__________________________________________________________________________________________________
activation_180 (Activation) (None, 128, 128, 16) 0 batch_normalization_180[0][0]
__________________________________________________________________________________________________
conv2d_188 (Conv2D) (None, 128, 128, 1) 17 activation_180[0][0]
==================================================================================================
Total params: 2,164,593
Trainable params: 2,161,649
Non-trainable params: 2,944
__________________________________________________________________________________________________
As you can see, the input has 3 channels. Should the last layer have 1 channel or 11 channels? The dataset I am using has 11 classes, which are denoted by different RGB value combinations in the image.
Thanks.

The last layer should have 11 channels, corresponding to the 11 classes at each pixel location. It is just like doing a multi-class classification for each pixel location.
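A minimal sketch of that change, assuming the 11-channel head replaces the final 1-channel conv and that you build a dataset-specific RGB-to-class palette (the one below is a placeholder):
import numpy as np
from keras.layers import Conv2D

# Replace the 1-channel head with an 11-channel, per-pixel softmax.
# `x` stands for the last 128x128x16 activation in the decoder.
outputs = Conv2D(11, (1, 1), activation='softmax')(x)

# RGB masks must become integer class maps before training.
# The palette is dataset-specific; these entries are placeholders.
PALETTE = {(0, 0, 0): 0, (128, 0, 0): 1}  # ... one entry per class, 11 in total

def rgb_to_class(mask_rgb):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class ids."""
    class_map = np.zeros(mask_rgb.shape[:2], dtype=np.int32)
    for rgb, cls in PALETTE.items():
        class_map[np.all(mask_rgb == np.array(rgb), axis=-1)] = cls
    return class_map
With integer class maps you can train with loss='sparse_categorical_crossentropy'; if you one-hot the masks to (128, 128, 11) instead, use 'categorical_crossentropy'.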

Related

Keras Pretrained ResNet 101 V2: How to get filter size used?

I am using Keras' pretrained ResNet 101 V2 CNN model. I wanted to know what the size of the filters is. I tried checking my model's summary but it doesn't really tell me the size directly. Is it a 2x2x2 matrix, a 3x3x3, or something else?
The snippet of the model summary is:
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 255, 255, 3) 0
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 261, 261, 3) 0 input_3[0][0]
__________________________________________________________________________________________________
conv1_conv (Conv2D) (None, 128, 128, 64) 9472 conv1_pad[0][0]
__________________________________________________________________________________________________
pool1_pad (ZeroPadding2D) (None, 130, 130, 64) 0 conv1_conv[0][0]
__________________________________________________________________________________________________
pool1_pool (MaxPooling2D) (None, 64, 64, 64) 0 pool1_pad[0][0]
__________________________________________________________________________________________________
conv2_block1_preact_bn (BatchNo (None, 64, 64, 64) 256 pool1_pool[0][0]
__________________________________________________________________________________________________
conv2_block1_preact_relu (Activ (None, 64, 64, 64) 0 conv2_block1_preact_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_1_conv (Conv2D) (None, 64, 64, 64) 4096 conv2_block1_preact_relu[0][0]
__________________________________________________________________________________________________
conv2_block1_1_bn (BatchNormali (None, 64, 64, 64) 256 conv2_block1_1_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_1_relu (Activation (None, 64, 64, 64) 0 conv2_block1_1_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_2_pad (ZeroPadding (None, 66, 66, 64) 0 conv2_block1_1_relu[0][0]
__________________________________________________________________________________________________
conv2_block1_2_conv (Conv2D) (None, 64, 64, 64) 36864 conv2_block1_2_pad[0][0]
__________________________________________________________________________________________________
conv2_block1_2_bn (BatchNormali (None, 64, 64, 64) 256 conv2_block1_2_conv[0][0]
__________________________________________________________________________________________________
conv2_block1_2_relu (Activation (None, 64, 64, 64) 0 conv2_block1_2_bn[0][0]
__________________________________________________________________________________________________
conv2_block1_0_conv (Conv2D) (None, 64, 64, 256) 16640 conv2_block1_preact_relu[0][0]
__________________________________________________________________________________________________
conv2_block1_3_conv (Conv2D) (None, 64, 64, 256) 16640 conv2_block1_2_relu[0][0]
I am not sure if there is a predefined method to get this, but it should be possible to get the filter shape and count for a given layer this way,
print(model.layers[2].name)
print(model.layers[2].weights[0].shape)
This gives output,
conv1_conv
(7, 7, 3, 64)
Printing model.layers[2].weights gives something like,
[<tf.Variable 'conv1_conv/kernel:0' shape=(7, 7, 3, 64) dtype=float32, numpy=
array([[[[ 2.04881709e-02, 1.74432080e-02, -1.19661177e-02, ...,
...
To get details for all the layers,
for layer in model.layers:
    print(layer.name)
    if layer.weights:
        print(layer.weights[0].shape)
        print(layer.weights[1].shape)
    print('-' * 30)
Partial output,
input_3
------------------------------
conv1_pad
------------------------------
conv1_conv
(7, 7, 3, 64)
(64,)
------------------------------
conv1_bn
(64,)
(64,)
------------------------------
conv1_relu
------------------------------
pool1_pad
------------------------------
pool1_pool
------------------------------
conv2_block1_1_conv
(1, 1, 64, 64)
(64,)
------------------------------
conv2_block1_1_bn
(64,)
(64,)
------------------------------
conv2_block1_1_relu
------------------------------
conv2_block1_2_conv
(3, 3, 64, 64)
(64,)
------------------------------

Keras multiple output expected shape and got shape

I am trying to train a model which predicts a 128-d vector to recognize faces. The input of the model is an image and the output is a 128-d vector (regression) obtained from the "face_recognition" library.
When I train with the 128 outputs I get this error:
ValueError: Error when checking target: expected dense_24 to have shape (1,) but got array with shape (128,)
But when I try only one output, the fit function works.
The strange part is that the prediction shape is (1, 128), yet I can't train with 128 outputs.
Here is my model:
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Dense
from keras import models
import keras

def build_facereg_disc():
    # load model
    model = VGG16(include_top=False, input_shape=(64, 64, 3))
    # add new classifier layers
    flat1 = Flatten()(model.outputs)
    class1 = Dense(2048, activation='relu')(flat1)
    output = Dense(128, activation='relu')(class1)
    # define new model
    model = models.Model(inputs=model.inputs, outputs=output)
    # summarize
    return model

facereg_disc = build_facereg_disc()
facereg_disc.compile(optimizer=keras.optimizers.Adam(),  # Optimizer
                     # Loss function to minimize
                     loss=keras.losses.SparseCategoricalCrossentropy(),
                     # List of metrics to monitor
                     metrics=['binary_crossentropy'])
And summary:
Model: "model_27"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_20 (InputLayer) (None, 64, 64, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 64, 64, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 64, 64, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 32, 32, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 32, 32, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 32, 32, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 16, 16, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 16, 16, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 16, 16, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 16, 16, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 8, 8, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 8, 8, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 4, 4, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
flatten_10 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_23 (Dense) (None, 2048) 4196352
_________________________________________________________________
dense_24 (Dense) (None, 128) 262272
=================================================================
Total params: 19,173,312
Trainable params: 19,173,312
Non-trainable params: 0
Here is preprocessing:
dir_data = "data_faces/img_align_celeba/"
Ntrain = 2000
Ntest = 100
nm_imgs = np.sort(os.listdir(dir_data))
## name of the jpg files for the training set
nm_imgs_train = nm_imgs[:Ntrain]
## name of the jpg files for the testing data
nm_imgs_test = nm_imgs[Ntrain:Ntrain + Ntest]
img_shape = (64, 64, 3)

def get_npdata(nm_imgs_train):
    X_train = []
    for myid in nm_imgs_train:
        image = load_img(dir_data + "/" + myid,
                         target_size=img_shape[:2])
        image = img_to_array(image) / 255.0
        X_train.append(image)
    X_train = np.array(X_train)
    return X_train

X_train = get_npdata(nm_imgs_train)
X_train.shape = (2000, 64, 64, 3)
y_train.shape = (2000, 128)
I build batches like this:
idx = np.random.randint(0, X_train.shape[0], half_batch)
imgs = X_train[idx]
labels = y_train[idx]
reg_d_loss_real = facereg_disc.train_on_batch(imgs, labels)
Your issue comes from your loss function. As explained in the doc, SparseCategoricalCrossentropy expects each sample in y_true to be an integer encoding the class, whereas CategoricalCrossentropy expects a one-hot encoded representation (which is your case).
So, switch to CategoricalCrossentropy and you should be fine.
However, to reproduce, I had to change:
flat1 = Flatten()(model.outputs)
To:
flat1 = Flatten()(model.outputs[0])
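Putting both changes together, a sketch of the corrected build and compile (everything else from the question unchanged):
from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Dense
from keras import models
import keras

def build_facereg_disc():
    base = VGG16(include_top=False, input_shape=(64, 64, 3))
    flat1 = Flatten()(base.outputs[0])  # take the tensor, not the list
    class1 = Dense(2048, activation='relu')(flat1)
    output = Dense(128, activation='relu')(class1)
    return models.Model(inputs=base.inputs, outputs=output)

facereg_disc = build_facereg_disc()
facereg_disc.compile(optimizer=keras.optimizers.Adam(),
                     # the (128,)-shaped targets pair with the non-sparse loss
                     loss=keras.losses.CategoricalCrossentropy(),
                     metrics=['binary_crossentropy'])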

Unet: Multi Class Image Segmentation

I have recently started learning about image segmentation and UNet. I am trying to do multi-class image segmentation where I have 7 classes; the input is a (256, 256, 3) RGB image and the output is a (256, 256, 1) grayscale image where each intensity value corresponds to one class. I am doing pixel-wise softmax, and I am using sparse categorical cross-entropy so as to avoid one-hot encoding.
def soft1(x):
    return keras.activations.softmax(x, axis=-1)

def conv2d_block(input_tensor, n_filters, kernel_size=3, batchnorm=True):
    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size),
               kernel_initializer='he_normal', padding='same')(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    # NB: this second conv reads input_tensor again, so the chain above is
    # bypassed; that is why the summary shows only one conv per block
    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size),
               kernel_initializer='he_normal', padding='same')(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x
def get_unet(input_img, n_classes, n_filters=16, dropout=0.1, batchnorm=True):
    # Contracting Path
    c1 = conv2d_block(input_img, n_filters * 1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2))(c1)
    p1 = Dropout(dropout)(p1)
    c2 = conv2d_block(p1, n_filters * 2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling2D((2, 2))(c2)
    p2 = Dropout(dropout)(p2)
    c3 = conv2d_block(p2, n_filters * 4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling2D((2, 2))(c3)
    p3 = Dropout(dropout)(p3)
    c4 = conv2d_block(p3, n_filters * 8, kernel_size=3, batchnorm=batchnorm)
    p4 = MaxPooling2D((2, 2))(c4)
    p4 = Dropout(dropout)(p4)
    c5 = conv2d_block(p4, n_filters=n_filters * 16, kernel_size=3, batchnorm=batchnorm)
    # Expansive Path
    u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(dropout)(u6)
    c6 = conv2d_block(u6, n_filters * 8, kernel_size=3, batchnorm=batchnorm)
    u7 = Conv2DTranspose(n_filters * 4, (3, 3), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(dropout)(u7)
    c7 = conv2d_block(u7, n_filters * 4, kernel_size=3, batchnorm=batchnorm)
    u8 = Conv2DTranspose(n_filters * 2, (3, 3), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(dropout)(u8)
    c8 = conv2d_block(u8, n_filters * 2, kernel_size=3, batchnorm=batchnorm)
    u9 = Conv2DTranspose(n_filters * 1, (3, 3), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1])
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters * 1, kernel_size=3, batchnorm=batchnorm)
    outputs = Conv2D(n_classes, (1, 1))(c9)
    outputs = Reshape((image_height * image_width, 1, n_classes),
                      input_shape=(image_height, image_width, n_classes))(outputs)
    outputs = Activation(soft1)(outputs)
    model = Model(inputs=[input_img], outputs=[outputs])
    print(outputs.shape)
    return model
My Model Summary is:
Model: "model_2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_12 (InputLayer) (None, 256, 256, 3) 0
__________________________________________________________________________________________________
conv2d_211 (Conv2D) (None, 256, 256, 16) 448 input_12[0][0]
__________________________________________________________________________________________________
batch_normalization_200 (BatchN (None, 256, 256, 16) 64 conv2d_211[0][0]
__________________________________________________________________________________________________
activation_204 (Activation) (None, 256, 256, 16) 0 batch_normalization_200[0][0]
__________________________________________________________________________________________________
max_pooling2d_45 (MaxPooling2D) (None, 128, 128, 16) 0 activation_204[0][0]
__________________________________________________________________________________________________
dropout_89 (Dropout) (None, 128, 128, 16) 0 max_pooling2d_45[0][0]
__________________________________________________________________________________________________
conv2d_213 (Conv2D) (None, 128, 128, 32) 4640 dropout_89[0][0]
__________________________________________________________________________________________________
batch_normalization_202 (BatchN (None, 128, 128, 32) 128 conv2d_213[0][0]
__________________________________________________________________________________________________
activation_206 (Activation) (None, 128, 128, 32) 0 batch_normalization_202[0][0]
__________________________________________________________________________________________________
max_pooling2d_46 (MaxPooling2D) (None, 64, 64, 32) 0 activation_206[0][0]
__________________________________________________________________________________________________
dropout_90 (Dropout) (None, 64, 64, 32) 0 max_pooling2d_46[0][0]
__________________________________________________________________________________________________
conv2d_215 (Conv2D) (None, 64, 64, 64) 18496 dropout_90[0][0]
__________________________________________________________________________________________________
batch_normalization_204 (BatchN (None, 64, 64, 64) 256 conv2d_215[0][0]
__________________________________________________________________________________________________
activation_208 (Activation) (None, 64, 64, 64) 0 batch_normalization_204[0][0]
__________________________________________________________________________________________________
max_pooling2d_47 (MaxPooling2D) (None, 32, 32, 64) 0 activation_208[0][0]
__________________________________________________________________________________________________
dropout_91 (Dropout) (None, 32, 32, 64) 0 max_pooling2d_47[0][0]
__________________________________________________________________________________________________
conv2d_217 (Conv2D) (None, 32, 32, 128) 73856 dropout_91[0][0]
__________________________________________________________________________________________________
batch_normalization_206 (BatchN (None, 32, 32, 128) 512 conv2d_217[0][0]
__________________________________________________________________________________________________
activation_210 (Activation) (None, 32, 32, 128) 0 batch_normalization_206[0][0]
__________________________________________________________________________________________________
max_pooling2d_48 (MaxPooling2D) (None, 16, 16, 128) 0 activation_210[0][0]
__________________________________________________________________________________________________
dropout_92 (Dropout) (None, 16, 16, 128) 0 max_pooling2d_48[0][0]
__________________________________________________________________________________________________
conv2d_219 (Conv2D) (None, 16, 16, 256) 295168 dropout_92[0][0]
__________________________________________________________________________________________________
batch_normalization_208 (BatchN (None, 16, 16, 256) 1024 conv2d_219[0][0]
__________________________________________________________________________________________________
activation_212 (Activation) (None, 16, 16, 256) 0 batch_normalization_208[0][0]
__________________________________________________________________________________________________
conv2d_transpose_45 (Conv2DTran (None, 32, 32, 128) 295040 activation_212[0][0]
__________________________________________________________________________________________________
concatenate_45 (Concatenate) (None, 32, 32, 256) 0 conv2d_transpose_45[0][0]
activation_210[0][0]
__________________________________________________________________________________________________
dropout_93 (Dropout) (None, 32, 32, 256) 0 concatenate_45[0][0]
__________________________________________________________________________________________________
conv2d_221 (Conv2D) (None, 32, 32, 128) 295040 dropout_93[0][0]
__________________________________________________________________________________________________
batch_normalization_210 (BatchN (None, 32, 32, 128) 512 conv2d_221[0][0]
__________________________________________________________________________________________________
activation_214 (Activation) (None, 32, 32, 128) 0 batch_normalization_210[0][0]
__________________________________________________________________________________________________
conv2d_transpose_46 (Conv2DTran (None, 64, 64, 64) 73792 activation_214[0][0]
__________________________________________________________________________________________________
concatenate_46 (Concatenate) (None, 64, 64, 128) 0 conv2d_transpose_46[0][0]
activation_208[0][0]
__________________________________________________________________________________________________
dropout_94 (Dropout) (None, 64, 64, 128) 0 concatenate_46[0][0]
__________________________________________________________________________________________________
conv2d_223 (Conv2D) (None, 64, 64, 64) 73792 dropout_94[0][0]
__________________________________________________________________________________________________
batch_normalization_212 (BatchN (None, 64, 64, 64) 256 conv2d_223[0][0]
__________________________________________________________________________________________________
activation_216 (Activation) (None, 64, 64, 64) 0 batch_normalization_212[0][0]
__________________________________________________________________________________________________
conv2d_transpose_47 (Conv2DTran (None, 128, 128, 32) 18464 activation_216[0][0]
__________________________________________________________________________________________________
concatenate_47 (Concatenate) (None, 128, 128, 64) 0 conv2d_transpose_47[0][0]
activation_206[0][0]
__________________________________________________________________________________________________
dropout_95 (Dropout) (None, 128, 128, 64) 0 concatenate_47[0][0]
__________________________________________________________________________________________________
conv2d_225 (Conv2D) (None, 128, 128, 32) 18464 dropout_95[0][0]
__________________________________________________________________________________________________
batch_normalization_214 (BatchN (None, 128, 128, 32) 128 conv2d_225[0][0]
__________________________________________________________________________________________________
activation_218 (Activation) (None, 128, 128, 32) 0 batch_normalization_214[0][0]
__________________________________________________________________________________________________
conv2d_transpose_48 (Conv2DTran (None, 256, 256, 16) 4624 activation_218[0][0]
__________________________________________________________________________________________________
concatenate_48 (Concatenate) (None, 256, 256, 32) 0 conv2d_transpose_48[0][0]
activation_204[0][0]
__________________________________________________________________________________________________
dropout_96 (Dropout) (None, 256, 256, 32) 0 concatenate_48[0][0]
__________________________________________________________________________________________________
conv2d_227 (Conv2D) (None, 256, 256, 16) 4624 dropout_96[0][0]
__________________________________________________________________________________________________
batch_normalization_216 (BatchN (None, 256, 256, 16) 64 conv2d_227[0][0]
__________________________________________________________________________________________________
activation_220 (Activation) (None, 256, 256, 16) 0 batch_normalization_216[0][0]
__________________________________________________________________________________________________
conv2d_228 (Conv2D) (None, 256, 256, 7) 119 activation_220[0][0]
__________________________________________________________________________________________________
reshape_12 (Reshape) (None, 65536, 1, 7) 0 conv2d_228[0][0]
__________________________________________________________________________________________________
activation_221 (Activation) (None, 65536, 1, 7) 0 reshape_12[0][0]
==================================================================================================
Total params: 1,179,511
Trainable params: 1,178,039
Non-trainable params: 1,472
__________________________________________________________________________________________________
Is my model right? Shouldn't the final output be (65536, 1, 1) since I am using softmax?
The code compiles, but the dice coefficient is very low.
Your model should end in (256, 256, 7).
That is 7 class scores per pixel, and that shape can be trained against your output images, which are (256, 256, 1), but only with 'sparse_categorical_crossentropy' or a custom loss.
So, up to conv_228 the model seems fine (I didn't look in detail, though).
There is no need for anything that comes after that convolution.
You can place the softmax directly in conv_228 or directly after it.
y_train should be (256, 256, 1) for this.
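A sketch of that simplified head, reusing c9, n_classes and input_img from the question's get_unet (the Reshape and the separate softmax layer are dropped):
# The softmax can live directly in the 1x1 convolution; the output
# then stays (256, 256, n_classes) and no Reshape is needed.
outputs = Conv2D(n_classes, (1, 1), activation='softmax')(c9)
model = Model(inputs=[input_img], outputs=[outputs])

# y_train of shape (batch, 256, 256, 1) holding integer class ids
# pairs with the sparse loss.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')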
Each row of your output in fact represents one pixel of your image. For each pixel you have a 1x7 output. Since it is a softmax, the values in this representation lie between 0 and 1, so the output fires on the desired class, which gives you the segmentation. If it were (65536, 1, 1) you would have a dense label representation, not a categorical one.

Expand Model layer

I have a pretained model with summary:
Layer (type) Output Shape Param #
=================================================================
vgg19 (Model) (None, 4, 4, 512) 20024384
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 8389632
_________________________________________________________________
dropout_1 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 5) 5125
=================================================================
I need a version with vgg19 expanded, not collapsed into a single layer. Something like
this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 128, 128, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 128, 128, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 128, 128, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 64, 64, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 64, 64, 128) 73856
.
.
.
** end of vgg19 **
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 8389632
_________________________________________________________________
dropout_1 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 5) 5125
=================================================================
I have tried copying layer by layer but encountered lots of problems. Is there a way to accomplish this that also copies the weights?
I don't know how you implemented yours, but you can see how I implemented it in the code below. I hope it helps.
from keras.applications.vgg19 import VGG19
from keras.models import Model
from keras.layers import *
model = VGG19(weights='imagenet', include_top=False, input_shape=(128,128,3))
flatten_1 = Flatten()(model.output)
dense_1 = Dense(1024)(flatten_1)
dropout_1 = Dropout(0.2)(dense_1)
dense_2 = Dense(1024)(dropout_1)
dense_3 = Dense(5)(dense_2)
model = Model(inputs=model.input, outputs=dense_3)
print(model.summary())
Result.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 128, 128, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 128, 128, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 128, 128, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 64, 64, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 64, 64, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 64, 64, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 32, 32, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 32, 32, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 32, 32, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 32, 32, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 32, 32, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 16, 16, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 16, 16, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 8, 8, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block5_conv4 (Conv2D) (None, 8, 8, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 4, 4, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 8192) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 8389632
_________________________________________________________________
dropout_1 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 5) 5125
=================================================================
Total params: 29,468,741
Trainable params: 29,468,741
Non-trainable params: 0
_________________________________________________________________
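If you want to carry over trained weights from the original nested model rather than the ImageNet ones, something along these lines should work; old_model and the dense layer names are assumptions about your setup:
# Hypothetical: old_model is the original model whose first layer is the
# nested vgg19 sub-model; model is the expanded one built above.
vgg = old_model.get_layer('vgg19')
for layer in vgg.layers:
    if layer.weights:
        # sub-layer names (block1_conv1, ...) match the rebuilt VGG19
        model.get_layer(layer.name).set_weights(layer.get_weights())
# The head layer names must match those in your expanded model.
for name in ['dense_1', 'dense_2', 'dense_3']:
    model.get_layer(name).set_weights(old_model.get_layer(name).get_weights())
This works because rebuilding VGG19 produces layers with the same names (block1_conv1, block1_conv2, ...) as the nested sub-model.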

Keras implementation of VGG19 net has 26 layers. How?

A VGG-19 network has 25 layers, as shown here. But if I check the number of layers in the Keras implementation, it shows 26 layers. How?
model = VGG19()
len(model.layers)
gives output
26
If you are confused, you can print out the structure of VGG19 directly with model.summary(). It shows a layer input_1 (InputLayer) as the input layer.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________
If you want to get the output of the 1st FC layer, you should use model.layers[23] instead of 22. In fact, you can print out the shapes directly and compare them with the output of model.summary().
print(model.layers[22].output.shape)
print(model.layers[23].output.shape)
print(model.layers[24].output.shape)
print(model.layers[25].output.shape)
(?, ?) # flatten (Flatten)
(?, 4096) # fc1 (Dense)
(?, 4096) # fc2 (Dense)
(?, 1000) # predictions (Dense)
In addition, you can get 1st FC layer directly by using the layer name 'fc1'.
print(model.get_layer('fc1').output.shape)
(?, 4096)
The 19 in VGG-19 refers to the layers with learnable weights. If you print the model summary you get the following:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
Here you have 7 layers that don't have any learnable weights: one InputLayer, five MaxPooling2D layers and one Flatten layer. This is how you get 26 layers (19 + 1 + 5 + 1).
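You can verify the split programmatically; a small sketch:
from keras.applications.vgg19 import VGG19

model = VGG19()
with_weights = [l.name for l in model.layers if l.weights]
without_weights = [l.name for l in model.layers if not l.weights]
print(len(with_weights))     # 19: 16 Conv2D + 3 Dense
print(len(without_weights))  # 7: InputLayer, 5 MaxPooling2D, Flatten
print(len(model.layers))     # 26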
