I am using the U-Net code from this Kaggle notebook, which I've also pasted below:
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (s)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (p1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (p2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)
c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (p3)
c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (p4)
c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (c5)
u6 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (u6)
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (c6)
u7 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (u7)
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (c7)
u8 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (u8)
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (c8)
u9 = Conv2DTranspose(8, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (u9)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (c9)
outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[mean_iou])
My question is where to properly add a kernel_regularizer (L2 regularization). I've looked at countless repos and notebooks, but I'm not able to find any source where L2 regularization was used successfully. Although I know how L2 regularization works, I have no intuition about which layers to add it to.
Hence, some intuition on where to add the kernel regularizer, and what to set the parameter to, would be helpful.
Going over the Kaggle notebook you have linked, it appears that no weight regularization is being used anywhere in the model (so the code you added is correct).
This is quite peculiar and very uncommon; in almost all cases and models, L2 weight regularization (a.k.a. ridge regression) is used in every single layer, perhaps just with different weight-decay coefficients.
I suggest adding weight regularization to all the layers, starting with a very small weight-decay coefficient:
c1 = Conv2D(8, (3, 3), activation='relu', padding='same', kernel_regularizer=regularizers.l2(w_decay)) (s)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same', kernel_regularizer=regularizers.l2(w_decay)) (c1)
p1 = MaxPooling2D((2, 2)) (c1)
...
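For completeness, regularizers has to be imported and w_decay defined somewhere. Below is a minimal sketch of how that could look; the starting value is an assumption to tune, not something taken from the notebook, and the small helper just avoids repeating the argument on every layer:
from tensorflow.keras import regularizers

w_decay = 1e-5  # assumed starting value; increase it (1e-4, 1e-3, ...) if the model still overfits

def conv(filters, x):
    # Conv2D with the same settings as the notebook, plus L2 weight regularization
    return Conv2D(filters, (3, 3), activation='relu', padding='same',
                  kernel_regularizer=regularizers.l2(w_decay))(x)

c1 = conv(8, s)
c1 = conv(8, c1)
p1 = MaxPooling2D((2, 2))(c1)
# ...apply the same pattern to every Conv2D (and Conv2DTranspose) block below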
I am trying to write a VGG19 neural network for single-channel images, where everything is essentially the same as in a three-channel network except for the input layer.
def model(self, inputShape=(64, 64, 1)):
    inputLayer = Input(shape=inputShape)
After applying the Flatten layer to the convolutional tensor, I use the same dense-layer parameters as in the classic VGG19, but I get an error when compiling the model:
ValueError: Shapes (None, 64, 64, 1) and (None, 1000) are incompatible
As far as I understand, the number of neurons in the dense layer should correspond to the dimensionality of the input data. That is, for a 64x64 image, after applying the Flatten layer the dense layer should receive a vector with 4096 neurons, as described in the classical model:
layerSet = Flatten()(layerSet)
layerSet = Dense(4096, activation='relu')(layerSet)
layerSet = Dropout(0.5)(layerSet)
layerSet = Dense(4096, activation='relu')(layerSet)
layerSet = Dropout(0.5)(layerSet)
outputLayer = Dense(1000, activation='relu')(layerSet)
The last dense layer gets 1000 neurons, each corresponding to some recognizable class.
In my case, I need a set of features for SRGAN, so I doubt that my problem needs the classification vector at all. The features derived from the VGG19 model, together with the features derived from the discriminator model, should be passed to the output layer of the generative adversarial model.
Below is the full code example with the model itself and the training method. I expect to eventually get the required features from the model:
class VGG19DeepConvolutionNetwork:
    __model = None

    def __init__(self):
        self.model()

    def model(self, inputShape=(64, 64, 1)):
        inputLayer = Input(shape=inputShape)

        layerSet = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputLayer)
        layerSet = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(layerSet)
        layerSet = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Flatten()(layerSet)
        layerSet = Dense(4096, activation='relu')(layerSet)
        layerSet = Dropout(0.5)(layerSet)
        layerSet = Dense(4096, activation='relu')(layerSet)
        layerSet = Dropout(0.5)(layerSet)
        outputLayer = Dense(1000, activation='relu')(layerSet)

        self.__model = Model(inputs=[inputLayer], outputs=[outputLayer])
        self.__model.compile(optimizer='adam', loss='categorical_crossentropy')
        print(self.__model.summary())

    def train(self, imageDataPath:string='srgangImageData.h5', weightsPath:string='vgg19Weights.h5', sliceSize=32, epochsNumber=100):
        if self.__model is None:
            self.model((sliceSize, sliceSize, 1))

        imageData = ImageDataProcessing()
        sourceTrain, targetTrain, sourceTest, targetTest = imageData.readImageData(imageDataPath)
        del imageData

        print( 'train source', sourceTrain.shape )
        print( 'train target', targetTrain.shape )
        print( 'test source', sourceTest.shape )
        print( 'test target', targetTest.shape )

        checkpoint = ModelCheckpoint(weightsPath, verbose=1, save_best_only=True, save_weights_only=False, mode='min')
        callbacks_list = [checkpoint]
        history = self.__model.fit(sourceTrain, targetTrain, batch_size=128, steps_per_epoch=len(sourceTrain)//128, validation_data=(sourceTest, targetTest),
                                   callbacks=callbacks_list, shuffle=True, epochs=epochsNumber, verbose=1)
Some corrections:
The Flatten layer should output 2 x 2 x 512 = 2048 values, as that is the shape of the last convolutional layer's output for a 64x64 input (five stride-2 poolings reduce 64 down to 2). TensorFlow/Keras infers that for you.
The reason the last layer has 1000 neurons is that the original model was trained on a dataset with 1000 classes (1 neuron per class).
What version of TensorFlow are you using? Are you sure it is failing at the compile step? I compiled your model with TensorFlow 2.10.0 (Python 3.10.4) and everything worked fine. I also did a forward pass with an input of shape (10, 64, 64, 1), and that worked fine too.
Here is the code I tried both locally and in Google Colab:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint  # needed by train()
from tensorflow.keras import Model
import tensorflow as tf

class VGG19DeepConvolutionNetwork:
    __model = None

    def __init__(self):
        self.model()

    def model(self, inputShape=(64, 64, 1)):
        inputLayer = Input(shape=inputShape)

        layerSet = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputLayer)
        layerSet = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(layerSet)
        layerSet = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(layerSet)
        layerSet = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(layerSet)
        layerSet = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv4')(layerSet)
        layerSet = MaxPooling2D(strides=(2,2), padding='same')(layerSet)

        layerSet = Flatten()(layerSet)
        layerSet = Dense(4096, activation='relu')(layerSet)
        layerSet = Dropout(0.5)(layerSet)
        layerSet = Dense(4096, activation='relu')(layerSet)
        layerSet = Dropout(0.5)(layerSet)
        outputLayer = Dense(1000, activation='relu')(layerSet)

        self.__model = Model(inputs=[inputLayer], outputs=[outputLayer])
        self.__model.compile(optimizer='adam', loss='categorical_crossentropy')
        print(self.__model.summary())

    def getModel(self):
        return self.__model

    def train(self, imageDataPath: str='srgangImageData.h5', weightsPath: str='vgg19Weights.h5', sliceSize=32, epochsNumber=100):
        if self.__model is None:
            self.model((sliceSize, sliceSize, 1))

        imageData = ImageDataProcessing()
        sourceTrain, targetTrain, sourceTest, targetTest = imageData.readImageData(imageDataPath)
        del imageData

        print( 'train source', sourceTrain.shape )
        print( 'train target', targetTrain.shape )
        print( 'test source', sourceTest.shape )
        print( 'test target', targetTest.shape )

        checkpoint = ModelCheckpoint(weightsPath, verbose=1, save_best_only=True, save_weights_only=False, mode='min')
        callbacks_list = [checkpoint]
        history = self.__model.fit(sourceTrain, targetTrain, batch_size=128, steps_per_epoch=len(sourceTrain)//128, validation_data=(sourceTest, targetTest),
                                   callbacks=callbacks_list, shuffle=True, epochs=epochsNumber, verbose=1)

modelWrapper = VGG19DeepConvolutionNetwork()
model = modelWrapper.getModel()
X = tf.random.uniform((10,64,64,1))
output = model(X)
print(output)
# modelWrapper.train()
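As for getting features for SRGAN rather than the 1000-way classification output: one common pattern (a sketch, not something in the code above) is to wrap an intermediate layer of the trained network in a new Model and use that as the feature extractor:
# Hypothetical feature extractor: take the activations of the last conv block
# instead of the classification head.
feature_extractor = Model(inputs=model.input,
                          outputs=model.get_layer('block5_conv4').output)

features = feature_extractor(X)   # X is the (10, 64, 64, 1) batch from above
print(features.shape)             # (10, 4, 4, 512) for 64x64 inputs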
I want to set up a Keras Sequential deep U-Net, but I don't know how to concatenate specific layers.
from keras import models
from keras import layers
from keras.layers.convolutional import Conv2D, Conv2DTranspose
model = models.Sequential()
model.add(layers.Conv2D(8,(3,3), activation="relu", padding='same', input_shape=(512, 512, 4)))
model.add(layers.Conv2D(8,(3,3), activation="relu", padding='same'))
model.add(layers.MaxPooling2D(2,2))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(16,(3,3), activation="relu", padding='same'))
model.add(layers.Conv2D(16,(3,3), activation="relu", padding='same'))
model.add(layers.MaxPooling2D(2,2))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(32,(3,3), activation="relu", padding='same'))
model.add(layers.Conv2D(32,(3,3), activation="relu", padding='same'))
model.add(layers.MaxPooling2D(2,2))
model.add(layers.Dropout(0.2))
model.add(Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same'))
model.add(layers.Conv2D(16,(3,3), activation="relu", padding='same'))
model.add(layers.Conv2D(16,(3,3), activation="relu", padding='same'))
model.add(Conv2DTranspose(8, (2, 2), strides=(2, 2), padding='same'))
model.add(layers.Conv2D(8,(3,3), activation="relu", padding='same'))
model.add(layers.Conv2D(classes,(3,3), activation="relu", padding='same'))
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
model.summary()
In a non-Sequential model it would be something like this:
u6 = Conv2DTranspose(n_filters * 8, (3, 3), strides = (2, 2), padding = 'same')(c5)
u6 = concatenate([u6, c4])
u6 = Dropout(dropout)(u6)
c6 = conv2d_block(u6, n_filters * 8, kernel_size = 3, batchnorm = batchnorm)
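For reference, here is roughly what a full functional-API version would look like (a shortened sketch mirroring the layer sizes of my Sequential model above; the number of output classes is a placeholder):
from keras.models import Model
from keras.layers import Input, Conv2D, Conv2DTranspose, MaxPooling2D, Dropout, concatenate

inputs = Input((512, 512, 4))

# Encoder: keep references (c1, c2, ...) so they can be concatenated later
c1 = Conv2D(8, (3, 3), activation='relu', padding='same')(inputs)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same')(c1)
p1 = Dropout(0.2)(MaxPooling2D((2, 2))(c1))

c2 = Conv2D(16, (3, 3), activation='relu', padding='same')(p1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same')(c2)
p2 = Dropout(0.2)(MaxPooling2D((2, 2))(c2))

c3 = Conv2D(32, (3, 3), activation='relu', padding='same')(p2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same')(c3)

# Decoder: upsample and concatenate with the matching encoder tensor
u4 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(c3)
u4 = concatenate([u4, c2])
c4 = Conv2D(16, (3, 3), activation='relu', padding='same')(u4)

u5 = Conv2DTranspose(8, (2, 2), strides=(2, 2), padding='same')(c4)
u5 = concatenate([u5, c1])
c5 = Conv2D(8, (3, 3), activation='relu', padding='same')(u5)

outputs = Conv2D(3, (1, 1), activation='softmax')(c5)  # 3 classes as a placeholder

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()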
I am running a U-Net as defined below:
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (s)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (p1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (p2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)
c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (p3)
c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (p4)
c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (c5)
u6 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (u6)
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (c6)
u7 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (u7)
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (c7)
u8 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (u8)
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (c8)
u9 = Conv2DTranspose(8, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (u9)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (c9)
outputs = Conv2D(10, (1, 1), activation='sigmoid') (c9)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='Adamax', loss = dice, metrics = [mIoU])
Notice that I'm doing multi-class prediction with ten classes. The inputs are 256x256x3 (RGB) images and the ground truths are binary masks of size 256x256x10, since depth = num_classes = 10. My question is: I accidentally forgot to change the activation function from sigmoid to softmax and ran the network, and the network still ran. How is this possible? Is it because it's treating each binary mask independently?
More intriguingly, the network actually yielded better results with sigmoid than when I ran it with softmax.
Q1: Why is my network still trainable with a *wrong* loss function?
A1: Because your network is optimized by gradient descent, which does not care which loss function is used as long as it is differentiable. This fact reveals the difficulty of debugging a network when it doesn't work: the problem is not a code bug (e.g. a memory leak or numerical overflow) but a bug that is not scientifically sound (e.g. your regression target is in the range (0, 100), but you use sigmoid as the activation of the last dense layer).
Q2: How come `sigmoid` gives better performance than `softmax`?
A2: First, using a sigmoid output means training 10 binary classifiers, one for each class (i.e. the classic one-vs-all or one-vs-rest setting), so it is also technically sound.
The only difference between sigmoid and softmax is that the sum of the class-wise predicted probabilities is always 1 for the softmax network, while it is not necessarily 1 for the sigmoid network. In other words, you might have trouble deciding on a single label at test time with the sigmoid network.
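A quick numeric illustration (plain NumPy, the scores are just an example) of why the sums differ:
import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # hypothetical per-class scores

sigmoid = 1 / (1 + np.exp(-logits))              # each class scored independently
softmax = np.exp(logits) / np.exp(logits).sum()  # classes normalized jointly

print(sigmoid, sigmoid.sum())   # [0.881 0.731 0.525], sum ~ 2.14 (not 1)
print(softmax, softmax.sum())   # [0.659 0.242 0.099], sum = 1.0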
As for why sigmoid gives better results than softmax here, it depends on many factors and is hard to analyze without careful study. One possible explanation is that sigmoid treats the rows of the weight matrix of the last layer independently, while softmax couples them, so sigmoid may better handle samples with contradicting gradient directions. Another thought is to try the recently proposed heated-up softmax.
Finally, if you believe the sigmoid version gives better performance but you still want a softmax network, you may reuse all the layers of the sigmoid network up to its final classification layer and fine-tune a new softmax output layer, or use both losses as in a multi-task problem.
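One possible way to do that in Keras (a sketch; sigmoid_model stands for the trained U-Net above) is to reuse everything up to the last 1x1 convolution and attach a fresh softmax head:
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Model

features = sigmoid_model.layers[-2].output          # output of the last c9 conv layer
new_head = Conv2D(10, (1, 1), activation='softmax', name='softmax_head')(features)
softmax_model = Model(sigmoid_model.input, new_head)

# Optionally freeze the reused layers and fine-tune only the new head
for layer in softmax_model.layers[:-1]:
    layer.trainable = False

softmax_model.compile(optimizer='adam', loss='categorical_crossentropy')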
I am trying to apply batch normalization to a U-Net and I have the following architecture:
inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
s = Lambda(lambda x: x / 255) (inputs)
width = 32
activation = 'sigmoid'
c1 = Conv2D(width, (3, 3), activation='elu', padding='same') (s)
c1 = Conv2D(width, (3, 3), activation='elu', padding='same') (c1)
c1 = BatchNormalization()(c1)
p1 = MaxPooling2D((2, 2)) (c1)
#p1 = Dropout(0.2)(p1)
c2 = Conv2D(width*2, (3, 3), activation='elu', padding='same') (p1)
c2 = Conv2D(width*2, (3, 3), activation='elu', padding='same') (c2)
c2 = BatchNormalization()(c2)
p2 = MaxPooling2D((2, 2)) (c2)
#p2 = Dropout(0.2)(p2)
c3 = Conv2D(width*4, (3, 3), activation='elu', padding='same') (p2)
c3 = Conv2D(width*4, (3, 3), activation='elu', padding='same') (c3)
c3 = BatchNormalization()(c3)
p3 = MaxPooling2D((2, 2)) (c3)
#p3 = Dropout(0.2)(p3)
c4 = Conv2D(width*8, (3, 3), activation='elu', padding='same') (p3)
c4 = Conv2D(width*8, (3, 3), activation='elu', padding='same') (c4)
c4 = BatchNormalization()(c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)
#p4 = Dropout(0.2)(p4)
c5 = Conv2D(width*16, (3, 3), activation='elu', padding='same') (p4)
c5 = Conv2D(width*16, (3, 3), activation='elu', padding='same') (c5)
u6 = Conv2DTranspose(width*8, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
#u6 = Dropout(0.2)(u6)
c6 = Conv2D(width*8, (3, 3), activation='elu', padding='same') (u6)
c6 = Conv2D(width*8, (3, 3), activation='elu', padding='same') (c6)
u7 = Conv2DTranspose(width*4, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
#u7 = Dropout(0.2)(u7)
c7 = Conv2D(width*4, (3, 3), activation='elu', padding='same') (u7)
c7 = Conv2D(width*4, (3, 3), activation='elu', padding='same') (c7)
u8 = Conv2DTranspose(width*2, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
#u8 = Dropout(0.2)(u8)
c8 = Conv2D(width*2, (3, 3), activation='elu', padding='same') (u8)
c8 = Conv2D(width*2, (3, 3), activation='elu', padding='same') (c8)
u9 = Conv2DTranspose(width, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
#u9 = Dropout(0.2)(u9)
c9 = Conv2D(width, (3, 3), activation='elu', padding='same') (u9)
c9 = Conv2D(width, (3, 3), activation='elu', padding='same') (c9)
outputs = Conv2D(num_classes, (1, 1), activation=activation) (c9)
model = Model(inputs=[inputs], outputs=[outputs])
What happens is that the training loss very quickly reaches a plateau (within 2 epochs), while the validation loss remains NaN the whole time. I looked at other posts; some say it's because the dimension ordering is wrong, but if that were true I shouldn't be getting a training loss either. Others say the values are vanishing because of the learning rate, but that explanation is also ruled out by the fact that I am getting a training loss. What am I doing wrong?
If num_classes > 1, your activation should be "softmax" and not "sigmoid", and then it'll probably work.
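Concretely, with the variable names from the code above, that just means deriving the activation from num_classes before building the output layer (a small sketch):
activation = 'softmax' if num_classes > 1 else 'sigmoid'
outputs = Conv2D(num_classes, (1, 1), activation=activation)(c9)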
I wasn't passing in any validation data to the fit method! I needed to do something like this: model.fit(X_train, Y_train, validation_split=0.1, batch_size=8, epochs=30)
I have a situation where the input is an image plus a group of (3) numeric fields, and the output is an image mask. I am not sure how to do that in Keras.
My architecture is somewhat like the attachment. I am aware of CNN and Dense architectures; I'm just not sure how to pass the inputs into the corresponding networks and do the concat operation. Also, a suggestion of a better architecture for this would be great!
Please advise, preferably with example code.
Thanks in advance, Utpal.
I can advise trying a U-Net model for this problem. A usual U-Net consists of several conv and max-pooling layers, followed by several conv and upsampling layers:
In the current problem you can mix in the non-spatial data (the image annotation) in the middle:
Also, it may be a good idea to start with a pre-trained VGG-16 (see vgg.load_weights(VGG_Weights_path) below).
See code below (based on Divam Gupta's repo):
from keras.models import *
from keras.layers import *

IMAGE_ORDERING = 'channels_first'  # this snippet assumes Theano-style channels-first tensors

def VGGUnet(n_classes, input_height=416, input_width=608, data_length=128, vgg_level=3):
    assert input_height % 32 == 0
    assert input_width % 32 == 0

    # https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_th_dim_ordering_th_kernels.h5
    img_input = Input(shape=(3, input_height, input_width))
    data_input = Input(shape=(data_length,))

    # Block 1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', data_format=IMAGE_ORDERING)(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', data_format=IMAGE_ORDERING)(x)
    f1 = x

    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool', data_format=IMAGE_ORDERING)(x)
    f2 = x

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool', data_format=IMAGE_ORDERING)(x)
    f3 = x

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool', data_format=IMAGE_ORDERING)(x)
    f4 = x

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)
    f5 = x

    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)

    vgg = Model(img_input, x)
    vgg.load_weights(VGG_Weights_path)

    levels = [f1, f2, f3, f4, f5]

    # Several dense layers for image annotation processing
    data_layer = Dense(1024, activation='relu', name='data1')(data_input)
    data_layer = Dense(input_height * input_width // 256, activation='relu', name='data2')(data_layer)
    data_layer = Reshape((1, input_height // 16, input_width // 16))(data_layer)

    # Mix image annotations here
    o = (concatenate([f4, data_layer], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f3], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f2], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f1], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(o)

    # o depends on both inputs, so the shape probe must be built with both of them
    o_shape = Model([img_input, data_input], o).output_shape
    output_height = o_shape[2]
    output_width = o_shape[3]

    o = (Reshape((n_classes, output_height * output_width)))(o)
    o = (Permute((2, 1)))(o)
    o = (Activation('softmax'))(o)

    model = Model([img_input, data_input], o)
    model.outputWidth = output_width
    model.outputHeight = output_height
    return model
To train and evaluate a Keras model with several inputs, prepare a separate array for each input layer (image_train and annotation_train, aligned along the first axis, i.e. the sample index) and call this:
model.fit([image_train, annotation_train], result_segmentation_train, batch_size=..., epochs=...)
test_loss, test_acc = model.evaluate([image_test, annotation_test], result_segmentation_test)
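For example, with N training samples the arrays line up like this (placeholder arrays with hypothetical shapes, assuming the default input sizes above and that VGG_Weights_path points to the downloaded VGG-16 weights):
import numpy as np

N = 100                                                     # hypothetical number of samples
n_classes = 5                                               # hypothetical number of classes
model = VGGUnet(n_classes)

image_train = np.zeros((N, 3, 416, 608), dtype='float32')   # matches img_input (channels first)
annotation_train = np.zeros((N, 128), dtype='float32')      # matches data_input (data_length=128)
# target masks flattened to (pixels, classes), matching the model's softmax output
result_segmentation_train = np.zeros((N, model.outputHeight * model.outputWidth, n_classes), dtype='float32')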
Good luck!