How cnn layers connect in keras? - keras

Is there anyone who knows how cnn layers connect in keras?
## layer #1
inputs = Input((img_rows, img_cols, 5, 1), name='inputs') # shape (?, 192,192,5,1)
conv1_1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(inputs) # shape (?,192,192,5,32)
conv1_1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv1_1)
pool1_1 = MaxPooling3D(pool_size=(2, 2, 1))(conv1_1)
drop1_1 = Dropout(0.2)(pool1_1)
## layer #2
conv2_1 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(drop1_1) # shape(?,96,96,5,64)
conv2_1 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv2_1)
pool2_1 = MaxPooling3D(pool_size=(2, 2, 1))(conv2_1)
drop2_1 = Dropout(0.2)(pool2_1)
## layer #3
conv3_1 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(drop2_1) # shape(?, 48,48,5,128)
conv3_1 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv3_1)
pool3_1 = MaxPooling3D(pool_size=(2, 2, 1))(conv3_1)
drop3_1 = Dropout(0.2)(pool3_1)
For example, the shape of conv1_1 layer is (?,192,192,5,32) and the shape of conv2_1 layer is (?,96,96,5,64). In this case, 32 and 64 indicate the number of filters(or output channels) in each cnn layer. At this point, how can I estimate the number of features or number of nodes from layer #1 to layer #2?

how can I estimate the number of features or number of nodes from
layer #1 to layer #2?
If you mean number of weights, if you define your layer as l (l = Conv3D(...) etc) you can access kernel and biase weights, and get its shape. For the number of features you can do this with layer output (you'd need to define separate variables for that, and not just use one name as you did).

Related

Autoencoder of CNN - decrease or increase filters?

In an Autoencoder based on CNN, will you increase or decrease the number of filters between layers ? As we compress the information, I was thinking of decreasing.
Example here of the encoder part where the number of filters is decreased at each new layer, from 16 to 8 to 4.
x = Conv2D(filters = 16, kernel_size = 3, activation='relu', padding='same', name='encoder_1a')(inputs)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_1b')(x)
x = Conv2D(filters = 8, kernel_size = 3, activation='relu', padding='same', name='encoder_2a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_2b')(x)
x = Conv2D(filters = 4, kernel_size = 3, activation='relu', padding='same', name='encoder_3a')(x)
x = MaxPooling2D(pool_size = (2, 2), padding='same', name='encoder_3b')(x)
It is not always the case that the filter sizes are reduced or increased with increasing number of layers in encoder. In most examples of encoder I have seen of convolutional autoencoder architectures the height and width is decreased through strided convolution or pooling, and depth of layer is increased (filter sizes are increased), kept similar to last one or varied with each new layer in encoder. But there is also examples where the output channels or filter sizes are decreased with more layers.
Usually autoencoder encodes input into latent representation/vector or embedding that has lower dimension than input that minimizes reconstruction error. So both of the above can be used for creating undercomplete autoencoder by varying kernel size, number of layers, adding an extra layer at the end of encoder with a certain dimension etc.
Filter increase example
In the image below as more layers are added in encoder the filter sizes increase. But as the input 28*28*1 = 784 dimension features and the flattened representation 3*3*128 = 1152 is more so another layer is added before final layer which is the embedding layer. It reduces the feature dimension with predefined number of outputs in fully connected network. Even the last dense/fully connected layer can be replaced by varying the number of layers or kernel size to have an output (1, 1, NUM_FILTERS).
Filter decrease example
An easy example of filters decreasing in encoder as the number of layers increase can be found on keras convolutional autoencoder example just as your code.
import keras
from keras import layers
input_img = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
References
https://www.deeplearningbook.org/contents/autoencoders.html
https://xifengguo.github.io/papers/ICONIP17-DCEC.pdf
https://blog.keras.io/building-autoencoders-in-keras.html

Adding CTC Loss and CTC decode to a Keras model

I am trying to solve a use case of handwritten text recognition. I have used CNN and LSTM to create a network. The output of this needs to be fed to a CTC layer. I could find some codes to do this in native tensorflow. Is there an easier option for this in Keras.
model = Sequential()
model.add(Conv2D(64, kernel_size=(5,5),activation = 'relu', input_shape=(128,32,1), padding='same', data_format='channels_last'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(128, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Conv2D(256, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,2),padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(1,1)))
model.add(Conv2D(512, kernel_size=(5,5),activation = 'relu', padding='same'))
model.add(Lambda(lambda x: x[:, :, 0, :], output_shape=(None,31,512), mask=None, arguments=None))
#model.add(Bidirectional(LSTM(256, return_sequences=True), input_shape=(31, 256)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Bidirectional(LSTM(128, return_sequences=True)))
model.add(Dense(75, activation = 'softmax'))
Any help on how we can easily add CTC Loss and Decode layers to this would be great
A CTC loss function requires four arguments to compute the loss, predicted outputs, ground truth labels, input sequence length to LSTM and ground truth label length. To get this we need to create a custom loss function and then pass it to the model. To make it compatible with your defined model, we need to create a model which takes these four inputs and outputs the loss. This model will be used for training and for testing, the model that you have created earlier can be used.
Let's create a keras model that you used in a different way so that we can create two different versions of the model to be used at training and testing time.
# input with shape of height=32 and width=128
inputs = Input(shape=(32, 128, 1))
# convolution layer with kernel size (3,3)
conv_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
# poolig layer with kernel size (2,2)
pool_1 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_1)
conv_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPool2D(pool_size=(2, 2), strides=2)(conv_2)
conv_3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_2)
conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_3)
# poolig layer with kernel size (2,1)
pool_4 = MaxPool2D(pool_size=(2, 1))(conv_4)
conv_5 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4)
# Batch normalization layer
batch_norm_5 = BatchNormalization()(conv_5)
conv_6 = Conv2D(512, (3, 3), activation='relu', padding='same')(batch_norm_5)
batch_norm_6 = BatchNormalization()(conv_6)
pool_6 = MaxPool2D(pool_size=(2, 1))(batch_norm_6)
conv_7 = Conv2D(512, (2, 2), activation='relu')(pool_6)
squeezed = Lambda(lambda x: K.squeeze(x, 1))(conv_7)
# bidirectional LSTM layers with units=128
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(squeezed)
blstm_2 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.2))(blstm_1)
outputs = Dense(len(char_list) + 1, activation='softmax')(blstm_2)
# model to be used at test time
test_model = Model(inputs, outputs)
We will use ctc_loss_fuction during training. So, lets implement the ctc_loss_function and create a training model using ctc_loss_function:
labels = Input(name='the_labels', shape=[max_label_len], dtype='float32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
def ctc_lambda_func(args):
y_pred, labels, input_length, label_length = args
return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([outputs, labels,
input_length, label_length])
#model to be used at training time
training_model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
--> Train this model and save the weights in .h5 file
Now use the test model and load saved weights of the training model by using arguments by_name=True so it will load weights for only matching layers.

How to copy weights from a 2D convnet in to a 3D Convnet on Keras?

I'm trying to implement a 3D convnet followed by LSTM layer for sequence generation using 3D images as input , on Keras with Tensorflow backend.
I would like to start training with weights of an existing pre-trained model in order to avoid common issues with random initialization .
In order to start with a basic example, I took VGG-16 and I implemented a "3D" version of this network (without the FC layers):
img_input = Input((100,80,80,3))
x = Conv3D(64, (3, 3 ,3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv3D(64, (3, 3 ,3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling3D((1, 2, 2), strides=(1, 2, 2), name='block1_pool')(x)
x = Conv3D(128, (3, 3 ,3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv3D(128, (3, 3 ,3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1,2, 2), name='block2_pool')(x)
x = Conv3D(256, (3, 3 ,3), activation='relu', padding='same', name='block3_conv1')(x)
x = Conv3D(256, (3, 3 , 3), activation='relu', padding='same', name='block3_conv2')(x)
x = Conv3D(256, (3, 3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1,2, 2), name='block3_pool')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv1')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv2')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block4_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1, 2, 2), name='block4_pool')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv1')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv2')(x)
x = Conv3D(512, (3, 3 ,3), activation='relu', padding='same', name='block5_conv3')(x)
x = MaxPooling3D((1, 2 ,2), strides=(1, 2, 2), name='block5_pool')(x)
So I would like to know how can I load the weights of the pretrained VGG-16 into each one of the 100 slices (my 3D images are composed by 100 80x80 rgb slices) ,
Any advise you can give to me would be useful,
Thanks
This depends on what you are looking to do in your application.
If you are just looking to process the 3D image in terms of slices, then you can define a TimeDistributed VGG16 network (Conv2D instead of Conv3D) would be the way to go.
The model then becomes something like this for each layer you define above:
img_input = Input((100,80,80,3))
x = TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', trainable=False))(img_input)
x = TimeDistributed(Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', trainable=False))(x)
x = TimeDistributed((MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', trainable=False)(x)
...
...
Note that I have included the option 'trainable=False' over here. This is pretty useful if you only want to train the deeper layers and freeze the lower layers with the well trained wights of VGG.
To load the VGG weights for the model, you can then use the load_weights function of Keras.
model.load_weights(filepath, by_name=True)
If you set the layer names which you do not want to train to be the same as what is defined in the VGG16, then you can simply load those layers by name over here.
However, spatio-temporal feature learning is something that can be potentially done better much by using 3D ConvNets.
If this is the basis of your application, then you cannot directly import VGG16 weights in to a Conv3D model, because the number of parameters in each layer now increases because the filter went from being a 3*3 to a 3*3*3 for example.
You could still load the weights layer by layer to the model by considering which patch of 3*3 from the 3*3*3 would be most suitable for initialization with VGG16 weights. set_weights() function takes as input a list of numpy arrays (for the kernel weights and the bias respectively). You can extract each layer weight from VGG16 and then construct a new numpy array for an equivalent Conv3D weight matrix and feed it to your Conv3D model.
But I would encourage you to look at existing literature and models for processing 3D images to see if those can give you the better initialization using transfer learning.
For example, C3D is one such popular model. ShapeNet and Pascal3D are popular 3D datasets.
This discussion on how to process video data might also be useful to give you better insights on how to proceed.

Neural Networks: Why can't I go deeper?

I am using this model to get depth maps from images:
def get_model(learning_rate=0.001, channels=2):
h = 128 # height of the image
w = 128 # width of the image
c = channels # no of channels
encoding_size = 512
# encoder
image = Input(shape=(c, h, w))
conv_1_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(image)
conv_1_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv_1_1)
pool_1_2 = MaxPooling2D((2, 2))(conv_1_2)
conv_2_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool_1_2)
conv_2_2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv_2_1)
pool_2_2 = MaxPooling2D((2, 2))(conv_2_2)
conv_3_1 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_2_2)
conv_3_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv_3_1)
# pool_3_2 = MaxPooling2D((2, 2))(conv_3_2)
# conv_4_1 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_3_2)
# conv_4_2 = Conv2D(256, (3, 3), activation='relu', padding='same')(conv_4_1)
# pool_4_3 = MaxPooling2D((2, 2))(conv_4_2)
# conv_5_1 = Conv2D(512, (3, 3), activation='relu', padding='same')(pool_4_3)
# conv_5_2 = Conv2D(512, (3, 3), activation='relu', padding='same')(conv_5_1)
flat_5_2 = Flatten()(conv_3_2)
encoding = Dense(encoding_size, activation='tanh')(flat_5_2)
# decoder
reshaped_6_1 = Reshape((8, 8, 8))(encoding)
conv_6_1 = Conv2D(128, (3, 3), activation='relu', padding='same')(reshaped_6_1)
conv_6_2 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv_6_1)
upsample_6_2 = UpSampling2D((2, 2))(conv_6_2)
conv_7_1 = Conv2D(64, (3, 3), activation='relu', padding='same')(upsample_6_2)
conv_7_2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv_7_1)
upsample_7_2 = UpSampling2D((2, 2))(conv_7_2)
conv_8_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(upsample_7_2)
conv_8_2 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv_8_1)
upsample_8_2 = UpSampling2D((2, 2))(conv_8_2)
conv_9_1 = Conv2D(16, (3, 3), activation='relu', padding='same')(upsample_8_2)
conv_9_2 = Conv2D(16, (3, 3), activation='relu', padding='same')(conv_9_1)
upsample_9_2 = UpSampling2D((2, 2))(conv_9_2)
conv_10_1 = Conv2D(8, (3, 3), activation='relu', padding='same')(upsample_9_2)
conv_10_2 = Conv2D(1, (3, 3), activation='relu', padding='same')(conv_10_1)
output = Conv2D(1, (1, 1), activation=relu_normalized, padding='same')(conv_10_2)
model = Model(inputs=image, outputs=output)
model.compile(loss='mae', optimizer=Adam(learning_rate))
return model
Input: 2x128x128 (two bw images) - Squished to [0,1] (preprocessing normalization)
Output: 1x128x128 (depth map) - Squished to [0,1] by relu-normalized
NOTE: relu_normalized is just relu followed by squishing values to 0-1 so as to have a proper image. Sigmoid doesn't seem to fit this criteria.
When I add any more layers, the loss becomes a constant and backprop is not happening properly because both the output and gradients are becoming zero (and hence changing the learning rate didn't change anything in the network)
So if I want to go deeper to generalize more, by uncommenting the lines (and of course connecting conv_5_2 to flat_5_2), what is it that I am missing?
My thoughts:
Using Sigmoid would lead to vanishing gradient problem, but I am using relu's, would that problem still exist?
Changing anything in the network, like encoding size, even changing to activations to elu or selu doesn't show any progress.
Why are my outputs getting closer to zero when I try to add even one more conv layer followed by max_pooling?
UPDATE:
Here's relu_normalized,
def relu_normalized(x):
epsilon = 1e-6
relu_x = relu(x)
relu_scaled_x = relu_x / (K.max(relu_x) + epsilon)
return relu_scaled_x
and later after getting the output which has range [0,1], we simple do output_image = 255 * output and we can save this as b/w image now.
If you want go deeper you have to add some batch normalization layer (in Keras https://keras.io/layers/normalization/#batchnormalization) in this case.
From Ian Goodfellow's book, on the batch normalization chapter:
Very deep models involve the composition of several functions or layers. The
gradient tells how to update each parameter, under the assumption that the other
layers do not change. In practice, we update all of the layers simultaneously.
When we make the update, unexpected results can happen because many functions
composed together are changed simultaneously, using updates that were computed
under the assumption that the other functions remain constant
Also, tanh is easily saturated so use only if you need it :)
There is a problem that might happen with "relu" when you have learning rates too big.
There is a high chance of all activations going to 0 and getting stuck there never to change anymore. (When they're at 0, their gradient is also 0).
Since I'm not an expert on adjusting the parameters in detail for using "relu", and my results with "relu" are always bad, I prefer using "sigmoid" or "tanh". (It's worth trying, although there might be some vanishing there...). I keep my images ranged from 0 to 1 and use "binary_crossentropy" as loss, which is a lot faster than "mae/mse" in this case.
Another thing that happened to me was an "apparently" frozen loss function. It happened that the value was changing so little, that the displayed decimals weren't enough to see the variation, but after a lot of epochs, it found a reasonable way to go down properly. (Probably some kind of saturation indeed, but for me it's still better than getting freezes or nans)
You can introduce recurrent layers like LSTM which would "trap" the errors using gating, potentially improving the situation.

Where is Keras 2 's channels?

I once used keras 1 (maybe 1.0.5) for multi-category classification. And my input in CNN is (n, 1, 24, 113) and 113 is channel numbers, and kernel size is (1, 5).
code like:
X_train = X_train.reshape((-1, 1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
X_test = X_test.reshape((-1, 1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
# network
inputs = Input(shape=(1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
conv1 = ELU()(Convolution2D(NUM_FILTERS, FILTER_SIZE, 1, border_mode='valid', init='normal', activation='relu')(inputs))
conv2 = ELU()(Convolution2D(NUM_FILTERS, FILTER_SIZE, 1, border_mode='valid', init='normal', activation='relu')(conv1))
conv3 = ELU()(Convolution2D(NUM_FILTERS, FILTER_SIZE, 1, border_mode='valid', init='normal', activation='relu')(conv2))
conv4 = ELU()(Convolution2D(NUM_FILTERS, FILTER_SIZE, 1, border_mode='valid', init='normal', activation='relu')(conv3))
reshape1 = Reshape((8, NUM_FILTERS * NUM_SENSOR_CHANNELS))(conv4)
gru1 = GRU(NUM_UNITS_LSTM, return_sequences=True, consume_less='mem')(reshape1)
gru2 = GRU(NUM_UNITS_LSTM, return_sequences=False, consume_less='mem')(gru1)
outputs = Dense(NUM_CLASSES, activation='softmax')(gru2)
# Hardcoded number of sensor channels employed in the OPPORTUNITY challenge
NUM_SENSOR_CHANNELS = 113
# Hardcoded number of classes in the gesture recognition problem
NUM_CLASSES = 18
# Hardcoded length of the sliding window mechanism employed to segment the data
SLIDING_WINDOW_LENGTH = 24
# Length of the input sequence after convolutional operations
FINAL_SEQUENCE_LENGTH = 8
# Hardcoded step of the sliding window mechanism employed to segment the data
SLIDING_WINDOW_STEP = 12
# Batch Size
BATCH_SIZE = 100
# Number filters convolutional layers
NUM_FILTERS = 64
# Size filters convolutional layers
FILTER_SIZE = 5
# Number of unit in the long short-term recurrent layers
NUM_UNITS_LSTM = 128
And these days I switched keras to keras 2. and the networks did not change. And my code like:
X_train = X_train.reshape((-1, 1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
X_test = X_test.reshape((-1, 1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
# network
inputs = Input(shape=(1, SLIDING_WINDOW_LENGTH, NUM_SENSOR_CHANNELS))
conv1 = ELU()(
Conv2D(filters=NUM_FILTERS, kernel_size=(1, FILTER_SIZE), strides=(1, 1), padding='valid', activation='relu',
kernel_initializer='normal', data_format='channels_last')(inputs))
conv2 = ELU()(
Conv2D(filters=NUM_FILTERS, kernel_size=(1, FILTER_SIZE), strides=(1, 1), padding='valid', activation='relu',
kernel_initializer='normal', data_format='channels_last')(conv1))
conv3 = ELU()(
Conv2D(filters=NUM_FILTERS, kernel_size=(1, FILTER_SIZE), strides=(1, 1), padding='valid', activation='relu',
kernel_initializer='normal', data_format='channels_last')(conv2))
conv4 = ELU()(
Conv2D(filters=NUM_FILTERS, kernel_size=(1, FILTER_SIZE), strides=(1, 1), padding='valid', activation='relu',
kernel_initializer='normal', data_format='channels_last')(conv3))
# permute1 = Permute((2, 1, 3))(conv4)
reshape1 = Reshape((SLIDING_WINDOW_LENGTH - (FILTER_SIZE - 1) * 4, NUM_FILTERS * 1))(conv4) # 4 for 4 convs
gru1 = GRU(NUM_UNITS_LSTM, return_sequences=True, implementation=0)(reshape1)
gru2 = GRU(NUM_UNITS_LSTM, return_sequences=False, implementation=0)(gru1) # implementation=2 for GPU
outputs = Dense(NUM_CLASSES, activation='softmax')(gru2)
and the speed seems faster but the shape is strange since I didn't know where is my channels ?
Is there anything wrong with my code and could someone help ? THX
It seems that Keras handles the channel parameter himself.

Resources