Flatten in Keras video question answering example - keras

In Keras' video question answering example (https://keras.io/getting-started/functional-api-guide/), what does the vision_model.add(Flatten()) at the end of the convolutional neural net do and why is it needed?
Full source:
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential
# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())
Then, later:
from keras.layers import TimeDistributed
video_input = Input(shape=(100, 224, 224, 3))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input) # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence) # the output will be a vector

Running:
vision_model.summary()
we get:
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
conv2d_2 (Conv2D) (None, 222, 222, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 111, 111, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 111, 111, 128) 73856
_________________________________________________________________
conv2d_4 (Conv2D) (None, 109, 109, 128) 147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 54, 54, 128) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 54, 54, 256) 295168
_________________________________________________________________
conv2d_6 (Conv2D) (None, 52, 52, 256) 590080
_________________________________________________________________
conv2d_7 (Conv2D) (None, 50, 50, 256) 590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 25, 25, 256) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 160000) 0
=================================================================
Total params: 1,735,488
Trainable params: 1,735,488
Non-trainable params: 0
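vision_model.add(Flatten()) flattens the output of the last vision_model.add(MaxPooling2D((2, 2))) from (None, 25, 25, 256) to (None, 160000), i.e. 25 x 25 x 256 = 160,000 values per image. It is needed because the vision model is meant to encode each image into a vector: TimeDistributed(vision_model) applies it to each of the 100 frames, giving a (None, 100, 160000) sequence of vectors, which is the 3D (batch, timesteps, features) input that LSTM(256) expects. Without Flatten, the encoded frames would have shape (None, 100, 25, 25, 256), a 5D tensor that the LSTM layer would reject.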
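The shapes and parameter counts in the summary can be reproduced by hand. Below is a minimal pure-Python sketch; the helper names are hypothetical (not Keras API), and it assumes stride-1 convolutions and Keras' MaxPooling2D defaults (strides equal to pool_size, 'valid' padding):

```python
def conv2d_shape(shape, filters, kernel=3, padding="valid"):
    """Output (h, w, c) of a stride-1 Conv2D layer."""
    h, w, _ = shape
    if padding == "same":
        return (h, w, filters)  # 'same' keeps the spatial size at stride 1
    return (h - kernel + 1, w - kernel + 1, filters)

def conv2d_params(shape, filters, kernel=3):
    """kernel*kernel*in_channels*filters weights plus one bias per filter."""
    return kernel * kernel * shape[2] * filters + filters

def maxpool2d_shape(shape, pool=2):
    """'valid' pooling with stride == pool size floors odd sizes (109 -> 54)."""
    h, w, c = shape
    return (h // pool, w // pool, c)

def trace_vision_model(input_shape=(224, 224, 3)):
    shape, params = input_shape, 0
    # (filters, padding) for each Conv2D; None marks a MaxPooling2D((2, 2))
    plan = [(64, "same"), (64, "valid"), None,
            (128, "same"), (128, "valid"), None,
            (256, "same"), (256, "valid"), (256, "valid"), None]
    for step in plan:
        if step is None:
            shape = maxpool2d_shape(shape)
        else:
            filters, padding = step
            params += conv2d_params(shape, filters)  # uses the input channels
            shape = conv2d_shape(shape, filters, padding=padding)
    h, w, c = shape
    return shape, h * w * c, params  # Flatten: one vector of h*w*c values per image

last_shape, flat_size, total_params = trace_vision_model()
print(last_shape, flat_size, total_params)  # (25, 25, 256) 160000 1735488
```

With TimeDistributed over 100 frames, each frame becomes one of these 160,000-value vectors, so the LSTM receives a (batch, 100, 160000) sequence.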

Related

How to display the layers of a pretrained model instead of a single entry in model.summary() output?

As the title describes, I want to display the layers of a pre-trained model instead of a single entry (see the vgg19 (Functional) entry below) in the model.summary() output.
Here is a sample model that is implemented using the Keras Sequential API:
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))
And here is the output of the model.summary() function call:
Model: "sequential_15"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg19 (Functional) (None, 512) 20024384
_________________________________________________________________
flatten_15 (Flatten) (None, 512) 0
_________________________________________________________________
dense_21 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_22 (Dense) (None, 10) 10010
=================================================================
Total params: 20,547,394
Trainable params: 523,010
Non-trainable params: 20,024,384
Edit: Here is the Functional API equivalent of the Sequential model above; the result is the same:
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3), pooling='max', classes=10)
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
x = Flatten()(base_out)
x = Dense(1_000, activation='relu')(x)
m_outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=m_inputs, outputs=m_outputs)
Instead of using Sequential, I tried the Functional API, i.e. the tf.keras.models.Model class, like this:
import tensorflow as tf
base_model = tf.keras.applications.VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
x = tf.keras.layers.Flatten()( base_model.output )
x = tf.keras.layers.Dense(1_000, activation='relu')( x )
outputs = tf.keras.layers.Dense(10, activation='softmax')( x )
model = tf.keras.models.Model( base_model.input , outputs )
model.summary()
The output of the above snippet:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 8, 8, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 4, 4, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_3 (Dense) (None, 10) 10010
=================================================================
Total params: 15,237,698
Trainable params: 15,237,698
Non-trainable params: 0
_________________________________________________________________
My understanding after going through the docs and running a few tests (with TF 2.5.0) is that when such a model is included in another model, Keras treats it as a "black box": it is not a simple layer, and certainly not a tensor, but a nested model of type tensorflow.python.keras.engine.functional.Functional.
I reckon this is the underlying reason that you cannot print it out in detail as part of the model summary.
Now, if you'd like to just review the pre-trained model, have a sneak peek etc., you can simply run:
base_model.summary()
or after constructing your model (sequential or functional, doesn't matter at this point):
model.layers[i].summary() # i: the index of your pre-trained model
If you need to access the pre-trained model's layers, e.g. to use its weights separately, you can access them this way as well.
If you'd like to print the layers of your model as a whole, then you need to trick Keras into believing the "black box" is no stranger but just another KerasTensor. To do that, you can connect your new layers directly to the pre-trained model's output tensor via the Functional API, as suggested above; this has worked fine for me:
x = tf.keras.layers.Flatten()( base_model.output )
I don't know if there is any specific reason that you'd like to pursue the new-input route, as in:
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
Whenever you place the pre-trained model in the middle of your new model, whether by calling it on a new Input layer or by adding it to a Sequential model, the layers within it disappear from the summary output.
Generating a new Input layer or just feeding the pre-trained model's output as input to the current model didn't make any difference for me in this case.
Hope this clarifies the topic a wee bit more, and helps.
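The "black box" behaviour described above can be illustrated without TensorFlow. The sketch below uses hypothetical ToyLayer/ToyModel classes (not Keras classes) in which a model is itself a layer holding a `layers` list, much like a nested Functional model. A top-level listing shows only the nested model's name, while recursing into `.layers` (the same attribute used in `model.layers[i]` above) reveals the inner layers:

```python
from dataclasses import dataclass, field

@dataclass
class ToyLayer:
    name: str

@dataclass
class ToyModel(ToyLayer):
    # A model is also a layer, so it can be nested inside another model
    layers: list = field(default_factory=list)

def top_level_names(model):
    """Roughly what a summary lists: only the direct children."""
    return [layer.name for layer in model.layers]

def walk(model, depth=0):
    """Yield (depth, name) for every layer, descending into nested models."""
    for layer in model.layers:
        yield depth, layer.name
        if isinstance(layer, ToyModel):
            yield from walk(layer, depth + 1)

vgg = ToyModel("vgg16", [ToyLayer("block1_conv1"), ToyLayer("block1_conv2"),
                         ToyLayer("block1_pool")])
outer = ToyModel("sequential", [vgg, ToyLayer("flatten"), ToyLayer("dense")])

print(top_level_names(outer))  # ['vgg16', 'flatten', 'dense']
for depth, name in walk(outer):
    print("  " * depth + name)
```

Later tf.keras releases than the TF 2.5.0 used above also accept `model.summary(expand_nested=True)`, which performs this kind of nested expansion; check the documentation of your version before relying on it.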
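This should do what you want:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
# Add the base model's layers one by one (frozen) so each appears in model.summary();
# this only works because VGG16 is a plain chain of layers without branches
for layer in base_model.layers:
    layer.trainable = False
    model.add(layer)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))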

Shapes Incompatible in Keras with CNN

I am implementing a network that takes a 2D image and outputs a 3D binary voxel grid for it.
I am using an autoencoder with an LSTM module.
The current shape of images and voxels are as follows:
print(x_train.shape)
print(y_train.shape)
>>> (792, 127, 127, 3)
>>> (792, 32, 32, 32)
792 RGB images of 127 x 127 pixels
792 corresponding voxel grids, each a 3D binary tensor (32 x 32 x 32)
Running the following encoder model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU, MaxPooling2D, Dense, Flatten, Conv3D, MaxPool3D, GRU, Reshape, UpSampling3D
from tensorflow import keras
enc_filter = [96, 128, 256, 256, 256, 256]
fc_filters = [1024]
model = Sequential()
epochs = 5
batch_size = 24
input_shape=(127,127,3)
model.add(Conv2D(enc_filter[0], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(learning_rate=0.01),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs)
yields the following:
ValueError: Shapes (24, 32, 32, 32) and (24, 1024) are incompatible
Can someone explain why the shapes are incompatible? I tried removing layers and testing others, but all of them yield shape errors.
Your model ends with a Dense layer with 1024 outputs, so its predictions have shape (batch, 1024), but the targets you are passing have shape (batch, 32, 32, 32).
You need to reshape your model output so that it matches the shape of the targets.
This is a dummy model, you need to change the parameters to find the suitable architecture.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU, MaxPooling2D, Dense, Flatten, Conv3D, MaxPool3D, GRU, Reshape, UpSampling3D
from tensorflow import keras
import numpy as np
# dummy data
x_train = np.random.randn(792, 127, 127, 3)
y_train = np.random.randn(792, 32, 32, 32)
enc_filter = [96, 128, 256, 2]
fc_filters = [1024]
model = Sequential()
epochs = 5
batch_size = 24
input_shape=(127,127,3)
model.add(Conv2D(enc_filter[0], kernel_size=(7, 7), strides=(1,1),activation='relu',input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[1], kernel_size=(7, 7), strides=(1,1),activation='relu'))  # input_shape is only needed on the first layer
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[2], kernel_size=(7, 7), strides=(1,1),activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Conv2D(enc_filter[3], kernel_size=(7, 7), strides=(1,1),activation='relu')) # bottleneck
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(LeakyReLU(alpha=0.1))
model.add(Flatten())
model.add(Dense(32*32*32, activation='relu'))
model.add(Reshape((32,32,32)))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(learning_rate=0.01),
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs)
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_24 (Conv2D) (None, 121, 121, 96) 14208
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 60, 60, 96) 0
_________________________________________________________________
leaky_re_lu_24 (LeakyReLU) (None, 60, 60, 96) 0
_________________________________________________________________
conv2d_25 (Conv2D) (None, 54, 54, 128) 602240
_________________________________________________________________
max_pooling2d_25 (MaxPooling (None, 27, 27, 128) 0
_________________________________________________________________
leaky_re_lu_25 (LeakyReLU) (None, 27, 27, 128) 0
_________________________________________________________________
conv2d_26 (Conv2D) (None, 21, 21, 256) 1605888
_________________________________________________________________
max_pooling2d_26 (MaxPooling (None, 10, 10, 256) 0
_________________________________________________________________
leaky_re_lu_26 (LeakyReLU) (None, 10, 10, 256) 0
_________________________________________________________________
conv2d_27 (Conv2D) (None, 4, 4, 2) 25090
_________________________________________________________________
max_pooling2d_27 (MaxPooling (None, 2, 2, 2) 0
_________________________________________________________________
leaky_re_lu_27 (LeakyReLU) (None, 2, 2, 2) 0
_________________________________________________________________
flatten_10 (Flatten) (None, 8) 0
_________________________________________________________________
dense_1 (Dense) (None, 32768) 294912
_________________________________________________________________
reshape_10 (Reshape) (None, 32, 32, 32) 0
=================================================================
Total params: 2,542,338
Trainable params: 2,542,338
Non-trainable params: 0
In the summary, you can see that I add a Dense layer with 32 x 32 x 32 = 32,768 units and then reshape its output to (32, 32, 32), which matches the shape of y_train.
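The error comes from comparing the model's output shape with the target shape (batch dimension excluded). Here is a minimal pure-Python sketch of that comparison and of the Dense-plus-Reshape head used above; the helper names are hypothetical, and the 8-value flattened input is taken from the summary:

```python
from math import prod

def head_output_shape(units, target_shape=None):
    """Shape (without batch) after Dense(units), optionally followed by Reshape(target_shape)."""
    if target_shape is None:
        return (units,)
    if prod(target_shape) != units:
        raise ValueError(f"cannot reshape {units} values into {target_shape}")
    return tuple(target_shape)

def dense_params(in_features, units):
    return in_features * units + units  # weight matrix plus one bias per unit

y_shape = (32, 32, 32)  # one voxel grid per sample, as in y_train

# Original model: Dense(1024) gives (1024,), which cannot match (32, 32, 32)
print(head_output_shape(1024) == y_shape)  # False
# Fixed model: Dense(32*32*32) followed by Reshape((32, 32, 32))
print(head_output_shape(32 * 32 * 32, y_shape) == y_shape)  # True
print(dense_params(8, 32 * 32 * 32))  # 294912, as in the summary
```

Reshape only rearranges values, so the Dense layer in front of it must produce exactly 32 x 32 x 32 of them.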

How to use repeat vector to predict a sequence of output?

I am trying to define a model that takes a sequence of images and predicts a sequence in turn. My problem is with RepeatVector: first, whether this is the right use of it, and second, how to resolve the exception that I get.
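from keras.layers import Input, ConvLSTM2D, BatchNormalization, Conv2D, RepeatVector
from keras.models import Model
input_frames=Input(shape=(None, 128, 64, 1))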
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=True)(input_frames)
x=BatchNormalization()(x)
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=False)(x)
x=BatchNormalization()(x)
x=Conv2D(filters=1, kernel_size=(5, 5), activation='sigmoid',padding='same')(x)
x=RepeatVector(10)(x)
model=Model(inputs=input_frames,outputs=x)
Specifically, I am trying to forecast 10 frames into the future. The above code throws the following exception:
assert_input_compatibility
str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer repeat_vector_5: expected ndim=2, found ndim=4
According to its documentation, RepeatVector only accepts 2D inputs of shape (batch, features), which is what the error message is telling you: the Conv2D output here is 4D, (batch, 128, 64, 1).
Following is a workaround using a Lambda layer:
from keras.layers import Input, ConvLSTM2D, BatchNormalization, RepeatVector, Conv2D
from keras.models import Model
from keras.backend import expand_dims, repeat_elements
from keras.layers import Lambda
input_frames=Input(shape=(None, 128, 64, 1))
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=True)(input_frames)
x=BatchNormalization()(x)
x=ConvLSTM2D(filters=64, kernel_size=(5, 5), padding='same', return_sequences=False)(x)
x=BatchNormalization()(x)
x=Conv2D(filters=1, kernel_size=(5, 5), activation='sigmoid',padding='same')(x)
#x=RepeatVector(10)(x)
x=Lambda(lambda x: repeat_elements(expand_dims(x, axis=1), 10, 1))(x)
model=Model(inputs=input_frames,outputs=x)
model.summary()
"""
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) (None, None, 128, 64, 1) 0
_________________________________________________________________
conv_lst_m2d_5 (ConvLSTM2D) (None, None, 128, 64, 64) 416256
_________________________________________________________________
batch_normalization_5 (Batch (None, None, 128, 64, 64) 256
_________________________________________________________________
conv_lst_m2d_6 (ConvLSTM2D) (None, 128, 64, 64) 819456
_________________________________________________________________
batch_normalization_6 (Batch (None, 128, 64, 64) 256
_________________________________________________________________
conv2d_3 (Conv2D) (None, 128, 64, 1) 1601
_________________________________________________________________
lambda_1 (Lambda) (None, 10, 128, 64, 1) 0
=================================================================
Total params: 1,237,825
Trainable params: 1,237,569
Non-trainable params: 256
_________________________________________________________________
"""
Note that I use expand_dims here because, according to the documentation of repeat_elements, it does not create a new axis; it only repeats elements along an existing one.
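The shape bookkeeping behind this Lambda layer can be sketched in pure Python. The helpers below are hypothetical and only mimic the shape rules: expand_dims inserts a size-1 axis, repeat_elements multiplies the size of an existing axis, and RepeatVector accepts only (batch, features):

```python
def expand_dims_shape(shape, axis):
    """Insert a new axis of size 1 at position `axis`."""
    return shape[:axis] + (1,) + shape[axis:]

def repeat_elements_shape(shape, rep, axis):
    """Repeat along an existing axis; the number of axes does not change."""
    return shape[:axis] + (shape[axis] * rep,) + shape[axis + 1:]

def repeat_vector_shape(shape, n):
    """RepeatVector: (batch, features) -> (batch, n, features); other ranks are rejected."""
    if len(shape) != 2:
        raise ValueError(f"expected ndim=2, found ndim={len(shape)}")
    return (shape[0], n, shape[1])

conv_out = (None, 128, 64, 1)  # Conv2D output; the batch size is unknown
try:
    repeat_vector_shape(conv_out, 10)
except ValueError as err:
    print(err)  # expected ndim=2, found ndim=4

# repeat_elements alone gives (None, 1280, 64, 1): copies stacked along the height
print(repeat_elements_shape(conv_out, 10, 1))
# expand_dims first creates the time axis, which repeat_elements then fills with 10 copies
print(repeat_elements_shape(expand_dims_shape(conv_out, 1), 10, 1))  # (None, 10, 128, 64, 1)
```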

Adding Conv Layer in front of pretrained model gives ValueError

I want to combine a pretrained VGG16 model with a special input block, which is an input layer and a convolutional layer. The goal is to use a pre-trained RGB VGG16 imagenet model on grayscale images:
from keras.applications.vgg16 import VGG16
from keras.layers.convolutional import Conv2D
from keras.layers import Input
from keras.models import Model
img_height = 299
img_width = 299
def input_block(img_height=299, img_width=299):
    input_shape = (img_height, img_width, 1)
    img_input = Input(shape=input_shape, name='grayscale_input_layer')
    x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
    return x
pretrained_model = VGG16(weights = 'imagenet', include_top=False, input_tensor = input_block(img_height, img_width))
When I set the weights argument of VGG16() to None, the model builds correctly, with the following desired structure:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
grayscale_input_layer (Input (None, 299, 299, 1) 0
_________________________________________________________________
grayscale_RGB_layer (Conv2D) (None, 299, 299, 3) 30
_________________________________________________________________
block1_conv1 (Conv2D) (None, 299, 299, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 299, 299, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 149, 149, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 149, 149, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 149, 149, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 74, 74, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 74, 74, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 74, 74, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 74, 74, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 37, 37, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 37, 37, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 37, 37, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 37, 37, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 18, 18, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 9, 9, 512) 0
=================================================================
Total params: 14,714,718
Trainable params: 14,714,718
Non-trainable params: 0
_________________________________________________________________
None
However, when I set the weights to 'imagenet', I get the following error:
ValueError: You are trying to load a weight file containing 13 layers into a model with 14 layers.
This error makes sense, since I have added two layers in front of the VGG16 model instead of a single layer.
As a workaround, I have tried the following:
def input_block_model(img_height=299, img_width=299):
    input_shape = (img_height, img_width, 1)
    img_input = Input(shape=input_shape, name='grayscale_input_layer')
    x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
    model = Model(img_input, x, name='input_block_model')
    return model
input_model = input_block_model(299,299)
pretrained_model = VGG16(weights = "imagenet", include_top=False)
combined_model = Model(input_model.input,
pretrained_model(input_model.output))
print(combined_model.summary())
Then, the model structure is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
grayscale_input_layer (Input (None, 299, 299, 1) 0
_________________________________________________________________
grayscale_RGB_layer (Conv2D) (None, 299, 299, 3) 30
_________________________________________________________________
vgg16 (Model) multiple 14714688
=================================================================
Total params: 14,714,718
Trainable params: 14,714,718
Non-trainable params: 0
_________________________________________________________________
None
The disadvantage of this structure is that I cannot set properties of the layers within the VGG16 model. For example, I want to freeze certain layers in this model, but I cannot access them via combined_model.layers. Does anyone have a working solution that gives the model structure of the None initialization, but with pre-trained ImageNet weights?
You can freeze or train the inner layers using combined_model.layers[2].layers, as mentioned in the comment above. You could also simplify the model as follows:
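```
from keras.applications.vgg16 import VGG16
from keras.layers import Input, Conv2D
from keras.models import Model
img_height, img_width = 299, 299
img_input = Input(shape=(img_height, img_width, 1), name='grayscale_input_layer')
x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
# weights='imagenet' so the nested VGG16 actually carries the pre-trained weights
x = VGG16(weights='imagenet', include_top=False)(x)
model = Model(img_input, x)
model.summary()
# model.layers[2] is the nested VGG16; freeze its inner layers (slice the list to freeze only some)
for layer in model.layers[2].layers:
    layer.trainable = False
```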
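As for the original ValueError (13 vs 14 layers): when Keras 2 loads an HDF5 weight file by layer order, it counts only the layers that actually own weights, on both the file side and the model side. The Input layer therefore does not count; the single extra Conv2D is what causes the mismatch. A minimal pure-Python sketch of that count, using the layer names from the summary above (the helper and list names are hypothetical, not Keras code):

```python
# The 13 convolution layers of VGG16 without its top: blocks of 2, 2, 3, 3, 3 convs
VGG16_NOTOP_WEIGHTED = [f"block{b}_conv{i}"
                        for b, n in [(1, 2), (2, 2), (3, 3), (4, 3), (5, 3)]
                        for i in range(1, n + 1)]

def count_weighted(layers):
    """Count layers that own weights; Input and pooling layers own none."""
    return sum(1 for _, has_weights in layers if has_weights)

# (name, owns weights?) for the model built with input_tensor=input_block(...);
# the order of this list does not matter for the count
model_layers = ([("grayscale_input_layer", False), ("grayscale_RGB_layer", True)]
                + [(name, True) for name in VGG16_NOTOP_WEIGHTED]
                + [(f"block{b}_pool", False) for b in range(1, 6)])

file_layers = len(VGG16_NOTOP_WEIGHTED)  # layers with weights in the no-top weight file
print(file_layers, count_weighted(model_layers))  # 13 14
```

Building VGG16(weights='imagenet', include_top=False) as a standalone model and then calling it on the 3-channel tensor, as in the answer, sidesteps this because the weights are loaded before the extra layer is connected.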

Keras to Pytorch model translation and input size

I am following a Keras tutorial and want to mirror it in PyTorch, so I am translating it. I'm not very familiar with either and am coming unstuck on the input size parameter especially, but also on the final layer: do I need another Linear layer? Can anyone translate the following to a PyTorch Sequential definition?
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
hidden1 = Dense(10, activation='relu')(pool2)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
This is the output of the model:
Layer (type) Output Shape Param #
_________________________________________________________________
input_1 (InputLayer) (None, 64, 64, 1) 0
conv2d_1 (Conv2D) (None, 61, 61, 32) 544
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 32) 0
conv2d_2 (Conv2D) (None, 27, 27, 16) 8208
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
dense_1 (Dense) (None, 13, 13, 10) 170
dense_2 (Dense) (None, 13, 13, 1) 11
Total params: 8,933
Trainable params: 8,933
Non-trainable params: 0
What I have worked out lacks a specification for the shape of the input, and I am also a bit perplexed by the translation of stride in the Keras model, as it uses stride 2 in MaxPooling2D but doesn't specify this elsewhere; it is perhaps a toy example.
import torch.nn as nn
model = nn.Sequential(
    nn.Conv2d(1, 32, 4),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 16, 4),  # in_channels must equal the previous layer's 32 output channels
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Linear(10, 1),
    nn.Sigmoid(),
)
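One detail matters for the translation: in this Keras model, Dense is applied to a 4D tensor, and Keras applies it only to the last (channel) axis, with the same weights at every spatial position. That is why dense_1 has 16 x 10 + 10 = 170 parameters and keeps the (13, 13) grid. A minimal pure-Python sketch (hypothetical helper name) that recomputes the summary under these rules, assuming 'valid' stride-1 convolutions and pooling strides equal to pool_size, the Keras defaults:

```python
def trace_keras_model(size=64, channels=1):
    """Recompute ((h, w, c), params) for each layer of the Keras model in the question."""
    h = w = size
    c = channels
    rows = []
    for filters, kernel in [(32, 4), (16, 4)]:
        params = kernel * kernel * c * filters + filters  # conv weights + biases
        h, w, c = h - kernel + 1, w - kernel + 1, filters  # 'valid', stride 1
        rows.append(((h, w, c), params))
        h, w = h // 2, w // 2  # MaxPooling2D(pool_size=(2, 2)): stride defaults to 2
        rows.append(((h, w, c), 0))
    for units in [10, 1]:
        params = c * units + units  # Dense on a 4D tensor acts on the last axis only
        c = units
        rows.append(((h, w, c), params))
    return rows

rows = trace_keras_model()
for shape, params in rows:
    print(shape, params)
print(sum(p for _, p in rows))  # 8933
```

In PyTorch's channels-first (N, C, H, W) layout, a per-position layer with these parameter counts is a 1x1 nn.Conv2d (16 to 10 channels, then 10 to 1), whereas nn.Linear acts on the last axis, which there is the width. PyTorch layers also do not need the input size up front; the spatial size is taken from the tensor at run time.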
