TensorFlow model gives "Graph disconnected" error

I am experimenting with some small ML problems.
I have a loaded model based on a pre-trained convolutional base with some self-trained dense layers (for model details see below).
I wanted to apply some visualizations to the model, like activations and Grad-CAM (https://www.statworx.com/de/blog/erklaerbbarkeit-von-deep-learning-modellen-mit-grad-cam/), but I was not able to do so.
I tried to create a new model based on mine (like in the article) with
grad_model = tf.keras.models.Model(model.inputs,
                                   [model.get_layer('vgg16').output,
                                    model.output])
but this already fails with the error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_5_12:0", shape=(None, None, None, 3), dtype=float32) at layer "block1_conv1". The following previous layers were accessed without issue: []
I do not understand what this means. The model certainly works (I can evaluate it and make predictions with it).
The call does not fail if I omit model.get_layer('vgg16').output from the outputs list, but of course that output is required for the visualization.
What am I doing wrong?
In a model that I constructed and trained from scratch, I was able to create a similar model with the activations as outputs, but here I get these errors.
My model's details
The model was created with the following code and then trained and saved.
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
conv_base = keras.applications.vgg16.VGG16(
    weights="vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5",
    include_top=False)
conv_base.trainable = False
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal"),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.2),
    ]
)
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
Later, it was loaded:
model = keras.models.load_model("myModel.keras")
print(model.summary())
print(model.get_layer('sequential').summary())
print(model.get_layer('vgg16').summary())
output:
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 180, 180, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 180, 180, 3) 0
_________________________________________________________________
vgg16 (Functional) (None, None, None, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 12800) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 3277056
_________________________________________________________________
dropout_1 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 257
=================================================================
Total params: 17,992,001
Trainable params: 10,356,737
Non-trainable params: 7,635,264
_________________________________________________________________
None
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
random_flip (RandomFlip) (None, 180, 180, 3) 0
_________________________________________________________________
random_rotation (RandomRotat (None, 180, 180, 3) 0
_________________________________________________________________
random_zoom (RandomZoom) (None, 180, 180, 3) 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
None
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, None, None, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) multiple 1792
_________________________________________________________________
block1_conv2 (Conv2D) multiple 36928
_________________________________________________________________
block1_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block2_conv1 (Conv2D) multiple 73856
_________________________________________________________________
block2_conv2 (Conv2D) multiple 147584
_________________________________________________________________
block2_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block3_conv1 (Conv2D) multiple 295168
_________________________________________________________________
block3_conv2 (Conv2D) multiple 590080
_________________________________________________________________
block3_conv3 (Conv2D) multiple 590080
_________________________________________________________________
block3_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block4_conv1 (Conv2D) multiple 1180160
_________________________________________________________________
block4_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block4_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) multiple 0
_________________________________________________________________
block5_conv1 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv2 (Conv2D) multiple 2359808
_________________________________________________________________
block5_conv3 (Conv2D) multiple 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) multiple 0
=================================================================
Total params: 14,714,688
Trainable params: 7,079,424
Non-trainable params: 7,635,264

You can achieve what you want in the following way. The loaded model contains vgg16 as a nested Model with its own Input layer, so model.get_layer('vgg16').output belongs to that inner graph and is not reachable from model.inputs; that is why the graph is reported as disconnected. Instead, build the model so that the VGG16 layers become part of the outer graph by passing input_tensor. First, define your model as follows:
inputs = tf.keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs, training=True)
x = keras.applications.VGG16(input_tensor=x,
                             include_top=False,
                             weights=None)
x.trainable = False
x = layers.Flatten()(x.output)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, x)

for i, layer in enumerate(model.layers):
    print(i, layer.name, layer.output_shape, layer.trainable)
...
17 block5_conv2 (None, 11, 11, 512) False
18 block5_conv3 (None, 11, 11, 512) False
19 block5_pool (None, 5, 5, 512) False
20 flatten_2 (None, 12800) True
21 dense_4 (None, 256) True
22 dropout_2 (None, 256) True
23 dense_5 (None, 1) True
Now, build the Grad-CAM model with the desired output layer as follows:
grad_model = keras.models.Model(
    [model.inputs],
    [model.get_layer('block5_pool').output,
     model.output]
)
Test
image = np.random.rand(1, 180, 180, 3).astype(np.float32)

with tf.GradientTape() as tape:
    convOutputs, predictions = grad_model(tf.cast(image, tf.float32))
    loss = predictions[:, tf.argmax(predictions[0])]

grads = tape.gradient(loss, convOutputs)
print(grads)
tf.Tensor(
[[[[ 9.8454033e-04 3.6991197e-03 ... -1.2012678e-02
-1.7934230e-03 2.2925171e-03]
[ 1.6165405e-03 -1.9513096e-03 ... -2.5789393e-03
1.2443252e-03 -1.3931725e-03]
[-2.0554627e-04 1.2232144e-03 ... 5.2324748e-03
3.1955825e-04 3.4566019e-03]
[ 2.3650150e-03 -2.5699558e-03 ... -2.4103196e-03
5.8940407e-03 5.3285398e-03]
...
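From here, the gradients can be turned into a Grad-CAM heatmap in the usual way: pool the gradients per channel, weight the conv feature maps, then apply ReLU and normalize. A minimal sketch continuing from the test above (the pooling and normalization recipe is the standard Grad-CAM one, not part of the original answer):

import tensorflow as tf

# One importance weight per channel: average the gradients over batch and space
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))          # shape: (512,)

# Weight each feature map by its channel importance and sum over channels
conv_outputs = convOutputs[0]                                  # drop the batch dim: (5, 5, 512)
heatmap = tf.reduce_sum(conv_outputs * pooled_grads, axis=-1)

# Keep only positive influence and scale to [0, 1] for display
heatmap = tf.maximum(heatmap, 0)
heatmap /= (tf.reduce_max(heatmap) + 1e-8)
print(heatmap.shape)                                           # (5, 5) for the block5_pool output above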

Related

How to display the layers of a pretrained model instead of a single entry in model.summary() output?

As the title describes, I want to display the layers of a pretrained model instead of a single entry (please see the vgg19 (Functional) entry below) in the model.summary() output.
Here is a sample model that is implemented using the Keras Sequential API:
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
model.add(base_model)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))
And here is the output of the model.summary() function call:
Model: "sequential_15"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg19 (Functional) (None, 512) 20024384
_________________________________________________________________
flatten_15 (Flatten) (None, 512) 0
_________________________________________________________________
dense_21 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_22 (Dense) (None, 10) 10010
=================================================================
Total params: 20,547,394
Trainable params: 523,010
Non-trainable params: 20,024,384
Edit: Here is the Functional API equivalent of the implemented Sequential API model - the result is the same:
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3), pooling='max', classes=10)
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
x = Flatten()(base_out)
x = Dense(1_000, activation='relu')(x)
m_outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=m_inputs, outputs=m_outputs)
Instead of using the Sequential API, I tried using the Functional API, i.e. the tf.keras.models.Model class:
import tensorflow as tf
base_model = tf.keras.applications.VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
x = tf.keras.layers.Flatten()( base_model.output )
x = tf.keras.layers.Dense(1_000, activation='relu')( x )
outputs = tf.keras.layers.Dense(10, activation='softmax')( x )
model = tf.keras.models.Model( base_model.input , outputs )
model.summary()
The output of the above snippet,
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 32, 32, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 32, 32, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 32, 32, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 16, 16, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 16, 16, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 16, 16, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 8, 8, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 8, 8, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 8, 8, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 4, 4, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 2, 2, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 512) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 1000) 513000
_________________________________________________________________
dense_3 (Dense) (None, 10) 10010
=================================================================
Total params: 15,237,698
Trainable params: 15,237,698
Non-trainable params: 0
_________________________________________________________________
My understanding, after going through the docs and running a few tests (on TF 2.5.0), is that when such a model is included in another model, Keras conceives of it as a "black box". It is not a simple layer, and definitely not a tensor; it is an object of the complex type tensorflow.python.keras.engine.functional.Functional.
I reckon this is the underlying reason that you cannot print it out in detail as part of the model summary.
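This is easy to verify, assuming the nested pre-trained model sits at index 0 of the outer model:

# The nested model is reported as a Functional object, not a plain layer
print(type(model.layers[0]))
# <class 'tensorflow.python.keras.engine.functional.Functional'>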
Now, if you'd like to just review the pre-trained model, have a sneak peek, etc., you can simply run:
base_model.summary()
or after constructing your model (sequential or functional, doesn't matter at this point):
model.layers[i].summary() # i: the index of your pre-trained model
If you need to access the pre-trained model's layers, e.g. to use its weights separately, you can access them this way as well.
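For instance, a minimal sketch (the layer index and layer name are illustrative assumptions; check model.summary() for your own model):

inner = model.layers[0]                        # the nested pre-trained model
conv = inner.get_layer('block1_conv1')         # any layer inside it
weights, biases = conv.get_weights()           # its parameters as NumPy arrays
print(conv.name, weights.shape, biases.shape)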
If you'd like to print the layers of your model as a whole, then you need to trick Keras into believing the "black box" is no stranger but just yet another KerasTensor. To do that, you can wrap the pre-trained model in another layer, in other words, connect them directly via the Functional API, which was suggested above and has worked fine for me.
x = tf.keras.layers.Flatten()( base_model.output )
I don't know if there is any specific reason that you'd like to pursue the new input route as in...
m_inputs = Input(shape=(32, 32, 3))
base_out = base_model(m_inputs)
Whenever you place the pre-trained model in the middle of your new model, e.g. after a new Input layer or inside a Sequential model, its inner layers disappear from the summary output.
Generating a new Input layer or just feeding the pre-trained model's output as input to the current model didn't make any difference for me in this case.
Hope this clarifies the topic a wee bit more, and helps.
This should do what you want. Note that adding the layers one by one only works because VGG16 is a plain linear stack of layers:
base_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3), pooling='max', classes=10)
model = Sequential()
for layer in base_model.layers:
    layer.trainable = False
    model.add(layer)
model.add(Flatten())
model.add(Dense(1_000, activation='relu'))
model.add(Dense(10, activation='softmax'))

KERAS: Pretrained a CNN+Dense model. How to freeze CNN weights and substitute Dense with LSTM?

I trained and loaded a CNN+Dense model:
# load model
cnn_model = load_model('my_cnn_model.h5')
cnn_model.summary()
The output is this (my images have dimension 2 x 3600):
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 2, 3600, 32) 128
_________________________________________________________________
conv2d_2 (Conv2D) (None, 2, 1800, 32) 3104
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 2, 600, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 2, 600, 64) 6208
_________________________________________________________________
conv2d_4 (Conv2D) (None, 2, 300, 64) 12352
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 100, 64) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 2, 100, 128) 24704
_________________________________________________________________
conv2d_6 (Conv2D) (None, 2, 50, 128) 49280
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 2, 16, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 1024) 4195328
_________________________________________________________________
dense_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
dense_3 (Dense) (None, 3) 3075
=================================================================
Total params: 5,343,779
Trainable params: 5,343,779
Non-trainable params: 0
Now, what I want is to keep the weights up to the flatten layer and replace the dense layers with LSTM layers, training only the added LSTM part.
I just wrote:
# freeze model
base_model = cnn_model(input_shape=(2, 3600, 1))
#base_model = cnn_model
base_model.trainable = False
# Adding the first lstm layer
x = LSTM(1024,activation='relu',return_sequences='True')(base_model.output)
# Adding the second lstm layer
x = LSTM(1024, activation='relu',return_sequences='False')(x)
# Adding the output
output = Dense(3,activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
But I obtained:
base_model = cnn_model(input_shape=(2, 3600, 1))
TypeError: __call__() missing 1 required positional argument: 'inputs'
I know I should ideally add TimeDistributed at the Flatten layer, but I do not know how.
Moreover, I'm not sure whether base_model.trainable = False does exactly what I want.
Can you please help me do the job?
Thank you very much!
You can't take the output from Flatten() directly; an LSTM needs 2-D features (time, filters). You have to reshape your tensors.
You can take the output from the layer before flatten (max-pooling). Let's say this layer has index i in the model; we can take its output, reshape it based on our needs, and pass it to the LSTM.
before_flatten = base_model.layers[i].output  # i: index of the layer whose output you take (the last max-pooling)
conv2lstm_reshape = Reshape((-1, 2))(before_flatten)  # you have to select the temporal dim and filters

# Adding the first lstm layer
x = LSTM(1024, activation='relu', return_sequences=True)(conv2lstm_reshape)
# Adding the second lstm layer
x = LSTM(1024, activation='relu', return_sequences=False)(x)
# Adding the output
output = Dense(3, activation='linear')(x)
# Final model creation
model = Model(inputs=[base_model.input], outputs=[output])
model.summary()
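For the summary above, the layer before flatten is max_pooling2d_3 with output shape (None, 2, 16, 128). One plausible way to pick the Reshape dimensions (my assumption, not part of the original answer) is to treat the 16 positions as timesteps and merge the remaining axes into the feature axis:

from keras.layers import Permute, Reshape

before_flatten = base_model.layers[i].output   # (None, 2, 16, 128); i = index of max_pooling2d_3
x = Permute((2, 1, 3))(before_flatten)         # (None, 16, 2, 128): 16 timesteps
x = Reshape((16, 2 * 128))(x)                  # (None, 16, 256): 256 features per step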

How to access immediate activations of custom model containing a pretrained-model?

I have a custom network of a Keras Xception base with an added regression head:
pretrained_model = tf.keras.applications.Xception(input_shape=[244, 244, 3], include_top=False, weights='imagenet')
pretrained_model.trainable = True
model = tf.keras.Sequential([
    pretrained_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='tanh')
])
The model summary:
Layer (type) Output Shape Param #
=================================================================
xception (Model) (None, 7, 7, 2048) 20861480
_________________________________________________________________
global_average_pooling2d_3 ( (None, 2048) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 2048) 0
_________________________________________________________________
dense_6 (Dense) (None, 32) 65568
_________________________________________________________________
dropout_5 (Dropout) (None, 32) 0
_________________________________________________________________
dense_7 (Dense) (None, 1) 33
=================================================================
Total params: 20,927,081
Trainable params: 20,872,553
Non-trainable params: 54,528
I want to get the last activations from the xception (Model) layer.
The details of xception:
Model: "xception"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_4 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 111, 111, 32) 864 input_4[0][0]
__________________________________________________________________________________________________
...
__________________________________________________________________________________________________
block14_sepconv2 (SeparableConv (None, 7, 7, 2048) 3159552 block14_sepconv1_act[0][0]
__________________________________________________________________________________________________
block14_sepconv2_bn (BatchNorma (None, 7, 7, 2048) 8192 block14_sepconv2[0][0]
__________________________________________________________________________________________________
block14_sepconv2_act (Activatio (None, 7, 7, 2048) 0 block14_sepconv2_bn[0][0]
==================================================================================================
Total params: 20,861,480
Trainable params: 20,806,952
Non-trainable params: 54,528
To reference the last activation layer, I have to use:
model.layers[0].get_layer('block14_sepconv2_act').output
since my model does not directly contain the 'block14_sepconv2_act' layer.
To access the activations, I want to use the code below:
activations = tf.keras.Model(model.inputs, model.layers[0].get_layer('block14_sepconv2_act').output)
activations(sample)
but I get the error:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_4_1:0", shape=(None, 224, 224, 3), dtype=float32) at layer "input_4". The following previous layers were accessed without issue: []
My question is: how can I access the intermediate layer outputs of a pretrained model added to a custom model in this way?
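A minimal sketch of one common workaround, in the spirit of the answers above, is to build the activation model on the nested Xception's own input, which, unlike model.inputs, is connected to its inner layers. The sample shape follows the xception summary above (the question's code says 244, the summary says 224); treat the whole snippet as an assumption, not a recorded answer:

import tensorflow as tf

xception = model.layers[0]                       # the nested pretrained model
activation_model = tf.keras.Model(
    xception.input,
    xception.get_layer('block14_sepconv2_act').output)

sample = tf.random.normal((1, 224, 224, 3))
activations = activation_model(sample)
print(activations.shape)                         # (1, 7, 7, 2048)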

Visualizing ConvNet filters using fine-tuned network resulting in a “NoneType” while calculating gradients using K.gradients

The following is the architecture of a fine-tuned network with VGG16 as the base model.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
model_1 (Model) (None, 25088) 14714688
_________________________________________________________________
dense_1 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 1) 513
=================================================================
Total params: 27,823,425
Trainable params: 26,087,937
Non-trainable params: 1,735,488
_________________________________________________________________
I am trying to visualize the gradients of the input with respect to the loss, and of 'block5_conv3' with respect to the output, using the following:
def build_backprop(model, loss):
    # Gradient of the input image with respect to the loss function
    gradients = K.gradients(loss, model.input)[0]
    # Normalize the gradients
    gradients /= (K.sqrt(K.mean(K.square(gradients))) + 1e-5)
    # Keras function to calculate the gradients and loss
    return K.function([model.input], [loss, gradients])
# Input wrt to loss
# Loss function that optimizes one class
loss_function = K.mean(model.get_layer('dense_3').output)

# Backprop function
backprop = build_backprop(model.get_layer('model_1').get_layer('input_1'), loss_function)

# block5_conv3 wrt to output
K.gradients(model.get_layer("dense_3").output,
            model.get_layer("model_1").get_layer("block5_conv3").output)[0]
Both of the above raise AttributeError: 'NoneType' object has no attribute 'dtype', implying that in both cases the K.gradients output is None.
What could cause the gradients to be None?
Is there any way to resolve this error?
Update
The issue of None gradients is resolved only after converting the model from the Sequential API to the Functional API.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_10 (Dense) (None, 512) 12845568
_________________________________________________________________
dropout_7 (Dropout) (None, 512) 0
_________________________________________________________________
dense_11 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_8 (Dropout) (None, 512) 0
_________________________________________________________________
dense_12 (Dense) (None, 2) 1026
=================================================================
Total params: 27,823,938
Trainable params: 20,188,674
Non-trainable params: 7,635,264
_________________________________________________________________
This is the new architecture after the change. Now the problem is that all the gradients come out as zeros.
For example:
preds = model.predict(x)
class_idx = np.argmax(preds[0])
class_output = model.output[:, class_idx]
last_conv_layer = model.get_layer("block5_conv3")
grads = K.gradients(class_output, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
The values of pooled_grads_value and conv_layer_output_value are all zeros.
I was able to solve both questions.
Question 1: Both of the above raise AttributeError: 'NoneType' object has no attribute 'dtype', implying that in both cases the K.gradients output is None.
The problem here was that the model was Sequential; after converting it from the Sequential to the Functional API, this problem vanished and a new one appeared.
Question 2: The values of pooled_grads_value and conv_layer_output_value are all zeros.
I resolved this problem by converting the last softmax layer to a linear layer.
Here is the code:
from vis.utils import utils
from keras import activations
# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'dense_12')
# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)
This swap worked perfectly fine and I obtained the desired results.
The only part I still don't understand is why it doesn't work with softmax. Would it work if we replaced the last layer's softmax with a sigmoid over one output?
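If keras-vis is not available, the same activation swap can be sketched with plain Keras serialization; round-tripping the architecture through JSON makes the changed activation take effect (an assumption-based sketch, not part of the original answer):

from keras import activations
from keras.models import model_from_json

weights = model.get_weights()
model.layers[-1].activation = activations.linear
model = model_from_json(model.to_json())  # rebuild the graph with the new activation
model.set_weights(weights)                # restore the trained parameters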

Adding Conv Layer in front of pretrained model gives ValueError

I want to combine a pretrained VGG16 model with a special input block, which is an input layer and a convolutional layer. The goal is to use a pre-trained RGB VGG16 imagenet model on grayscale images:
from keras.applications.vgg16 import VGG16
from keras.layers.convolutional import Conv2D
from keras.layers import Input
from keras.models import Model
img_height = 299
img_width = 299
def input_block(img_height=299, img_width=299):
    input_shape = (img_height, img_width, 1)
    img_input = Input(shape=input_shape, name='grayscale_input_layer')
    x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
    return x

pretrained_model = VGG16(weights='imagenet', include_top=False, input_tensor=input_block(img_height, img_width))
When I set the weight initialization of VGG16() to 'None', the model builds correctly, with the following desired structure:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
grayscale_input_layer (Input (None, 299, 299, 1) 0
_________________________________________________________________
grayscale_RGB_layer (Conv2D) (None, 299, 299, 3) 30
_________________________________________________________________
block1_conv1 (Conv2D) (None, 299, 299, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 299, 299, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 149, 149, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 149, 149, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 149, 149, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 74, 74, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 74, 74, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 74, 74, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 74, 74, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 37, 37, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 37, 37, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 37, 37, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 37, 37, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 18, 18, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 18, 18, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 9, 9, 512) 0
=================================================================
Total params: 14,714,718
Trainable params: 14,714,718
Non-trainable params: 0
_________________________________________________________________
None
However, when I set the weight initialization to 'imagenet',
I get the following error:
ValueError: You are trying to load a weight file containing 13 layers into a model with 14 layers.
This error makes sense, since I have added two layers in front of the VGG16 model instead of a single layer.
As a workaround, I have tried the following:
def input_block_model(img_height=299, img_width=299):
    input_shape = (img_height, img_width, 1)
    img_input = Input(shape=input_shape, name='grayscale_input_layer')
    x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
    model = Model(img_input, x, name='input_block_model')
    return model

input_model = input_block_model(299, 299)
pretrained_model = VGG16(weights="imagenet", include_top=False)
combined_model = Model(input_model.input,
                       pretrained_model(input_model.output))
print(combined_model.summary())
Then, the model structure is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
grayscale_input_layer (Input (None, 299, 299, 1) 0
_________________________________________________________________
grayscale_RGB_layer (Conv2D) (None, 299, 299, 3) 30
_________________________________________________________________
vgg16 (Model) multiple 14714688
=================================================================
Total params: 14,714,718
Trainable params: 14,714,718
Non-trainable params: 0
_________________________________________________________________
None
The disadvantage of this structure is that I cannot set properties of layers within the VGG16 model. For example, I want to freeze certain layers in this model, but I cannot access them via combined_model.layers. Does anyone have a working solution that gives me the model structure of the 'None' initialization, but with pretrained ImageNet weights?
You can freeze or train layers using combined_model.layers[2].layers, as mentioned in the comment above. You can maybe simplify the model as follows:
```
img_input = Input(shape=(img_height, img_width, 1), name='grayscale_input_layer')
x = Conv2D(3, (3, 3), padding='same', name='grayscale_RGB_layer')(img_input)
x = VGG16(weights=None, include_top=False)(x)
model = Model(img_input, x)
model.summary()

for layer in model.layers[2].layers:
    layer.trainable = False
```
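If only some of the nested VGG16 blocks should remain trainable, the same nested-layer access supports selective freezing; a small sketch (keeping block5 trainable is an illustrative choice, not from the answer):

# Freeze everything in the nested VGG16 except the block5 convolutions
for layer in model.layers[2].layers:
    layer.trainable = layer.name.startswith('block5')

# Verify which layers will train
for layer in model.layers[2].layers:
    print(layer.name, layer.trainable)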
