TensorFlow tape.gradient is None when applying Grad-CAM on a transfer learning model nested in a model - Keras

I am using transfer learning to wrap my inner regression model in an outer model for classification.
The inner model is trained on the MNIST dataset, where it has to predict ones from zeros. (Note that this is just an example to test the method before moving to more complex models and data.)
I then freeze the inner model to prevent its regression output from changing while the outer model is trained.
The outer model classifies which instance the input is, so the first image of my batch should become the first class. With TensorFlow's Keras this works fine.
Then I remove the outer model's softmax activation function and apply Grad-CAM.
Grad-CAM is applied to the final layer of the inner model (a 2D Conv layer).
But the line grads = tape.gradient(class_channel, last_conv_layer_output) in the code returns only None.
Any help would be appreciated (I hope enough code is provided to reproduce the error ;-))
I have tried the steps from "Grad-CAM Transfer learning error: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor", but to no avail. In addition, allowing the inner model to be retrained produces the same problem. I am aware of the import problems between Keras and TF with regard to tape.gradient.
Libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
Preprocess dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255
x_test = x_test / 255
# Select only ones and zeros.
x_train_zeros = x_train[y_train == 0]
x_train_ones = x_train[y_train == 1]
x_test_zeros = x_test[y_test == 0]
x_test_ones = x_test[y_test == 1]
# X and Y need to have same length
y_train_ones = x_train_ones[:len(x_train_zeros)]
y_test_ones = x_test_ones[:len(x_test_zeros)]
Model: "inner"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0 []
conv2d (Conv2D) (None, 28, 28, 32) 320 ['input_1[0][0]']
add (Add) (None, 28, 28, 32) 0 ['input_1[0][0]',
'conv2d[0][0]']
conv2d_1 (Conv2D) (None, 28, 28, 32) 9248 ['add[0][0]']
add_1 (Add) (None, 28, 28, 32) 0 ['add[0][0]',
'conv2d_1[0][0]']
conv2d_2 (Conv2D) (None, 28, 28, 32) 9248 ['add_1[0][0]']
add_2 (Add) (None, 28, 28, 32) 0 ['add_1[0][0]',
'conv2d_2[0][0]']
conv2d_3 (Conv2D) (None, 28, 28, 32) 9248 ['add_2[0][0]']
add_3 (Add) (None, 28, 28, 32) 0 ['add_2[0][0]',
'conv2d_3[0][0]']
inner_final (Conv2D) (None, 28, 28, 1) 289 ['add_3[0][0]']
==================================================================================================
Total params: 28,353
Trainable params: 28,353
Non-trainable params: 0
__________________________________________________________________________________________________
history = inner_model.fit(x_train_zeros, y_train_ones, batch_size = batch_size, epochs = epochs, validation_data = (x_test_zeros, y_test_ones))
One hot encode the output for the classification
batch_size_classify = 40 # I want to "classify" the first 40 images.
X_train_class = x_train_zeros[: batch_size_classify]
# Produce "class labels"
Y_test = np.arange(0, batch_size_classify, 1, dtype=int)
from numpy import array
from numpy import argmax
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# define example
values = array(Y_test)
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
# binary encode
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
# invert first example
#inverted = label_encoder.inverse_transform([argmax(onehot_encoded[0, :])])
X_train_class = tf.convert_to_tensor(X_train_class)
onehot_encoded = tf.convert_to_tensor(onehot_encoded)
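Because the labels here are simply the integers 0 to 39, the LabelEncoder/OneHotEncoder pipeline above should reduce to an identity matrix. A minimal NumPy sketch of that equivalence, assuming the same batch_size_classify = 40 (onehot_np is a name introduced only for this illustration):

```python
import numpy as np

batch_size_classify = 40
labels = np.arange(0, batch_size_classify, 1, dtype=int)

# Row i holds a single 1 in column labels[i]; for labels 0..39 this is
# just the 40x40 identity matrix.
onehot_np = np.eye(batch_size_classify)[labels]

print(onehot_np.shape)
```

Indexing np.eye with the label vector also works when the labels are not in order, as long as they are integers in the range 0 to batch_size_classify - 1.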
Build the outer model
# prevent changing the inner model
inner_model.trainable = False
inp = tf.keras.layers.Input(shape=(28, 28, 1))
x = inner_model(inp)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(batch_size_classify, activation = 'softmax')(x)
Outer_model = tf.keras.Model(inp, outputs)
Outer_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
Model: "Outer"
__________________________________________________________________________________________________
Layer (type) Output Shape Param #
==================================================================================================
input_1 (InputLayer) [(None, 28, 28, 1)] 0
inner (Functional) (None, 28, 28, 1) 28353
|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|
| input_1 (InputLayer) [(None, 28, 28, 1)] 0 |
| |
| conv2d (Conv2D) (None, 28, 28, 32) 320 |
| |
| add (Add) (None, 28, 28, 32) 0 |
| |
| conv2d_1 (Conv2D) (None, 28, 28, 32) 9248 |
| |
| add_1 (Add) (None, 28, 28, 32) 0 |
| |
| conv2d_2 (Conv2D) (None, 28, 28, 32) 9248 |
| |
| add_2 (Add) (None, 28, 28, 32) 0 |
| |
| conv2d_3 (Conv2D) (None, 28, 28, 32) 9248 |
| |
| add_3 (Add) (None, 28, 28, 32) 0 |
| |
| inner_final (Conv2D) (None, 28, 28, 1) 289 |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
flatten_2 (Flatten) (None, 784) 0
dense_2 (Dense) (None, 40) 31400
==================================================================================================
Total params: 59,753
Trainable params: 31,400
Non-trainable params: 28,353
__________________________________________________________________________________________________
epochs = 20
# Fit the model to the training data.
Outer_model.fit(X_train_class, onehot_encoded, batch_size = 1, epochs = epochs)
Grad-CAM, where the line grads = tape.gradient(class_channel, last_conv_layer_output) returns None.
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model([model.inputs], [model.get_layer(last_conv_layer_name).output, model.output])
    # Then, we compute the gradient of the top predicted class for our input
    # image with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    # The line in Grad-CAM where the code returns None...
    grads = tape.gradient(class_channel, last_conv_layer_output)
    # This is a vector where each entry is the mean intensity
    # of the gradient over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis = (0, 1, 2))
    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    # For visualization purposes, we also normalize the heatmap between 0 and 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()
last_conv_layer_name = "inner"
# select first image to test Grad-CAM onto.
img_array = X_train_class[:1]
# Remove softmax activation function.
Outer_model.layers[-1].activation = None
img_array = tf.expand_dims(img_array, axis = 3)
heatmap = make_gradcam_heatmap(img_array, Outer_model, last_conv_layer_name)
Cell output:
ValueError Traceback (most recent call last)
<ipython-input-44-4dea4cd32c74> in <module>
7 img_array = tf.expand_dims(img_array, axis = 3)
8
----> 9 heatmap = make_gradcam_heatmap(img_array, U_model, last_conv_layer_name)
10
11 # Display heatmap
2 frames
<ipython-input-19-f9b93747c0ae> in make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index)
17
18 # This is a vector where each entry is the mean intensity of the gradient over a specific feature map channel
---> 19 pooled_grads = tf.reduce_mean(grads, axis = (0, 1, 2))
20
21 # We multiply each channel in the feature map array
/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py in error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
103
104
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.
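For reference, once a non-None gradient is available, the pooling and weighting steps in make_gradcam_heatmap amount to the computation below. This is a NumPy sketch with random stand-in arrays (the names feature_map and grads and the 28x28x32 shape are assumptions chosen for illustration); it only shows the arithmetic and does not address why tape.gradient returns None.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the conv activations and their gradients: (batch, H, W, channels).
feature_map = rng.normal(size=(1, 28, 28, 32))
grads = rng.normal(size=(1, 28, 28, 32))

# Mean gradient per channel: one importance weight for each feature map.
pooled_grads = grads.mean(axis=(0, 1, 2))                 # shape (32,)

# Weight every channel by its importance and sum over channels.
heatmap = feature_map[0] @ pooled_grads[:, np.newaxis]    # shape (28, 28, 1)
heatmap = np.squeeze(heatmap)                             # shape (28, 28)

# Keep positive evidence only and scale so the largest value is 1.
heatmap = np.maximum(heatmap, 0) / heatmap.max()

print(heatmap.shape)
```

The normalization assumes the weighted sum has at least one positive entry; otherwise the division would need a guard.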

Related

Keras Scaled-Dot Attention Model Value Error

I am trying to create a Scaled Dot Attention model using the keras utility layer collection found at https://github.com/zimmerrol/keras-utility-layer-collection. I am using the default model (code below) provided in the examples notebook in the GitHub repo. My problem is that I get the following error:
ValueError: Input 0 is incompatible with layer model_1: expected
shape=(None, 500, 3), found shape=(500, 3)
The code is below
# input: time series with 500 steps
# each step has a 3-dim value (the output sequences
# of an LSTM, RNN, etc.)
net_input = Input(shape=(500, 3))
net = TimeDistributed(Dense(3))(net_input)
# queries
net_q = TimeDistributed(Dense(3))(net_input)
# values
net_v = TimeDistributed(Dense(3))(net_input)
# keys
net_k = TimeDistributed(Dense(3))(net_input)
# add one ScaledDotProductAttention layer
net = attention.ScaledDotProductAttention(name="attention", return_attention=False)([net_q, net_v, net_k])
net_output = TimeDistributed(Dense(2))(net)
model = Model(inputs=net_input, outputs=net_output)
model.summary()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics='accuracy')
# dummy data
# x = np.random.rand(64, 500, 3)
# y = np.random.rand(64, 500, 2)
# X, y = make_multilabel_classification(n_samples=500, n_features=3, n_classes=3, n_labels=2, random_state=1)
X, y = make_classification(n_samples=500, n_features=3, n_redundant=0, n_repeated=0, n_informative=3,
n_classes=3, random_state=1)
print(X.shape, y.shape)
model.fit(X, y, batch_size=500, epochs=1)
EDIT :
The summary of the model is also added below
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 500, 3)] 0
__________________________________________________________________________________________________
time_distributed_5 (TimeDistrib (None, 500, 3) 12 input_2[0][0]
__________________________________________________________________________________________________
time_distributed_6 (TimeDistrib (None, 500, 3) 12 input_2[0][0]
__________________________________________________________________________________________________
time_distributed_7 (TimeDistrib (None, 500, 3) 12 input_2[0][0]
__________________________________________________________________________________________________
attention (ScaledDotProductAtte (None, 500, 3) 0 time_distributed_5[0][0]
time_distributed_6[0][0]
time_distributed_7[0][0]
__________________________________________________________________________________________________
time_distributed_8 (TimeDistrib (None, 500, 2) 8 attention[0][0]
==================================================================================================
Total params: 44
Trainable params: 44
Non-trainable params: 0
__________________________________________________________________________________________________
Any help would be appreciated. Thanks!
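The error message says the model expects a 3-D batch of shape (None, 500, 3) but received a 2-D array of shape (500, 3), which is the shape make_classification returns for 500 samples with 3 features. One possible way to reconcile the shapes, sketched here with random NumPy data instead of make_classification, is to treat the 500 rows as a single sequence and add a leading batch axis. Whether that matches the intended meaning of the data is an assumption, and the two-class labels below are hypothetical, chosen only to match the model's (None, 500, 2) output.

```python
import numpy as np

X = np.random.rand(500, 3)                # same shape as make_classification's X
y = np.random.randint(0, 2, size=500)     # hypothetical integer labels, one per step

# Add a batch axis so the array reads as 1 sequence of 500 steps with 3 features.
X_batched = X[np.newaxis, ...]            # shape (1, 500, 3)

# One-hot encode the labels per step and add the same batch axis.
y_batched = np.eye(2)[y][np.newaxis, ...] # shape (1, 500, 2)

print(X_batched.shape, y_batched.shape)
```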

Specific options missing in keras layer class

I would like to implement operations on the results of two keras conv2d layers (Ix,Iy) in a deep learning architecture for a computer vision task. The operation looks as follows:
G = np.hypot(Ix, Iy)
G = G / G.max() * 255
theta = np.arctan2(Iy, Ix)
I've spent some time looking for operations provided by keras but have not had success so far. Among a few others, there is an "add" functionality that allows the user to add the results of two conv2d layers (tf.keras.layers.Add(Ix,Iy)). However, I would like to have a Pythagorean addition (first line) followed by an arctan2 operation (third line).
So ideally, if already implemented by keras it would look as follows:
tf.keras.layers.Hypot(Ix,Iy)
tf.keras.layers.Arctan2(Ix,Iy)
Does anyone know if it is possible to implement those functionalities within my deep learning architecture? Is it possible to write custom layers that meet my needs?
You could probably use simple Lambda layers for your use case, although they are not absolutely necessary:
import tensorflow as tf
inputs = tf.keras.layers.Input((16, 16, 1))
x = tf.keras.layers.Conv2D(32, (3, 3), padding='same')(inputs)
y = tf.keras.layers.Conv2D(32, (2, 2), padding='same')(inputs)
hypot = tf.keras.layers.Lambda(lambda z: tf.math.sqrt(tf.math.square(z[0]) + tf.math.square(z[1])))([x, y])
hypot = tf.keras.layers.Lambda(lambda z: z / tf.reduce_max(z) * 255)(hypot)
atan2 = tf.keras.layers.Lambda(lambda z: tf.math.atan2(z[0], z[1]))([x, y])
model = tf.keras.Model(inputs, [hypot, atan2])
print(model.summary())
model.compile(optimizer='adam', loss='mse')
model.fit(tf.random.normal((64, 16, 16, 1)), [tf.random.normal((64, 16, 16, 32)), tf.random.normal((64, 16, 16, 32))])
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 16, 16, 1)] 0 []
conv2d_2 (Conv2D) (None, 16, 16, 32) 320 ['input_3[0][0]']
conv2d_3 (Conv2D) (None, 16, 16, 32) 160 ['input_3[0][0]']
lambda_2 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]',
'conv2d_3[0][0]']
lambda_3 (Lambda) (None, 16, 16, 32) 0 ['lambda_2[0][0]']
lambda_4 (Lambda) (None, 16, 16, 32) 0 ['conv2d_2[0][0]',
'conv2d_3[0][0]']
==================================================================================================
Total params: 480
Trainable params: 480
Non-trainable params: 0
__________________________________________________________________________________________________
None
2/2 [==============================] - 1s 71ms/step - loss: 3006.0469 - lambda_3_loss: 3001.7981 - lambda_4_loss: 4.2489
<keras.callbacks.History at 0x7ffa93dc2890>
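As a sanity check on the Lambda expressions, the square-root form is the same computation as np.hypot, and both np.arctan2 and tf.math.atan2 take their arguments in (y, x) order. A NumPy-only sketch with random stand-ins for the two conv outputs (Ix and Iy here are illustrative arrays, not layer outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
Ix = rng.normal(size=(16, 16, 32))
Iy = rng.normal(size=(16, 16, 32))

# Pythagorean addition written out, as in the first Lambda layer.
G = np.sqrt(np.square(Ix) + np.square(Iy))
same_as_hypot = np.allclose(G, np.hypot(Ix, Iy))

# Rescale so the largest magnitude becomes 255, as in the second Lambda layer.
G = G / G.max() * 255

# arctan2 takes y first, then x, matching the question's np.arctan2(Iy, Ix).
theta = np.arctan2(Iy, Ix)
swapped = np.arctan2(Ix, Iy)

print(same_as_hypot, np.allclose(theta, swapped))
```

Swapping the two arguments gives a different angle, so the order in which the two tensors are passed to the atan2 Lambda matters if the result must match np.arctan2(Iy, Ix).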

the shape of the output of keras.layers.concatenate

I have a list of dense layers with same output shapes [batch, 1]. If I combine the outputs of these layers with keras.layers.concatenate(), what would the shape be?
dense_layers = [Dense(1), Dense(1), Dense(1)] #some dense layers
merged_output = keras.layers.concatenate([dense_layers])
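Would the shape of merged_output be (batch, 3) or (3, 1)?
The answer is (batch, 3), because concatenate joins along the last axis by default. To see this, you can build a model and print model.summary(). Note that the inputs in this example are declared with shape=(batch, 1), so Keras adds its own leading None batch dimension and the summary reports (None, 30, 3); the last axis still grows from 1 to 3: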
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import concatenate
batch = 30
# define three sets of inputs
input1 = Input(shape=(batch,1))
input2 = Input(shape=(batch,1))
input3 = Input(shape=(batch,1))
# define three dense layers
layer1 = Dense(1)(input1)
layer2 = Dense(1)(input2)
layer3 = Dense(1)(input3)
# concatenate layers
dense_layers = [layer1, layer2, layer3]
merged_output = concatenate(dense_layers)
# create a model and check for output shape
model = Model(inputs=[input1, input2, input3], outputs=merged_output)
model.summary()
Layer (type) Output Shape Param # Connected to
=============================================================================
input_1 (InputLayer) (None, 30, 1) 0
_______________________________________________________________________________
input_2 (InputLayer) (None, 30, 1) 0
_______________________________________________________________________________
input_3 (InputLayer) (None, 30, 1) 0
_______________________________________________________________________________
dense_1 (Dense) (None, 30, 1) 2 input_1[0][0]
_______________________________________________________________________________
dense_2 (Dense) (None, 30, 1) 2 input_2[0][0]
_______________________________________________________________________________
dense_3 (Dense) (None, 30, 1) 2 input_3[0][0]
_______________________________________________________________________________
concatenate_1 (Concatenate) (None, 30, 3) 0 dense_1[0][0]
dense_2[0][0]
dense_3[0][0]
==============================================================================
Total params: 6
Trainable params: 6
Non-trainable params: 0
______________________________________________________________________________
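The same shape rule can be checked without building a model. In this NumPy sketch (zero-filled stand-in arrays; np.concatenate with axis=-1 is used as an analogue of the Keras default axis), three (batch, 1) arrays join into (batch, 3), and three (batch, 30, 1) arrays join into (batch, 30, 3), as in the summary above:

```python
import numpy as np

batch = 30

# Three outputs of shape (batch, 1), as described in the question.
outs_2d = [np.zeros((batch, 1)) for _ in range(3)]
merged_2d = np.concatenate(outs_2d, axis=-1)

# Three outputs of shape (batch, 30, 1), mirroring the summary's (None, 30, 1).
outs_3d = [np.zeros((batch, 30, 1)) for _ in range(3)]
merged_3d = np.concatenate(outs_3d, axis=-1)

print(merged_2d.shape, merged_3d.shape)
```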

Keras LSTM and Embedding Concatenate Array Dimensions Issue

I am building an LSTM network for multivariate time series classification using 2 categorical features, for which I have created Embedding layers in Keras. The model compiles, and the architecture is displayed below with the code. I am getting ValueError: all the input array dimensions except for the concatenation axis must match exactly. This is strange to me because the model compiles and the output shapes seem to match (3D tensors concatenated along axis = -1). The X argument to model.fit is a list of 3 inputs (first categorical variable array, second categorical variable array, and the 3-D multivariate time series input for the LSTM).
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_4 (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
input_5 (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
VAR_1 (Embedding) (None, 46, 5) 50 input_4[0][0]
__________________________________________________________________________________________________
VAR_2 (Embedding) (None, 46, 13) 338 input_5[0][0]
__________________________________________________________________________________________________
time_series (InputLayer) (None, 46, 11) 0
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 46, 18) 0 VAR_1[0][0]
VAR_2[0][0]
__________________________________________________________________________________________________
concatenate_4 (Concatenate) (None, 46, 29) 0 time_series[0][0]
concatenate_3[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) (None, 46, 100) 52000 concatenate_4[0][0]
__________________________________________________________________________________________________
attention_2 (Attention) (None, 100) 146 lstm_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 101 attention_2[0][0]
==================================================================================================
Total params: 52,635
Trainable params: 52,635
Non-trainable params: 0
n_timesteps = 46
n_features = 11
def EmbeddingNet(cat_vars,n_timesteps,n_features,embedding_sizes):
    inputs = []
    embed_layers = []
    for (c, (in_size, out_size)) in zip(cat_vars, embedding_sizes):
        i = Input(shape=(1,))
        o = Embedding(in_size, out_size, input_length=n_timesteps, name=c)(i)
        inputs.append(i)
        embed_layers.append(o)
    embed = Concatenate()(embed_layers)
    time_series_input = Input(batch_shape=(None,n_timesteps,n_features ), name='time_series')
    inputs.append(time_series_input)
    concatenated_inputs = Concatenate(axis=-1)([time_series_input, embed])
    lstm_layer1 = LSTM(units=100,return_sequences=True)(concatenated_inputs)
    attention = Attention()(lstm_layer1)
    output_layer = Dense(1, activation="sigmoid")(attention)
    opt = Adam(lr=0.001)
    model = Model(inputs=inputs, outputs=output_layer)
    model.compile(loss='binary_crossentropy',optimizer=opt,metrics=['accuracy'])
    model.summary()
    return model
model = EmbeddingNet(cat_vars,n_timesteps,n_features,embedding_sizes)
history = model.fit(x=[x_train_cat_array[0],x_train_cat_array[1],x_train_input], y=y_train_input, batch_size=8, epochs=1, verbose=1, validation_data=([x_val_cat_array[0],x_val_cat_array[1],x_val_input], y_val_input),shuffle=False)
I'm trying to do the very same thing. You should concatenate over axis 2. Please check HERE
Let me know if this works in your dataset, because categorical features are not giving any benefit to me.
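The wording of this ValueError matches the message NumPy raises when arrays differ on an axis other than the concatenation axis. One possible cause, stated here as an assumption since the data shapes are not shown: an Input(shape=(1,)) passed through an Embedding yields one timestep per sample at run time, which cannot be joined with a 46-timestep series. A NumPy sketch with zero-filled stand-in arrays reproduces that kind of mismatch and shows one way to align the time axis, by repeating the embedding across timesteps:

```python
import numpy as np

batch, n_timesteps = 8, 46
embedded = np.zeros((batch, 1, 5))              # one embedded category per sample
series = np.zeros((batch, n_timesteps, 11))     # multivariate time series

# Concatenating along the last axis fails because axis 1 differs (1 vs 46).
try:
    np.concatenate([series, embedded], axis=-1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Repeating the embedding along the time axis makes axis 1 agree.
embedded_rep = np.repeat(embedded, n_timesteps, axis=1)   # (8, 46, 5)
merged = np.concatenate([series, embedded_rep], axis=-1)  # (8, 46, 16)

print(mismatch_raised, merged.shape)
```

Inside a Keras model the repetition would have to be done with a layer rather than NumPy; this sketch only illustrates the shape requirement.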

ValueError with Dense() layer in Keras

I'm trying to train a model that takes in two inputs, concatenates them, and feeds the result into an LSTM. The last layer is a Dense() call, and the targets are binary vectors (with more than one 1). The task is classification.
My input sequences are 50 rows of 23 timesteps with 5625 features (x_train), and my supplementary input (not really a sequence) is 50 one-hot rows of length 23 (total_hours)
The error I'm getting is:
ValueError: Error when checking target: expected dense_1 to have shape (1, 5625)
but got array with shape (5625, 1)
And my code is:
import numpy as np
from keras.layers import LSTM, Dense, Input, Concatenate
from keras.models import Model
#CREATING DUMMY INPUT
hours_input_1 = np.eye(23)
hours_input_2 = np.eye(23)
hours_input_3 = np.pad(np.eye(4), pad_width=((0, 19), (0, 19)), mode='constant')
hours_input_3 = hours_input_3[:4,]
total_hours = np.vstack((hours_input_1, hours_input_2, hours_input_3))
seq_input = np.random.normal(size=(50, 24, 5625))
y_train = np.array([seq_input[i, -1, :] for i in range(50)])
x_train = np.array([seq_input[i, :-1, :] for i in range(50)])
#print 'total_hours', total_hours.shape #(50, 23)
#print 'x_train', x_train.shape #(50, 23, 5625)
#print 'y_train shape', y_train.shape #(50, 5625)
#MODEL DEFINITION
seq_model_in = Input(shape=(1,), batch_shape=(1, 1, 5625))
hours_model_in = Input(shape=(1,), batch_shape=(1, 1, 1))
merged = Concatenate(axis=-1)([seq_model_in, hours_model_in])
#print merged.shape #(1, 1, 5626) = added the 'hour' on as an extra feature
merged_lstm = LSTM(10, batch_input_shape=(1, 1, 5625), return_sequences=False, stateful=True)(merged)
merged_dense = Dense(5625, activation='sigmoid')(merged_lstm)
model = Model(inputs=[seq_model_in, hours_model_in], outputs=merged_dense)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#TRAINING
for epoch in range(10):
    for i in range(50):
        y_true = y_train[i,:]
        for j in range(23):
            input_1 = np.expand_dims(np.expand_dims(x_train[i][j], axis=1), axis=1)
            input_1 = np.reshape(input_1, (1, 1, x_train.shape[2]))
            input_2 = np.expand_dims(np.expand_dims(np.array([total_hours[i][j]]), axis=1), axis=1)
            tr_loss, tr_acc = model.train_on_batch([input_1, input_2], y_true)#np.array([y_true]))
        model.reset_states()
My model.summary() looks like this:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (1, 1, 5625) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (1, 1, 1) 0
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (1, 1, 5626) 0 input_1[0][0]
input_2[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (1, 10) 225480 concatenate_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (1, 5625) 61875 lstm_1[0][0]
==================================================================================================
Total params: 287,355
Trainable params: 287,355
Non-trainable params: 0
__________________________________________________________________________________________________
I am working with Keras version 2.1.2 with the TensorFlow backend (TensorFlow version 1.4.0). How can I resolve the ValueError?
It turns out I needed to address the target, as the ValueError implied.
If you replace:
y_true = y_train[i,:]
with:
y_true_1 = np.expand_dims(y_train[i,:], axis=1)
y_true = np.swapaxes(y_true_1, 0, 1)
The code runs.
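The two-step expand_dims and swapaxes turns a flat vector of 5625 values into a single row of shape (1, 5625), which is the target shape the error message asks for. A small NumPy sketch with a random stand-in for y_train[i, :] (y_row and y_alt are names introduced for illustration) shows the intermediate shapes and an equivalent one-step form:

```python
import numpy as np

y_row = np.random.rand(5625)                  # stand-in for y_train[i, :]

y_true_1 = np.expand_dims(y_row, axis=1)      # shape (5625, 1)
y_true = np.swapaxes(y_true_1, 0, 1)          # shape (1, 5625)

# Adding the new axis in front gives the same array in one step.
y_alt = y_row[np.newaxis, :]                  # shape (1, 5625)

print(y_true.shape, np.array_equal(y_true, y_alt))
```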
