How to obtain the Jacobian Matrix with respect to the inputs of a keras model neural network? - tensorflow2.x

I recently started learning and using automatic differentiation to determine the gradients and jacobian matrix of a neural network with respect to a given input. The method suggested by tensorflow is the tape.gradient and tape.jacobian method. However, I am not able to obtain the jacobian matrix using this method due to some bug in tensorflow. It works when I calculated tape.gradient(y_pred, x), but not the jacobian matrix, which should have a shape of (200,3). I am open to other ways to calculate the jacobian matrix, but I am more inclined to use automatic differentiation methods within Tensorflow. The current version I am using is Tensorflow 2.1.0. Greatly appreciate any advice!
import tensorflow as tf
import numpy as np
# The neural network accepts 3 inputs and produces 200 outputs. The actual values of the inputs and outputs are not written in the code as it is too involved.
num_inputs = 3
num_outputs = 200
num_hidden_layers = 5
num_neurons = 50
kernel = 'he_uniform'
activation = tf.keras.layers.LeakyReLU(alpha=0.3)
# Details of model (MLP)
current_model = tf.keras.models.Sequential()
current_model.add(tf.keras.Input(shape=(num_inputs,)))
for i in range(num_hidden_layers):
current_model.add(tf.keras.layers.Dense(units=num_neurons, activation=activation, kernel_initializer=kernel))
current_model.add(tf.keras.layers.Dense(units=num_outputs, activation='linear', kernel_initializer=kernel))
# Finding the Jacobian matrix with respect to a given input of the neural network
# In this case, the inputs are [0.02, 0.4 and 0.12] (i.e. 3 inputs)
x = tf.Variable([[0.02, 0.4, 0.12]], dtype=tf.float32)
with tf.GradientTape() as tape:
y_pred = x
for layer in current_model.layers:
y_pred = layer(y_pred)
jacobian = tape.jacobian(y_pred, x)
print(jacobian)
Below is the error returned. I removed some parts for privacy purposes.
StagingError: in converted code:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:183 f *
return _pfor_impl(loop_fn, iters, parallel_iterations=parallel_iterations)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\control_flow_ops.py:256 _pfor_impl
outputs.append(converter.convert(loop_fn_output))
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1280 convert
output = self._convert_helper(y)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\ops\parallel_for\pfor.py:1453 _convert_helper
if flags.FLAGS.op_conversion_fallback_to_while_loop:
C:\Users\...\anaconda3\envs\tf\lib\site-packages\tensorflow_core\python\platform\flags.py:84 __getattr__
wrapped(_sys.argv)
C:\Users\...\anaconda3\envs\tf\lib\site-packages\absl\flags\_flagvalues.py:633 __call__
name, value, suggestions=suggestions)
UnrecognizedFlagError: Unknown command line flag 'f'

Related

mse loss function not compatible with regularization loss (add_loss) on hidden layer output

I would like to code in tf.Keras a Neural Network with a couple of loss functions. One is a standard mse (mean squared error) with a factor loading, while the other is basically a regularization term on the output of a hidden layer. This second loss is added through self.add_loss() in a user-defined class inheriting from tf.keras.layers.Layer. I have a couple of questions (the first is more important though).
1) The error I get when trying to combine the two losses together is the following:
ValueError: Shapes must be equal rank, but are 0 and 1
From merging shape 0 with other shapes. for '{{node AddN}} = AddN[N=2, T=DT_FLOAT](loss/weighted_loss/value, model/new_layer/mul_1)' with input shapes: [], [100].
So it comes from the fact that the tensors which should add up to make one unique loss value have different shapes (and ranks). Still, when I try to print the losses during the training, I clearly see that the vectors returned as losses have shape batch_size and rank 1. Could it be that when the 2 losses are summed I have to provide them (or at least the loss of add_loss) as scalar? I know the mse is usually returned as a vector where each entry is the mse from one sample in the batch, hence having batch_size as shape. I think I tried to do the same with the "regularization" loss. Do you have an explanation for this behavio(u)r?
The sample code which gives me error is the following:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
def rate_mse(rate=1e5):
#tf.function # also needed for printing
def loss(y_true, y_pred):
tmp = rate*K.mean(K.square(y_pred - y_true), axis=-1)
# tf.print('shape %s and rank %s output in mse'%(K.shape(tmp), tf.rank(tmp)))
tf.print('shape and rank output in mse',[K.shape(tmp), tf.rank(tmp)])
tf.print('mse loss:',tmp) # print when I put tf.function
return tmp
return loss
class newLayer(tf.keras.layers.Layer):
def __init__(self, rate=5e-2, **kwargs):
super(newLayer, self).__init__(**kwargs)
self.rate = rate
# #tf.function # to be commented for NN training
def call(self, inputs):
tmp = self.rate*K.mean(inputs*inputs, axis=-1)
tf.print('shape and rank output in regularizer',[K.shape(tmp), tf.rank(tmp)])
tf.print('regularizer loss:',tmp)
self.add_loss(tmp, inputs=True)
return inputs
tot_n = 10000
xx = np.random.rand(tot_n,1)
yy = np.pi*xx
train_size = int(0.9*tot_n)
xx_train = xx[:train_size]; xx_val = xx[train_size:]
yy_train = yy[:train_size]; yy_val = yy[train_size:]
reg_layer = newLayer()
input_layer = Input(shape=(1,)) # input
hidden = Dense(20, activation='relu', input_shape=(2,))(input_layer) # hidden layer
hidden = reg_layer(hidden)
output_layer = Dense(1, activation='linear')(hidden)
model = Model(inputs=[input_layer], outputs=[output_layer])
model.compile(optimizer='Adam', loss=rate_mse(), experimental_run_tf_function=False)
#model.compile(optimizer='Adam', loss=None, experimental_run_tf_function=False)
model.fit(xx_train, yy_train, epochs=100, batch_size = 100,
validation_data=(xx_val,yy_val), verbose=1)
#new_xx = np.random.rand(10,1); new_yy = np.pi*new_xx
#model.evaluate(new_xx,new_yy)
print(model.predict(np.array([[1]])))
2) I would also have a secondary question related to this code. I noticed that printing with tf.print inside the function rate_mse only works with tf.function. Similarly, the call method of newLayer is only taken into consideration if the same decorator is commented during training. Can someone explain why this is the case or reference me to a possible solution?
Thanks in advance to whoever can provide me help. I am currently using Tensorflow 2.2.0 and keras version is 2.3.0-tf.
I stuck with the same problem for a few days. "Standard" loss is going to be a scalar at the moment when we add it to the loss from add_loss. The only way how I get it working is to add one more axis while calculating mean. So we will get a scalar, and it will work.
tmp = self.rate*K.mean(inputs*inputs, axis=[0, -1])

How to use the input gradients as variables within a custom loss function in Keras?

I am using the input gradient as feature important and want to compare the feature importance of a train datapoint with the human annotated feature importance. I would like to make this comparison differentiable such that it can be learned through backpropagation. For that, I am writing a custom loss function that in addition to the regular loss (e.g. m.s.e. on the prediction vs true labels) also checks whether the input gradient is correct (e.g. m.s.e. of the input gradient vs the human annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
def normalize(x):
# utility function to normalize a tensor by its L2 norm
return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
# Amount of training samples
N = 1000
input_dim = 10
# Generate training set make the 1st and 2nd feature same as the target feature
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]
# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(input=[inputs], output=output)
# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)
# Get function to get input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])
# Get input gradient values of the training-set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error
def custom_loss():
# human annotated feature importance
# Let's say that it says to only look at the second feature
human_feature_importance = []
for i in range(N):
human_feature_importance.append([0,0,1,0,0,0,0,0,0,0])
def loss(y_true, y_pred):
# Get regular loss
regular_loss_value = mean_squared_error(y_true, y_pred)
# Somehow get the input gradient of each training sample as a tensor
# It should be differential w.r.t. all of the weights
gradients = ??
feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
# Combine the both losses
return regular_loss_value + feature_importance_loss_value
return loss
I also found an implementation in tensorflow to make the input gradient differentialble: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18

how to enforce feature orthogonality in keras

I am new to Keras and Tensorflow. I want to add a penalty to my categorical cross entropy loss function based on some of the outputs in the network. Specifically, I decompose the outputs of a fully connected layer into 8 partitions want these outputs to be orthogonal. So I append the activations into a list and convert it into a stack by using Keras backend. Here is how I partitioned the activations:
for i in range(8):
x_sub = Lambda(lambda x: x[:,i*128:i*128+128])(x)
features.append(x_sub)
#convert batch of feature lists in to 8x(128*batch_size) keras tensor
outs.append(Lambda(lambda x: K.reshape(K.stack(x, axis=0), (8, -1)))(features))
net = Model(inputs=[net.input], outputs=outs)
And then I define the loss as follows:
def OrthLoss(features):
W = K.l2_normalize(features, axis=1)
diff = K.dot(W, K.transpose(W)) - K.eye(8)
return K.mean(diff)
However, this does not seem to converge. Is this a right way to accomplish this? I first tried to enforce the orthogonality as a regularizer on weights but as far as I understand Keras performs regularization on each layer separately and found no way to define a regularizer on multiple weights.

How to use the result of embedding with mask_zero=True in keras

In keras, I want to calculate the mean of nonzero embedding output.
I wonder what is the difference between mask_zero=True or False in Embedding Layer.
I tried the code below :
input_data = Input(shape=(5,), dtype='int32', name='input')
embedding_layer = Embedding(1000, 24, input_length=5,mask_zero=True,name='embedding')
out = word_embedding_layer(input_data)
def antirectifier(x):
x = K.mean(x, axis=1, keepdims=True)
return x
def antirectifier_output_shape(input_shape):
shape = list(input_shape)
return tuple(shape)
out = Lambda(antirectifier, output_shape=antirectifier_output_shape,name='lambda')(out)
But it seems that the result is the mean of all the elements, how can i just calculate the mean of all nonzero inputs?
From the function's doc :
If this is True then all subsequent layers in the model need to
support masking
Your lambda function doesn't support masking. For example Recurrent layers in Keras support masking. If you set mask_zero=True in your embeddings, then all the 0 indices that you feed to the embedding layer will be propagated as "masked" and the following layers that are able to understand the "masked" information will use them.
Basically, if you build a "mean" layer that grabs the mask and computes the average only for non-masked values, then you will get the desired results.
You can find here a way to build your lambda layers that support masking
I hope it helps.

how to implement Loss function of paper ''Semantic Image Inpainting with Deep Generative Models' in keras

I have trained GAN on celebA dataset. After that i separate G and D. Then i pick one image from celebA training dataset say yTrue and now i want to find the closest image to yTrue that G can generate say yPred. So the loss at output of G is ||yTrue - yPred||_2^{2} and i minimized it w.r.t generator input(latent variable from normal distribution). Below is code that is giving good results. Now the problem is i want to also add prior loss (log(1-D(G(z))) 1 in first line but i am not getting how to do it as D is not connected to G now and if i directly add k.mean(k.log(1-D.predict(G.output))) in first line it returns numpy array not tensor that is not allowed.
`loss = K.mean(K.square(yTrue - gf.output))
grad = K.gradients(loss,[gf.input])[0]
fn = K.function([gf.input], [grad])
generator_input = np.random.normal(0,1,[1,100])
for i in range(5000):
grad1 = fn([generator_input])
generator_input -= grads[0]*.01
recovered = gf.predict(generator_input)`
In keras, you get the final output to create loss functions. Then, you will have to train the full network to achieve that loss. (Train G+D joined as a single model).
In the loss function, you will have y_true and y_pred, and you use them to compare:
PS: if MSE is not taking the output of the discriminator, please detail your questoin better.
import keras.backend as K
def customLoss(yTrue,yPred):
mse = K.mean(K.square(yTrue-yPred)
prior = K.mean(K.log(1-yPred))
return mse + prior
Pass this function when compiling the model
discriminator.compile(loss=customLoss,optimizer=.....)

Resources