Keras Implementation of Customized Loss Function that need internal layer output as label - keras

in keras, I want to customize my loss function which not only takes (y_true, y_pred) as input but also need to use the output from the internal layer of the network as the label for an output layer.This picture shows the Network Layout
Here, the internal output is xn, which is a 1D feature vector. in the upper right corner, the output is xn', which is the prediction of xn. In other words, xn is the label for xn'.
While [Ax, Ay] is traditionally known as y_true, and [Ax',Ay'] is y_pred.
I want to combine these two loss components into one and train the network jointly.
Any ideas or thoughts are much appreciated!

I have figured out a way out, in case anyone is searching for the same, I posted here (based on the network given in this post):
The idea is to define the customized loss function and use it as the output of the network. (Notation: A is the true label of variable A, and A' is the predicted value of variable A)
def customized_loss(args):
#A is from the training data
#S is the internal state
A, A', S, S' = args
#customize your own loss components
loss1 = K.mean(K.square(A - A'), axis=-1)
loss2 = K.mean(K.square(S - S'), axis=-1)
#adjust the weight between loss components
return 0.5 * loss1 + 0.5 * loss2
def model():
#define other inputs
A = Input(...) # define input A
#construct your model
cnn_model = Sequential()
...
# get true internal state
S = cnn_model(prev_layer_output0)
# get predicted internal state output
S' = Dense(...)(prev_layer_output1)
# get predicted A output
A' = Dense(...)(prev_layer_output2)
# customized loss function
loss_out = Lambda(customized_loss, output_shape=(1,), name='joint_loss')([A, A', S, S'])
model = Model(input=[...], output=[loss_out])
return model
def train():
m = model()
opt = 'adam'
model.compile(loss={'joint_loss': lambda y_true, y_pred:y_pred}, optimizer = opt)
# train the model
....

First of all you should be using the Functional API. Then you should define the network output as the output plus the result from the internal layer, merge them into a single output (by concatenating), and then make a custom loss function that then splits the merged output into two parts and does the loss computations on its own.
Something like:
def customLoss(y_true, y_pred):
#loss here
internalLayer = Convolution2D()(inputs) #or other layers
internalModel = Model(input=inputs, output=internalLayer)
tmpOut = Dense(...)(internalModel)
mergedOut = merge([tmpOut, mergedOut], mode = "concat", axis = -1)
fullModel = Model(input=inputs, output=mergedOut)
fullModel.compile(loss = customLoss, optimizer = "whatever")

I have my reservations regarding this implementation. The loss computed at the merged layer is propagated back to both the merged branches. Generally you would like to propagate it through just one layer.

Related

How to add a custom loss function to Keras that solves an ODE?

I'm new to Keras, sorry if this is a silly question!
I am trying to get a single-layer neural network to find the solution to a first-order ODE. The neural network N(x) should be the approximate solution to the ODE. I defined the right-hand side function f, and a transformed function g that includes the boundary conditions. I then wrote a custom loss function that only minimises the residual of the approximate solution. I created some empty data for the optimizer to iterate over, and set it going. The optimizer does not seem to be able to adjust the weights to minimize this loss function. Am I thinking about this wrong?
# Define initial condition
A = 1.0
# Define empty training data
x_train = np.empty((10000, 1))
y_train = np.empty((10000, 1))
# Define transformed equation (forced to satisfy boundary conditions)
g = lambda x: N(x.reshape((1000,))) * x + A
# Define rhs function
f = lambda x: np.cos(2 * np.pi * x)
# Define loss function
def OdeLoss(g, f):
epsilon=sys.float_info.epsilon
def loss(y_true, y_pred):
x = np.linspace(0, 1, 1000)
R = K.sum(((g(x+epsilon)-g(x))/epsilon - f(x))**2)
return R
return loss
# Define input tensor
input_tensor = tf.keras.Input(shape=(1,))
# Define hidden layer
hidden = tf.keras.layers.Dense(32)(input_tensor)
# Define non-linear activation layer
activate = tf.keras.activations.relu(hidden)
# Define output tensor
output_tensor = tf.keras.layers.Dense(1)(activate)
# Define neural network
N = tf.keras.Model(input_tensor, output_tensor)
# Compile model
N.compile(loss=OdeLoss(g, f), optimizer='adam')
N.summary()
# Train model
history = N.fit(x_train, y_train, batch_size=1, epochs=1, verbose=1)
The method is based on Lecture 3.2 of MIT course 18.337J, by Chris Rackaukas, who does this in Julia. Cheers!

calculate Entropy for each class of the test set to measure uncertainty on pytorch

I am trying to calculate Entropy of each class of the dataset for an image classification task to measure uncertainty on pytorch,using the MC Dropout method and the solution proposed in this link
Measuring uncertainty using MC Dropout on pytorch
First,I have calculated the mean of each class per batch across different forward passes (class_mean_batch) and then for all the testloader (classes_mean) and then did some transformations to get (total_mean) to use it for calculating Entropy as shown in the code below
def mcdropout_test(batch_size,n_classes,model,T):
#set non-dropout layers to eval mode
model.eval()
#set dropout layers to train mode
enable_dropout(model)
softmax = nn.Softmax(dim=1)
classes_mean = []
for images,labels in testloader:
images = images.to(device)
labels = labels.to(device)
classes_mean_batch = []
with torch.no_grad():
output_list = []
#getting outputs for T forward passes
for i in range(T):
output = model(images)
output = softmax(output)
output_list.append(torch.unsqueeze(output, 0))
concat_output = torch.cat(output_list,0)
# getting mean of each class per batch across multiple MCD forward passes
for i in range (n_classes):
mean = torch.mean(concat_output[:, : , i])
classes_mean_batch.append(mean)
# getting mean of each class for the testloader
classes_mean.append(torch.stack(classes_mean_batch))
total_mean = []
concat_classes_mean = torch.stack(classes_mean)
for i in range (n_classes):
concat_classes = concat_classes_mean[: , i]
total_mean.append(concat_classes)
total_mean = torch.stack(total_mean)
total_mean = np.asarray(total_mean.cpu())
epsilon = sys.float_info.min
# Calculating entropy across multiple MCD forward passes
entropy = (- np.sum(total_mean*np.log(total_mean + epsilon), axis=-1)).tolist()
for i in range(n_classes):
print(f'The uncertainty of class {i+1} is {entropy[i]:.4f}')
Can anyone please correct or confirm the implementation i have used to calculate Entropy of each class.

gradients of custom loss

Suppose a model as in:
model = Model(inputs=[A, B], outputs=C)
With custom loss:
def actor_loss(y_true, y_pred):
log_lik = y_true * K.log(y_pred)
loss = -K.sum(log_lik * K.stop_gradient(B))
return loss
Now I'm trying to define a function that returns the gradients of the loss wrt to the weights for a given pair of input and target output and expose it as such.
Here is an idea of what I mean in pseudocode
def _get_grads(inputs, targets):
loss = model.loss(targets, model.output)
weights = model.trainable_weights
grads = K.gradients(loss, weights)
model.input[0] (aka 'A') <----inputs[0]
model.input[1] (aka 'B') <----inputs[1]
return K.function(model.input, grads)
self.get_grads = _get_grads
My question is how do I feed inputs argument to the graph inside said function.
(So far I've only worked with .fit and not with .gradients and I can't find any decent documentation with custom loss or multiple inputs)
If you call K.function, you get an actual callable function, so you should just call it with some parameter values. The format is exactly the same as model.fit, in your case it should be two arrays of values, including the batch dimension:
self.get_grads = _get_grads(inputs, targets)
grad_value = self.get_grads([input1, input2])
Where input1 and input2 are numpy arrays that include the batch dimension.
My understanding of K.function ,K.gradients and custom loss was fundamentally wrong. You use the function to construct a mini-graph that computes gradients of loss wrt to weights. No need for the function itself to have arguments.
def _get_grads():
targets = Input(shape=...)
loss = model.loss(targets, model.output)
weights = model.trainable_weights
grads = K.gradients(loss, weights)
return K.function(model.input + [targets], grads)
I was under the impression that _get_grads was itself K.function but that was wrong. _get_grads() returns K.function. And then you use that as
f = _get_grads() # constructs the mini-graph that gives gradients
grads = f([inputs, labels])
inputs is fed to model.inputs, labels to targets and it returns grads.

How to compute gradient of the error with respect to the model input?

Given a simple 2 layer neural network, the traditional idea is to compute the gradient w.r.t. the weights/model parameters. For an experiment, I want to compute the gradient of the error w.r.t the input. Are there existing Pytorch methods that can allow me to do this?
More concretely, consider the following neural network:
import torch.nn as nn
import torch.nn.functional as F
class NeuralNet(nn.Module):
def __init__(self, n_features, n_hidden, n_classes, dropout):
super(NeuralNet, self).__init__()
self.fc1 = nn.Linear(n_features, n_hidden)
self.sigmoid = nn.Sigmoid()
self.fc2 = nn.Linear(n_hidden, n_classes)
self.dropout = dropout
def forward(self, x):
x = self.sigmoid(self.fc1(x))
x = F.dropout(x, self.dropout, training=self.training)
x = self.fc2(x)
return F.log_softmax(x, dim=1)
I instantiate the model and an optimizer for the weights as follows:
import torch.optim as optim
model = NeuralNet(n_features=args.n_features,
n_hidden=args.n_hidden,
n_classes=args.n_classes,
dropout=args.dropout)
optimizer_w = optim.SGD(model.parameters(), lr=0.001)
While training, I update the weights as usual. Now, given that I have values for the weights, I should be able to use them to compute the gradient w.r.t. the input. I am unable to figure out how.
def train(epoch):
t = time.time()
model.train()
optimizer.zero_grad()
output = model(features)
loss_train = F.nll_loss(output[idx_train], labels[idx_train])
acc_train = accuracy(output[idx_train], labels[idx_train])
loss_train.backward()
optimizer_w.step()
# grad_features = loss_train.backward() w.r.t to features
# features -= 0.001 * grad_features
for epoch in range(args.epochs):
train(epoch)
It is possible, just set input.requires_grad = True for each input batch you're feeding in, and then after loss.backward() you should see that input.grad holds the expected gradient. In other words, if your input to the model (which you call features in your code) is some M x N x ... tensor, features.grad will be a tensor of the same shape, where each element of grad holds the gradient with respect to the corresponding element of features. In my comments below, I use i as a generalized index - if your parameters has for instance 3 dimensions, replace it with features.grad[i, j, k], etc.
Regarding the error you're getting: PyTorch operations build a tree representing the mathematical operation they are describing, which is then used for differentiation. For instance c = a + b will create a tree where a and b are leaf nodes and c is not a leaf (since it results from other expressions). Your model is the expression, and its inputs as well as parameters are the leaves, whereas all intermediate and final outputs are not leaves. You can think of leaves as "constants" or "parameters" and of all other variables as of functions of those. This message tells you that you can only set requires_grad of leaf variables.
Your problem is that at the first iteration, features is random (or however else you initialize) and is therefore a valid leaf. After your first iteration, features is no longer a leaf, since it becomes an expression calculated based on the previous ones. In pseudocode, you have
f_1 = initial_value # valid leaf
f_2 = f_1 + your_grad_stuff # not a leaf: f_2 is a function of f_1
to deal with that you need to use detach, which breaks the links in the tree, and makes the autograd treat a tensor as if it was constant, no matter how it was created. In particular, no gradient calculations will be backpropagated through detach. So you need something like
features = features.detach() - 0.01 * features.grad
Note: perhaps you need to sprinkle a couple more detaches here and there, which is hard to say without seeing your whole code and knowing the exact purpose.

How to obtain gradients with respect to parameters when using Lambda layer as output

I am trying to implement a neural network with one hidden layer that can represent the solution to a PDE (let's say the Laplace equation). The objective function therefore depends on the gradient of the neural network w.r.t its input.
Now, I have implemented the calculation of the second derivatives using Lambda layers. However when I try to compute the gradient of the output with respect to the parameters of the model, I get an error.
def grad(y, x, nameit):
return Lambda(lambda z: K.gradients(z[0], z[1]), output_shape = [1], name = nameit)([y,x])
def network(i):
m = Dense(100, activation='sigmoid')(i)
j = Dense(1, name="networkout")(m)
return j
x1 = Input(shape=(1,))
a = network(x1)
b = grad(a, x1, "dudx1")
c = grad(b, x1, "dudx11")
model = Model(inputs = [x1], outputs=[c])
model.compile(optimizer='rmsprop',
loss='mean_squared_error',
metrics=['accuracy'])
x1_data = np.random.random((20, 1))
labels = np.zeros((20,1))
model.fit(x1_data,labels)
This is the error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Why can't Keras compute the gradients w.r.t the trainable parameters?
Problem is in networkout layer. It maintains a linear activation which prevents gradients from pass through it, and therefore, return 'None' gradients error. In this case you need to add any activation function except linear to networkout layer.
def network(i):
m = layers.Dense(100)(i)
j = layers.Dense(1, name="networkout", activation='relu')(m)
return j
However, previous layer can have a linear activation.

Resources