How to add a custom loss function to Keras that solves an ODE?

I'm new to Keras, sorry if this is a silly question!
I am trying to get a single-layer neural network to find the solution to a first-order ODE. The neural network N(x) should be the approximate solution to the ODE. I defined the right-hand-side function f and a transformed function g that enforces the boundary condition. I then wrote a custom loss function that minimises only the residual of the approximate solution, created some empty data for the optimizer to iterate over, and set it running. The optimizer does not seem to be able to adjust the weights to minimise this loss. Am I thinking about this wrong?
import sys
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# Define initial condition
A = 1.0

# Define empty training data
x_train = np.empty((10000, 1))
y_train = np.empty((10000, 1))

# Define transformed equation (forced to satisfy boundary conditions)
g = lambda x: N(x.reshape((1000,))) * x + A

# Define rhs function
f = lambda x: np.cos(2 * np.pi * x)

# Define loss function
def OdeLoss(g, f):
    epsilon = sys.float_info.epsilon
    def loss(y_true, y_pred):
        x = np.linspace(0, 1, 1000)
        R = K.sum(((g(x + epsilon) - g(x)) / epsilon - f(x))**2)
        return R
    return loss

# Define input tensor
input_tensor = tf.keras.Input(shape=(1,))
# Define hidden layer
hidden = tf.keras.layers.Dense(32)(input_tensor)
# Define non-linear activation layer
activate = tf.keras.activations.relu(hidden)
# Define output tensor
output_tensor = tf.keras.layers.Dense(1)(activate)
# Define neural network
N = tf.keras.Model(input_tensor, output_tensor)

# Compile model
N.compile(loss=OdeLoss(g, f), optimizer='adam')
N.summary()

# Train model
history = N.fit(x_train, y_train, batch_size=1, epochs=1, verbose=1)
The method is based on Lecture 3.2 of MIT course 18.337J, by Chris Rackauckas, who does this in Julia. Cheers!
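One likely culprit, for what it's worth: sys.float_info.epsilon is the float64 machine epsilon (about 2.2e-16), so once the inputs are cast to the model's float32, x + epsilon and x are almost always the same number; the finite difference is then identically zero and the loss no longer depends on the weights at all. Below is a minimal sketch (my own, not code from the post) of the same scheme with TensorFlow ops and an exact derivative from tf.GradientTape, so gradients can reach the weights:

import numpy as np
import tensorflow as tf

# Hedged sketch: same trial solution g(x) = N(x) * x + A as in the question,
# but the residual is built from TensorFlow ops with an exact derivative.
A = 1.0
f = lambda x: tf.cos(2 * np.pi * x)   # rhs of g'(x) = f(x)

x = tf.reshape(tf.linspace(0.0, 1.0, 1000), (-1, 1))  # collocation points

inputs = tf.keras.Input(shape=(1,))
# tanh swapped in for the post's relu so the derivative is smooth
hidden = tf.keras.layers.Dense(32, activation='tanh')(inputs)
outputs = tf.keras.layers.Dense(1)(hidden)
N = tf.keras.Model(inputs, outputs)

optimizer = tf.keras.optimizers.Adam()
for step in range(2000):
    with tf.GradientTape() as weight_tape:
        with tf.GradientTape() as x_tape:
            x_tape.watch(x)
            g = N(x) * x + A              # satisfies g(0) = A by construction
        dg_dx = x_tape.gradient(g, x)     # exact derivative, no finite difference
        loss = tf.reduce_mean(tf.square(dg_dx - f(x)))
    grads = weight_tape.gradient(loss, N.trainable_variables)
    optimizer.apply_gradients(zip(grads, N.trainable_variables))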

Related

Custom activation function dependent on other output nodes in Keras

I would like to predict a multi-dimensional array using Long Short-Term Memory (LSTM) networks while imposing restrictions on the shape of the surface of interest.
I thought to accomplish this by setting some elements of the output (regions of the surface) in a functional relationship to others (simple scaling conditions).
Is it possible in Keras to set such custom activation functions for the output, whose arguments are other output nodes?
If not, is there any other interface that allows this? Do you have any pointer to a manual?
The Keras team answered a question on GitHub about how to make a custom activation function.
There is also a question with code for a custom activation function.
These pages may help you!
Additional comment
Those pages were not enough for this question, so I add the comment below.
Maybe PyTorch is better for this kind of customization than Keras. I wrote such a network, though a very simple one, based on the PyTorch tutorials and "Extending PyTorch with Custom Activation Functions".
I made a custom activation function in which element 1 of the output vector (counting from 0) is forced to equal twice element 0. A very simple network with one layer was used for the training. After training, I checked that the condition was satisfied.
import torch
import matplotlib.pyplot as plt

# Define the custom activation function
# (the name comes from the reference tutorial; this is not the real SiLU)
# reference: https://towardsdatascience.com/extending-pytorch-with-custom-activation-functions-2d8b065ef2fa
def silu(input):
    # force element 1 of each output vector to be twice element 0
    input[:, 1] = input[:, 0] * 2
    return input

class SiLU(torch.nn.Module):
    def __init__(self):
        super().__init__()  # init the base class
    def forward(self, input):
        return silu(input)  # apply the custom constraint defined above

# Training
# reference: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
k = 10
x = torch.rand([k, 3])
y = x * 2

model = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    SiLU()  # custom activation function
)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-3
for t in range(2000):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# Check the behaviour
yy = model(x)  # predicted
print('ground truth')
print(y)
print('predicted')
print(yy)

# Examples for the first five data points
colorlist = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']
plt.figure()
for i in range(5):
    plt.plot(y[i, :].detach().numpy(), linestyle="solid", label="ground truth_" + str(i), color=colorlist[i])
    plt.plot(yy[i, :].detach().numpy(), linestyle="dotted", label="predicted_" + str(i), color=colorlist[i])
plt.legend()

# Check that the custom activation works correctly
plt.figure()
plt.plot(yy[:, 0].detach().numpy() * 2, label='0th * 2')
plt.plot(yy[:, 1].detach().numpy(), label='1th')
plt.legend()
print(yy[:, 0] * 2)
print(yy[:, 1])
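A note on the in-place write: input[:, 1] = input[:, 0] * 2 mutates the tensor produced by the Linear layer, which autograd tolerates here but can reject in other graphs. A possibly safer out-of-place variant (my own sketch, not from the original answer):

import torch

def constrained(input):
    # Build the constrained column out of place: element 1 = 2 * element 0.
    # Assumes the toy setting above with 3 output features.
    col1 = (input[:, 0] * 2).unsqueeze(1)
    return torch.cat([input[:, :1], col1, input[:, 2:]], dim=1)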

Regression Model with 3 Hidden DenseVariational Layers in Tensorflow-Probability returns nan as loss during training

I am getting acquainted with Tensorflow-Probability, and here I am running into a problem. During training, the model returns nan as the loss (possibly meaning a huge loss that overflows). Since the functional form of the synthetic data is not overly complicated, and the ratio of data points to parameters is not frightening at first glance at least, I wonder what the problem is and how it could be corrected.
The code is the following:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input

tfd = tfp.distributions
tfpl = tfp.layers

# Create and plot 5000 data points
x_train = np.linspace(-1, 2, 5000)[:, np.newaxis]
y_train = np.power(x_train, 3) + 0.1 * (2 + x_train) * np.random.randn(5000)[:, np.newaxis]
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()

# Define the prior weight distribution -- all N(0, 1) -- and not trainable
def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model

# Define variational posterior weight distribution -- multivariate Gaussian
def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        # The parameters of the model are declared as trainable Variables.
        # The VariableLayer's output shape is the number of parameters needed
        # to build the MultivariateNormalTriL object living in a space of n
        # dimensions (event_size = n), i.e. tfpl.MultivariateNormalTriL.params_size(n).
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        # The posterior returns to the variational layer a MultivariateNormalTriL
        # object with as many dimensions as the variational Dense layer has
        # parameters. Each parameter is generated by a distinct Normal, shifted
        # and scaled by a mu and sigma learned from the data, independently of
        # all the other weights. The VariableLayer's output becomes its input.
        tfpl.MultivariateNormalTriL(n)
    ])
    return posterior_model

x_in = Input(shape=(1,))
x = tfpl.DenseVariational(units=2**4,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0],
                          activation='relu')(x_in)
x = tfpl.DenseVariational(units=2**4,
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0],
                          activation='relu')(x)
x = tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0])(x)
y_out = tfpl.IndependentNormal(1)(x)
model = Model(inputs=x_in, outputs=y_out)

def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)

model.compile(loss=nll, optimizer='Adam')
model.summary()

# Train the model
history = model.fit(x_train, y_train, epochs=500)
The problem seems to be in the loss function: the negative log-likelihood of an independent normal distribution without any specified location and scale leads to an untamed variance, which blows up the final loss value. Since you're experimenting with variational layers, you must be interested in estimating the epistemic uncertainty; to that end, I'd recommend applying a constant variance.
I tried to make a couple of slight changes to your code along the following lines:
First of all, the final output y_out comes directly from the final variational layer, without any IndependentNormal distribution layer:
y_out = tfpl.DenseVariational(units=1,
                              make_prior_fn=prior,
                              make_posterior_fn=posterior,
                              kl_weight=1/x_train.shape[0])(x)
Second, the loss function now contains the necessary calculations with the normal distribution you need, but with a static variance in order to avoid the loss blowing up during training:
def nll(y_true, y_pred):
    dist = tfp.distributions.Normal(loc=y_pred, scale=1.0)
    return tf.reduce_sum(-dist.log_prob(y_true))
Then the model is compiled and trained in the same way as before:
model.compile(loss=nll, optimizer='Adam')
history = model.fit(x_train, y_train, epochs=3000)
And finally, let's sample 100 different predictions from the trained model and plot these values to visualize the epistemic uncertainty of the model:
predicted = [model(x_train) for _ in range(100)]
for i, res in enumerate(predicted):
    plt.plot(x_train, res, alpha=0.1)
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()
After 3000 epochs (with the number of training points reduced from 5000 to 3000 to speed up training), the resulting plot shows the sampled predictions spread around the data, visualizing the model's epistemic uncertainty.
The model has 38,589 trainable parameters but only 5,000 data points, so effective training is impossible with so many parameters.
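For reference, the 38,589 figure can be reproduced by hand: tfpl.MultivariateNormalTriL.params_size(n) is n location parameters plus n(n+1)/2 lower-triangular scale entries, with n the kernel size plus bias size of each layer. A quick check (my own sketch):

def params_size(n):
    return n + n * (n + 1) // 2      # loc + lower-triangular scale

layer1 = params_size(1 * 16 + 16)    # 1 -> 16 units:           n = 32  ->   560
layer2 = params_size(16 * 16 + 16)   # 16 -> 16 units:          n = 272 -> 37400
layer3 = params_size(16 * 2 + 2)     # 16 -> 2 (loc and scale): n = 34  ->   629
print(layer1 + layer2 + layer3)      # 38589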

How to use the input gradients as variables within a custom loss function in Keras?

I am using the input gradient as feature importance, and I want to compare the feature importance of a training datapoint with the human-annotated feature importance. I would like to make this comparison differentiable so that it can be learned through backpropagation. For that, I am writing a custom loss function that, in addition to the regular loss (e.g. m.s.e. on the prediction vs. the true labels), also checks whether the input gradient is correct (e.g. m.s.e. of the input gradient vs. the human-annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

def normalize(x):
    # utility function to normalize a tensor by its L2 norm
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)

# Amount of training samples
N = 1000
input_dim = 10

# Generate training set; make features 1 and 2 the same as the target
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]

# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inputs], outputs=output)

# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)

# Get function to compute input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])

# Get input gradient values of the training set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error

def custom_loss():
    # human-annotated feature importance
    # Let's say that it says to only look at the second feature
    human_feature_importance = []
    for i in range(N):
        human_feature_importance.append([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
    def loss(y_true, y_pred):
        # Get the regular loss
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Somehow get the input gradient of each training sample as a tensor;
        # it should be differentiable w.r.t. all of the weights
        gradients = ??
        feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
        # Combine both losses
        return regular_loss_value + feature_importance_loss_value
    return loss
I also found an implementation in TensorFlow that makes the input gradient differentiable: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18
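For what it's worth, here is a hedged sketch (my own, not from the post) of how the ?? could be filled in graph-mode Keras: build the symbolic input gradient once from model.output and model.input, and close over it in the loss. Differentiating this loss during training requires second-order gradients through K.gradients, which TensorFlow supports for standard ops.

from keras import backend as K
from keras.losses import mean_squared_error

def make_loss(model, human_importance):
    # d(output)/d(input); shape (batch_size, input_dim)
    input_grads = K.gradients(model.output, model.input)[0]
    # e.g. shape (1, input_dim); broadcasts over the batch
    target = K.constant(human_importance)
    def loss(y_true, y_pred):
        regular = mean_squared_error(y_true, y_pred)
        importance = K.mean(K.square(input_grads - target), axis=-1)
        return regular + importance
    return loss

# hypothetical usage:
# model.compile(optimizer='adam', loss=make_loss(model, [[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]))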

How to obtain gradients with respect to parameters when using Lambda layer as output

I am trying to implement a neural network with one hidden layer that can represent the solution to a PDE (let's say the Laplace equation). The objective function therefore depends on the gradient of the neural network w.r.t. its input.
Now, I have implemented the calculation of the second derivatives using Lambda layers. However, when I try to compute the gradient of the output with respect to the parameters of the model, I get an error.
import numpy as np
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

def grad(y, x, nameit):
    return Lambda(lambda z: K.gradients(z[0], z[1]), output_shape=[1], name=nameit)([y, x])

def network(i):
    m = Dense(100, activation='sigmoid')(i)
    j = Dense(1, name="networkout")(m)
    return j

x1 = Input(shape=(1,))
a = network(x1)
b = grad(a, x1, "dudx1")
c = grad(b, x1, "dudx11")

model = Model(inputs=[x1], outputs=[c])
model.compile(optimizer='rmsprop',
              loss='mean_squared_error',
              metrics=['accuracy'])

x1_data = np.random.random((20, 1))
labels = np.zeros((20, 1))
model.fit(x1_data, labels)
This is the error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Why can't Keras compute the gradients w.r.t. the trainable parameters?
The problem is in the networkout layer. It has a linear activation, which prevents gradients from passing through it and therefore causes the 'None' gradients error. In this case you need to give the networkout layer any activation function other than linear.
def network(i):
    m = layers.Dense(100)(i)
    j = layers.Dense(1, name="networkout", activation='relu')(m)
    return j
However, the previous layer can have a linear activation.

Keras Implementation of Customized Loss Function that needs internal layer output as label

In Keras, I want to customize my loss function so that it not only takes (y_true, y_pred) as input but also uses the output from an internal layer of the network as the label for an output layer. This picture shows the network layout:
[picture: network layout]
Here, the internal output is xn, which is a 1D feature vector. In the upper right corner, the output is xn', which is the prediction of xn. In other words, xn is the label for xn'.
Meanwhile, [Ax, Ay] is what is traditionally known as y_true, and [Ax', Ay'] is y_pred.
I want to combine these two loss components into one and train the network jointly.
Any ideas or thoughts are much appreciated!
I have figured out a way; in case anyone is searching for the same, I post it here (based on the network given in this post):
The idea is to define the customized loss function and use it as the output of the network. (Notation: A is the true label of variable A and A_pred its predicted value; likewise S and S_pred for the internal state.)
def customized_loss(args):
    # A is from the training data; S is the internal state
    A, A_pred, S, S_pred = args
    # customize your own loss components
    loss1 = K.mean(K.square(A - A_pred), axis=-1)
    loss2 = K.mean(K.square(S - S_pred), axis=-1)
    # adjust the weight between loss components
    return 0.5 * loss1 + 0.5 * loss2

def model():
    # define other inputs
    A = Input(...)  # define input A
    # construct your model
    cnn_model = Sequential()
    ...
    # get the true internal state
    S = cnn_model(prev_layer_output0)
    # get the predicted internal state
    S_pred = Dense(...)(prev_layer_output1)
    # get the predicted A output
    A_pred = Dense(...)(prev_layer_output2)
    # customized loss function
    loss_out = Lambda(customized_loss, output_shape=(1,), name='joint_loss')([A, A_pred, S, S_pred])
    model = Model(inputs=[...], outputs=[loss_out])
    return model

def train():
    m = model()
    opt = 'adam'
    m.compile(loss={'joint_loss': lambda y_true, y_pred: y_pred}, optimizer=opt)
    # train the model
    ...
First of all, you should be using the Functional API. Then you should define the network output as the regular output concatenated with the result from the internal layer, and write a custom loss function that splits the merged output into two parts and does the loss computations on its own.
Something like:
def customLoss(y_true, y_pred):
    # split y_pred back into the two concatenated parts and compute both losses here
    ...

internalLayer = Convolution2D()(inputs)  # or other layers
internalModel = Model(inputs=inputs, outputs=internalLayer)
tmpOut = Dense(...)(internalLayer)
mergedOut = concatenate([tmpOut, internalLayer], axis=-1)
fullModel = Model(inputs=inputs, outputs=mergedOut)
fullModel.compile(loss=customLoss, optimizer="whatever")
I have my reservations regarding this implementation: the loss computed at the merged layer is propagated back to both merged branches, whereas generally you would want to propagate it through just one branch.
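One way to address that reservation (my own sketch, under my own assumptions rather than taken from either answer) is to detach the branch that supplies the label with K.stop_gradient, so the loss on the merged output can only update the predicting branch:

from keras import backend as K
from keras.layers import Lambda, concatenate

# Detach the internal-layer branch so the loss computed on the merged output
# only updates the tmpOut branch, not the layer that produced the "label".
detached = Lambda(lambda t: K.stop_gradient(t))(internalLayer)
mergedOut = concatenate([tmpOut, detached], axis=-1)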
