I have a two-branch network where one branch outputs a regression value and the other outputs a classification label.
model = Model(inputs=inputs, outputs=[output1, output2])
model.compile(loss=[my_loss_reg, my_loss_class], optimizer='adam')
I want to implement a custom loss function (my_loss_reg()) for the regression branch that adds a fraction of the classification loss to the regression loss, as follows:
def my_loss_reg(y_true, y_pred):
    loss_mse = K.mean(K.sum(K.square(y_true - y_pred)))
    #loss_reg = calculate_classification_loss() # How to implement this?
    final_loss = some_function(loss_mse, loss_reg) # Can calculate only if loss_reg is available
    return final_loss
Here y_true and y_pred are the true and predicted regression values at the regression branch. To calculate the classification loss I need the true and predicted classification labels, which are not available in my_loss_reg().
My question is how to calculate or access the classification loss at the regression end of the network. Similarly, I want to get the regression loss at the classification end while calculating the custom loss function my_loss_class() for the classification branch.
How can I do that? Any code snippets will be helpful. I found this solution, but it is no longer valid with the latest versions of TensorFlow and Keras.
Everything you need is available in native Keras.
You can combine multiple losses automatically with the loss_weights parameter of model.compile().
The example below reproduces the task by combining an mse loss for the regression output with a sparse_categorical_crossentropy loss for the classification output:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

features, n_sample, n_class = 10, 200, 3
X = np.random.uniform(0, 1, (n_sample, features))
y = np.random.randint(0, n_class, n_sample)

inp = Input(shape=(features,))
x = Dense(64, activation='relu')(inp)
hidden = Dense(16, activation='relu')(x)  # shared by both branches
x = Dense(64, activation='relu')(hidden)
out_reg = Dense(features, name='out_reg')(x)  # regression output
x = Dense(32, activation='relu')(hidden)
out_class = Dense(n_class, activation='softmax', name='out_class')(x)  # classification output

model = Model(inp, [out_reg, out_class])
model.compile(optimizer='adam',
              loss={'out_reg': 'mse', 'out_class': 'sparse_categorical_crossentropy'},
              loss_weights={'out_reg': 1., 'out_class': 0.5})
model.fit(X, [X, y], epochs=10)
In this specific case, the total loss is 1 * (loss of out_reg) + 0.5 * (loss of out_class).
If you want to use your own custom losses, pass the functions in the same way:
def my_loss_reg(y_true, y_pred):
    return ...

def my_loss_class(y_true, y_pred):
    return ...

model.compile(optimizer='adam',
              loss={'out_reg': my_loss_reg, 'out_class': my_loss_class},
              loss_weights={'out_reg': 1., 'out_class': 0.5})
model.fit(X, [X, y], epochs=10)
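To make the arithmetic behind loss_weights concrete, here is a minimal plain-Python sketch (no Keras involved) of the weighted sum described above. The sample values and the two helper functions are made up for illustration; Keras computes the per-output losses itself.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a flat list of values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for one sample: integer label, list of class probabilities."""
    return -math.log(probs[label])

# Hypothetical per-batch values, for illustration only.
loss_reg = mse([1.0, 2.0], [1.5, 1.0])                             # (0.25 + 1.0) / 2
loss_class = sparse_categorical_crossentropy(1, [0.2, 0.5, 0.3])   # -ln(0.5)

# Same weights as in the compile() call above.
loss_weights = {"out_reg": 1.0, "out_class": 0.5}
total_loss = (loss_weights["out_reg"] * loss_reg
              + loss_weights["out_class"] * loss_class)
print(loss_reg, loss_class, total_loss)
```

Because both branches share layers, minimizing this single weighted sum already lets the classification error influence the weights that feed the regression branch, and vice versa.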
I am trying to use a weighted mean squared error loss function for a regression task with an imbalanced dataset. Each example has its own weight, and I am using the weighted MSE loss function below. Is there a way to sample the weight tensor using TensorDataset, along with the input and output batch samples?
def weighted_mse_loss(inputs, targets, weights=None):
    loss = (inputs - targets) ** 2
    if weights is not None:
        loss *= weights.expand_as(loss)
    loss = torch.mean(loss)
    return loss

train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
weights = torch.rand(len(train_dataset))

for x, y in train_loader:
    optimizer.zero_grad()
    out = model(x)
    loss = weighted_mse_loss(y, out, weights)
    loss.backward()
If you can get the weights before creating the train dataset:
train_dataset = TensorDataset(x_train, y_train, weights)
for x, y, w in train_dataset:
    ...
Otherwise:
train_dataset = TensorDataset(x_train, y_train)
for (x, y), w in zip(train_dataset, weights):
    ...
You can also use a DataLoader, but be careful about shuffling with the second method: the samples would be shuffled while the weights are not, so they would no longer line up.
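The point about shuffling can be illustrated with a small plain-Python sketch (no PyTorch): when each weight is stored in the same row as its sample, shuffling whole rows keeps them aligned. The toy data, the `iterate_batches` helper, and the `1.5 * x` stand-in for the model are all assumptions made for this example.

```python
import random

def weighted_mse(preds, targets, weights):
    """Mean of per-sample squared errors, each scaled by its own weight."""
    return sum(w * (p - t) ** 2
               for p, t, w in zip(preds, targets, weights)) / len(preds)

# Hypothetical toy data: the weight of sample i sits next to sample i.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
ws = [1.0, 0.5, 2.0, 0.1]
dataset = list(zip(xs, ys, ws))

def iterate_batches(data, batch_size, seed=0):
    """Shuffle whole (x, y, w) rows so weights never get separated from samples."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Transpose the batch into three parallel lists: xs, ys, ws.
        yield tuple(list(col) for col in zip(*batch))

batches = list(iterate_batches(dataset, batch_size=2))
losses = []
for bx, by, bw in batches:
    preds = [1.5 * x for x in bx]  # stand-in for model(x)
    losses.append(weighted_mse(preds, by, bw))
print(losses)
```

This mirrors the first option above, where the weights are part of the dataset and therefore get batched and shuffled together with the inputs and targets.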
I have a trained Tensorflow 2.0 model (from tf.keras.Sequential()) that takes an input layer with 26 columns (X) and produces an output layer with 1 column (Y).
In TF 1.x I was able to calculate the gradient of the output with respect to the input with the following:
model = load_model('mymodel.h5')
sess = K.get_session()
grad_func = tf.gradients(model.output, model.input)
gradients = sess.run(grad_func, feed_dict={model.input: X})[0]
In TF2 when I try to run tf.gradients(), I get the error:
RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
In the question In TensorFlow 2.0 with eager-execution, how to compute the gradients of a network output wrt a specific layer?, we see an answer on how to calculate gradients with respect to intermediate layers, but I don't see how to apply this to gradients with respect to the inputs. On the Tensorflow help for tf.GradientTape, there are examples with calculating gradients for simple functions, but not neural networks.
How can tf.GradientTape be used to calculate the gradient of the output with respect to the input?
This should work in TF2:
inp = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)
with tf.GradientTape() as tape:
    preds = model(inp)
grads = tape.gradient(preds, inp)
Basically you do it the same way as TF1, but using GradientTape.
I hope this is what you're looking for. This will give the gradients of the output w.r.t. the inputs.
# Whatever input you like goes in as the initial_value
x = tf.Variable(np.random.normal(size=(25, 120)), dtype=tf.float32)
y_true = np.random.choice([0,1], size=(25,10))
print(model.output)
print(model.predict(x))
with tf.GradientTape() as tape:
    # Call the model directly: model.predict() returns NumPy arrays, which the tape cannot trace
    pred = model(x)
grads = tape.gradient(pred, x)
If x is a plain tensor rather than a tf.Variable, the tape does not track it automatically, so call tape.watch() on it:
for (x, y) in test_dataset:
    with tf.GradientTape() as tape:
        tape.watch(x)
        pred = model(x)
    grads = tape.gradient(pred, x)
Note that grads then contains only the gradients with respect to the inputs.
The following approach is better for training: use the model to compute predictions, compute the loss from them, and then take the gradients of the loss with respect to all trainable variables.
with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_function(y, predictions)
grads = tape.gradient(loss, model.trainable_variables)
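One way to sanity-check input gradients such as those returned by tape.gradient(pred, x) is to compare them with central finite differences. The sketch below does this in plain Python for a hypothetical single sigmoid unit with made-up weights; it is not tied to any particular Keras model, and the same comparison could be applied to a real model's output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single-unit "model": y = sigmoid(w . x + b)
w = [0.5, -1.0, 2.0]
b = 0.1

def model(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def analytic_input_grad(x):
    """dy/dx_i = y * (1 - y) * w_i for a sigmoid unit."""
    y = model(x)
    return [y * (1.0 - y) * wi for wi in w]

def numeric_input_grad(x, eps=1e-6):
    """Central finite differences, perturbing one input at a time."""
    grads = []
    for i in range(len(x)):
        hi, lo = x[:], x[:]
        hi[i] += eps
        lo[i] -= eps
        grads.append((model(hi) - model(lo)) / (2 * eps))
    return grads

x = [0.2, 0.4, -0.3]
ga = analytic_input_grad(x)
gn = numeric_input_grad(x)
print(ga)
print(gn)
```

The two lists should agree to several decimal places; a large mismatch usually means the gradient is being taken with respect to the wrong tensor.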
I am using the input gradient as a measure of feature importance and want to compare the feature importance of a training datapoint with human-annotated feature importance. I would like to make this comparison differentiable so that it can be learned through backpropagation. For that, I am writing a custom loss function that, in addition to the regular loss (e.g. MSE of the prediction vs. the true labels), also checks whether the input gradient is correct (e.g. MSE of the input gradient vs. the human-annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
def normalize(x):
    # utility function to normalize a tensor by its root mean square (epsilon avoids division by zero)
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
# Amount of training samples
N = 1000
input_dim = 10
# Generate training set make the 1st and 2nd feature same as the target feature
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]
# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[inputs], outputs=output)
# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)
# Get function to get input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])
# Get input gradient values of the training-set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error
def custom_loss():
    # human annotated feature importance
    # Let's say that it says to only look at the second feature
    human_feature_importance = []
    for i in range(N):
        human_feature_importance.append([0,0,1,0,0,0,0,0,0,0])

    def loss(y_true, y_pred):
        # Get regular loss
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Somehow get the input gradient of each training sample as a tensor
        # It should be differentiable w.r.t. all of the weights
        gradients = ??
        feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
        # Combine both losses
        return regular_loss_value + feature_importance_loss_value

    return loss
I also found an implementation in TensorFlow that makes the input gradient differentiable: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18
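To show why such a loss can be differentiable with respect to the weights, here is a plain-Python sketch for a hypothetical single sigmoid unit, whose input gradient has the closed form y * (1 - y) * w_i. The weights, the sample, and the annotation vector are made up; this is only an illustration of the idea, not the Keras implementation asked for above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def combined_loss(w, b, x, y_true, annotation):
    """Prediction error plus MSE between the input gradient and the annotation.

    Hypothetical model: y = sigmoid(w . x + b). Its input gradient is
    y * (1 - y) * w_i, so the second term depends on the weights as well.
    """
    y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    input_grad = [y * (1.0 - y) * wi for wi in w]
    regular = (y_true - y) ** 2
    importance = sum((g - a) ** 2 for g, a in zip(input_grad, annotation)) / len(w)
    return regular + importance

x = [0.5, 1.0, -0.5]
annotation = [0.0, 1.0, 0.0]  # made-up "human" importance: only feature 1 matters
w = [0.3, 0.2, -0.4]

base = combined_loss(w, 0.0, x, 1.0, annotation)

# Finite-difference derivative of the combined loss w.r.t. the weight w[1].
eps = 1e-6
w_plus, w_minus = w[:], w[:]
w_plus[1] += eps
w_minus[1] -= eps
dloss_dw1 = (combined_loss(w_plus, 0.0, x, 1.0, annotation)
             - combined_loss(w_minus, 0.0, x, 1.0, annotation)) / (2 * eps)
print(base, dloss_dw1)
```

The nonzero derivative shows that the gradient-matching term contributes a training signal; in a framework with automatic differentiation this amounts to differentiating through the input gradient itself, i.e. a second-order derivative.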
I am trying to create a custom loss function using Keras. I want to compute the loss based on both the input and the predicted output of the neural network.
I tried using a custom loss function in Keras. As I understand it, y_true is the target output that we supply for training and y_pred is the predicted output of the neural network. The loss function below is the same as the "mean_squared_error" loss in Keras.
def customloss(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
I would also like to use the input to the neural network when computing the custom loss, in addition to the mean_squared_error term. Is there a way to pass the network's input as an argument to the customloss function?
Thank you.
I have come across 2 solutions to the question you asked.
You can pass your input (scalar only) as an argument to the custom loss wrapper function.
def custom_loss(i):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true), axis=-1)  # + something with i...
    return loss

def baseline_model():
    # create model
    i = Input(shape=(5,))
    x = Dense(5, kernel_initializer='glorot_uniform', activation='linear')(i)
    o = Dense(1, kernel_initializer='normal', activation='linear')(x)
    model = Model(i, o)
    model.compile(loss=custom_loss(i), optimizer=Adam(lr=0.0005))
    return model
This solution is also mentioned in the accepted answer here
You can pad your label with extra data columns from input and write a custom loss. This is helpful if you just want one/few feature column(s) from your input.
def custom_loss(data, y_pred):
    # Slice with ranges to keep the trailing axis, so the shapes match y_pred (batch, 1)
    y_true = data[:, 0:1]
    i = data[:, 1:2]
    return K.mean(K.square(y_pred - y_true), axis=-1)  # + something with i...

def baseline_model():
    # create model
    i = Input(shape=(5,))
    x = Dense(5, kernel_initializer='glorot_uniform', activation='linear')(i)
    o = Dense(1, kernel_initializer='normal', activation='linear')(x)
    model = Model(i, o)
    model.compile(loss=custom_loss, optimizer=Adam(lr=0.0005))
    return model

# Assumes Y_true has shape (n_samples, 1); X[:, 0:1] keeps the column 2-D so it can be appended along axis 1
model.fit(X, np.append(Y_true, X[:, 0:1], axis=1), batch_size=batch_size, epochs=90, shuffle=True, verbose=1)
This solution can be found also here in this thread.
I have only used the 2nd method when I had to use input feature columns in the loss. The first method can be only used with scalar arguments as mentioned in the comments.
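The label-padding idea from the second method can be sketched without Keras as follows. The `pack_labels` helper, the toy data, and in particular the input-dependent term (standing in for the unspecified "something with i") are assumptions made up for this example.

```python
def pack_labels(y_true, extra_feature):
    """Build label rows [y_true, extra] so the loss receives both values."""
    return [[y, e] for y, e in zip(y_true, extra_feature)]

def custom_loss(data, y_pred, penalty_scale=0.1):
    """Mean over samples of the squared error plus a made-up input-dependent term."""
    total = 0.0
    for (y_true, extra), pred in zip(data, y_pred):
        squared_error = (pred - y_true) ** 2
        # Hypothetical stand-in for "something with i": scale the error by |extra|.
        input_term = penalty_scale * abs(extra) * squared_error
        total += squared_error + input_term
    return total / len(y_pred)

# Toy data: two input features per sample, one target per sample.
X = [[2.0, 0.1], [0.0, 0.7], [-1.0, 0.3]]
Y_true = [1.0, 0.0, 2.0]
y_pred = [0.5, 0.0, 1.0]

# Column 0 of X travels to the loss inside the label array.
labels = pack_labels(Y_true, [row[0] for row in X])
loss = custom_loss(labels, y_pred)
print(labels, loss)
```

The design choice here is that the loss signature stays (labels, predictions), so nothing about the training loop has to change; only the label array gets wider.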
You could wrap your custom loss with another function that takes the input tensor as an argument:
def customloss(x):
    def loss(y_true, y_pred):
        # Use x here as you wish
        err = K.mean(K.square(y_pred - y_true), axis=-1)
        return err
    return loss
And then compile your model as follows:
model.compile('sgd', customloss(x))
where x is your input tensor.
NOTE: Not tested.
The goal is to predict a timeseries Y of 87601 timesteps (10 years) and 9 targets. The input features X (exogenous input) are 11 timeseries of 87600 timesteps. The output has one more timestep, as this is the initial value.
The output Yt at timestep t depends on the input Xt and on the previous output Yt-1.
Hence, the model should look like this: (figure: model layout)
I could only find this thread on this: LSTM: How to feed the output back to the input? #4068.
I tried to implement this with Keras as follows:
def build_model():
    # Input layers
    input_x = layers.Input(shape=(features,), name='input_x')
    input_y = layers.Input(shape=(targets,), name='input_y-1')

    # Merge two inputs
    merge = layers.concatenate([input_x,input_y], name='merge')

    # Normalise input
    norm = layers.Lambda(normalise, name='scale')(merge)

    # Hidden layers
    x = layers.Dense(128, input_shape=(features,))(norm)

    # Output layer
    output = layers.Dense(targets, activation='relu', name='output')(x)

    model = Model(inputs=[input_x,input_y], outputs=output)
    model.compile(loss='mean_squared_error', optimizer=Adam())
    return model

def make_prediction(model, X, y):
    # Start from the known initial value and feed each prediction back in
    y_pred = [y[0,None,:]]
    for i in range(len(X)):
        y_pred.append(model.predict([X[i,None,:],y_pred[i]]))
    y_pred = np.asarray(y_pred)
    y_pred = y_pred.reshape(y_pred.shape[0],y_pred.shape[2])
    return y_pred

# Fit
model = build_model()
model.fit([X_train, y_train[:-1]], [y_train[1:]], epochs=200,
          batch_size=24, shuffle=False)

# Predict
y_hat = make_prediction(model, X_train, y_train)
This works, but it is not what I want to achieve, because during training there is no connection from the model's output back to its input. Hence, the model doesn't learn how to correct for an error in the fed-back output, which results in poor accuracy at prediction time because the output error accumulates at every timestep.
Is there a way in Keras to implement the output-input feed-back during training stage?
Also, as the initial value of Y is always known, I want to feed this to the network as well.
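The gap between the two regimes described above can be made concrete with a plain-Python sketch. It uses a hypothetical one-step linear model, y_t = a * y_(t-1) + b * x_t, as a stand-in for the network, with made-up data and parameters. "Teacher forced" corresponds to the fit call above (each step sees the true previous output), while "free running" corresponds to make_prediction (each step sees the model's own previous prediction).

```python
def step(y_prev, x_t, a, b):
    """Hypothetical one-step model: y_t = a * y_prev + b * x_t."""
    return a * y_prev + b * x_t

def teacher_forced(xs, ys, a, b):
    """Each step sees the true previous output."""
    return [step(ys[t], xs[t], a, b) for t in range(len(xs))]

def free_running(xs, y0, a, b):
    """Each step sees the model's own previous prediction, starting from the known y0."""
    preds, y_prev = [], y0
    for x_t in xs:
        y_prev = step(y_prev, x_t, a, b)
        preds.append(y_prev)
    return preds

def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Toy series generated with a = 0.9, b = 1.0; ys has one more entry than xs.
xs = [1.0, 0.0, -1.0, 2.0, 0.5]
ys = [0.0]
for x_t in xs:
    ys.append(step(ys[-1], x_t, 0.9, 1.0))

# A slightly wrong model, as one would have after imperfect training.
a_hat, b_hat = 0.8, 1.0
one_step_error = mse(teacher_forced(xs, ys, a_hat, b_hat), ys[1:])
rollout_error = mse(free_running(xs, ys[0], a_hat, b_hat), ys[1:])
print(one_step_error, rollout_error)
```

On this toy series the rollout error is larger than the one-step error, because each prediction inherits the error of the previous one. Training against the free-running loss instead of the one-step loss is one possible way to expose the model to its own errors; whether it helps for the real data is not something this sketch can show.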