The following two models/compilations behave differently:
def custom_loss(y_true, y_pred):
    return keras.losses.binary_crossentropy(y_true, y_pred)
optimizer = Adam(lr=5e-3)
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['accuracy'])
And:
optimizer = Adam(lr=5e-3)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=optimizer, metrics=['accuracy'])
What could be the reason?
If you implement a custom binary cross-entropy loss, you should also specify the right accuracy metric. This is because when you use Keras' built-in binary cross-entropy loss, Keras automatically picks the appropriate accuracy metric for you (binary vs. categorical accuracy).
This doesn't happen if you use a custom loss: Keras then defaults to categorical accuracy, which is wrong for a binary problem and produces misleading accuracy values. For example:
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['binary_accuracy'])
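As a quick, runnable sanity check (a minimal sketch with a toy model and random data), both compilations below report the same, correct binary accuracy because the metric is requested explicitly:
import numpy as np
import keras
from keras import layers
from keras.optimizers import Adam

def custom_loss(y_true, y_pred):
    return keras.losses.binary_crossentropy(y_true, y_pred)

def make_model():
    # Toy binary classifier, used only to illustrate the metric choice.
    return keras.models.Sequential([
        layers.Dense(8, activation='relu', input_shape=(4,)),
        layers.Dense(1, activation='sigmoid')])

X = np.random.rand(64, 4)
y = np.random.randint(2, size=(64, 1)).astype('float32')

# Built-in loss: metrics=['accuracy'] would resolve to binary accuracy anyway.
m1 = make_model()
m1.compile(loss=keras.losses.binary_crossentropy, optimizer=Adam(lr=5e-3),
           metrics=['binary_accuracy'])

# Custom loss: binary accuracy has to be requested explicitly.
m2 = make_model()
m2.compile(loss=custom_loss, optimizer=Adam(lr=5e-3), metrics=['binary_accuracy'])

m1.fit(X, y, epochs=1, verbose=0)
m2.fit(X, y, epochs=1, verbose=0)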
Related
I am trying to implement both a classification problem and a regression problem with Keras Tuner. Here is my code for the regression problem:
def build_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 20)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=512,
                                            step=32),
                               activation='relu'))
    # Tune whether to use dropout.
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.5))
    model.add(layers.Dense(1, activation='linear'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-4, 1e-3, 1e-5])),
        loss='mean_absolute_error',
        metrics=['mean_absolute_error'])
    return model
tuner = RandomSearch(
    build_model,
    objective='val_mean_absolute_error',
    max_trials=5,
    executions_per_trial=2,
    # overwrite=True,
    directory='projects',
    project_name='Air Quality Index')
In order to apply this code to a classification problem, which parameters (loss, objective, metrics, etc.) have to be changed?
To use this code for a classification problem, you will have to change the loss function, the tuner objective, and the activation function of your output layer. Depending on the number of classes, you will use different settings:
Number of classes        Two                         More than two
Loss                     binary_crossentropy         categorical_crossentropy
Tuner objective          val_binary_crossentropy     val_categorical_crossentropy
Last layer activation    sigmoid                     softmax
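For the two-class case, a minimal sketch of the adapted code could look like the following (names such as build_classification_model and the project name are placeholders; 'binary_crossentropy' is also added as a metric so that the tuner objective 'val_binary_crossentropy' actually appears in the logs):
import keras
from keras import layers
from keras_tuner import RandomSearch  # 'kerastuner' in older versions

def build_classification_model(hp):
    model = keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 20)):
        model.add(layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32, max_value=512, step=32),
                               activation='relu'))
    if hp.Boolean("dropout"):
        model.add(layers.Dropout(rate=0.5))
    # Two classes: sigmoid output with binary cross-entropy.
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', [1e-4, 1e-3, 1e-5])),
        loss='binary_crossentropy',
        metrics=['binary_accuracy', 'binary_crossentropy'])
    return model

tuner = RandomSearch(
    build_classification_model,
    # If the direction cannot be inferred, wrap this in
    # keras_tuner.Objective('val_binary_crossentropy', direction='min').
    objective='val_binary_crossentropy',
    max_trials=5,
    executions_per_trial=2,
    directory='projects',
    project_name='binary_classification_example')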
I have a neural network with one hidden layer implemented in both Keras and scikit-learn for solving a regression problem. In scikit-learn I used the MLPRegressor class with mostly default parameters, and in Keras I have a hidden Dense layer with parameters set to the same defaults as scikit-learn (which uses Adam with the same learning rate and epsilon, and a batch_size of 200). When I train the networks, the scikit-learn model has a loss value that is about half of Keras's, and its accuracy (measured in mean absolute error) is also better. Shouldn't the loss values be similar, if not identical, and the accuracies also be similar? Has anyone experienced something similar and been able to make the Keras model more accurate?
Scikit-learn model:
clf = MLPRegressor(hidden_layer_sizes=(1600,), max_iter=1000, verbose=True, learning_rate_init=.001)
Keras model:
inputs = keras.Input(shape=(cols,))
x = keras.layers.Dense(1600, activation='relu', kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(inputs)
outputs = keras.layers.Dense(1,kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(epsilon=1e-8, learning_rate=.001),loss="mse")
model.fit(x=X, y=y, epochs=1000, batch_size=200)
It is because scikit-learn's formula for the mean squared error (MSE) loss is different from TensorFlow's.
From the source code of scikit-learn:
def squared_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean() / 2
while the MSE from TensorFlow is:
backend.mean(math_ops.squared_difference(y_pred, y_true), axis=-1)
As you can see, the scikit-learn loss is divided by 2, which is consistent with what you observed:
the scikit-learn model has a loss value that is about half of keras
This implies that the Keras and scikit-learn models actually achieved similar performance. It also implies that a learning rate of 0.001 in scikit-learn is not exactly equivalent to the same learning rate in TensorFlow, because the gradients of the two losses differ by that same factor of 2.
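A quick numeric check (a small NumPy-only sketch with made-up values) makes the factor of 2 visible:
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 2.0])

sklearn_mse = ((y_true - y_pred) ** 2).mean() / 2   # scikit-learn's squared_loss
keras_mse = ((y_true - y_pred) ** 2).mean()         # Keras/TensorFlow 'mse'

print(sklearn_mse, keras_mse)  # 0.25 vs 0.5 -- exactly a factor of 2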
Another smaller but still significant difference is the formula for L2 regularization.
From the source code of scikit-learn,
# Add L2 regularization term to loss
values = 0
for s in self.coefs_:
    s = s.ravel()
    values += np.dot(s, s)
loss += (0.5 * self.alpha) * values / n_samples
while TensorFlow's is loss = l2 * reduce_sum(square(x)).
Therefore, with the same L2 regularization parameter, the TensorFlow model is regularized more strongly (scikit-learn additionally scales the penalty by 0.5 / n_samples), which results in a poorer fit to the training data.
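If you want the two penalties to be comparable in size, one option (a rough sketch, reusing inputs and X from the Keras snippet above and assuming scikit-learn's default alpha=0.0001) is to rescale the Keras L2 factor; note that the factor-of-2 difference in the data loss discussed above still remains:
import keras

alpha = 1e-4              # scikit-learn's default L2 strength
n_samples = X.shape[0]    # number of training samples (X from the question's code)

# scikit-learn adds (0.5 * alpha / n_samples) * sum(w**2) to the loss,
# while Keras adds l2 * sum(w**2), so pick l2 accordingly.
matched_l2 = keras.regularizers.L2(0.5 * alpha / n_samples)

x = keras.layers.Dense(1600, activation='relu',
                       kernel_regularizer=matched_l2)(inputs)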
I have an unbalanced training dataset, which is why I built a custom weighted categorical cross-entropy loss function. But the problem is that my validation set is balanced, and I want to use the regular categorical cross-entropy loss there. So can I pass a different loss function for the validation set within Keras? I mean the weighted one for the training set and the regular one for the validation set?
def weighted_loss(y_true, y_pred):
    ...  # weighted categorical cross-entropy implementation
    return loss

model.compile(loss=weighted_loss, optimizer='adam', metrics=['accuracy'])
You can try the backend function K.in_train_phase(), which is used by the Dropout and BatchNormalization layers to implement different behaviors in training and validation.
def custom_loss(y_true, y_pred):
    weighted_loss = ...  # your implementation of weighted crossentropy loss
    unweighted_loss = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.in_train_phase(weighted_loss, unweighted_loss)
The first argument of K.in_train_phase() is the tensor used in training phase, and the second is the one used in test phase.
For example, if we set weighted_loss to 0 (just to verify the effect of K.in_train_phase() function):
def custom_loss(y_true, y_pred):
    weighted_loss = 0 * K.sparse_categorical_crossentropy(y_true, y_pred)
    unweighted_loss = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.in_train_phase(weighted_loss, unweighted_loss)
model = Sequential([Dense(100, activation='relu', input_shape=(100,)), Dense(1000, activation='softmax')])
model.compile(optimizer='adam', loss=custom_loss)
model.outputs[0]._uses_learning_phase = True # required if no dropout or batch norm in the model
X = np.random.rand(1000, 100)
y = np.random.randint(1000, size=1000)
model.fit(X, y, validation_split=0.1)
Epoch 1/10
900/900 [==============================] - 1s 868us/step - loss: 0.0000e+00 - val_loss: 6.9438
As you can see, the loss in training phase is indeed the one multiplied by 0.
Note that if there's no dropout or batch norm in your model, you'll need to manually "turn on" the _uses_learning_phase boolean switch, otherwise K.in_train_phase() will have no effect by default.
The validation loss function is just a metric and is actually not needed for training. It's there because it makes sense to monitor the metric that your network is actually optimizing.
So you can add any other loss function as a metric during compilation and you'll see it during training.
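For example (a minimal sketch, assuming model and a weighted_loss function are defined as in the question), you can train on the weighted loss and still log the regular cross-entropy on the validation data:
import keras

model.compile(optimizer='adam',
              loss=weighted_loss,  # used for training
              metrics=['accuracy', keras.losses.categorical_crossentropy])
# The training logs will then also contain 'val_categorical_crossentropy',
# i.e. the regular loss evaluated on the validation set.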
Suppose, for example, a regression problem with five scalars as output, where each output has approximately the same range. In Keras, we can model this using a 5-output dense layer without activation function (vector regression):
output_layer = layers.Dense(5, activation=None)(previous_layer)
model = models.Model(input_layer, output_layer)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is the total loss (metric) simply the sum of the individual losses (metrics)? Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? In my experiments, I haven't observed any significant differences but want to make sure that I didn't miss anything fundamental.
output_layer_list = []
for _ in range(5):
    output_layer_list.append(layers.Dense(1, activation=None)(previous_layer))
model = models.Model(input_layer, output_layer_list)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is there an easy way to attach weights to the outputs in the first solution similar to specifying loss_weights in case of multi-output models?
Those models are the same. To answer your questions let's look at the mse loss:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is the total loss (metric) simply the sum of the individual losses (metrics)? Yes, up to a constant factor: the mse loss applies K.mean over the last axis, so it is the mean (the sum scaled by 1/5) of the squared errors of the five outputs.
Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? Yes, because subtraction and squaring are done element-wise, so five scalar outputs produce the same squared errors as a single five-dimensional output, and the loss of a multi-output model is the sum of the losses of the individual outputs (with equal weights by default).
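A quick numeric sketch (NumPy only, with made-up values) illustrates the relationship between the two setups:
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.3, 3.8, 5.2])

sq_err = (y_true - y_pred) ** 2

# Single Dense(5) output: 'mse' takes the mean over the 5 components.
vector_output_loss = sq_err.mean()

# Five Dense(1) outputs: each output contributes its own mse, and the
# total loss is their (equally weighted) sum.
multi_output_loss = sq_err.sum()

print(vector_output_loss, multi_output_loss)  # differ only by the constant factor 5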
Yes, both are equivalent. To replicate the loss_weights functionality with your first model, you can define your own custom loss function. Something along these lines:
import numpy as np
import tensorflow as tf
from keras import backend as K

weights = K.variable(value=np.array([[0.1, 0.1, 0.1, 0.1, 0.6]]))

def custom_loss(y_true, y_pred):
    return tf.matmul(K.square(y_true - y_pred), tf.transpose(weights))
and pass this function to the loss argument upon compiling:
model.compile(optimizer='rmsprop', loss=custom_loss, metrics=['mse'])
I am trying to use a simple BiLSTM model with my own custom loss function in Keras.
See below.
model = Sequential()
model.add(Bidirectional(LSTM(128, return_sequences=True), input_shape=(1,8)))
model.add(Bidirectional(LSTM(128)))
model.add(Dense(64, activation='relu'))
model.add(Dense(20, activation='softmax'))
def my_loss_np(y_true, y_pred):
    labels = [np.argmax(y_pred[i]) for i in range(y_pred.shape[1])]
    loss = np.mean(labels)
    return loss
import keras.backend as K

def my_loss(y_true, y_pred):
    loss = K.eval(my_loss_np(K.eval(y_true), K.eval(y_pred)))
    return loss
When I compile this model, I get an error -
model.compile(loss=my_loss, optimizer='adam')
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dense_95_target' with dtype float and shape [?,?]
[[Node: dense_95_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
There are several issues here with your loss function:
You are using NumPy on tensors; unfortunately, although it is intuitive, this doesn't work. You need to use the tensor operations from the Keras backend, which are very similar.
To that end you are calling K.eval, but at this stage you are still constructing a symbolic computation graph that will later be run in TensorFlow or Theano. The tensors don't have values to evaluate yet, per se, so you need to keep everything symbolic; you can't pull out concrete values the way you do in NumPy.
Even if you fix the problems above, you are using argmax, a non-differentiable operation, which will not work with gradient-descent-based optimizers.
Your model looks like a multi-class classification problem with 20 classes, since your final layer is a Dense(20) with softmax. In this case, the standard choice in the literature is the categorical cross-entropy loss to train the classifier network.
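If you still want a custom loss, a minimal differentiable sketch built from Keras backend operations only (this simply reproduces categorical cross-entropy and is not the asker's original objective) would be:
import keras.backend as K

def my_loss(y_true, y_pred):
    # Symbolic and differentiable; evaluated lazily by the framework.
    return K.categorical_crossentropy(y_true, y_pred)

model.compile(loss=my_loss, optimizer='adam')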