Can I use keras.losses.binary_crossentropy(y_true, y_pred) without a training process? - keras

I am new to Keras. I want to know the loss of certain data instances, so I have the y_true and y_pred for them. When I call the loss function to calculate the loss, I only get Tensor("Mean_5:0", shape=(), dtype=float32). How can I evaluate the value of this tensor? Is it similar to TensorFlow, where I would call loss.eval()?
y_pred is calculated by:
y_pred = self.model.predict(x, batch_size=self.batch_size)
y_true is also an available list.
How to use binary_crossentropy()?

You almost had the answer.
from keras import backend
from keras.losses import binary_crossentropy
# wrap the lists/arrays as backend variables so the loss can operate on tensors
y_true = backend.variable(y_true)
y_pred = backend.variable(y_pred)
# calculate the average cross-entropy and evaluate the tensor to a concrete value
mean_ce = backend.eval(binary_crossentropy(y_true, y_pred))
print('Average Cross Entropy: %.3f nats' % mean_ce)
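Note that binary_crossentropy averages over the last axis, so for 1-D arrays of labels and predicted probabilities the result is already a single scalar. If you are on TensorFlow 2.x / tf.keras, eager execution makes this even simpler because the loss returns a concrete tensor you can convert to NumPy directly; a minimal sketch with made-up labels and probabilities:
import numpy as np
import tensorflow as tf

# hypothetical labels and predicted probabilities
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])

# in eager mode the loss is a concrete tensor; .numpy() gives its value
ce = tf.keras.losses.binary_crossentropy(y_true, y_pred)
print('Average Cross Entropy: %.3f nats' % ce.numpy())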

Related

Different loss values and accuracies of MLP regressor in keras and scikit-learn

I have a neural network with one hidden layer implemented in both Keras and scikit-learn for solving a regression problem. In scikit-learn I used the MLPRegressor class with mostly default parameters, and in Keras I have a hidden Dense layer with parameters set to the same defaults as scikit-learn (which uses Adam with the same learning rate and epsilon, and a batch_size of 200). When I train the networks, the scikit-learn model has a loss value that is about half of Keras's, and its accuracy (measured in mean absolute error) is also better. Shouldn't the loss values be similar, if not identical, and the accuracies also be similar? Has anyone experienced something similar and been able to make the Keras model more accurate?
Scikit-learn model:
clf = MLPRegressor(hidden_layer_sizes=(1600,), max_iter=1000, verbose=True, learning_rate_init=.001)
Keras model:
inputs = keras.Input(shape=(cols,))
x = keras.layers.Dense(1600, activation='relu', kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(inputs)
outputs = keras.layers.Dense(1,kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(epsilon=1e-8, learning_rate=.001),loss="mse")
model.fit(x=X, y=y, epochs=1000, batch_size=200)
It is because the formula of the mean squared error (MSE) loss in scikit-learn is different from that of TensorFlow.
From the source code of scikit-learn:
def squared_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean() / 2
while MSE from tensorflow:
backend.mean(math_ops.squared_difference(y_pred, y_true), axis=-1)
As you can see, the scikit-learn one is divided by 2, which is consistent with what you observed:
the scikit-learn model has a loss value that is about half of keras
This implies that the Keras and scikit-learn models actually achieve similar performance. It also implies that a learning rate of 0.001 in scikit-learn is not equivalent to the same learning rate in TensorFlow.
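You can check the factor of two directly with a small comparison (hypothetical values, using TensorFlow 2.x eager execution) between scikit-learn's squared_loss shown above and the Keras/TensorFlow MSE:
import numpy as np
import tensorflow as tf

y_true = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y_pred = np.array([1.5, 1.5, 2.0], dtype=np.float32)

# scikit-learn's squared_loss: mean squared error divided by 2
sklearn_loss = ((y_true - y_pred) ** 2).mean() / 2                        # 0.25
# Keras/TensorFlow MSE: plain mean squared error, no division by 2
keras_loss = tf.keras.losses.mean_squared_error(y_true, y_pred).numpy()  # 0.5
print(sklearn_loss, keras_loss)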
Also, another smaller but significant difference is the formula of L2 regularization.
From the source code of scikit-learn,
# Add L2 regularization term to loss
values = 0
for s in self.coefs_:
    s = s.ravel()
    values += np.dot(s, s)
loss += (0.5 * self.alpha) * values / n_samples
while that of tensorflow is loss = l2 * reduce_sum(square(x)).
Therefore, with the same L2 regularization parameter, the TensorFlow model is regularized more strongly, which results in a poorer fit to the training data.
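For a rough sense of the scale difference (hypothetical numbers): with alpha = l2 = 0.0001, a weight matrix whose squared sum is 100, and n_samples = 1000, scikit-learn adds 0.5 * 0.0001 * 100 / 1000 = 5e-6 to the loss, while the TensorFlow regularizer adds 0.0001 * 100 = 0.01.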

Pytorch Categorical Cross Entropy loss function behaviour

I have a question regarding the computation made by the Categorical Cross Entropy Loss from PyTorch.
I have made this simple code snippet, and because I use the argmax of the output tensor as the targets, I cannot understand why the loss is still high.
import torch
import torch.nn as nn
ce_loss = nn.CrossEntropyLoss()
outputs = torch.randn(3, 5, requires_grad=True)
targets = torch.argmax(outputs, dim=1)
loss = ce_loss(outputs, targets)
print(loss)
Thanks for the help understanding it.
Best regards
Jerome
So here is sample data from your code, with the outputs, labels and loss having the following values:
outputs = tensor([[ 0.5968, -0.8249,  1.5018,  2.7888, -0.6125],
        [-1.1534, -0.4921,  1.0688,  0.2241, -0.0257],
        [ 0.3747,  0.8957,  0.0816,  0.0745,  0.2695]], requires_grad=True)
labels = tensor([3, 2, 1])
loss = tensor(0.7354, grad_fn=<NllLossBackward>)
So let's examine the values,
If you compute the softmax of your logits (outputs), using something like torch.softmax(outputs, dim=1), you will get
probs = tensor([[0.0771, 0.0186, 0.1907, 0.6906, 0.0230],
[0.0520, 0.1008, 0.4801, 0.2063, 0.1607],
[0.1972, 0.3321, 0.1471, 0.1461, 0.1775]], grad_fn=<SoftmaxBackward>)
So these will be your prediction probabilities.
Now cross-entropy loss is nothing but a combination of softmax and negative log likelihood loss. Hence, your loss can simply be computed using
loss = (torch.log(1/probs[0,3]) + torch.log(1/probs[1,2]) + torch.log(1/probs[2,1])) / 3
which is the average of the negative log of the probabilities of your true labels. The above expression evaluates to 0.7354, the same value returned by the nn.CrossEntropyLoss module.
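You can also verify the decomposition directly in code; a minimal sketch (the logits are random, so the exact numbers will vary from run to run):
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs = torch.randn(3, 5)
targets = torch.argmax(outputs, dim=1)

# built-in cross-entropy loss
ce = F.cross_entropy(outputs, targets)

# equivalent: softmax probabilities, then the average negative log of the true-class probability
probs = torch.softmax(outputs, dim=1)
manual = -torch.log(probs[torch.arange(3), targets]).mean()

print(ce.item(), manual.item())  # identical up to floating-point error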

keras: how to add weights to loss evaluation

What I want to do:
I would like to add a weight for each pattern's loss in a given Keras loss function.
For example: if the error on pattern i is l_i, I would like instead to consider the error l_i * c_i, where c_i is an input scalar.
def customloss(y_true, y_pred):
    c_i = ...  # per-pattern weight (left unspecified here)
    loss = ...  # use only tensor operations on y_true and y_pred, or a built-in Keras loss
    return c_i * loss
Now compile your model passing the loss function.
model.compile(loss = customloss)
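To make this concrete, one possible way to get the per-pattern weight c_i into the loss is to pack it alongside the target in y_true and split it apart inside the loss function. This is only a sketch under that assumption; the two-column layout of y_true and the use of MSE as the base loss are illustrative choices, not part of the original answer:
import keras

def weighted_loss(y_true_and_weights, y_pred):
    # assumed layout: column 0 holds the target, column 1 the per-pattern weight c_i
    y_true = y_true_and_weights[:, 0:1]
    c_i = y_true_and_weights[:, 1]
    # any built-in per-pattern loss works here; MSE is just an example
    per_pattern_loss = keras.losses.mean_squared_error(y_true, y_pred)  # shape (batch,)
    return c_i * per_pattern_loss

# compile as above, e.g. model.compile(loss=weighted_loss), and fit with
# y of shape (n, 2), e.g. np.stack([targets, weights], axis=1)
For the common case of fixed per-sample weights, Keras also accepts a sample_weight argument in model.fit, which avoids a custom loss altogether.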

Vector regression with Keras

Suppose, for example, a regression problem with five scalars as output, where each output has approximately the same range. In Keras, we can model this using a 5-output dense layer without activation function (vector regression):
output_layer = layers.Dense(5, activation=None)(previous_layer)
model = models.Model(input_layer, output_layer)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is the total loss (metric) simply the sum of the individual losses (metrics)? Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? In my experiments, I haven't observed any significant differences but want to make sure that I didn't miss anything fundamental.
output_layer_list = []
for _ in range(5):
    output_layer_list.append(layers.Dense(1, activation=None)(previous_layer))
model = models.Model(input_layer, output_layer_list)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is there an easy way to attach weights to the outputs in the first solution similar to specifying loss_weights in case of multi-output models?
Those models are the same. To answer your questions let's look at the mse loss:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is the total loss (metric) simply the sum of the individual losses (metrics)? Yes, because the mse loss applies the K.mean function over the output vector, so it is the sum of the element-wise squared errors up to a constant factor.
Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? Yes, because subtraction and squaring are done element wise in vector form, so scalar outputs will produce the same as a single vector output. And a multi-output model loss is the sum of losses of individual outputs.
Yes, both are equivalent. To replicate the loss_weights functionality with your first model, you can define your own custom loss function. Something along these lines:
import numpy as np
import tensorflow as tf
from keras import backend as K

weights = K.variable(value=np.array([[0.1, 0.1, 0.1, 0.1, 0.6]]))

def custom_loss(y_true, y_pred):
    return tf.matmul(K.square(y_true - y_pred), tf.transpose(weights))
and pass this function to the loss argument upon compiling:
model.compile(optimizer='rmsprop', loss=custom_loss, metrics=['mse'])
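As a quick sanity check of the weighting (illustrative values only): if every squared error is 1, each row's loss should equal the sum of the weights, i.e. 1.0.
import numpy as np
from keras import backend as K

y_true = K.constant(np.zeros((2, 5)))
y_pred = K.constant(np.ones((2, 5)))
print(K.eval(custom_loss(y_true, y_pred)))  # [[1.], [1.]]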

Scikit-learn MLPRegressor - How to get results independent from random seed?

I am trying to learn a sine function using an MLP. Unfortunately, the results depend drastically on the random seed.
How can I adjust the MLPRegressor so that the results are less dependent on the random seed?
Code:
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
LOOK_BACK = 10
x = np.linspace(-10,10,1000)
y = np.sin(x)
dataX, dataY = [], []
for i in range(len(y)-LOOK_BACK-1):
    dataX.append(y[i:(i+LOOK_BACK)])
    dataY.append(y[i+LOOK_BACK])
x_train = np.array(dataX)
y_train = np.array(dataY)
for i in range(10):
    print("np.random.seed(%d)" % i)
    np.random.seed(i)
    model = MLPRegressor(activation='tanh', solver='adam')
    model.fit(x_train, y_train)
    train_predict = model.predict(x_train)
    print('MSE train:', mean_squared_error(train_predict, y_train))
Output:
np.random.seed(0)
MSE train: 0.00167560534562
np.random.seed(1)
MSE train: 0.0050531872206
np.random.seed(2)
MSE train: 0.00279393534973
np.random.seed(3)
MSE train: 0.00224293537385
np.random.seed(4)
MSE train: 0.00154350859516
np.random.seed(5)
MSE train: 0.00383997358155
np.random.seed(6)
MSE train: 0.0265389606087
np.random.seed(7)
MSE train: 0.00195637404624
np.random.seed(8)
MSE train: 0.000590823529864
np.random.seed(9)
MSE train: 0.00393172460516
Seeds 6, 8 and 9 produce MSE values of different orders of magnitude. How can I prevent this?
Multi-layer perceptrons, like other neural network architectures, suffer from the fact that their loss functions have numerous local optima. Consequently, all gradient-based algorithms depend heavily on the chosen initialization. Rather than seeing this as undesirable, you can view the initialization (determined through random_state) as an additional hyperparameter that gives you flexibility.
Just for the record, the differences in your MSE values are not that big, and if your goal is to perfectly overfit, set the regularization parameter alpha to zero (the default is alpha=0.0001).
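If the goal is simply reproducible results across runs, fix the random_state of the estimator itself rather than only seeding NumPy; a minimal sketch:
from sklearn.neural_network import MLPRegressor

# fixing random_state pins the weight initialization and data shuffling,
# so repeated fits on the same data give identical results
model = MLPRegressor(activation='tanh', solver='adam', random_state=42)
model.fit(x_train, y_train)
To reduce the sensitivity itself rather than pin one outcome, you can also train with several seeds and keep the best of the resulting models (or average their predictions).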
