I want to use a simple BiLSTM model with my own custom loss function in Keras.
See below.
model = Sequential()
model.add(Bidirectional(LSTM(128, return_sequences=True), input_shape=(1,8)))
model.add(Bidirectional(LSTM(128)))
model.add(Dense(64, activation='relu'))
model.add(Dense(20, activation='softmax'))
def my_loss_np(y_true, y_pred):
    labels = [np.argmax(y_pred[i]) for i in range(y_pred.shape[1])]
    loss = np.mean(labels)
    return loss
import keras.backend as K
def my_loss(y_true, y_pred):
    loss = K.eval(my_loss_np(K.eval(y_true), K.eval(y_pred)))
    return loss
When I compile this model, I get an error -
model.compile(loss=my_loss, optimizer='adam')
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'dense_95_target' with dtype float and shape [?,?]
[[Node: dense_95_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
There are several issues here with your loss function:
You are using NumPy on tensors. Unfortunately, intuitive as that is, it doesn't work: you need to use tensor operations from the Keras backend instead, which are very similar.
To that end, you are calling K.eval, but at this stage you are still constructing a symbolic computation graph that will later be run in TensorFlow or Theano. The tensors don't have values to evaluate yet, per se; you have to keep everything symbolic, and you can't pull out concrete values the way you would in NumPy.
Even if you fix the problems above, you are using the non-differentiable operation argmax, which will not work with gradient-descent algorithms.
Your model looks like a multi-class classification problem: your final layer has 20 units with softmax. For this case, the literature uses the categorical cross-entropy loss to train the classifier network.
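As a minimal sketch of what a valid custom loss looks like (the wrapper below simply reproduces categorical cross-entropy for illustration; it is not your intended loss), everything stays symbolic and uses only backend operations:
import keras.backend as K

# A custom loss built only from symbolic backend ops.
# This one just wraps categorical cross-entropy as an illustration.
def my_loss(y_true, y_pred):
    return K.mean(K.categorical_crossentropy(y_true, y_pred))

model.compile(loss=my_loss, optimizer='adam')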
Related
I have a neural network with one hidden layer implemented in both Keras and scikit-learn to solve a regression problem. In scikit-learn I used MLPRegressor with mostly default parameters, and in Keras I have a hidden Dense layer with parameters set to the same defaults as scikit-learn (which uses Adam with the same learning rate and epsilon, and a batch_size of 200). When I train the networks, the scikit-learn model has a loss value that is about half of Keras's, and its accuracy (measured in mean absolute error) is also better. Shouldn't the loss values be similar, if not identical, and the accuracies also be similar? Has anyone experienced something similar and been able to make the Keras model more accurate?
Scikit-learn model:
clf = MLPRegressor(hidden_layer_sizes=(1600,), max_iter=1000, verbose=True, learning_rate_init=.001)
Keras model:
inputs = keras.Input(shape=(cols,))
x = keras.layers.Dense(1600, activation='relu', kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(inputs)
outputs = keras.layers.Dense(1,kernel_initializer="glorot_uniform", bias_initializer="glorot_uniform", kernel_regularizer=keras.regularizers.L2(.0001))(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(epsilon=1e-8, learning_rate=.001),loss="mse")
model.fit(x=X, y=y, epochs=1000, batch_size=200)
It is because the formula for the mean squared error (MSE) loss in scikit-learn is different from that in TensorFlow.
From the source code of scikit-learn:
def squared_loss(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean() / 2
while MSE from tensorflow:
backend.mean(math_ops.squared_difference(y_pred, y_true), axis=-1)
As you can see, the scikit-learn version is divided by 2, consistent with what you said:
the scikit-learn model has a loss value that is about half of keras
This implies that the Keras and scikit-learn models actually achieved similar performance. It also implies that a learning rate of 0.001 in scikit-learn is not equivalent to the same learning rate in TensorFlow, since halving the loss also halves the gradients that drive each update.
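If you want the two losses to be directly comparable, one option (a sketch, not something either library provides out of the box) is to give Keras a halved MSE that matches scikit-learn's convention:
import tensorflow as tf
from tensorflow import keras

# Halved MSE, matching scikit-learn's squared_loss convention.
def half_mse(y_true, y_pred):
    return 0.5 * tf.reduce_mean(tf.math.squared_difference(y_pred, y_true), axis=-1)

model.compile(optimizer=keras.optimizers.Adam(epsilon=1e-8, learning_rate=0.001),
              loss=half_mse)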
Also, another smaller but still significant difference is the formula for L2 regularization.
From the source code of scikit-learn,
# Add L2 regularization term to loss
values = 0
for s in self.coefs_:
    s = s.ravel()
    values += np.dot(s, s)
loss += (0.5 * self.alpha) * values / n_samples
while that of tensorflow is loss = l2 * reduce_sum(square(x)).
Therefore, with the same L2 regularization parameter, the TensorFlow version applies much stronger regularization, which will result in a poorer fit to the training data.
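Comparing the two formulas, scikit-learn's effective penalty factor per squared weight is 0.5 * alpha / n_samples, while Keras applies the l2 factor directly. A rough equivalence (a sketch under that reading; n_samples is your training-set size) would be:
from tensorflow import keras

alpha = 0.0001                       # scikit-learn's default L2 strength
n_samples = len(X)                   # size of the training set
l2_equiv = 0.5 * alpha / n_samples   # Keras factor matching sklearn's penalty

x = keras.layers.Dense(1600, activation='relu',
                       kernel_regularizer=keras.regularizers.L2(l2_equiv))(inputs)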
I am using a really simple neural network with the latest version of TensorFlow 2.0, in a Jupyter notebook running Python 3.7.0.
The NN outputs Xip, a float, which I use as a parameter in my function MainGaussian_1_U, which approximates an image. When I try to compute the loss with MeanSquaredError between the real image img and the approximation mk, I get an error in which the loss function seems to take img as a reduction key. After searching, I still have no idea what this key is supposed to be, and I can't find a way to debug my code:
model = tf.keras.models.Sequential()
# Add the layers
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(1, activation="relu"))
# The loss method
loss_object = tf.keras.losses.MeanSquaredError()
# The optimizer
optimizer = tf.keras.optimizers.Adam()
# This metric is used to track the training loss during training
train_loss = tf.keras.metrics.Mean(name='train_loss')
def train_step(Data, img):
    MainGaussian_init(Data)
    for _ in range(5):
        with tf.GradientTape() as tape:
            Xip = model((sizeh**-2 * np.ones((sizeh, sizeh))).reshape(-1, 49))
            MainGaussian_1_U()
            print("img=", img)
            loss = tf.keras.losses.MeanSquaredError(img, mk)
            print("loss=", loss)
        gradients = tape.gradient(loss, model.trainable_variables)
        print(gradients)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        train_loss(loss)
train_step (TestFile, TestFile[4])
The error given is:
c:\program files\python37\lib\site-packages\tensorflow_core\python\ops\losses\loss_reduction.py:67: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if key not in cls.all():
...
ValueError: Invalid Reduction Key [[21.05224609 20.79420471 34.9659729 ... 48.09233093 68.83874512
83.10766602]
[20.93516541 17.0511322 39.00476074 ... 56.74258423 47.75274658
98.57067871]
[38.18562317 22.70791626 24.37176514 ... 64.9606781 47.65338135
67.61506653]
...
[85.76565552 79.45443726 73.64129639 ... 73.66456604 47.06422424
49.44664001]
[87.14616394 82.38183594 77.00856018 ... 66.21652222 71.32862854
58.39285278]
[36.74142456 37.27145386 34.52891541 ... 29.58699036 37.37667847
30.25990295]].
This is my first question here on Stack Overflow: please let me know if I can make it any clearer!
You correctly create the "loss object", but never use it. Instead, your code tries to create a new "loss object" with the images as arguments: tf.keras.losses.MeanSquaredError is a class, so its constructor interprets img as its reduction argument, which is why the error complains about an "Invalid Reduction Key". What you want is to pass the images into the already-created loss object. You just have to change this line
loss= tf.keras.losses.MeanSquaredError(img, mk)
to
loss= loss_object(img, mk)
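Alternatively (a variant not in the original answer), the functional form loss = tf.keras.losses.mean_squared_error(img, mk) also accepts the tensors directly, though it returns per-element losses rather than a single reduced scalar.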
The following two models/compilations behave differently:
def custom_loss(y_true, y_pred):
    return keras.losses.binary_crossentropy(y_true, y_pred)
optimizer = Adam(lr=5e-3)
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['accuracy'])
And:
optimizer = Adam(lr=5e-3)
model.compile(loss=keras.losses.binary_crossentropy, optimizer=optimizer, metrics=['accuracy'])
What can be the reason?
If you implement a custom binary cross-entropy loss, you should also specify the right accuracy metric. This is because if you use Keras' built-in binary cross-entropy, Keras automatically chooses which metric 'accuracy' maps to (binary or categorical accuracy) based on the loss.
This doesn't happen if you use a custom loss: Keras then defaults to categorical accuracy, which is wrong here and produces misleading accuracy values. Instead, name the metric explicitly:
model.compile(loss=custom_loss, optimizer=optimizer, metrics=['binary_accuracy'])
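A small demonstration of the mismatch (not from the original answer; just to illustrate why the default is wrong for a single sigmoid output):
import numpy as np
from tensorflow import keras

y_true = np.array([[0.], [1.]])
y_pred = np.array([[0.9], [0.1]])  # both predictions are wrong

# categorical_accuracy takes argmax over a length-1 axis, which is always 0,
# so it reports both samples as "correct":
print(keras.metrics.categorical_accuracy(y_true, y_pred))  # [1. 1.]

# binary_accuracy thresholds at 0.5 and correctly reports both as wrong:
print(keras.metrics.binary_accuracy(y_true, y_pred))       # [0. 0.]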
Todo:
I would like to add a weight for each pattern's loss in a given Keras loss function.
For example: if the error on pattern i is l_i, I would like to consider, instead, an error l_i * c_i, where c_i is an input scalar.
def customloss(y_true, y_pred):
    c_i = ...
    loss = ...  # only use tensor operations on y_true and y_pred, or built-in Keras losses
    return c_i * loss
Now compile your model, passing the loss function:
model.compile(loss=customloss)
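As a concrete version of that template (the constant weight and the choice of MSE below are assumptions for illustration; if c_i varies per sample, the idiomatic route is the sample_weight argument of model.fit):
import keras
import keras.backend as K

c_i = K.constant(2.0)  # hypothetical constant pattern weight

def customloss(y_true, y_pred):
    # a built-in loss gives one value per sample; scale it by the weight
    loss = keras.losses.mean_squared_error(y_true, y_pred)
    return c_i * loss

model.compile(loss=customloss, optimizer='adam')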
Suppose, for example, a regression problem with five scalars as output, where each output has approximately the same range. In Keras, we can model this using a 5-output dense layer without activation function (vector regression):
output_layer = layers.Dense(5, activation=None)(previous_layer)
model = models.Model(input_layer, output_layer)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is the total loss (metric) simply the sum of the individual losses (metrics)? Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? In my experiments, I haven't observed any significant differences but want to make sure that I didn't miss anything fundamental.
output_layer_list = []
for _ in range(5):
    output_layer_list.append(layers.Dense(1, activation=None)(previous_layer))
model = models.Model(input_layer, output_layer_list)
model.compile(optimizer='rmsprop', loss='mse', metrics=['mse'])
Is there an easy way to attach weights to the outputs in the first solution similar to specifying loss_weights in case of multi-output models?
Those models are the same. To answer your questions let's look at the mse loss:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
Is the total loss (metric) simply the sum of the individual losses (metrics)? Yes, up to a constant factor: the mse loss applies K.mean over the output axis, which is the sum of the five squared errors divided by the number of outputs.
Is this equivalent to the following multi-output model, where the outputs have the same implicit loss weights? Yes, because subtraction and squaring are done element-wise, so five scalar outputs produce the same values as a single 5-vector output, and a multi-output model's total loss is the sum of the losses of the individual outputs. (Strictly, the vector model averages the five squared errors while the multi-output model sums them; that constant factor of 5 rescales the gradients but does not change the optimum.)
Yes, both are equivalent. To replicate the loss_weights functionality with your first model, you can define your own custom loss function. Something along these lines:
import numpy as np
import tensorflow as tf
import keras.backend as K

weights = K.variable(value=np.array([[0.1, 0.1, 0.1, 0.1, 0.6]]))

def custom_loss(y_true, y_pred):
    return tf.matmul(K.square(y_true - y_pred), tf.transpose(weights))
and pass this function to the loss argument upon compiling:
model.compile(optimizer='rmsprop', loss=custom_loss, metrics=['mse'])
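As a sanity check: because the weights sum to 1, this custom_loss is a weighted average of the five per-output squared errors, mirroring loss_weights=[0.1, 0.1, 0.1, 0.1, 0.6] on the multi-output model (up to the constant-scale caveat mentioned above).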