Kernel MSE custom loss function for Keras model

I found this loss function in a paper (y are the targets, ŷ the predictions):

$$L(y, \hat{y}) = \sum_{i}\left(1 - \exp\left(-\frac{(\hat{y}_i - y_i)^2}{2\sigma^2}\right)\right)$$

and I tried to implement it in Python as follows:
import keras.backend as K
import math

sigma = math.sqrt(2) / 2
s = 2 * sigma ** 2

def kernel_MSE(actual, predicted):
    actual, predicted = K.flatten(actual), K.flatten(predicted)
    total = 0.0
    for i in range(len(actual)):
        total += 1 - K.exp(-((predicted[i] - actual[i]) ** 2) / s)
    return total
Then using this loss function in the model looks like this:

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
import tensorflow as tf

model = Sequential()
model.add(LSTM(units=256, return_sequences=True,
               input_shape=(X_train.shape[1], X_train.shape[2]),
               activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(64))
model.add(Dense(32))
model.add(Dense(1))

optimizer = tf.keras.optimizers.Adam(lr=0.001, decay=0.00001)
model.compile(loss=kernel_MSE, optimizer=optimizer)

The code runs fine, but I am not sure whether my implementation is correct. Could anyone check it?
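For reference, a fully vectorized sketch of the same formula using only Keras backend ops (so no Python loop over tensor elements) could look like the following; it is a rewrite of the function above, not code taken from the paper:

import keras.backend as K
import math

sigma = math.sqrt(2) / 2
s = 2 * sigma ** 2

def kernel_MSE_vectorized(actual, predicted):
    # 1 - exp(-(error^2) / (2*sigma^2)), summed over all elements
    actual, predicted = K.flatten(actual), K.flatten(predicted)
    return K.sum(1.0 - K.exp(-K.square(predicted - actual) / s))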

Related


how to implement hamming loss as a custom metric in keras model
I have a multilabel classification problem with 6 classes and want to compile the model with a Hamming loss metric:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy', hamming_loss])

I tried using:

from sklearn.metrics import hamming_loss

def custom_hl(y_true, y_pred):
    return hamming_loss(y_true, y_pred)

which doesn't work, because y_true and y_pred are the following tensors:
YTRUE
Tensor("Cast_10:0", shape=(None, 6), dtype=float32)
YPRED
Tensor("model_1/dense_1/Sigmoid:0", shape=(None, 6), dtype=float32)
I also tried the function from this question, and it doesn't work either:
Getting the accuracy for multi-label prediction in scikit-learn
Is there any way I can get the Hamming loss as a metric in Keras?
Thanks for any help.
So I found a way:
def Custom_Hamming_Loss(y_true, y_pred):
    return K.mean(y_true * (1 - y_pred) + (1 - y_true) * y_pred)

def Custom_Hamming_Loss1(y_true, y_pred):
    tmp = K.abs(y_true - y_pred)
    return K.mean(K.cast(K.greater(tmp, 0.5), dtype=float))

Source: https://groups.google.com/g/keras-users/c/_sjndHbejTY?pli=1
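For completeness, a minimal usage sketch with the soft Hamming metric above (the hidden-layer size and input_dim below are placeholders, not values from the question):

import tensorflow.keras.backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def Custom_Hamming_Loss(y_true, y_pred):
    # Same metric as above: expected fraction of mismatched labels
    return K.mean(y_true * (1 - y_pred) + (1 - y_true) * y_pred)

# 6 sigmoid outputs, one per label, as in the question
model = Sequential()
model.add(Dense(32, input_dim=20, activation='relu'))  # input_dim is a placeholder
model.add(Dense(6, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy', Custom_Hamming_Loss])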

Tensorflow custom loss function - can't get samples of y_pred and y_true in loss function

I'm running an LSTM network that works fine (TF 2.0). My problem starts when trying to modify the loss function.
I planned to apply some data manipulation to 'y_true' and 'y_pred', but since TF forces the data to stay as tensors (rather than converting to Pandas or NumPy), this is challenging.
To get better control of the data inside the loss function, I've replicated the tf.keras.losses.mae function.
My goal was to be able to see the data ('y_true' and 'y_pred') so I can make my desired adjustments.
The original function:
def mean_absolute_error(y_true, y_pred):
    y_pred = ops.convert_to_tensor(y_pred)
    y_true = math_ops.cast(y_true, y_pred.dtype)
    return K.mean(math_ops.abs(y_pred - y_true), axis=-1)
And after adjustments for debugging:
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
import tensorflow.keras.backend as K

def mean_absolute_error_test(y_true, y_pred):
    global temp_true
    temp_true = y_true
    print(y_true)
    y_pred = ops.convert_to_tensor(y_pred)
    y_true = math_ops.cast(y_true, y_pred.dtype)
    return K.mean(math_ops.abs(y_pred - y_true), axis=-1)
When I run model.compile and print y_true, I get:

Tensor("dense_target:0", shape=(None, None), dtype=float32)
type=tensorflow.python.framework.ops.Tensor

Does anyone know how I can see 'y_pred' and 'y_true', or what am I missing? It seems like I can't see samples of y_true, or the data is empty.
The main code part:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dropout, Dense
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential, load_model
from tensorflow.python.keras.layers.recurrent import LSTM
from tensorflow.keras.callbacks import EarlyStopping

K.clear_session()
model = Sequential()
model.add(LSTM(20, activation='relu', input_shape=(look_back, len(training_columns)), recurrent_dropout=0.4))
model.add(Dropout(0.1))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss=test2, experimental_run_tf_function=False)  # mse, mean_squared_logarithmic_error
num_epochs = 20
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
history = model.fit(X_train_lstm, y_train_lstm, epochs=num_epochs, batch_size=128, shuffle=False, verbose=1, validation_data=(X_test_lstm, y_test_lstm), callbacks=[es])
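One way to inspect actual batch values (a sketch of a common approach, not taken from this thread) is to use tf.print inside the loss, since tf.print executes when the graph runs rather than when the loss is traced; alternatively, compiling with run_eagerly=True (TF 2.x) makes y_true and y_pred arrive as eager tensors:

import tensorflow as tf

def mae_with_debug(y_true, y_pred):
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    # tf.print runs at execution time, so it shows real batch values,
    # unlike Python's print, which only sees the symbolic placeholder
    tf.print("y_true batch:", y_true, summarize=5)
    tf.print("y_pred batch:", y_pred, summarize=5)
    return tf.reduce_mean(tf.abs(y_pred - y_true), axis=-1)

# model.compile(optimizer='adam', loss=mae_with_debug, run_eagerly=True)
# With run_eagerly=True, the loss runs eagerly and y_true.numpy() / y_pred.numpy()
# can be inspected directly inside it.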

Evaluate keras model on part of output

I have a multi-output regression model trained using Keras. The following is my network architecture:

model.add(Dense(4048, input_dim=16128, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(3))
By calling:
score = model.evaluate(X_test, y_test)
I can get the accuracy and mean absolute error over my test data; each prediction is an array of size 3, compared against a ground-truth array of size 3.
My question is: how can I evaluate the test data on only one output value, ignoring the other two?
I want to evaluate both the average mean absolute error and the individual absolute errors.
I would recommend one of the following two options:
a) Use the Keras functional API to define two different models model1 and model2 that are used to evaluate and train the network, respectively:
from keras.layers import Input, Dense, Concatenate
from keras.models import Model
a = Input((16128,))
h = Dense(4048, activation='relu')(a)
h = Dense(128, activation='relu')(h)
h1 = Dense(1)(h)
model1 = Model(a, h1)
h = Dense(2)(h)
h2 = Concatenate()([h1, h])
model2 = Model(a, h2)
# ... train on model2
# Evaluate on model1, which outputs only the unit of interest,
# so pass the matching column of the targets
score = model1.evaluate(X_test, y_test[:, :1])
b) Define your custom Keras metrics to exclusively select the unit of interest when computing the metrics.
Thanks for the hint. I took option b) and implemented my custom metrics as follows:

import keras.backend as K

def MAE_ROLL(y_true, y_pred):
    return K.mean(K.abs(y_pred[:, 0] - y_true[:, 0]))

def MAE_PITCH(y_true, y_pred):
    return K.mean(K.abs(y_pred[:, 1] - y_true[:, 1]))

def MAE_YAW(y_true, y_pred):
    return K.mean(K.abs(y_pred[:, 2] - y_true[:, 2]))

model.compile(loss='mean_absolute_error', optimizer='adam', metrics=[MAE_ROLL, MAE_PITCH, MAE_YAW])
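The same idea can be written once as a factory; this is a sketch with hypothetical helper names, not code from the original answer:

import keras.backend as K

def mae_for_column(index, name):
    # Builds a metric that computes MAE on a single output column
    def metric(y_true, y_pred):
        return K.mean(K.abs(y_pred[:, index] - y_true[:, index]))
    metric.__name__ = name  # shows up under this name in the training logs
    return metric

# model.compile(loss='mean_absolute_error', optimizer='adam',
#               metrics=[mae_for_column(0, 'MAE_ROLL'),
#                        mae_for_column(1, 'MAE_PITCH'),
#                        mae_for_column(2, 'MAE_YAW')])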

Why does sigmoid function outperform tanh and softmax in this case?

The sigmoid function gives better results than tanh or softmax for the neural network below.
If I change the activation function from sigmoid to tanh or softmax, the error increases and accuracy decreases, although I have read that tanh and softmax are supposed to be better than sigmoid. Could someone help me understand this?
The datasets I used are Iris and the Pima Indians Diabetes Database. I used TensorFlow 1.5 and Keras 2.2.4.
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy as np
dataset = np.genfromtxt('diabetes.csv', dtype=float, delimiter=',')
X = dataset[1:, 0:8]
Y = dataset[1:, 8]
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size=0.2, random_state=42)
model = Sequential()
model.add(Dense(10, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(xtrain, ytrain, epochs=50, batch_size=20)
print(model.metrics_names)
print(model.evaluate(xtest, ytest))
Tanh's output range is (-1, 1), but that's not necessarily a problem. By learning suitable weights and using the bias, a tanh unit can still fit targets in the range [0, 1]. Therefore both sigmoid and tanh can be used here; only softmax is not possible, for the reasons explained below. See the code:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X = np.hstack((np.linspace(0, 0.45, num=50), np.linspace(0.55, 1, num=50)))
Y = (X > 0.5).astype('float').T
model = Sequential()
model.add(Dense(1, input_dim=1, activation='tanh'))
model.compile(loss='binary_crossentropy', optimizer='SGD', metrics=['accuracy'])
model.fit(X, Y, epochs=100)
print(model.evaluate(X, Y, verbose=False))
Whenever someone says you should always prefer foo over bar in machine learning, it is probably an oversimplification. There are anti-patterns one can warn people about, things that never work, like the softmax in the example above. If the rest were that simple, AutoML would be a very boring field of research ;) . PS: I'm not exactly working on AutoML.
The softmax activation function is generally used for categorical outputs, because softmax squashes the outputs into the range (0, 1) so that they sum to 1. If your output layer has only one unit/neuron, it will therefore always output a constant 1.
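To see this concretely, here is a quick check (my own illustration, not part of the original answer): softmax over a single unit returns 1 regardless of the logit.

import tensorflow as tf

# Softmax over a single output unit: the result is always 1, whatever the input
logits = tf.constant([[-3.0], [0.0], [5.0]])   # three samples, one unit each
print(tf.nn.softmax(logits, axis=-1).numpy())  # [[1.] [1.] [1.]]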
Tanh, the hyperbolic tangent, is a sigmoidal function that maps outputs to the range (-1, 1). It can be used for binary classification between two classes; when using tanh, remember to label the data accordingly, with targets in [-1, 1].
The sigmoid function is another sigmoidal function, closely related to tanh. For any real-valued input, its output lies in the range (0, 1), which makes sigmoid a natural choice for predicting a probability.
So, all in all, the output activation is usually not chosen for raw model performance; it depends on the task and the network architecture you are working with.

Minimizing and maximizing the loss

I would like to train an autoencoder in such a way that the reconstruction error will be low on some observations, and high on the others.
from keras.models import Sequential
from keras.layers import Dense
import keras.backend as K

def l1Loss(y_true, y_pred):
    return K.mean(K.abs(y_true - y_pred))

model = Sequential()
model.add(Dense(5, input_dim=10, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.compile(optimizer='adam', loss=l1Loss)

for i in range(1000):
    model.train_on_batch(x_good, x_good)  # minimize reconstruction error on these
    model.train_on_batch(x_bad, x_bad, ???)  # need to maximize this part, so that mse(x_bad, x_bad_reconstructed) is high

I saw something about replacing ??? with sample_weight=-np.ones(batch_size), but I have no idea whether this fits my goal.
If you set the sample weights to negative numbers, then minimizing the weighted loss in fact maximizes the unweighted loss on those samples.
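A minimal sketch of that idea, assuming x_good and x_bad are NumPy arrays shaped to match the model above:

import numpy as np

for i in range(1000):
    # Ordinary step: minimize reconstruction error on the good observations
    model.train_on_batch(x_good, x_good)
    # Negative sample weights flip the sign of the loss, so gradient descent
    # pushes the reconstruction error on the bad observations up instead of down
    model.train_on_batch(x_bad, x_bad, sample_weight=-np.ones(len(x_bad)))

Note that maximizing an unbounded reconstruction error can diverge, so in practice it may help to cap that term or give the maximization step a smaller weight.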
