For my classification problem I want to use a loss function typically used for regression, such as Mean Absolute Error.
Consider that y_pred and y_true are one-hot encoded, but for MAE I need them as real numbers.
In this first case I get the error: ValueError: No gradients provided for any variable
import tensorflow as tf
from tensorflow.keras import backend as K

def AgeAccuracyRegularity(y_pred, y_true):
    mae_func = tf.keras.losses.MeanAbsoluteError()
    # argmax turns the one-hot vectors into class indices; +1 maps index to age
    y_pred_ages = K.cast(K.argmax(y_pred, axis=-1) + 1, dtype='float32')
    y_true_ages = K.cast(K.argmax(y_true, axis=-1) + 1, dtype='float32')
    res = mae_func(y_pred_ages, y_true_ages)
    return res
But if I manipulate the result in a meaningless way, like this,
def AgeAccuracyRegularity(y_pred, y_true):
    mae_func = tf.keras.losses.MeanAbsoluteError()
    y_pred_ages = K.cast(K.argmax(y_pred, axis=-1) + 1, dtype='float32')
    y_true_ages = K.cast(K.argmax(y_true, axis=-1) + 1, dtype='float32')
    res = mae_func(y_pred_ages, y_true_ages)
    mae = mae_func(y_pred, y_true)
    return res - mae + mae
it works. I checked the output of the classifier and both mae and res inside the custom loss function, and they have the same size and type.
For a classification problem you should use a classification loss such as categorical_crossentropy (or sparse_categorical_crossentropy for integer labels); MAE is intended for regression problems. The first version fails because argmax is not differentiable, so no gradient can flow from the loss back to the weights. The second version only appears to work because the -mae+mae term gives TensorFlow a differentiable path to the variables, while the contribution of res to the gradient is still zero.
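If you want to keep the MAE-on-ages idea in a differentiable form, one option is to replace argmax with the expected age under the predicted distribution. A minimal sketch, assuming y_pred is a softmax output and class i encodes age i+1 (matching the "+1" in the question):

import tensorflow as tf

def age_mae(y_true, y_pred):
    # ages 1..N as a float vector, matching the +1 convention above
    n_classes = tf.shape(y_pred)[-1]
    ages = tf.cast(tf.range(1, n_classes + 1), tf.float32)
    # expected age under the predicted distribution -- differentiable, unlike argmax
    pred_age = tf.reduce_sum(y_pred * ages, axis=-1)
    true_age = tf.reduce_sum(tf.cast(y_true, tf.float32) * ages, axis=-1)
    return tf.reduce_mean(tf.abs(pred_age - true_age))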
I'm currently working on a distributed federated learning infrastructure and am trying to implement it in PyTorch. For this I also need federated averaging, which averages the parameters retrieved from all the nodes and then passes them to the next training round.
The gathering of the parameters looks like this:
def RPC_get_parameters(data, model):
    """
    Get parameters from nodes
    """
    with torch.no_grad():
        for parameters in model.parameters():
            # store parameters in dict
            return {"parameters": parameters}
The averaging function which happens at the central server looks like this:
# stores results from RPC_get_parameters() in results
results = client.get_results(task_id=task.get("id"))

# averaging of returned parameters
global_sum = 0
global_count = 0
for output in results:
    global_sum += output["parameters"]
    global_count += len(global_sum)

averaged_parameters = global_sum / global_count

new_params = {'averaged_parameters': averaged_parameters}
Now my question is: how do you update all the parameters (tensors) in PyTorch from this? I tried a few things and they usually returned errors like ValueError: can't optimize a non-leaf Tensor when inserting new_params into the optimizer where model.parameters() usually goes, i.e. optimizerD = optim.SGD(new_params, lr=0.01, momentum=0.5). So how do I actually update the model so that it uses the averaged parameters?
Thank you!
https://github.com/simontkl/torch-vantage6/blob/fed_avg-w/local_dp/v6-ppsdg-py/master.py
I think the most convenient way to work with parameters (outside the SGD context) is using the state_dict of the model.
from collections import OrderedDict

new_params = OrderedDict()
n = len(clients)  # number of clients
for client_model in clients:
    sd = client_model.state_dict()  # current parameters of one client
    for k, v in sd.items():
        # accumulate a running average: each client contributes v / n
        new_params[k] = new_params.get(k, 0) + v / n
After that, new_params is a state_dict (you can load it using .load_state_dict) containing the average of the clients' weights.
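For example, to make the server's model use the averaged weights (global_model here is a hypothetical model with the same architecture as the clients):

global_model.load_state_dict(new_params)

From then on, global_model.parameters() can be passed to an optimizer as usual for the next training round.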
I am trying to do a seq2seq prediction. For this, I have an LSTM layer followed by a fully connected layer. I employ teacher forcing during the training phase and would like to skip it (I may be wrong here) during the testing phase. I have not found a direct way of doing this, so I have taken the approach shown below.
def forward(self, inputs, future=0, teacher_force_ratio=0.2, target=None):
    outputs = []
    for idx in range(future):
        rnn_out, _ = self.rnn(inputs)
        output = self.fc1(rnn_out)
        outputs.append(output)
        if self.teacher_training:
            # feed the ground truth back in with probability teacher_force_ratio
            new_input = output if np.random.random() >= teacher_force_ratio else target[idx]
        else:
            new_input = output
        inputs = new_input
    return torch.stack(outputs)
I use a bool variable teacher_training to check if Teacher training is needed or not. Is this correct? If yes, is there a better way to do this? Thanks.
In PyTorch, every class that extends nn.Module has a boolean attribute called training. So instead of a custom teacher_training flag we can simply use self.training. This attribute is set automatically depending on the model's mode (model.train() and model.eval()).
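A minimal sketch of the forward pass from the question rewritten to use the built-in flag (self.rnn, self.fc1, and the decoding scheme are taken from the question as-is):

import numpy as np
import torch

def forward(self, inputs, future=0, teacher_force_ratio=0.2, target=None):
    outputs = []
    for idx in range(future):
        rnn_out, _ = self.rnn(inputs)
        output = self.fc1(rnn_out)
        outputs.append(output)
        # self.training is True after model.train() and False after model.eval()
        if self.training and np.random.random() < teacher_force_ratio:
            inputs = target[idx]  # teacher forcing
        else:
            inputs = output       # free-running decoding
    return torch.stack(outputs)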
I did a lot of searching and am still unable to figure out how to write a custom loss function with multiple outputs that interact.
I have a neural network defined as:
def NeuralNetwork():
    inLayer = Input((2,))
    layers = [Dense(numNeuronsPerLayer, activation='relu')(inLayer)]
    for i in range(10):
        hiddenLyr = Dense(5, activation='tanh', name="layer" + str(i + 1))(layers[i])
        layers.append(hiddenLyr)
    out_u = Dense(1, activation='linear', name="out_u")(layers[i])
    out_k = Dense(1, activation='linear', name="out_k")(layers[i])
    outLayer = Concatenate(axis=-1)([out_u, out_k])
    model = Model(inputs=[inLayer], outputs=outLayer)
    return model
I am now trying to define a custom loss function as follows:
def computeLoss(true, prediction):
    u_pred = prediction[:, 0]
    k_pred = prediction[:, 1]
    loss = f(u_pred) * k_pred
    return loss
Where f(u_pred) is some manipulation of u_pred. The code seems to work correctly and produce correct results when I use only u_pred (i.e., a single output from the neural network). However, the moment I try to include the second output k_pred and slice my prediction tensor in the loss function, I start getting wrong results. I feel I am doing something wrong in handling multiple outputs in Keras but am not sure where my mistake lies. Any help on how I may proceed is welcome.
I figured out that plain indexing (i.e., [:,0] or [:,1]) did not slice the prediction tensor correctly in my case; the operation did not seem to work. Instead, use the built-in TensorFlow function tf.split, as detailed in https://www.tensorflow.org/api_docs/python/tf/split?version=stable
So the code that worked was:
(u_pred, k_pred) = tf.split(prediction, num_or_size_splits=2, axis=1)
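Put back into the loss function, that gives the following sketch (f is still the problem-specific manipulation of u_pred from the question):

import tensorflow as tf

def computeLoss(true, prediction):
    # split the concatenated (u, k) output back into its two columns
    u_pred, k_pred = tf.split(prediction, num_or_size_splits=2, axis=1)
    return f(u_pred) * k_pred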
I am working on a denoising autoencoder for audio, feeding raw time-series audio to the network and receiving time-series audio back as output. The mean_squared_error loss returns values of shape (batch_size, audio_sequence_length), which (I hope I understood correctly) Keras processes further internally, computing the mean over time bins and batches to reach the final scalar loss used for backprop.
My current efforts are focused on creating a custom loss function that uses signal power instead of the error of individual samples, returning values of shape (batch_size,). The model compiles nicely but returns only NaN loss at training time. Trying to predict anything with such a model results in output vectors consisting of NaN as well.
This is the loss function:
def SI_SNR(yTrue, yPred):
    yTarget = K.batch_dot(yTrue, yPred, axes=0)
    yTarget = K.batch_dot(yTrue, yTarget, axes=None)
    yNorm = K.batch_dot(yTrue, yTrue, axes=0)
    yTarget = yTarget / yNorm
    eNoise = yPred - yTarget
    losses = -(10. * K.log(K.batch_dot(yTarget, yTarget, axes=0) /
                           K.batch_dot(eNoise, eNoise, axes=0)) / K.log(10.))
    return K.reshape(losses, [-1])
When using the function on actual numbers (either a subset of the training data or randomly filled arrays) I do get non-NaN results:
x=K.variable(np.random.rand(8,1024,1))
y=K.variable(np.random.rand(8,1024,1))
K.eval(SI_SNR(y,x))
Is the training behavior due to the shape of the loss or is there perhaps some other problem with the internal structure of the loss function?
To answer my own question: the output shape of the cost was not the issue. I tested this hypothesis using a different (dummy) loss:
def meanMSE(yTrue, yPred):
    return K.mean(mean_squared_error(yTrue, yPred), axis=1)
If yPred is a vector of zeros, the previous cost function runs into division-by-zero issues. Using K.clip and modifying the function slightly resolves the problem:
def SDR(yTrue, yPred):
    return (K.batch_dot(yPred, yPred, axes=1) /
            K.clip(K.square(K.batch_dot(yPred, yTrue, axes=1)), 1e-7, 1e12))
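As a usage sketch (model stands for the autoencoder from the question), the clipped loss can be passed to compile like any built-in loss:

model.compile(optimizer='adam', loss=SDR)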
To be clear, by weights I mean the entries in the matrices (Ws) of the affine transformation in a node of a neural net.
I start with categorical_crossentropy as my loss function, and I want to add an additional term to penalize negative weights.
To this end I want to introduce a term of the form
theano.tensor.sum(theano.tensor.exp(-10 * ws))
Where "ws" are the weights.
If I follow the source code of categorical_crossentropy:
if true_dist.ndim == coding_dist.ndim:
    return -tensor.sum(true_dist * tensor.log(coding_dist),
                       axis=coding_dist.ndim - 1)
elif true_dist.ndim == coding_dist.ndim - 1:
    return crossentropy_categorical_1hot(coding_dist, true_dist)
else:
    raise TypeError('rank mismatch between coding and true distributions')
it seems I should update the return of crossentropy_categorical_1hot (third line from the bottom) to read
crossentropy_categorical_1hot(coding_dist, true_dist) + theano.tensor.sum(theano.tensor.exp(-10 * ws))
and change the declaration of the function to
my_categorical_crossentropy(coding_dist, true_dist, ws)
When calling my_categorical_crossentropy I write
loss = my_categorical_crossentropy(net_output, true_output, l_layers[1].W)
with, for a start, l_layers[1].W being the weights coming from the first layer of my neural net.
With those updates, I go on writing:
loss = aggregate(loss, mode='mean')
updates = sgd(loss, all_params, learning_rate=0.005)
train = theano.function([l_input.input_var, true_output], loss, updates=updates)
[...]
This passes the compiler and everything runs smoothly; the training of the network completes. However, for some reason the additional term theano.tensor.sum(theano.tensor.exp(-10 * ws)) is ignored: it seems not to affect the loss value.
I tried looking into the Theano documentation, but so far I could not figure out what might be wrong. The weights l_layers[1].W are shared variables, so I could not pass them as
train = theano.function([l_input.input_var, true_output, l_layers[1].W], loss, updates=updates)
Any comments are welcome. Thanks!
Solution
Although I did not find out why my original approach did not work, adding the penalty term outside categorical_crossentropy, as suggested in the comments, did solve the problem:
loss = aggregate(categorical_crossentropy(net_output, true_output) + theano.tensor.sum(theano.tensor.exp(-10 * l_layers[1].W)), mode='mean')
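For completeness, a sketch of the full training setup with the penalty added outside the crossentropy, assuming the Lasagne-style helpers (aggregate, sgd, categorical_crossentropy) and the variables (net_output, true_output, l_layers, all_params, l_input) from the question:

import theano
import theano.tensor as T
from lasagne.objectives import categorical_crossentropy, aggregate
from lasagne.updates import sgd

# exp(-10 * w) is near zero for positive weights and grows rapidly for negative ones
penalty = T.sum(T.exp(-10 * l_layers[1].W))
loss = aggregate(categorical_crossentropy(net_output, true_output), mode='mean') + penalty
updates = sgd(loss, all_params, learning_rate=0.005)
train = theano.function([l_input.input_var, true_output], loss, updates=updates)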