How to implement the loss function when some targets are missing? - pytorch

I am building a neural network for a regression problem, and the loss is defined as the MSE between the output and the target. I have multiple loss functions, and for one of them not every output has a target to compute the MSE against.
I tried to use the output itself as the target when the true target is missing, so that the loss would be 0 where there is no true label and thus have no effect on the backward pass. However, this does not work: the test loss just keeps increasing as training goes on. Is there a better way to implement this? Thanks.
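For reference, a minimal sketch of the usual masking approach (the mask tensor, shapes, and values here are assumptions for illustration, not from the question): missing targets are zeroed out of the per-element loss, so they contribute nothing to the gradient.

import torch
import torch.nn.functional as F

# output: model predictions; target: ground truth; mask: 1.0 where a true
# target exists, 0.0 where it is missing (all shapes assumed identical)
output = torch.randn(8, 1, requires_grad=True)
target = torch.randn(8, 1)
mask = torch.tensor([[1.], [1.], [0.], [1.], [0.], [1.], [1.], [1.]])

# per-element squared errors, with missing entries zeroed out,
# averaged only over the entries that actually have a target
per_elem = F.mse_loss(output, target, reduction='none')
loss = (per_elem * mask).sum() / mask.sum().clamp(min=1)
loss.backward()  # masked entries contribute zero gradient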

Related

Do I need to apply the Softmax Function ANYWHERE in my multi-class classification Model?

I am currently turning my binary classification model into a multi-class classification model. Bear with me; I am very new to PyTorch and machine learning.
Most of what I state here, I know from the following video.
https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s
What I read / know is that the CrossEntropyLoss already has the Softmax function implemented, thus my output layer is linear.
What I then read / saw is that I can just choose my model prediction by taking torch.max() of my model output (which comes from my last linear layer). This feels weird because I have some negative outputs and I thought I would need to apply the softmax function first, but it seems to work right without it.
So now the big confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it implemented? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.
Thank you guys for every answer this gets.
For calculating the loss using CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, to turn the model outputs into probabilities you still need to apply softmax yourself.
Let's say you didn't apply softmax at the end of your model and trained it with cross-entropy. Then you want to evaluate your model on new data, get outputs, and use those outputs for classification. At this point you can manually apply softmax to your outputs, and there will be no problem. This is how it is usually done.
Training()
MODEL ----> FC LAYER ---> raw outputs ---> CrossEntropy Loss
Eval()
MODEL ----> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
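A minimal sketch of that pattern in PyTorch (the model, data, and shapes below are placeholders, not from the question):

import torch
import torch.nn as nn

model = nn.Linear(10, 3)            # stand-in for MODEL ----> FC LAYER
criterion = nn.CrossEntropyLoss()   # applies log-softmax internally

x = torch.randn(4, 10)
y = torch.tensor([0, 2, 1, 0])

# Training: feed raw outputs (logits) straight into the loss
model.train()
logits = model(x)
loss = criterion(logits, y)

# Eval: apply softmax manually if you want probabilities
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
    preds = probs.argmax(dim=1)     # same result as logits.argmax(dim=1)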
Yes, you need to apply softmax on the output layer. When you are doing binary classification you are free to use relu, sigmoid, tanh, etc. as the activation function. But when you are doing multi-class classification, softmax is required because it distributes the probability across the output nodes, so you can easily conclude that the output node with the highest probability belongs to a particular class. Thank you. Hope this is useful!

I get different validation loss for almost same type of accuracy

I made two different convolutional neural networks for a multi-class classification task, and I tested the performance of the two networks using the evaluate_generator function in Keras. Both models give me comparable accuracies: one gives 55.9% and the other 54.8%. However, the model that gives me 55.9% has a validation loss of 5.37, and the other 1.24.
How can these test losses be so different when the accuracies are similar? If anything, I would expect the loss for the model with 55.9% accuracy to be lower, but it's not.
Isn't loss the total sum of errors the network is making?
Insights would be appreciated.
Isn't loss the total sum of errors the network is making?
Well, not really. A loss function (or cost function) is a function that maps an event or the values of one or more variables onto a real number intuitively representing some "cost" associated with the event.
For example, in regression tasks the loss function can be mean squared error; in classification, binary or categorical cross-entropy. These loss functions measure how close your model's understanding of the data is to reality.
Why are both loss and accuracy high?
A high loss doesn't mean your model doesn't know anything. In the basic case, you can think of it this way: the smaller the loss, the more confident the model is in its choice.
So a model with a higher loss is not really sure about its answers.
You can also read this discussion about high loss and accuracy
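To make the confidence point concrete, here is a small made-up calculation (the probabilities are invented for illustration) showing two models with identical accuracy but very different categorical cross-entropy:

import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # mean over samples of -log(probability assigned to the true class)
    return -np.mean(np.log(y_pred[np.arange(len(y_true)), y_true]))

y_true = np.array([0, 1])  # both models classify both samples correctly

# Model A: correct and confident
probs_a = np.array([[0.99, 0.01],
                    [0.01, 0.99]])
# Model B: correct but barely (same accuracy, much higher loss)
probs_b = np.array([[0.51, 0.49],
                    [0.49, 0.51]])

print(categorical_crossentropy(y_true, probs_a))  # ~0.01
print(categorical_crossentropy(y_true, probs_b))  # ~0.67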
Even though the accuracies are similar, the loss value is not correlated when comparing different models.
Accuracy measures the fraction of correctly classified samples over the total population of your samples.
With regard to the loss value, from the Keras documentation:
Returns
Scalar test loss (if the model has no other metrics) or a list of scalars (if the model computes other metrics).
If this doesn't help in your case (I don't have a way to reproduce the issue), please check the following known issues in Keras with regard to the evaluate_generator function:
evaluate_generator

Loss function negative log likelihood giving loss despite perfect accuracy

I am debugging a sequence-to-sequence model and purposely tried to perfectly overfit a small dataset of ~200 samples (sentence pairs of length between 5 and 50). I am using negative log-likelihood loss in PyTorch. I get a low loss (~1e-5), but the accuracy on the same dataset is only 33%.
I trained the model on 3 samples as well and obtained 100% accuracy, yet during training I still had loss. I was under the impression that negative log-likelihood only gives loss (the loss is in the same region of ~1e-5) if there is a mismatch between the predicted and target labels?
Is a bug in my code likely?
There is no bug in your code.
The way things usually work in deep nets is that the network predicts the logits (i.e., log-likelihoods). These logits are then transformed to probabilities using softmax (or a sigmoid function). Cross-entropy is finally evaluated based on the predicted probabilities.
The advantage of this approach is that it is numerically stable and easy to train with. On the other hand, because of the softmax you can never have "perfect" 0/1 probabilities for your predictions: even when your network has perfect accuracy, it will never assign probability 1 to the correct prediction, only "close to one". As a result, the loss will always be positive (albeit small).
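A quick way to see this numerically (the logits below are made up; the loss magnitude matches the ~1e-5 region the question mentions):

import torch
import torch.nn.functional as F

# A "perfectly accurate" prediction: the correct class has by far the
# largest logit, so argmax is right, yet the loss is still positive.
logits = torch.tensor([[12.0, 0.0, 0.0]])
target = torch.tensor([0])

probs = torch.softmax(logits, dim=1)
print(probs[0, 0])                     # ~0.99998..., never exactly 1

log_probs = F.log_softmax(logits, dim=1)
print(F.nll_loss(log_probs, target))   # ~1.2e-5, small but positive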

Keras "acc" metrics - an algorithm

In Keras I often see people compile a model with mean square error function and "acc" as metrics.
model.compile(optimizer=opt, loss='mse', metrics=['acc'])
I have been reading about acc and I cannot find an algorithm for it.
What if I changed my loss function to binary cross-entropy, for example, and used 'acc' as the metric? Would this be the same metric as in the first case, or does Keras change the acc based on the loss function - so binary cross-entropy in this case?
Check the source code from line 375. The metric_fn changes depending on the loss function, so it is automatically handled by Keras.
If you want to compare models that use different loss functions, it can in some cases be necessary to specify which accuracy method you want to grade your model with, so that the models are actually tested with the same tests. A sketch of this is shown below.
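A minimal sketch of naming the metric explicitly (assuming the tf.keras API; the tiny model is just a stand-in, the real architectures don't matter here):

from tensorflow import keras

def make_model():
    # minimal stand-in binary classifier
    return keras.Sequential([
        keras.layers.Dense(1, activation='sigmoid', input_shape=(4,))
    ])

model_a, model_b = make_model(), make_model()

# 'acc' is resolved by Keras based on the loss function; naming the
# metric explicitly guarantees both models are graded the same way.
model_a.compile(optimizer='adam', loss='mse',
                metrics=['binary_accuracy'])
model_b.compile(optimizer='adam', loss='binary_crossentropy',
                metrics=['binary_accuracy'])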

Keras: How is the loss evaluated during optimization for a network with multiple output layers?

I am using functional API in Keras to build a neural network model with multiple output layers.
I was wondering how the loss is evaluated when updating the weights during optimization (when doing back-prop). Assuming that the same loss function is used, is the average loss over all outputs used to minimize the cost function, or is each output evaluated separately to update the weights?
Thanks in advance!
There is always only one loss that is used to backpropagate the errors. When a model has multiple outputs, each output is associated with one loss, and then a "global" loss is constructed by weighting the loss of each output. You can set the weight for each loss when you compile the model.
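A minimal sketch of that weighting with the functional API (the layer names, shapes, and weights are made up for illustration):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(16,))
hidden = layers.Dense(32, activation='relu')(inputs)
out_a = layers.Dense(1, name='out_a')(hidden)
out_b = layers.Dense(1, name='out_b')(hidden)

model = keras.Model(inputs, [out_a, out_b])

# global loss = 1.0 * mse(out_a) + 0.5 * mse(out_b)
model.compile(optimizer='adam',
              loss={'out_a': 'mse', 'out_b': 'mse'},
              loss_weights={'out_a': 1.0, 'out_b': 0.5})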
