Do I need to apply the Softmax Function ANYWHERE in my multi-class classification Model? - pytorch

I am currently turning my binary classification model into a multi-class classification model. Bear with me; I am very new to PyTorch and machine learning.
Most of what I state here, I know from the following video.
https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s
From what I have read, CrossEntropyLoss already has the softmax function built in, so my output layer is linear.
I also saw that I can choose my model's prediction by taking torch.max() of the model output (which comes from my last linear layer). This feels weird because I have some negative outputs and I thought I needed to apply the softmax function first, but it seems to work correctly without it.
So now the big confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it built in? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.
Thank you guys for every answer this gets.

For calculating the loss with CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, to turn the model outputs into probabilities you still need to apply softmax.
Let's say you didn't apply softmax at the end of your model and trained it with cross-entropy. Then you want to evaluate your model on new data and use its outputs for classification. At this point you can manually apply softmax to the outputs, and there will be no problem. This is how it is usually done.
Training:
MODEL ---> FC LAYER ---> raw outputs ---> CrossEntropyLoss
Evaluation:
MODEL ---> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
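The two pipelines can be checked by hand. The following is a minimal sketch in plain Python (no PyTorch needed), assuming the loss works the way the answer describes: softmax folded into the cross-entropy. The logits and the target index are made-up values, and the helper names are hypothetical, not PyTorch functions.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target):
    """Loss for one sample: log-sum-exp of the scores minus the target score."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

logits = [2.0, -1.0, 0.5]  # hypothetical raw outputs of the last linear layer
target = 0                 # hypothetical true class index

# Training: the loss consumes the raw outputs directly.
loss = cross_entropy_from_logits(logits, target)

# Evaluation: apply softmax only when probabilities are wanted.
probs = softmax(logits)

# Both routes agree: the loss is -log of the softmax probability of the target.
assert abs(loss - (-math.log(probs[target]))) < 1e-9
```

The final assertion is the point: feeding raw outputs to a loss that includes softmax gives the same number as applying softmax first and then taking the negative log of the target's probability.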

Softmax is what distributes the probability across the output nodes, so that you can conclude that the node with the highest probability corresponds to the predicted class. For binary classification a single sigmoid output is enough, but for multi-class classification softmax has to be applied somewhere. In PyTorch, CrossEntropyLoss already applies it to the raw outputs during training, so you only need to apply it yourself when you want the actual probabilities. Thank you. Hope this is useful!

Related

Multi-class classification with PyTorch

I'm new to PyTorch and I need a clarification on multi-class classification.
I'm fine-tuning the DenseNet neural network, so it can recognize 3 different classes.
Because it's a multiclass problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
Reading the PyTorch forum, I found that CrossEntropyLoss applies the softmax function to the output of the neural network. Is this true? Should I remove the Softmax activation from the structure of the network?
And what about the test phase? If softmax is already included in the loss, do I have to call the softmax function on the output of the model myself?
Thanks in advance for your help.
Yes, CrossEntropyLoss applies softmax implicitly. You should remove the softmax layer at the end of the network: softmax is not idempotent, so applying it twice would be a semantic error.
As far as evaluation/testing goes, remember that softmax is a monotonically increasing operation (meaning the relative order of the outputs doesn't change when you apply it). Therefore argmax gives the same result before and after softmax.
The only time you may want to perform softmax explicitly during evaluation would be if you need the actual confidence value for some reason. If needed you can apply softmax explicitly using torch.softmax on the network output during evaluation.
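Both properties mentioned above (order is preserved, but a second softmax changes the values) are easy to verify. This is a small sketch in plain Python with made-up logits; the helpers are hypothetical stand-ins for what torch.softmax and torch.argmax compute on a single row.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    m = max(values)
    exps = [math.exp(x - m) for x in values]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [-1.2, 0.3, -0.5]  # hypothetical raw outputs, including negative scores
once = softmax(logits)      # what evaluation would use for confidences
twice = softmax(once)       # what happens if the model ALSO ends in a softmax layer

# Monotonic: the predicted class is the same with or without softmax.
same_prediction = argmax(logits) == argmax(once)

# Not idempotent: the second pass flattens the distribution.
changed_by_second_pass = any(abs(a - b) > 1e-6 for a, b in zip(once, twice))

assert same_prediction
assert changed_by_second_pass
```

Negative raw outputs are not a problem for picking the class, which is why taking the maximum of the linear layer's output works without softmax.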

Keras "acc" metrics - an algorithm

In Keras I often see people compile a model with mean square error function and "acc" as metrics.
model.compile(optimizer=opt, loss='mse', metrics=['acc'])
I have been reading about acc, but I cannot find the algorithm behind it.
What if I changed my loss function to binary crossentropy, for example, and still used 'acc' as the metric? Would this be the same metric as in the first case, or does Keras change acc based on the loss function (binary crossentropy in this case)?
Check the source code from line 375. The metric_fn changes depending on the loss function, so Keras handles it automatically.
If you want to compare models trained with different loss functions, it can be necessary to specify explicitly which accuracy metric to use, so that the models are actually evaluated with the same test.
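To make the difference concrete, here is a rough sketch in plain Python of two common accuracy definitions that such a metric could resolve to. It is an illustration only, not the Keras source: the function names, the 0.5 threshold, and the sample data are assumptions.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples where the thresholded prediction equals the 0/1 label."""
    hits = [int((p > threshold) == bool(t)) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples where the argmax of the prediction matches the one-hot label."""
    def argmax(row):
        return max(range(len(row)), key=lambda i: row[i])
    hits = [int(argmax(t) == argmax(p)) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)

# Hypothetical binary labels and predicted probabilities: 2 of 4 are correct.
bin_acc = binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])

# Hypothetical one-hot labels and predicted class scores: 2 of 3 are correct.
cat_acc = categorical_accuracy(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]],
)
```

The two functions answer different questions about the same kind of output, which is why naming the metric explicitly matters when comparing models.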

Pre-training for multi label classification

I have to pre-train a model for multi-label classification. I'm pre-training with the CIFAR-10 dataset and I wonder whether I should use 'categorical_crossentropy' (softmax) or 'binary_crossentropy' (sigmoid) for the pre-training, since in the first case I have a multi-class classification problem.
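You should use softmax for the pre-training because it gives you a probability for every class, no matter how many classes there are, and each CIFAR-10 image belongs to exactly one class. Sigmoid, as you have written, is used with binary_crossentropy: it fits binary classification (hence "binary" in the name) and multi-label targets, where each label is an independent yes/no decision. I hope it's clearer now.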
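The practical difference between the two activations shows up in how the outputs relate to each other. This is a minimal sketch in plain Python with made-up scores for three classes; the helper functions are written here for illustration and are not Keras calls.

```python
import math

def softmax(values):
    """Probabilities that compete with each other and sum to 1."""
    m = max(values)
    exps = [math.exp(x - m) for x in values]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Independent probability for a single yes/no decision."""
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.0, -1.0]  # hypothetical raw scores for three classes

softmax_probs = softmax(logits)               # one class per sample (multi-class)
sigmoid_probs = [sigmoid(x) for x in logits]  # any number of labels per sample (multi-label)

softmax_total = sum(softmax_probs)  # always 1.0 up to rounding
sigmoid_total = sum(sigmoid_probs)  # not constrained; here it exceeds 1
```

With softmax, raising one class's probability lowers the others. With sigmoid, two labels can both be likely at the same time.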

How to adopt multiple different loss functions in each steps of LSTM in Keras

I have a set of sentences and their scores, and I would like to train a marking system that can predict the score for a given sentence. One example looks like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build such a marking system and also take into account the sequential relationship between the words in the sentence, so the training example shown above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The loss function of this LSTM is therefore composed of two different loss functions. Keras does not seem to provide a direct way to address this. In addition, I am not sure whether my approach to building the marking system is correct.
Keras supports multiple loss functions as well:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts a token based on the previous tokens (in NLP this is usually called a language model) and then computes a score, which I assume is a sentiment (the idea applies to other domains too).
To do so, you can train your language model with an LSTM and pick the last output of the LSTM for your ranking task. To this end, you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
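With two outputs and loss_weights, the quantity being minimized is a weighted sum of the per-output losses. The arithmetic can be sketched in plain Python as follows; the predictions, targets, and helper functions are made up for illustration and are not Keras calls.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mse(y_true, y_pred):
    """Mean squared error between target and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical language-model step: true next token is index 1 of a 3-word vocabulary.
lm_loss = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])

# Hypothetical final step: true sentence score 0.9, predicted 0.8.
score_loss = mse([0.9], [0.8])

# Same weights as in the compile() call above: both losses count equally.
loss_weights = [1.0, 1.0]
total_loss = loss_weights[0] * lm_loss + loss_weights[1] * score_loss
```

Changing loss_weights shifts how much the optimizer cares about next-token prediction versus the sentence score.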

What is the generator method for probability prediction in Keras?

I want to predict probabilities for a binary classification problem. Previously I was using model.predict_proba or predict_on_batch for this. Now I want to use generators in my scripts, but I can't find a suitable generator method: neither evaluate_generator nor predict_generator seems to produce probabilities. What is the generator method for probability prediction in Keras?
Whether the output is a probability or not depends on the actual neural network model, not on predict_generator. If your model already outputs probabilities, meaning it has a softmax (or, for a single binary output, sigmoid) activation at the output, then predict_generator should give you probability values.
