Should I use from_logits on AUC keras metric with sigmoid activation function? - keras

I'm currently implementing a convolutional neural network model that outputs binary classification (true or false), and the labels are all either 0 or 1. When using the "sigmoid" activation function for the final dense layer, I was wondering whether to set from_logits to true or false in the "binary_crossentropy" loss and the AUC keras metric. I.E. does the sigmoid activation function output logits?
The accuracy with either seems somewhat similar, although it is different, I'm wondering which is most accurate to the parameters of the model. I'm using ResNet50 at the moment.

Related

Why can't I use softmax in regression task for probabilities?

I have a supervised learning task f(X)=y where X is a 2-dimentional np.array of np.int8 and y is a 1-dimentional array of np.float64 containing probabilities (so numbers between 0 and 1). I want to build a Neural Network model that performs regression in order to predict said probabilities y given X.
As the output of my Network is one real value (i.e. the output layer has one neuron) and is a probability (so in the range [0, 1]), I believe I should use softmax as the activation function of the output layer (i.e. output neuron) in order to squash the network's output to [0, 1].
As it is a regression task, I opted for using the mean_squared_error loss (instead of cross_entropy_loss that is typically used in classification tasks and often paired with softmax).
However, as I am trying to fit(X, y) the loss does not change at all between epochs and remains constant. Any ideas why? Is the combination of softmax and mean_squared_error loss wrong for some reason and why?
If I remove the softmax it does work, but then my model would also predict non probabilities which I do not want. Yes, I could squash it myself later but it doesn't seem right.
My code basically is (after removing some irrelevant additional callbacks for EarlyStopping and learning rate scheaduling):
model = Sequential()
model.add(Dense(W1_size, input_shape=(input_dims,), activation='relu'))
model.add(Dense(1, activation='softmax'))
# compile model
model.compile(optimizer=Adam(), loss='mse') # mse is the standard loss for regression
# fit
model.fit(X, y, batch_size=batch_size, epochs=MAX_EPOCHS)
Edit: Turns out I needed the sigmoid function to squash one real value to [0, 1] as the accepted answer suggests. The softmax function for a vector of size 1 is always 1.
As you stated you want to perform a regression task. (Which means, finding a continuous mapping between your input and desired output).
The softmax function creates a pseudo-probability distribution for multi-dimensional outputs (all values sum up to 1). This is the reason why the softmax function perfectly fits for classification tasks (predicting probabilities for different classes).
As you want to perform a regression task and your output is one-dimensional, softmax would not work properly because it is always 1 for a one-dimensional input.
A function which maps a one-dimensional input continuously to [0,1] works fine here (e.g Sigmoid).
Note that you can also interpret both the output of the sigmoid and the softmax function as probabilities. But be careful: these are only pseudo-probabilities and it is not representing the certainty or uncertainty of your model in making predictions.

Multi class classifcation with Pytorch

I'm new with Pytorch and I need a clarification on multiclass classification.
I'm fine-tuning the DenseNet neural network, so it can recognize 3 different classes.
Because it's a multiclass problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
By reading on Pytorch forum, I found that CrossEntropyLoss applys the softmax function on the output of the neural network. Is this true? Should I remove the Softmax activation function from the structure of the network?
And what about the test phase? If it's included, I have to call the softmax function on the output of the model?
Thanks in advance for your help.
Yes, CrossEntropyLoss applies softmax implicitly. You should remove the softmax layer at the end of the network since softmax is not idempotent, therefore applying it twice would be a semantic error.
As far as evaluation/testing goes. Remember that softmax is a monotonically increasing operation (meaning the relative order of outputs doesn't change when you apply it). Therefore the result of argmax before and after softmax will give the same result.
The only time you may want to perform softmax explicitly during evaluation would be if you need the actual confidence value for some reason. If needed you can apply softmax explicitly using torch.softmax on the network output during evaluation.

How to use MC Dropout on a variational dropout LSTM layer on keras?

I'm currently trying to set up a (LSTM) recurrent neural network with Keras (tensorflow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented with the option "recurrent_dropout" of the LSTM layer but I don't find any way to set a "training" flag to put on to true like a classical Dropout layer.
This is quite easy in Keras, first you need to define a function that takes both model input and the learning_phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
For a Functional API model with multiple inputs/outputs you can use:
f = K.function([model.inputs, K.learning_phase()],
[model.outputs])
Then you can call this function like f([input, 1]) and this will tell Keras to enable the learning phase during this call, executing Dropout. Then you can call this function multiple times and combine the predictions to estimate uncertainty.
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. They also use Keras and the code is quite easy to understand. The Dropout layers are used without the Sequential api in order to pass the training parameter. This is a different approach to the suggestion from Matias:
inter = Dropout(dropout_rate)(inter, training=True)

Accuracy on middle layer of autoencoder implemente using Keras

I have implemented an autoencoder using Keras. I understand that I can add accuracy performance metric as follows:
autoencoder.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['accuracy'])
My question is:
Is the accuracy metric applied on the last layer of the decoder by default? If so, how can I set it so that it would get the representations from middle (hidden) layer to compute accuracy performance? Do I need to define a custom metric? How would that work?
It seems that what you really want is a multiple output network.
So on top of your middle layer that defines your embedding, add a layer (or more) to do your classification.
Then have a look at Multiple outputs in Keras to create your global cost.
You may also want to start by training the autoendoder only, then the classifier additional layers only to see the performance, you can also balance the accuracy of the encoder vs the accuracy of the classifier as a loss, training "both" networks at the same time.

What is the generator method for probability prediction in Keras?

I want to predict probabilities for a bi-classification problem. Previously I was using model.predict_proba or predict_on_batch for this issue. Now I want to use generators in my scripts, but I can't find a generator such as evaluate_generator or predict_generator. Both of evaluate_generator or predict_generator won't generate probabilities. What is the generator method for probability prediction in Keras?
Whatever the output is a probability or not depends on the actual neural network model, not on predict_generator. If your model already outputs probabilities, meaning it has a softmax activation at the output, then using predict_generator should give you probability values.

Resources