Neural network binary classification: softmax, logsoftmax and loss function - PyTorch

I am building a binary classifier where the class I want to predict is present only <2% of the time. I am using PyTorch.
The last layer could be LogSoftmax or Softmax.
self.softmax = nn.Softmax(dim=1) or self.softmax = nn.LogSoftmax(dim=1)
My questions:
Should I use softmax, since it provides outputs that sum to 1 and lets me check performance at various probability thresholds? Is that understanding correct?
If I use softmax, can I use cross_entropy loss? This seems to suggest that it is okay to use.
If I use logsoftmax, can I use cross_entropy loss? This seems to suggest that I shouldn't.
If I use softmax, is there any better option than cross_entropy loss?
`cross_entropy = nn.CrossEntropyLoss(weight=class_wts)`
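For reference (not part of the original question), here is a minimal sketch of how these pieces pair up in PyTorch; the class weights, batch size and targets below are made-up placeholders:

import torch
import torch.nn as nn

# Hypothetical setup: 2 classes, a weight tensor to counter the ~2% imbalance
class_wts = torch.tensor([1.0, 50.0])
logits = torch.randn(8, 2)              # raw model outputs, no softmax applied
targets = torch.randint(0, 2, (8,))

# Option A: feed raw logits to CrossEntropyLoss
# (it applies log-softmax internally, so no Softmax/LogSoftmax layer is needed)
cross_entropy = nn.CrossEntropyLoss(weight=class_wts)
loss_a = cross_entropy(logits, targets)

# Option B: keep an explicit LogSoftmax layer and pair it with NLLLoss
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss(weight=class_wts)
loss_b = nll(log_probs, targets)

print(torch.allclose(loss_a, loss_b))   # True: the two options are equivalent

# If you want probabilities for thresholding, apply softmax only at inference time
probs = torch.softmax(logits, dim=1)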

Related

Should I use from_logits on the Keras AUC metric with a sigmoid activation function?

I'm currently implementing a convolutional neural network model for binary classification (true or false), and the labels are all either 0 or 1. When using the "sigmoid" activation function for the final dense layer, I was wondering whether to set from_logits to true or false in the "binary_crossentropy" loss and the Keras AUC metric, i.e. does the sigmoid activation function output logits?
The accuracy is similar with either setting, although not identical, and I'm wondering which one correctly matches the model's output. I'm using ResNet50 at the moment.
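For reference, a sketch of the two internally consistent pairings: a sigmoid output already produces probabilities, not logits, so it goes with from_logits=False, while a linear output produces logits and goes with from_logits=True. The tiny model bodies below are placeholders, and the from_logits argument of keras.metrics.AUC is only available in more recent TensorFlow versions:

from tensorflow import keras

# Sketch A: sigmoid output -> the model emits probabilities, so from_logits=False
model_probs = keras.Sequential([keras.layers.Dense(1, activation='sigmoid')])
model_probs.compile(optimizer='adam',
                    loss=keras.losses.BinaryCrossentropy(from_logits=False),
                    metrics=[keras.metrics.AUC()])   # AUC expects probabilities by default

# Sketch B: linear output -> the model emits logits, so from_logits=True
model_logits = keras.Sequential([keras.layers.Dense(1)])
model_logits.compile(optimizer='adam',
                     loss=keras.losses.BinaryCrossentropy(from_logits=True),
                     metrics=[keras.metrics.AUC(from_logits=True)])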

Multi-class classification with PyTorch

I'm new to PyTorch and I need a clarification on multi-class classification.
I'm fine-tuning the DenseNet neural network so it can recognize 3 different classes.
Since it's a multi-class problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
From reading the PyTorch forum, I found that CrossEntropyLoss applies the softmax function to the output of the neural network. Is this true? Should I remove the Softmax activation function from the network?
And what about the test phase? If the Softmax layer is removed, do I have to call the softmax function on the output of the model?
Thanks in advance for your help.
Yes, CrossEntropyLoss applies softmax implicitly. You should remove the softmax layer at the end of the network: since softmax is not idempotent, applying it twice would be a semantic error.
As far as evaluation/testing goes, remember that softmax is a monotonically increasing operation (the relative order of the outputs doesn't change when you apply it), so argmax before and after softmax gives the same result.
The only time you may want to apply softmax explicitly during evaluation is when you need the actual confidence values for some reason. In that case you can apply torch.softmax to the network output, as sketched below.
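A minimal sketch of that advice, assuming the torchvision DenseNet-121 setup from the question (weights=None assumes a recent torchvision API, and the dummy input is a placeholder):

import torch
import torch.nn as nn
from torchvision import models

# Fine-tuning head without Softmax: CrossEntropyLoss works on raw logits
densenet121 = models.densenet121(weights=None)
kernelCount = densenet121.classifier.in_features
densenet121.classifier = nn.Linear(kernelCount, 3)

criterion = nn.CrossEntropyLoss(reduction='mean')
# training step: logits = densenet121(images); loss = criterion(logits, labels)

# Evaluation: argmax on the logits already gives the predicted class
densenet121.eval()
with torch.no_grad():
    logits = densenet121(torch.randn(4, 3, 224, 224))
    preds = logits.argmax(dim=1)
    probs = torch.softmax(logits, dim=1)   # only needed if you want confidences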

Find top layers for a fine-tuned model

I want to use a fine-tuned model based on MobileNetV2 (pre-trained, in Keras), but I need to add top layers in order to classify my images into 2 classes. I would like to know how to choose the "architecture" of the layers that I need.
In some examples, people use an SVM classifier or a series of Dense layers with a specific number of neurons as top layers.
The following code works by default:
self.base_model = base_model
x = self.base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
Is there any methodology to find the best solution?
I'd recommend adding either Dropout or BatchNormalization. Dense layers can easily overfit because they have many parameters, and both of these layers regularize the model well. GlobalAveragePooling2D is a good choice because it also acts as a regularizer itself.
I'd also suggest that, for a binary classification problem, you change the output layer to Dense(1, activation='sigmoid') to predict only P(class1), since P(class2) = 1 - P(class1). The loss in this case should be binary_crossentropy instead of categorical_crossentropy. A sketch combining these suggestions follows.
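A sketch on top of MobileNetV2 (the layer sizes and dropout rate are placeholders to tune):

from tensorflow import keras
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, BatchNormalization

base_model = keras.applications.MobileNetV2(include_top=False, weights='imagenet')
base_model.trainable = False            # freeze the pre-trained backbone

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)    # smaller than 1024; size is a placeholder
x = BatchNormalization()(x)
x = Dropout(0.5)(x)                     # rate is a placeholder
predictions = Dense(1, activation='sigmoid')(x)   # P(class1); P(class2) = 1 - P(class1)

model = keras.Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])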

What does a loss list in Keras mean?

I've just seen
# Combined model trains generators to fool discriminators
self.combined = Model(inputs=[img_A, img_B],
                      outputs=[valid_A, valid_B,
                               reconstr_A, reconstr_B,
                               img_A_id, img_B_id])
self.combined.compile(loss=['mse', 'mse',
                            'mae', 'mae',
                            'mae', 'mae'],
                      loss_weights=[1, 1,
                                    self.lambda_cycle, self.lambda_cycle,
                                    self.lambda_id, self.lambda_id],
                      optimizer=optimizer)
in CycleGAN. What does the list of losses mean/do? Before, I had only worked with exactly one loss function per model.
This model has multiple inputs (two) and multiple outputs (six), so you need to specify one loss function per output. That's why there is a list of losses.
Additionally, a model can only be trained with a single scalar loss, and for a multi-output model this is accomplished by creating a virtual loss that is a weighted combination of all the per-output losses; this is what the loss_weights parameter is for.
It is also worth noting that you only need to provide a list of loss functions for a multi-output model if the loss functions are NOT all the same. For example, if all the output layers use the mean squared error loss, you can simply pass loss='mse'.
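As an illustration, a toy two-output model compiled with one loss per output and a weighted combination (layer sizes, output names and weights are arbitrary placeholders):

from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(16,))
h = layers.Dense(32, activation='relu')(inp)
out_recon = layers.Dense(16, name='reconstruction')(h)              # regression-style head
out_valid = layers.Dense(1, activation='sigmoid', name='validity')(h)

model = keras.Model(inputs=inp, outputs=[out_recon, out_valid])

# The single training loss becomes:
#   total_loss = 1.0 * mae(reconstruction) + 0.5 * mse(validity)
model.compile(optimizer='adam',
              loss=['mae', 'mse'],
              loss_weights=[1.0, 0.5])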

How to use MC Dropout on a variational dropout LSTM layer in Keras?

I'm currently trying to set up an LSTM recurrent neural network with Keras (TensorFlow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented via the "recurrent_dropout" option of the LSTM layer, but I can't find any way to set a "training" flag to true, as you can with a classical Dropout layer.
This is quite easy in Keras. First, define a backend function that takes both the model input and the learning phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-1].output])
For a Functional API model with multiple inputs/outputs you can use (note that model.inputs and model.outputs are already lists, so they are concatenated rather than wrapped in another list):
f = K.function(model.inputs + [K.learning_phase()],
               model.outputs)
You can then call this function as f([input, 1]), and this tells Keras to enable the learning phase during the call, so Dropout is executed. Calling the function multiple times and combining the predictions lets you estimate uncertainty, as sketched below.
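A minimal sketch of that loop (x_batch and the number of passes T are placeholders):

import numpy as np

T = 50
# T stochastic forward passes with the learning phase enabled (dropout active)
mc_preds = np.stack([f([x_batch, 1])[0] for _ in range(T)], axis=0)

mean_pred = mc_preds.mean(axis=0)       # predictive mean
uncertainty = mc_preds.std(axis=0)      # simple spread-based uncertainty estimate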
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. They also use Keras and the code is quite easy to understand. The Dropout layers are used outside the Sequential API in order to pass the training parameter. This is a different approach from the K.function suggestion above:
inter = Dropout(dropout_rate)(inter, training=True)
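Applied to the original question, this could look roughly as follows with tf.keras: passing training=True in the functional API keeps both the input dropout and the variational recurrent_dropout of the LSTM active at prediction time (layer sizes and rates are placeholders):

from tensorflow import keras
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout

inputs = Input(shape=(None, 8))
x = LSTM(32, dropout=0.2, recurrent_dropout=0.2)(inputs, training=True)
x = Dropout(0.3)(x, training=True)
outputs = Dense(1)(x)

mc_model = keras.Model(inputs, outputs)
# Every call to mc_model.predict(...) is now a stochastic forward pass,
# so repeated predictions can be averaged for an MC dropout estimate.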
