PyTorch: how to use a linear activation function (as in Keras)

In Keras, I can create any network layer with a linear activation function as follows (for example, a fully-connected layer is taken):
model.add(keras.layers.Dense(outs, input_shape=(160,), activation='linear'))
But I can't find the linear activation function in the PyTorch documentation. ReLU is not suitable, because there are negative values in my sample. How do I create a layer with a linear activation function in PyTorch?

If you take a look at the Keras documentation, you will see that activation='linear' in tf.keras.layers.Dense corresponds to the function a(x) = x, which means no non-linearity is applied.
So in PyTorch, you just define the linear layer without adding any activation layer:
torch.nn.Linear(160, outs)
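A minimal sketch of the equivalence, assuming a hypothetical value for outs (the question does not give one). It also shows nn.Identity, PyTorch's built-in pass-through module, in case an explicit placeholder is wanted where an activation would normally go:

```python
import torch
from torch import nn

outs = 4  # hypothetical output width, standing in for `outs` in the question

# Counterpart of Dense(outs, input_shape=(160,), activation='linear'):
# an affine layer with nothing applied afterwards.
layer = nn.Linear(160, outs)

# nn.Identity returns its input unchanged; here it only marks the "activation" slot.
model = nn.Sequential(nn.Linear(160, outs), nn.Identity())

x = torch.randn(8, 160)
y = layer(x)
print(y.shape)                           # torch.Size([8, 4])
print(torch.equal(nn.Identity()(y), y))  # True
```

Negative values pass through untouched, which is the behaviour the question asks for.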

activation='linear' is equivalent to no activation at all.
As can be seen here, it is also called "passthrough", meaning that it does nothing.
So in PyTorch you can simply not apply any activation at all to get the same behaviour.
However, as #Minsky already said, a hidden layer without a real (i.e. non-linear) activation is useless: composed with the next layer it is just another linear map, which amounts to a different choice of weights that training would find anyway.
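The point about hidden layers can be checked numerically. The sketch below (layer sizes are arbitrary choices for illustration) folds two stacked nn.Linear layers with no activation between them into a single layer with W = W2 W1 and b = W2 b1 + b2, then compares the outputs:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers with no activation in between (double precision for a tight comparison).
f1 = nn.Linear(5, 7).double()
f2 = nn.Linear(7, 3).double()

# Fold them into one layer: W = W2 @ W1, b = W2 @ b1 + b2.
merged = nn.Linear(5, 3).double()
with torch.no_grad():
    merged.weight.copy_(f2.weight @ f1.weight)
    merged.bias.copy_(f2.weight @ f1.bias + f2.bias)

x = torch.randn(10, 5, dtype=torch.double)
stacked_out = f2(f1(x))
merged_out = merged(x)
print(torch.allclose(stacked_out, merged_out))
```

The stacked pair therefore adds parameters but no expressive power over a single linear layer.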

As already answered, you don't need a linear activation layer in PyTorch. But if you need to include one, you can write a custom module that passes its input through unchanged, as follows.
import torch
class linear(torch.nn.Module):
    # a linear activation function based on y = x
    def forward(self, output):
        return output
Then you can call it like any other activation function. For the positive inputs below it matches ReLU element-wise:
linear()(torch.tensor([1, 2, 3])) == torch.nn.ReLU()(torch.tensor([1, 2, 3]))

Related

Do I need to apply the Softmax Function ANYWHERE in my multi-class classification Model?

I am currently turning my binary classification model into a multi-class classification model. Bear with me, I am very new to PyTorch and machine learning.
Most of what I state here, I know from the following video.
https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s
What I read / know is that the CrossEntropyLoss already has the Softmax function implemented, thus my output layer is linear.
What I then read / saw is that I can just choose my model's prediction by taking the torch.max() of my model output (which comes from my last linear layer). This feels weird because I have some negative outputs and I thought I needed to apply the softmax function first, but it seems to work right without it.
So now the big confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it implemented? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.
Thank you guys for every answer this gets.
For calculating the loss with CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, to turn the model outputs into probabilities you still need to apply softmax.
Let's say you didn't apply softmax at the end of your model and trained it with CrossEntropyLoss. When you then evaluate the model on new data and use its outputs for classification, you can apply softmax to the outputs manually at that point. There will be no problem; this is how it is usually done.
Training()
MODEL ----> FC LAYER ---> raw outputs ---> CrossEntropy Loss
Eval()
MODEL ----> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
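The two pipelines can be sketched in PyTorch as follows. The single nn.Linear model, the input sizes, and the targets are hypothetical stand-ins; the sketch also checks that CrossEntropyLoss on raw outputs equals log-softmax followed by the negative log-likelihood loss:

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Linear(10, 3)           # hypothetical stand-in: last layer is linear, no softmax
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10)
target = torch.tensor([0, 2, 1, 2])

# Training: pass the raw outputs (logits) straight to the loss.
logits = model(x)
loss = criterion(logits, target)

# The same value computed by hand: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Evaluation: apply softmax only when probabilities are wanted.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
    pred = probs.argmax(dim=1)

print(torch.allclose(loss, manual))
```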
Yes, softmax belongs on the output of a multi-class classifier whenever you need probabilities. For binary classification a single sigmoid output is enough, but for multi-class classification softmax distributes the probability mass across the output nodes, so that the node with the highest probability identifies the predicted class. In PyTorch, note that CrossEntropyLoss applies softmax internally during training, so add it explicitly only at inference time.

Multi-class classification with PyTorch

I'm new to PyTorch and I need some clarification on multiclass classification.
I'm fine-tuning the DenseNet neural network, so it can recognize 3 different classes.
Because it's a multiclass problem, I have to replace the classification layer in this way:
kernelCount = self.densenet121.classifier.in_features
self.densenet121.classifier = nn.Sequential(nn.Linear(kernelCount, 3), nn.Softmax(dim=1))
And use CrossEntropyLoss as the loss function:
loss = torch.nn.CrossEntropyLoss(reduction='mean')
Reading the PyTorch forum, I found that CrossEntropyLoss applies the softmax function to the output of the neural network. Is this true? Should I remove the Softmax activation function from the structure of the network?
And what about the test phase? If softmax is part of the loss, do I have to call the softmax function on the output of the model myself?
Thanks in advance for your help.
Yes, CrossEntropyLoss applies softmax implicitly. You should remove the softmax layer at the end of the network since softmax is not idempotent, therefore applying it twice would be a semantic error.
As far as evaluation/testing goes, remember that softmax is a monotonically increasing operation (meaning the relative order of the outputs doesn't change when you apply it). Therefore argmax gives the same result before and after softmax.
The only time you may want to perform softmax explicitly during evaluation would be if you need the actual confidence value for some reason. If needed you can apply softmax explicitly using torch.softmax on the network output during evaluation.
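Both claims in this answer are easy to verify on a small example. The logits below are made-up values; the sketch checks that argmax is unchanged by softmax and that applying softmax a second time changes the probabilities:

```python
import torch

# Made-up raw network outputs for two samples and three classes.
logits = torch.tensor([[-1.5, 0.3, 2.0],
                       [0.7, -2.2, -0.1]])

probs = torch.softmax(logits, dim=1)
twice = torch.softmax(probs, dim=1)   # what a Softmax layer plus CrossEntropyLoss would compute

same_argmax = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_argmax)                    # True
print(torch.allclose(probs, twice))   # False
```

So predictions can be read directly from the raw outputs, while a redundant softmax layer silently changes the values the loss sees.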

Few questions about Keras documentation

In Keras documentation named activations.md, it says "Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers.". Then what is the meaning of forward layers? I think some layers don't have an activation parameter.(ex. Dropout layer)
And "Activations that are more complex than a simple TensorFlow/Theano/CNTK function (eg. learnable activations, which maintain a state) are available as Advanced Activation layers, and can be found in the module keras.layers.advanced_activations. These include PReLU and LeakyReLU.". Then what is the meaning of state in this case?
I am not sure there is a strict definition of "forward layers" in this context, but basically what it means is that the "classic", Keras built-in types of layers comprising one or more sets of weights used to transform an input matrix into an output one have an activation argument. Typically, Dense layers have one, as well as the various kinds of RNN and CNN layers.
It would not make sense for Dropout layers to have an activation function: they simply add a mechanism triggered at training time to (hopefully) improve the convergence rate and decrease the chances of overfitting.
As for the idea of "maintaining a state", it refers to activation functions that are not a fixed function of their input but carry learnable parameters of their own (the so-called state). Typically, for a PReLU activation, the slope applied to negative inputs is a weight adjusted through training (and it would, in the documentation's terminology, be referred to as a state of this activation function). LeakyReLU has the same functional form, but its slope is fixed when the layer is created.
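One way to see which activation layers hold state is to count their trainable weights. This is a sketch assuming TensorFlow 2.x with tf.keras, not the standalone Keras version the quoted documentation describes; the input values are arbitrary:

```python
import tensorflow as tf

x = tf.constant([[-2.0, -1.0, 0.0, 3.0]])

prelu = tf.keras.layers.PReLU()      # negative slope is a learned weight
leaky = tf.keras.layers.LeakyReLU()  # negative slope is a fixed hyperparameter

# Calling a layer once builds it and creates any weights it owns.
prelu(x)
leaky(x)

print(len(prelu.trainable_weights))
print(len(leaky.trainable_weights))
```

A layer with trainable weights is updated by the optimizer along with the rest of the model, which is what sets it apart from a plain function passed through the activation argument.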

What kind of activation is used by scikit-learn's MLPClassifier in the output layer?

I am currently working on a classification task with given class labels 0 and 1. For this I am using scikit-learn's MLPClassifier, which provides an output of either 0 or 1 for each training example. However, I cannot find any documentation on what exactly the output layer of the MLPClassifier is doing (which activation function? which encoding?).
Since there is an output of only one class I assume something like One-hot_encoding is used. Is this assumption correct? Is there any documentation tackling this question for the MLPClassifier?
The out_activation_ attribute gives you the type of activation used in the output layer of your MLPClassifier.
From Documentation:
out_activation_ : string
Name of the output activation function.
The activation param just sets the hidden layer's activation function.
activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’
Activation function for the hidden layer.
The output layer is decided internally in this piece of code.
# Output for regression
if not is_classifier(self):
    self.out_activation_ = 'identity'
# Output for multi class
elif self._label_binarizer.y_type_ == 'multiclass':
    self.out_activation_ = 'softmax'
# Output for binary class and multi-label
else:
    self.out_activation_ = 'logistic'
Hence, for binary classification it would be logistic and for multi-class it would be softmax.
To know more details about these activations, see here.
You have most of the information in the docs. The MLP is a simple neural network. It can use several activation functions, the default is relu.
It doesn't use one-hot encoding, rather you need to feed in a y (target) vector with class labels.
My understanding is that the last activation function is the logistic function, and the output is set to 1 if the probability is >0.5 and to 0 otherwise.
However, you can output the probability if you want.
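The attribute can be inspected on a fitted model. The sketch below uses synthetic data from make_classification and small, arbitrary hyperparameters purely for illustration; it fits one binary and one three-class model, prints out_activation_ for each, and reads the probabilities with predict_proba:

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore")  # silence convergence warnings from the short training runs

# Synthetic toy data: one binary problem, one three-class problem.
X2, y2 = make_classification(n_samples=100, n_features=8, n_informative=4,
                             n_classes=2, random_state=0)
X3, y3 = make_classification(n_samples=150, n_features=8, n_informative=4,
                             n_classes=3, random_state=0)

binary = MLPClassifier(hidden_layer_sizes=(8,), max_iter=50, random_state=0).fit(X2, y2)
multi = MLPClassifier(hidden_layer_sizes=(8,), max_iter=50, random_state=0).fit(X3, y3)

print(binary.out_activation_)  # logistic
print(multi.out_activation_)   # softmax

# Class probabilities instead of hard 0/1 labels.
proba = binary.predict_proba(X2[:5])
print(proba.shape)             # (5, 2)
```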

How to use MC Dropout on a variational dropout LSTM layer on keras?

I'm currently trying to set up a (LSTM) recurrent neural network with Keras (tensorflow backend).
I would like to use variational dropout with MC Dropout on it.
I believe that variational dropout is already implemented through the "recurrent_dropout" option of the LSTM layer, but I can't find any way to set a "training" flag to True as with a classical Dropout layer.
This is quite easy in Keras. First you need to define a function that takes both the model input and the learning_phase:
import keras.backend as K
f = K.function([model.layers[0].input, K.learning_phase()],
[model.layers[-1].output])
For a Functional API model with multiple inputs/outputs you can use:
f = K.function(model.inputs + [K.learning_phase()],
model.outputs)
Then you can call this function like f([input, 1]) and this will tell Keras to enable the learning phase during this call, executing Dropout. Then you can call this function multiple times and combine the predictions to estimate uncertainty.
The source code for "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015) is located at https://github.com/yaringal/DropoutUncertaintyExps/blob/master/net/net.py. They also use Keras and the code is quite easy to understand. The Dropout layers are used without the Sequential api in order to pass the training parameter. This is a different approach to the suggestion from Matias:
inter = Dropout(dropout_rate)(inter, training=True)
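On TensorFlow 2.x the same effect is usually obtained by passing training=True when calling the model, rather than through K.function. The sketch below takes that route and is an assumption about the reader's setup, not the code from either answer; the layer sizes, dropout rates, input shape, and number of passes T are arbitrary:

```python
import numpy as np
import tensorflow as tf

# Small LSTM model with input dropout and recurrent dropout.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(4, 10, 3).astype("float32")  # (batch, timesteps, features)

# training=True is forwarded to the layers, so dropout stays active at prediction time.
T = 20  # number of stochastic forward passes
preds = np.stack([model(x, training=True).numpy() for _ in range(T)])

mean = preds.mean(axis=0)  # Monte Carlo estimate of the prediction
std = preds.std(axis=0)    # spread across passes, used as an uncertainty estimate
print(preds.shape)         # (20, 4, 1)
```

Averaging over the T passes gives the prediction, and the spread across them gives a rough measure of model uncertainty.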
