Wrapping All Parameters of a PyTorch Model with a Function - pytorch

I want to be able to wrap all convolutional parameters (weight and bias) of my PyTorch model with a function e.g. softmax(weight), sigmoid(weight), or alpha*weight, etc. Is there any way of making this function part of the graph using hooks etc.?
For instance: let's say I have a model called net with trainable parameters, and I am only interested in those belonging to convolutional layers:
for module in net.modules():
    if is_conv_layer(module):
        module.weight  # This is the weight I am interested in.
So, replacing the value of the weights is easy:
for module in net.modules():
    if is_conv_layer(module):
        module.weight.data = sigmoid(module.weight.data)
But I also want to add the sigmoid to the graph, so that it is taken into account when backprop is computed.
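One way to get this behaviour (a sketch, assuming PyTorch 1.9+ where torch.nn.utils.parametrize is available) is to register a parametrization on the convolutional weights; the transform is then re-applied on every forward pass and becomes part of the autograd graph, so backprop goes through it:

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SigmoidParam(nn.Module):
    # Applies sigmoid to the underlying parameter on every forward pass,
    # so the transform is differentiated during backprop.
    def forward(self, weight):
        return torch.sigmoid(weight)

for module in net.modules():
    if isinstance(module, nn.Conv2d):  # stand-in for is_conv_layer
        parametrize.register_parametrization(module, "weight", SigmoidParam())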

Related

PyTorch: how to use a linear activation function

In Keras, I can create any network layer with a linear activation function as follows (for example, a fully-connected layer is taken):
model.add(keras.layers.Dense(outs, input_shape=(160,), activation='linear'))
But I can't find the linear activation function in the PyTorch documentation. ReLU is not suitable, because there are negative values in my sample. How do I create a layer with a linear activation function in PyTorch?
If you take a look at the Keras documentation, you will see that tf.keras.layers.Dense's activation='linear' corresponds to the identity function a(x) = x, which means no non-linearity.
So in PyTorch, you just define the linear function without adding any activation layer:
torch.nn.Linear(160, outs)
activation='linear' is equivalent to no activation at all.
As can be seen here, it is also called "passthrough", meaning that it does nothing.
So in PyTorch, to get the same behaviour, simply don't apply any activation.
However, as already noted by #Minsky, a hidden layer without a real (non-linear) activation is useless: it amounts to just rescaling the weights, which training does anyway.
As already answered, you don't need a linear activation layer in PyTorch. But if you want to include one anyway, you can write a custom module that simply passes its input through:
import torch

class linear(torch.nn.Module):
    # A linear (identity) activation based on y = x.
    def forward(self, output):
        return output
Then you can call it like any other activation function:
linear()(torch.tensor([1, 2, 3])) == torch.nn.ReLU()(torch.tensor([1, 2, 3]))  # True here, since all inputs are non-negative

TensorFlow 2.4.0 - Parameters associated with BatchNorm and Activation

I am printing the summary of a tensorflow.keras.Model instance. Its type is tensorflow.python.keras.engine.functional.Functional.
This model has layers with activations and batch normalization associated. When I print the list of parameters, I see
weights
bias
4 items co-dimensional with the bias
These four items are (I guess) the batch normalization and activations.
My question is: why do we have parameters associated with batch normalization and activations? And what could be the other two items?
My aim is to transpose this Keras model to a PyTorch counterpart, so I need to know the order of the parameters and what these parameters represent.
There are no parameters associated with activations; those are simply element-wise nonlinear functions, so no matter how many activations you have, they don't add to the parameter count. However, your guess about batch normalization is right: each BatchNorm layer has 2 trainable parameters (gamma and beta) plus 2 non-trainable ones (the moving mean and moving variance), which explains the four items co-dimensional with the bias. So BatchNorm layers do add additional parameters to your network.
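As a quick check (a small sketch using tf.keras; the toy model here is only illustrative), you can list a BatchNormalization layer's weights to see all four arrays and which of them are trainable:

import tensorflow as tf

# Toy model with one BatchNorm layer, just to inspect its tracked weights.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.BatchNormalization(),
])

bn = model.layers[-1]
for w in bn.weights:
    print(w.name, w.shape, "trainable" if w.trainable else "non-trainable")
# Prints gamma and beta (trainable) plus moving_mean and moving_variance
# (non-trainable), each with the same shape as the preceding layer's bias.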

Monitor F1 Score (or a custom metric in general) in a keras callback

Keras 2.0 removed F1 score, but I would like to monitor its value. I am using a sequential model to train a Neural Net.
I defined a function, as suggested here How to calculate F1 Macro in Keras?.
This function works fine, but only when used inside model.compile; that way I can see its value at each step. The problem is that I don't just want to see its value: I would like my training to behave differently according to it, using Keras callbacks.
If I try to insert my custom metric in the callbacks then I get this error:
'function object is not iterable'
Do you know how to define a function such that it can be used as an argument in the callbacks?
Keras callbacks let you retrieve or save the model at different points in training, based on the metric you keep track of; they do not change the training procedure itself.
You can only train your model with respect to a loss function, for example cross-entropy for a classification problem. The readily available loss functions in Keras are listed here.
Precision, recall and F1-score are not differentiable functions, so they cannot be used as a loss function for training.
However, if you want to tune your hyperparameters (such as learning rate or class weights) to improve the F1 score, you can do that.
For tuning hyperparameters you can use hyperopt (see its tutorials).
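If you still want the F1 value available to other callbacks (e.g. EarlyStopping or ModelCheckpoint), a minimal sketch is a custom callback that computes it on held-out data at the end of each epoch (assuming tf.keras and scikit-learn; X_val, y_val and the 0.5 threshold are placeholders for your own validation data and decision rule):

from sklearn.metrics import f1_score
from tensorflow import keras

class F1Callback(keras.callbacks.Callback):
    def __init__(self, X_val, y_val):
        super().__init__()
        self.X_val = X_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        # Compute F1 on the validation data and write it into `logs`,
        # so callbacks listed after this one can monitor 'val_f1'.
        y_pred = (self.model.predict(self.X_val) > 0.5).astype(int)
        if logs is not None:
            logs['val_f1'] = f1_score(self.y_val, y_pred)

# Usage sketch: place F1Callback before the callback that monitors it, e.g.
# model.fit(X, y, callbacks=[F1Callback(X_val, y_val),
#                            keras.callbacks.EarlyStopping(monitor='val_f1', mode='max')])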

How to adopt multiple different loss functions in each steps of LSTM in Keras

I have a set of sentences and their scores, and I would like to train a marking system that can predict the score for a given sentence. One example looks like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use an LSTM to build such a marking system, and also consider the sequential relationship between the words in the sentence, so the training example shown above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The loss used in this LSTM is therefore composed of two different loss functions, and it seems that Keras does not provide a way to address this directly. In addition, I am not sure whether my way of building the marking system is correct.
Keras supports multiple loss functions as well:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts a token based on the previous tokens (in NLP this is usually called a language model), and then computes a score, which I assume is a sentiment score (the same idea applies to other domains).
To do so, you can train your language model with an LSTM and pick the last output of the LSTM for your ranking task. To this end, you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
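A rough sketch of what such a two-output model could look like (the vocabulary size, sequence length and layer sizes are placeholders, and the output names match the compile call above):

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed, Lambda
from keras.models import Model

vocab_size, max_len = 10000, 20  # placeholders for your own data

inputs = Input(shape=(max_len,))
x = Embedding(vocab_size, 128)(inputs)
h = LSTM(64, return_sequences=True)(x)

# Head 1: language model, predicts the next token at every step (categorical_crossentropy).
lang_model = TimeDistributed(Dense(vocab_size, activation='softmax'), name='lang_model')(h)

# Head 2: sentence score, predicted from the last LSTM state (mse).
last_state = Lambda(lambda t: t[:, -1, :])(h)
sent_model = Dense(1, name='sent_model')(last_state)

model = Model(inputs=inputs, outputs=[lang_model, sent_model])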

keras/theano loss or op that uses full network

I am looking for a way to implement the following network structure (currently using Keras, though it might end up in Theano):
Assume we're given some simple network, but it is not possible to compute the desired loss from its output directly; instead another operation is needed, and the loss will be defined on the output of that operation. However, this operation needs not only the output of the network but the full network object (e.g. its gradient).
How can this be done? I think the operation could be performed either in a custom layer on top of the network or in a custom loss function, but in neither case do I see a way to access the full network. Any suggestions?
Say, you have the following model.
import keras.applications.vgg16 as vgg16
model = vgg16.VGG16(weights='imagenet')
model.summary()
For example, say you now want to delete the last layer of this model, which predicts a category for the input image (a vector of length 1000, because ImageNet has 1000 categories).
# Remove last Linear/Dense layer.
model.layers.pop()
model.outputs = [model.layers[-1].output]
model.layers[-1].outbound_nodes = []
model.summary()
Now, let's add a Dense layer (with output size 10) on top of this model and use the output of the modified network. Since this is a functional model, the new layer is attached with the functional API rather than model.add:
new_output = Dense(10, activation='softmax')(model.layers[-1].output)  # Dense from keras.layers
model = Model(inputs=model.input, outputs=new_output)  # Model from keras.models
model.summary()
You will get a vector (of length 10) as an output from this model.
You can compile and train the model using model.compile() and model.fit() functions. You can set what type of loss function you want to use to train the model.
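For the original question (a loss that needs the whole network object rather than just y_true and y_pred), one possible sketch is to build the loss as a closure over the model; the extra penalty term below is purely illustrative and stands in for whatever operation on the network you actually need:

import keras.backend as K

def make_loss(net):
    # The returned loss can see the full model object, not only y_true/y_pred.
    def loss(y_true, y_pred):
        base = K.categorical_crossentropy(y_true, y_pred)
        # Illustrative extra term computed from the network itself
        # (here: an L2 penalty on the last layer's kernel).
        penalty = K.sum(K.square(net.layers[-1].kernel))
        return base + 1e-4 * penalty
    return loss

model.compile(optimizer='sgd', loss=make_loss(model), metrics=['accuracy'])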
