How to perform Multi output regression using RoBERTa? - pytorch

I have a problem statement where I want to predict multiple continuous outputs using a text input. I tried using 'robertaforsequenceclassification' from HuggingFace library. But the documentation states that when the number of outputs in the final layer is more than 1, a cross entropy loss is used automatically as mentioned here: https://huggingface.co/transformers/v2.2.0/model_doc/bert.html#transformers.BertForSequenceClassification.
But I want to use an RMSE loss in a regression setting with two classes in the final layer. How would one go about modifying it?

BertForSequenceClassification is a small wrapper that wraps the BERTModel.
It calls the models, takes the pooled output (the second member of the output tuple), and applies a classifier over it. The code is here https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py#L1168
The simplest solution is writing your own simple wrapper class (based on the BertForSequenceClassification class) hat will do the regression that will do the regression with the loss you like.

Related

PyTorch LogSoftmax vs Softmax for CrossEntropyLoss

I understand that PyTorch's LogSoftmax function is basically just a more numerically stable way to compute Log(Softmax(x)). Softmax lets you convert the output from a Linear layer into a categorical probability distribution.
The pytorch documentation says that CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
Looking at NLLLoss, I'm still confused...Are there 2 logs being used? I think of negative log as information content of an event. (As in entropy)
After a bit more looking, I think that NLLLoss assumes that you're actually passing in log probabilities instead of just probabilities. Is this correct? It's kind of weird if so...
Yes, NLLLoss takes log-probabilities (log(softmax(x))) as input. Why?. Because if you add a nn.LogSoftmax (or F.log_softmax) as the final layer of your model's output, you can easily get the probabilities using torch.exp(output), and in order to get cross-entropy loss, you can directly use nn.NLLLoss. Of course, log-softmax is more stable as you said.
And, there is only one log (it's in nn.LogSoftmax). There is no log in nn.NLLLoss.
nn.CrossEntropyLoss() combines nn.LogSoftmax() (that is, log(softmax(x))) and nn.NLLLoss() in one single class. Therefore, the output from the network that is passed into nn.CrossEntropyLoss needs to be the raw output of the network (called logits), not the output of the softmax function.

Monitor F1 Score (or a custom metric in general) in a keras callback

Keras 2.0 removed F1 score, but I would like to monitor its value. I am using a sequential model to train a Neural Net.
I defined a function, as suggested here How to calculate F1 Macro in Keras?.
This function works fine only if used it inside model.compile. In this way I see its value at each step. The problem is that I don't want just to see its value but I would like my training to behave differently according to its value, using the callbacks of Keras.
If I try to insert my custom metric in the callbacks then I get this error:
'function object is not iterable'
Do you know how to define a function such that it can be used as an argument in the callbacks?
Callback of Keras will enable us to retrieve the model at different period, based on the metric which we keep track of. This will not affect the training procedure of the model.
You can train your model only with respect to some loss function. For example, cross entropy for classification problem. The readily available loss function in keras are given here
Precision, recall or f1-score are not differentialable functions. Hence, we cannot use that as a loss function for model training.
May be, if you want to tune your hyperparameter (such as learning rate, class weights) for improving f1 score, then you can be do that.
For tuning hyper parameters you can use hyperopt, tutorials

How to adopt multiple different loss functions in each steps of LSTM in Keras

I have a set of sentences and their scores, I would like to train a marking system that could predict the score for a given sentence, such one example is like this:
(X =Tomorrow is a good day, Y = 0.9)
I would like to use LSTM to build such a marking system, and also consider the sequential relationship between each word in the sentence, so the training example shown above is transformed as following:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps using a softmax classifier, and the final step using a MSE. It is obvious that the loss function used in this LSTM is composed of two different loss functions. In this case, it seems the Keras does not provide the way to address my problem directly. In addition, I am not sure whether my method to build the marking system is correct or not.
Keras support multiple loss functions as well:
model = Model(inputs=inputs,
outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
loss=['categorical_crossentropy', 'mse'],
metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first, predict a token based on previous tokens, in NLP domain it usually called Language model, and then compute a score which I assume it is a sentiment (it is applicable to other domain).
To do so, you can train your language model with LSTM and pick the last output of LSTM for your ranking task. To this end, you need to define two loss function: categorical_crossentropy for the language model and MSE for the ranking task.
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/

Image Classification using Tensorflow

I am doing transfer-learning/retraining using Tensorflow Inception V3 model. I have 6 labels. A given image can be one single type only, i.e, no multiple class detection is needed. I have three queries:
Which activation function is best for my case? Presently retrain.py file provided by tensorflow uses softmax? What are other methods available? (like sigmoid etc)
Which Optimiser function I should use? (GradientDescent, Adam.. etc)
I want to identify out-of-scope images, i.e. if users inputs a random image, my algorithm should say that it does not belong to the described classes. Presently with 6 classes, it gives one class as a sure output but I do not want that. What are possible solutions for this?
Also, what are the other parameters that we may tweak in tensorflow. My baseline accuracy is 94% and I am looking for something close to 99%.
Since you're doing single label classification, softmax is the best loss function for this, as it maps your final layer logit values to a probability distribution. Sigmoid is used when it's multilabel classification.
It's always better to use a momentum based optimizer compared to vanilla gradient descent. There's a bunch of such modified optimizers like Adam or RMSProp. Experiment with them to see what works best. Adam is probably going to give you the best performance.
You can add an extra label no_class, so your task will now be a 6+1 label classification. You can feed in some random images with no_class as the label. However the distribution of your random images must match the test image distribution, else it won't generalise.

Feed an unseen example to a pre-trained model made in Keras

I've implemented a neural network using Keras. Once trained and tested for final test accuracy, using a matrix with a bunch of rows containing features (plus corresponding labels), I have a model which I should be able to use for prediction.
How can I feed a single unseen example, meaning a feature vector to the model, to obtain a class prediction?
I've looked at their documentation here but could not find a method for it.
What you want is the predict method, it takes a batch of input samples and produces predictions, which are the outputs computer by your network. To feed a single example you can just put it inside a numpy ndarray wrapper.

Resources