Adding Custom Loss for Validation to the Keras.History object - keras

I have a loss function that is a sum of several contributions, i.e.
L = L1 + L2 + ...
I am particularly interested in how the individual terms L1, L2, ... develop on both the training and the validation data set during learning.
If I build my model via subclassing (and the Functional API) and train it with model.fit(), how can I add the validation losses, say "val_L1", "val_L2", ..., to the History object?
Thanks for any help

I figured it out by myself. I hope this will help someone in future who is also struggling with this issue.
If you define your custom model as a subclass of tf.keras.Model, you have to override the methods "train_step" and "test_step":
def train_step(...): and def test_step(...):
"train_step" describes what model.fit() does for each training batch; "test_step" does the same for each validation batch.
If both methods end with
return {'L1': L1, 'L2': L2}
the History object will automatically contain 'L1' and 'L2' for the training data and 'val_L1' and 'val_L2' for the validation data.
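For anyone who wants to see it end to end, here is a minimal sketch of the idea, not my original code. The toy model, the two loss terms (a mean squared error and a small weight penalty) and the random data are all made up for illustration; only train_step, test_step and the returned dictionary are the point.

```python
import numpy as np
import tensorflow as tf


class TwoTermModel(tf.keras.Model):
    """Toy regression model with loss L = L1 + L2 (both terms are illustrative)."""

    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs, training=False):
        return self.dense(inputs)

    def _loss_terms(self, x, y, training):
        pred = self(x, training=training)
        l1 = tf.reduce_mean(tf.square(pred - y))                  # data term
        l2 = 1e-2 * tf.reduce_sum(tf.square(self.dense.kernel))   # weight penalty
        return l1, l2

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            l1, l2 = self._loss_terms(x, y, training=True)
            total = l1 + l2
        grads = tape.gradient(total, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # These keys show up in History as "L1" and "L2".
        return {"L1": l1, "L2": l2}

    def test_step(self, data):
        x, y = data
        l1, l2 = self._loss_terms(x, y, training=False)
        # fit() prefixes these keys with "val_" for the validation data.
        return {"L1": l1, "L2": l2}


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

model = TwoTermModel()
model.compile(optimizer="adam")  # no loss argument: train_step computes it
history = model.fit(X, y, validation_split=0.25, epochs=2, verbose=0)
print(sorted(history.history))
```

As far as I can tell, the values stored this way are whatever the step returned for the last batch of each epoch. If you want epoch averages, you would probably need to accumulate each term in a tf.keras.metrics.Mean and return its result instead.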

Related

How to get the logits for the T5 model when using the `generate` method for inference?

I'm currently using HuggingFace's T5 implementation for text generation purposes. More specifically, I'm using the T5ForConditionalGeneration to solve a text classification problem as generation.
The model's performance is overall very satisfactory after training, but what I am wondering is how I can get the logits for generation?
I'm currently performing inference as is suggested in the documentation via model.generate(**tokenizer_outputs), but this simply outputs the IDs themselves without anything else.
The reason why I want the logits is because I want to measure the model's confidence of generation. I'm not 100% certain if my approach is correct, but I'm thinking that if I can get the logit values of each generated token and average them, I could get the overall confidence score of the generated sequence.
Would anybody know how I could do this? Thanks.
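I was struggling with this because I wasn't familiar with how the Transformers library works, but after looking at the source code, all you have to do is set the arguments output_scores and return_dict_in_generate to True. generate then returns an output object whose scores attribute holds the per-step scores, instead of just the token IDs.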
For more information, take a look at the method transformers.generation.utils.GenerationMixin.generate.
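Here is a rough sketch of how the two arguments can be used to get a per-sequence confidence. To keep it runnable without downloading weights, it builds a tiny, randomly initialised T5 from a made-up config; with a real checkpoint you would load the model with from_pretrained and feed it the tokenizer output instead. Averaging token probabilities (rather than raw logits) is my own choice here, not something the library prescribes.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

torch.manual_seed(0)

# Tiny random model (hypothetical sizes) so the sketch runs offline.
config = T5Config(
    vocab_size=32, d_model=16, d_kv=8, d_ff=32,
    num_layers=1, num_heads=2, decoder_start_token_id=0,
)
model = T5ForConditionalGeneration(config).eval()

input_ids = torch.tensor([[5, 6, 7, 1]])  # stand-in for tokenizer output
out = model.generate(
    input_ids=input_ids,
    attention_mask=torch.ones_like(input_ids),
    max_new_tokens=4,
    do_sample=False,
    output_scores=True,
    return_dict_in_generate=True,
)

# out.scores is a tuple with one (batch, vocab) tensor per generated step.
step_scores = torch.stack(out.scores, dim=1)         # (batch, steps, vocab)
log_probs = torch.log_softmax(step_scores, dim=-1)

# For an encoder-decoder model the first position is the decoder start token.
generated = out.sequences[:, 1:]                     # (batch, steps)
token_log_probs = log_probs.gather(-1, generated.unsqueeze(-1)).squeeze(-1)

# One possible confidence score: mean probability of the chosen tokens.
confidence = token_log_probs.exp().mean(dim=-1)
print(out.sequences, confidence)
```

With batches larger than one, shorter sequences are padded after their end-of-sequence token, so the padded positions would have to be masked out before averaging.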

How to save model architecture in PyTorch?

I know I can save a model with torch.save(model.state_dict(), FILE) or torch.save(model, FILE). But neither of them saves the architecture of the model.
So how can we save the architecture of a model in PyTorch, like creating a .pb file in TensorFlow? I want to apply different tweaks to my model. If I can't save the architecture, is there a better way than copying the whole class definition every time and creating a new class?
You can refer to this article to understand how to save the classifier. To make tweaks to a model, you can create a new model that is a child of the existing model.
class newModel(oldModelClass):
    def __init__(self):
        super(newModel, self).__init__()
With this setup, newModel has all the layers as well as the forward function of oldModelClass. If you need to make tweaks, you can define new layers in the __init__ function and then write a new forward function that uses them.
Saving all the parameters (state_dict) and all the Modules is not enough, since there are operations that manipulate the tensors but are only reflected in the actual code of the specific implementation (e.g., reshaping in ResNet).
Furthermore, the network might not have a fixed and pre-determined compute graph: You can think of a network that has branching or a loop (recurrence).
Therefore, you must save the actual code.
Alternatively, if there are no branches/loops in the net, you may save the computation graph, see, e.g., this post.
You should also consider exporting your model using onnx and have a representation that captures both the trained weights as well as the computation graph.
Regarding the actual question:
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ?
The answer is: you cannot.
Is there any way to load a trained model without declaring the class definition before?
I want the model architecture as well as parameters to be loaded.
No, you have to load the class definition first; this is a Python pickling limitation.
https://discuss.pytorch.org/t/how-to-save-load-torch-models/718/11
Though, there are other options (probably you have already seen most of those) that are listed at this PyTorch post:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch's way of serializing a model for inference is to use torch.jit to compile the model to TorchScript.
PyTorch's TorchScript supports more advanced control flows than TensorFlow, and thus the serialization can happen either through tracing (torch.jit.trace) or compiling the Python model code (torch.jit.script).
Great references:
Video which explains this: https://www.youtube.com/watch?app=desktop&v=2awmrMRf0dA
Documentation: https://pytorch.org/docs/stable/jit.html
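A minimal sketch of the TorchScript route, assuming a made-up toy module (GatedNet) with a data-dependent branch. It is scripted, saved to a temporary file and loaded back with torch.jit.load, which does not need the Python class definition to be available in the loading process.

```python
import os
import tempfile

import torch
import torch.nn as nn


class GatedNet(nn.Module):
    """Toy module (hypothetical) whose forward pass contains a branch."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.fc(x)
        if out.sum() > 0:  # data-dependent control flow, kept by torch.jit.script
            out = out * 2
        return out


model = GatedNet().eval()

# Compile the Python code of the module to TorchScript.
scripted = torch.jit.script(model)

# The saved file holds both the weights and the compiled graph/code.
path = os.path.join(tempfile.mkdtemp(), "gated_net.pt")
scripted.save(path)

# Loading needs torch only, not the GatedNet class.
restored = torch.jit.load(path)

x = torch.randn(3, 4)
print(torch.allclose(model(x), restored(x)))
```

torch.jit.trace would record only the branch taken for the example input, which is why scripting is used here.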

XGBoost get classifier object from booster object?

I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is <class 'xgboost.sklearn.XGBClassifier'>.
However, I have a pickled XGBoost model, which when unpacked returns an object of type <class 'xgboost.core.Booster'>. This is the same object as if I had run regr.get_booster().
I have found a few solutions for getting variable importance from a booster object, but is there a way to get to the classifier object from the booster object so I can just apply the same feature_importances_ command? This seems like the most straightforward solution, or it seems like I have to write a function that mimics the output of feature_importances_ in order for it to fit my logged feature importances...
So ideally I'd have something like
xgb_booster = pickle.load(open("xgboost-model", "rb"))
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...
Also for context, the pickled model is the output from AWS sagemaker, so I'm just unpacking it to do some further evaluation
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster
This is nearly useless unless you fit it, because otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but is not truly a rehydrated classifier object from the SageMaker output alone.
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get an xgboost.XGBClassifier model virtually identical to an xgboost.Booster model by
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) setting these same tuning parameters on an XGBClassifier and then training it on the same training dataset that was used to train the Booster model.
Note: one mistake I made was forgetting to explicitly assign the same seed / random_state in both the Booster and the Classifier versions.

Keras: better way to implement layer-wise training model?

I'm currently learning to implement a layer-wise training model with Keras. My solution is complicated and time-consuming; could someone suggest an easier way to do it? Also, could someone explain the topology of Keras, especially the relations among nodes.outbound_layer and nodes.inbound_layer, and how they are associated with the tensors input_tensors and output_tensors? From the topology source code on GitHub, I'm quite confused about:
input_tensors[i] == inbound_layers[i].inbound_nodes[node_indices[i]].output_tensors[tensor_indices[i]]
Why do the inbound_nodes contain output_tensors? I'm not clear about the relations among them. If I want to remove layers at certain positions of a functional API model, what should I remove first? Also, when adding layers at certain places, what should I do first?
Here is my solution for a layer-wise training model. I can do it with a Sequential model and am now trying to implement it with the functional API:
To do it, I simply add a new layer after finishing the previous training, then re-compile (model.compile()) and re-fit (model.fit()).
Since a Keras model requires an output layer, I always add one. As a result, each time I want to add a new layer, I have to remove the output layer and then add it back. This can be done using model.pop(), in which case the model has to be a keras.Sequential() model.
The Sequential() model supports many useful functions, including model.add(layer). But for a customised model built with the functional API, model=Model(input=...., output=....), pop() and add() are not supported, and implementing them takes some time and may not be convenient.
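One possible way around pop() and add() on a functional model, sketched under my own assumptions: keep the trained hidden layers in a plain Python list and rebuild a fresh functional model at every stage, with the earlier layers frozen and a new output head on top. The helper build_stage, the layer sizes and the random data are all made up for illustration.

```python
import numpy as np
import tensorflow as tf


def build_stage(hidden_layers, n_inputs, n_outputs):
    """Hypothetical helper: chain the given layers and add a fresh output head."""
    inputs = tf.keras.Input(shape=(n_inputs,))
    x = inputs
    for layer in hidden_layers:
        x = layer(x)  # layer objects (and their trained weights) are reused
    outputs = tf.keras.layers.Dense(n_outputs)(x)
    return tf.keras.Model(inputs, outputs)


rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4)).astype("float32")
y = rng.normal(size=(128, 1)).astype("float32")

hidden = []
for units in (16, 8):
    # Freeze everything trained in earlier stages.
    for layer in hidden:
        layer.trainable = False
    # Add the new layer to be trained in this stage.
    hidden.append(tf.keras.layers.Dense(units, activation="relu"))

    model = build_stage(hidden, n_inputs=4, n_outputs=1)
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=1, verbose=0)

print([layer.trainable for layer in hidden])
```

Nothing is removed from an existing graph this way; the output layer is simply recreated for each stage, so its weights start from scratch every time.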

How am I supposed to use RandomizedLogisticRegression in Scikit-learn?

I simply have failed to understand the documentation for this class.
I can fit data using it and get the scores for features, but is this all this class is supposed to do?
I can't see how I can use it to actually perform regression using the model that was fit. The example in the documentation above is simply creating an instance of the class, so I can't see how that is supposed to help.
There are methods that perform 'transform' operation, but no mention of what kind of transform that is.
So is it possible to use this class to get actual predictions on new test data, and is it possible to use it in cross-validation to compare its performance with the other methods I'm using?
I've used the highest ranking features in other classifiers, but I'm not sure if more than that is possible with this classifier.
Update: I've found the use for fit_transform under feature selection part of the documentation:
When the goal is to reduce the dimensionality of the data to use with another classifier, they expose a transform method to select the non-zero coefficient
Unless I get an answer that says I'm wrong, I'll assume that this classifier indeed does not do prediction. I'll wait before I answer my own question.
Randomized LR is supposed to be a feature selection method, not a classifier in and of itself. Its API matches that of a standard scikit-learn transformer:
randomlr = RandomizedLogisticRegression()
X_train = randomlr.fit_transform(X_train, y_train)
X_test = randomlr.transform(X_test)
Then fit a model to X_train and do classification on X_test as usual.
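Note that RandomizedLogisticRegression was deprecated in scikit-learn 0.19 and removed in 0.21, so the snippet above only runs on old versions. As a sketch of the same select-then-classify pattern on a current version, the example below swaps in a different technique, SelectFromModel around an L1-penalised LinearSVC, and puts it in a Pipeline so the selection is refit inside each cross-validation fold. The synthetic data and all parameter values are made up.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

pipe = Pipeline([
    # Feature selection step: keeps features with non-zero L1 coefficients.
    ("select", SelectFromModel(
        LinearSVC(C=0.5, penalty="l1", dual=False, max_iter=5000)
    )),
    # The actual classifier, trained on the selected features only.
    ("clf", LogisticRegression(max_iter=1000)),
])

# Selection and classification are both refit on each training fold.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

This is plain L1-based selection, not the randomized (stability selection) procedure of the original class, so the selected features can differ.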
