Fine-tuning PyTorch: No deepcopy - pytorch

Regarding fine-tuning CNNs in PyTorch, as per SAVING AND LOADING MODELS:
If you only plan to keep the best performing model (according to the acquired validation loss), … You must serialize best_model_state or use best_model_state = deepcopy(model.state_dict()) otherwise your best best_model_state will keep getting updated by the subsequent training iterations. As a result, the final model state will be the state of the overfitted model.
However, I have done something like this:
def train_model(model, ...):
...
if validation_loss improves:
delete previous best model
torch.save(model.state_dict(), best_model_path)
else:
....
...
return model
def test_model(model, best_model_path, ...):
model.load_state_dict(torch.load(best_model_path))
model.eval()
...
...
my_model = train_model(my_model, ...)
test_model(my_model, my_path, ...)
In other words, the model returned by the training phase is the final one which is likely to present overfitting (I did not use deepcopy). But since I saved the best model during training, I have no problem during the test/inference phase because I load the best model, overloading the final model obtained during testing.
Is something wrong with this solution?
Thank you.

You’re still following the tutorial’s instructions. Note this part of the tutorial:
You must serialize best_model_state or use best_model_state = deepcopy(model.state_dict())
You serialized the best model’s state (wrote it to disk), so you don’t need to use deepcopy.
If you kept the model in memory, you’d use deepcopy to make sure it’s not altered during training. But because you’re keeping it on disk instead, it won’t be altered.

Related

How can/should we weight classes in HuggingFace token classification (entity recognition)?

I'm training a token classification (AKA named entity recognition) model with the HuggingFace Transformers library, with a customized data loader.
Like most NER datasets (I'd imagine?) there's a pretty significant class imbalance: A large majority of tokens are other - i.e. not an entity - and of course there's a little variation between the different entity classes themselves.
As we might expect, my "accuracy" metrics are getting distorted quite a lot by this: It's no great achievement to get 80% token classification accuracy if 90% of your tokens are other... A trivial model could have done better!
I can calculate some additional and more insightful evaluation metrics - but it got me wondering... Can/should we somehow incorporate these weights into the training loss? How would this be done using a typical *ForTokenClassification model e.g. BERTForTokenClassification?
This is actually a really interesting question, since it seems there is no intention (yet) to modify losses in the models yourself. Specifically for BertForTokenClassification, I found this code segment:
loss_fct = CrossEntropyLoss()
# ...
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
To actually change the loss computation and add other parameters, e.g., the weights you mention, you can go about either one of two ways:
You can modify a copy of transformers locally, and install the library from there, which makes this only a small change in the code, but potentially quite a hassle to change parts during different experiments, or
You return your logits (which is the case by default), and calculate your own loss outside of the actual forward pass of the huggingface model. In this case, you need to be aware of any potential propagation from the loss calculated within the forward call, but this should be within your power to change.

example of doing simple prediction with pytorch-lightning

I have an existing model where I load some pre-trained weights and then do prediction (one image at a time) in pytorch. I am trying to basically convert it to a pytorch lightning module and am confused about a few things.
So currently, my __init__ method for the model looks like this:
self._load_config_file(cfg_file)
# just creates the pytorch network
self.create_network()
self.load_weights(weights_file)
self.cuda(device=0) # assumes GPU and uses one. This is probably suboptimal
self.eval() # prediction mode
What I can gather from the lightning docs, I can pretty much do the same, except not to do the cuda() call. So something like:
self.create_network()
self.load_weights(weights_file)
self.freeze() # prediction mode
So, my first question is whether this is the correct way to use lightning? How would lightning know if it needs to use the GPU? I am guessing this needs to be specified somewhere.
Now, for the prediction, I have the following setup:
def infer(frame):
img = transform(frame) # apply some transformation to the input
img = torch.from_numpy(img).float().unsqueeze(0).cuda(device=0)
with torch.no_grad():
output = self.__call__(Variable(img)).data.cpu().numpy()
return output
This is the bit that has me confused. Which functions do I need to override to make a lightning compatible prediction?
Also, at the moment, the input comes as a numpy array. Is that something that would be possible from the lightning module or do things always have to use some sort of a dataloader?
At some point, I want to extend this model implementation to do training as well, so want to make sure I do it right but while most examples focus on training models, a simple example of just doing prediction at production time on a single image/data point might be useful.
I am using 0.7.5 with pytorch 1.4.0 on GPU with cuda 10.1
LightningModule is a subclass of torch.nn.Module so the same model class will work for both inference and training. For that reason, you should probably call the cuda() and eval() methods outside of __init__.
Since it's just a nn.Module under the hood, once you've loaded your weights you don't need to override any methods to perform inference, simply call the model instance. Here's a toy example you can use:
import torchvision.models as models
from pytorch_lightning.core import LightningModule
class MyModel(LightningModule):
def __init__(self):
super().__init__()
self.resnet = models.resnet18(pretrained=True, progress=False)
def forward(self, x):
return self.resnet(x)
model = MyModel().eval().cuda(device=0)
And then to actually run inference you don't need a method, just do something like:
for frame in video:
img = transform(frame)
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
output = model(img).data.cpu().numpy()
# Do something with the output
The main benefit of PyTorchLighting is that you can also use the same class for training by implementing training_step(), configure_optimizers() and train_dataloader() on that class. You can find a simple example of that in the PyTorchLightning docs.
Even though above answer suffices, if one takes note of following line
img = torch.from_numpy(img).float().unsqueeze(0).cuda(0)
One has to put both the model as well as image to the right GPU. On multi-gpu inference machine, this becomes a hassle.
To solve this, .predict was also recently produced, see more at https://pytorch-lightning.readthedocs.io/en/stable/deploy/production_basic.html

How to save model architecture in PyTorch?

I know I can save a model by torch.save(model.state_dict(), FILE) or torch.save(model, FILE). But both of them don't save the architecture of model.
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ? I want to apply different tweaks to my model. Do I have any better way than copying the whole class definition every time and creating a new class if I can't save the architecture of a model?
You can refer to this article to understand how to save the classifier. To make a tweaks to a model, what you can do is create a new model which is a child of the existing model.
class newModel( oldModelClass):
def __init__(self):
super(newModel, self).__init__()
With this setup, newModel has all the layers as well as the forward function of oldModelClass. If you need to make tweaks, you can define new layers in the __init__ function and then write a new forward function to define it.
Saving all the parameters (state_dict) and all the Modules is not enough, since there are operations that manipulates the tensors, but are only reflected in the actual code of the specific implementation (e.g., reshapeing in ResNet).
Furthermore, the network might not have a fixed and pre-determined compute graph: You can think of a network that has branching or a loop (recurrence).
Therefore, you must save the actual code.
Alternatively, if there are no branches/loops in the net, you may save the computation graph, see, e.g., this post.
You should also consider exporting your model using onnx and have a representation that captures both the trained weights as well as the computation graph.
Regarding the actual question:
So how can we save the architecture of a model in PyTorch like creating a .pb file in Tensorflow ?
The answer is: You cannot
Is there any way to load a trained model without declaring the class definition before ?
I want the model architecture as well as parameters to be loaded.
no, you have to load the class definition before, this is a python pickling limitation.
https://discuss.pytorch.org/t/how-to-save-load-torch-models/718/11
Though, there are other options (probably you have already seen most of those) that are listed at this PyTorch post:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch's way of serializing a model for inference is to use torch.jit to compile the model to TorchScript.
PyTorch's TorchScript supports more advanced control flows than TensorFlow, and thus the serialization can happen either through tracing (torch.jit.trace) or compiling the Python model code (torch.jit.script).
Great references:
Video which explains this: https://www.youtube.com/watch?app=desktop&v=2awmrMRf0dA
Documentation: https://pytorch.org/docs/stable/jit.html

XGBoost get classifier object form booster object?

I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is .
However, I have a pickled mXGBoost model, which when unpacked returns an object of type . This is the same object as if I would have ran regr.get_booster().
I have found a few solutions for getting variable importance from a booster object, but is there a way to get to the classifier object from the booster object so I can just apply the same feature_importances_ command? This seems like the most straightforward solution, or it seems like I have to write a function that mimics the output of feature_importances_ in order for it to fit my logged feature importances...
So ideally I'd have something like
xbg_booster = pickle.load(open("xgboost-model", "rb"))
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a booster object in terms finding the classifier? I figure there's some combination of save/load/dump that will get me what I need but I'm stuck for now...
Also for context, the pickled model is the output from AWS sagemaker, so I'm just unpacking it to do some further evaluation
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Boster = booster
This is nearly useless unless you fit it otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate if fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but is not truly a rehydrated classifier object from the SageMaker output alone.
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get xgboost.XGBClassifier model virtually identical to a xgboost.Booster version model by
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) implementing these same tuning parameters and then training a XGBClassifier model using the same training dataset used to train the Booster model before that.
Note: one mistake I made was that I forgot to explicitly assign the same seed /random_state in both Booster and Classifier versions.

Keras: does save_model really save all optimizer weights?

Suppose you have a Keras model with an optimizer like Adam that you save via save_model.
If you load the model again with load_model, does it really load ALL optimizer parameters + weights?
Based on the code of save_model(Link), Keras saves the config of the optimizer:
f.attrs['training_config'] = json.dumps({
'optimizer_config': {
'class_name': model.optimizer.__class__.__name__,
'config': model.optimizer.get_config()},
which, in the case of Adam for example (Link), is as follows:
def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
As such, this only saves the fundamental parameters but no per-variable optimizer weights.
However, after dumping the config in save_model, it looks like some optimizer weights are saved as well (Link). Unfortunately, I can't really understand if every weight of the optimizer saved.
So if you want to continue training the model in a new session with load_model, is the state of the optimizer really 100% the same as in the last training session? E.g. in the case of SGD with momentum, does it save all per-variable momentums?
Or in general, does it make a difference in training if you stop and resume training with save/load_model?
It seem your links don't point to the same lines anymore than they originally pointed to at the time of your question, so I don't know which lines you are referring to.
But the answer is yes, the entire state of the optimizer is saved along with the model. You can see this happening in save_model(). Also if you wish not to save the optimizer weights, you can do so by calling save_model(include_optimizer=False).
If you inspect the resulting *.h5 file, for example by means of h5dump | less, you can see those weights. (h5dump is part of h5utils.)
Therefore saving a model and loading it again later should make no difference in many common cases. However there are exceptions not related to the optimizer. One that comes to my mind right now is an LSTM(stateful=True) layer which I believe does not save the internal LSTM states when calling save_model(). There are possibly many more reasons why interrupting the training with save/load might not produce the exact same results as training without interruption. But investigating this maybe makes sense only in the context of concrete code.

Resources