Load a model with from_pretrained(), then call model.train()? - pytorch

I have a question about torch.
I load a pre-trained model like this:
model_name = "bert-base-uncased"
model = BertModel.from_pretrained(model_name)
and I read that "to train the model, you should first set it back in training mode with model.train()",
but I don't understand how this works. When I read the documentation of from_pretrained(), there is no explanation of train().
How does it work?

.train() is a method of torch.nn.Module. It switches the module into training mode; see the documentation. What exactly happens under the hood is up to the specific Module: in many modules the mode changes nothing, so without knowing your network we cannot say exactly what happens. But it does have an effect in, for instance, torch.nn.BatchNorm1d/2d/3d and torch.nn.Dropout.
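As a minimal sketch of what the flag changes, here is a toy module with dropout (a placeholder, not the BERT model from the question):

import torch

net = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Dropout(p=0.5),
)
x = torch.ones(1, 4)

net.train()                  # training mode: dropout randomly zeroes activations
out_train = net(x)           # stochastic output

net.eval()                   # evaluation mode: dropout becomes a no-op
with torch.no_grad():
    out_a = net(x)
    out_b = net(x)

assert torch.equal(out_a, out_b)   # eval-mode outputs are deterministic
print(net.training)                # False after net.eval()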

Related

How to save model architecture in PyTorch?

I know I can save a model with torch.save(model.state_dict(), FILE) or torch.save(model, FILE), but neither of them saves the architecture of the model on its own.
So how can we save the architecture of a model in PyTorch, like creating a .pb file in TensorFlow? I want to apply different tweaks to my model. Is there a better way than copying the whole class definition every time and creating a new class, if I can't save the architecture of a model?
You can refer to this article to understand how to save the classifier. To make tweaks to a model, you can create a new model that is a child of the existing model class.
class newModel(oldModelClass):
    def __init__(self):
        super(newModel, self).__init__()
With this setup, newModel has all the layers as well as the forward function of oldModelClass. If you need to make tweaks, you can define new layers in the __init__ function and then write a new forward function that uses them, as in the sketch below.
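A minimal sketch of the pattern (oldModelClass, the layer names, and the sizes are placeholders, not from the original post):

import torch.nn as nn

class oldModelClass(nn.Module):          # stand-in for the existing model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        return self.fc(x)

class newModel(oldModelClass):           # inherits all layers of the parent
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(10, 2)     # a hypothetical tweak: a new output layer

    def forward(self, x):                # new forward that wires the tweak in
        return self.head(super().forward(x))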
Saving all the parameters (state_dict) and all the Modules is not enough, since there are operations that manipulate the tensors but are reflected only in the actual code of the specific implementation (e.g., the reshaping in ResNet's forward).
Furthermore, the network might not have a fixed and pre-determined compute graph: You can think of a network that has branching or a loop (recurrence).
Therefore, you must save the actual code.
Alternatively, if there are no branches/loops in the net, you may save the computation graph; see, e.g., this post.
You should also consider exporting your model using ONNX, which gives you a representation that captures both the trained weights and the computation graph; see the sketch below.
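A hedged sketch of an ONNX export (the network and the input shape are placeholder assumptions; adapt them to your model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU())   # placeholder network
model.eval()

dummy_input = torch.randn(1, 8)    # example input; the shape is an assumption

# The export stores both the trained weights and the computation graph.
torch.onnx.export(model, dummy_input, "model.onnx")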
Regarding the actual question:
So how can we save the architecture of a model in PyTorch, like creating a .pb file in TensorFlow?
The answer is: you cannot.
Is there any way to load a trained model without declaring the class definition before ?
I want the model architecture as well as parameters to be loaded.
No, you have to load the class definition first; this is a Python pickling limitation.
https://discuss.pytorch.org/t/how-to-save-load-torch-models/718/11
That said, there are other options (you have probably already seen most of them), which are listed in this PyTorch tutorial:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch's way of serializing a model for inference is to use torch.jit to compile the model to TorchScript.
PyTorch's TorchScript supports more advanced control flow than TensorFlow's serialized graphs, and thus the serialization can happen either through tracing (torch.jit.trace) or through compiling the Python model code (torch.jit.script).
Great references:
Video which explains this: https://www.youtube.com/watch?app=desktop&v=2awmrMRf0dA
Documentation: https://pytorch.org/docs/stable/jit.html
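A minimal sketch of both routes (the network is a placeholder; file names are arbitrary):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU())   # placeholder network
model.eval()

# Tracing: records the ops executed on an example input
# (fine when there is no data-dependent control flow).
traced = torch.jit.trace(model, torch.randn(1, 8))

# Scripting: compiles the Python source, so branches and loops survive.
scripted = torch.jit.script(model)

traced.save("model_traced.pt")
restored = torch.jit.load("model_traced.pt")   # no class definition needed to load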

Why isn't my RL model behaving the same after being loaded in pytorch?

I'm training some simple neural networks for Reinforcement Learning in Pytorch. At the end of the training, I save the model like so:
torch.save(self.policy_NN.state_dict(), self.model_fname)
It's doing pretty good at this point. Then later, in another script, I load it again, like so:
self.policy_NN.load_state_dict(torch.load(model_fname))
And then just play out the episode, as if the training never stopped (except I'm not doing DQN learning anymore, it's just taking the greedy action at each point). So I'd expect it to behave basically as it did when I saved it.
However, whenever I load it, it behaves completely differently, to the point that it seems like it didn't learn at all before I saved it. For example, if I look at the last 1000 time steps of the training session, it will get many rewards, but after loading it, it gets basically none.
I've verified (by doing print(self.policy_NN.state_dict())) that the weights and biases are in fact the same when I save the model and when I load it again.
What could be going on? Is there something else to the network that might not be getting saved somehow?
Dropout and some other layers behave differently in eval and train mode. You can switch between these modes with model.train() and model.eval(); see the sketch below.
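A hedged sketch of what the evaluation script might do (policy_NN, the file name, and the state shape are stand-ins for the asker's code):

import torch
import torch.nn as nn

policy_NN = nn.Sequential(nn.Linear(4, 2))          # stand-in for the real policy net
torch.save(policy_NN.state_dict(), "policy.pt")     # as in the training script

# Later, in the evaluation script:
policy_NN.load_state_dict(torch.load("policy.pt"))
policy_NN.eval()                                    # dropout/batch norm switch to inference behavior
with torch.no_grad():                               # a greedy rollout needs no gradients
    state = torch.zeros(1, 4)                       # placeholder state
    action = policy_NN(state).argmax(dim=1)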
I remember reading that RL typically suffers from brittleness, where changing the input even slightly causes wildly different performance. The example was an algorithm trained on an Atari game that lost all of its performance gains simply because the screen was shifted right by one pixel.
You might want to check that in both modes your environment behaves similarly.

Using a pytorch model for inference

I am using the fastai library (fast.ai) to train an image classifier. The model created by fastai is actually a pytorch model.
type(model)
<class 'torch.nn.modules.container.Sequential'>
Now, I want to use this model from pytorch for inference. Here is my code so far:
torch.save(model,"./torch_model_v1")
the_model = torch.load("./torch_model_v1")
the_model.eval() # shows the entire network architecture
Based on the example shown here: http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#sphx-glr-beginner-data-loading-tutorial-py, I understand that I need to write my own data loading class which will override some of the functions in the Dataset class. But what is not clear to me is which transformations I need to apply at test time. In particular, how do I normalize the images at test time?
Another question: is my approach of saving and loading the model in pytorch fine? I read in the tutorial here: http://pytorch.org/docs/master/notes/serialization.html that the approach that I have used is not recommended. The reason is not clear though.
Just to clarify: the_model.eval() does not merely print the architecture (that is just the interpreter echoing the returned module); it sets the model to evaluation mode.
In particular, how do I normalize the images at test time?
It depends on the model you have. For instance, torchvision's pretrained models expect inputs normalized with the ImageNet mean and standard deviation, as in the sketch below.
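A minimal sketch of the standard torchvision preprocessing (the resize/crop sizes are the common ImageNet defaults, an assumption for your model):

from torchvision import transforms

# The ImageNet statistics that torchvision's pretrained models expect.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])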
Regarding how to save and load models: torch.save/torch.load "saves/loads an object to a disk file."
So if you save the_model, it will save the entire model object, including its architecture definition and some other internal aspects. If you save the_model.state_dict(), it will save a dictionary containing only the model state (i.e., parameters and buffers). Saving the whole model can break the code in various ways, so the preferred method is to save and load only the model state, as in the sketch below. However, I'm not sure whether the fast.ai "model file" is actually a full model or the state of a model; you have to check this so you can load it correctly.
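A minimal sketch of the preferred state_dict workflow (the network is a placeholder for the fastai model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4))             # stand-in for the fastai model

# Preferred: persist only the parameters and buffers.
torch.save(model.state_dict(), "model_state.pt")

# Loading requires recreating the architecture first.
restored = nn.Sequential(nn.Linear(8, 4))
restored.load_state_dict(torch.load("model_state.pt"))
restored.eval()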

Continue training a deeplearning4j model after it has been saved and loaded

I am using a Convolutional Neural Network and I am saving it and loading it via the model serializer class.
What I want to do is to be able to come back at a later time and continue training the model on new data provided to it.
What I am doing is I load it using
ComputationGraph net = ModelSerializer.restoreComputationGraph(modelFileName);
and then I give it the data like before with
net.train(dataSetIterator);
This seems to work, but it makes my accuracy really bad. It was about 89% before I did this, and, using the same data, it drops to around 50% after a few iterations (using the same data it just trained itself on, so if anything it should be getting even more accurate, right?).
Am I missing a step?
I think it'll be difficult to answer based on the information given, but I'll give you an example.
I had this exact problem. I had based my app on the GravesLSTMCharModellingExample (which is an LSTM). I had saved my model after running for a couple of epochs (at which point it generated legible sentences), but when I loaded it, it produced garbage.
I thought everything was the same, but in the end it turned out I didn't initialize the CharacterIterator the same. When I fixed it, it worked as expected.
So, to cut a long story short: check your values when initializing the auxiliary classes.

sklearn pickled model "Attribute Error: model has no attribute classes_"

I built a sklearn model in one PC and pickled it. When, I tried to use the same model in another PC, I get below error:
Attribute Error: model has no attribute classes_
When I built the model , I did check
Model.classes_
It printed the classes. What could be the reason for this?
It would help to know the scikit-learn version on both PCs, but the information on the website says this can happen when the versions differ. Hope it helps.
Think of pickle as a way to dump and load a snapshot of an object and its environment at a given moment.
Sometimes the object you deal with does not mean anything by itself. You must provide extra data with it.
That's particularly the case with trained classifiers. In your case, model.classes_ works perfectly in your script, where you have your data and have fitted your classifier. Now suppose you have dumped the classifier and loaded it later in another script: what classes are you talking about? What data are we talking about? Got it?
What you have to do then is provide additional metadata when pickling.
This section of sklearn's documentation describes what needs to be pickled alongside the classifier (training data, source code, ...).
NB:
Check first that both versions of sklearn are the same. Sometimes it can be just that.
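A hedged sketch of bundling version metadata with the pickle so a mismatch is caught at load time (the classifier and file name are placeholders):

import pickle
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression().fit(X, y)

# Bundle version metadata with the fitted classifier so that a
# mismatch can be detected when unpickling on another machine.
payload = {"model": clf, "sklearn_version": sklearn.__version__}
with open("model.pkl", "wb") as f:
    pickle.dump(payload, f)

with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)
if loaded["sklearn_version"] != sklearn.__version__:
    print("warning: scikit-learn version mismatch")
print(loaded["model"].classes_)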
