I am playing around with code from this Github repository https://github.com/jindongwang/Pytorch-CapsuleNet.
After training the model for 5 epochs, I got an accuracy of 99.2% on the test dataset. So I saved model using the following code:
torch.save(capsule_net.state_dict(),"capsnet_mnist_state.pt")
I tried loading the model back in another machine with the below code:
capsnet = CapsNet(Config())
capsnet.load_state_dict(torch.load('capsnet_mnist_state.pt'))
capsnet.eval()
Now the model predicts the 0 has the output for every input. Is there anything wrong with way I saved the model or loaded the model?.
I don't think there is anything wrong with the way you saved or the way you load your model.
But I happened to see the same problem because I didn't initialise some parameters, so they were stuck to 'None' even after loading the state dict.
I would suggest to look into the loaded state_dict to see if any parameters makes no sense.
Hope this help.
Related
I got one question about torch.
I load pre-training model like:
model_name = "bert-base-uncased"
model = BertTokenizer.from_pretrained(model_name)
and I read To train the model, you should first set it back in training mode with model.train().
but I don't understand how it does work. when I read document of from_pretrained(), there isn't any explanation about train().
How it works?
.train() is a method of torch.nn.Module. It notifies the module to switch to the training mode, see documentation. What exactly happens under the hood is up to the actual Module, in many modules it doesn't change anything, so without knowing your network we cannot say what examctly happens. But for instance in torch.nn.BatchNormNd it has an effect.
I am using the class api to subclass a model based on keras.models.Model.
Is there some trick to getting the save_weights working?
Am seeing errors like
ValueError: Layer #0 (named "dense_S") expects 0 weight(s), but the saved weights have 2 element(s).
Am trying by_name=True and False.
EDIT: it seems that calling predict once, with ANY data, is needed to build the layers for some reason. It would be interesting to here a proper explanation from anyone who knows more.
I have some models saved that have dropout layers. Unfortunately, the dropout_keep_dim value was not given as placeholders. Now when I restore the model for test purpose, it gives random output for each run. So, my question is, is it possible to change the dropout_keep_dim of a saved variable? The dropout layer is added the following way:
tf.nn.dropout(layer_no, dropout_keep_dim)
I have already wasted hours on google and didn't find any working solution. Is there even a solution or are my saved models of no use now? Tf.assign doesn't work as, in my case, dropout_keep_dim is not a variable. Any kind of help is appreciated.
NB. I can restore the dropout_keep_dim value and print it. I want to change it if that's possible and then test with the saved weights.
I am using a Convolutional Neural Network and I am saving it and loading it via the model serializer class.
What I want to do is to be able to come back at a later time and continue training the model on new data provided to it.
What I am doing is I load it using
ComputationGraph net = ModelSerializer.restoreComputationGraph(modelFileName);
and then I give it the data like before with
net.train(dataSetIterator);
This seems to work, but it makes my accuracy really bad. It was about 89% before I did this, and, using the same data, it gets to be around 50% accurate after a few iterations (using the same data it just trained itself on, so if anything it should be getting stupidly more accurate right?).
Am I missing a step?
I think it'll be difficult to answer based on the information given, but I'll give you an example.
I had this exact problem. I had based my app on the GravesLSTMCharModellingExample (which is LSTM). I had saved my model after running for a couple of epochs (at which point it generated legible sentences), but when loading it, it produced garbage.
I thought everything was the same, but in the end it turned out I didn't initialize the CharacterIterator the same. When I fixed it, it worked as expected.
So to cut a long story short;
Check your values when initializing the auxiliary classes.
Suppose you have a Keras model with an optimizer like Adam that you save via save_model.
If you load the model again with load_model, does it really load ALL optimizer parameters + weights?
Based on the code of save_model(Link), Keras saves the config of the optimizer:
f.attrs['training_config'] = json.dumps({
'optimizer_config': {
'class_name': model.optimizer.__class__.__name__,
'config': model.optimizer.get_config()},
which, in the case of Adam for example (Link), is as follows:
def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
As such, this only saves the fundamental parameters but no per-variable optimizer weights.
However, after dumping the config in save_model, it looks like some optimizer weights are saved as well (Link). Unfortunately, I can't really understand if every weight of the optimizer saved.
So if you want to continue training the model in a new session with load_model, is the state of the optimizer really 100% the same as in the last training session? E.g. in the case of SGD with momentum, does it save all per-variable momentums?
Or in general, does it make a difference in training if you stop and resume training with save/load_model?
It seem your links don't point to the same lines anymore than they originally pointed to at the time of your question, so I don't know which lines you are referring to.
But the answer is yes, the entire state of the optimizer is saved along with the model. You can see this happening in save_model(). Also if you wish not to save the optimizer weights, you can do so by calling save_model(include_optimizer=False).
If you inspect the resulting *.h5 file, for example by means of h5dump | less, you can see those weights. (h5dump is part of h5utils.)
Therefore saving a model and loading it again later should make no difference in many common cases. However there are exceptions not related to the optimizer. One that comes to my mind right now is an LSTM(stateful=True) layer which I believe does not save the internal LSTM states when calling save_model(). There are possibly many more reasons why interrupting the training with save/load might not produce the exact same results as training without interruption. But investigating this maybe makes sense only in the context of concrete code.