Loaded PyTorch model has a different result compared to saved model - pytorch

I have a python script that trains and then tests a CNN model. The model weights/parameters are saved after testing through the use of:
checkpoint = {'state_dict': model.state_dict(), 'optimizer': optimizer.state_dict()}
torch.save(checkpoint, path + filename)
After saving I immediately load the model through the use of a function:
model_load = create_model(cnn_type="vgg", numberofclasses=len(cases))
And then, I load the model weights/parameters through:
model_load.load_state_dict(torch.load(filePath+filename), strict = False)
model_load.eval()
Finally, I feed this model the same testing data I used before the model was saved.
The problem is that the testing results before saving and after loading do not match. My hunch is that, due to strict=False, some of the parameters are not being passed through to the model. However, when I set strict=True, I receive errors. Is there a workaround for this?
The error message is:
RuntimeError: Error(s) in loading state_dict for CNN:
Missing key(s) in state_dict: "linear.weight", "linear.bias", "linear2.weight", "linear2.bias", "linear3.weight", "linear3.bias".
Unexpected key(s) in state_dict: "state_dict", "optimizer".

You are loading a dictionary containing the state of your model as well as the optimizer's state. According to your error stack trace, the following should solve the issue:
>>> model_state = torch.load(filePath+filename)['state_dict']
>>> model_load.load_state_dict(model_state, strict=True)
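If you also need the optimizer state (e.g. to resume training), the same checkpoint holds it under the 'optimizer' key. A minimal sketch using the question's variable names:
checkpoint = torch.load(filePath + filename)
model_load.load_state_dict(checkpoint['state_dict'], strict=True)
optimizer.load_state_dict(checkpoint['optimizer'])  # only needed when resuming training
model_load.eval()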

Related

AttributeError: 'CountVectorizer' object has no attribute '_load_specials'

I am dumping my pretrained Doc2Vec model using the commands below:
model.train(labeled_data, total_examples=model.corpus_count, epochs=model.epochs)
print("Model Training Done")
# Saving the created model
model.save(project_name + '_doc2vec_vectorizer.npz')
vectorizer = CountVectorizer()
vectorizer.fit(df[0])
vec_file = project_name + '_doc2vec_vectorizer.npz'
pickle.dump(vectorizer, open(vec_file, 'wb'))
vdb = db['vectorizers']
and then I am loading the Doc2Vec model in another function using the command below:
loaded_vectorizer = pickle.load(open(vectorizer, 'rb'))
and then I am getting the error CountVectorizer has no attribute _load_specials on the line below, i.e. model2:
model2= gensim.models.doc2vec.Doc2Vec.load(vectorizer)
The gensim version I am using is 3.8.3, as I am using the LabeledSentence class.
The .load() method on Gensim model classes should only be used with objects of exactly that same class that were saved to file(s) using the Gensim .save() method.
Your code shows you trying to use Doc2Vec.load() with the vectorizer object itself (not a file path to the previously-saved model), so the error is to be expected.
If you actually want to pickle-save and then pickle-load the vectorizer object, be sure to:
- use a different file path than you did for the model, or you'll overwrite the model file!
- use pickle methods (not Gensim methods) to re-load anything that was pickle-saved
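A minimal sketch of the two save/load pairs, assuming the variable names from the question (the file paths here are examples, chosen so the two objects no longer share one file):
import pickle
from gensim.models.doc2vec import Doc2Vec

doc2vec_path = project_name + '_doc2vec.model'    # Gensim-saved Doc2Vec model
vec_path = project_name + '_countvectorizer.pkl'  # pickled CountVectorizer

model.save(doc2vec_path)                               # Gensim save for the Doc2Vec model
pickle.dump(vectorizer, open(vec_path, 'wb'))          # pickle for the CountVectorizer

model2 = Doc2Vec.load(doc2vec_path)                    # Gensim load takes a file path
loaded_vectorizer = pickle.load(open(vec_path, 'rb'))  # pickle load takes a file object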

model = model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1))) AttributeError: 'NoneType' object has no attribute 'add'

This error occurs in the MaxPooling stage while I train my CNN model.
Error: AttributeError: 'NoneType' object has no attribute 'add'. Please help.
model = model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1)))
The question is missing some info, but I think I can see what's going on.
Assuming that model was at some point a tf.keras.models.Sequential(), I guess you did something like:
model = models.Sequential()
model = model.add(...)
model = model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1)))
However, that's not quite how model.add(...) works. Instead of returning a new model, it modifies the existing model in place and returns None, which is why model ends up as None and the next call fails.
Instead you should do something like:
model = models.Sequential() # create a first model
model.add(...) # add things to the existing model
model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1)))
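Putting it together, a minimal runnable sketch (the Conv2D layer, class count, and compile settings are placeholders I've added, not from your code):
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))  # add() mutates the model and returns None
model.add(layers.Flatten())
model.add(layers.Dense(7, activation='softmax'))  # placeholder class count
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()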

onnxjs opset versions and MaxPool

I'm trying to export an FCN from torchvision using the following code:
model = models.segmentation.fcn_resnet101(pretrained=True, progress=True, num_classes=21, aux_loss=None)
model.eval()
x = torch.randn(1, 3, 512, 512)
torch_out = model(x)
torch.onnx.export(model, x, "seg_rn.onnx",
                  export_params=True,
                  opset_version=11,
                  do_constant_folding=True,
                  verbose=True)
When exporting the model, I need at least opset 11 to support the way PyTorch's interpolation works, and this is confirmed by the output of the ONNX model when running in the Python ONNX runtime.
Running in the Python ONNX runtime is fine, but when I load the model in onnxjs like this:
var session = new InferenceSession();
const modelURL = "./models/seg_rn.onnx";
await session.loadModel(modelURL);
I get Uncaught (in promise) TypeError: cannot resolve operator 'Shape' with opsets: ai.onnx v11
If I go and create my own copies of bits from torchvision.models.segmentation, I can get rid of the error about Shape (by specifying a static shape for the input and telling the interpolation what the resizing factor should be), but then I get basically the same error, this time in reference to MaxPool: Uncaught (in promise) TypeError: cannot resolve operator 'MaxPool' with opsets: ai.onnx v11. Ignoring tests and exporting with opset v10 results in a loadable model, but one which will be incorrect.
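Roughly, the interpolation change I'm referring to looks like this (a sketch of the idea, not the exact torchvision code; the tensor shape and scale factor are just examples):
import torch
import torch.nn.functional as F

x = torch.randn(1, 21, 64, 64)  # example feature map, not the real FCN output
# Computing the output size from the input's runtime shape emits a Shape op in the exported graph:
#   out = F.interpolate(x, size=input_shape[-2:], mode='bilinear', align_corners=False)
# Using a fixed scale factor keeps the size static at export time:
out = F.interpolate(x, scale_factor=8.0, mode='bilinear', align_corners=False)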
What is going on? Is there any way forward, or am I basically stuck?

The inference file that goes into the entry point of PyTorchModel to be deployed does not have an effect on the output of the predictor

I am currently running the code on AWS SageMaker, trying to predict data using an already-trained model, accessed via MODEL_URL.
With the code below, the inference.py passed as the entry_point does not seem to have any effect on the result of the trained prediction model. Changes in inference.py do not alter the output (the output is always correct). Is there something I am misunderstanding about how the model works? And how can I incorporate inference.py into the prediction model as the entry point?
role = sagemaker.get_execution_role()
model = PyTorchModel(model_data=MODEL_URL,
                     role=role,
                     framework_version='0.4.0',
                     entry_point='/inference.py',
                     source_dir=SOURCE_DIR)
predictor = model.deploy(instance_type='ml.c5.xlarge',
                         initial_instance_count=1,
                         endpoint_name=RT_ENDPOINT_NAME)
result = predictor.predict(someData)
The entry point (inference.py) is the code file that defines how the model is loaded, how inputs are preprocessed, how predictions are made, and how outputs are postprocessed.
"Any changes in inference.py does not alter the output"
What are you changing in inference.py that you expect to alter the result of predictor.predict? If the underlying model_data is not changing, the entry point script will be using the same model. Are you making some change to how the model is loaded in model_fn, or to how predictions are processed via input_fn, predict_fn, or output_fn?
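For reference, a minimal sketch of what those hooks look like in inference.py (the model file name, JSON handling, and shapes below are placeholders, not taken from the question):
import json
import os
import torch

def model_fn(model_dir):
    # Called once at startup to load the artifacts packaged in model_data
    model = torch.load(os.path.join(model_dir, 'model.pth'), map_location='cpu')
    model.eval()
    return model

def input_fn(request_body, content_type):
    # Deserialize the incoming request into a tensor
    return torch.tensor(json.loads(request_body))

def predict_fn(input_data, model):
    # Run the actual prediction
    with torch.no_grad():
        return model(input_data)

def output_fn(prediction, accept):
    # Serialize the prediction for the response
    return json.dumps(prediction.tolist())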

while running huggingface gpt2-xl model embedding index getting out of range

I am trying to run the Hugging Face gpt2-xl model. I ran code from the quickstart page that loads the small gpt2 model and generates text with the following code:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained('gpt2')
generated = tokenizer.encode("The Manhattan bridge")
context = torch.tensor([generated])
past = None
for i in range(100):
    print(i)
    output, past = model(context, past=past)
    token = torch.argmax(output[0, :])
    generated += [token.tolist()]
    context = token.unsqueeze(0)
sequence = tokenizer.decode(generated)
print(sequence)
This runs perfectly. Then I tried to run the gpt2-xl model. I changed the tokenizer and model loading code as follows:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained('gpt2-xl')
The tokenizer and model loaded perfectly, but I am getting an error on the following line:
output, past = model(context, past=past)
The error is:
RuntimeError: index out of range: Tried to access index 204483 out of table with 50256 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418
Looking at the error, it seems that the embedding size is not correct, so I wrote the following line to specifically fetch the config of gpt2-xl:
config = GPT2Config.from_pretrained("gpt2-xl")
But here vocab_size is 50257. So I explicitly changed the value with:
config.vocab_size=204483
After printing the config, I can see that the previous line took effect in the configuration, but I am still getting the same error.
This was actually an issue I reported and they fixed it.
https://github.com/huggingface/transformers/issues/2774
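If you hit the same error, a sketch of the remedy (upgrading to a transformers release that includes the fix is my assumption based on the linked issue, not something stated there explicitly):
# pip install --upgrade transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")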
