I am new to reinforcement learning and I have implemented a simple Q-learning model using just NumPy. I would like to know how I can save the model to use it later, and also how to use it once loaded, i.e., how do I use it for evaluation?
Thanks a lot.
Q-learning models are simple tables!
Therefore, you could use something like pickle to save your object in a file, and then load it the next time you need to use it.
import pickle
with open("qtable.pickle", "wb") as f:
pickle.dump(q_table, f)
to save your model.
To retrieve your model later, use
with open("qtable.pickle", "rb") as f:
q_table = pickle.load(f)
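For evaluation, load the table and act greedily with respect to it, with exploration turned off. Here is a minimal sketch, assuming a Gym-style environment and a Q-table indexed by discrete state and action (env and the indexing scheme are assumptions about your setup):
import pickle
import numpy as np

with open("qtable.pickle", "rb") as f:
    q_table = pickle.load(f)

# Greedy evaluation: always take the highest-valued action, no exploration
state = env.reset()   # env is a hypothetical Gym-style environment
done = False
total_reward = 0
while not done:
    action = np.argmax(q_table[state])   # assumes q_table is indexed [state][action]
    state, reward, done, info = env.step(action)
    total_reward += reward
print("Episode reward:", total_reward)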
I noticed that, unlike scikit-learn's, the PySpark implementation of CountVectorizer uses the socket library, so I'm unable to pickle it.
Is there any way around this, or another way to persist the vectorizer? I need the fitted model because I take in input text data that I want to convert into the same kind of word vectors used in the test data.
I tried looking at the CountVectorizer source code and I couldn't see any obvious uses of the socket library.
Any ideas are appreciated, thanks!
Here's me trying to pickle the model:
with open("vectorized_model.pkl", "wb") as output_file:
pickle.dump(vectorized_model, output_file)
Resulting in: TypeError: Cannot serialize socket object
Here's the original creation of the model:
from pyspark.ml.feature import CountVectorizer
import dill as pickle
vectorizer = CountVectorizer()
vectorizer.setInputCol("TokenizedText")
vectorizer.setOutputCol("Tfidf")
vectorized_model = vectorizer.fit(training_data)
vectorized_model.setInputCol("TokenizedText")
So I realized, instead of pickling, I can use vectorized_model.save() and CountVectorizerModel.load() to persist and retrieve the model.
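For reference, a minimal sketch of that round trip (the path and new_data are placeholders):
from pyspark.ml.feature import CountVectorizerModel

# Persist the fitted model to a directory (local path here; HDFS/S3 URIs also work)
vectorized_model.save("vectorizer_model")

# Later: load it back and transform new data exactly as before
loaded_model = CountVectorizerModel.load("vectorizer_model")
result = loaded_model.transform(new_data)  # new_data needs a "TokenizedText" column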
I'm fairly new to PyTorch and this might be a version issue, but I see both torch.load and model.load_state_dict used, and in both cases the file extension is commonly ".pth".
For models that I have created, I can save and load them via torch.save and torch.load and then call model.eval().
I have another model file that I'm fairly sure is just the state dictionary, as model.eval() fails after a load.
How would I inspect the file and know that one has a full model in it?
Thanks much.
As far as I know there isn't a foolproof way to figure this out. torch.save uses Python's pickle under the hood (ref: PyTorch docs), so users can save arbitrary Python objects. For example, the following code wraps the state dicts in a dictionary:
# example from https://github.com/lucidrains/lightweight-gan/blob/fce20938562a0cc289c915f7317722a8241abd37/lightweight_gan/lightweight_gan.py#L1437
save_data = {
    'GAN': self.GAN.state_dict(),
    'version': __version__,
    'G_scaler': self.G_scaler.state_dict(),
    'D_scaler': self.D_scaler.state_dict()
}
torch.save(save_data, self.model_name(num))
If it helps, state dicts themselves are OrderedDict objects. If isinstance(model, collections.OrderedDict) returns True, you can be fairly confident that model is a state dict. (Remember to import collections)
Models themselves are subclasses of torch.nn.Module, so you can check if something is a model by verifying that isinstance(model, torch.nn.Module) returns True.
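Putting both checks together, here is a small sketch for inspecting an unknown ".pth" file (the filename is a placeholder):
import collections
import torch

obj = torch.load("mystery.pth", map_location="cpu")  # hypothetical file

if isinstance(obj, torch.nn.Module):
    print("Full model: you can call obj.eval() directly")
elif isinstance(obj, collections.OrderedDict):
    print("Likely a state dict: load it with model.load_state_dict(obj)")
elif isinstance(obj, dict):
    # Wrapper dicts like the lightweight-gan example above
    print("Plain dict; inspect the keys:", list(obj.keys()))
else:
    print("Something else:", type(obj))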
When I try to load my saved model, I need to import its class. For example:
from module import Net
torch.load('saved_model.pth')
Is there any way to avoid this import? For example, by saving the model together with its class, or something similar?
If you want to simply load the weights into a known nn.Module object such as net, you can use net.load_state_dict(torch.load('saved_model.pth')). If you want the entire model saved so someone else can use it, you'll have to use pickle:
import pickle
net = Net()
with open('saved_model.pth', 'wb') as filehandler:
    pickle.dump(net, filehandler)
to load:
with open('saved_model.pth', 'rb') as filehandler:
    net = pickle.load(filehandler)
However, it is highly recommended not to use pickle, as it may capture things specific to your machine/environment and fail to work on someone else's. If you really must use pickle, it may be worth seeing whether you can decouple the class from the network weights: save the class in a pickle file and the parameters in a torch file.
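For completeness, a minimal sketch of the state-dict route from the first paragraph (it still assumes the Net class is importable on the loading side):
import torch
from module import Net  # the architecture definition still has to be available

# Save only the weights, not a pickled class
net = Net()
torch.save(net.state_dict(), 'saved_weights.pth')

# Later: rebuild the architecture, then load the weights into it
net = Net()
net.load_state_dict(torch.load('saved_weights.pth'))
net.eval()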
Hope this helps and isn't just stuff you know.
I usually get to feature importance using
regr = XGBClassifier()
regr.fit(X, y)
regr.feature_importances_
where type(regr) is xgboost.sklearn.XGBClassifier.
However, I have a pickled XGBoost model, which when unpacked returns an object of type xgboost.core.Booster. This is the same object as if I had run regr.get_booster().
I have found a few solutions for getting variable importance from a Booster object, but is there a way to get back to the classifier object from the booster so I can just apply the same feature_importances_ attribute? That seems like the most straightforward solution; otherwise it seems I'd have to write a function that mimics the output of feature_importances_ in order for it to fit my logged feature importances...
So ideally I'd have something like
xbg_booster = pickle.load(open("xgboost-model", "rb"))
assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
xgb_classifier = xgb_booster.get_classifier()
xgb_classifier.feature_importances_
Are there any limitations to what can be done with a Booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...
Also, for context, the pickled model is the output from AWS SageMaker, so I'm just unpacking it to do some further evaluation.
Based on my own experience trying to recreate a classifier from a booster object generated by SageMaker I learned the following:
It doesn't appear to be possible to recreate the classifier from the booster. :(
https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster has the details on the booster class so you can review what it can do.
Crazy things you can do however:
You can create a classifier object and then over-ride the booster within it:
xgb_classifier = xgb.XGBClassifier(**xgboost_params)
[..]
xgb_classifier._Booster = booster
This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)
You can remove the booster object from the classifier and then pickle the classifier using xgboost directly. Then later restore the SageMaker booster back into it. This abomination is closer and appears to work, but it is not truly a rehydrated classifier object from the SageMaker output alone.
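A hedged sketch of that workaround (X_small/y_small and the file name are placeholders; _Booster is a private attribute, so this can break between XGBoost versions):
import pickle
import xgboost as xgb

# Fit a throwaway classifier just to initialise the sklearn-side attributes
shell = xgb.XGBClassifier()
shell.fit(X_small, y_small)   # X_small / y_small: any tiny placeholder dataset
shell._Booster = None         # drop the locally trained booster

with open("classifier_shell.pkl", "wb") as f:
    pickle.dump(shell, f)

# Later: restore the shell and graft the SageMaker booster back in
with open("classifier_shell.pkl", "rb") as f:
    clf = pickle.load(f)
clf._Booster = sagemaker_booster  # the unpickled xgboost.core.Booster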
Recommendation
If you’re not stuck using the SageMaker training solution you can certainly use XGBoost directly to train with. At that point you have access to everything you need to dump/save the data for use in a different context.
I know you're after feature importance so I hope this gets you closer, I had a different use case and was ultimately able to leverage the booster for what I needed.
I was able to get an xgboost.XGBClassifier model virtually identical to the xgboost.Booster version by
(1) extracting all tuning parameters from the booster model using this:
import json
json.loads(your_booster_model.save_config())
(2) setting these same tuning parameters and then training an XGBClassifier model on the same training dataset used to train the Booster model.
Note: one mistake I made was that I forgot to explicitly assign the same seed/random_state in both the Booster and Classifier versions.
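A hedged sketch of that recipe (the file name, hyperparameter values, and X_train/y_train are placeholders; the config layout varies across XGBoost versions):
import json
import xgboost as xgb

# Load the SageMaker booster (file name is a placeholder)
booster = xgb.Booster()
booster.load_model("xgboost-model")

# (1) Extract the tuning parameters and read off the relevant values
config = json.loads(booster.save_config())
print(json.dumps(config, indent=2))

# (2) Re-train a classifier with the same hyperparameters, data, and seed
clf = xgb.XGBClassifier(max_depth=6, learning_rate=0.3, random_state=42)  # values copied from the config
clf.fit(X_train, y_train)  # same training data as the Booster run
print(clf.feature_importances_)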
I am using Google Colaboratory to implement deep learning in Python 3. I create a model, train it, and test it. Everything is fine. Finally I try to save the model to my Google Drive, but it says
Error: Currently 'save' requires model to be a graph network.
Up to training and testing there is no problem.
Then I mount the drive
from google.colab import drive
drive.mount('/content/gdrive')
And then try to save the model for later use as:
model.save('my_model_name.model')
But it is not saving the model. What am I missing?
The preferred way to save a model with TensorFlow (the 1.x API) is to use the tf.train.Saver class. So let's say your model is simply called Model and you want to save it in a particular directory on the mounted drive. This is how to do it:
import tensorflow as tf

save_path = '/content/gdrive/My Drive/my_model.ckpt'  # checkpoint prefix under the mount point from the question

with tf.Session() as sess:
    saver = tf.train.Saver()
    # ... train the model ...
    saver.save(sess, save_path)
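And a short sketch of restoring later (the same graph has to be rebuilt before restoring; save_path is the prefix used above):
# Rebuild the same graph first, then restore the variables into a new session
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, save_path)
    # ... run evaluation with the restored weights ...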