Pickle fit-object

I wrote a class that fits some data. Since the fitting takes very long when lots of data are involved, I want to save the fit object of this class so I do not have to repeat the fitting when I use the fitted data later. Using pickle, I get the following error when calling the save method on an object:
AttributeError: Can't pickle local object 'ConstantModel.__init__.<locals>.constant'
I only have this problem when pickling the fitted data; pickle works if I save the object before fitting.
Is there a way to pickle fitted data or is there a nice workaround?
import pickle
import lmfit

class pattern:
    def fitting(self):
        mod_total = lmfit.models.ConstantModel()
        pars_total = mod_total.guess(self.y, x=self.x)
        self.fit = mod_total.fit(self.y, pars_total, x=self.x)

    def save(self, path):
        with open(path, 'wb') as filehandler:
            pickle.dump(self, filehandler)

I found a solution for this problem: using dill instead of pickle works as intended.
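For reference, a minimal sketch of the dill-based approach (dill mirrors the pickle API, so only the import changes; save_pattern and load_pattern are hypothetical helper names):
import dill

def save_pattern(obj, path):
    # dill can serialize the local function that lmfit's ConstantModel defines
    with open(path, 'wb') as f:
        dill.dump(obj, f)

def load_pattern(path):
    with open(path, 'rb') as f:
        return dill.load(f)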

Related

Gensim: Not able to load the id2word file

I am working on topic inference on a new corpus given a previously derived LDA model. I am able to load the model perfectly, but I am not able to load the id2word file to create the corpora.Dictionary object needed to map the new corpus into numbers: the load method raises a dict attribute error that I cannot explain. Below is the minimal code (and the packages used) that replicates the situation.
Thank you in advance for your response...
import numpy as np
import os
import pandas as pd
import gensim
from gensim import corpora
import datetime
import nltk
model_name = "lda_sub_full_35"
dictionary_name = "lda_sub_full_35.id2word"
model_for_inference = gensim.models.LdaModel.load(model_name, mmap='r')
print('Successfully load the model')
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
I expect to have both the dictionary and the model loaded, but it turns out that when I load the dictionary, I get the error below:
File "topic_inference.py", line 31, in <module>
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
File "/topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
How were the contents of the lda_sub_full_35.id2word file originally saved?
Only if it was saved by a Gensim corpora.Dictionary object's .save() method should it be loaded as you've tried, with corpora.Dictionary.load().
If, by any chance, it was just a plain Python dict written out as a pickle via some other method, then you would need to load it in a symmetrically matched way. That might be as simple as:
import pickle

with open(path, 'rb') as f:
    lda_dictionary = pickle.load(f)
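Conversely, if the file was created with Gensim's own .save(), the matched pair looks like this (a sketch; common_texts is just Gensim's bundled toy corpus):
from gensim.corpora import Dictionary
from gensim.test.utils import common_texts

d = Dictionary(common_texts)
d.save('lda_sub_full_35.id2word')                # Gensim's own serialization
d2 = Dictionary.load('lda_sub_full_35.id2word')  # the symmetric load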

How to create wordcloud from LDA model?

Following the documentation of gensim.models.ldamodel, I want to train an LdaModel and (following this SO answer) create a wordcloud from it. I am using the following code from both sources:
from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
import gensim
import matplotlib.pyplot as plt
from wordcloud import WordCloud

common_dictionary = Dictionary(common_texts)  # create corpus
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]
lda = gensim.models.LdaModel(common_corpus, num_topics=10)  # train model on corpus

for t in range(lda.num_topics):
    plt.figure()
    plt.imshow(WordCloud().fit_words(lda.show_topic(t, 200)))
    plt.axis("off")
    plt.title("Topic #" + str(t))
    plt.show()
However, I get an AttributeError: 'list' object has no attribute 'items' on the line plt.imshow(...)
Can someone help me out here? (Answers to similar questions have not been working for me and I am trying to compile a minimal pipeline with this.)
From the docs, the method WordCloud.fit_words() expects a dictionary as input.
Your error seems to highlight that it's looking for an attribute 'items', typically an attribute of dictionaries, but instead finds a list object.
So the problem is: lda.show_topic(t, 200) returns a list instead of a dictionary. Use dict() to cast it!
Finally:
plt.imshow(WordCloud().fit_words(dict(lda.show_topic(t, 200))))
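Putting it together, the loop is unchanged apart from the dict() cast:
for t in range(lda.num_topics):
    plt.figure()
    # show_topic returns [(word, weight), ...]; dict() turns it into the
    # {word: weight} mapping that WordCloud.fit_words() expects
    plt.imshow(WordCloud().fit_words(dict(lda.show_topic(t, 200))))
    plt.axis("off")
    plt.title("Topic #" + str(t))
    plt.show()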

Function not working, Syntax errors and more

The other day I was working on an image-captioning model in Keras, but when I run it, I am facing a host of errors. Note that I am using the Atom editor and a Python virtual environment, running everything from the command line.
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
On this line, I am receiving this error:
File "C:\Users\neelg\Documents\Atom_projects\Main\Img_cap.py", line 143
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
^
SyntaxError: invalid syntax
I think the syntax of the function call is correct, yet the error persists. So, in a separate file, I copied the function and tried to isolate the problem.
Code for the standalone function:
from pickle import load
import os

def load_photo_features(filename, dataset):
    all_features = load(open(filename, 'rb'))
    features = {k: all_features[k] for k in dataset}
    return features

filename = 'C:/Users/neelg/Documents/Atom_projects/Main/Flickr8k_text/Flickr8k.trainImages.txt'
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
Now a different type of problem crops up:
Traceback (most recent call last):
File "C:\Users\neelg\Documents\Atom_projects\Main\testing.py", line 10, in <module>
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
TypeError: 'module' object is not callable
Any help? I am trying to import the Flickr_8k dataset, which contains random pictures, plus another small dataset holding the labels of those photographs.
P.S.: Please test the code in your own editor before sending suggestions, because I suspect there is some core problem arising from the system encoding (as suggested by some others). Also, it is not possible to post the whole code due to its length and its reliance on multiple files.
This error comes from the fact that you're calling os.path, which is a module, not a function. Just remove it; you don't need it in this use case, since a plain string is enough for the filename argument of open().
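A minimal sketch of the corrected call (the path is the asker's own; if you do need to build paths, os.path.join() is the function to reach for, not os.path itself):
# pass the path string directly; open() accepts it as-is
# ('train' is presumably the asker's set of image identifiers)
train_features = load_photo_features(
    'C:/Users/neelg/Documents/Atom_projects/Main/features.pkl', train)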
I was about to ask you the same question as @ted: why do you use os.path when you are trying to load the file?
Normally, I am using the following code for loading from pickle:
import pickle

def load_obj(filename):
    with open(filename, "rb") as fp:
        return pickle.load(fp, encoding='bytes')
Furthermore, if I try something like that it works:
from pickle import load
import os
import pdb

def load_photo_features(filename):
    all_features = load(open(filename, 'rb'))
    pdb.set_trace()
    #features = {k: all_features[k] for k in dataset}
    #return features

train_features = load_photo_features('train.pkl')
I do not know what the dataset input should be, but loading the pickle file works fine.

Pytorch DataLoader multiple data source

I am trying to use the PyTorch DataLoader with my own dataset, but I am not sure how to load multiple data sources:
My current code:
class MultipleSourceDataSet(Dataset):
    def __init__(self, json_file, root_dir, transform=None):
        with open(root_dir + 'block0.json') as f:
            self.result = torch.Tensor(json.load(f))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.result[0])

    def __getitem__(self):
        None
The data source is 50 blocks under root_dir = ~/Documents/blocks/
I split them up and avoided combining them beforehand, since this is a very big dataset.
How can I load them into a single dataloader?
For a DataLoader you need to have a single Dataset; your problem is that you have multiple 'json' files and you only know how to create a Dataset from each 'json' separately.
What you can do in this case is use ConcatDataset, which contains all the single-'json' datasets you create:
import os
import torch.utils.data as data

class SingeJsonDataset(data.Dataset):
    # implement a single json dataset here...
    pass

list_of_datasets = []
for j in os.listdir(root_dir):  # note: os.listdir, not os.path.listdir
    if not j.endswith('.json'):
        continue  # skip non-json files
    list_of_datasets.append(
        SingeJsonDataset(json_file=j, root_dir=root_dir, transform=None))

# once all single json datasets are created you can concat them into a single one:
multiple_json_dataset = data.ConcatDataset(list_of_datasets)
Now you can feed the concatenated dataset into data.DataLoader.
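A minimal sketch of that last step (batch_size is an arbitrary example value):
loader = data.DataLoader(multiple_json_dataset, batch_size=32, shuffle=True)
for batch in loader:
    pass  # train on each mini-batch here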
I should revise my question into two sub-questions:
1. How to deal with large datasets in PyTorch to avoid memory errors?
2. If I am separating a large dataset into small chunks, how can I load multiple mini-datasets?
For question 1:
PyTorch DataLoader can prevent this issue by creating mini-batches. Here you can find further explanations.
For question 2:
Please refer to Shai's answer above.

LightGBM: loading from json

I am trying to load a LightGBM.Booster from a JSON file pointer, and can't find an example online.
import json
import lightgbm
import numpy as np
X_train = np.arange(0, 200).reshape((100, 2))
y_train = np.tile([0, 1], 50)
tr_dataset = lightgbm.Dataset(X_train, label=y_train)
booster = lightgbm.train({}, train_set=tr_dataset)
model_json = booster.dump_model()
with open('model.json', 'w+') as f:
json.dump(model_json, f, indent=4)
with open('model.json') as f2:
model_json = json.load(f2)
How can I create a lightGBM booster from f2 or model_json? This snippet only shows dumping to JSON. model_from_string might help but seems to require an instance of the booster, which I won't have before loading.
There is no method for creating a Booster directly from JSON: nothing in the source code or documentation, and no GitHub issue about it either.
Because of this, I just save and load models via a text file:
gbm.save_model('model.txt') # gbm is trained Booster instance
# ...
bst = lgb.Booster(model_file='model.txt')
or use pickle to dump and load models:
import pickle
pickle.dump(gbm, open('model.pkl', 'wb'))
# ...
gbm = pickle.load(open('model.pkl', 'rb'))
Unfortunately, pickle files are not human-readable (or at least these files are not), but it's better than nothing.
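As a variation on the text-file route, the model can also round-trip in memory; as far as I can tell, Booster.model_to_string() and the constructor's model_str parameter use the same text format that save_model() writes (not the JSON dump):
model_str = gbm.model_to_string()            # LightGBM's own text format
bst = lightgbm.Booster(model_str=model_str)  # rebuild without touching disk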
