joblib: importing .pkl file with personal classes - python-3.x

I'm using Jupyter to learn and practice machine learning. I created a Pipeline object with many classes from Scikit Learn and custom classes that I wrote. After that I saved this Pipeline object in a file 'classif_pipeline.pkl.z' using
joblib.dump(pipeline, 'classif_pipeline.pkl.z').
The problem is that when I try to load this file on a different computer, I get the error message below.
Code first:
import joblib
full_pipeline = joblib.load('classif_pipeline.pkl.z')
Error message. Also, I have the same version of Scikit Learn and joblib in this pc too.
Traceback (most recent call last):
File "/media/backup/programming/python/jupyter/classification/main.py", line 3, in <module>
full_pipeline = joblib.load('classif_pipeline.pkl.z')
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
self.append(self.find_class(module, name))
File "/usr/lib/python3.10/pickle.py", line 1582, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/usr/lib/python3.10/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute 'DependentsImputer' on <module '__main__' from '/media/backup/programming/python/jupyter/classification/main.py'>
DependentsImputer is one of the many other classes I implemented in the Jupyter notebook.
How can I load this file?
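The last line of the traceback shows pickle looking for DependentsImputer on the module __main__, which suggests the class was defined at the top level of the notebook. A pickle stores only the module and class name, not the class body, so the same name has to be importable wherever the file is loaded. The sketch below illustrates this with the standard pickle module (joblib builds on it); the module name custom_transformers and the fill_value attribute are hypothetical stand-ins, not taken from the question.

```python
import pickle
import sys
import types

# Hypothetical module name standing in for a file such as custom_transformers.py.
MODULE_NAME = "custom_transformers"


class DependentsImputer:
    """Stand-in for the custom transformer named in the traceback."""

    def __init__(self, fill_value=0):
        self.fill_value = fill_value  # hypothetical attribute, for illustration only


# Pretend the class lives in an importable module instead of __main__.
module = types.ModuleType(MODULE_NAME)
DependentsImputer.__module__ = MODULE_NAME
DependentsImputer.__qualname__ = "DependentsImputer"
module.DependentsImputer = DependentsImputer
sys.modules[MODULE_NAME] = module

payload = pickle.dumps(DependentsImputer(fill_value=1))

# The payload names the module and the class; it does not contain the class body.
print(MODULE_NAME.encode() in payload)

# While the module is importable, loading works.
restored = pickle.loads(payload)

# Once the module is unavailable (as on the second computer), loading fails.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except (ImportError, AttributeError) as exc:
    load_error = exc
print(type(load_error).__name__)
```

Under these assumptions, one possible remedy is to move the custom classes into a real module file that exists on both computers, import them from there in the notebook, and dump the pipeline again. Another is to define or import the same classes in the loading script before calling joblib.load.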

Related

Unable to use Graphviz library in Python

Here is the code I try to execute:
from graphviz import Graph
# Instantiate a new Graph object
dot = Graph('Data Science Process', format='png')
# Add nodes
dot.node('A', 'Get Data')
dot.node('B', 'Clean, Prepare, & Manipulate Data')
dot.node('C', 'Train Model')
dot.node('D', 'Test Data')
dot.node('E', 'Improve')
# Connect these nodes
dot.edges(['AB', 'BC', 'CD', 'DE'])
# Save chart
#dot.render('data_science_flowchart', view=True)
The render call does not work, and I have no idea what is wrong. With it commented out, the code runs but of course produces no output. My goal is to render the graph as a PNG image or PDF file. I am just trying to plot a rudimentary flowchart in Python and am open to libraries other than graphviz. I am new to this and tried graphviz after reading recommendations; I tested a dozen scripts posted online, but none of them work, and they always fail with the same error.
Here is the error:
$ py graph2.py
Traceback (most recent call last):
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\backend\execute.py", line 81, in run_check
proc = subprocess.run(cmd, **kwargs)
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 501, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 966, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1435, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\vince\graph2.py", line 17, in <module>
dot.render('data_science_flowchart', view=True)
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\_tools.py", line 171, in wrapper
return func(*args, **kwargs)
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\rendering.py", line 122, in render
rendered = self._render(*args, **kwargs)
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\_tools.py", line 171, in wrapper
return func(*args, **kwargs)
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\backend\rendering.py", line 324, in render
execute.run_check(cmd,
File "C:\Users\vince\AppData\Local\Programs\Python\Python310\lib\site-packages\graphviz\backend\execute.py", line 84, in run_check
raise ExecutableNotFound(cmd) from e
graphviz.backend.execute.ExecutableNotFound: failed to execute WindowsPath('dot'), make sure the Graphviz executables are on your systems' PATH
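The final error says the Python package could not run the separate Graphviz program dot because it was not found on PATH. A minimal check, using only the standard library, is to ask Python where (or whether) it can find that executable; the function name below is a hypothetical helper.

```python
import shutil


def find_dot_executable():
    """Return the full path of Graphviz's `dot` program, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_dot_executable()
if dot_path is None:
    print("`dot` was not found on PATH; render() will raise ExecutableNotFound.")
else:
    print(f"`dot` found at {dot_path}")
```

If this reports that dot is missing, installing the Graphviz software itself (not only the Python package) and adding its bin directory to PATH is the usual next step.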

AttributeError: module 'dataclasses' has no attribute 'is_dataclass'

I am trying to create a dynamic model using pydantic, but I can't get even the basic example to work:
from pydantic import BaseModel, create_model
MyModel = create_model('MyModel', foo="foo")
The error is:
MyModel = create_model('MyModel', foo="foo")
File "pydantic/main.py", line 972, in pydantic.main.create_model
File "pydantic/main.py", line 228, in pydantic.main.ModelMetaclass.__new__
File "pydantic/fields.py", line 488, in pydantic.fields.ModelField.infer
File "pydantic/fields.py", line 419, in pydantic.fields.ModelField.__init__
File "pydantic/fields.py", line 539, in pydantic.fields.ModelField.prepare
File "pydantic/fields.py", line 801, in pydantic.fields.ModelField.populate_validators
File "pydantic/validators.py", line 682, in find_validators
File "pydantic/dataclasses.py", line 82, in pydantic.dataclasses.is_builtin_dataclass
AttributeError: module 'dataclasses' has no attribute 'is_dataclass'
Since I am using poetry, the command I used is:
poetry run python main.py
Does anyone have any idea why I get this error?
The reason is that I had another file named dataclasses.py in the same folder as main.py. Running poetry run python main.py imported my local dataclasses.py instead of the standard-library dataclasses module, and my file does not define is_dataclass. A good lesson here: do not give a local module the same name as a standard-library module.
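A quick way to confirm this kind of shadowing is to print where the imported module actually came from. This is a small sketch using only the standard library:

```python
import dataclasses

# Path of the file that was actually imported under the name "dataclasses".
origin = dataclasses.__file__
print(origin)

# The standard-library module defines is_dataclass; a shadowing local file usually does not.
has_is_dataclass = hasattr(dataclasses, "is_dataclass")
print(has_is_dataclass)
```

If the printed path points into the project folder rather than the Python installation, a local file is shadowing the standard-library module.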

Issues tokenizing text

I started doing text analysis and ran into the need to download corpora, using PyCharm 2019 as my IDE. I am not sure what the traceback wants me to do, since I already used PyCharm's own library import interface to enable the corpora. Why does an error stating that the corpora are not available keep reappearing?
I imported TextBlob and tried to tokenize a tweet; see the code below:
from textblob import TextBlob
TextBlob(train['tweet'][1]).words
print("\nPRINT TOKENIZATION") # own instruction to allow for knowing what code result delivers
print(TextBlob(train['tweet'][1]).words)
….
I tried to install the data via nltk, with no luck; I got an error when downloading 'brown.tei':
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\jcst\AppData\Local\Programs\Python\Python37-32\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\downloader.py", line 1796, in _download
return self._download_threaded(*e)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\downloader.py", line 2082, in _download_threaded
assert self._download_msg_queue == []
AssertionError
Traceback (most recent call last):
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 35, in decorated
return func(*args, **kwargs)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\tokenizers.py", line 57, in tokenize
return nltk.tokenize.sent_tokenize(text)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\tokenize\__init__.py", line 104, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 870, in load
opened_resource = _open(resource_url)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 995, in open
return find(path, path + ['']).open()
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 701, in find
raise LookupError(resource_not_found)
LookupError:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/english.pickle
Searched in:
- 'C:\Users\jcst/nltk_data'
- 'C:\Users\jcst\PycharmProjects\TextMining\venv\nltk_data'
- 'C:\Users\jcst\PycharmProjects\TextMining\venv\share\nltk_data'
- 'C:\Users\jcst\PycharmProjects\TextMining\venv\lib\nltk_data'
- 'C:\Users\jcst\AppData\Roaming\nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/jcst/PycharmProjects/TextMining/ModuleImportAndTrainFileIntro.py", line 151, in <module>
TextBlob(train['tweet'][1]).words
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 24, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\blob.py", line 649, in words
return WordList(word_tokenize(self.raw, include_punc=False))
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\tokenizers.py", line 73, in word_tokenize
for sentence in sent_tokenize(text))
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\base.py", line 64, in itokenize
return (t for t in self.tokenize(text, *args, **kwargs))
File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 38, in decorated
raise MissingCorpusError()
textblob.exceptions.MissingCorpusError:
Looks like you are missing some required data for this feature.
To download the necessary data, simply run
python -m textblob.download_corpora
or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.
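Both error messages point to the same missing resource, tokenizers/punkt, and list the directories NLTK searched. A hedged sketch of the suggested fix is below: it checks for the resource and downloads it only if the lookup fails. The resource name is taken from the traceback above; the helper name ensure_punkt is hypothetical, and the call needs NLTK installed plus network access.

```python
def ensure_punkt():
    """Download the 'punkt' tokenizer data only if NLTK cannot already find it."""
    import nltk  # third-party; must be installed in the same environment as TextBlob

    try:
        # Raises LookupError when the resource is in none of the searched directories.
        nltk.data.find("tokenizers/punkt")
        return True
    except LookupError:
        # Non-GUI download; returns True on success.
        return nltk.download("punkt")


# Usage (run once with the project's own interpreter):
# ensure_punkt()
```

Running this with the interpreter of the project's virtual environment matters here, because several of the searched directories in the traceback are inside that venv.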

How do I get word2vec to load a string? problem:'dict' object has no attribute '_load_specials'

I have a problem when using word2vec and lstm, the code is:
def input_transform(string):
    words=jieba.lcut(string)
    words=np.array(words).reshape(1,-1)
    model=Word2Vec.load('lstm_datamodel.pkl')
    combined=create_dictionaries(model,words)
    return combined
def lstm_predict(string):
    print ('loading model......')
    with open('lstm_data.yml', 'r') as f:
        yaml_string = yaml.load(f)
    model = model_from_yaml(yaml_string)
    print ('loading weights......')
    model.load_weights('lstm_data.h5')
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',metrics=['accuracy'])
    data=input_transform(string)
    data.reshape(1,-1)
    #print data
    result=model.predict_classes(data)
    if result[0][0]==1:
        print (string,' positive')
    else:
        print (string,' negative')
and the error is:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1312, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 1244, in load
model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 603, in load
return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
File "C:\Python36\lib\site-packages\gensim\utils.py", line 423, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/GitHub/reviewsentiment/veclstm.py", line 211, in <module>
lstm_predict(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 191, in lstm_predict
data=input_transform(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 177, in input_transform
model=Word2Vec.load('lstm_datamodel.pkl')
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1323, in load
return load_old_word2vec(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 153, in load_old_word2vec
old_model = Word2Vec.load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 1618, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\old_saveload.py", line 88, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
I am sorry for including so much code.
This is my first time asking on StackOverflow. I have tried my very best to find the answer on my own, but failed. Can you help me? Thank you very much!
The error is occurring on the line...
model=Word2Vec.load('lstm_datamodel.pkl')
...so all the other/later code you've supplied is irrelevant and superfluous.
The suffix of your filename, lstm_datamodel.pkl, suggests it may have been created via Python's pickle module. The gensim Word2Vec.load() method only expects to load models that were saved by the module's own save() routine, not arbitrary pickled objects.
The gensim native save() does make use of pickle for some of its saving, but not all, and thus wouldn't expect a fully-pickled object in the file provided.
This might be the cause of your problem. You could instead try a load based entirely on Python's pickle module (note that pickle.load() expects an open binary file object, not a filename):
with open('lstm_datamodel.pkl', 'rb') as f:
    model = pickle.load(f)
Alternatively, if you can reconstruct the model, be sure to save it via the native gensim model.save(filename); loading that file with Word2Vec.load() might also resolve the problem.
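The AttributeError reports that the loaded object is a dict, which is consistent with the file holding a plain pickled object rather than a gensim-saved model. A first diagnostic step is to unpickle the file and inspect its type. The sketch below shows that step on a temporary stand-in file; the dict contents are hypothetical, and unpickling should only be done with files you trust.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the real file: a plain dict saved with pickle.
path = os.path.join(tempfile.mkdtemp(), "lstm_datamodel.pkl")
with open(path, "wb") as f:
    pickle.dump({"word": 1}, f)

# pickle.load() needs a file object opened in binary mode, not a filename.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(type(loaded).__name__)  # prints: dict
```

If the real file also turns out to contain a dict, its keys may show what was actually saved under that name.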

PyYAML Error: TypeError: can't pickle _thread.RLock objects

I'm trying to dump what is perhaps a somewhat complex class with YAML and am seeing the following error. I don't know what pickle does, but to my knowledge I'm not doing any multithreaded programming. This happens while running a PyUnit unit test:
Any idea how to find the offending attribute?
ERROR: test_multi_level_needs (test_needs.needs_TestCase)
-----------------------------------------------------------
Traceback (most recent call last):
File "/Users/rsalemi/.../test_needs.py", line 240, in test_multi_level_needs
print(yaml.dump(test2_comp))
File ".../.../yaml/__init__.py", line 200, in dump
<snipped lots of stack trace>
File ".../.../yaml/representer.py", line 91, in represent_sequence
node_item = self.represent_data(item)
File ".../.../yaml/representer.py", line 51, in represent_data
node = self.yaml_multi_representers[data_type](self, data)
File ".../.../yaml/representer.py", line 341, in represent_object
'tag:yaml.org,2002:python/object:'+function_name, state)
File ".../.../yaml/representer.py", line 116, in represent_mapping
node_value = self.represent_data(item_value)
File ".../.../yaml/representer.py", line 51, in represent_data
node = self.yaml_multi_representers[data_type](self, data)
File ".../.../yaml/representer.py", line 315, in represent_object
reduce = data.__reduce_ex__(2)
TypeError: can't pickle _thread.RLock objects
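The traceback shows yaml.dump calling __reduce_ex__, the same hook pickle uses, so one way to hunt for the offending attribute is to try pickling each attribute in turn and descend into whatever fails. The sketch below is an approximation under that assumption; the classes Worker and Component and the helper find_unpicklable are hypothetical. Locks can also be held indirectly, for example by logging handlers, even when the code creates no threads itself.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", _seen=None):
    """Return attribute paths whose values cannot be pickled."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return []
    seen.add(id(obj))
    try:
        pickle.dumps(obj)
        return []  # this whole subtree pickles fine
    except Exception:
        pass
    # Descend into containers and instance attributes to narrow the failure down.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    else:
        children = [(f"{path}.{k}", v) for k, v in getattr(obj, "__dict__", {}).items()]
    bad = []
    for child_path, child in children:
        bad.extend(find_unpicklable(child, child_path, seen))
    return bad or [path]  # no failing child found: report obj itself


class Worker:  # hypothetical class holding a lock
    def __init__(self):
        self.lock = threading.RLock()


class Component:  # hypothetical top-level object being dumped
    def __init__(self):
        self.name = "top"
        self.workers = [Worker()]


culprits = find_unpicklable(Component())
print(culprits)
```

Once the path is known, the attribute can be excluded from serialization, for instance by defining __getstate__ on the owning class.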
