tokenizer has no attribute oov_token - keras

Traceback (most recent call last):
File "dac.py", line 87, in
X_train=load_create_padded_data(X_train=X_train,savetokenizer=False,isPaddingDone=False,maxlen=sequence_length,tokenizer_path='./New_Tokenizer.tkn')
File "/home/dpk/Downloads/DAC/New_Utils.py", line 92, in load_create_padded_data
X_train=tokenizer.texts_to_sequences(X_train)
File "/home/dpk/anaconda2/envs/venv/lib/python2.7/site-packages/keras_preprocessing/text.py", line 278, in texts_to_sequences
return list(self.texts_to_sequences_generator(texts))
File "/home/dpk/anaconda2/envs/venv/lib/python2.7/site-packages/keras_preprocessing/text.py", line 296, in texts_to_sequences_generator
oov_token_index = self.word_index.get(self.oov_token)
AttributeError: 'Tokenizer' object has no attribute 'oov_token'

Probably this one:
You can manually set tokenizer.oov_token = None to fix this.
Pickle is not a reliable way to serialize objects since it assumes
that the underlying Python code/modules you're importing have not
changed. In general, DO NOT use pickled objects with a different
version of the library than what was used at pickling time. That's not
a Keras issue, it's a generic Python/Pickle
https://github.com/keras-team/keras/issues/9099

To fix this I manually set
self.oov_token = None
But not
tokenizer.oov_token = None

Related

joblib: importing .pkl file with personal classes

I'm using Jupyter to learn and practice machine learning. I created a Pipeline object with many classes from Scikit Learn and custom classes that I wrote. After that I saved this Pipeline object in a file 'classif_pipeline.pkl.z' using
joblib.dump(pipeline, 'classif_pipeline.pkl.z').
The problem is when I try to load this file in a different computer I get the error message bellow.
Code first:
import joblib
full_pipeline = joblib.load('classif_pipeline.pkl.z')
Error message. Also, I have the same version of Scikit Learn and joblib in this pc too.
Traceback (most recent call last):
File "/media/backup/programming/python/jupyter/classification/main.py", line 3, in <module>
full_pipeline = joblib.load('classif_pipeline.pkl.z')
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
self.append(self.find_class(module, name))
File "/usr/lib/python3.10/pickle.py", line 1582, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/usr/lib/python3.10/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute 'DependentsImputer' on <module '__main__' from '/media/backup/programming/python/jupyter/classification/main.py'>
DependentsImputer is one of the many other classes I implemented in the Jupyter notebook.
How can I load this file?

Save and load checkpoint pytorch

i make a model and save the configuration as:
def checkpoint(state, ep, filename='./Risultati/checkpoint.pth'):
if ep == (n_epoch-1):
print('Saving state...')
torch.save(state,filename)
checkpoint({'state_dict':rnn.state_dict()},ep)
and then i want load this configuration :
state_dict= torch.load('./Risultati/checkpoint.pth')
rnn.state_dict(state_dict)
when i try, this is the error:
Traceback (most recent call last):
File "train.py", line 288, in <module>
rnn.state_dict(state_dict)
File "/home/marco/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in state_dict
destination._metadata[prefix[:-1]] = dict(version=self._version)
AttributeError: 'dict' object has no attribute '_metadata'
where i do wrong?
thx in advance
You need to load rnn.state_dict() stored in the dictionary you loaded:
rnn.load_state_dict(state_dict['state_dict'])
Look at load_state_dict method for more information.

Does the object.attribute syntax in python count as a name?

I am wondering if something counts as a name if it is expressed as an object.attribute syntax. The motivation comes from trying to understand this code from Learning Python:
def makeopen(id):
original = builtins.open
def custom(*pargs, **kargs):
print('Custom open call %r' %id, pargs, kargs)
return original(*pargs,*kargs)
builtins.open(custom)
I wanted to map out each name/variable to the scope that they exist in. I am unsure what to do with builtins.open. Is builtins.open a name? In the book the author does state that object.attribute lookup follows completely different rules to plain looksups, which would mean to me that builtins.open is not a name at all, since the execution model docs say that scopes define where names are visible. Since object.attribute syntax is visible in any scope, it doesn't fit into this classification and is not a name.
However the conceptual problem I have is then defining what builtins.open is? It is still a reference to an object, and can be rebound to any other object. In that sense it is a name, even although it doesn't follow scope rules?
Thank you.
builtins.open is just another way to access the global open function:
import builtins
print(open)
# <built-in function open>
print(builtins.open)
# <built-in function open>
print(open == builtins.open)
# True
From the docs:
This module provides direct access to all ‘built-in’ identifiers of
Python; for example, builtins.open is the full name for the built-in
function open()
Regarding the second part of your question, I'm not sure what you mean. (Almost) every "name" in Python can be reassigned to something completely different.
>>> list
<class 'list'>
>>> list = 1
>>> list
1
However, everything under builtins is protected, otherwise some nasty weird behavior was bound to happen in case someone(thing) reassigned its attributes during runtime.
>>> import builtins
>>> builtins.list = 1
Traceback (most recent call last):
File "C:\Program Files\PyCharm 2018.2.1\helpers\pydev\_pydev_comm\server.py", line 34, in handle
self.processor.process(iprot, oprot)
File "C:\Program Files\PyCharm 2018.2.1\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 266, in process
self.handle_exception(e, result)
File "C:\Program Files\PyCharm 2018.2.1\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 254, in handle_exception
raise e
File "C:\Program Files\PyCharm 2018.2.1\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 263, in process
result.success = call()
File "C:\Program Files\PyCharm 2018.2.1\helpers\third_party\thriftpy\_shaded_thriftpy\thrift.py", line 228, in call
return f(*(args.__dict__[k] for k in api_args))
File "C:\Program Files\PyCharm 2018.2.1\helpers\pydev\_pydev_bundle\pydev_console_utils.py", line 217, in getFrame
return pydevd_thrift.frame_vars_to_struct(self.get_namespace(), hidden_ns)
File "C:\Program Files\PyCharm 2018.2.1\helpers\pydev\_pydevd_bundle\pydevd_thrift.py", line 239, in frame_vars_to_struct
keys = dict_keys(frame_f_locals)
File "C:\Program Files\PyCharm 2018.2.1\helpers\pydev\_pydevd_bundle\pydevd_constants.py", line 173, in dict_keys
return list(d.keys())
TypeError: 'int' object is not callable

How do I get word2vec to load a string? problem:'dict' object has no attribute '_load_specials'

I have a problem when using word2vec and lstm, the code is:
def input_transform(string):
words=jieba.lcut(string)
words=np.array(words).reshape(1,-1)
model=Word2Vec.load('lstm_datamodel.pkl')
combined=create_dictionaries(model,words)
return combined
def lstm_predict(string):
print ('loading model......')
with open('lstm_data.yml', 'r') as f:
yaml_string = yaml.load(f)
model = model_from_yaml(yaml_string)
print ('loading weights......')
model.load_weights('lstm_data.h5')
model.compile(loss='binary_crossentropy',
optimizer='adam',metrics=['accuracy'])
data=input_transform(string)
data.reshape(1,-1)
#print data
result=model.predict_classes(data)
if result[0][0]==1:
print (string,' positive')
else:
print (string,' negative')
and the error is:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1312, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 1244, in load
model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 603, in load
return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
File "C:\Python36\lib\site-packages\gensim\utils.py", line 423, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/GitHub/reviewsentiment/veclstm.py", line 211, in <module>
lstm_predict(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 191, in lstm_predict
data=input_transform(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 177, in input_transform
model=Word2Vec.load('lstm_datamodel.pkl')
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1323, in load
return load_old_word2vec(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 153, in load_old_word2vec
old_model = Word2Vec.load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 1618, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\old_saveload.py", line 88, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'enter code here
I am sorry for including so much code.
This is my first time to ask on StackOverflow, and I have tried my very best to find the answer on my own, but failed. So can you help me? Thank you very much!
The error is occurring on the line...
model=Word2Vec.load('lstm_datamodel.pkl')
...so all the other/later code you've supplied is irrelevant and superfluous.
The suffix of your filename, lstm_datamodel.pkl, suggests it may have been created via Python's pickle() facility. The gensim Word2Vec.load() method only expects to load models that were saved by the module's own save() routine, not any pickled object.
The gensim native save() does make use of pickle for some of its saving, but not all, and thus wouldn't expect a fully-pickled object in the file provided.
This might be cause of your problem. You could try instead a load based entirely on Python pickle:
model = pickle.load('lstm_datamodel.pkl')
Alternatively, if you can reconstruct the model in the file, but be sure to save it via the native gensim model.save(filename), that might also resolve the problem.

TypeError: Can't convert 'bytes' object to str implicitly for tweepy

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey=''
csecret=''
atoken=''
asecret=''
class listener(StreamListener):
def on_data(self,data):
print(data)
return True
def on_error(self,status):
print(status)
auth = OAuthHandler(ckey,csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track="cricket")
This code filter the twitter stream based on the filter. But I am getting following traceback after running the code. Can somebody please help
Traceback (most recent call last):
File "lab.py", line 23, in <module>
twitterStream.filter(track="car".strip())
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 430, in filter
self._start(async)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
self._run()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
raise exception
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
self._read_loop(resp)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 298, in _read_loop
line = buf.read_line().strip()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 171, in read_line
self._buffer += self._stream.read(self._chunk_size)
TypeError: Can't convert 'bytes' object to str implicitly
Im assuming you're using tweepy 3.4.0. The issue you've raised is 'open' on github (https://github.com/tweepy/tweepy/issues/615).
Two work-arounds :
1)
In streaming.py:
I changed line 161 to
self._buffer += self._stream.read(read_len).decode('UTF-8', 'ignore')
and line 171 to
self._buffer += self._stream.read(self._chunk_size).decode('UTF-8', 'ignore')
and then reinstalled via python3 setup.py install on my local copy of tweepy.
2)
remove the tweepy 3.4.0 module, and install 3.3.0 using command: pip install -I tweepy==3.3.0
Hope that helps,
-A
You can't do twitterStream.filter(track="car".strip()). Why are you adding the strip() it's serving no purpose in there.
track must be a str type before you invoke a connection to Twitter's Streaming API and tweepy is preventing that connection because you're trying to add strip()
If for some reason you need it, you can do track_word='car'.strip() then track=track_word, that's even unnecessary because:
>>> print('car'.strip())
car
Also, the error you're getting does not match the code you have listed, the code that's in your question should work fine.

Resources