How do I get word2vec to load a string? Problem: 'dict' object has no attribute '_load_specials' - python-3.x

I have a problem when using word2vec and LSTM; the code is:
def input_transform(string):
    words = jieba.lcut(string)
    words = np.array(words).reshape(1, -1)
    model = Word2Vec.load('lstm_datamodel.pkl')
    combined = create_dictionaries(model, words)
    return combined
def lstm_predict(string):
    print('loading model......')
    with open('lstm_data.yml', 'r') as f:
        yaml_string = yaml.load(f)
    model = model_from_yaml(yaml_string)
    print('loading weights......')
    model.load_weights('lstm_data.h5')
    model.compile(loss='binary_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    data = input_transform(string)
    data.reshape(1, -1)
    #print data
    result = model.predict_classes(data)
    if result[0][0] == 1:
        print(string, ' positive')
    else:
        print(string, ' negative')
and the error is:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1312, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 1244, in load
model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 603, in load
return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
File "C:\Python36\lib\site-packages\gensim\utils.py", line 423, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/GitHub/reviewsentiment/veclstm.py", line 211, in <module>
lstm_predict(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 191, in lstm_predict
data=input_transform(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 177, in input_transform
model=Word2Vec.load('lstm_datamodel.pkl')
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1323, in load
return load_old_word2vec(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 153, in load_old_word2vec
old_model = Word2Vec.load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 1618, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\old_saveload.py", line 88, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
I am sorry for including so much code.
This is my first time asking on Stack Overflow, and I have tried my best to find the answer on my own, but failed. Can you help me? Thank you very much!

The error is occurring on the line...
model=Word2Vec.load('lstm_datamodel.pkl')
...so all the other/later code you've supplied is irrelevant and superfluous.
The suffix of your filename, lstm_datamodel.pkl, suggests it may have been created via Python's pickle module. The gensim Word2Vec.load() method only expects to load models that were saved by the module's own save() routine, not arbitrary pickled objects.
The gensim native save() does make use of pickle for some of its saving, but not all, and thus wouldn't expect a fully-pickled object in the supplied file.
This might be the cause of your problem. You could instead try a load based entirely on Python's pickle (note that pickle.load() takes an open binary file object, not a filename):
import pickle

with open('lstm_datamodel.pkl', 'rb') as f:
    model = pickle.load(f)
Alternatively, if you can reconstruct the model, save it via the native gensim model.save(filename); loading that file with Word2Vec.load() should then work.
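For illustration, here is a minimal round-trip through gensim's own persistence API (the corpus and filenames below are placeholders, not from the question):
from gensim.models import Word2Vec

# Rebuild the model from tokenized sentences (placeholder corpus).
sentences = [['this', 'movie', 'is', 'great'],
             ['this', 'movie', 'is', 'terrible']]
model = Word2Vec(sentences, min_count=1)

# Persist with gensim's native save(), then reload with Word2Vec.load() --
# the pairing that Word2Vec.load() actually expects.
model.save('lstm_datamodel.model')
model = Word2Vec.load('lstm_datamodel.model')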

Related

joblib: importing .pkl file with personal classes

I'm using Jupyter to learn and practice machine learning. I created a Pipeline object with many scikit-learn classes and custom classes that I wrote. Then I saved this Pipeline object to the file 'classif_pipeline.pkl.z' using
joblib.dump(pipeline, 'classif_pipeline.pkl.z').
The problem is that when I try to load this file on a different computer, I get the error message below.
Code first:
import joblib
full_pipeline = joblib.load('classif_pipeline.pkl.z')
The error message follows (I have the same versions of scikit-learn and joblib on this PC too):
Traceback (most recent call last):
File "/media/backup/programming/python/jupyter/classification/main.py", line 3, in <module>
full_pipeline = joblib.load('classif_pipeline.pkl.z')
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
self.append(self.find_class(module, name))
File "/usr/lib/python3.10/pickle.py", line 1582, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/usr/lib/python3.10/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute 'DependentsImputer' on <module '__main__' from '/media/backup/programming/python/jupyter/classification/main.py'>
DependentsImputer is one of the many other classes I implemented in the Jupyter notebook.
How can I load this file?
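The traceback points at a standard pickle constraint: the unpickler looks for DependentsImputer under the module path recorded at dump time (here __main__), so that name must be defined or imported in the loading script before joblib.load runs. A minimal sketch, assuming the notebook's classes are moved into a module such as custom_transformers.py (a hypothetical name) that is copied to the second computer:
# custom_transformers.py -- hypothetical module holding the notebook's classes
from sklearn.base import BaseEstimator, TransformerMixin

class DependentsImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # the real imputation logic from the notebook goes here

# main.py on the second computer: importing the class makes the name
# visible to the unpickler before joblib.load reconstructs the pipeline.
import joblib
from custom_transformers import DependentsImputer

full_pipeline = joblib.load('classif_pipeline.pkl.z')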

RuntimeError: can't start new thread

My objective is to use the Scikit-Optimize library in Python to minimize a function value in order to find the optimized parameters for an XGBoost model. The process involves running the model with different random parameters 5,000 times.
However, the loop stops at some point and gives me a RuntimeError: can't start new thread. I am using Ubuntu 20.04 and running Python 3.8.5; the Scikit-Optimize version is 0.8.1. When I ran the same code on Windows 10 I did not encounter this RuntimeError, but the code runs much more slowly there.
I think I may need a thread pool to solve this issue, but after searching the web I had no luck finding a solution for implementing one.
Below is a simplified version of the code:
#This function will be passed to Scikit-Optimize to find the optimized parameters (Params)
def find_best_xgboost_para(params):
    #Defines the parameters that I want to optimize
    learning_rate,gamma,max_depth,min_child_weight,reg_alpha,reg_lambda,subsample,max_bin,num_parallel_tree,colsamp_lev,colsamp_tree,StopSteps\
        =float(params[0]),float(params[1]),int(params[2]),int(params[3]),\
        int(params[4]),int(params[5]),float(params[6]),int(params[7]),int(params[8]),float(params[9]),float(params[10]),int(params[11])
    xgbc=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=colsamp_lev,
                       colsample_bytree=colsamp_tree, gamma=gamma, learning_rate=learning_rate, max_delta_step=0,
                       max_depth=max_depth, min_child_weight=min_child_weight, missing=None, n_estimators=nTrees,
                       objective='binary:logistic', random_state=101, reg_alpha=reg_alpha,
                       reg_lambda=reg_lambda, scale_pos_weight=1, seed=101,
                       subsample=subsample, importance_type='gain', gpu_id=GPUID, max_bin=max_bin,
                       tree_method='gpu_hist', num_parallel_tree=num_parallel_tree, predictor='gpu_predictor', verbosity=0,
                       refresh_leaf=0, grow_policy='depthwise', process_type=TreeUpdateStatus, single_precision_histogram=SinglePrecision)
    tscv = TimeSeriesSplit(CV_nSplit)
    error_data=xgboost.cv(xgbc.get_xgb_params(), CVTrain, num_boost_round=CVBoostRound, nfold=None, stratified=False, folds=tscv, metrics=(),
                          obj=None, feval=f1_eval, maximize=False, early_stopping_rounds=StopSteps, fpreproc=None, as_pandas=True,
                          verbose_eval=True, show_stdv=True, seed=101, shuffle=shuffle_trig)
    eval_set = [(X_train, y_train), (X_test, y_test)]
    xgbc.fit(X_train, y_train, eval_metric=f1_eval, early_stopping_rounds=StopSteps, eval_set=eval_set, verbose=True)
    xgbc_predictions=xgbc.predict(X_test)
    error=(1-metrics.f1_score(y_test, xgbc_predictions, average='macro'))
    del xgbc
    return error
#Define the range of values that Scikit-Optimize can choose from to find the optimized parameters
lr_low, lr_high=float(XgParamDict['lr_low']), float(XgParamDict['lr_high'])
gama_low, gama_high=float(XgParamDict['gama_low']), float(XgParamDict['gama_high'])
depth_low, depth_high=int(XgParamDict['depth_low']), int(XgParamDict['depth_high'])
child_weight_low, child_weight_high=int(XgParamDict['child_weight_low']), int(XgParamDict['child_weight_high'])
alpha_low,alpha_high=int(XgParamDict['alpha_low']),int(XgParamDict['alpha_high'])
lambda_low,lambda_high=int(XgParamDict['lambda_low']),int(XgParamDict['lambda_high'])
subsamp_low,subsamp_high=float(XgParamDict['subsamp_low']),float(XgParamDict['subsamp_high'])
max_bin_low,max_bin_high=int(XgParamDict['max_bin_low']),int(XgParamDict['max_bin_high'])
num_parallel_tree_low,num_parallel_tree_high=int(XgParamDict['num_parallel_tree_low']),int(XgParamDict['num_parallel_tree_high'])
colsamp_lev_low,colsamp_lev_high=float(XgParamDict['colsamp_lev_low']),float(XgParamDict['colsamp_lev_high'])
colsamp_tree_low,colsamp_tree_high=float(XgParamDict['colsamp_tree_low']),float(XgParamDict['colsamp_tree_high'])
StopSteps_low,StopSteps_high=float(XgParamDict['StopSteps_low']),float(XgParamDict['StopSteps_high'])
#Pass the target function (find_best_xgboost_para) as well as the parameter ranges to Scikit-Optimize; 'res' will be an array of values that will need to be passed to another function
res=gbrt_minimize(find_best_xgboost_para,[(lr_low,lr_high),(gama_low, gama_high),(depth_low,depth_high),(child_weight_low,child_weight_high),\
(alpha_low,alpha_high),(lambda_low,lambda_high),(subsamp_low,subsamp_high),(max_bin_low,max_bin_high),\
(num_parallel_tree_low,num_parallel_tree_high),(colsamp_lev_low,colsamp_lev_high),(colsamp_tree_low,colsamp_tree_high),\
(StopSteps_low,StopSteps_high)],random_state=101,n_calls=5000,n_random_starts=1500,verbose=True,n_jobs=-1)
Below is the error message:
Traceback (most recent call last):
File "/home/FactorOpt.py", line 91, in <module>Opt(**FactorOptDict)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/gbrt.py", line 179, in gbrt_minimize return base_minimize(func, dimensions, base_estimator,
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/base.py", line 301, in base_minimize
next_y = func(next_x)
File "/home/anaconda3/lib/python3.8/modelling/FactorOpt.py", line 456, in xgboost_opt
res=gbrt_minimize(find_best_xgboost_para,[(lr_low,lr_high),(gama_low, gama_high),(depth_low,depth_high),(child_weight_low,child_weight_high),\
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/gbrt.py", line 179, in gbrt_minimize
return base_minimize(func, dimensions, base_estimator,
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/base.py", line 302, in base_minimize
result = optimizer.tell(next_x, next_y)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/optimizer.py", line 493, in tell
return self._tell(x, y, fit=fit)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/optimizer.py", line 536, in _tell
est.fit(self.space.transform(self.Xi), self.yi)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/learning/gbrt.py", line 85, in fit
self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
if self.dispatch_one_batch(iterator):
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
self._dispatch(tasks)
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 784, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 252, in apply_async
return self._get_pool().apply_async(
File "/home/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 407, in _get_pool
self._pool = ThreadPool(self._n_jobs)
File "/home/anaconda3/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
Pool.__init__(self, processes, initializer, initargs)
File "/home/anaconda3/lib/python3.8/multiprocessing/pool.py", line 232, in __init__
self._worker_handler.start()
File "/home/anaconda3/lib/python3.8/threading.py", line 852, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
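Regarding the thread-pool idea: one mitigation worth trying (an assumption on my part, not a confirmed fix) is to cap the worker count instead of passing n_jobs=-1, since the traceback shows each optimizer.tell() building a fresh ThreadPool sized by n_jobs, and -1 requests one thread per CPU core, which can exhaust the process's thread limit. A minimal sketch against the call shown above:
# Same gbrt_minimize call as above, but with a small fixed thread pool.
# search_space stands for the list of (low, high) bounds built earlier.
res = gbrt_minimize(find_best_xgboost_para, search_space,
                    random_state=101, n_calls=5000, n_random_starts=1500,
                    verbose=True, n_jobs=4)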

tokenizer has no attribute oov_token

Traceback (most recent call last):
File "dac.py", line 87, in
X_train=load_create_padded_data(X_train=X_train,savetokenizer=False,isPaddingDone=False,maxlen=sequence_length,tokenizer_path='./New_Tokenizer.tkn')
File "/home/dpk/Downloads/DAC/New_Utils.py", line 92, in load_create_padded_data
X_train=tokenizer.texts_to_sequences(X_train)
File "/home/dpk/anaconda2/envs/venv/lib/python2.7/site-packages/keras_preprocessing/text.py", line 278, in texts_to_sequences
return list(self.texts_to_sequences_generator(texts))
File "/home/dpk/anaconda2/envs/venv/lib/python2.7/site-packages/keras_preprocessing/text.py", line 296, in texts_to_sequences_generator
oov_token_index = self.word_index.get(self.oov_token)
AttributeError: 'Tokenizer' object has no attribute 'oov_token'
Probably this one:
You can manually set tokenizer.oov_token = None to fix this.
Pickle is not a reliable way to serialize objects, since it assumes that the underlying Python code/modules you're importing have not changed. In general, DO NOT use pickled objects with a different version of the library than the one used at pickling time. That's not a Keras issue, it's a generic Python/pickle one:
https://github.com/keras-team/keras/issues/9099
To fix this I manually set
self.oov_token = None
but not
tokenizer.oov_token = None
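A minimal defensive sketch along those lines, assuming the tokenizer was pickled to disk (the loading code is illustrative; only the attribute check is the actual fix):
import pickle

with open('New_Tokenizer.tkn', 'rb') as f:
    tokenizer = pickle.load(f)

# Tokenizers pickled by an older Keras predate the oov_token attribute,
# so restore it before texts_to_sequences() dereferences it.
if not hasattr(tokenizer, 'oov_token'):
    tokenizer.oov_token = None

sequences = tokenizer.texts_to_sequences(['some raw text'])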

Save and load checkpoint pytorch

I build a model and save its state as:
def checkpoint(state, ep, filename='./Risultati/checkpoint.pth'):
    if ep == (n_epoch-1):
        print('Saving state...')
        torch.save(state, filename)

checkpoint({'state_dict': rnn.state_dict()}, ep)
and then I want to load this state:
state_dict= torch.load('./Risultati/checkpoint.pth')
rnn.state_dict(state_dict)
When I try, this is the error:
Traceback (most recent call last):
File "train.py", line 288, in <module>
rnn.state_dict(state_dict)
File "/home/marco/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in state_dict
destination._metadata[prefix[:-1]] = dict(version=self._version)
AttributeError: 'dict' object has no attribute '_metadata'
Where did I go wrong?
Thanks in advance.
You need to load the state dict stored under the 'state_dict' key of the dictionary you loaded:
rnn.load_state_dict(state_dict['state_dict'])
Look at the load_state_dict method for more information.
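Putting both halves together, a minimal sketch (the tiny RNN is a placeholder for the model in the question):
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20)  # placeholder model

# Save: wrap the parameters in a checkpoint dictionary.
torch.save({'state_dict': rnn.state_dict()}, './Risultati/checkpoint.pth')

# Load: state_dict() only *returns* parameters; load_state_dict() restores them.
checkpoint = torch.load('./Risultati/checkpoint.pth')
rnn.load_state_dict(checkpoint['state_dict'])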

Keras / Theano exceptions are getting masked

I am using an evolutionary algorithm to find satisfactory hyper-parameters for a CNN written in Keras/Theano. The stochastic nature of this approach means that from time to time a pathological configuration will be tried, which will yield an exception. In those scenarios, I'd like to catch the exception so I can assign an appropriate low fitness. Unfortunately, when Theano throws an exception, it appears to be masked before it reaches my try/catch block. That is, at some point the exception is caught and not re-raised, which means it never propagates up the stack to reach my try/catch block.
I've asked on the Keras Slack workspace if there was some configuration I had to tickle in Keras to un-mask these exceptions, but I was told that the problem was not at the Keras level, that it was something with Theano. And, so here I am.
I have the following configuration settings at the top of the corresponding theanorc file that I had hoped would solve the problem:
[config]
on_opt_error = raise
on_shape_error = raise
numpy.seterr_all = raise
compute_test_value = raise
And, these are the exceptions I am seeing:
ERROR (theano.gof.opt): SeqOptimizer apply <theano.tensor.opt.ShapeOptimizer object at 0x2aaae03674a8>
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 235, in apply
sub_prof = optimizer.optimize(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 83, in optimize
self.add_requirements(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1482, in add_requirements
fgraph.attach_feature(ShapeFeature())
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/fg.py", line 541, in attach_feature
attach(self)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1299, in on_attach
self.on_import(fgraph, node, reason='on_attach')
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1362, in on_import
self.set_shape(r, s)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1151, in set_shape
shape_vars.append(self.unpack(s[i], r))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1073, in unpack
raise ValueError(msg)
ValueError: There is a negative shape in the graph!
Backtrace when that variable is created:
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 218, in <module>
validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 193, in train_cnn
model = create_cnn(kernel_sizes=kernel_sizes)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 52, in create_cnn
model.add(Conv2D(256, kernel_size=kernel_sizes[3], activation="relu", kernel_initializer="normal"))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/models.py", line 475, in add
output_tensor = layer(self.outputs[0])
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/backend/theano_backend.py", line 1890, in conv2d
filter_dilation=dilation_rate)
And, if you're curious to see the try/catch block, it's just this:
try:
    validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
except Exception as e:
    print(socket.gethostname(), ', Caught exception while training:', str(e))
My intuition is that this is probably something very, very simple. Maybe I need to add more options to the THEANORC file?
Setting theano.config.compute_test_value = 'raise' appears to work.
Curiously, compute_test_value should have been set from the Theano configuration file, which suggests that it's not being properly read and parsed. I should not have to set this value programmatically when I explicitly set it in the configuration file.
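For reference, a minimal sketch of the programmatic workaround, executed before any Keras/Theano graph is built:
import theano

# The same setting the theanorc was supposed to apply; setting it here
# programmatically is what made the masked exceptions propagate.
theano.config.compute_test_value = 'raise'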
