Keras / Theano exceptions are getting masked

I am using an evolutionary algorithm to find satisfactory hyper-parameters for a CNN written in Keras/Theano. The stochastic nature of this approach means that from time to time a pathological configuration will be tried, which will yield an exception. In those scenarios, I'd like to catch the exception so I can assign an appropriately low fitness. Unfortunately, when Theano throws an exception, it appears to be caught and not re-raised somewhere along the way, so it never propagates up the stack to reach my try/except block.
I asked on the Keras Slack workspace whether there was some configuration I had to tickle in Keras to un-mask these exceptions, but I was told that the problem was not at the Keras level; it was something with Theano. And so here I am.
I have the following configuration settings at the top of the corresponding .theanorc file, which I had hoped would solve the problem:
[config]
on_opt_error = raise
on_shape_error = raise
numpy.seterr_all = raise
compute_test_value = raise
And, these are the exceptions I am seeing:
ERROR (theano.gof.opt): SeqOptimizer apply <theano.tensor.opt.ShapeOptimizer object at 0x2aaae03674a8>
ERROR (theano.gof.opt): Traceback:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 235, in apply
sub_prof = optimizer.optimize(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/opt.py", line 83, in optimize
self.add_requirements(fgraph)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1482, in add_requirements
fgraph.attach_feature(ShapeFeature())
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/gof/fg.py", line 541, in attach_feature
attach(self)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1299, in on_attach
self.on_import(fgraph, node, reason='on_attach')
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1362, in on_import
self.set_shape(r, s)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1151, in set_shape
shape_vars.append(self.unpack(s[i], r))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/theano/tensor/opt.py", line 1073, in unpack
raise ValueError(msg)
ValueError: There is a negative shape in the graph!
Backtrace when that variable is created:
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 218, in <module>
validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 193, in train_cnn
model = create_cnn(kernel_sizes=kernel_sizes)
File "/ccs/proj/geo121/mcoletti/dl-4-settlement-mapping/eadl/train_cnn.py", line 52, in create_cnn
model.add(Conv2D(256, kernel_size=kernel_sizes[3], activation="relu", kernel_initializer="normal"))
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/models.py", line 475, in add
output_tensor = layer(self.outputs[0])
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/ccs/proj/geo121/python3.5-packages/dl4sm/keras/backend/theano_backend.py", line 1890, in conv2d
filter_dilation=dilation_rate)
And, if you're curious to see the try/catch block, it's just this:
try:
    validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes, max_epoch=args.epoch, batch_sizes=args.batch_size)
except Exception as e:
    print(socket.gethostname(), ', Caught exception while training:', str(e))
My intuition is that this is probably something very, very simple. Maybe I need to add more options to the THEANORC file?

Setting theano.config.compute_test_value = 'raise' appears to work.
Curiously, compute_test_value should have been set from the Theano configuration file, which suggests that the file is not being properly read and parsed. I should not have to set this value programmatically when I explicitly set it in the configuration file.
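For reference, a minimal sketch of that programmatic workaround combined with the try/except from the question (the low-fitness assignment is illustrative of the "assign an appropriately low fitness" step, not code from the original post). One thing worth checking: Theano's documentation describes a [global] section for top-level .theanorc flags; if the file uses [config] instead, that could explain why compute_test_value was not picked up from the file.
import socket
import theano

theano.config.compute_test_value = 'raise'   # surface Theano graph errors as ordinary exceptions

try:
    validation_accuracy = train_cnn(data_dir=args.data_dir, kernel_sizes=args.kernel_sizes,
                                    max_epoch=args.epoch, batch_sizes=args.batch_size)
except Exception as e:
    print(socket.gethostname(), ', Caught exception while training:', str(e))
    validation_accuracy = 0.0   # illustrative low fitness for a pathological configuration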

Related

GPU out of memory when trying to do augmentations (preprocessing) in the GPU

I am trying to run the data augmentations (self.transform and self.transform_prime in the code below) on the GPU to reduce computation time. However, I am running into memory errors.
Below is a snippet of the dataset class code.
In the snippet, sub_img is a NumPy array sitting in RAM. As the code shows, I convert it to a GPU torch tensor, apply the transformations on the GPU, and return the results.
However, when I built a DataLoader from this dataset and ran it, CUDA ran out of memory even with a batch size of 2, which is strange, because the version of the code that does the augmentations on the CPU worked with a batch size of 35. (Also, nvidia-smi shows that many processes that take up VRAM are created before the CUDA out-of-memory error occurs.)
def __getitem__(self, idx):
    sub_data = self.dataset[idx]
    sub_img, sub_label = self.dataset[idx]  # pick out the image for the subject at this idx
    if self.split == 'train':
        """below: major revision, so check again (maybe no copy needed?)"""
        y1 = self.transform(from_numpy(sub_img).float().to("cuda:0"))
        y2 = self.transform_prime(from_numpy(sub_img).float().to("cuda:0"))
        return (y1, y2), sub_label
Could anyone explain to me how I can fix this? My questions are:
Why does CUDA run out of memory inside __getitem__? If my understanding is correct, __getitem__ is called when the DataLoader generates batches, so its outputs should be freed once a given batch is no longer used. Shouldn't that mean the GPU memory used when I move sub_img to cuda:0 is released after each batch, and therefore never piles up? Why is there a GPU memory error?
How can I fix this? Should I make self.transform itself accept NumPy arrays, convert them to tensors inside the function, perform the operations on the GPU, and then return the tensor back to the CPU? Wouldn't this be inefficient, since the tensor has to move back and forth between the CPU and the GPU? (A sketch of one alternative follows the error log below.)
I am sorry for my novice questions… thank you for any help and suggestions :)
I have attached the error log below :
Traceback (most recent call last):
File "main_3D.py", line 371, in <module>
main()
File "main_3D.py", line 87, in main
torch.multiprocessing.spawn(main_worker, (args,), args.ngpus_per_node)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/scratch/connectome/dyhan316/VAE_ADHD/barlowtwins/main_3D.py", line 151, in main_worker
for step, ((y1, y2), _) in enumerate(loader, start=epoch * len(loader)):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/connectome/dyhan316/.local/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/connectome/dyhan316/VAE_ADHD/barlowtwins/dataset.py", line 81, in __getitem__
y1 = self.transform(from_numpy(sub_img).float().to("cuda:0"))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
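One common cause of the "many processes holding VRAM" symptom is that every DataLoader worker process that touches CUDA allocates its own CUDA context. A minimal sketch of the alternative raised in question 2 above (keeping __getitem__ on the CPU and applying transform / transform_prime to whole batches after they have been moved to the GPU), reusing the names from the question's code; everything else is illustrative:
import torch
from torch.utils.data import Dataset, DataLoader

class CPUSideDataset(Dataset):
    """Returns CPU tensors only; no CUDA calls inside DataLoader worker processes."""
    def __init__(self, dataset, split):
        self.dataset = dataset
        self.split = split

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        sub_img, sub_label = self.dataset[idx]
        return torch.from_numpy(sub_img).float(), sub_label

# Usage sketch: augment batch-wise on the GPU inside the training loop.
# loader = DataLoader(CPUSideDataset(dataset, 'train'), batch_size=35, num_workers=4, pin_memory=True)
# for imgs, labels in loader:
#     imgs = imgs.to("cuda:0", non_blocking=True)   # one host-to-device copy per batch
#     y1 = transform(imgs)                          # GPU-side augmentation
#     y2 = transform_prime(imgs)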

RuntimeError: can't start new thread

My objective is to use the Scikit-Optimize library in Python to minimize a function value in order to find optimized parameters for an XGBoost model. The process involves running the model with different random parameters 5,000 times.
However, the loop stops at some point and gives me RuntimeError: can't start new thread. I am using Ubuntu 20.04 and running Python 3.8.5; the Scikit-Optimize version is 0.8.1. When I run the same code on Windows 10 I do not encounter this RuntimeError, but the code runs much more slowly.
I think I may need a thread pool to solve this issue, but after searching the web I had no luck finding a way to implement one.
Below is a simplified version of the code:
# This function will be passed to Scikit-Optimize to find the optimized parameters (params)
def find_best_xgboost_para(params):
    # Unpack the parameters that I want to optimize
    learning_rate, gamma, max_depth, min_child_weight, reg_alpha, reg_lambda, subsample, max_bin, \
        num_parallel_tree, colsamp_lev, colsamp_tree, StopSteps = \
        float(params[0]), float(params[1]), int(params[2]), int(params[3]), \
        int(params[4]), int(params[5]), float(params[6]), int(params[7]), int(params[8]), \
        float(params[9]), float(params[10]), int(params[11])
    xgbc = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=colsamp_lev,
                         colsample_bytree=colsamp_tree, gamma=gamma, learning_rate=learning_rate, max_delta_step=0,
                         max_depth=max_depth, min_child_weight=min_child_weight, missing=None, n_estimators=nTrees,
                         objective='binary:logistic', random_state=101, reg_alpha=reg_alpha,
                         reg_lambda=reg_lambda, scale_pos_weight=1, seed=101,
                         subsample=subsample, importance_type='gain', gpu_id=GPUID, max_bin=max_bin,
                         tree_method='gpu_hist', num_parallel_tree=num_parallel_tree, predictor='gpu_predictor', verbosity=0,
                         refresh_leaf=0, grow_policy='depthwise', process_type=TreeUpdateStatus, single_precision_histogram=SinglePrecision)
    tscv = TimeSeriesSplit(CV_nSplit)
    error_data = xgboost.cv(xgbc.get_xgb_params(), CVTrain, num_boost_round=CVBoostRound, nfold=None, stratified=False, folds=tscv, metrics=(),
                            obj=None, feval=f1_eval, maximize=False, early_stopping_rounds=StopSteps, fpreproc=None, as_pandas=True,
                            verbose_eval=True, show_stdv=True, seed=101, shuffle=shuffle_trig)
    eval_set = [(X_train, y_train), (X_test, y_test)]
    xgbc.fit(X_train, y_train, eval_metric=f1_eval, early_stopping_rounds=StopSteps, eval_set=eval_set, verbose=True)
    xgbc_predictions = xgbc.predict(X_test)
    error = 1 - metrics.f1_score(y_test, xgbc_predictions, average='macro')
    del xgbc
    return error
#Define the range of values that Scikit-Optimize can choose from to find the optimized parameters
lr_low, lr_high=float(XgParamDict['lr_low']), float(XgParamDict['lr_high'])
gama_low, gama_high=float(XgParamDict['gama_low']), float(XgParamDict['gama_high'])
depth_low, depth_high=int(XgParamDict['depth_low']), int(XgParamDict['depth_high'])
child_weight_low, child_weight_high=int(XgParamDict['child_weight_low']), int(XgParamDict['child_weight_high'])
alpha_low,alpha_high=int(XgParamDict['alpha_low']),int(XgParamDict['alpha_high'])
lambda_low,lambda_high=int(XgParamDict['lambda_low']),int(XgParamDict['lambda_high'])
subsamp_low,subsamp_high=float(XgParamDict['subsamp_low']),float(XgParamDict['subsamp_high'])
max_bin_low,max_bin_high=int(XgParamDict['max_bin_low']),int(XgParamDict['max_bin_high'])
num_parallel_tree_low,num_parallel_tree_high=int(XgParamDict['num_parallel_tree_low']),int(XgParamDict['num_parallel_tree_high'])
colsamp_lev_low,colsamp_lev_high=float(XgParamDict['colsamp_lev_low']),float(XgParamDict['colsamp_lev_high'])
colsamp_tree_low,colsamp_tree_high=float(XgParamDict['colsamp_tree_low']),float(XgParamDict['colsamp_tree_high'])
StopSteps_low,StopSteps_high=float(XgParamDict['StopSteps_low']),float(XgParamDict['StopSteps_high'])
#Pass the target function (find_best_xgboost_para) as well as the parameter ranges to Scikit-Optimize; 'res' will be an array of values that will need to be passed to another function
res=gbrt_minimize(find_best_xgboost_para,[(lr_low,lr_high),(gama_low, gama_high),(depth_low,depth_high),(child_weight_low,child_weight_high),\
(alpha_low,alpha_high),(lambda_low,lambda_high),(subsamp_low,subsamp_high),(max_bin_low,max_bin_high),\
(num_parallel_tree_low,num_parallel_tree_high),(colsamp_lev_low,colsamp_lev_high),(colsamp_tree_low,colsamp_tree_high),\
(StopSteps_low,StopSteps_high)],random_state=101,n_calls=5000,n_random_starts=1500,verbose=True,n_jobs=-1)
Below is the error message:
Traceback (most recent call last):
File "/home/FactorOpt.py", line 91, in <module>Opt(**FactorOptDict)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/gbrt.py", line 179, in gbrt_minimize return base_minimize(func, dimensions, base_estimator,
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/base.py", line 301, in base_minimize
next_y = func(next_x)
File "/home/anaconda3/lib/python3.8/modelling/FactorOpt.py", line 456, in xgboost_opt
res=gbrt_minimize(find_best_xgboost_para,[(lr_low,lr_high),(gama_low, gama_high),(depth_low,depth_high),(child_weight_low,child_weight_high),\
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/gbrt.py", line 179, in gbrt_minimize
return base_minimize(func, dimensions, base_estimator,
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/base.py", line 302, in base_minimize
result = optimizer.tell(next_x, next_y)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/optimizer.py", line 493, in tell
return self._tell(x, y, fit=fit)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/optimizer/optimizer.py", line 536, in _tell
est.fit(self.space.transform(self.Xi), self.yi)
File "/home/anaconda3/lib/python3.8/site-packages/skopt/learning/gbrt.py", line 85, in fit
self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
if self.dispatch_one_batch(iterator):
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
self._dispatch(tasks)
File "/home/anaconda3/lib/python3.8/site-packages/joblib/parallel.py", line 784, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 252, in apply_async
return self._get_pool().apply_async(
File "/home/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 407, in _get_pool
self._pool = ThreadPool(self._n_jobs)
File "/home/anaconda3/lib/python3.8/multiprocessing/pool.py", line 925, in __init__
Pool.__init__(self, processes, initializer, initargs)
File "/home/anaconda3/lib/python3.8/multiprocessing/pool.py", line 232, in __init__
self._worker_handler.start()
File "/home/anaconda3/lib/python3.8/threading.py", line 852, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
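One possible explanation: the traceback shows joblib's threading backend building a fresh ThreadPool inside every optimizer.tell() call, so with n_jobs=-1 over 5,000 calls the process can hit the operating system's per-process thread limit. A workaround worth trying (a sketch, not a confirmed fix) is to cap n_jobs in the gbrt_minimize call while keeping everything else from the snippet above:
res = gbrt_minimize(find_best_xgboost_para,
                    [(lr_low, lr_high), (gama_low, gama_high), (depth_low, depth_high), (child_weight_low, child_weight_high),
                     (alpha_low, alpha_high), (lambda_low, lambda_high), (subsamp_low, subsamp_high), (max_bin_low, max_bin_high),
                     (num_parallel_tree_low, num_parallel_tree_high), (colsamp_lev_low, colsamp_lev_high), (colsamp_tree_low, colsamp_tree_high),
                     (StopSteps_low, StopSteps_high)],
                    random_state=101, n_calls=5000, n_random_starts=1500, verbose=True,
                    n_jobs=1)   # a small, fixed thread count instead of -1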

TensorFlow 2.1 using TPUEstimator: RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor

I just converted an existing project from TF 1.14 to TF 2.1 which uses the TPUEstimator API. After making the conversion, testing locally (i.e. use_tpu=False) runs successfully. However, I am getting errors when running on Google Cloud TPU (i.e. use_tpu=True).
Note: This is in the context of the AdaNet AutoML framework (v0.8.0), although I suspect this may be a general TPUEstimator-related error, as the errors appear to originate in the tpu_estimator.py and error_handling.py scripts seen in the Traceback below:
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3032, in train
rendezvous.record_error('training_loop', sys.exc_info())
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 81, in record_error
if value and value.op and value.op.type == _CHECK_NUMERIC_OP_NAME:
AttributeError: 'RuntimeError' object has no attribute 'op'
During handling of the above exception, another exception occurred:
File "workspace/trainer/train.py", line 331, in <module>
main(args=parsed_args)
File "workspace/trainer/train.py", line 177, in main
run_config=run_config)
File "workspace/trainer/train.py", line 68, in run_experiment
estimator.train(input_fn=train_input_fn, max_steps=total_train_steps)
File "/usr/local/lib/python3.6/site-packages/adanet/core/estimator.py", line 853, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 143, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 374, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1164, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1194, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3186, in _model_fn
host_ops = host_call.create_tpu_hostcall()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2226, in create_tpu_hostcall
'dimension, but got scalar {}'.format(dequeue_ops[i][0]))
RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor("OutfeedDequeueTuple:1", shape=(), dtype=int64, device=/job:tpu_worker/task:0/device:CPU:0)'
The previous version of the project using TF 1.14 runs both locally and on TPU using TPUEstimator without issues. Is there something obvious I am potentially missing for the conversion over to TF 2.1 when using TPUEstimator API?
Have you applied the following:
dataset = ...
dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(batch_size))
This drops the last few samples from a file, if needed, to ensure that every batch has a static shape of batch_size, which is required when training on TPUs.
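A side note: tf.contrib no longer exists in TensorFlow 2.x, so in a TF 2.1 input_fn the equivalent of the snippet above is the drop_remainder argument of tf.data.Dataset.batch:
dataset = dataset.batch(batch_size, drop_remainder=True)   # every batch keeps the static shape batch_size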

How do I get word2vec to load a string? Problem: 'dict' object has no attribute '_load_specials'

I have a problem when using word2vec and LSTM; the code is:
def input_transform(string):
    words = jieba.lcut(string)
    words = np.array(words).reshape(1, -1)
    model = Word2Vec.load('lstm_datamodel.pkl')
    combined = create_dictionaries(model, words)
    return combined

def lstm_predict(string):
    print('loading model......')
    with open('lstm_data.yml', 'r') as f:
        yaml_string = yaml.load(f)
    model = model_from_yaml(yaml_string)
    print('loading weights......')
    model.load_weights('lstm_data.h5')
    model.compile(loss='binary_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    data = input_transform(string)
    data.reshape(1, -1)
    #print data
    result = model.predict_classes(data)
    if result[0][0] == 1:
        print(string, ' positive')
    else:
        print(string, ' negative')
and the error is:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1312, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 1244, in load
model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\base_any2vec.py", line 603, in load
return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
File "C:\Python36\lib\site-packages\gensim\utils.py", line 423, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/GitHub/reviewsentiment/veclstm.py", line 211, in <module>
lstm_predict(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 191, in lstm_predict
data=input_transform(string)
File "C:/GitHub/reviewsentiment/veclstm.py", line 177, in input_transform
model=Word2Vec.load('lstm_datamodel.pkl')
File "C:\Python36\lib\site-packages\gensim\models\word2vec.py", line 1323, in load
return load_old_word2vec(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 153, in load_old_word2vec
old_model = Word2Vec.load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\word2vec.py", line 1618, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Python36\lib\site-packages\gensim\models\deprecated\old_saveload.py", line 88, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
I am sorry for including so much code.
This is my first time asking on Stack Overflow, and I have tried my very best to find the answer on my own, but failed. So can you help me? Thank you very much!
The error is occurring on the line...
model=Word2Vec.load('lstm_datamodel.pkl')
...so all the other/later code you've supplied is irrelevant and superfluous.
The suffix of your filename, lstm_datamodel.pkl, suggests it may have been created via Python's pickle() facility. The gensim Word2Vec.load() method only expects to load models that were saved by the module's own save() routine, not any pickled object.
The gensim native save() does make use of pickle for some of its saving, but not all, and thus wouldn't expect a fully-pickled object in the file provided.
This might be the cause of your problem. You could try instead a load based entirely on Python pickle (note that pickle.load expects an open file object, not a filename):
with open('lstm_datamodel.pkl', 'rb') as f:
    model = pickle.load(f)
Alternatively, if you can reconstruct the model, be sure to save it via the native gensim model.save(filename); loading that file with Word2Vec.load() might then resolve the problem.
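A minimal sketch of that alternative, assuming the model object (here called model) can be rebuilt or re-trained; the filename lstm_datamodel.gensim is illustrative:
from gensim.models import Word2Vec

model.save('lstm_datamodel.gensim')              # gensim's native save format
model = Word2Vec.load('lstm_datamodel.gensim')   # load succeeds because the file was written by save()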

TensorFlow reshaping with Conv1D

I have seen problems similar to mine here on Stack Overflow, but not exactly the same. I can reshape when using fully-connected NN layers, but not with Conv1D layers. Here's a minimal example. I'm using TF 1.4.0 on Python 3.6.3.
import tensorflow as tf
# fully connected
fc = tf.placeholder(tf.float32, [None,12])
fc = tf.contrib.layers.fully_connected(fc, 12)
fc = tf.contrib.layers.fully_connected(fc, 6)
fc = tf.reshape(fc, [-1,3,2])
# convolutional
con = tf.placeholder(tf.float32, [None,50,4])
con = tf.layers.Conv1D(con, 12, 3, activation=tf.nn.relu)
con = tf.layers.Conv1D(con, 6, 3, activation=tf.nn.relu)
con = tf.reshape(con, [-1,50,3,2])
Here is the output (yes, I'm aware of the RuntimeWarning. The messages I have found which discuss it suggest that it's harmless, but if you know otherwise, please share!):
/usr/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 468, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 468, in <listcomp>
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got <tensorflow.python.layers.convolutional.Conv1D object at 0x7fa67e0d1a20>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "minimal reshape example.py", line 16, in <module>
con = tf.reshape(con, [-1,width,3,2])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3938, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
raise err
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py", line 472, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'tensorflow.python.layers.convolutional.Conv1D'> to Tensor. Contents: <tensorflow.python.layers.convolutional.Conv1D object at 0x7fa67e0d1a20>. Consider casting elements to a supported type.
My code fails at con = tf.reshape(con, [-1,50,3,2]). Yet the pattern is nearly identical to the pattern that I use for the fully-connected graph, fc.
I made nets very similar to these work in the higher-level API for TensorFlow called TFLearn. However, TFLearn's DNN Estimator object is having trouble managing a tf.Session correctly. After over a month, I have yet to resolve the issue with TFLearn's developers on GitHub.
I don't mind using TensorFlow's native Estimator, but I have to solve this reshape problem to achieve it.
Well, I found the error: tf.layers.Conv1D != tf.layers.conv1d. Changing the former to the latter eliminated the error. Let the TensorFlow / Python user beware!
Even though TensorFlow seems to avoid Python's object model (which is probably necessary, given the possibility of distributed, low-level computation), there are in fact a few genuine classes in the Python API. The class constructors can accept many (all?) of the same arguments as the similarly-named convenience functions.
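For completeness, a corrected sketch of the convolutional branch using the lower-case functional API the answer points to. Note that with the default 'valid' padding, two kernel-3 convolutions shrink the length-50 dimension to 46, so the reshape target here uses 46 rather than the original 50 (my adjustment; the original [-1,50,3,2] would not match the tensor's size at run time):
import tensorflow as tf

con = tf.placeholder(tf.float32, [None, 50, 4])
con = tf.layers.conv1d(con, 12, 3, activation=tf.nn.relu)   # functional form: conv1d(inputs, filters, kernel_size)
con = tf.layers.conv1d(con, 6, 3, activation=tf.nn.relu)
con = tf.reshape(con, [-1, 46, 3, 2])                       # 46 * 6 elements per sample -> 46 x 3 x 2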

Resources