Error while loading MLflow model in Python 3.7

I am saving an MLflow model using Databricks. Below are the details from the model's MLmodel file:
artifact_path: model
databricks_runtime: 8.4.x-gpu-ml-scala2.12
flavors:
  python_function:
    data: data
    env: conda.yaml
    loader_module: mlflow.pytorch
    pickle_module_name: mlflow.pytorch.pickle_module
    python_version: 3.8.8
  pytorch:
    model_data: data
    pytorch_version: 1.9.0+cu102
I am not able to load the model locally using Python 3.7, whereas it works well with Python 3.9.
Any idea what could be the possible resolution to this?
Error Trace:
File "/Users/danishm/opt/miniconda3/envs/py37/lib/python3.7/site-packages/mlflow/pytorch/__init__.py", line 714, in load_model
return _load_model(path=torch_model_artifacts_path, **kwargs)
File "/Users/danishm/opt/miniconda3/envs/py37/lib/python3.7/site-packages/mlflow/pytorch/__init__.py", line 626, in _load_model
return torch.load(model_path, **kwargs)
File "/Users/danishm/opt/miniconda3/envs/py37/lib/python3.7/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/Users/danishm/opt/miniconda3/envs/py37/lib/python3.7/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
TypeError: code() takes at most 15 arguments (16 given)
I have tried specifying the pickle module explicitly as mlflow.pytorch.load_model(ML_MODEL_PATH, pickle_module=mlflow.pytorch.pickle_module), but I still get the same error.
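The last line of the traceback is the telltale: Python 3.8 added a positional-only-argument count to code objects, so a code object pickled under Python 3.8 carries one more argument than a Python 3.7 interpreter expects, which produces exactly "code() takes at most 15 arguments (16 given)". Since the MLmodel file records python_version: 3.8.8, the model has to be unpickled under Python 3.8 or newer; no pickle_module argument can paper over the interpreter mismatch. A minimal sketch, reusing ML_MODEL_PATH from the question:
import mlflow.pytorch

# The pickle was written by Python 3.8.8 (see python_version in the MLmodel
# file); code objects gained an extra field in 3.8, so run this under a
# Python >= 3.8 interpreter, e.g. an env created from the logged conda.yaml.
model = mlflow.pytorch.load_model(ML_MODEL_PATH)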

Related

joblib: importing .pkl file with personal classes

I'm using Jupyter to learn and practice machine learning. I created a Pipeline object with many classes from scikit-learn and custom classes that I wrote. After that, I saved this Pipeline object to a file 'classif_pipeline.pkl.z' using
joblib.dump(pipeline, 'classif_pipeline.pkl.z').
The problem is that when I try to load this file on a different computer, I get the error message below.
Code first:
import joblib
full_pipeline = joblib.load('classif_pipeline.pkl.z')
Error message (I have the same versions of scikit-learn and joblib on this PC too):
Traceback (most recent call last):
File "/media/backup/programming/python/jupyter/classification/main.py", line 3, in <module>
full_pipeline = joblib.load('classif_pipeline.pkl.z')
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 658, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/guilherme/.local/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/usr/lib/python3.10/pickle.py", line 1538, in load_stack_global
self.append(self.find_class(module, name))
File "/usr/lib/python3.10/pickle.py", line 1582, in find_class
return _getattribute(sys.modules[module], name)[0]
File "/usr/lib/python3.10/pickle.py", line 331, in _getattribute
raise AttributeError("Can't get attribute {!r} on {!r}"
AttributeError: Can't get attribute 'DependentsImputer' on <module '__main__' from '/media/backup/programming/python/jupyter/classification/main.py'>
DependentsImputer is one of the many other classes I implemented in the Jupyter notebook.
How can I load this file?
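pickle stores custom classes by module and attribute name, and since the pipeline was dumped from a notebook, the classes were recorded under __main__; unpickling then looks them up on the __main__ module of the loading process, where they do not exist. A minimal sketch of a fix, assuming the custom transformers are copied into an importable file (the module name custom_transformers is hypothetical):
import joblib
# Importing the classes into the loading script binds them on __main__,
# which is where the pickle recorded them when it was dumped from the notebook:
from custom_transformers import DependentsImputer  # plus the other custom classes

full_pipeline = joblib.load('classif_pipeline.pkl.z')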

TensorFlow 2.1 using TPUEstimator: RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor

I just converted an existing project from TF 1.14 to TF 2.1 which uses the TPUEstimator API. After making the conversion, testing locally (i.e. use_tpu=False) runs successfully. However, I am getting errors when running on Google Cloud TPU (i.e. use_tpu=True).
Note: This is in the context of the AdaNet AutoML framework (v0.8.0), although I suspect this may be a general TPUEstimator-related error, as the errors appear to originate in the tpu_estimator.py and error_handling.py scripts seen in the Traceback below:
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3032, in train
rendezvous.record_error('training_loop', sys.exc_info())
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 81, in record_error
if value and value.op and value.op.type == _CHECK_NUMERIC_OP_NAME:
AttributeError: 'RuntimeError' object has no attribute 'op'
During handling of the above exception, another exception occurred:
File "workspace/trainer/train.py", line 331, in <module>
main(args=parsed_args)
File "workspace/trainer/train.py", line 177, in main
run_config=run_config)
File "workspace/trainer/train.py", line 68, in run_experiment
estimator.train(input_fn=train_input_fn, max_steps=total_train_steps)
File "/usr/local/lib/python3.6/site-packages/adanet/core/estimator.py", line 853, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 143, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 374, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1164, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1194, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3186, in _model_fn
host_ops = host_call.create_tpu_hostcall()
File "/usr/local/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2226, in create_tpu_hostcall
'dimension, but got scalar {}'.format(dequeue_ops[i][0]))
RuntimeError: All tensors outfed from TPU should preserve batch size dimension, but got scalar Tensor("OutfeedDequeueTuple:1", shape=(), dtype=int64, device=/job:tpu_worker/task:0/device:CPU:0)
The previous version of the project using TF 1.14 runs both locally and on TPU using TPUEstimator without issues. Is there something obvious I am potentially missing for the conversion over to TF 2.1 when using TPUEstimator API?
Have you applied the following?
dataset = ...
dataset = dataset.batch(batch_size, drop_remainder=True)
(In TF 1.x this was tf.contrib.data.batch_and_drop_remainder(batch_size); tf.contrib no longer exists in TF 2.x.) This drops the last few samples from a file, if needed, to ensure that every batch has the static shape of batch_size, which is required when training on TPUs.
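The error message itself points at the host_call rather than the input pipeline: every tensor outfed to a TPUEstimator host_call must keep a leading batch dimension, so scalar tensors such as the global step or the loss are typically reshaped to rank 1 before being passed. A minimal sketch of that pattern (host_call_fn and the surrounding model_fn names are hypothetical, not from the question's code):
import tensorflow as tf

def host_call_fn(global_step, loss):
    # Runs on the host CPU; each argument arrives with the extra batch
    # dimension added below, so strip it before use.
    global_step = global_step[0]
    loss = loss[0]
    # ... write summaries here ...
    return []

# Scalars must be reshaped to rank 1 so they "preserve the batch size
# dimension"; `loss` is assumed to be the scalar training loss in model_fn.
gs_t = tf.reshape(tf.compat.v1.train.get_global_step(), [1])
loss_t = tf.reshape(loss, [1])
host_call = (host_call_fn, [gs_t, loss_t])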

Loading a saved Tensorflow model from its .meta file

I am trying to load a TensorFlow meta graph from a saved checkpoint using TensorFlow 1.15, to convert it to a SavedModel for TensorFlow Serving. It is a speech recognition model with local attention and a unidirectional LSTM, implemented using the RETURNN toolkit with the TensorFlow backend. I am using the following code.
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
import sys

if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " save_dir")
    exit(1)
export_dir = sys.argv[1]
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
sigs = {}
with tf.Session(graph=tf.Graph()) as sess:
    new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta")
    new_saver.restore(sess, tf.train.latest_checkpoint("./serv_test"))
    graph = tf.get_default_graph()
    input_audio = graph.get_tensor_by_name('inference/default/wav:0')
    output_hyps = graph.get_tensor_by_name('inference/default/Reshape_7:0')
    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
        tf.saved_model.signature_def_utils.predict_signature_def(
            {"in": input_audio}, {"out": output_hyps})
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING], signature_def_map=sigs)
builder.save()
But I am getting the following error in the import_meta_graph line:
Traceback (most recent call last):
File "xport.py", line 16, in <module>
new_saver=tf.train.import_meta_graph("./serv_test/model.238.meta")
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered
'NativeLstm2' in binary running on ip-10-1-21-241. Make sure the Op and Kernel
are registered in the binary running in this process. Note that if you are loading a
saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler`
should be done before importing the graph, as contrib ops are lazily registered when
the module is first accessed.
Is there any way to get around this error? Is it because of the custom-built layers used in RETURNN? Is there any way to make a RETURNN model TensorFlow-servable?
Thanks.
You should remove the graph=tf.Graph(); otherwise your import_meta_graph will import it into the wrong graph.
Just see some official TF examples of how to use import_meta_graph.
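As for the NotFoundError itself: it says the op type NativeLstm2 is not registered in the running process. That op is one of RETURNN's natively compiled ops, and, as the error text suggests, custom ops must be registered before any graph that uses them is imported. A minimal sketch, with a hypothetical path to the compiled op library:
import tensorflow as tf

# Registers NativeLstm2 (and the other RETURNN native ops) with this process;
# the .so path is hypothetical and depends on where RETURNN compiled its ops.
tf.load_op_library("/path/to/returnn/native_ops/NativeOp.so")
new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta")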

I am getting error "Restoring from checkpoint failed." while training tensorflow estimator api on AI-platform(ml-engine)

I am trying to do hyperparameter tuning on AI Platform (ml-engine) for a DNN regressor using the TensorFlow Estimator API. But after submitting the job, it shows that the job has failed, and I get this error in the job details.
Can someone help?
Hyperparameter Tuning Trial #1 failed before any other successful trials were completed. The failed trial had parameters: learning_rate=0.0019937718716419557, num-layers=2, first-layer-size=148, scale-factor=0.7910729020312588. The trial's error message was: The replica master 0 exited with a non-zero status of 1.
Traceback (most recent call last):
[...]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 507, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 385, in _AddShardedRestoreOps
name="restore_shard"))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 332, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 580, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
tensor_name = dnn/hiddenlayer_0/bias; shape in shape_and_slice spec [148] does not match the shape stored in checkpoint: [117]
[[node save/RestoreV2_1 (defined at /usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py:1403) ]]
Looks like you are using the same output directory for all the trials, so trial #1 is trying to read trial #2's checkpoint (perhaps because it is the latest one in the directory) and failing because the architecture is different.
Make sure to use a different output directory for each hyperparameter training run. There are two ways to do this:
Use the --job-dir as the output directory (see the sketch after the snippet below).
Append the hyperparameter trial number to the output directory you are using now:
import json
import os

# AI Platform exposes the trial number in the TF_CONFIG environment variable;
# appending it gives every trial its own checkpoint directory.
trial = json.loads(os.environ.get('TF_CONFIG', '{}')).get('task', {}).get('trial', '')
output_dir = os.path.join(output_dir, trial)
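For the first option, a minimal sketch assuming an argparse-based trainer: during hyperparameter tuning, AI Platform appends the trial number to the --job-dir it passes to each trial, so using it directly as the output directory also keeps trials separate.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--job-dir', dest='job_dir', required=True)
args, _ = parser.parse_known_args()
output_dir = args.job_dir  # already unique per trial during hyperparameter tuning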

TypeError: 'Not JSON Serializable' while doing tf.keras.Model.save and using keras variable in loss_weights in tf.keras.Model.compile

System information   
OS Platform and Distribution: Ubuntu 16.04 LTS  
TensorFlow installed from (source or binary): binary  
TensorFlow version (use command below): 1.12.0  
Python version: 3.5.2  
CUDA/cuDNN version: release 9.0, V9.0.176  
GPU model and memory: Tesla K80, 12 GB  
Describe the current behavior  
When I try to save my model using model.save() where model is a tf.keras.Model instance, it throws a TypeError: ('Not JSON Serializable:', <tf.Variable 'Variable:0' shape=() dtype=float32>) .
I am using a tf.keras.backend.variable() in loss_weights in model.compile.  
Optimizer: tf.keras.optimizers.Adam  
Interestingly, when I try to save my model weights only using model.save_weights where model is a tf.keras.Model instance, it works fine, NO ERROR.
Code to reproduce the issue  
alpha = tf.keras.backend.variable(0.25)

# Build any model using tf.keras:
model = get_model()
model.compile(optimizer=optimizer,
              loss={"softmax1": generalized_dice_loss,
                    "softmax2": generalized_dice_loss},
              loss_weights=[1.0, alpha])

# Do training using model.fit(), then:
model.save("model.h5")  # filename added for illustration
Other info / logs  
Traceback (most recent call last):  
File "main_latest.py", line 45, in 
max_queue_size=10)  
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 2177, in fit_generator
initial_epoch=initial_epoch)    
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/training_generator.py", line 216, in fit_generator  
callbacks.on_epoch_end(epoch, epoch_logs)  
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/callbacks.py", line 214, in on_epoch_end
callback.on_epoch_end(epoch, logs)    
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/callbacks.py", line 601, in on_epoch_end
self.model.save(filepath, overwrite=True)  
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/network.py", line 1363, in save
save_model(self, filepath, overwrite, include_optimizer)  
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/keras/engine/saving.py", line 134, in save_model
default=serialization.get_json_type).encode('utf8')  
File "/usr/lib/python3.5/json/init.py", line 237, in dumps
**kw).encode(obj)  
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
chunks = self.iterencode(o, _one_shot=True)  
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
return _iterencode(o, 0)  
File "/home/tejal/.local/lib/python3.5/site-packages/tensorflow/python/util/serialization.py", line 64, in get_json_type  
raise TypeError('Not JSON Serializable:', obj)  
TypeError: ('Not JSON Serializable:', <tf.Variable 'Variable:0' shape=() dtype=float32>)
  
model.save is trying to save a tf.Variable, which is not JSON serializable.
model.save saves everything, not just the model weights. I've seen this problem when my optimizer had a tf.Tensor, which cannot be serialized.
Everything points to alpha being your problem in this case. The documentation for model.compile also states that the loss_weights should be Python floats.
So passing the variable's value instead of the variable should solve your problem: alpha.numpy() with eager execution, or tf.keras.backend.get_value(alpha) in TF 1.x graph mode (the question uses TF 1.12):
model.compile(optimizer=optimizer,
              loss={"softmax1": generalized_dice_loss,
                    "softmax2": generalized_dice_loss},
              loss_weights=[1.0, tf.keras.backend.get_value(alpha)])
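If alpha is a variable because it needs to change during training, note that a float in loss_weights is fixed at compile time. A sketch of an alternative, not from the original answer: fold the variable into a custom loss so it stays out of the compile config that model.save serializes to JSON.
def weighted_dice_loss(y_true, y_pred):
    # alpha stays a Keras variable and can be updated during training via a
    # callback using tf.keras.backend.set_value(alpha, new_value).
    return alpha * generalized_dice_loss(y_true, y_pred)

model.compile(optimizer=optimizer,
              loss={"softmax1": generalized_dice_loss,
                    "softmax2": weighted_dice_loss},
              loss_weights=[1.0, 1.0])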
