I followed the official tutorial from Microsoft: https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-score-model-predict-spark-pool
But when I execute:
#Bind model within Spark session
model = pcontext.bind_model(
    return_types=RETURN_TYPES,
    runtime=RUNTIME,
    model_alias="Sales", #This alias will be used in PREDICT call to refer this model
    model_uri=AML_MODEL_URI, #In case of AML, it will be AML_MODEL_URI
    aml_workspace=ws #This is only for AML. In case of ADLS, this parameter can be removed
).register()
I get:
NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel'
Traceback (most recent call last):
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_context.py", line 47, in bind_model
udf = _create_udf(
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_udf.py", line 104, in _create_udf
model_runtime = runtime_gen._create_runtime()
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 103, in _create_runtime
if self._check_model_runtime_compatibility(model_runtime):
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 166, in _check_model_runtime_compatibility
model_wrapper = self._load()
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_runtime.py", line 78, in _load
return SynapsePredictModelCache._get_or_load(
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/core/_cache.py", line 172, in _get_or_load
model = load_model(runtime, model_uri, functions)
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 257, in load_model
model = loader.load(model_uri, functions)
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 122, in load
model = self._load(model_uri)
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 215, in _load
return self._load_mlflow(model_uri)
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/azure/synapse/ml/predict/utils/_model_loader.py", line 59, in _load_mlflow
model = mlflow.pyfunc.load_model(model_uri)
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 640, in load_model
model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))
File "/home/trusted-service-user/cluster-env/env/lib/python3.8/site-packages/mlflow/models/model.py", line 124, in load
with open(path) as f:
NotADirectoryError: [Errno 20] Not a directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1648328086462_0002/spark-3d802a7e-15b7-4eb6-88c5-f0e01f8cdb35/userFiles-fbe23a43-67d3-4e65-a879-4a497e804b40/68603955220f5f8646700d809b71be9949011a2476a34965a3d5c0f3d14de79b.pkl/MLmodel'
How can I fix this error?
(UPDATE: 29/3/2022): You will see this error message if your model does not contain all the files required in the ML model.
To reproduce it, I created two ML models:
sklearn_regression_model: contains only the sklearn_regression_model.pkl file.
When I run PREDICT against the MLflow-packaged model named sklearn_regression_model, I get the same error as shown above.
linear_regression: contains the files below:
When I run PREDICT against the MLflow-packaged model named linear_regression, it works as expected.
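The traceback shows mlflow trying to open a file named MLmodel inside the model path, which fails when that path is a single .pkl file. As a rough pre-check, the sketch below tests whether a local path is a directory containing that file. The helper name and the temporary paths are hypothetical, and it only looks at a local copy of the model, not at what is registered in AML or uploaded to ADLS.

```python
import os
import tempfile

MLMODEL_FILE_NAME = "MLmodel"  # file name taken from the traceback above


def looks_like_mlflow_model(path):
    """Return True if `path` is a directory that contains an MLmodel file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, MLMODEL_FILE_NAME))


# Demo with throwaway local paths (hypothetical, not the asker's real files).
with tempfile.TemporaryDirectory() as tmp:
    # Case 1: a bare pickle file, like the failing sklearn_regression_model.
    bare_pickle = os.path.join(tmp, "sklearn_regression_model.pkl")
    with open(bare_pickle, "wb") as f:
        f.write(b"placeholder bytes")

    # Case 2: a folder with an MLmodel file, like the working linear_regression.
    model_dir = os.path.join(tmp, "linear_regression")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, MLMODEL_FILE_NAME), "w") as f:
        f.write("flavors: {}\n")

    print(looks_like_mlflow_model(bare_pickle))  # False
    print(looks_like_mlflow_model(model_dir))    # True
```

A True result only means the MLmodel file is present; it does not validate the rest of the model folder.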
It should be AML_MODEL_URI = "" #In URI ":x" => Rossman_Sales:2
Before running this script, update it with the URI for ADLS Gen2 data file along with model output return data type and ADLS/AML URI for the model file.
#Set model URI
#Set AML URI, if trained model is registered in AML
AML_MODEL_URI = "<aml model uri>" #In URI ":x" signifies model version in AML. You can choose which model version you want to run. If ":x" is not provided then by default latest version will be picked.
#Set ADLS URI, if trained model is uploaded in ADLS
ADLS_MODEL_URI = "abfss://<filesystemname>@<account name>.dfs.core.windows.net/<model mlflow folder path>"
Model URI from AML Workspace:
DATA_FILE = "abfss://data@cheprasynapse.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
AML_MODEL_URI_SKLEARN = "aml://mlflow_sklearn:1" #Here ":1" signifies model version in AML. We can choose which version we want to run. If ":1" is not provided then by default latest version will be picked
RETURN_TYPES = "INT"
RUNTIME = "mlflow"
Model URI uploaded to ADLS Gen2:
DATA_FILE = "abfss://data@cheprasynapse.dfs.core.windows.net/AML/LengthOfStay_cooked_small.csv"
AML_MODEL_URI_SKLEARN = "abfss://data@cheprasynapse.dfs.core.windows.net/linear_regression/linear_regression" #Here the URI points to the MLflow model folder uploaded to ADLS Gen2
RETURN_TYPES = "INT"
RUNTIME = "mlflow"
Related
I am trying to save a trained model to S3 storage and then trying to load and predict using this model via Pipeline package from pyspark.ml.
Here's an example of how I am saving my model.
# stage_1 to stage_4 are basic transformations on the data (one-hot encoding, etc.)
# define stage 5: logistic regression model
stage_5 = LogisticRegression(featuresCol='features',labelCol='label')
# SETUP THE PIPELINE
regression_pipeline = Pipeline(stages= [stage_1, stage_2, stage_3, stage_4, stage_5])
# fit the pipeline to the training data
model = regression_pipeline.fit(dataFrame1)
model_path ="s3://s3-dummy_path-orch/dummy models/pipeline_testing_1.model"
model.save(model_path)
The model saves successfully, and two folders are created at the model path above:
stages
metadata
However, when I try to load the model, I get the error below.
Traceback (most recent call last):
File "/tmp/pythonScript_85ff2462_e087_4805_9f50_0c75fc4302e2958379757178872310.py", line 75, in <module>
pipelineModel = Pipeline.load(model_path)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 362, in load
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 207, in load
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 300, in load
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.Pipeline but found class name org.apache.spark.ml.PipelineModel'
I am trying to load the model as below:
from pyspark.ml import Pipeline
## same path used while #model.save in the above code snippet
model_path ="s3://s3-dummy_path-orch/dummy models/pipeline_testing_1.model"
pipelineModel = Pipeline.load(model_path)
How could I go about rectifying this?
If you saved a pipeline model, you should load it as a pipeline model, not as a pipeline. The difference is that a pipeline model is fitted to a dataframe, but a pipeline is not.
from pyspark.ml import PipelineModel
pipelineModel = PipelineModel.load(model_path)
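If it is unclear which class a saved folder holds, the metadata folder mentioned in the question can be inspected. The sketch below assumes Spark ML stores a one-line JSON document with a "class" key in metadata/part-*, which matches the class names in the error message but is an assumption about the on-disk layout. The helper name is hypothetical, and it reads a local directory only, so an s3:// path would first need to be copied locally.

```python
import glob
import json
import os
import tempfile


def saved_ml_class(model_path):
    """Return the class name recorded in <model_path>/metadata/part-*."""
    part_files = sorted(glob.glob(os.path.join(model_path, "metadata", "part-*")))
    if not part_files:
        raise FileNotFoundError("no metadata/part-* file under " + model_path)
    with open(part_files[0]) as f:
        # Assumption: the first line is a JSON object with a "class" key.
        return json.loads(f.readline())["class"]


# Demo against a fake metadata file (hypothetical content, local temp folder).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "metadata"))
    with open(os.path.join(tmp, "metadata", "part-00000"), "w") as f:
        f.write(json.dumps({"class": "org.apache.spark.ml.PipelineModel"}) + "\n")

    class_name = saved_ml_class(tmp)
    print(class_name)  # org.apache.spark.ml.PipelineModel
    # A fitted pipeline should then be loaded with PipelineModel.load(...).
    print(class_name.endswith("PipelineModel"))  # True
```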
I am trying to deploy a pre-trained ML model (saved as an .h5 file) to Azure ML. I have created an AKS cluster and am trying to deploy the model as shown below:
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AksWebservice, LocalWebservice
from azureml.core.compute import ComputeTarget
workspace = Workspace.from_config(path="config.json")
env = Environment.get(workspace, name='AzureML-TensorFlow-1.13-GPU')
# Installing packages present in my requirements file
with open('requirements.txt') as f:
    dependencies = f.readlines()
dependencies = [x.strip() for x in dependencies if '# ' not in x]
dependencies.append("azureml-defaults>=1.0.45")
env.python.conda_dependencies = CondaDependencies.create(conda_packages=dependencies)
# Including the source folder so that all helper scripts are included in my deployment
inference_config = InferenceConfig(entry_script='app.py', environment=env, source_directory='./ProcessImage')
aks_target = ComputeTarget(workspace=workspace, name='sketch-ppt-vm')
# Deployment with suitable config
deployment_config = AksWebservice.deploy_configuration(cpu_cores=4, memory_gb=32)
model = Model(workspace, 'sketch-inference')
service = Model.deploy(workspace, "process-sketch-dev", [model], inference_config, deployment_config, deployment_target=aks_target, overwrite=True)
service.wait_for_deployment(show_output = True)
print(service.state)
My main entry script requires some additional helper scripts, which I include by mentioning the source folder in my inference config.
I was expecting that the helper scripts I add should be able to access the packages installed while setting up the environment during deployment, but I get ModuleNotFoundError.
Here is the error output, along with a couple of environment variables I printed while executing the entry script:
AZUREML_MODEL_DIR ---- azureml-models/sketch-inference/1
PYTHONPATH ---- /azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages:/var/azureml-server:
PATH ---- /azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/bin:/opt/miniconda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/intel/compilers_and_libraries/linux/mpi/bin64
Exception in worker process
Traceback (most recent call last):
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/workers/base.py", line 129, in init_process
self.load_wsgi()
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
self.wsgi = self.app.wsgi()
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
return self.load_wsgiapp()
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
return util.import_app(self.app_uri)
File "/azureml-envs/azureml_6dc005c11e151f8d9427c0c6091a1bb9/lib/python3.6/site-packages/gunicorn/util.py", line 350, in import_app
__import__(module)
File "/var/azureml-server/wsgi.py", line 1, in <module>
import create_app
File "/var/azureml-server/create_app.py", line 3, in <module>
from app import main
File "/var/azureml-server/app.py", line 32, in <module>
from aml_blueprint import AMLBlueprint
File "/var/azureml-server/aml_blueprint.py", line 25, in <module>
import main
File "/var/azureml-app/main.py", line 12, in <module>
driver_module_spec.loader.exec_module(driver_module)
File "/structure/azureml-app/ProcessImage/app.py", line 16, in <module>
from ProcessImage.samples.coco.inference import run as infer
File "/var/azureml-app/ProcessImage/samples/coco/inference.py", line 1, in <module>
import skimage.io
ModuleNotFoundError: No module named 'skimage'
The existing answers related to this aren't of much help. I believe there must be a simpler way to fix this, since AzureML specifically provides the feature to setup environment with pip/conda packages installed either by supplying requirements.txt file or individually.
What am I missing here? Kindly help.
So, after some trial and error, creating a fresh environment and then adding the packages solved the problem for me. I am still not clear on why this didn't work when I tried to use Environment.from_pip_requirements(). A detailed answer in this regard would be interesting to read.
My primary task was inference - object detection given an image, and we have our own model developed by our team. There are two types of imports I wanted to have:
1. Standard python packages (installed through pip)
This was solved by creating conda dependencies and adding them to the env object (step 2)
2. Methods/vars from helper scripts (if you have pre/post processing to be done during model inference):
This was done by mentioning source_directory in InferenceConfig (step 3)
Here is my updated script, which combines the environment creation, inference, and deployment configs and uses an existing compute target in the workspace (created through the portal).
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AksWebservice, LocalWebservice
from azureml.core.compute import ComputeTarget
# 1. Instantiate the workspace
workspace = Workspace.from_config(path="config.json")
# 2. Setup the environment
env = Environment('sketchenv')
with open('requirements.txt') as f: # Fetch all dependencies as a list
    dependencies = f.readlines()
dependencies = [x.strip() for x in dependencies if '# ' not in x]
env.docker.base_image = DEFAULT_GPU_IMAGE
env.python.conda_dependencies = CondaDependencies.create(conda_packages=['numpy==1.17.4', 'Cython'], pip_packages=dependencies)
# 3. Inference Config
inference_config = InferenceConfig(entry_script='app.py', environment=env, source_directory='./ProcessImage')
# 4. Compute target (using existing cluster from the workspace)
aks_target = ComputeTarget(workspace=workspace, name='sketch-ppt-vm')
# 5. Deployment config
deployment_config = AksWebservice.deploy_configuration(cpu_cores=6, memory_gb=100)
# 6. Model deployment
model = Model(workspace, 'sketch-inference') # Registered model (which contains model files/folders)
service = Model.deploy(workspace, "process-sketch-dev", [model], inference_config, deployment_config, deployment_target=aks_target, overwrite=True)
service.wait_for_deployment(show_output = True)
print(service.state)
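One detail in step 2 may be worth tightening: the filter `'# ' not in x` keeps blank lines and comments written without a space after `#`, and drops any requirement that carries an inline comment. A minimal sketch of a stricter parser follows. The function name and the sample lines are hypothetical, and skipping pip option lines (those starting with `-`) is a design choice of this sketch, not something the original script does.

```python
def parse_requirements(lines):
    """Return package specifiers, skipping blanks, comments, and pip option lines."""
    packages = []
    for line in lines:
        # Drop an inline comment (" #..."), then surrounding whitespace.
        spec = line.split(" #", 1)[0].strip()
        if not spec or spec.startswith("#") or spec.startswith("-"):
            continue
        packages.append(spec)
    return packages


# Hypothetical requirements.txt content, as returned by f.readlines().
sample = [
    "numpy==1.17.4\n",
    "# a full-line comment\n",
    "\n",
    "scikit-image  # provides the skimage module\n",
    "#comment without a space\n",
]
print(parse_requirements(sample))  # ['numpy==1.17.4', 'scikit-image']
```

The resulting list could be passed as `pip_packages` in place of `dependencies`.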
I am trying to load a tensorflow meta graph from a saved checkpoint using Tensorflow version 1.15 to convert it to a SavedModel for tensorflow serving. It is a Speech Recognition Model with Local attention and unidirectional LSTM implemented using the Returnn Toolkit with Tensorflow Backend. I am using the following code.
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
import sys
if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " save_dir")
    sys.exit(1)
export_dir = sys.argv[1]
builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
sigs = {}
with tf.Session(graph=tf.Graph()) as sess:
    new_saver = tf.train.import_meta_graph("./serv_test/model.238.meta")
    new_saver.restore(sess, tf.train.latest_checkpoint("./serv_test"))
    graph = tf.get_default_graph()
    input_audio = graph.get_tensor_by_name('inference/default/wav:0')
    output_hyps = graph.get_tensor_by_name('inference/default/Reshape_7:0')
    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = tf.saved_model.signature_def_utils.predict_signature_def({"in": input_audio}, {"out": output_hyps})
    builder.add_meta_graph_and_variables(sess, [tag_constants.SERVING], signature_def_map=sigs)
builder.save()
But I am getting the following error in the import_meta_graph line:
Traceback (most recent call last):
File "xport.py", line 16, in <module>
new_saver=tf.train.import_meta_graph("./serv_test/model.238.meta")
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered
'NativeLstm2' in binary running on ip-10-1-21-241. Make sure the Op and Kernel
are registered in the binary running in this process. Note that if you are loading a
saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler`
should be done before importing the graph, as contrib ops are lazily registered when
the module is first accessed.
Is there any way to get around this error? Is it because of the custom built layers used in Returnn? Is there any way to make a Returnn Model tensorflow servable?
Thanks.
You should remove the graph=tf.Graph(), otherwise your import_meta_graph will import it into the wrong graph.
See the official TF examples for how to use import_meta_graph.
Although there are many question threads about the error ValueError: negative dimensions are not allowed,
I couldn't find an answer to my problem.
After training a machine learning model using SGDClassifier:
clf=linear_model.SGDClassifier(loss='log',random_state=20000,verbose=1,class_weight='balanced')
model=clf.fit(X,Y)
The dimensions of X are (1651880, 246177).
Saving the model object and using the model for prediction both work:
joblib.dump(model, 'trainedmodel.pkl',compress=3)
prediction_result=model.predict(x_test)
but I get an error when loading the saved model:
model = joblib.load('trainedmodel.pkl')
Below is the error message.
Please help me resolve it.
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 598, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 526, in _unpickle
obj = unpickler.load()
File "C:\Users\Taxonomy\Anaconda3\lib\pickle.py", line 1050, in load
dispatch[key[0]](self)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 352, in load_build
self.stack.append(array_wrapper.read(self))
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 195, in read
array = self.read_array(unpickler)
File "C:\Users\Taxonomy\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 141, in read_array
array = unpickler.np.empty(count, dtype=self.dtype)
ValueError: negative dimensions are not allowed
Try dumping the model with pickle protocol 4.
From Python's pickle docs:
Protocol version 4 was added in Python 3.4. It adds support for very
large objects, pickling more kinds of objects, and some data format
optimizations. Refer to PEP 3154 for information about improvements
brought by protocol 4.
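As a minimal sketch of what requesting protocol 4 looks like, the snippet below uses the standard pickle module on a small stand-in object (hypothetical, not the asker's classifier). joblib.dump also accepts a protocol argument, so the equivalent change to the question's code would be joblib.dump(model, 'trainedmodel.pkl', compress=3, protocol=4). Whether that resolves the error for this particular model is not confirmed here.

```python
import io
import pickle

# Hypothetical stand-in for the trained model object.
model_like = {"coef": [0.1, 0.2, 0.3], "classes": [0, 1]}

buffer = io.BytesIO()
pickle.dump(model_like, buffer, protocol=4)  # explicitly request protocol 4

# A protocol >= 2 pickle starts with the PROTO opcode (0x80) and the version byte.
print(buffer.getvalue()[:2])  # b'\x80\x04'

buffer.seek(0)
restored = pickle.load(buffer)
print(restored == model_like)  # True
```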
I recently read a paper about UNet++, and I want to implement this structure with tensorflow-2.0 and a customized Keras model. As the structure is so complicated, I decided to manage the Keras layers with a dictionary. Everything went well in training, but an error occurred while saving the model. Here is a minimal example that reproduces the error:
class DicModel(tf.keras.Model):
    def __init__(self):
        super(DicModel, self).__init__(name='SequenceEECNN')
        self.c = {}
        self.c[0] = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3,activation='relu',padding='same'),
            tf.keras.layers.BatchNormalization()]
        )
        self.c[1] = tf.keras.layers.Conv2D(3,3,activation='softmax',padding='same')
    def call(self,images):
        x = self.c[0](images)
        x = self.c[1](x)
        return x
X_train,y_train = load_data()
X_test,y_test = load_data()
class_weight.compute_class_weight('balanced',np.ravel(np.unique(y_train)),np.ravel(y_train))
model = DicModel()
model_name = 'test'
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='logs/'+model_name+'/')
early_stop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',patience=100,mode='min')
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
loss=tf.keras.losses.sparse_categorical_crossentropy,
metrics=['accuracy'])
results = model.fit(X_train,y_train,batch_size=4,epochs=5,validation_data=(X_test,y_test),
callbacks=[tensorboard_callback,early_stop_callback],
class_weight=[0.2,2.0,100.0])
model.save_weights('model/'+model_name,save_format='tf')
The error information is:
Traceback (most recent call last):
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/learn_tf2/test_model.py", line 61, in <module>
model.save_weights('model/'+model_name,save_format='tf')
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1328, in save_weights
self._trackable_saver.save(filepath, session=session)
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1106, in save
file_prefix=file_prefix_tensor, object_graph_tensor=object_graph_tensor)
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1046, in _save_cached_when_graph_building
object_graph_tensor=object_graph_tensor)
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1014, in _gather_saveables
feed_additions) = self._graph_view.serialize_object_graph()
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/graph_view.py", line 379, in serialize_object_graph
trackable_objects, path_to_root = self._breadth_first_traversal()
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/graph_view.py", line 199, in _breadth_first_traversal
for name, dependency in self.list_dependencies(current_trackable):
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/graph_view.py", line 159, in list_dependencies
return obj._checkpoint_dependencies
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/data_structures.py", line 690, in __getattribute__
return object.__getattribute__(self, name)
File "/media/xrzhang/Data/ZHS/Research/CNN-TF2/venv/lib/python3.6/site-packages/tensorflow/python/training/tracking/data_structures.py", line 732, in _checkpoint_dependencies
"ignored." % (self,))
ValueError: Unable to save the object {0: <tensorflow.python.keras.engine.sequential.Sequential object at 0x7fb5c6c36588>, 1: <tensorflow.python.keras.layers.convolutional.Conv2D object at 0x7fb5c6c36630>} (a dictionary wrapper constructed automatically on attribute assignment). The wrapped dictionary contains a non-string key which maps to a trackable object or mutable data structure.
If you don't need this dictionary checkpointed, wrap it in a tf.contrib.checkpoint.NoDependency object; it will be automatically un-wrapped and subsequently ignored.
tf.contrib.checkpoint.NoDependency seems to have been removed from Tensorflow-2.0 (https://medium.com/tensorflow/whats-coming-in-tensorflow-2-0-d3663832e9b8). How can I fix this issue? Or should I just give up using a dictionary in a customized Keras model? Thank you for your time and help!
Use string keys. For some reason tensorflow doesn't like int keys.
The exception message was incorrect in Tensorflow 2.0 and has been fixed in 2.2.
You can avoid the problem by wrapping the c attribute like this:
from tensorflow.python.training.tracking.data_structures import NoDependency
self.c = NoDependency({})
For more details check this issue.