Databricks error when trying to load a model - databricks

I am trying to train a model in databricks with mlflow and roberta transformers. I am able to register the model but when I call it for testing, I got the following error:
OSError: We couldn't connect to 'https://huggingface.co/' to load this model and it looks like dbfs:/databricks/mlflow-tracking/3638851642935524/cd4eae6034684211933b97b178e5f062/artifacts/checkpoint-36132/artifacts/checkpoint-36132 is not the path to a directory containing a config.json file.
However, when I check for the model saved, I can see that the config.json and other files are saved in the mentioned artifact but with the following error:
Couldn't load model information due to an error.
For more details, I have followed the following link for creating an mlflow with transformers on databricks:
https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/mlflow

Related

Accessing pickle file through API

I have a pickle file parameters.pkl containing some parameters and their values of a model. Is there a process to store it in Microsoft Azure Machine Learning Studio as endpoint. So that we can access the parameters and their values through API at some later stage.
The pickle file has been created through the following process:
dict={'scaler': scaler,
'features': z_tags,
'Z_reconstruction_loss': Z_reconstruction_loss}
pickle.dump(dict, open('parameters.pkl', 'wb'))

I can not register a model in my Azure ml experiment using run context

I am trying to register a model inside one of my azure ml experiments. I am able to register it via Model.register but not via run_context.register_model
This are the two code sentences I use. The commented one is the one that fails
learn.path = Path('./outputs').absolute()
Model.register(run_context.experiment.workspace, "outputs/login_classification.pkl","login_classification", tags=metrics)
run_context.register_model("login_classification", "outputs/login_classification.pkl", tags=metrics)
I received the next error:
Message: Could not locate the provided model_path outputs/login_classification.pkl
But model is stored in this path:
Before implementing run_context.register_model() implement run_context = Run.get_context()
I was able to fix the problem by explicitly uploading the model into the run history record before trying for registering the model.
run.upload_file("output/model.pickle", "output/model.pickle")
Check the documentation for Message: Could not locate the provided model_path outputs/login_classification.pkl
To check about Run Class

MLFlow Model Registry ENDPOINT_NOT_FOUND: No API found for ERROR

I'm currently using MLFlow in Azure Databricks and trying to load a model from the Model Registry. Currently referencing the version, but will want to reference the stage 'Production' (I get the same error when referencing the stage as well)
I keep encountering an error:
ENDPOINT_NOT_FOUND: No API found for 'POST /mlflow/model-versions/get-download-uri'
My artifacts are stored in the dbfs filestore.
I have not been able to identify why this is happening.
Code:
from mlflow.tracking.client import MlflowClient
from mlflow.entities.model_registry.model_version_status import ModelVersionStatus
import mlflow.pyfunc
model_name = "model_name"
model_version_uri = "models:/{model_name}/4".format(model_name=model_name)
print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_version_uri))
model_version_4 = mlflow.pyfunc.load_model(model_version_uri)
model_production_uri = "models:/{model_name}/production".format(model_name=model_name)
print("Loading registered model version from URI: '{model_uri}'".format(model_uri=model_production_uri))
model_production = mlflow.pyfunc.load_model(model_production_uri)

How to Deploy trained TensorFlow 2.0 models using Amazon SageMaker?

i am trying to deploy a custom trained tensorflow model using Amazon SageMaker. i have trained xlm roberta using tf 2.2.0 for multilingual sentiment analysis task.(please refer to this notebook : https://www.kaggle.com/mobassir/understanding-cross-lingual-models)
now, using trained weight file of my model i am trying to deploy that in sagemaker, i was following this tutorial : https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/
converted some keras code from there to tensorflow.keras for 2.2.0
but when i do : !ls export/Servo/1/variables i can see that export as Savedmodel generating empty variables directory like this : https://github.com/tensorflow/models/issues/1988
i can't find any documentation help for tf 2.2.0 trained model deployment
need example like this : https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ for tf 2.x models and not keras
even though !ls export/Servo/1/variables shows empty directory but An endpoint was created successfully and now i am not sure if my model was deployed successfully or not because when i try to test the model deployment inside aws notebook by using predictor = sagemaker.tensorflow.model.TensorFlowPredictor(endpoint_name, sagemaker_session)
i.e. predictor.predict(data) i get the following error message:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
"error": "Session was not created with a graph before Run()!"
}"
related problem : Inference error with TensorFlow C++ on iOS: "Invalid argument: Session was not created with a graph before Run()!"
the code i tried can be found here : https://pastebin.com/sGuTtnSD

Tensorboard + Keras + ML Engine

I currently have Google Cloud ML Engine setup to train models created in Keras. When using Keras, it seems ML Engine does not automatically save the logs to a storage bucket. I see the logs in the ML Engine Jobs page but they do not show in my storage bucket and therefore I am unable to run tensorboard while training.
You can see the job completed successfully and produced logs:
But then there are no logs saved in my storage bucket:
I followed this tutorial when setting up my environment: (http://liufuyang.github.io/2017/04/02/just-another-tensorflow-beginner-guide-4.html)
So, how do I get the logs and run tensorboard when training a Keras model on ML Engine? Has anyone else had success with this?
You will need to create a callback keras.callbacks.TensorBoard(..) in order to write out the logs. See Tensorboad callback. You can supply GCS path as well (gs://path/to/my/logs) to the log_dir argument of the callback and then point Tensorboard to that location. You will add the callback as a list when calling model.fit_generator(...) or model.fit(...).
tb_logs = callbacks.TensorBoard(
log_dir='gs://path/to/logs',
histogram_freq=0,
write_graph=True,
embeddings_freq=0)
model.fit_generator(..., callbacks=[tb_logs])

Resources