UI does not display data in MLflow - Databricks

This is in reference to a comment (not an answer) that I added here: MLflow: INVALID_PARAMETER_VALUE: Unsupported URI './mlruns' for model registry store
I extracted these files from here:
train.py MLproject wine-quality.csv
They are in the directory feb24MLFLOW. I am in that directory, which has the following contents:
:memory mlruns train.py wine-quality.csv
When I run the following command
mlflow server --backend-store-uri sqlite:///:memory --default-artifact-root ./mlruns
the UI loads but does not show any data in it, and neither does the database, as shown below (see screenshot).
I am using the --default-artifact-root ./mlruns flag because, when I run print(mlflow.get_tracking_uri()), I get the current directory:
file:///<mydirectorylocations>/feb24MLFLOW/mlruns
For some reason my database is not being updated (nothing is inserted). I checked that in the terminal:
$ sqlite3
sqlite> .open :memory
sqlite> .tables
alembic_version metrics registered_model_tags
experiment_tags model_version_tags registered_models
experiments model_versions runs
latest_metrics params tags
sqlite> select * from runs;
sqlite>
As you can see, there is no data after running select * from runs above.
Please note that I have the following contents in ./mlruns:
d6db5cf1443d49c19971a1b8b606d692 meta.yaml
Can somebody suggest how I can get results to show in the UI, or get them inserted into the database? What am I doing wrong?
Please note that when I run mlflow ui, I see data in the UI but I get:
error_code: "INVALID_PARAMETER_VALUE"
message: " Model registry functionality is unavailable; got unsupported URI './mlruns' for model registry data storage. Supported URI schemes are: ['postgresql', 'mysql', 'sqlite', 'mssql']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an MLflow server against one of the supported backend storage locations."

Related

Azure ML Pipeline - Error: Message: "Missing data for required field". Path: "environment". value: "null"

I am trying to create a pipeline with the Python SDK v2 in Azure Machine Learning Studio. I have been stuck on this error for many, MANY hours now, so I am reaching out.
I have been following this guide: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-python-sdk
My setup is very similar, but I split "data_prep" into two separate steps, and I am using a custom ML model.
How the pipeline is defined:
# the dsl decorator tells the sdk that we are defining an Azure ML pipeline
from azure.ai.ml import dsl, Input, Output
import pathlib
import os


@dsl.pipeline(
    compute=cpu_compute_target,
    description="Car predict pipeline",
)
def car_predict_pipeline(
    pipeline_job_data_input,
    pipeline_job_registered_model_name,
):
    # using data_prep_function like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
    )

    print('-----------------------------------------------')
    print(os.path.realpath(str(pipeline_job_data_input)))
    print(os.path.realpath(str(data_prep_job.outputs.prepared_data)))
    print('-----------------------------------------------')

    train_test_split_job = traintestsplit_component(
        prepared_data=data_prep_job.outputs.prepared_data
    )

    # using train_func like a python call with its own inputs
    train_job = train_component(
        train_data=train_test_split_job.outputs.train_data,  # note: using outputs from previous step
        test_data=train_test_split_job.outputs.test_data,    # note: using outputs from previous step
        registered_model_name=pipeline_job_registered_model_name,
    )

    # a pipeline returns a dictionary of outputs
    # keys will code for the pipeline output identifier
    return {
        # "pipeline_job_train_data": train_job.outputs.train_data,
        # "pipeline_job_test_data": train_job.outputs.test_data,
        "pipeline_job_model": train_job.outputs.model,
    }
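For context, this is roughly how such a pipeline is instantiated and submitted in the SDK v2 tutorial; the data path, model name and experiment name below are placeholders, and ml_client is assumed to be the MLClient set up earlier:

from azure.ai.ml import Input

# Instantiate the pipeline with concrete inputs (hypothetical values).
pipeline = car_predict_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path="azureml:car_data:1"),
    pipeline_job_registered_model_name="car_predict_model",
)

# Submit the pipeline job; this is typically where schema validation errors
# such as "Missing data for required field" surface.
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="car_predict_pipeline_test",
)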
I managed to run every single component successfully, in order, via the command line, and produced a trained model. So the components and the data work fine, but the pipeline won't run.
I can provide additional info, but I am not sure what is needed and I do not want to clutter the post.
I have tried googling. I have tried comparing the tutorial pipeline with my own. I have tried using print statements to isolate the issue. Nothing has worked so far. Nothing I have done has changed the error either; it is the same error no matter what.
Edit:
Some additional info about my environment:
from azure.ai.ml.entities import Environment

custom_env_name = "pipeline_test_environment_pricepredict_model"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Environment for testing out Jeppes model in pipeline building",
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
    version="1.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
Build status of the environment: the build had already completed successfully.
In Azure Machine Learning Studio, when the application is running and the model is deployed, we have the default options to use either curated environments or custom environments. If the environment was created based on an existing deployment, we need to check whether its build was successful or not.
Until the deployment succeeds, the environment variables are not available to the program and we cannot retrieve them through the code block.
Select the environment that needs to be used.
Choose the existing version that was created.
We will get the mount location details and the Dockerfile if the environment was created using Docker and a conda environment.
Once the environment is up and running successfully, we can retrieve the environment variable information using the asset ID or the mount details, for example:
/mnt/batch/tasks/shared/LS_root/mounts/clusters/workspace-name/code/files/docker/Dockerfile
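
For reference, a minimal sketch of how a command component in the SDK v2 tutorial references the registered environment as "name:version" (the component name, paths and script below are hypothetical); since the error complains that the required "environment" field is missing, this is the field to check on each component:

from azure.ai.ml import command, Input, Output

data_prep_component = command(
    name="data_prep_car_predict",            # hypothetical component name
    display_name="Data preparation",
    inputs={"data": Input(type="uri_folder")},
    outputs={"prepared_data": Output(type="uri_folder")},
    code="./components/data_prep",           # hypothetical source directory
    command="python data_prep.py --data ${{inputs.data}} "
            "--prepared_data ${{outputs.prepared_data}}",
    # Reference the environment registered above; leaving this unset or empty
    # is one way to end up with a null "environment" field on the component.
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)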

Object embedded in Databricks SQL command

I came across the following SQL command in a Databricks notebook, and I am confused about what this ${da.paths.working_dir} object is in the SQL command. Is it a Python object or something else?
SELECT * FROM parquet.${da.paths.working_dir}/weather
I know it contains the path of a working directory, but how can I access/print it?
I tried to demystify it but failed, as illustrated in the following figure.
NOTE: My notebook is a SQL notebook.
Finally, I figured it out. This is a variable in Databricks SQL, and we can access its value with a SELECT statement as shown below:
SELECT '${da.paths.working_dir}';
EDIT: This variable is a Spark configuration property, which can be set as follows:
## spark.conf.set(key, value)
spark.conf.set("da.paths.working_dir", "/path/to/files")
To access this property in Python:
spark.conf.get("da.paths.working_dir")
To access this property in Databricks SQL:
SELECT '${da.paths.working_dir}';
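Putting it together, a minimal sketch (the path is a placeholder, and a SparkSession available as spark is assumed) of setting the property in Python, reading it back, and referencing it from SQL via ${...} substitution:

# Set a Spark configuration property from Python (hypothetical path).
spark.conf.set("da.paths.working_dir", "dbfs:/user/hive/warehouse/working_dir")

# Read it back in Python.
print(spark.conf.get("da.paths.working_dir"))

# In SQL, ${...} is substituted from the Spark configuration
# (variable substitution is enabled by default).
spark.sql("SELECT '${da.paths.working_dir}' AS working_dir").show(truncate=False)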

Failed to write to mlflow after deleting experiment via mlflow UI

I'm using mlflow version 1.18.0
When I delete an experiment from the MLflow UI and then try to create and write to a new experiment (with the same name I just deleted), I get an error on this line of code:
mlflow.start_run(run_name=run_name)
Error:
ERROR mlflow.utils.rest_utils: API request failed with code 500 != 200
If I change the experiment name, I have no problem writing new tests.
Why is this happening? (As I wrote, I deleted the experiment with that name.)
Is there a way to solve it (without giving a new experiment name)?
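
For reference, a minimal sketch assuming mlflow 1.18.0 as above; the usual explanation is that a deleted experiment keeps its name in the backend store until it is restored or permanently removed, so this lists deleted experiments and optionally restores one:

from mlflow.tracking import MlflowClient
from mlflow.entities import ViewType

client = MlflowClient()

# Experiments deleted from the UI are still present, in the "deleted" lifecycle stage.
for exp in client.list_experiments(view_type=ViewType.DELETED_ONLY):
    print(exp.experiment_id, exp.name, exp.lifecycle_stage)

# One option is to restore the deleted experiment instead of re-creating it:
# client.restore_experiment("<experiment-id>")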

Can someone explain the steps to create a trigger in Cassandra v3.11?

I have to create a trigger for my database in Cassandra, but the documentation does not help me.
Following the example that the DataStax documentation refers to, I did all the steps and it works:
https://github.com/apache/cassandra/tree/trunk/examples/triggers
Then I copied the built .jar file and the .properties file to the respective directories on my Ubuntu 18 server, as suggested in the last two steps of the README file.
After creating the keyspace and the tables, I tried to do an insert on the test table, and it gives me the following error:
ServerError: java.lang.NoClassDefFoundError: org/apache/cassandra/schema/Schema
Why is this happening?

Oracle Data Pump import to another schema: ORA-31696

When trying to export and import a table into another schema, I am facing the following issue:
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P41" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P42" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P43" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P44" using client specified AUTOMATIC method
I want to export all the data from a table in one schema and import it into another schema. The errors occur during the import. Below is the command I used for the export, followed by the command for the import, which fails:
expdp schema1/schema1#db1 directory=MYDIR tables=schema1.tbl dumpfile=tbl.dmp logfile=tbl.log content=data_only version=10.2.0.4.0
The above works and the dump file is created, but when trying:
impdp schema2/schema2#db1 directory=MYDIR tables=tbl dumpfile=tbl.dmp logfile=tblload.log content=data_only version=10.2.0.40 remap_schema=schema1:schema2
The above fails with the errors shown at the beginning. Can you please advise me on what I am doing wrong? I would truly appreciate it.
Also, for reference, I run the commands on a Linux OS with Oracle 10.
