I am trying to create a pipeline with the Python SDK v2 in Azure Machine Learning Studio. I have been stuck on this error for many, MANY hours now, so I am reaching out.
I have been following this guide: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-python-sdk
My setup is very similar, but I split "data_prep" into two separate steps, and I am using a custom ML model.
How the pipeline is defined:
```python
# the dsl decorator tells the sdk that we are defining an Azure ML pipeline
from azure.ai.ml import dsl, Input, Output
import pathlib
import os

@dsl.pipeline(
    compute=cpu_compute_target,
    description="Car predict pipeline",
)
def car_predict_pipeline(
    pipeline_job_data_input,
    pipeline_job_registered_model_name,
):
    # using data_prep_function like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
    )

    print('-----------------------------------------------')
    print(os.path.realpath(str(pipeline_job_data_input)))
    print(os.path.realpath(str(data_prep_job.outputs.prepared_data)))
    print('-----------------------------------------------')

    train_test_split_job = traintestsplit_component(
        prepared_data=data_prep_job.outputs.prepared_data
    )

    # using train_func like a python call with its own inputs
    train_job = train_component(
        train_data=train_test_split_job.outputs.train_data,  # note: using outputs from previous step
        test_data=train_test_split_job.outputs.test_data,    # note: using outputs from previous step
        registered_model_name=pipeline_job_registered_model_name,
    )

    # a pipeline returns a dictionary of outputs
    # keys will code for the pipeline output identifier
    return {
        # "pipeline_job_train_data": train_job.outputs.train_data,
        # "pipeline_job_test_data": train_job.outputs.test_data,
        "pipeline_job_model": train_job.outputs.model,
    }
```
I managed to run every single component successfully, in order, via the command line and produced a trained model. Ergo, the components and data work fine, but the pipeline won't run.
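For completeness, this is roughly how I instantiate and submit the pipeline job (a minimal sketch; `cpu_compute_target`, `ml_client`, and the input data path are assumed to be set up as in the tutorial, and the experiment and model names below are placeholders):

```python
# sketch of the submission step, following the tutorial's pattern;
# "car_data_path", the model name and the experiment name are placeholders
registered_model_name = "car_price_model"

pipeline = car_predict_pipeline(
    pipeline_job_data_input=Input(type="uri_file", path=car_data_path),
    pipeline_job_registered_model_name=registered_model_name,
)

# submit the pipeline job to the workspace
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    experiment_name="car_predict_e2e",
)
```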
I can provide additional info, but I am not sure what is needed and I do not want to clutter the post.
I have tried googling. I have tried comparing the tutorial pipeline with my own. I have tried using print statements to isolate the issue. Nothing has worked so far, and nothing I have done has changed the error; it is the same error no matter what I try.
Edit:
Some additional info about my environment:
```python
from azure.ai.ml.entities import Environment

custom_env_name = "pipeline_test_environment_pricepredict_model"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Environment for testing out Jeppes model in pipeline building",
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
    version="1.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
```
Build status of the environment: it had already built successfully.
In Azure Machine Learning Studio, when the application is running and the model is deployed, we have the default options of using either curated environments or custom environments. If the environment was created based on an existing deployment, we need to check whether the build was successful or not.
Until the deployment succeeds, the environment variables are not written into the program and we cannot retrieve them through a code block.
Select the environment that needs to be used.
Choose the existing version that was created.
If the environment was created using Docker and conda, we will get the mount location details and the Dockerfile.
Once the environment is up and running successfully, we can retrieve the environment variable information using the asset ID or the mount details, for example:
/mnt/batch/tasks/shared/LS_root/mounts/clusters/workspace-name/code/files/docker/Dockerfile
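As a minimal sketch (assuming `ml_client` is already connected to the workspace), the registered environment and its asset ID can be retrieved by name and version:

```python
# retrieve the registered environment by name and version and inspect it;
# the name/version match the registration code above
env = ml_client.environments.get(
    name="pipeline_test_environment_pricepredict_model",
    version="1.0",
)
print(env.id)     # asset ID of the environment
print(env.image)  # base Docker image used for the build
```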
I came across the following SQL command in a Databricks notebook and I am confused about what this ${da.paths.working_dir} object is. Is it a Python object or something else?
SELECT * FROM parquet.${da.paths.working_dir}/weather
I know it contains the path of a working directory, but how can I access/print it?
I tried to demystify it but failed, as illustrated in the attached figure.
NOTE: My notebook is a SQL notebook.
Finally, I figured it out. This is a high-level variable in Databricks SQL, and we can access it using the SELECT keyword as shown below:
SELECT '${da.paths.working_dir}';
EDIT: This "high-level" variable is a Spark configuration property, which can be set as follows:
## spark.conf.set(key, value)
spark.conf.set("da.paths.working_dir", "/path/to/files")
To access this property in Python:
spark.conf.get("da.paths.working_dir")
To access this property in Databricks SQL:
SELECT '${da.paths.working_dir}';
I'm using mlflow version 1.18.0
When I delete an experiment from the MLflow UI and then try to create and write a new experiment (with the same name I just deleted), I get an error on this line of code:
mlflow.start_run(run_name=run_name)
Error:
mlflow.utils.rest_utils: API request failed with error code 500 != 200
If I change the experiment name, I have no problem writing new runs.
Why is this happening? (As I wrote, I deleted the experiment.)
Is there a way to solve it (without giving a new experiment name)?
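For reference, a minimal sketch of what I am doing (the tracking URI and the experiment/run names are placeholders):

```python
import mlflow

# hypothetical minimal reproduction of the flow described above
mlflow.set_tracking_uri("http://localhost:5000")   # assumed tracking server

mlflow.set_experiment("my-experiment")              # experiment created the first time
with mlflow.start_run(run_name="first-run"):
    mlflow.log_param("dummy", 1)

# ... experiment "my-experiment" is then deleted from the MLflow UI ...

mlflow.set_experiment("my-experiment")              # re-using the deleted name
with mlflow.start_run(run_name="second-run"):       # this is where the 500 error appears
    mlflow.log_param("dummy", 2)
```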
I have to create a trigger for my database in Cassandra, but the documentation does not help me.
Following the example that the DataStax documentation refers to, I did all the steps and it works:
https://github.com/apache/cassandra/tree/trunk/examples/triggers
Then I took the built .jar file and the .properties file and copied them to the respective directories of my Ubuntu 18 server, as suggested in the last two steps of the readme file.
After creating the keyspace and the tables, I tried to do an insert on the test table and it gives me the following error:
ServerError: java.lang.NoClassDefFoundError: org/apache/cassandra/schema/Schema
Why is this happening?
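For reference, this is roughly how I register the trigger and hit the error (a sketch using the Python cassandra-driver; the keyspace, table, and trigger names are placeholders, and the trigger class is assumed to be the AuditTrigger from the example repository):

```python
from cassandra.cluster import Cluster

# a sketch of the steps I run; keyspace/table/trigger names are placeholders
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS test_ks "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS test_ks.test_table (id int PRIMARY KEY, value text)"
)

# register the trigger class shipped in the example .jar
session.execute(
    "CREATE TRIGGER IF NOT EXISTS my_trigger ON test_ks.test_table "
    "USING 'org.apache.cassandra.triggers.AuditTrigger'"
)

# this insert is the statement that raises the NoClassDefFoundError
session.execute("INSERT INTO test_ks.test_table (id, value) VALUES (1, 'hello')")
```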
When trying to export and import a table into another schema, I am facing the following issue:
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P41" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P42" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P43" using client specified AUTOMATIC method
ORA-31696: unable to export/import TABLE_DATA:"schemaowner"."tablename":"SYS_P44" using client specified AUTOMATIC method
I want to export all data from a table in one schema and import them into another schema. The errors occur during import. Below is the command I used for the export, followed by the import command which fails:
expdp schema1/schema1#db1 directory=MYDIR tables=schema1.tbl dumpfile=tbl.dmp logfile=tbl.log content=data_only version=10.2.0.4.0
The above works and the dump file is created, but when trying:
impdp schema2/schema2#db1 directory=MYDIR tables=tbl dumpfile=tbl.dmp logfile=tblload.log content=data_only version=10.2.0.40 remap_schema=schema1:schema2
The above fails with the errors specified at the beginning. Can you please advise me on what I am doing wrong? I would truly appreciate it.
Also, as a reference, I run these commands on a Linux OS with Oracle 10.