AML run.log() and run.log_list() fail without error - azure-machine-learning-service

I have a Pipeline with DatabricksSteps each containing:
from azureml.core.run import Run
run = Run.get_context()
# do stuff
run.log(name, val, desc)
run.log_list(name, vals, desc)
run.log_image(title, fig, desc)
Only log_image() seems to work. The image appears in the "images" section of the AML experiment workspace as expected, but the "tracked metrics" and "charts" areas are blank. In an interactive job, run.log() and run.log_list() work as expected. I tested that there is no problem with the arguments by using print() instead of run.log().

Add run.flush() at the end of the script.
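Metric values are queued and posted in the background, so if the Databricks process exits before the upload queue drains, the logged values can be silently dropped; flush() blocks until the queue has been processed. A minimal sketch of the end of the step script (the metric names and values below are illustrative):
from azureml.core.run import Run

run = Run.get_context()

# ... training / evaluation ...
run.log("rmse", 0.42)                        # illustrative metric
run.log_list("rmse_per_fold", [0.40, 0.44])  # illustrative list metric

# Block until the queued metric uploads have actually been sent to the service.
run.flush()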


Azure ML Pipeline - Error: Message: "Missing data for required field". Path: "environment". value: "null"

I am trying to create a pipeline with the Python SDK v2 in Azure Machine Learning Studio. I've been stuck on this error for many, MANY hours now, so I am reaching out.
I have been following this guide: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-python-sdk
My setup is very similar, but I split "data_prep" into two separate steps, and I am using a custom ML model.
How the pipeline is defined:
# the dsl decorator tells the sdk that we are defining an Azure ML pipeline
from azure.ai.ml import dsl, Input, Output
import pathlib
import os

@dsl.pipeline(
    compute=cpu_compute_target,
    description="Car predict pipeline",
)
def car_predict_pipeline(
    pipeline_job_data_input,
    pipeline_job_registered_model_name,
):
    # using data_prep_function like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
    )

    print('-----------------------------------------------')
    print(os.path.realpath(str(pipeline_job_data_input)))
    print(os.path.realpath(str(data_prep_job.outputs.prepared_data)))
    print('-----------------------------------------------')

    train_test_split_job = traintestsplit_component(
        prepared_data=data_prep_job.outputs.prepared_data
    )

    # using train_func like a python call with its own inputs
    train_job = train_component(
        train_data=train_test_split_job.outputs.train_data,  # note: using outputs from previous step
        test_data=train_test_split_job.outputs.test_data,    # note: using outputs from previous step
        registered_model_name=pipeline_job_registered_model_name,
    )

    # a pipeline returns a dictionary of outputs
    # keys will code for the pipeline output identifier
    return {
        # "pipeline_job_train_data": train_job.outputs.train_data,
        # "pipeline_job_test_data": train_job.outputs.test_data,
        "pipeline_job_model": train_job.outputs.model,
    }
I managed to run every single component successfully, in order, via the command line and produced a trained model. Ergo the components and data work fine, but the pipeline won't run.
I can provide additional info, but I am not sure what is needed and I do not want to clutter the post.
I have tried googling. I have tried comparing the tutorial pipeline with my own. I have tried using print statements to isolate the issue. Nothing has worked so far, and nothing I have done has changed the error; it's the same error no matter what.
Edit:
Some additional info about my environment:
from azure.ai.ml.entities import Environment

custom_env_name = "pipeline_test_environment_pricepredict_model"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Environment for testing out Jeppes model in pipeline building",
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
    version="1.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
Build status of the environment: it had already built successfully.
In Azure Machine Learning studio, when a job runs or a model is deployed, you can pick either a curated environment or a custom environment. If the environment was created from an existing deployment, first check whether its build succeeded.
Until the environment builds successfully, its details are not available to the program and cannot be retrieved from code.
Select the environment that needs to be used, then choose the existing version you created.
There you will find the mount location details and, if the environment was built from a Docker image plus a conda specification, the Dockerfile itself.
Once the environment is up and running, you can use its asset ID or the mount details to retrieve the environment information, for example:
/mnt/batch/tasks/shared/LS_root/mounts/clusters/workspace-name/code/files/docker/Dockerfile
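On the error itself: "Missing data for required field. Path: environment" can indicate that one of the pipeline's components never received an environment reference. A minimal sketch, assuming SDK v2 command components (the component name, script path, and command below are illustrative placeholders, not the asker's actual files), of pinning the registered environment explicitly by name and version:
from azure.ai.ml import command, Input, Output

# Illustrative component definition; the code folder and command are placeholders.
data_prep_component = command(
    name="data_prep",
    display_name="Data preparation",
    inputs={"data": Input(type="uri_folder")},
    outputs={"prepared_data": Output(type="uri_folder")},
    code="./components/data_prep",  # assumed folder containing the data-prep script
    command="python data_prep.py --data ${{inputs.data}} --prepared_data ${{outputs.prepared_data}}",
    # Reference the registered environment explicitly as "name:version".
    environment=f"{pipeline_job_env.name}:{pipeline_job_env.version}",
)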

Changing subdirectory of MLflow artifact store

Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:
s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json
MLflow creates a 3/ key in S3. Is there a way to change this key to something else (a date or the name of the experiment)?
As I commented above, yes, mlflow.create_experiment() does allow you to set the artifact location using the artifact_location parameter.
However, somewhat related, the problem with setting the artifact_location via create_experiment() is that once you create an experiment, MLflow will throw an error if you run create_experiment() again.
I didn't see this in the docs, but it's confirmed that if an experiment already exists in the backend store, MLflow will not allow you to run the same create_experiment() call again. And as of this post, MLflow does not have a check_if_exists flag or a create_experiments_if_not_exists() function.
To make things more frustrating, you cannot set the artifact_location in the set_experiment() function either.
So here is a pretty easy workaround; it also avoids the "ERROR mlflow.utils.rest_utils..." stdout logging:
import os
from random import random, randint

import mlflow
from mlflow import log_metric, log_param, log_artifacts

# Point at the tracking server before looking up or creating the experiment.
mlflow.set_tracking_uri('http://localhost:5000')

try:
    experiment = mlflow.get_experiment_by_name('oof')
    experiment_id = experiment.experiment_id
except AttributeError:
    # get_experiment_by_name() returned None, so the experiment does not exist yet.
    experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')

with mlflow.start_run(experiment_id=experiment_id) as run:
    print("Running mlflow_tracking.py")
    log_param("param1", randint(0, 100))
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    log_artifacts("outputs")
If it is the user's first time creating the experiment, get_experiment_by_name() returns None, so accessing experiment_id raises an AttributeError and the except block runs, creating the experiment.
On the second, third, etc. run, only the code under the try statement executes, since the experiment now exists. MLflow will then create a 'sample' key in your S3 bucket. Not fully tested, but it works for me at least.
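As a small design note, the same get-or-create logic can be written without relying on the AttributeError, because get_experiment_by_name() simply returns None when the experiment is missing. A minimal sketch using the same hypothetical experiment name and bucket as above:
import mlflow

mlflow.set_tracking_uri('http://localhost:5000')

# Look the experiment up first; create it only if it does not exist yet.
experiment = mlflow.get_experiment_by_name('oof')
if experiment is not None:
    experiment_id = experiment.experiment_id
else:
    experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')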

How to import custom functions on my experiment script for Azure ML?

I can successfully submit an experiment to processing on a remote compute target on Azure ML.
In my notebook, for submitting the experiment, I have:
from azureml.core import Experiment
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails

# estimator
estimator = Estimator(
    source_directory='scripts',
    entry_script='exp01.py',
    compute_target='pc2',
    conda_packages=['scikit-learn'],
    inputs=[data.as_named_input('my_dataset')],
)

# Submit
exp = Experiment(workspace=ws, name='my_exp')
# Run the experiment based on the estimator
run = exp.submit(config=estimator)
RunDetails(run).show()
run.wait_for_completion(show_output=True)
However, in order to keep things clean, I want to define my general use functions on an auxiliary script, so the first will import it.
On my script experiment file exp01.py, I wanted:
from azureml.core import Run

import custom_functions as custom

# azure experiment start
run = Run.get_context()

# the data from azure datasets/datastorage
df = run.input_datasets['my_dataset'].to_pandas_dataframe()

# prepare data
df_transformed = custom.prepare_data(df)

# split data
X_train, X_test, y_train, y_test = custom.split_data(df_transformed)

# run my models.....
model_name = 'RF'
model = custom.model_x(model_name, a_lot_of_args)

# log the results
run.log(model_name, results)

# azure finish
run.complete()
The thing is: Azure won't let me import custom_functions.py.
How are you doing it?
TL;DR: any files you put inside source_directory (in your case, scripts) will be available to the Estimator.
To make this happen, simply create a file called custom_functions.py in the scripts folder that contains your prepare_data(), split_data(), and model_x() functions.
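For illustration, a minimal scripts/custom_functions.py stub; the function bodies are placeholders standing in for the asker's own logic, and the "target" column name is an assumption:
# scripts/custom_functions.py -- sits next to exp01.py inside source_directory
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def prepare_data(df: pd.DataFrame) -> pd.DataFrame:
    # placeholder: the real cleaning / feature engineering goes here
    return df.dropna()

def split_data(df: pd.DataFrame):
    # placeholder split; the "target" column name is assumed
    X = df.drop(columns=["target"])
    y = df["target"]
    return train_test_split(X, y, test_size=0.2, random_state=42)

def model_x(model_name, *args, **kwargs):
    # placeholder model factory
    if model_name == "RF":
        return RandomForestRegressor(*args, **kwargs)
    raise ValueError(f"Unknown model: {model_name}")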
I also recommend that you include only exactly what you need in the source_directory folder and make distinct folders for each Estimator, because:
the entire folder's contents will be uploaded when you use a remote compute_target, and
when you start using ML Pipelines (which are awesome), the PythonScriptStep's allow_reuse parameter checks whether any files in the source_directory have changed when determining if the step needs to run again.
Lastly, when you want to share general utility functions across PythonScriptSteps or Estimators without having to copy and paste code, that's when you might want to consider creating a custom Python package.

Delete a run in the experiment of mlflow from the UI so the run does not exist in backend store

I found that deleting a run only changes its state from active to deleted; the run is still visible in the UI when searching for deleted runs.
Is it possible to remove a run from the UI to save space?
When removing a run, is the artifact corresponding to the run also removed?
If not, can the run be removed through a REST call?
The accepted answer indeed deletes the experiment, not the run of an experiment.
In order to remove the run directories, one can use the MLflow API. Here is a script that removes the directories of all deleted runs:
import mlflow
import shutil
from mlflow.entities import ViewType

def get_run_dir(artifacts_uri):
    # strip the leading "file://" (7 chars) and the trailing "/artifacts" (10 chars)
    return artifacts_uri[7:-10]

def remove_run_dir(run_dir):
    shutil.rmtree(run_dir, ignore_errors=True)

experiment_id = 1

client = mlflow.tracking.MlflowClient(tracking_uri='./mlflow/mlruns')
runs = client.search_runs([str(experiment_id)], run_view_type=ViewType.DELETED_ONLY)
_ = [remove_run_dir(get_run_dir(run.info.artifact_uri)) for run in runs]
You can't do it via the web UI, but you can from a Python terminal:
import mlflow
mlflow.delete_experiment(69)
Where 69 is the experiment ID
Whilst Grzegorz already provided a solution, I just wanted to offer an alternative using the MLflow CLI.
The CLI has a command, mlflow gc, which permanently deletes runs that are in the deleted lifecycle stage.
See https://mlflow.org/docs/latest/cli.html#mlflow-gc

pyldavis Unable to view the graph

I am trying to visually depict my topics in Python using pyLDAvis. However, I am unable to view the graph. Do I have to view the graph in a browser, or will it pop up upon execution? Below is my code:
import pyLDAvis
import pyLDAvis.gensim as gensimvis
print('Pyldavis ....')
vis_data = gensimvis.prepare(ldamodel, doc_term_matrix, dictionary)
pyLDAvis.display(vis_data)
The program stays in execution mode indefinitely after running the above commands. Where should I view my graph, or where will it be stored? Is it integrated only with the IPython notebook? Kindly guide me through this.
P.S. My Python version is 3.5.
This does not work:
pyLDAvis.display(vis_data)
This will work for you:
pyLDAvis.show(vis_data)
I'm facing the same problem now.
EDIT:
My script looks as follows:
first part:
import pyLDAvis
import pyLDAvis.sklearn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

print('start script')
tf_vectorizer = CountVectorizer(strip_accents='unicode', stop_words='english', lowercase=True,
                                token_pattern=r'\b[a-zA-Z]{3,}\b', max_df=0.5, min_df=10)
dtm_tf = tf_vectorizer.fit_transform(docs_raw)
lda_tf = LatentDirichletAllocation(n_topics=20, learning_method='online')
print('fit')
lda_tf.fit(dtm_tf)
second part:
print('prepare')
vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
print('display')
pyLDAvis.display(vis_data)
The problem is in the line "vis_data = (...)". If I run the script, it prints 'prepare' and keeps running after that without printing anything else (so it never reaches the line print('display')).
The funny thing is, when I run the whole script it gets stuck on that line, but when I run only the first part, go to my console, and execute just the single line "vis_data = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)", it executes in a couple of seconds.
As for the graph, I saved it as html ("simple") and used the html file to view the graph.
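For reference, a minimal sketch of the two non-notebook options mentioned in the answers here, assuming vis_data was produced by prepare() as in the snippets above:
import pyLDAvis

# Write a standalone HTML file that can be opened in any browser.
pyLDAvis.save_html(vis_data, 'lda_visualization.html')

# Or serve the visualization locally and open it in the default browser
# (this call blocks until the local server is stopped).
pyLDAvis.show(vis_data)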
I ran into the same problem (I use PyCharm as my IDE). The problem is that pyLDAvis is developed for IPython (see the docs, https://media.readthedocs.org/pdf/pyldavis/latest/pyldavis.pdf, page 3).
My fix/workaround:
make a dict of lda_tf, dtm_tf, tf_vectorizer (e.g., pyLDAviz_dict) and dump the dict to a file (e.g., mydata_pyLDAviz.pkl)
read the pkl file into a notebook (I did get some deprecation info from pyLDAvis, but that had no effect on the end result)
play around with pyLDAvis in the notebook
if you're happy with the view, dump it into html
The cause is (most likely) that pyLDAvis expects continuous user interaction (including a user-initiated "exit"). However, I would rather dump data from a smart IDE and read it into Jupyter than develop/code in a Jupyter notebook. That's pretty much like going back to before-emacs times.
From experience, this approach works quite nicely for other plotting routines.
If you get a module error for pyLDAvis.gensim, then try this import instead:
import pyLDAvis.gensim_models
You get the error because the module was renamed in a newer version of pyLDAvis.
