Delete a run in an mlflow experiment from the UI so the run does not exist in the backend store - mlflow

I found that deleting a run only changes its state from active to deleted, because the run is still visible in the UI when searching by deleted runs.
Is it possible to remove a run entirely from the UI to save space?
When removing a run, are the artifacts corresponding to the run also removed?
If not, can the run be removed through a REST call?

The accepted answer indeed deletes the experiment, not a run of the experiment.
In order to remove the run directories one can use the mlflow API. Here is a script that removes the directories of all deleted runs:
import mlflow
import shutil

def get_run_dir(artifacts_uri):
    # strip the leading "file://" and the trailing "/artifacts"
    return artifacts_uri[7:-10]

def remove_run_dir(run_dir):
    shutil.rmtree(run_dir, ignore_errors=True)

experiment_id = 1
deleted_runs = 2  # mlflow.entities.ViewType.DELETED_ONLY

exp = mlflow.tracking.MlflowClient(tracking_uri='./mlflow/mlruns')
runs = exp.search_runs(str(experiment_id), run_view_type=deleted_runs)
_ = [remove_run_dir(get_run_dir(run.info.artifact_uri)) for run in runs]
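As an aside, the `[7:-10]` slice in `get_run_dir` silently assumes a `file://` URI whose path ends in `/artifacts`. A slightly more defensive version of that helper (a sketch of my own, not part of the original answer) using the stdlib `urllib.parse`:

```python
from urllib.parse import urlparse

def get_run_dir(artifacts_uri: str) -> str:
    """Map a run's artifact URI to its run directory on disk.

    Assumes a local file-based store, i.e. a "file://..." URI whose
    path ends in "/artifacts" (the mlruns layout used above).
    """
    path = urlparse(artifacts_uri).path
    suffix = "/artifacts"
    return path[:-len(suffix)] if path.endswith(suffix) else path

print(get_run_dir("file:///tmp/mlruns/1/0a1b2c/artifacts"))
# → /tmp/mlruns/1/0a1b2c
```

Unlike the slice, this version leaves the path untouched if the suffix is ever different, rather than chopping ten characters blindly.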

You can't do it via the web UI, but you can from a Python shell:
import mlflow

mlflow.delete_experiment(69)
where 69 is the experiment ID.

Whilst Grzegorz already provided a solution, I just wanted to provide an alternative solution using the MLflow CLI.
The CLI has a command, mlflow gc, which permanently deletes runs in the deleted lifecycle stage.
See https://mlflow.org/docs/latest/cli.html#mlflow-gc

Related

Azure ML Pipeline - Error: Message: "Missing data for required field". Path: "environment". value: "null"

I am trying to create a pipeline with Python SDK v2 in Azure Machine Learning Studio. Been stuck on this error for many.. MANY.. hours now, so now I am reaching out.
I have been following this guide: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-python-sdk
My setup is very similar, but I split "data_prep" into two separate steps, and I am using a custom ml model.
How the pipeline is defined:
# the dsl decorator tells the sdk that we are defining an Azure ML pipeline
from azure.ai.ml import dsl, Input, Output
import pathlib
import os

@dsl.pipeline(
    compute=cpu_compute_target,
    description="Car predict pipeline",
)
def car_predict_pipeline(
    pipeline_job_data_input,
    pipeline_job_registered_model_name,
):
    # using data_prep_function like a python call with its own inputs
    data_prep_job = data_prep_component(
        data=pipeline_job_data_input,
    )
    print('-----------------------------------------------')
    print(os.path.realpath(str(pipeline_job_data_input)))
    print(os.path.realpath(str(data_prep_job.outputs.prepared_data)))
    print('-----------------------------------------------')
    train_test_split_job = traintestsplit_component(
        prepared_data=data_prep_job.outputs.prepared_data
    )
    # using train_func like a python call with its own inputs
    train_job = train_component(
        train_data=train_test_split_job.outputs.train_data,  # note: using outputs from previous step
        test_data=train_test_split_job.outputs.test_data,  # note: using outputs from previous step
        registered_model_name=pipeline_job_registered_model_name,
    )
    # a pipeline returns a dictionary of outputs
    # keys will code for the pipeline output identifier
    return {
        # "pipeline_job_train_data": train_job.outputs.train_data,
        # "pipeline_job_test_data": train_job.outputs.test_data,
        "pipeline_job_model": train_job.outputs.model,
    }
I managed to run every single component successfully, in order, via the command line, and produced a trained model. Ergo the components and data work fine, but the pipeline won't run.
I can provide additional info, but I am not sure what is needed and I do not want to clutter the post.
I have tried googling. I have tried comparing the tutorial pipeline with my own. I have tried using print statements to isolate the issue. Nothing has worked so far. Nothing that I have done has changed the error either, it's the same error no matter what.
Edit:
Some additional info about my environment:
from azure.ai.ml.entities import Environment

custom_env_name = "pipeline_test_environment_pricepredict_model"

pipeline_job_env = Environment(
    name=custom_env_name,
    description="Environment for testing out Jeppes model in pipeline building",
    conda_file=os.path.join(dependencies_dir, "conda.yml"),
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
    version="1.0",
)
pipeline_job_env = ml_client.environments.create_or_update(pipeline_job_env)

print(
    f"Environment with name {pipeline_job_env.name} is registered to workspace, the environment version is {pipeline_job_env.version}"
)
Build status of environment. It had already run successfully.
In Azure Machine Learning Studio, when the application is running and the model is deployed, we have the default options of using either curated environments or custom environments. If the environment was created based on an existing deployment, we need to check whether its build was successful.
Until the deployment succeeds, the environment variables are not written into the program and we cannot retrieve them through a code block.
Select the environment to be used, then choose the existing version that was created.
We will get the mount location details and the Dockerfile if the environment was created using Docker and conda.
Once the environment is up and running, we can retrieve the environment variable information using the asset ID or the mount details, e.g.:
/mnt/batch/tasks/shared/LS_root/mounts/clusters/workspace-name/code/files/docker/Dockerfile

Python: PySVN checkout and update certain directories

I am using PySVN to checkout and update from a repo, but I don't want to check out the entire repo. I only want certain folders and files, so I am thinking I need to do a sparse checkout and then update the files and folders I want.
My repo structure (example)
\Repo1
    \animals
        \dogs
        \cats
        \fish
    \automobiles
        \cars
        \motorcycles
    \aircraft
I can do a sparse checkout of a single directory with the following code:
def run_remote(client, remote_path, local_path):
    # sparse checkout
    client.checkout(remote_path, local_path, depth=pysvn.depth.empty)

def update_remote(client, update_local_test_path, local_path):
    client.update(update_local_test_path, depth=pysvn.depth.infinity)

if __name__ == "__main__":
    import pysvn
    import os
    import pprint

    DESKTOP_PATH = os.path.join(os.path.expanduser("~"), 'Desktop')
    REPO_SVN_URL = "https://snv-test/repo1"
    REMOTE_TEST_PATH = "https://snv-test/repo1/animals"
    LOCAL_TEST_PATH = os.path.join(DESKTOP_PATH, 'repo1')
    # c:\users\desktop\repo1
    UPDATE_LOCAL_TEST_PATH = os.path.join(LOCAL_TEST_PATH, 'animals')
    # c:\users\desktop\repo1\animals

    client = pysvn.Client()
    client.callback_get_login = login
    client.callback_notify = notify

    run_remote(client, REMOTE_TEST_PATH, LOCAL_TEST_PATH)
    update_remote(client, UPDATE_LOCAL_TEST_PATH, LOCAL_TEST_PATH)
    # FUTURE: LOOP THROUGH DIRECTORIES AND DO MULTIPLE UPDATES
    # FUTURE: UPDATE DIRECTORIES BY REVISION
    # FUTURE: GET CURRENT REVISION NUMBERS
My code runs the checkout successfully and creates a local folder called 'repo1' that is empty. The update runs with no errors, but does not perform any update; that is, no folders are added to the 'repo1' folder. I was expecting the 'animals' directory with all folders and files to be updated locally. I suspect I do not have my update arguments correct.
The ultimate goal is to have a configurable list of directories so I can automate a checkout/update, and eventually do updates by revision. For now I just want to get the checkout and update working. I am sure there may be more efficient ways to do this. Thanks in advance.

Python multiprocessing manager showing error when used in flask API

I am pretty confused about the best way to do what I am trying to do.
What do I want?
API call to the flask application
Flask route starts 4-5 processes using the Process module and combines results (on a sliced pandas dataframe) using a shared Manager().list()
Return computed results back to the client.
My implementation:
pos_iter_list = get_chunking_iter_list(len(position_records), 10000)
manager = Manager()
data_dict = manager.list()
processes = []
for i in range(len(pos_iter_list) - 1):
    temp_list = data_dict[pos_iter_list[i]:pos_iter_list[i + 1]]
    p = Process(
        target=transpose_dataset,
        args=(temp_list, name_space, align_namespace, measure_master_id, df_searchable, products,
              channels, all_cols, potential_col, adoption_col, final_segment, col_map, product_segments,
              data_dict)
    )
    p.start()
    processes.append(p)

for p in processes:
    p.join()
My directory structure:
- main.py (flask entry point)
- helper.py (contains the function where the above code is executed and the transpose_dataset function is called)
The error that I am getting while running this:
RuntimeError: No root path can be found for the provided module "mp_main". This can happen because the module came from an import hook that does not provide file name information or because it's a namespace package. In this case the root path needs to be explicitly provided.
Not sure what went wrong here; the manager list works fine when called from a sample.py file using if __name__ == '__main__':
Update: The same piece of code is working fine on my MacBook and not on windows os.
A sample flask API call:
@app.route(PREFIX + "ping", methods=['GET'])
def ping():
    man = mp.Manager()
    data = man.list()
    processes = []
    for i in range(0, 5):
        pr = mp.Process(target=test_func, args=(data, i))
        pr.start()
        processes.append(pr)
    for pr in processes:
        pr.join()
    return json.dumps(list(data))
Stack has an ongoing bug preventing me from commenting, so I'll just write up an answer.
Python has two (main) ways to start a new process: "spawn" and "fork". Fork is a system call only available on *nix (read: Linux or macOS), so spawn is the only option on Windows. Since 3.8, spawn is the default on macOS as well, though fork is still available there. The big difference is that fork basically makes a copy of the existing process, while spawn starts a whole new process (like just opening a new cmd window).
There's a lot of nuance to why and how, but in order to run the target function in a child process started with spawn, the child has to import the main file. Importing a file is tantamount to executing that file and then binding its namespace to a variable: import flask will run the flask/__init__.py file and bind its global namespace to the variable flask.
There's often code, however, that is only used by the main process and doesn't need to be imported/executed in the child process. In some cases running that code again actually breaks things, so you need to prevent it from running outside of the main process. This is taken into account in that the "magic" variable __name__ is only equal to "__main__" in the main file (and not in child processes or imported modules).
In your specific case, you're creating a new app = Flask(__name__), which does some amount of validation and checks before you ever run the server. It's one of these setup/validation steps that it trips over when run from the child process. Fixing it by not letting it run at all is IMO the cleaner solution, but you can also fix it by giving __name__ a value it won't trip over, and then simply never starting that secondary server (again by protecting the startup with if __name__ == "__main__":).
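To make the guard concrete, here is a minimal, self-contained sketch (nothing Flask-specific assumed): the worker function lives at module top level so spawn can import it, and everything that should run exactly once stays under the guard:

```python
import multiprocessing as mp

def square_into(shared, i):
    # worker: must be importable at module top level for spawn to find it
    shared.append(i * i)

def run_jobs(n=4):
    with mp.Manager() as man:
        shared = man.list()
        procs = [mp.Process(target=square_into, args=(shared, i)) for i in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return sorted(shared)

if __name__ == "__main__":
    # only the original process gets here; spawned children re-import
    # this module but skip this block
    print(run_jobs())
    # → [0, 1, 4, 9]
```

Under fork this guard is merely good hygiene; under spawn (Windows, and macOS by default since 3.8) it is what prevents the children from recursively re-running the setup, including things like Flask app creation.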

Changing subdirectory of MLflow artifact store

Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:
s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json
MLflow creates a 3/ key in s3. Is there a way to modify this key to something else (a date or the name of the experiment)?
As I commented above, yes, mlflow.create_experiment() does allow you to set the artifact location using the artifact_location parameter.
However, sort of related, the problem with setting the artifact_location using the create_experiment() function is that once you create an experiment, MLflow will throw an error if you run create_experiment() with the same name again.
I didn't see this in the docs, but it's confirmed that if an experiment already exists in the backend store, MLflow will not allow you to run the same create_experiment() call again. And as of this post, MLflow does not have a check_if_exists flag or a create_experiment_if_not_exists() function.
To make things more frustrating, you cannot set the artifact_location in the set_experiment() function either.
So here is a pretty easy workaround; it also avoids the "ERROR mlflow.utils.rest_utils..." stdout logging:
import os
from random import random, randint

import mlflow
from mlflow import log_metric, log_param, log_artifacts
from mlflow.exceptions import MlflowException

# set the tracking URI before querying or creating experiments
mlflow.set_tracking_uri('http://localhost:5000')

try:
    experiment = mlflow.get_experiment_by_name('oof')
    experiment_id = experiment.experiment_id
except AttributeError:
    experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')

with mlflow.start_run(experiment_id=experiment_id) as run:
    print("Running mlflow_tracking.py")
    log_param("param1", randint(0, 100))
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")
    log_artifacts("outputs")
If it is the user's first time creating the experiment, the code raises an AttributeError (since get_experiment_by_name() returns None and None has no experiment_id), and the except block is executed, creating the experiment.
If it is the second, third, etc. time the code is run, only the code under the try statement executes, since the experiment now exists. MLflow will then create a 'sample' key in your s3 bucket. Not fully tested, but it works for me at least.
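The try/except leans on the AttributeError produced by `None.experiment_id`. An explicit None check reads more directly; here is a sketch of that variant with the two MLflow calls injected as plain callables (so the logic is testable on its own; in real use you would pass mlflow.get_experiment_by_name and mlflow.create_experiment):

```python
def get_or_create_experiment(name, get_by_name, create, artifact_location=None):
    # get_by_name / create stand in for mlflow.get_experiment_by_name /
    # mlflow.create_experiment; create is assumed to return the new id.
    experiment = get_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return create(name, artifact_location=artifact_location)

# exercising the sketch with stubs:
class _Exp:
    experiment_id = "42"

print(get_or_create_experiment("oof", lambda n: _Exp(), None))
# → 42
print(get_or_create_experiment("oof", lambda n: None,
                               lambda n, artifact_location=None: "7"))
# → 7
```

The behavior is the same as the try/except version, but a stray AttributeError raised inside get_by_name for some other reason can no longer be silently misread as "experiment missing".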

OpenCV(imread) operation stuck in elastic beanstalk

I'm trying to read a png file and output the numpy matrix of the image in the terminal using the imread function of opencv on the server, like this:
import cv2
from flask import Flask
import os

@application.route('/readImage', methods=['POST'])
def handleHTTPPostRequest():
    imagePath = f'{os.getcwd()}/input.png'
    print('image path is', imagePath)
    print(cv2.__version__)
    im = cv2.imread(imagePath, cv2.IMREAD_COLOR)
    print(im)
    return 'success'
This gives the expected output on my local machine (Ubuntu 18.04) no matter how many times I execute it. I moved this to Elastic Beanstalk (CentOS) with the necessary setup. The request runs fine (gives proper logs along with success) the very first time I make a POST call.
But when I make the POST call a second time, it only outputs the first two logs (image path and cv2 version) and is stuck there for a while; after some time, it shows this error:
End of script output before headers: application.py
I have added one more line just before cv2.imread just to make sure that the file exists:
print('does the file exist', os.path.isfile(imagePath))
This returns True every time. I have restarted the server multiple times; it looks like it only works the very first time, and cv2.imread() is stuck after the first POST call. What am I missing?
When you print from a request handler, Flask tries to do something sensible, but print really isn't what you want to be doing, as it risks throwing the HTTP request/response bookkeeping off.
A fully-supported way of getting diagnostic info out of a handler is to use the logging module. It requires a small bit of configuration; see http://flask.pocoo.org/docs/1.0/logging/
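A minimal sketch of the idea, using only the stdlib logging module (the handler and format choices here are illustrative, not Flask's defaults):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("readImage")

def handle_request(image_path):
    # log instead of print: output goes through the configured handlers
    # (stderr here) rather than interleaving with the WSGI response stream
    log.info("image path is %s", image_path)
    return "success"

print(handle_request("input.png"))
# → success
```

Inside a real Flask app you would typically use `app.logger` (or `current_app.logger`) the same way, which routes through the same logging machinery.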
To anyone facing this issue, I have found a solution. Add this to your ebextensions config file
container_commands:
  AddGlobalWSGIGroupAccess:
    command: "if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;"
Saikiran's final solution worked for me. I was getting this issue when I tried calling methods from the opencv-python library. I'm running Ubuntu 18.04 locally and it works fine there. However, like Saikiran's original post, when deployed to Elastic Beanstalk the first request works and then the second one does not. For my EB environment, I'm using a Python3.6-based Amazon Linux server.
