Changing subdirectory of MLflow artifact store - mlflow

Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:
s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json
MLflow creates a 3/ key in s3. Is there a way to modify this key to something else (a date or the name of the experiment)?

As I commented above, yes, mlflow.create_experiment() does allow you to set the artifact location using the artifact_location parameter.
However, and somewhat related, the problem with setting the artifact_location via create_experiment() is that once you have created an experiment, MLflow will throw an error if you call create_experiment() with the same name again.
I didn't see this in the docs, but it's confirmed that if an experiment already exists in the backend store, MLflow will not let you run the same create_experiment() call again. And as of this post, MLflow does not have a check_if_exists flag or a create_experiments_if_not_exists() function.
To make things more frustrating, you cannot set the artifact_location in the set_experiment() function either.
So here is a pretty easy workaround; it also avoids the "ERROR mlflow.utils.rest_utils..." stdout logging as well:
import os
from random import random, randint

import mlflow
from mlflow import log_metric, log_param, log_artifacts

# Point the client at the tracking server before querying or creating experiments
mlflow.set_tracking_uri('http://localhost:5000')

try:
    experiment = mlflow.get_experiment_by_name('oof')
    experiment_id = experiment.experiment_id
except AttributeError:
    # get_experiment_by_name() returned None, so the experiment does not exist yet
    experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')

with mlflow.start_run(experiment_id=experiment_id) as run:
    print("Running mlflow_tracking.py")

    log_param("param1", randint(0, 100))

    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")

    log_artifacts("outputs")
If it is the user's first time creating the experiment, the code will hit an AttributeError, since get_experiment_by_name() returns None and None has no experiment_id attribute; the except block then runs and creates the experiment.
If it is the second, third, etc. time the code is run, only the code under the try statement executes, since the experiment now exists. MLflow will now create a 'sample' key in your s3 bucket. Not fully tested, but it works for me at least.
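If you prefer not to rely on the exception, a minimal sketch of the same idea as a small helper is below; the helper name get_or_create_experiment_id is mine for illustration, only get_experiment_by_name() and create_experiment() are MLflow APIs:
import mlflow


def get_or_create_experiment_id(name, artifact_location=None):
    """Return the id of an existing experiment, or create it if it does not exist."""
    experiment = mlflow.get_experiment_by_name(name)
    if experiment is not None:
        return experiment.experiment_id
    return mlflow.create_experiment(name, artifact_location=artifact_location)


# Usage (bucket/prefix are assumptions, adjust to your setup):
experiment_id = get_or_create_experiment_id('oof', artifact_location='s3://mlflow-minio/sample/')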

Related

How to write batch of data to Django's sqlite db from a custom written file?

For a pet project I am working on, I need to import a list of people into a sqlite db. I have a 'Staff' model, as well as a users.csv file with a list of users. Here is how I am doing it:
import csv

from staff.models import Staff

with open('users.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        firstname = row['firstname']
        lastname = row['lastname']
        email = row['email']
        staff = Staff(firstname=firstname, lastname=lastname, email=email)
        staff.save()

csv_file.close()
However, I am getting below error message:
raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
Is what I am doing correct? If yes, what am I missing here?
Django needs some environment variables when it is being bootstrapped to run. DJANGO_SETTINGS_MODULE is one of these; it tells Django which settings module to use to configure itself. Typically many developers don't even notice, because if you stay in Django-land it isn't a big deal. Take a look at manage.py and you'll notice it sets it in that file.
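If you really do want to run your snippet as a standalone script instead, a minimal sketch is to configure Django yourself before touching any models; the settings module name myproject.settings below is an assumption, replace it with your own project's:
import os

import django

# Assumed settings module; use your actual project package
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()

# Import models only after django.setup() has run
from staff.models import Staff

print(Staff.objects.count())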
The simplest thing is to stay in django-land and run your script in its framework. I recommend creating a management command. Perhaps a more proper way is to create a data migration and put the data in a storage place like S3 if this is something many many people need to do for local databases... but it seems like a management command is the way to go for you. Another option (and the simplest if this is really just a one-time thing) is to just run this from the django shell. I'll put that at the bottom.
It's very simple and you can drop in your code almost as you have it. Here are the docs :) https://docs.djangoproject.com/en/3.2/howto/custom-management-commands/
For you it might look something like this:
/app/management/commands/load_people.py <-- the file name here is what manage.py will use to run the command later.
import csv

from django.core.management.base import BaseCommand, CommandError

from staff.models import Staff


class Command(BaseCommand):
    help = 'load people from csv'

    def handle(self, *args, **options):
        with open('users.csv') as csv_file:
            csv_reader = csv.DictReader(csv_file, delimiter=',')
            line_count = 0
            for row in csv_reader:
                firstname = row['firstname']
                lastname = row['lastname']
                email = row['email']
                staff = Staff(firstname=firstname, lastname=lastname, email=email)
                staff.save()
        # csv_file.close()  # you don't need this since you used `with`
which you would call like this:
python manage.py load_people
Finally, the simplest solution is to just run the code in the Django shell.
python manage.py shell
will open up an interactive shell with everything loaded properly. You can execute your code there and it should work.
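In recent Django versions the shell command also accepts a -c/--command flag, so if you keep the snippet in a standalone file you can run it non-interactively, something along these lines (the load_users.py file name is just an example):
python manage.py shell -c "exec(open('load_users.py').read())"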

How to write solution to a .sol file while using cbc in pyomo? How can I get the name of the .sol file after model is solved?

I am using pyomo in a Jupyter Notebook. I have set keepfiles=True in solve, and I am able to get the location where the .sol file is stored. How can I get the filename of the .sol file created for the current instance?
I have used the following:
from pyomo.opt import SolverFactory
SolverFactory("cbc").options['solu']="solution_file.sol"
But this does not work in creating the desired solution file.
If you add the keepfiles=True option to your call to solve, the temporary files that are used to pass the model to the solver and to read in the results will not be deleted, and the path to them will be printed on the screen. So I would create and call your solver using something like:
from pyomo.opt import SolverFactory
solver = SolverFactory("cbc")
solver.solve(model, keepfiles=True)
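If you need that path programmatically rather than reading it off the printed output, one possibility is to inspect the solver object after the solve. This is a sketch based on an assumption: shell-command solvers in some Pyomo versions keep the path in an internal _soln_file attribute, which is not a stable public API, hence the defensive getattr:
from pyomo.opt import SolverFactory

solver = SolverFactory("cbc")
results = solver.solve(model, keepfiles=True)

# _soln_file is internal and may not exist in every Pyomo version
sol_path = getattr(solver, "_soln_file", None)
print("Solution file:", sol_path)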

Delete a run in the experiment of mlflow from the UI so the run does not exist in backend store

I found that deleting a run only changes its state from active to deleted, because the run is still visible in the UI if you search by Deleted.
Is it possible to remove a run from the UI to save space?
When removing a run, is the artifact corresponding to the run also removed?
If not, can the run be removed through a REST call?
The accepted answer indeed deletes the experiment, not the run of an experiment.
In order to remove the run directories one can use the mlflow API. Here is a script that removes the artifact directories of all deleted runs:
import shutil

import mlflow


def get_run_dir(artifacts_uri):
    # strip the leading 'file://' scheme and the trailing '/artifacts' suffix
    return artifacts_uri[7:-10]


def remove_run_dir(run_dir):
    shutil.rmtree(run_dir, ignore_errors=True)


experiment_id = 1
deleted_runs = 2  # ViewType.DELETED_ONLY

exp = mlflow.tracking.MlflowClient(tracking_uri='./mlflow/mlruns')
runs = exp.search_runs(str(experiment_id), run_view_type=deleted_runs)
_ = [remove_run_dir(get_run_dir(run.info.artifact_uri)) for run in runs]
You can't do it via the web UI, but you can from a python terminal:
import mlflow
mlflow.delete_experiment(69)
Where 69 is the experiment ID
Whilst Grzegorz already provided a solution, I just wanted to offer an alternative using the MLflow CLI.
The CLI has a command, mlflow gc, which permanently deletes runs in the deleted lifecycle stage.
See https://mlflow.org/docs/latest/cli.html#mlflow-gc
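For example, a sketch of what that might look like against a SQL backend store (the sqlite URI below is only an illustration; point it at whatever backend store your tracking server actually uses):
mlflow gc --backend-store-uri sqlite:///mlflow.db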

Python: Dynamically imported module fails when first created, then succeeds

I have a template engine named Contemplate which has implementations for PHP, node/js and Python.
All work fine, except lately the Python implementation gives me some issues. Specifically, the problem appears when first parsing a template and creating the template Python code, which is then dynamically imported as a module. When the template is already created everything works fine, but when the template needs to be parsed and saved to disk and THEN imported, it raises an error, e.g.
ModuleNotFoundError: No module named 'blah blah'
(note this error appears to be random; it is not always raised. Many times it works even if the template is created just before importing, other times it fails, and then if run again with the template already created it succeeds)
Is there any way I can bypass this issue, maybe add a delay between saving a parsed template and then importing it as a module, or something else?
The code to import the module (the parsed template which is now a python class) is below:
def import_tpl( filename, classname, cacheDir, doReload=False ):
    # http://www.php2python.com/wiki/function.import_tpl/
    # http://docs.python.org/dev/3.0/whatsnew/3.0.html
    # http://stackoverflow.com/questions/4821104/python-dynamic-instantiation-from-string-name-of-a-class-in-dynamically-imported
    #_locals_ = {'Contemplate': Contemplate}
    #_globals_ = {'Contemplate': Contemplate}
    #if 'execfile' in globals():
    #    # Python 2.x
    #    execfile(filename, _globals_, _locals_)
    #    return _locals_[classname]
    #else:
    #    # Python 3.x
    #    exec(read_file(filename), _globals_, _locals_)
    #    return _locals_[classname]
    # http://docs.python.org/2/library/imp.html
    # http://docs.python.org/2/library/functions.html#__import__
    # http://docs.python.org/3/library/functions.html#__import__
    # http://stackoverflow.com/questions/301134/dynamic-module-import-in-python
    # http://stackoverflow.com/questions/11108628/python-dynamic-from-import
    # also: http://code.activestate.com/recipes/473888-lazy-module-imports/
    # using import instead of execfile, usually takes advantage of Python cached compiled code
    global _G
    getTplClass = None
    # add the dynamic import path to sys
    basename = os.path.basename(filename)
    directory = os.path.dirname(filename)
    os.sys.path.append(cacheDir)
    os.sys.path.append(directory)
    currentcwd = os.getcwd()
    os.chdir(directory)  # change working directory so we know import will work
    if os.path.exists(filename):
        modname = basename[:-3]  # remove .py extension
        mod = __import__(modname)
        if doReload: reload(mod)  # Might be out of date
        # a trick in-order to pass the Contemplate super-class in a cross-module way
        getTplClass = getattr( mod, '__getTplClass__' )
    # restore current dir
    os.chdir(currentcwd)
    # remove the dynamic import path from sys
    del os.sys.path[-1]
    del os.sys.path[-1]
    # return the tplClass if found
    if getTplClass: return getTplClass(Contemplate)
    return None
Note the engine creates an __init__.py file in cacheDir if it is not already there.
If needed I can change the import_tpl function to something else, I don't mind.
The Python tested is Python 3.6 on Windows, but I don't think this is a platform-specific issue.
To test the issue you can download the github repository (linked above) and run the /tests/test.py test after clearing all cached templates from the /tests/_tplcache/ folder.
UPDATE:
I am thinking of adding a while loop with some counter in import_tpl that catches the error, if any, and retries a specified number of times until it succeeds in importing the module. But I am also wondering whether this is a good solution or whether there is something else I am missing here.
UPDATE (20/02/2019):
Added a loop that retries a specified number of times, with a small delay of 1 sec, if importing the template module initially fails (see the online repository code), but it still raises the same error sometimes when templates are first created just before being imported. Any solutions?
Right, using a while loop to handle the exception would be one way.
while True:
    try:
        # The module importing
        break
    except ModuleNotFoundError:
        print("NOPE! Module not found")
If it works for some "module" files and not others, the likely suspect is the template files themselves.
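Another thing worth trying, separate from the retry loop: because the template module is written to disk after the interpreter has started, the import system's finder caches can be stale, and the documented remedy is importlib.invalidate_caches(). A minimal sketch, with naming that is mine for illustration rather than the engine's actual API:
import importlib
import os
import sys


def load_generated_module(filename):
    """Import a module file that was created on disk after interpreter start-up."""
    directory = os.path.dirname(filename)
    modname = os.path.basename(filename)[:-3]  # strip the .py extension
    if directory not in sys.path:
        sys.path.append(directory)
    # tell the import machinery that the directory contents may have changed
    importlib.invalidate_caches()
    return importlib.import_module(modname)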

Importing one module from different other modules only executes it once. Why?

I am confused about some behavior of Python. I always thought importing a module basically meant executing it. (Like they say here: Does python execute imports on importation) So I created three simple scripts to test something:
main.py
import config
print(config.a)
config.a += 1
print(config.a)
import test
print(config.a)
config.py
def get_a():
    print("get_a is called")
    return 1

a = get_a()
test.py
import config
print(config.a)
config.a += 1
The output when running main.py is:
get_a is called
1
2
2
3
Now I am confused because I expected get_a() to be called twice, once from main.py and once from test.py. Can someone please explain why it is not? What if I really wanted to import config a second time, like it was in the beginning with a=1?
(Fortunately, for my project this behavior is exactly what I wanted, because get_a() corresponds to a function, which reads lots of data from a database and of course I only want to read it once, but it should be accessible from multiple modules.)
Because the config module is already loaded, there's no need to 'run' it again; Python just returns the already-loaded module instance.
Some standard library modules make use of this, for example random. It creates an object of class Random on first import and reuses it when it gets imported again. A comment in the module reads:
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions share state across all uses
# (both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
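If you really do want the module's top-level code to run a second time (the last part of the question), a sketch of the standard way is importlib.reload, which re-executes the module in place; note that this resets config.a back to the freshly computed value:
import importlib

import config

print(config.a)  # e.g. 3 after the increments above
config = importlib.reload(config)  # re-executes config.py, calling get_a() again
print(config.a)  # back to 1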
