mlflow Exception: Run with UUID is already active

I used mlflow.set_tracking_uri to set the tracking URI and called set_experiment, got an error, and when I ran the following code again I got "Exception: Run with UUID is already active."
I tried mlflow.end_run to end the current run, but got RestException: RESOURCE_DOES_NOT_EXIST: Run UUID not found.
I'm currently stuck in this loop. Any suggestions?
mlflow.set_experiment("my_experiment")
mlflow.start_run(run_name='my_project')
mlflow.set_tag('input_len',len(input))
mlflow.log_param('metrics', r2)
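One way out of the stuck state is to end any run left open by a previous failed attempt before starting a new one; mlflow.active_run() reports whether a run is open, and using start_run as a context manager guarantees the run is closed even on errors. A minimal sketch, with input_data and r2 standing in for the asker's variables:
import mlflow

mlflow.set_experiment("my_experiment")

# End any run left active by a previous failed attempt.
if mlflow.active_run():
    mlflow.end_run()

with mlflow.start_run(run_name='my_project'):
    mlflow.set_tag('input_len', len(input_data))
    mlflow.log_metric('r2', r2)  # r2 is a metric value, so log_metric fits better than log_param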

In my case, I was calling mlflow.get_artifact_uri() just after set_tracking_uri().
MLflow creates a run for the get_artifact_uri() call, and when we then try to start a run, it throws the above exception.
Buggy code
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment('Exp1')
print('artifact uri:', mlflow.get_artifact_uri())
with mlflow.start_run():
    mlflow.log_param('SIZE',100)
Exception: Run with UUID f7d3c1318eeb403cbcf6545b061654e1 is already active. To start a new run, first end the current run with mlflow.end_run(). To start a nested run, call start_run with nested=True
So get_artifact_uri() has to be called inside the start_run context, and then it worked.
Working code
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment('Exp1')
with mlflow.start_run():
    print('artifact uri:', mlflow.get_artifact_uri())
    mlflow.log_param('SIZE',100)

My case was slightly different, but I'm posting the solution here in case it helps newcomers to this thread. I was accidentally setting run tags before starting the run:
mlflow.set_experiment('my_experiment')
mlflow.set_tag('input_len', len(input)) # Auto-creates a run ID
mlflow.start_run(run_name='my_project') # Tries to name the same run, throwing error
Ensuring that start_run came before all other logging/tag calls solved the issue, as shown in the sketch below.
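A minimal corrected version of that snippet (input_data stands in for the original input variable, which shadows Python's built-in input):
mlflow.set_experiment('my_experiment')
mlflow.start_run(run_name='my_project')       # start the run first
mlflow.set_tag('input_len', len(input_data))  # the tag now attaches to this run
mlflow.end_run()                              # close the run when done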

Related

Putting dbutils.notebook.exit() in both try and except only shows the exit from except

I'm running a child notebook and want to send the status of the child notebook's execution to the master notebook using the exit output.
The code from the child notebook is as follows:
try:
    df.write.format("delta").mode("overwrite").saveAsTable("x.table_name")
    dbutils.notebook.exit("x.table_name created Successfully")
except Exception as e:
    dbutils.notebook.exit(f"x.table_name creation Failed {e}")
However, it always shows "x.table_name creation Failed" despite successful execution.
Also, I'd appreciate some advice if this isn't the right approach for a Databricks workflow.
Thanks in advance.
Let's try a very simple example: a try statement with two except clauses. We can see that a notebook exit is itself treated as an exception, so calling dbutils.notebook.exit() inside the try block jumps straight into the except clause.
Now, let's fix your code using this knowledge. Regardless of an error, we want to exit the program; the msg variable just tells the parent program the condition of the execution.
In the positive test case, the exit command returns a success message.
Every good programmer tests all paths through the code. You can create a negative test case by using a Hive database name that does not exist.
In short, use try/except to capture the return state, then use dbutils.notebook.exit() to return that state at the end of the script, as sketched below.
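A minimal sketch of that pattern, built from the asker's own code:
msg = ""
try:
    df.write.format("delta").mode("overwrite").saveAsTable("x.table_name")
    msg = "x.table_name created Successfully"
except Exception as e:
    msg = f"x.table_name creation Failed {e}"

# Exit exactly once, after the try/except has settled the status.
dbutils.notebook.exit(msg)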

Golem task responds with runtime error 2, can't determine the cause

The repo for all the code I've been using is updated here. When I run the requestor script it exits with runtime error 2 (file not found). I am not sure how to debug this further or fix it. So far I've converted my code over to a python slim Docker image to better mirror the example. The image also works for me locally: if I spin up the container, typing and running "/golem/work/imageclassifier.py --trainmodel" from root works. I switched all my code to use absolute paths. I also made sure the shebang (#!) line uses Linux line endings rather than Windows ones, which was giving me errors before. I fixed a bug where my script returned error code 2 when called with no args; it now passes.
import pickle

clf.fit(trainDataGlobal, trainLabelsGlobal)
pkl_file = "classifier.pkl"
with open(pkl_file, 'wb') as file:
    pickle.dump(clf, file)
This is the only piece I could think of that causes the issue, but as far as I can tell it is the proper way to pickle something in Python. The requestor script is also heavily based on the simple service example, and I tried to mirror my design to that. I just need help getting more information while debugging, or guidance on how to move forward from here.
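For what it's worth, a quick local round-trip test can rule the pickling step out; the object below is just a stand-in for the trained classifier:
import os
import pickle

obj = {"weights": [1, 2, 3]}  # stand-in for the trained classifier
pkl_file = "classifier.pkl"

with open(pkl_file, "wb") as f:
    pickle.dump(obj, f)

# Verify the file was written and loads back cleanly.
assert os.path.exists(pkl_file)
with open(pkl_file, "rb") as f:
    print(pickle.load(f))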

Create an exception when a Python command generates a "program.exe has stopped working" type error

I am facing a problem with a program I am developing in Python 3.6 under Windows 10.
One particular command generates an unknown error: Windows throws a "program.exe has stopped working" message and the program exits.
The command is a 3D-model loader that is part of another Python package (Panda3D). The crash is always associated with this command (more particularly with a specific DLL of the loader) and a particular file that it tries to open.
Since I cannot locate and therefore fix the fault in the DLL (there is probably a bug there), I would like to just skip the problematic file and continue to the next one. But since Python exits and I do not know the error type, the typical try/except does not work.
So, I would like to know if there is a way to anticipate this type of behavior in my code and prevent the program from exiting.
Many thanks for any help.
The pop-up "Program.exe has stopped working." can be caused by a variety of things and therefor there is no "one size fits all" type solution. But if you're certain that your problem is cause by a specific line of code you can always try something along the lines of :
try:
    loader.loadModel("/c/path/to/your/file")
except Exception as e:
    print(e, e.args)  # note: e.message does not exist in Python 3
    # your error-handling code here
Make sure the file path that you're giving to loadModel respects the following:
# WRONG:
loader.loadModel("c:\\Program Files\\My Game\\Models\\Model1.egg")
# RIGHT:
loader.loadModel("/c/Program Files/My Game/Models/Model1.egg")
Source: Panda3D official documentation
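Putting that together with the asker's goal of skipping a bad file and continuing to the next one (the paths here are illustrative), something like this works whenever the failure surfaces as a Python exception; note that a hard crash inside the DLL still terminates the process and cannot be caught this way:
models = []
for path in ["/c/models/model1.egg", "/c/models/model2.egg"]:  # illustrative paths
    try:
        models.append(loader.loadModel(path))
    except Exception as e:
        # Log the failing file and move on to the next one.
        print(f"Skipping {path}: {e}")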

How to stop the whole test execution but with PASS status in RobotFramework?

Is there any way I can stop the whole robot test execution with PASS status?
For some specific reasons, I need to stop the whole test but still get a GREEN report.
Currently I am using FATAL ERROR, which raises an assertion error and returns FAIL to the report.
I was trying to create a user keyword to do this, but I am not really familiar with the Robot Framework error-handling process; could anyone help?
There's an attribute ROBOT_EXIT_ON_FAILURE in BuiltIn.py, and I am thinking about creating another attribute like ROBOT_EXIT_ON_SUCCESS, but have no idea how to.
Environment: robotframework==3.0.2 with Python 3.6.5
There is nothing built-in to support this. By design, a fatal error will cause all remaining tests and suites to have a FAIL status.
Just about your only choice is to write a keyword that sets a global variable, and then have every test include a setup that uses Pass Execution If to skip the test when the flag is set; a sketch follows.
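A minimal sketch of that approach as a Python keyword library; the file name StopControl.py and the keyword name are hypothetical, and ${STOP_REQUESTED} would need to be initialized to False in a variables section:
# StopControl.py -- hypothetical custom keyword library
from robot.libraries.BuiltIn import BuiltIn

def request_stop():
    # Flip a global flag; each test's setup then runs:
    #   Pass Execution If    ${STOP_REQUESTED}    Stopping early with PASS
    BuiltIn().set_global_variable('${STOP_REQUESTED}', True)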
If I understood you correctly, you need to pass the test execution forcefully and return a green status for that test, is that right? There is a built-in keyword Pass Execution for that. Did you try using it?

Azure: error not detected

I have an experiment in Azure. When I launch the run, an error indicator appears at the top right, but no module is marked with it.
If I run the single module where (in this simple case) I know the error has to be, I can see the specific error.
Is it a bug or am I doing something wrong?
I had a similar error once when, for some reason, a module (not created by me) was sitting underneath another one. The error was shown, but I couldn't see the offending module.
