Cancel experiment runs in AzureML using Python - azure-machine-learning-service

Found that some of the runs did not fail and were stuck in the Running state. I assume these might still be consuming compute targets.
I tried to cancel the runs that are in the Running state from the Azure ML Runs web UI, but it did not work.

First, get the experiment object that holds the runs to cancel:
from azureml.core import Experiment
experiment = Experiment.list(workspace=workspace_obj, experiment_name='experiment_name')[0]
or
experiment = ws.experiments['experiment_name']
Cancel all runs that are in the Running state:
for run in experiment.get_runs():
    print(run.id)
    if run.status == "Running":
        run.cancel()
Then check the status of each run again:
for run in experiment.get_runs():
    print(run.id)
    print(run.status)
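The cancel-by-status loop above doesn't depend on anything AzureML-specific, so it can be sanity-checked offline. A minimal self-contained sketch, with a stub Run class standing in for azureml.core.Run (same id / status / cancel() surface):

```python
class Run:
    """Stub standing in for azureml.core.Run; real runs expose the
    same id / status / cancel() surface used below."""
    def __init__(self, run_id, status):
        self.id = run_id
        self.status = status

    def cancel(self):
        self.status = "Canceled"


def cancel_running(runs):
    """Cancel every run still in the Running state; return their ids."""
    cancelled = []
    for run in runs:
        if run.status == "Running":
            run.cancel()
            cancelled.append(run.id)
    return cancelled


runs = [Run("r1", "Completed"), Run("r2", "Running"), Run("r3", "Running")]
print(cancel_running(runs))  # ['r2', 'r3']
```

With the real SDK, `runs` would be `experiment.get_runs()` and `cancel()` issues the service call.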

Related

How to set up an ADF pipeline that isolates every pipeline run and creates its own compute resources?

I have a simple pipeline in ADF that is triggered by a Logic App every time someone submits a file as a response in a Microsoft Forms form. The pipeline creates a cluster based on a Docker image and then uses a Databricks notebook to run some calculations that can take several minutes.
The problem is that every time the pipeline is running and someone submits a new response to the form, it triggers another pipeline run that, for some reason, makes the previous runs fail.
The last pipeline run will always work fine, but earlier runs show this error:
 > Operation on target "notebook" failed: Cluster 0202-171614-fxvtfurn does not exist 
However, checking the parameters of the last pipeline, it uses a different cluster id, 0202-171917-e616dsng for example.
It seems that, for some reason, the compute resources for the first run are reallocated to the new pipeline run, even though the cluster IDs are different.
I have set the concurrency up to 5 in the pipeline's general settings tab, but I still get the same error.
Concurrency setup screenshot
Also, in the first connector, which looks up the Docker image files, I have the concurrency set to 15, but this doesn't fix the issue.
look up concurrency screenshot
To me, this seems like a very simple and common task when it comes to automation and data workflows, but I cannot figure it out.
I really appreciate any help and suggestions; thanks in advance.
The best way would be to use an existing pool rather than recreating the pool every time.
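In ADF terms, "use an existing pool" means pointing the Azure Databricks linked service at an existing interactive cluster (existingClusterId) or an instance pool (instancePoolId) instead of letting each run create a new job cluster. A sketch of the linked-service JSON, with placeholder domain and token (the cluster id shown is the one from the question):

```json
{
  "name": "AzureDatabricksExistingCluster",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://<workspace>.azuredatabricks.net",
      "existingClusterId": "0202-171614-fxvtfurn",
      "accessToken": {
        "type": "SecureString",
        "value": "<access-token>"
      }
    }
  }
}
```

With an existing cluster, concurrent pipeline runs share it instead of each run creating, and on overlap tearing down, compute that an earlier run still needs.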

How to stop Azure function app container started from Azure cloud shell?

I'm using Azure Cloud Shell to make changes and test locally. After changes are made, I start the function app container using func start --verbose. Before making further changes and testing again, I need to stop the container first. What is the recommended way to do it? I tried Ctrl+C and Ctrl+Z; it takes about 5 to 12 minutes every time before control returns to the prompt.
It gets stuck terminating after printing the following logs:
[2022-08-11T07:28:16.777Z] Language Worker Process exited. Pid=515.
[2022-08-11T07:28:16.777Z] python3 exited with code 1 (0x1). .
[2022-08-11T07:28:16.778Z] Exceeded language worker restart retry count for runtime:python. Shutting down and proactively recycling the Functions Host to recover
The func start command runs the function locally. In the background it brings up the components the function requires: configuration, host, port, and so on.
Whenever you change any configuration of a function, the function and its container restart.
When you run the function, it allocates specific resources and loads the required packages and files. If you stop the function partway through, it has to release those resources and file handles, so it can take some time before control returns to the prompt.
> Before making further changes and testing again, I need to stop the container first. What is the recommended way to do it?
You can build a container image to test the app locally. Keep your Dockerfile in the project root; it gives you the environment required to run the Function App in a container.

Airflow test task works, but in dag run fails

I'm wondering why it is possible that the following command works:
airflow test [dag_id] [task_id] 20200421
but that the same task fails if I trigger the DAG manually in the UI.
The task itself is quite simple; it is basically:
import os
cmd = 'ls'  # or another command
os.system(cmd)
As said above, it works under airflow test, but it fails in a real run. My code is in Python, and this specific DAG needs to run a command in the terminal.
Have you got any idea how this is possible?
If you need more info, let me know in the comments!
Answer:
This problem is due to a different user running the script.
airflow run uses a different user (and (sub-)processes) than airflow test. Switching to the airflow user did not work, but granting the airflow user more rights (on Linux) should.
Another possible reason for this behaviour is that the task's earlier state is cached in the database.
So the test works, but when you ask Airflow to run the DAG it fails because the task is already running in the background or its state is cached in the database. Try running $ airflow resetdb (in Airflow 2.x, airflow db reset).
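One way to diagnose this is to stop using os.system, which discards stderr and makes failures easy to miss, and instead run the command with subprocess while logging the user the task executes as. A sketch using only the standard library (run_cmd is a hypothetical helper, not an Airflow API):

```python
import getpass
import subprocess

def run_cmd(cmd="ls"):
    # Log which user the task runs as: permissions often differ
    # between `airflow test` and a scheduler-triggered run.
    print("running as user:", getpass.getuser())
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        # Fail the task loudly with stderr attached, instead of
        # silently swallowing the error as a bare os.system call does.
        raise RuntimeError(f"{cmd!r} failed: {result.stderr}")
    return result.returncode

run_cmd()
```

Called from a PythonOperator, the raised exception marks the task as failed and puts the command's stderr in the task log, which usually reveals the permission difference.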

How to schedule an Azure Machine Learning experiment periodically and also allow an admin to run it on demand

I have a published AzureML experiment, and now I want to schedule that experiment to run periodically while also giving an admin the flexibility to run it whenever he wants.
I tried running the experiment periodically using the Azure Logic Apps service, but I get the error "This session has timed out. To see the latest run status, navigate to the runs history blade." Can anyone help me out?

Starting node server in azure batch startup

I am new to Azure batch. I am working in windows environment.
My requirement is that a Node.js server should be running before any Batch task runs on the machine.
I have tried to start the Node server in a job preparation task as well as in a pool start task, with the following task command line:
cmd /c start node.exe my_js_file.js
But as soon as the start task completes, the Node server running on the machine dies.
If I do not use start in the command above, the Node server starts and keeps running, but the start task also keeps running and never completes.
What can I do to start the Node.js server in the background in Azure Batch?
I have also tried to start the Node server when a new task executes (a command-line application), but as soon as that task completes, the Node process is killed as well.
To create a detached process that runs forever, you have two options. Either can be done from a job prep task or a start task, but be warned that if you have multiple jobs requiring the same Node.js server context to start, you may encounter errors. Please ensure that, if you use this at the job level, you specify a job release task that correctly kills the long-running process. Also be aware that if you allow multiple tasks to be co-scheduled on the same node, there can be interaction conflicts if they require the same long-lived process.
The recommended way is to install a Windows service that runs your command. There are various ways to bootstrap a service, including the command-line sc program or the myriad of helper programs that do this on your behalf.
If you do not want to (or cannot) install a Windows service, you can create a C++ program that invokes your command as a "breakaway process." Consult the MSDN CreateProcess documentation and make sure you specify the CREATE_BREAKAWAY_FROM_JOB flag in dwCreationFlags. This task must be run with elevated (administrator) privileges. It is also recommended that you start your process in a folder outside the default start task working directory, so that compute node restarts don't affect files you may generate in the current working directory.
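The same breakaway trick can also be scripted from Python rather than C++, since subprocess exposes the Windows creation flags directly. A sketch, untested against a real Batch node (node.exe and the file name come from the question, and this still needs elevated privileges):

```python
import subprocess
import sys

def start_detached_node_server(js_file="my_js_file.js"):
    """Launch node.exe as a breakaway process so it survives the
    termination of the Batch start task's job object."""
    if sys.platform != "win32":
        # The breakaway flags below exist only on Windows.
        raise RuntimeError("CREATE_BREAKAWAY_FROM_JOB is Windows-only")
    flags = (subprocess.CREATE_BREAKAWAY_FROM_JOB
             | subprocess.CREATE_NEW_PROCESS_GROUP)
    proc = subprocess.Popen(["node.exe", js_file], creationflags=flags)
    return proc.pid
```

The start task's command line would then be something like cmd /c python start_server.py, which returns immediately while the Node process keeps running.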