Update task status from an external application in Databricks

I have a workflow with a task that depends on the execution of an external application (not residing in Databricks). After the external application finishes, how can I update the status of that task to complete? Currently, the Jobs API doesn't support status updates.

The Runs Cancel endpoint is available:
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/cancel \
--data '{ "run_id": <run-id> }'
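For completeness, the same cancel call can be issued from Python. A minimal sketch, assuming a bearer token instead of a .netrc file; the workspace URL, token, and run ID below are placeholders:

```python
import json
import urllib.request

def build_cancel_request(host, token, run_id):
    """Build (but do not send) a Jobs API runs/cancel request."""
    body = json.dumps({"run_id": run_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/jobs/runs/cancel",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(req) would actually send the request.
req = build_cancel_request("https://example.cloud.databricks.com", "TOKEN", 12345)
```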

Related

How can we trigger a Databricks notebook on an existing cluster using REST API calls from a shell script?

We are trying to trigger an existing Azure Databricks notebook using a REST API call from a shell script. There are existing clusters running in the workspace, and we want to attach the notebook to one of them and trigger it.
We are trying to figure out the configuration and the REST API call that can trigger the notebook on a specific cluster dynamically at run time.
I have reproduced the above and got the results below.
Here, I created two clusters, C1 and C2, and two notebooks, Nb1 and Nb2.
My sample Nb1 notebook code:
print("Hello world")
I created a job for Nb1 and executed it on cluster C1 by running the shell script below from Nb2, which is attached to C2.
%sh
curl -n --header "Authorization: Bearer <Access token>" \
-X POST -H 'Content-Type: application/json' \
-d '{
  "run_name": "My Notebook run",
  "existing_cluster_id": "<cluster id>",
  "notebook_task": {
    "notebook_path": "<Your Notebook path>"
  }
}' https://<databricks-instance>/api/2.0/jobs/runs/submit
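For reference, the same one-time runs/submit call can be sketched in Python; the workspace URL, token, cluster ID, and notebook path below are placeholders:

```python
import json
import urllib.request

def build_submit_request(host, token, cluster_id, notebook_path):
    """Build a Jobs API runs/submit request for a one-time notebook run."""
    payload = {
        "run_name": "My Notebook run",
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
    }
    return urllib.request.Request(
        url=f"{host}/api/2.0/jobs/runs/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values; urlopen(req) would actually submit the run.
req = build_submit_request("https://example.cloud.databricks.com", "TOKEN",
                           "0123-456789-abcdefgh", "/Users/me/Nb1")
```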
(Screenshots of the execution from Nb2, the created job, and the job output are omitted here.)

Prefect 2.0: How to trigger a flow using just curl?

Here is my dead simple flow:
from prefect import flow
import datetime

@flow
def firstflow(inreq):
    retval = {}
    retval['type'] = str(type(retval))
    retval['datetime'] = str(datetime.datetime.now())
    print(retval)
    return retval
I run Prefect Orion and a Prefect agent.
When I make a trigger using the web UI (deployment run), the agent successfully pulls and does the job.
My question is: how do I do the trigger using just curl?
Note: I already read http://127.0.0.1:4200/docs, but I couldn't find how to do it there.
Note:
Let's say my flow ID is 7ca8a456-94d7-4aa1-80b9-64894fdca93b
and the parameters I want processed are {'msg': 'Hello world'}.
I blindly tried:
curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:4200/api/flow_runs \
-d '{"flow_id": "7ca8a456-94d7-4aa1-80b9-64894fdca93b", "parameters": {"msg": "Hello World"}, "tags": ["test"]}'
but Prefect Orion says:
INFO: 127.0.0.1:53482 - "POST /flow_runs HTTP/1.1" 307 Temporary Redirect
Sincerely
-bino-
It's certainly possible to do it via curl, but it might be painful, especially if your flow has parameters. There's a much easier way to trigger a flow that will be tracked by the backend API: run the flow's Python script, and it will have exactly the same effect. This is because the (ephemeral) backend API of Prefect 2.0 is always active in the background, and all flow runs, even those started from a terminal, are tracked in the backend.
Regarding curl, it looks like you are missing the trailing slash after flow_runs. Changing your command to this one should work:
curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:4200/api/flow_runs/ \
-d '{"flow_id": "7ca8a456-94d7-4aa1-80b9-64894fdca93b", "parameters": {"msg": "Hello World"}, "tags": ["test"]}'
The route that might be more helpful, though, is the one below: it creates a flow run from a deployment and sets it into a Scheduled state (the default state is Pending, which would cause the flow run to get stuck). This should work directly:
curl -X POST -H 'Content-Type: application/json' \
http://127.0.0.1:4200/api/deployments/your-uuid/create_flow_run \
-d '{"name": "curl", "state": {"type": "SCHEDULED"}}'
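The same create_flow_run call, sketched in Python; the API URL and deployment UUID are the placeholders from the curl example:

```python
import json
import urllib.request

def build_create_flow_run_request(api_url, deployment_id):
    """Build the request that creates a flow run from a deployment,
    setting it to Scheduled so it doesn't get stuck in Pending."""
    body = {"name": "curl", "state": {"type": "SCHEDULED"}}
    return urllib.request.Request(
        url=f"{api_url}/deployments/{deployment_id}/create_flow_run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "your-uuid" is the placeholder deployment ID; urlopen(req) would send it.
req = build_create_flow_run_request("http://127.0.0.1:4200/api", "your-uuid")
```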

Download the latest artifacts of failed gitlab pipeline

I want to automatically load test results of my gitlab pipeline (basically one xml file) into my project management application. No direct connection between these two is possible. However the gitlab API offers the following options to download artifacts of a pipeline:
all artifacts of the latest successful pipeline run (selected by job name)
GET /projects/:id/jobs/artifacts/:ref_name/download?job=name
all artifacts of a specific job (selected by job id)
GET /projects/:id/jobs/:job_id/artifacts
a single artifact file (selected by job id)
GET /projects/:id/jobs/:job_id/artifacts/*artifact_path
My current situation is following:
I have test reports which are saved inside the job artifacts when running the pipeline. The artifacts are created on every run of the pipeline, independent of its outcome.
gitlab-ci.yaml
...
artifacts:
when: always
...
The artifact I am trying to download has a dynamic name
./reports/junit/test-results-${CI_JOB_ID}.xml
If I now want to download the latest test results to a different server than the GitLab server, I realize that I don't know the latest job ID, which means:
I can't access the artifact directly, because it has a dynamic name
I can't access the artifacts of a specific job
I can access the artifacts of the latest job, but only if it was successful
This leaves me in a situation where I can only download the latest test results if nothing went wrong while testing. To put it mildly, this is suboptimal.
Is there some way to download the artifacts from the latest job run (without knowing the job ID), independent of its outcome?
Is there some way to download the artifacts from the latest job run
(without knowing the job ID), independent of its outcome?
To achieve this, we will use the GitLab API in combination with the jq package.
Let's break down this question into components.
Firstly, we need to find out the id of the last executed pipeline for this project. https://docs.gitlab.com/ee/api/pipelines.html#list-project-pipelines
GET /projects/:id/pipelines
For this call you will need your access token; if you don't have one already, see:
https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#create-a-personal-access-token
https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html#create-a-project-access-token
You will also need the project ID.
LAST_PIPELINE_ID=$(curl -s --header "PRIVATE-TOKEN: <access_token>" https://gitlab.com/api/v4/projects/<project_id>/pipelines | jq '.[0].id')
Next, we will retrieve the job ID by providing the job name, using the following API:
https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
GET /projects/:id/pipelines/:pipeline_id/jobs
In your case, replace the job name in the following example with your job's name; in this example, let's call it my_job.
JOB_ID=$(curl -s --header "PRIVATE-TOKEN: <access_token>" https://gitlab.com/api/v4/projects/<project_id>/pipelines/$LAST_PIPELINE_ID/jobs | jq '.[] | select(.name=="my_job") | .id')
Now we are ready to actually retrieve the artifacts, with the following API
GET /projects/:id/jobs/:job_id/artifacts
https://docs.gitlab.com/ee/api/job_artifacts.html#get-job-artifacts
wget -U "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.17 (KHTML,like Gecko) Ubuntu/11.04 Chromium/11.0.654.0 Chrome/11.0.654.0 Safari/534.17" --header "PRIVATE-TOKEN: <access_token>" "https://gitlab.com/api/v4/projects/<project_id>/jobs/$JOB_ID/artifacts" -O artifacts.zip
The artifacts are available as artifacts.zip in the folder you executed wget from.
Combining them here for clarity:
LAST_PIPELINE_ID=$(curl -s --header "PRIVATE-TOKEN: <access_token>" https://gitlab.com/api/v4/projects/<project_id>/pipelines | jq '.[0].id')
JOB_ID=$(curl -s --header "PRIVATE-TOKEN: <access_token>" https://gitlab.com/api/v4/projects/<project_id>/pipelines/$LAST_PIPELINE_ID/jobs | jq '.[] | select(.name=="my_job") | .id')
wget -U "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.17 (KHTML,like Gecko) Ubuntu/11.04 Chromium/11.0.654.0 Chrome/11.0.654.0 Safari/534.17" --header "PRIVATE-TOKEN: <access_token>" "https://gitlab.com/api/v4/projects/<project_id>/jobs/$JOB_ID/artifacts" -O artifacts.zip
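The same three steps can be sketched in Python, with the jq selection replaced by a small helper; the token, project ID, and job name are placeholders:

```python
import json
import urllib.request

GITLAB = "https://gitlab.com/api/v4"

def api_get(path, token):
    """GET a GitLab API path and decode the JSON response."""
    req = urllib.request.Request(f"{GITLAB}{path}",
                                 headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def select_job_id(jobs, job_name):
    """Pick the ID of the job with the given name (the jq 'select' step)."""
    return next(j["id"] for j in jobs if j["name"] == job_name)

def download_latest_artifacts(project_id, job_name, token, dest="artifacts.zip"):
    """Fetch the artifacts of `job_name` from the most recent pipeline,
    regardless of whether that pipeline succeeded."""
    pipelines = api_get(f"/projects/{project_id}/pipelines", token)
    last_pipeline_id = pipelines[0]["id"]
    jobs = api_get(f"/projects/{project_id}/pipelines/{last_pipeline_id}/jobs",
                   token)
    job_id = select_job_id(jobs, job_name)
    req = urllib.request.Request(
        f"{GITLAB}/projects/{project_id}/jobs/{job_id}/artifacts",
        headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as f:
        f.write(resp.read())
```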

How to get status of Spark jobs on YARN using REST API?

A Spark application can run many jobs. My Spark runs on YARN, version 2.2.0.
How do I get the job running status and other info for a given application ID, possibly using the REST API?
This might be late, but I'm putting it here for convenience. Hope it helps. You can use the REST API command below to get the status of any job running on YARN.
curl --negotiate -s -u : -X GET 'http://resourcemanagerhost:8088/ws/v1/cluster/apps/application_121766109986_12343/state'
O/P - {"state":"RUNNING"}
Throughout the job's life cycle, the state will be one of NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED.
You can use jq for a formatted output.
curl --negotiate -s -u : -X GET 'http://resourcemanagerhost:8088/ws/v1/cluster/apps/application_121766109986_12343'| jq .app.state
O/P - "RUNNING"
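A small polling sketch around that endpoint; the ResourceManager URL and application ID are placeholders, and FINISHED/FAILED/KILLED are treated as the terminal states from the list above:

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}

def parse_state(body):
    """Extract the state from a /state response like '{"state":"RUNNING"}'."""
    return json.loads(body)["state"]

def get_app_state(rm_url, app_id):
    """Fetch the application's current state from the RM REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps/{app_id}/state"
    with urllib.request.urlopen(url) as resp:
        return parse_state(resp.read())

def wait_until_done(rm_url, app_id, interval=10):
    """Poll until the application reaches a terminal state."""
    while True:
        state = get_app_state(rm_url, app_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
```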
YARN has a Cluster Applications API, which shows the state along with other information. To use it:
$ curl 'RMURL/ws/v1/cluster/apps/APP_ID'
with your application id as APP_ID.

gitlab-ci: setup every day builds

I really do not understand how to set up a daily scheduler in GitLab. I have a simple application and I need to build it automatically every day at 8:00 in the morning.
I tried following https://gitlab.com/help/ci/triggers/README.md,
but I do not understand how to run this cron job:
30 0 * * * curl --request POST --form token=TOKEN --form ref=master https://gitlab.example.com/api/v3/projects/9/trigger/builds
This is also unacceptable: http://cloudlady911.com/index.php/2016/11/02/how-to-schedule-a-job-in-gitlab-8-13/
because I would have to run it manually from the pipeline.
Any solutions?
You can now set up schedules natively in GitLab to run any pipeline each day.
Whether you craft a script or just run cURL directly, you can trigger jobs in conjunction with cron. The example below triggers a job on the master branch of the project with ID 9 every night at 00:30:
30 0 * * * curl --request POST --form token=TOKEN --form ref=master https://gitlab.example.com/api/v3/projects/9/trigger/builds
This triggers the script in your .gitlab-ci.yml. The assumption is that you have your deployment script prepared in this file, so it will execute the stages step by step, and if one of your steps is a deployment, it will deploy your application.
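For environments without curl, the trigger call in the cron line can be sketched in Python; the host, project ID, and token match the placeholders above (note that the v3 API path shown here is from the original example; current GitLab versions use v4, where the path is /trigger/pipeline):

```python
import urllib.parse
import urllib.request

def build_trigger_request(base_url, project_id, token, ref):
    """Build the pipeline-trigger POST from the cron example above."""
    # Form-encoded body, matching curl's --form token=... --form ref=...
    data = urllib.parse.urlencode({"token": token, "ref": ref}).encode("ascii")
    url = f"{base_url}/api/v3/projects/{project_id}/trigger/builds"
    return urllib.request.Request(url, data=data, method="POST")

# Placeholder values; urlopen(req) would actually fire the trigger.
req = build_trigger_request("https://gitlab.example.com", 9, "TOKEN", "master")
```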
