Why is Google Cloud Scheduler not respecting attemptDeadline? - google-cloud-scheduler

I have created an HTTP Cloud Scheduler job and expect it to have a maximum run time of 5 minutes. However, the job reports DEADLINE_EXCEEDED after exactly 1 minute.
When I run gcloud scheduler jobs describe MySyncTask to view the job, it reports attemptDeadline: 300s. The service I am calling runs on Cloud Run, where I have also set a 300s limit.
I am running the job manually by clicking "force a job run" in the GUI.
After exactly 1 minute, the logs report DEADLINE_EXCEEDED.

When you execute a job from the GUI, it is executed with the default attemptDeadline value, which is 60 seconds according to this question.
If you want to run it manually, I suggest running the job from Cloud Shell and passing the --attempt-deadline flag with the desired value, as shown in this answer:
gcloud beta scheduler jobs update http <job> --attempt-deadline=1800s --project <project>
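The answer shows only the update step. As a rough sketch of the full sequence it describes (set the deadline, confirm the stored value, then trigger a run from the shell instead of the GUI), something like the following could be used. The helper names `run` and `set_deadline_and_run` are hypothetical, the job name and location are placeholders, and the sketch uses the GA `gcloud scheduler` track where the answer used `beta`; add `beta` after `gcloud` if your SDK version needs it. Whether a run triggered this way honors the configured deadline is not confirmed in the thread.

```shell
#!/usr/bin/env bash
# Sketch only: assumes gcloud is installed and authenticated.
# Job name, location, and deadline are placeholders supplied by the caller.

# Print the command instead of executing it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Set the attempt deadline, show the stored value, then trigger one run.
set_deadline_and_run() {
  local job="$1" location="$2" deadline="$3"
  run gcloud scheduler jobs update http "$job" \
    --location="$location" --attempt-deadline="$deadline"
  run gcloud scheduler jobs describe "$job" \
    --location="$location" --format="value(attemptDeadline)"
  run gcloud scheduler jobs run "$job" --location="$location"
}
```

For example, `DRY_RUN=1 set_deadline_and_run MySyncTask us-central1 300s` prints the three commands without executing them, which is a cheap way to review them before running against a real project (`us-central1` is an assumed location).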

Related

How to get job logs if parallel jobs are running in a GitLab project

In a pipeline I have two jobs. The second job parses the logs of the first job, and to get the job ID I am using the API below:
https://source.golabs.io/api/v4/projects/<id>/jobs?scope[]=success
The issue is what happens when I execute multiple parallel runs of that pipeline. How can I differentiate the job logs of the respective pipelines?
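The thread gives no answer here. One possible approach, offered as an assumption and not as the poster's solution, is to query the pipeline-scoped jobs endpoint of the GitLab API instead of the project-wide one, using the predefined CI variables `CI_API_V4_URL`, `CI_PROJECT_ID`, and `CI_PIPELINE_ID` so that each run only sees its own jobs. The function names and the `API_TOKEN` variable below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: intended to be called from inside a GitLab CI job.

# Build the URL that lists successful jobs of one specific pipeline.
pipeline_jobs_url() {
  local api="$1" project="$2" pipeline="$3"
  printf '%s/projects/%s/pipelines/%s/jobs?scope[]=success\n' \
    "$api" "$project" "$pipeline"
}

# Fetch the job list for the pipeline this job belongs to.
# API_TOKEN is a hypothetical CI/CD variable holding a token with API read access.
# --globoff stops curl from treating the [] in the query string as a glob.
list_pipeline_jobs() {
  curl --silent --fail --globoff \
    --header "PRIVATE-TOKEN: ${API_TOKEN}" \
    "$(pipeline_jobs_url "$CI_API_V4_URL" "$CI_PROJECT_ID" "$CI_PIPELINE_ID")"
}
```

Because the pipeline ID is part of the path, two parallel runs of the same pipeline definition would each get a separate job list, and the job IDs returned can then be used to fetch the matching logs.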

Delay a job in Azure Pipelines YAML

How can I delay a job in Azure DevOps pipelines? I have multiple jobs that run simultaneously, and in the checkout phase I get an error saying files are being used by another process.
I found "delayForMinutes" and running a PowerShell script, but they only work for tasks, not for jobs.
My goal is to delay the checkout phase of the job, not the tasks in it.
You can add an agentless job after the checkout job and include a Delay task in it. Then you can continue with the other tasks in a separate agent job.

Stop Azure Databricks cluster after threshold time of job execution

I need to know how to configure an Azure Databricks cluster to stop automatically (without manual intervention) when it runs indefinitely while executing a job, and how to create an email alert when the job's running time exceeds its usual running time.
You can do this in the Jobs UI: select your job and, under Advanced, edit the Alerts and Timeout values.
This Databricks docs page may help you: https://docs.databricks.com/jobs.html

Automatically spawn an Azure Batch AI job periodically

I want to automatically start a job on an Azure Batch AI cluster once a week. The jobs are all identical except for the starting time. I thought of writing a PowerShell Azure Function that does this, but Azure Functions v2 doesn't support PowerShell and I don't want to use v1 in case it will be phased out. I would prefer not to do this in C# or Java. How can I do this?
Currently, there is no option to trigger a job on an Azure Batch AI cluster. You may want to run a shell script that creates a regular schedule using the system's task scheduler. See whether this doc by Said Bleik helps:
https://github.com/saidbleik/batchai_mm_ad#scheduling-jobs
I assume you can add multiple schedules for the job this way.
The Azure Batch portal has a "Job schedules" tab. You can go there, add a job, and set a schedule for it. You can specify the recurrence in the schedule.
Scheduled jobs
Job schedules enable you to create recurring jobs within the Batch service. A job schedule specifies when to run jobs and includes the specifications for the jobs to be run. You can specify the duration of the schedule--how long and when the schedule is in effect--and how frequently jobs are created during the scheduled period.

PBS automatically restart failed jobs

I use PBS job arrays to submit a number of jobs. Sometimes a small number of jobs fail and do not run successfully. Is there a way to automatically detect the failed jobs and restart them?
pbs_server supports automatic_requeue_exit_code:
an exit code, defined by the admin, that tells pbs_server to requeue the job instead of considering it as completed. This allows the user to add some additional checks that the job can run meaningfully, and if not, then the job script exits with the specified code to be requeued.
There is also a provision for requeuing jobs in the case where the prologue fails (see the prologue/epilogue script documentation).
There are probably more sophisticated ways of doing this, but they would fall outside the realm of built-in Torque options.
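As a minimal sketch of the requeue mechanism described above, a job script can run its own sanity check and exit with the admin-defined code when the check fails. The value 99, the function names, and the directory check are all assumptions for illustration; the real code must match whatever your administrator configured on pbs_server.

```shell
#!/usr/bin/env bash
# Sketch only: REQUEUE_EXIT_CODE must equal the server's
# automatic_requeue_exit_code setting; 99 is an assumed value.
REQUEUE_EXIT_CODE=99

# Hypothetical sanity check: the working directory exists and is writable.
can_run_meaningfully() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Return 0 if the job can proceed, otherwise the requeue exit code.
preflight() {
  if can_run_meaningfully "$1"; then
    return 0
  else
    return "$REQUEUE_EXIT_CODE"
  fi
}

# In the job script, before the real work starts:
#   preflight "$PBS_O_WORKDIR" || exit $?
```

With this in place, a job that lands on a node where the check fails would exit with the requeue code and be put back in the queue by pbs_server, while healthy jobs continue as normal.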
