I'm using APScheduler (3.5.3) to run three different jobs. I need to trigger the second job immediately after the completion of the first job, but I don't know how long the first job takes to complete. I have set the trigger type to cron and scheduled the jobs to run every 2 hours.
One way I overcame this is by scheduling the next job at the end of each job. Is there another way to achieve this with APScheduler?
This can be achieved using scheduler events. Check out this simplified example adapted from the documentation (not tested, but should work):
import datetime

from apscheduler.events import EVENT_JOB_EXECUTED, EVENT_JOB_ERROR

def execution_listener(event):
    if event.exception:
        print('The job crashed')
    else:
        print('The job executed successfully')
        # check that the executed job is the first job
        job = scheduler.get_job(event.job_id)
        if job.name == 'first_job':
            print('Running the second job')
            # look up the second job (assuming it's a scheduled job)
            jobs = scheduler.get_jobs()
            second_job = next((j for j in jobs if j.name == 'second_job'), None)
            if second_job:
                # run the second job immediately
                second_job.modify(next_run_time=datetime.datetime.utcnow())
            else:
                # job not scheduled, add it and run now
                scheduler.add_job(second_job_func, args=(...), kwargs={...},
                                  name='second_job')

scheduler.add_listener(execution_listener, EVENT_JOB_EXECUTED | EVENT_JOB_ERROR)
This assumes you don't know the jobs' IDs and identify them by name instead. If you know the IDs, the logic would be simpler.
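For example, if both jobs were added with fixed, known IDs, the listener body could shrink to a single modify_job call (a minimal sketch; the 'first_job' and 'second_job' IDs are assumptions):

def execution_listener(event):
    # compare against the known job ID instead of looking jobs up by name
    if not event.exception and event.job_id == 'first_job':
        # reschedule the second job to fire immediately
        scheduler.modify_job('second_job', next_run_time=datetime.datetime.utcnow())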
In a previous question I asked how to queue a job B to start after job A, which is done with
sbatch --dependency=after:123456:+5 jobB.slurm
where 123456 is the id for job A, and :+5 denotes that it will start five minutes after job A.
I now need to do this for several jobs. Job B should depend on job A, job C on B, job D on C.
sbatch jobA.slurm will return Submitted batch job 123456, and I will need to pass that job id to the --dependency call for all but the first job. As I am using a busy cluster, I can't rely on the job ids incrementing by one, as someone might queue a job in between.
As such I want to write a script that takes the job scripts (*.slurm) I want to run as arguments, e.g.
./run_jobs.sh jobA.slurm jobB.slurm jobC.slurm jobD.slurm
The script should then run, for all job scripts passed to it,
sbatch jobA.slurm # Submitted batch job 123456
sbatch --dependency=after:123456:+5 jobB.slurm # Submitted batch job 123457
sbatch --dependency=after:123457:+5 jobC.slurm # Submitted batch job 123458
sbatch --dependency=after:123458:+5 jobD.slurm # Submitted batch job 123459
What is an optimal way to do this with bash?
You can use the --parsable option to get the jobid of the previously submitted job:
#!/bin/bash
# submit the first job and capture its id (--parsable prints just the job id)
ID=$(sbatch --parsable "$1")
shift
# submit each remaining script to start 5 minutes after its predecessor
for script in "$@"; do
    ID=$(sbatch --parsable --dependency=after:${ID}:+5 "$script")
done
I have a DAG with one task that fetches data from an API. I want that task to fetch data only for a certain time interval and then mark itself as SUCCESS so that the downstream tasks start running.
Please note that the downstream tasks depend on the task I want to mark as SUCCESS. I know I can mark the task SUCCESS manually from the CLI or UI, but I want to do it automatically.
Is it possible to do that programmatically using Python in Airflow?
You can set the status of a task using Python code, like this:

from airflow.models import TaskInstance
from airflow.utils.state import State

def set_task_status(**kwargs):
    execution_date = kwargs['execution_date']
    # HiveOperatorTest is the operator (task) whose state should be changed
    ti = TaskInstance(HiveOperatorTest, execution_date)
    ti.set_state(State.SUCCESS)
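A minimal sketch of wiring this into the DAG with a PythonOperator (the task_id and the provide_context style are assumptions based on Airflow 1.x, not part of the original answer):

from airflow.operators.python_operator import PythonOperator

mark_success = PythonOperator(
    task_id='mark_fetch_success',
    python_callable=set_task_status,
    provide_context=True,  # Airflow 1.x: passes execution_date into kwargs
    dag=dag,
)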
I have to perform two jobs, A and B. Job A is to be performed at 9:00 am on every weekday. I don't know the duration of job A, though; it may vary.
I also want to perform job B 3 minutes after the completion of job A.
Can anyone suggest a cron expression for this, please?
Assuming you are trying to run the second job three minutes after the first job completes: say Job A involves calling /home/user/job_a.sh, and once that completes you want to run /home/user/job_b.sh. Instead of trying to set up two different cron jobs, you could make a single wrapper script, say job_c.sh, whose only job is to run Job A, wait three minutes, and then run Job B.
Basically, rather than calling two Cron Jobs and trying to sort out timing for both of them, you can just establish one Cron Job which runs both jobs.
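A minimal sketch of that wrapper, using the paths assumed above:

#!/bin/bash
# job_c.sh: run Job A, wait three minutes, then run Job B
/home/user/job_a.sh
sleep 180
/home/user/job_b.sh

with a single crontab entry to launch it on weekdays at 9:00 am:

00 9 * * 1-5 /home/user/job_c.sh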
On the other hand, if you want to run the second job three minutes after the first one starts then you might as well create two Cron Jobs with three minutes between them which would look something like this:
00 9 * * 1-5 /home/user/job_a.sh
03 9 * * 1-5 /home/user/job_b.sh
When I launch a computation on the cluster, I usually have a separate program doing the post-processing at the end:
sbatch simulation
sbatch --dependency=afterok:JOBIDHERE postprocessing
I want to avoid mistyping and have the right job id inserted automatically. Any ideas? Thanks
You can do something like this:
RES=$(sbatch simulation) && sbatch --dependency=afterok:${RES##* } postprocessing
The RES variable will hold the output of the sbatch command, something like Submitted batch job 102045. The construct ${RES##* } isolates the last word, in this case the job id. The && part ensures you do not try to submit the second job in case the first submission fails.
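Alternatively, the --parsable flag makes sbatch print just the job id, so no word-splitting is needed (a one-line variant of the same idea):

ID=$(sbatch --parsable simulation) && sbatch --dependency=afterok:${ID} postprocessing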
I have a cron job which runs every minute. Sometimes, if the job takes more than a minute, another instance is started for the same task, so duplicate jobs end up running, which is NOT what I want. I want a conditional check: if the job for a specific task is already running, wait until it completes, or skip starting a new instance until the existing one finishes.
Create a text file somewhere which will store a value (for example 0 or 1). When the task starts executing, change the value to 1. In the cron job, add a check: if the value in the file is 1, don't execute the job. When your task completes, remember to switch the value back to the default (for example 0).
You can also simply create a file when the task starts and delete it when the task ends, and only execute the cron job if the file doesn't exist.
You can even put the check in the task itself instead of cluttering your crontab.
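A minimal sketch of the file-based check using flock, which does the create/delete bookkeeping atomically and releases the lock even if the task crashes (the script and lock paths are assumptions):

# crontab entry: flock skips this run if another instance still holds the lock
* * * * * /usr/bin/flock --nonblock /tmp/mytask.lock /home/user/mytask.sh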