In Airflow, have others noticed a longer delay between tasks in a subdag vs. tasks in the main DAG?

I'm running Airflow 1.9.0. I put together a DAG in which every task simply sleeps for 1 second. 20 of the tasks run in the main DAG, 20 more run in a subdag, and 20 run in a sub-subdag.
There is a 2-3 second delay between the end of a task in the main DAG and the beginning of the next task. For tasks in the subdag, or even in the sub-subdag, there is a 7-8 second delay between the end of one task and the beginning of the next.
I have observed the additional delays with both the CeleryExecutor and the LocalExecutor.
I have seen comments suggesting that delays of this size are considered reasonable in Airflow, but I'm really curious why the delay is larger inside the subdags.
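For reference, a minimal sketch of the kind of test DAG described above (Airflow 1.9-era import paths; the names, counts, and sequential chaining are illustrative):

# Illustrative sketch: chains of 1-second sleep tasks in a main DAG and a subdag.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {"owner": "airflow", "start_date": datetime(2018, 1, 1)}

def add_sleep_chain(dag, count=20):
    # Create `count` 1-second sleep tasks and chain them sequentially.
    prev = None
    for i in range(count):
        task = BashOperator(task_id="sleep_%d" % i, bash_command="sleep 1", dag=dag)
        if prev is not None:
            prev >> task
        prev = task

main_dag = DAG("delay_test", default_args=DEFAULT_ARGS, schedule_interval=None)
add_sleep_chain(main_dag)

# The subdag's dag_id must be "<parent_dag_id>.<task_id>".
sub_dag = DAG("delay_test.sub", default_args=DEFAULT_ARGS, schedule_interval=None)
add_sleep_chain(sub_dag)

SubDagOperator(task_id="sub", subdag=sub_dag, dag=main_dag)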

Related

Delay in starting the next stage in Spark job

While looking into the stage details for a Spark job that is taking much longer than usual, I observed that stage n does not start even after all stages 0 to n-1 have completed.
The enclosed details are from the Spark UI for the job/build -> stage progress.
I am unable to work out the reason for this lag, where stage 8 starts after a long delay (12:48 AM vs. 1:25 AM). As you can see, all the stages above stage 8 complete in seconds or minutes, and the 37-minute delay between the highlighted stages is what puzzles me.
Any help is highly appreciated.
It's possible that the lag between the two stages is IO. I would recommend you partition your dataset so that each file is around 128 MB. Opening, writing and closing 1884 files takes time, and with 5.2 GB of data you could do this with around 40 files.
df.repartition(40)
should help.
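As a rough PySpark sketch of the idea (the paths below are placeholders; only the partition count of 40 comes from the sizes above):

# Sketch only: repartition to roughly 128 MB files before writing (5.2 GB / 128 MB ~ 40).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-example").getOrCreate()
df = spark.read.parquet("s3://bucket/input/")                               # placeholder input path
df.repartition(40).write.mode("overwrite").parquet("s3://bucket/output/")   # placeholder output path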

How to schedule millions of jobs in Node.js properly?

I am using Node.js, MongoDB, and the node-cron npm module to schedule jobs. For 10K jobs it takes little time and memory, but when I schedule 100K jobs it takes more than 10 minutes, uses nearly 1.5 GB of RAM, and sometimes runs out of memory. Is there a better way to achieve this, such as using ActiveMQ or RabbitMQ?
One strategy is that you only schedule the next job to run. When it runs, you query the database and find the next job and schedule it.
If you add a new job, you check whether it should run sooner than the current next job and, if so, you schedule it and deschedule the previous next job (it will get rescheduled after this new job runs).
If you remove a job, you check if it is the current next job. If it is, you deschedule it and find the next job in the database and schedule it.
If your database is indexed for efficient querying by job run time, this can be very efficient, uses hardly any memory, and scales to an arbitrarily large number of jobs.
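A rough sketch of this strategy (language-agnostic; shown here in Python with an in-memory list and threading.Timer purely for illustration, whereas in Node.js the same shape works with setTimeout and a MongoDB collection indexed on the run time; locking is omitted and all names below are made up):

# Keep at most one timer armed: the one for the next job due to run.
import threading, time

jobs = []  # stand-in for a database collection indexed by run_at

def next_pending_job():
    # In a real system this would be a DB query ordered by run_at.
    pending = [j for j in jobs if not j["done"]]
    return min(pending, key=lambda j: j["run_at"]) if pending else None

class OneTimerScheduler:
    def __init__(self):
        self.timer = None
        self.next = None

    def rearm(self):
        # Cancel whatever is armed and arm a timer for the soonest pending job.
        if self.timer is not None:
            self.timer.cancel()
        self.next = next_pending_job()
        if self.next is not None:
            delay = max(0.0, self.next["run_at"] - time.time())
            self.timer = threading.Timer(delay, self._fire)
            self.timer.start()

    def _fire(self):
        job = self.next
        job["func"]()      # run the due job
        job["done"] = True
        self.rearm()       # then look up and arm the next one

    def add_job(self, run_at, func):
        jobs.append({"run_at": run_at, "func": func, "done": False})
        # Only re-arm if the new job is due sooner than the currently armed one.
        if self.next is None or run_at < self.next["run_at"]:
            self.rearm()

    def remove_job(self, job):
        job["done"] = True
        if job is self.next:   # removed the currently armed job: pick a new one
            self.rearm()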

Agenda Job: now() makes the job run multiple times

I am scheduling an Agenda Job as below:
await agenda.now("xyz");
But the above command makes my job run almost every minute. However, when I change it to
await agenda.every('5 minutes', "xyz");
the above works as expected, i.e. it runs the job every 5 minutes.
But I don't want a recurring job; I want it to run only once.
The issue was with the concurrency of the job definition. It was set to 10, which caused several instances of the same job to run in parallel.
Changing the concurrency to 1 solved the issue.

Kubernetes CronJob - Skip job if previous is still running AND wait for the next schedule time

I have scheduled the K8s cron to run every 30 minutes.
If the current job is still running when the next cron schedule time arrives, it shouldn't create a new job, but rather wait for the next schedule.
And it should repeat the same check if the previous job is still in the Running state.
Set the following property to Forbid in the CronJob YAML:
.spec.concurrencyPolicy
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy
.spec.concurrencyPolicy: Forbid will hold off starting a second job if there is still an old one running. However, that job will be queued to start immediately after the old job finishes.
To skip running a new job entirely and instead wait until the next scheduled time, set .spec.startingDeadlineSeconds to be smaller than the cronjob interval (but larger than the max expected startup time of the job).
If you're running a job every 30 minutes and know the job will never take more than one minute to start, set .spec.startingDeadlineSeconds: 60.
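If you create the resource programmatically rather than with raw YAML, the same two settings can be expressed with the official kubernetes Python client. This is only a sketch: it assumes a client version where CronJob is served from batch/v1, and the image, command, and names are placeholders.

# Sketch: a CronJob with concurrencyPolicy=Forbid and startingDeadlineSeconds=60.
from kubernetes import client, config

config.load_kube_config()

container = client.V1Container(
    name="worker",                                    # placeholder name
    image="busybox:1.36",                             # placeholder image
    command=["sh", "-c", "echo working; sleep 600"],  # placeholder workload
)
cron_job = client.V1CronJob(
    api_version="batch/v1",
    kind="CronJob",
    metadata=client.V1ObjectMeta(name="every-30-min"),
    spec=client.V1CronJobSpec(
        schedule="*/30 * * * *",
        concurrency_policy="Forbid",    # don't start a second job while one is running
        starting_deadline_seconds=60,   # skip (rather than queue) a missed run after 60s
        job_template=client.V1JobTemplateSpec(
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(containers=[container], restart_policy="Never")
                )
            )
        ),
    ),
)
client.BatchV1Api().create_namespaced_cron_job(namespace="default", body=cron_job)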

Why does web UI show different durations in Jobs and Stages pages?

I am running a dummy Spark job that does exactly the same set of operations in every iteration. The following figure shows 30 iterations, where each job corresponds to one iteration. It can be seen that the duration is always around 70 ms, except for jobs 0, 4, 16, and 28. The behavior of job 0 is expected, as that is when the data is first loaded.
But when I click on job 16 to enter its detailed view, the duration is only 64 ms, which is similar to the other jobs; the screenshot of this duration is as follows:
So where does Spark spend the (2000 - 64) ms on job 16?
Gotcha! That's exactly the same question I asked myself a few days ago. I'm glad to share my findings with you (hoping that where I'm lacking understanding, others chime in and fill the gaps).
The difference between what you can see in Jobs and Stages pages is the time required to schedule the stage for execution.
In Spark, a single job can have one or many stages with one or many tasks. That creates an execution plan.
By default, a Spark application runs in FIFO scheduling mode which is to execute one Spark job at a time regardless of how many cores are in use (you can check it in the web UI's Jobs page).
Quoting Scheduling Within an Application:
By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into "stages" (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don’t need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.
You should then check how many tasks a single job will execute and divide that by the number of cores the Spark application has been assigned (you can check this in the web UI's Executors page).
That will give you an estimate of how many "cycles" you may need to wait before all the tasks (and hence the jobs) complete.
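As a rough, made-up illustration of that estimate:

import math
tasks, cores = 200, 8              # hypothetical: a job with 200 tasks on 8 assigned cores
waves = math.ceil(tasks / cores)   # 25 "cycles" of task execution, at minimum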
NB: That's where dynamic allocation comes into play, as you may sometimes want more cores later on but start with very few upfront. That's the conclusion I offered to my client when we noticed a similar behaviour.
I can see that all the jobs in your example have 1 stage with 1 task (which makes them very simple and highly unrealistic for a production environment). That tells me that your machine could have been busier at different intervals, so the time Spark took to schedule a job was longer, but once scheduled, the corresponding stage finished as quickly as the stages from the other jobs. I'd say it's a beauty of profiling that it may sometimes (often?) get very unpredictable and hard to reason about.
Just to shed more light on the internals of how the web UI works: the web UI uses a bunch of Spark listeners that collect the current status of the running Spark application. There is at least one Spark listener per page in the web UI. They intercept different execution times depending on their role.
Read about the org.apache.spark.scheduler.SparkListener interface and review its different callbacks to learn about the variety of events they can intercept.
