I have a SLURM job script a that internally issues an sbatch call to a second job script b; thus, job a starts job b.
Now I also have an srun command in job a that depends on the successful execution of b, so I did
srun -d afterok:$jobid <command>
The issue is that dependencies are apparently not honoured for job steps, which is what I have here, because my srun runs within job a's allocation (see the --dependency section of https://slurm.schedmd.com/srun.html).
The question: I really need to wait for job b to finish before I issue the job step. How can I do this without resorting to separate jobs?
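One possible workaround (a sketch only, assuming job b is submitted from a script called b.sh, which the question does not name): do the waiting yourself inside job a by submitting b, polling the queue until b is gone, and only then launching the job step. The sketch below drives this from a small Python helper; note it only detects that b has left the queue, so verifying that b actually succeeded (the afterok semantics) would additionally need something like sacct.

import subprocess
import time

# Submit job b from inside job a and capture its job ID
# (--parsable makes sbatch print only the ID).
jobid = subprocess.run(
    ["sbatch", "--parsable", "b.sh"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Job steps ignore --dependency, so wait manually: poll squeue until
# job b no longer shows up in the queue.
while subprocess.run(
    ["squeue", "-h", "-j", jobid],
    capture_output=True, text=True,
).stdout.strip():
    time.sleep(30)

# b has finished; now launch the dependent job step inside allocation a.
subprocess.run(["srun", "my_command"], check=True)  # my_command is a placeholder

If your Slurm version supports it, submitting b with sbatch --wait blocks until b terminates and makes the polling loop unnecessary.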
With the -p parameter you can define the partition for your job.
In my case a job can run in different partitions, so I do not want to restrict it to a single partition.
If my job can run equally well in partitions "p1" and "p3", how can I configure the sbatch command to allow more than one partition?
The --partition option accepts a list of partitions. So in your case you would write
#SBATCH --partition=p1,p3
The job will start in whichever partition offers the requested resources first.
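The same comma-separated list also works directly on the command line instead of in the script, e.g. sbatch --partition=p1,p3 job.sh (with job.sh standing in for your script).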
Job 1: run_test (job name)
uses notebook get_num
params: {"num": "1"}
Job 2: run_dev
uses notebook get_num
params: {"num": "2"}
When I run both jobs at the same time, the parameters of whichever job runs first also get applied to the second job. How can I avoid this?
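If these are Databricks notebook jobs (an assumption; the platform isn't named above), each run's parameters are normally read inside the notebook through widgets, so concurrent runs only see the value they were started with, and any cross-talk usually comes from shared state such as a hard-coded output path or temp table rather than from the parameters themselves. A minimal sketch of what get_num might do under that assumption (dbutils is provided by the notebook runtime):

# Inside the get_num notebook; "num" comes from this run's params only,
# so run_test and run_dev do not see each other's value.
num = dbutils.widgets.get("num")
print(f"this run was started with num={num}")

# Derive anything run-specific (output paths, table names, ...) from the
# parameter instead of writing to one shared, hard-coded location.
output_path = f"/tmp/get_num/num={num}"  # hypothetical path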
I have 20 bash scripts. Some run every minute, some every hour, and some every day, using cron. Now I need to migrate them to Airflow. As per the Airflow concept, this means I need to create 20 more files (DAG files).
Does Airflow provide a way to create a generic DAG template that can execute all the bash scripts on their given schedules, each with a different DAG id?
I found a reference: Airflow dynamic DAG and Task Ids
But I am not sure whether that is the right approach.
You can create 3 DAGs:
a DAG scheduled every minute
an hourly scheduled DAG
a daily scheduled DAG
Under each of these DAGs, create tasks that execute the corresponding scripts.
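A sketch of what one of those three DAGs could look like (the script paths are made up, and Airflow 2.x imports are assumed), with one BashOperator task per script:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

# Hypothetical scripts that currently run hourly from cron.
HOURLY_SCRIPTS = [
    "/opt/scripts/rotate_logs.sh",
    "/opt/scripts/refresh_cache.sh",
]

with DAG(
    dag_id="hourly_bash_scripts",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    for path in HOURLY_SCRIPTS:
        BashOperator(
            task_id=path.split("/")[-1].replace(".sh", ""),
            # The trailing space keeps BashOperator from treating the .sh
            # path itself as a Jinja template file.
            bash_command=f"bash {path} ",
        )

The minute and daily DAGs follow the same shape with schedule_interval="* * * * *" and "@daily". If you would rather have one DAG per script, the dynamic-DAG pattern from the linked question also works: build the DAG objects in a loop and register each one under its own dag_id in globals().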
I run a single Spark job on a local cluster (1 master, 2 workers/executors).
From what I have understood so far, every stage of a job is split into tasks, and each stage has its own task set. Each task of this TaskSet is scheduled on an executor of the local cluster.
I want to make Spark's TaskSetManager schedule all tasks of a TaskSet (i.e. of a single stage) on the same (local) executor, but I have not figured out how to do so.
Thanks,
Jim
While submitting the job, set the number of executors to one.
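A sketch of what that looks like from pyspark; which property applies depends on the cluster manager (these settings are the usual knobs, not something quoted from the answer above). With the application capped at a single executor, every task of every TaskSet ends up on it:

from pyspark.sql import SparkSession

# Cap the application at one executor so all tasks of every TaskSet
# land on the same executor.
spark = (
    SparkSession.builder
    .appName("single-executor-job")
    # YARN / Kubernetes: ask for exactly one executor.
    .config("spark.executor.instances", "1")
    # Standalone (the 1 master / 2 workers setup above): limit the app's
    # total cores to one executor's worth, so only one executor is launched.
    .config("spark.executor.cores", "2")
    .config("spark.cores.max", "2")
    .getOrCreate()
)

The spark-submit equivalents are --num-executors 1 on YARN, or --executor-cores 2 --total-executor-cores 2 on a standalone cluster.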
Is there a way to tell Spark to continue a job after a single task has failed?
Or even better:
Can we configure a job to fail only if a certain percentage of the tasks fail?
My scenario is like this:
I'm using pyspark to do some parallel computations.
I have a job that is composed of thousands of tasks (which are more or less independent of each other; I can allow some of them to fail).
One task fails (throws an exception), and after a few retries of that task the entire job is aborted.
Is there a way to change this (weird) behavior?
No, there is no such feature in Spark.
There is an open JIRA ticket (SPARK-10781) for it, but I don't see any activity on it.
You can do it in MapReduce using the configs mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent.
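Within Spark itself, a common application-level workaround (not a Spark feature, and not part of the answer above) is to catch the exception inside the function you hand to Spark, so an occasional bad element never surfaces as a task failure. This only helps when the failure is an exception in your own code, not an executor crash. A pyspark sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tolerate-some-failures").getOrCreate()
sc = spark.sparkContext

def risky(x):
    # Stand-in for the real per-element computation, which sometimes throws.
    if x % 7 == 0:
        raise ValueError(f"bad input: {x}")
    return x * x

def safe(x):
    # Swallow the exception inside the task, so Spark never counts a failure;
    # failed elements come back as None and are filtered out below.
    try:
        return risky(x)
    except Exception:
        return None

results = (
    sc.parallelize(range(1000), 100)
      .map(safe)
      .filter(lambda r: r is not None)
      .collect()
)
print(f"kept {len(results)} of 1000 elements")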