Is it possible to use the --array option as an argument? I have an R script that uses job arrays, and the number of arrays depends on the file the script is run on. I would like to pass the required number of arrays as an argument to the sbatch my_code.R command line, so that I never have to modify my Slurm script: for example, a file with 550,000 columns needs 10 arrays, a file with 1,000,000 columns needs 19 arrays, and so on. I would like to end up with something like "sbatch --array 1-nb_of_arrays_needed my_code.R". The goal is to make the code usable by everyone, without the user having to open the Slurm script to change the line #SBATCH --array=x-y.
My R code (I don't show it in full) :
data <- read.table(opt$file, h=T, row.names=1, sep="\t") + 1   # opt comes from the argument parsing not shown here
ncol <- ncol(data)
# one array task per 55,000 columns; ceiling() so the last partial chunk still gets a task
nb_arrays <- ceiling(ncol / 55000)
opt$number <- nb_arrays
...
Best regards
Your R script will start only when the job is scheduled. To be scheduled, it must be submitted, and to be submitted, it must know the argument to --array.
So you have two options:
Either split your R script into two: one part that runs before the job is submitted, and another that runs when the job starts. The first part computes the necessary number of jobs in the array (and can possibly submit the job array automatically); the second part does the actual calculations.
If you prefer having only one R script, you can differentiate the behaviour based on the presence or absence of the SLURM_JOB_ID variable in the environment: if it is absent, compute the number of jobs and submit; if it is present, do the actual calculations.
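For the first option, here is a minimal Bash sketch of such a wrapper (the chunk size of 55,000 columns per array task and the way the column count is read with awk are assumptions to adapt; my_code.R stands for whatever batch script you actually submit):

#!/bin/bash
# wrapper.sh <data_file> : compute the array size, then submit
FILE="$1"

# number of data columns = fields on the first data line minus the row-name column
NCOLS=$(( $(awk -F'\t' 'NR==2 {print NF; exit}' "$FILE") - 1 ))

# round up so the last partial chunk of 55,000 columns still gets a task
NB_ARRAYS=$(( (NCOLS + 54999) / 55000 ))

sbatch --array=1-"$NB_ARRAYS" my_code.R "$FILE"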
The other option is to set --array to a large value at submission and, when the first job in the array starts, have it compute the number of jobs that are actually necessary and cancel the superfluous ones.
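For that second option, a sketch of what the top of the job script itself could look like (the upper bound of 50 array tasks is an assumption):

#!/bin/bash
#SBATCH --array=1-50          # deliberately generous upper bound
FILE="$1"

if [ "$SLURM_ARRAY_TASK_ID" -eq 1 ]; then
    NCOLS=$(( $(awk -F'\t' 'NR==2 {print NF; exit}' "$FILE") - 1 ))
    NEEDED=$(( (NCOLS + 54999) / 55000 ))
    # cancel the array tasks that are not needed for this file
    for t in $(seq $((NEEDED + 1)) 50); do
        scancel "${SLURM_ARRAY_JOB_ID}_${t}"
    done
fi

# ... actual per-chunk work, e.g. Rscript my_code.R "$FILE" "$SLURM_ARRAY_TASK_ID"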
Let's say I have 6233 simulations to run. The commands are generated and stored in a file, one per line. I would like to use Slurm to schedule and run these commands. However, the MaxArraySize limit is 2000, so I can't use one job array to schedule all of them.
One solution is given here, where we create four separate jobs and use arithmetic indexing into the file, with the last job having a smaller number of tasks to run (233).
Is it possible to do this using one sbatch script with one job ID?
I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?
Update:
Following Damien's solution and examples given here, I ended up with the following line in my bash script:
curID=$(( ${SLURM_ARRAY_TASK_ID} * ${SLURM_NTASKS} + ${SLURM_PROCID} ))
The same can be done using Python (shown in the referenced page). The only difference is that the environment variables should be imported into the script.
Is it possible to do this using one sbatch script with one job ID?
No. That solution will give you multiple job IDs.
I set ntasks=1 when using job arrays. Do larger ntasks help in such situations?
Yes, that is a factor that you can leverage.
Each job in the array can spawn multiple tasks (--ntasks=...). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $SLURM_PROCID, and the program must be started with srun. The tasks within a given array job then run in parallel. How large each job can be will depend on the MaxJobsize limit defined on the cluster/partition/QOS you have access to.
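A hedged sketch of this first variant (the file name commands.txt, the 4 tasks per array job, and the array range are assumptions; 1559 array jobs x 4 tasks cover the 6233 commands while staying under MaxArraySize=2000):

#!/bin/bash
#SBATCH --array=0-1558
#SBATCH --ntasks=4

# each task launched by srun picks its own line of the command file
srun bash -c '
    line=$(( SLURM_ARRAY_TASK_ID * SLURM_NTASKS + SLURM_PROCID + 1 ))
    cmd=$(sed -n "${line}p" commands.txt)
    [ -n "$cmd" ] && eval "$cmd"
'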
Another option is to chain the tasks inside each job of the array with a Bash loop (for i in $(seq ...) ; do ...; done). In that case, the line number in the command file must be computed from $SLURM_ARRAY_TASK_ID and $i. The tasks within a given array job then run serially. How large each job can be will depend on the MaxWall limit defined on the cluster/partition/QOS you have access to.
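And a sketch of the second variant, again with commands.txt and the chunk size of 4 commands per array job as assumptions:

#!/bin/bash
#SBATCH --array=0-1558
#SBATCH --ntasks=1

PER_JOB=4   # commands handled serially by each array job
for i in $(seq 0 $((PER_JOB - 1))); do
    line=$(( SLURM_ARRAY_TASK_ID * PER_JOB + i + 1 ))
    cmd=$(sed -n "${line}p" commands.txt)
    [ -n "$cmd" ] && eval "$cmd"
done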
This is a bit of a backwards approach to Snakemake, whose main paradigm is "one job -> one output", but I need many parallel reruns of my script on the same input matrix on a Slurm batch-submission cluster. How do I achieve that?
I tried specifying multiple threads and multiple nodes, each time with one CPU per task, but it never submits an array of many jobs, just an array of one job.
I don't think there is a nice way to submit an array job like that. In snakemake, you need to specify a unique output for each job. But you can have the same input. If you want 1000 runs of a job:
ids = range(1000)   # one id per desired rerun

rule all:
    input: expand('output_{sample}_{id}', sample=samples, id=ids)   # 'samples' is assumed to be defined elsewhere

rule simulation:
    input: 'input_{sample}'
    output: 'output_{sample}_{id}'
    shell: 'echo {input} > {output}'
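Once each run has its own output file, the expanded targets can be dispatched to the cluster as separate jobs; a hedged example using the classic --cluster interface (a Slurm profile works just as well, and the sbatch options here are placeholders):

# submit up to 50 of the 1000 runs to Slurm at a time
snakemake --cluster "sbatch --ntasks=1 --time=01:00:00" --jobs 50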
If that doesn't help, provide more information about the rule/job you are trying to run.
By default, when submitting a SLURM job as an array, all jobs within the array share the same job name. The docs (https://slurm.schedmd.com/job_array.html) show that each job in the array can have its name set separately via scontrol (described under the section "Scontrol Command Use").
Can this be done directly from an sbatch script?
I just created an account because I was trying to do this and I did find a solution.
You can use scontrol to change the name of a job, the syntax is the following:
scontrol update job=<job_id> JobName=<new_name>
You can do this manually, but you can also set the name from within the array job itself, which automatically assigns a different name to each job in the array.
I find this useful because I'm mostly running calculations in different directories, and if one job runs much longer than the others, I want to be able to quickly see where it is running and check what's going on.
Of course, you can set anything else you prefer as the job name.
In my case, I add the scontrol command to the script I run through the array, so that each job gets a name of the form "job_name - directory". The job ID and job name can be retrieved from environment variables.
scontrol update job=$SLURM_ARRAY_JOB_ID JobName="$SLURM_JOB_NAME - $folder"
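In context, a minimal sketch of such an array script, reusing the command above (the way $folder is derived from the task ID and the payload command are assumptions):

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --array=1-10

folder="run_${SLURM_ARRAY_TASK_ID}"   # hypothetical per-task directory
scontrol update job=$SLURM_ARRAY_JOB_ID JobName="$SLURM_JOB_NAME - $folder"

cd "$folder" && ./run_calculation     # hypothetical payload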
I have a batch job involving 10 steps. In STEP05 I have written JCL that is submitted through the internal reader, and I want the next step in the parent job, STEP06, to execute only after the internal reader job has completed successfully. Could you please suggest a resolution to this problem?
For what you have described, there are two approaches:
1. Break your process into three jobs: steps 1-5 as one job, the second job consisting of the JCL submitted in step 5, and the third job consisting of step 6 (or steps 6-10 - you do not make it clear whether the main JCL has 6 steps and the 'inner' JCL 4 steps, making the 10 steps you mention, or whether the main JCL has 10 steps). The execution of the three jobs will need to be serialised somehow.
2. Simply have the 'inner' JCL as a series of steps in the 'outer' JCL, so that you have only one job with steps that run in order.
The typical approach to this sort of issue would be to use a scheduler to handle the three-part process as three jobs, the middle one perhaps not submitted by the scheduler but monitored/tracked by it.
With a good scheduler setup, there is a good chance that the jobs could be tracked even if they were run on different machines, or even different types of machines.
Delaying a single job halfway through is possible, but it would require some sort of program running in a loop (waiting, so as not to use excessive CPU) and checking for an event: a dataset being created or deleted, the job itself being checked, or numerous other triggers.
Another way could be to have part 1 submit a job to do part 2, and have that job then submit another job as part 3.
Yet another way, perhaps frowned upon depending on its importance, would be to have three jobs: the first submitted to run, the third submitted but on hold. The first submits the second, which releases the third.
Of course, there is also the possibility that everything could be done as a single job.
I have set up an array job as follows:
sbatch --array=1-100%5 ...
which will limit the number of simultaneously running tasks to 5. The job is now running, and I would like to change this number to 10 (i.e. I wish I'd run sbatch --array=1-100%10 ...).
The documentation on array jobs mentions that you can use scontrol to change options after the job has started. Unfortunately, it's not clear what this option's variable name is, and I don't think it is listed in the documentation of the sbatch command here.
Any pointers well received.
You can change the array throttling limit with the following command:
scontrol update ArrayTaskThrottle=<count> JobId=<jobID>
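For example, if the array had been submitted as job 1234567 (a made-up ID), raising the throttle from 5 to 10 concurrent tasks would be:

scontrol update ArrayTaskThrottle=10 JobId=1234567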