What is the equivalent LSF syntax for this PBS command?

In a PBS script, this allocates 4 CPUs on each of 4 nodes:
#PBS -l select=4:ncpus=4
How do I translate this into LSF? I think the following will do:
#BSUB -n 16 -R "span[ptile=4]"
Am I correct?

Yes, that looks right. Please see https://hpc.llnl.gov/banks-jobs/running-jobs/lsf-quick-start-guide .
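For reference, a complete LSF job script built around those directives might look like the sketch below; the job name, output file, and executable are placeholders, and the script is assumed to be submitted with bsub < job.lsf:

#!/bin/bash
# Minimal LSF job script sketch: 16 slots total, packed 4 per host,
# which mirrors `#PBS -l select=4:ncpus=4`.
#BSUB -n 16
#BSUB -R "span[ptile=4]"
#BSUB -J myjob            # job name (placeholder)
#BSUB -o myjob.%J.out     # stdout file; %J expands to the job ID

mpirun ./my_program       # placeholder executable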

Related

How to run Slurm in the background?

To use resources allocated by Slurm interactively and in the background, I use salloc -n 12 -t 20:00:00 &. The problem is that this command does not put me on the compute node, so if I run a program it uses the resources of the login node. Could you please help me find the right command?
I also tried
salloc -n 12 -t 20:00:00 a.out </dev/null &
but it fails:
salloc: error: _fork_command: Unable to find command "a.out"
Any help is highly appreciated.
Is a.out in your path? e.g. what does which a.out return?
You only need to execute salloc -n 12 -t 20:00:00 &. Then use ssh to connect to one of the allocated nodes (for example, ssh node013).
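Another pattern (a sketch, assuming a.out sits in the current directory and your site allows salloc to launch a command directly) is to let salloc run the program through srun, which places it on the allocated compute nodes rather than the login node:

# Allocate, run a.out on the compute nodes via srun, and log its output.
# Note the explicit ./ -- the original error suggests a.out was not on $PATH.
salloc -n 12 -t 20:00:00 srun ./a.out > a.out.log 2>&1 &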

Best way to automatically create different process names for qsub

I am running my program on a high-performance computer, usually with different parameters as input. Those parameters are given to the program via a parameter file, i.e. the qsub file looks like:
#!/bin/bash
#PBS -N <job-name>
#PBS -A <name>
#PBS -l select=1:ncpus=20:mpiprocs=20
#PBS -l walltime=80:00:00
#PBS -M <mail-address>
#PBS -m bea
module load foss
cd $PBS_O_WORKDIR
mpirun main parameters.prm
# Append the job statistics to the std out file
qstat -f $PBS_JOBID
Now I usually run the same program multiple times, more or less at the same time, with different parameter.prm files. Nevertheless they all show up in the job list with the same name, which makes it difficult (though not impossible) to correlate a job in the list with the parameters it uses.
Is there a way to change the name of the program in the job list dynamically, depending on the input parameters used (ideally from within main)? Or is there another way to change the job name without having to edit the job file every time I run
qsub job_script.pbs
?
Would a solution be to write a shell script that reads the parameter file, generates the job script, and submits it? Or is there an easier way?
Simply use the -N option on the command line:
qsub -N job1 job_script.pbs
You can then use a for loop to iterate over the *.prm files:
for prm in *.prm
do
    prmbase=$(basename "$prm" .prm)
    qsub -N "$prmbase" job_script.pbs
done
This will name each job by the parameter file name, sans the .prm suffix.
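If each job should also pick up its matching parameter file, one option (a sketch, not part of the original answer; the variable name PRMFILE is purely illustrative) is to pass the file name through qsub -v and read it inside the job script:

# Submit one job per parameter file, naming the job after the file and
# exporting the file name into the job's environment as PRMFILE.
for prm in *.prm
do
    prmbase=$(basename "$prm" .prm)
    qsub -N "$prmbase" -v PRMFILE="$prm" job_script.pbs
done

Inside job_script.pbs, the launch line would then read the variable, e.g. mpirun main "$PRMFILE".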

Sun Grid Engine: submitted jobs by qsub command

I am using Sun Grid Engine queuing system.
Assume I submitted multiple jobs using a script that looks like:
#! /bin/bash
for i in 1 2 3 4 5
do
    sh qsub.sh python run.py ${i}
done
qsub.sh looks like:
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
Assuming that 5 jobs are running, I want to find out which command each job is executing.
By using qstat -f, I can see which node is running which jobID, but not what specific command each jobID is related to.
So, for example, I want to check which job ID is running python run.py 3, and so on.
How can I do this?
I think you'll see it if you use qstat -j "*". See https://linux.die.net/man/1/qstat-ge .
You could try running array jobs. Array jobs are useful when you have multiple inputs to process in the same way, and qstat will identify each task in the array. See the docs below for more information; a short sketch follows the links.
http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/commands/qsub.htm#-t
http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
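As a sketch of the array-job approach (assuming a standard Sun Grid Engine setup where the -t option and the SGE_TASK_ID variable are available; the script and job names are placeholders), the five separate submissions could be collapsed into a single array job:

#!/bin/bash
# run_array.sh -- one array job with tasks 1..5; each task runs run.py with
# its own task ID, so qstat shows the task number next to the job ID.
#$ -N run_py_array
#$ -t 1-5
#$ -cwd

python run.py ${SGE_TASK_ID}

It would be submitted once with qsub run_array.sh.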

Redirect output of my Java program under qsub

I am currently running multiple Java programs using qsub.
I wrote two scripts: 1) qsub.sh, 2) run.sh
qsub.sh
#! /bin/bash
echo cd `pwd` \; "$@" | qsub
run.sh
#! /bin/bash
for param in 1 2 3
do
    ./qsub.sh java -jar myProgram.jar -param ${param}
done
Given the two scripts above, I submit jobs by
sh run.sh
I want to redirect the messages generated by myProgram.jar -param ${param}.
So in run.sh, I replaced the 4th line with the following:
./qsub.sh java -jar myProgram.jar -param ${param} > output-${param}.txt
but the message stored in output-${param}.txt is "Your job 730 ("STDIN") has been submitted", which is not what I intended.
I know that qsub has an option -o for specifying the location of output, but I cannot figure out how to use this option for my case.
Can anyone help me?
Thanks in advance.
The issue is that qsub doesn't return the output of your job; it returns the output of the qsub command itself, which simply informs your resource manager / scheduler that you want that job to run.
You want to use the qsub -o option, but you need to remember that the output won't appear there until the job has run to completion. For Torque, you'd use qstat to check the status of your job, and all other resource managers / schedulers have similar commands.
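One way to wire the -o option in (a sketch built on the question's own wrapper; to merge stderr into the same file you would additionally pass -j oe on Torque/PBS or -j y on SGE) is to hand the output file name to qsub itself rather than redirecting in the shell:

#!/bin/bash
# run.sh -- submit one job per parameter and tell qsub where to write the
# job's stdout; the shell redirection in the question only captures qsub's
# own "Your job ... has been submitted" message.
for param in 1 2 3
do
    echo cd `pwd` \; java -jar myProgram.jar -param ${param} | \
        qsub -o output-${param}.txt
done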

R programming: submitting jobs on a multi-node Linux cluster using PBS

I am running R on a multi-node Linux cluster. I would like to run my analysis in R using scripts or batch mode, without using parallel computing software such as MPI or snow.
I know this can be done by dividing the input data such that each node runs a different part of the data.
My question is how exactly to go about this. I am not sure how I should code my scripts. An example would be very helpful!
So far I have been running my scripts using PBS, but they only seem to run on one node, since R is a single-threaded program. Hence, I need to figure out how to adjust my code so the work is distributed across all of the nodes.
Here is what I have been doing so far:
1) command line:
qsub myjobs.pbs
2) myjobs.pbs:
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -l walltime=00:05:00
#PBS -l arch=x86_64

pbsdsh -v $PBS_O_WORKDIR/myscript.sh
3) myscript.sh:
#!/bin/sh
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R
4) my_script.R:
library(survival)
...
write.table(test, "TESTER.csv", sep=",", row.names=F, quote=F)
Any suggestions will be appreciated! Thank you!
-CC
This is rather a PBS question; I usually write an R script (with the Rscript path after #!) and have it read a parameter (using the commandArgs function) that controls which "part of the job" the current instance should handle. Because I use multicore a lot, I usually need only 3-4 nodes, so I just submit a few jobs calling this R script, each with one of the possible control-argument values.
On the other hand, your use of pbsdsh should do the job; the value of PBS_TASKNUM can then be used as the control parameter.
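As a sketch of that idea (assuming the pbsdsh setup from the question; the task= argument name and the per-task .Rout file are illustrative choices), myscript.sh could forward the task number to R so that each copy of my_script.R works on its own slice of the data:

#!/bin/sh
# myscript.sh -- launched once per slot by pbsdsh; PBS_TASKNUM tells each
# copy which chunk of the input it should process.
cd $PBS_O_WORKDIR
R CMD BATCH --no-save "--args task=$PBS_TASKNUM" my_script.R my_script_${PBS_TASKNUM}.Rout

Inside my_script.R, commandArgs(trailingOnly = TRUE) would then return the task=... string for the script to parse.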
This was an answer to a related question, but it answers the comment above as well.
For most of our work, we run multiple R sessions in parallel using qsub instead.
If it is for multiple files, I normally do:
while read infile rest
do
    qsub -v infile=$infile call_r.pbs
done < list_of_infiles.txt
call_r.pbs:
...
R --vanilla -f analyse_file.R $infile
...
analyse_file.R:
args <- commandArgs()
infile <- args[5]
outfile <- paste(infile, ".out", sep="")
...
Then I combine all the output afterwards...
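The combining step itself isn't shown; a trivial sketch, assuming every run writes its own .out file next to the input as above:

# Concatenate the per-file results once all jobs have finished.
cat *.out > all_results.txt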
This problem seems very well suited to GNU parallel, which has an excellent tutorial here. I'm not familiar with pbsdsh, and I'm new to HPC, but it looks to me like pbsdsh serves a similar purpose to GNU parallel. I'm also not familiar with launching R from the command line with arguments, but here is my guess at how your PBS file would look:
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -l walltime=00:05:00
#PBS -l arch=x86_64
...
parallel -j2 --wd $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
Rscript myscript.R {} :::: infilelist.txt
where infilelist.txt lists the data files you want to process, e.g.:
inputdata01.dat
inputdata02.dat
...
inputdata12.dat
Your myscript.R would access the command line argument to load and process the specified input file.
My main purpose with this answer is to point out the availability of GNU parallel, which came about after the original question was posted. Hopefully someone else can provide a more tangible example. Also, I am still wobbly with my use of parallel; for example, I'm unsure about the -j2 option. (See my related question.)
