Rmpi cannot spawn processes across nodes - Open MPI

I followed the instructions at http://math.acadiau.ca/ACMMaC/Rmpi/sample.html. Here is my R code:
library("Rmpi")
mpi.spawn.Rslaves()
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()-1,"on",mpi.get.processor.name()))
mpi.close.Rslaves()
mpi.quit()
And in the submission script I specified the number of slots:
#$ -S /bin/sh
#$ -pe orte 20
#$ -cwd
#$ -V
#$ -o /data1/users/liuyang/R/rmpi.o
#$ -e /data1/users/liuyang/R/rmpi.e
mpiexec -np 1 R --slave CMD BATCH rmpi.R
But I found that the R code only runs on the master node, and mpi.spawn.Rslaves() only spawns 12 processes (the number of CPU cores on the master node).
The cluster is an SGE cluster, and I built the Rmpi package against Open MPI 1.4.3. So what is the reason?
I also tried commenting out the mpi.spawn.Rslaves() line in the R code and changing the mpiexec parameter to "-np $NSLOTS", but that gave an error about there being no slave processes.

You need to keep mpi.spawn.Rslaves() in your R script and then run the script with mpiexec/mpirun using "-machinefile" and "-np 1" (http://linux.die.net/man/1/mpirun), or run it through a batch scheduler such as SGE, OpenLava, and so on.
Basically, Rmpi calls the MPI API MPI_Comm_spawn() to spawn worker processes. MPI_Comm_spawn() takes the size and the machine list from the MPI context. If the Rmpi script is executed without mpiexec/mpirun, it runs in singleton MPI_INIT mode, where the MPI context contains at most the number of cores of the local host. If the Rmpi script is executed with mpiexec/mpirun, the MPI context is prepared by mpiexec/mpirun from the machine list, so it contains the machines and slots specified in the machinefile. The "-np 1" means only one instance of the R script is created, which acts as the MPI master.
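As an illustration under SGE, the machinefile can be built from the allocation. Below is a minimal sketch, assuming the parallel environment exports $PE_HOSTFILE in the usual "host slots queue processors" format; the machines.$JOB_ID file name is just chosen here for illustration:
#$ -S /bin/sh
#$ -pe orte 20
#$ -cwd
#$ -V
# Convert the SGE allocation into an Open MPI hostfile ("host slots=N" per line).
awk '{print $1" slots="$2}' $PE_HOSTFILE > machines.$JOB_ID
# Start a single R master; mpi.spawn.Rslaves() then spawns workers on the listed hosts.
mpirun -np 1 -machinefile machines.$JOB_ID R --slave -f rmpi.R
rm -f machines.$JOB_ID
With a tightly integrated Open MPI/SGE build, mpirun may pick up the allocation even without an explicit machinefile, but passing one makes the host list explicit.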

Related

How could I run Open MPI under Slurm

I am unable to run Open MPI under Slurm through a Slurm script.
In general, I am able to obtain the hostname and run Open MPI on my machine.
$ mpirun hostname
myHost
$ cd NPB3.3-SER/ && make ua CLASS=B && mpirun -n 1 bin/ua.B.x inputua.data # Works
But if I do the same through the Slurm script, mpirun hostname returns an empty string, and consequently I am unable to run mpirun -n 1 bin/ua.B.x inputua.data.
slurm-script.sh:
#!/bin/bash
#SBATCH -o slurm.out # STDOUT
#SBATCH -e slurm.err # STDERR
#SBATCH --mail-type=ALL
export LD_LIBRARY_PATH="/usr/lib/openmpi/lib"
mpirun hostname > output.txt # Returns empty
cd NPB3.3-SER/
make ua CLASS=B
mpirun --host myHost -n 1 bin/ua.B.x inputua.data
$ sbatch -N1 slurm-script.sh
Submitted batch job 1
The error I am receiving:
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
A daemon (pid unknown) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
------------------------------------------------------------------
If Slurm and OpenMPI are recent versions, make sure that OpenMPI is compiled with Slurm support (run ompi_info | grep slurm to find out) and just run srun bin/ua.B.x inputua.data in your submission script.
Alternatively, mpirun bin/ua.B.x inputua.data should work too.
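For the Slurm-aware case, a minimal submission script along those lines might look like the following sketch (the single-node, single-task sizing is an assumption, matching the original -n 1):
#!/bin/bash
#SBATCH -o slurm.out # STDOUT
#SBATCH -e slurm.err # STDERR
#SBATCH -N 1         # assumed: one node
#SBATCH -n 1         # assumed: one task
cd NPB3.3-SER/
make ua CLASS=B
# With Slurm support compiled into Open MPI, srun launches the rank(s) inside the allocation.
srun bin/ua.B.x inputua.data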
If OpenMPI is compiled without Slurm support the following should work:
srun hostname > output.txt
cd NPB3.3-SER/
make ua CLASS=B
mpirun --hostfile output.txt -n 1 bin/ua.B.x inputua.data
Also make sure that by running export LD_LIBRARY_PATH="/usr/lib/openmpi/lib" you do not overwrite other library paths that are necessary. Better would probably be export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/lib/openmpi/lib" (or a more elaborate version if you want to avoid a leading : when the variable is initially empty).
What you need is: 1) run mpirun, 2) from slurm, 3) with --host.
To determine what is responsible for this not working (Problem 1), you could test a few things.
Whatever you test, you should test exactly the same thing via the command line (CLI) and via Slurm (S).
It is understood that some of these tests will produce different results in the CLI and S cases.
A few notes are:
1) You are not testing exactly the same things in CLI and S.
2) You say that you are "unable to run mpirun -n 1 bin/ua.B.x inputua.data", while the problem is actually with mpirun --host myHost -n 1 bin/ua.B.x inputua.data.
3) The fact that mpirun hostname > output.txt returns an empty file (Problem 2) does not necessarily have the same origin as your main problem, see paragraph above. You can overcome this problem by using scontrol show hostnames
or with the environment variable SLURM_NODELIST (on which scontrol show hostnames is based), but this will not solve Problem 1.
To work around Problem 2, which is not the most important, try a few things via both CLI and S.
The slurm script below may be helpful.
#!/bin/bash
#SBATCH -o slurm_hostname.out # STDOUT
#SBATCH -e slurm_hostname.err # STDERR
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/lib64/openmpi/lib"
mpirun hostname > hostname_mpirun.txt # 1. Returns values ok for me
hostname > hostname.txt # 2. Returns values ok for me
hostname -s > hostname_slurmcontrol.txt # 3. Returns values ok for me
scontrol show hostnames > hostname_scontrol.txt # 4. Returns values ok for me
echo ${SLURM_NODELIST} > hostname_slurm_nodelist.txt # 5. Returns values ok for me
(For an explanation of that export command, see the note about LD_LIBRARY_PATH above.)
From what you say, I understand 2, 3, 4 and 5 work ok for you, and 1 does not.
So you could now use mpirun with suitable options --host or --hostfile.
Note the different format of the output of scontrol show hostnames (e.g., for me cnode17<newline>cnode18) and echo ${SLURM_NODELIST} (cnode[17-18]).
The host names could perhaps also be obtained in file names set dynamically with %h and %n in slurm.conf; look for, e.g., SlurmdLogFile and SlurmdPidFile.
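As a concrete illustration of the --hostfile route inside the submission script (a sketch; hostfile.txt is just a name chosen here):
# Expand the compact Slurm node list into one hostname per line.
scontrol show hostnames "$SLURM_NODELIST" > hostfile.txt
mpirun --hostfile hostfile.txt -n 1 bin/ua.B.x inputua.data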
To diagnose/work around/solve Problem 1, try mpirun with/without --host, in CLI and S.
From what you say, assuming you used the correct syntax in each case, this is the outcome:
1) mpirun, CLI (original post):
"Works".
2) mpirun, S (comment?):
Same error as item 4 below?
Note that mpirun hostname in S should have produced similar output in your slurm.err.
3) mpirun --host, CLI (comment):
Error
There are no allocated resources for the application bin/ua.B.x that match the requested mapping:
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
4) mpirun --host, S (original post):
Error (same as item 3 above?)
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
As per the comments, you may have the wrong LD_LIBRARY_PATH set.
You may also need to use mpirun --prefix ...
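For reference, --prefix tells mpirun where Open MPI is installed on the remote nodes so the daemons can find their libraries; a hedged example (the /usr/lib64/openmpi prefix is an assumption, consistent with the library path used above):
mpirun --prefix /usr/lib64/openmpi --host myHost -n 1 bin/ua.B.x inputua.data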
Related?
https://github.com/easybuilders/easybuild-easyconfigs/issues/204

How to run an MPI task?

I am a newbie in Linux and recently started working with our university supercomputer, and I need to install my program (the GAMESS quantum chemistry software) in my own allocated space. I have installed and run it successfully under 'sockets', but I actually need to run it under 'mpi' (otherwise there is little advantage to using a supercomputer).
System Setting:
OS: Linux64, Red Hat, Intel
MPI: impi
compiler: ifort
modules: slurm, intel/intel-15.0.1, intel/impi-15.0.1
This software runs via 'rungms' and receives arguments as:
rungms [fileName] [Version] [CPU count] (for example: ./rungms Opt 00 4)
Here is my bash file (my feeling is that this is the main culprit for my problem!):
#!/bin/bash
#Based off of Monte's Original Script for Torque:
#https://gist.github.com/mlunacek/6306340#file-matlab_example-pbs
#These are SBATCH directives specifying name of file, queue, the
#Quality of Service, wall time, Node Count, #of CPUS, and the
#destination output file (which appends node hostname and JobID)
#SBATCH -J OptMPI
#SBATCH --qos janus-debug
#SBATCH -t 00-00:10:00
#SBATCH -N2
#SBATCH --ntasks-per-node=1
#SBATCH -o output-OptMPI-%N-JobID-%j
#NOTE: This Module Will Be Replaced With Slurm Specific:
module load intel/impi-15.0.1
mpirun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out
As I said before, the program is compiled for mpi (and not 'sockets').
My problem is that when I run sbatch Opt.sh, I receive this error:
srun: error: PMK_KVS_Barrier duplicate request from task 1
When I change the -N number, I sometimes receive an error saying (4 != 2).
With an odd -N number I receive an error saying it expects an even number of processes.
What am I missing?
Here is the code from our supercomputer website as a bash file example.
The Slurm Workload Manager has a few ways of invoking an Intel MPI process. Likely, all you have to do is use srun rather than mpirun in your case. If errors are still present, refer here for alternative ways to invoke Intel MPI jobs; it's rather dependent on how the HPC admins configured the system.
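A sketch of the submission script with that single change, everything else kept as in the question:
#!/bin/bash
#SBATCH -J OptMPI
#SBATCH --qos janus-debug
#SBATCH -t 00-00:10:00
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH -o output-OptMPI-%N-JobID-%j
module load intel/impi-15.0.1
# Let Slurm start the Intel MPI ranks through its PMI interface instead of mpirun.
srun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out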

Run a script in the same shell (bash)

My problem is specific to running SPEC CPU2006 (a benchmark suite).
After installing the benchmark suite, I can invoke a command called "specinvoke" in a terminal to run a specific benchmark. I have another script, part of which looks like the following:
cd (specific benchmark directory)
specinvoke &
pid=$!
My goal is to get the PID of the running task. However, with what is shown above, what I get is the PID of the "specinvoke" command, and the real running task has a different PID.
However, by running specinvoke -n, the actual command run by specinvoke is printed to stdout. For example, for one benchmark it looks like this:
# specinvoke r6392
# Invoked as: specinvoke -n
# timer ticks over every 1000 ns
# Use another -n on the command line to see chdir commands and env dump
# Starting run for copy #0
../run_base_ref_gcc43-64bit.0000/milc_base.gcc43-64bit < su3imp.in > su3imp.out 2>> su3imp.err
Inside, it runs a binary. The command differs from benchmark to benchmark (it depends on which benchmark directory you invoke it from). And because "specinvoke" is an installed binary and not just a script, I cannot use "source specinvoke".
So is there any clue? Is there any way to invoke the command directly in the same shell (with the same PID), or should I dump the specinvoke -n output and run that instead?
You can still do something like:
cd (specific benchmark directory)
specinvoke &
pid=$(pgrep milc_base.gcc43-64bit)
If there are several invocations of the milc_base.gcc43-64bit binary, you can still use
pid=$(pgrep -n milc_base.gcc43-64bit)
which, according to the man page:
-n
Select only the newest (most recently started) of the matching
processes
When the process is a direct child of the subshell:
ps -o pid= -C milc_base.gcc43-64bit --ppid $!
When it is not a direct child, you can get the info from pstree:
pstree -p $! | grep -o 'milc_base.gcc43-64bit(.*)'
Output from the above (the PID is in parentheses): milc_base.gcc43-64bit(9837)
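Putting the pgrep route back into the original script, a sketch (the benchmark directory and the short sleep are assumptions):
cd /path/to/benchmark/run/dir   # hypothetical run directory
specinvoke &
sleep 1                         # give specinvoke a moment to fork the real binary
pid=$(pgrep -n milc_base.gcc43-64bit)
echo "benchmark PID: $pid"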

Upstart: each process on different core

I'm trying to use Upstart to launch multiple instances of node.js, each on a separate CPU core and listening on a different port.
Launch configuration:
start on startup
task
env NUM_WORKERS=2
script
    for i in `seq 1 $NUM_WORKERS`
    do
        start worker N=$i
    done
end script
Worker configuration:
instance $N
script
    export HOME="/node"
    echo $$ > /var/run/worker-$N.pid
    exec sudo -u root /usr/local/bin/node /node/server.js >> /var/log/worker-$N.sys.log 2>&1
end script
How do I specify that each process should be launched on a separate core to scale node.js inside the box?
taskset allows you to set the CPU affinity of any Linux process. But the Linux kernel already favors keeping a process on the same CPU to optimize performance.
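For example, the worker job above could pin each instance with taskset; a sketch assuming N is 1-based and the box has at least NUM_WORKERS cores:
instance $N
script
    export HOME="/node"
    echo $$ > /var/run/worker-$N.pid
    # Pin this instance to one core (worker 1 -> core 0, worker 2 -> core 1, ...).
    exec sudo -u root taskset -c $(($N - 1)) /usr/local/bin/node /node/server.js >> /var/log/worker-$N.sys.log 2>&1
end script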

R programming - submitting jobs on a multi-node Linux cluster using PBS

I am running R on a multi-node Linux cluster. I would like to run my analysis in R using scripts or batch mode, without using parallel computing software such as MPI or snow.
I know this can be done by dividing the input data such that each node runs different parts of the data.
My question is how do I go about this exactly? I am not sure how I should code my scripts. An example would be very helpful!
So far I have been running my scripts using PBS, but they only seem to run on one node, as R is a single-threaded program. Hence, I need to figure out how to adjust my code so it distributes the work across all of the nodes.
Here is what I have been doing so far:
1) command line:
$ qsub myjobs.pbs
2) myjobs.pbs:
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -l walltime=00:05:00
#PBS -l arch=x86_64

pbsdsh -v $PBS_O_WORKDIR/myscript.sh
3) myscript.sh:
#!/bin/sh
cd $PBS_O_WORKDIR
R CMD BATCH --no-save my_script.R
4) my_script.R:
library(survival)
...
write.table(test,"TESTER.csv",
    sep=",", row.names=F, quote=F)
Any suggestions will be appreciated! Thank you!
-CC
This is rather a PBS question; I usually make an R script (with the Rscript path after #!) and have it read a parameter (using the commandArgs function) that controls which "part of the job" the current instance should do. Because I use multicore a lot I usually need only 3-4 nodes, so I just submit a few jobs calling this R script with each of the possible control argument values.
On the other hand, your use of pbsdsh should do its job... The value of PBS_TASKNUM can then be used as the control parameter.
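A sketch of how myscript.sh could use PBS_TASKNUM as that control parameter (the switch to Rscript and the argument handling in my_script.R are assumed adaptations):
#!/bin/sh
# pbsdsh starts one copy of this script per allocated slot and gives each
# copy its own PBS_TASKNUM, so we use it to pick this instance's data chunk.
cd $PBS_O_WORKDIR
Rscript my_script.R $PBS_TASKNUM
Inside my_script.R, commandArgs(trailingOnly = TRUE)[1] would then hold the task number as a string.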
This was an answer to a related question - but it's an answer to the comment above (as well).
For most of our work we do run multiple R sessions in parallel using qsub (instead).
If it is for multiple files, I normally do:
while read infile rest
do
qsub -v infile=$infile call_r.pbs
done < list_of_infiles.txt
call_r.pbs:
...
R --vanilla -f analyse_file.R $infile
...
analyse_file.R:
args <- commandArgs()
infile=args[5]
outfile=paste(infile,".out",sep="")
...
Then I combine all the output afterwards...
This problem seems very well suited to GNU parallel, which has an excellent tutorial here. I'm not familiar with pbsdsh, and I'm new to HPC, but to me it looks like pbsdsh serves a similar purpose to GNU parallel. I'm also not familiar with launching R from the command line with arguments, but here is my guess at how your PBS file would look:
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -l walltime=00:05:00
#PBS -l arch=x86_64
...
parallel -j2 --workdir $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
    Rscript myscript.R {} :::: infilelist.txt
where infilelist.txt lists the data files you want to process, e.g.:
inputdata01.dat
inputdata02.dat
...
inputdata12.dat
Your myscript.R would access the command line argument to load and process the specified input file.
My main purpose with this answer is to point out the availability of GNU parallel, which came about after the original question was posted. Hopefully someone else can provide a more tangible example. Also, I am still wobbly with my usage of parallel, for example, I'm unsure of the -j2 option. (See my related question.)
