How do you specify nodes on `mpirun`'s command line? - linux

How do I use mpirun's -machine flag?
To select which cluster node to execute on, I have figured out how to use mpirun's -machinefile option like this:
> mpirun -machinefile $HOME/utils/Host_file -np <integer> <executable-filename>
Host_file contains a list of the nodes, one on each line.
But I want to submit a whole bunch of processes with different arguments and I don't want them running on the same node. That is, I want to do something like
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 1
nano Host_file % change the first node name
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 2
nano Host_file
> mpirun -machinefile $HOME/utils/Host_file -np 1 filename 3
nano Host_file
...
I figured I could use the -machine flag and just type a different node for each execution, but I can't get it to work. For example,
> mpirun -machine node21-ib -np 1 FPU
> mpirun -machine node21 -np 1 FPU
always executes on the master node.
I also tried the -nodes option
> mpirun -nodes node21-ib -np 1 FPU
> mpirun -nodes node21 -np 1 FPU
But that just executes on my current node.
Similarly, I've tried the -nolocal and -exclude options without success.
So I have a simple question: How do I use the -machine option? Or is there a better way to do this (for a Linux newbie)?
I'm using the following version of MPI, which seems to have surprisingly little documentation on the web (so far, all the documentation I have comes from mpirun --help).
> mpichversion
MPICH Version: 1.2.7
MPICH Release date: $Date: 2005/06/22 16:33:49$
MPICH Patches applied: none
MPICH configure: --with-device=ch_gen2 --with-arch=LINUX -prefix=/usr/local/mvapich-gcc --with-romio --without-mpe -lib=-L/usr/lib64 -Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread
MPICH Device: ch_gen2
Thanks for your help.

What you need is to specify a hostfile.
For example, for your mpirun command, try mpirun -np 4 -hostfile hosts ./exec,
where hosts contains your node addresses, generally in the form 192.168.1.201:8, where the number after the colon is the maximum number of cores; list each node on its own line. Ideally you should also install some cluster management software, such as Torque and Maui.
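As a sketch of how that maps onto the original question, you could also keep one tiny machinefile per node, so each submission lands on a different machine (node names are hypothetical, modelled on node21-ib from the question):
echo node21-ib > $HOME/utils/Host_node21
echo node22-ib > $HOME/utils/Host_node22
mpirun -machinefile $HOME/utils/Host_node21 -np 1 filename 1
mpirun -machinefile $HOME/utils/Host_node22 -np 1 filename 2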

Related

How to run slurm in the background?

To use resources allocated by Slurm interactively and in the background, I run salloc -n 12 -t 20:00:00 &. The problem is that this command does not put me on the compute node, so if I run a program it uses the resources of the login node. Could you please help me find the right command?
I also tried
salloc -n 12 -t 20:00:00 a.out </dev/null&
but it fails:
salloc: error: _fork_command: Unable to find command "a.out"
Any help is highly appreciated.
Is a.out in your path? e.g. what does which a.out return?
You only need to execute salloc -n 12 -t 20:00:00 &. Then use ssh to connect to the allocated node (for example, ssh node013).
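A minimal sketch of that workflow (node013 is just the example name from above; squeue shows the node you were actually given):
salloc -n 12 -t 20:00:00 &    # request the allocation in the background
squeue -u $USER               # lists your job and the node(s) it landed on
ssh node013                   # log in to the allocated node and run your program there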

Proper way to use taskset command

I get the number of CPUs available to me with this command:
cat /proc/cpuinfo | grep processor |wc -l
It says I have 4 cores available (actually 2 physical cores, the rest logical).
Then I run my task with python3 mytask.py from the command line. After starting my program, I want to change the cores it is pinned to, e.g. only core0, only core3, or only core0 and core2.
I know I can do this with the os.sched_setaffinity() function, but I want to do it using the taskset command.
I am trying this:
taskset -pc 2 <pid>
Can I run this command after only checking how many CPUs are available?
Or do I have to check which cores are eligible for my task before running the taskset command?
Will the Linux kernel guarantee to accept my new affinity list as long as it stays within my 4 available CPUs (0-3)?
For example, I have 4 CPUs available, but when I tried to change a kworker thread's affinity from core0 to core1, it failed. Then I checked the allowed CPUs for the kworker thread with this command:
cat /proc/6/status |grep "Cpus_allowed_list:"
It says the current affinity list is 0.
Do I need to check "Cpus_allowed_list" before running the taskset command to change the affinity list?
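For reference, a minimal sketch of the sequence being discussed (the PID is hypothetical and would be that of the running python3 mytask.py):
PID=12345                                  # hypothetical PID of your task
grep Cpus_allowed_list /proc/$PID/status   # cores the kernel currently allows for this task
taskset -p $PID                            # current affinity shown as a hex mask
taskset -pc 0,2 $PID                       # pin the task to core0 and core2
Note that some kernel threads (including per-CPU kworkers) are pinned by the kernel itself and refuse affinity changes, which is likely why the kworker example above failed; ordinary user processes accept any subset of their allowed CPUs.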

How could I run Open MPI under Slurm

I am unable to run Open MPI under Slurm through a Slurm-script.
In general, I am able to obtain the hostname and run Open MPI on my machine.
$ mpirun hostname
myHost
$ cd NPB3.3-SER/ && make ua CLASS=B && mpirun -n 1 bin/ua.B.x inputua.data # Works
But if I do the same operations through the Slurm script, mpirun hostname returns an empty string and consequently I am unable to run mpirun -n 1 bin/ua.B.x inputua.data.
slurm-script.sh:
#!/bin/bash
#SBATCH -o slurm.out # STDOUT
#SBATCH -e slurm.err # STDERR
#SBATCH --mail-type=ALL
export LD_LIBRARY_PATH="/usr/lib/openmpi/lib"
mpirun hostname > output.txt # Returns empty
cd NPB3.3-SER/
make ua CLASS=B
mpirun --host myHost -n 1 bin/ua.B.x inputua.data
$ sbatch -N1 slurm-script.sh
Submitted batch job 1
The error I am receiving:
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
A daemon (pid unknown) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
------------------------------------------------------------------
If Slurm and OpenMPI are recent versions, make sure that OpenMPI is compiled with Slurm support (run ompi_info | grep slurm to find out) and just run srun bin/ua.B.x inputua.data in your submission script.
Alternatively, mpirun bin/ua.B.x inputua.data should work too.
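For the Slurm-aware build, a sketch of the whole submission script (reusing the file names from the question) could be:
#!/bin/bash
#SBATCH -o slurm.out # STDOUT
#SBATCH -e slurm.err # STDERR
cd NPB3.3-SER/
make ua CLASS=B
srun -n 1 bin/ua.B.x inputua.data   # srun launches directly inside the Slurm allocation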
If OpenMPI is compiled without Slurm support the following should work:
srun hostname > output.txt
cd NPB3.3-SER/
make ua CLASS=B
mpirun --hostfile output.txt -n 1 bin/ua.B.x inputua.data
Make sure also that by running export LD_LIBRARY_PATH="/usr/lib/openmpi/lib" you do not overwrite other library paths that are necessary. Better would probably be export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/lib/openmpi/lib" (or a more complex version if you want to avoid a leading : when LD_LIBRARY_PATH is initially empty).
What you need is: 1) run mpirun, 2) from slurm, 3) with --host.
To determine what is responsible for this not working (Problem 1), you could test a few things.
Whatever you test, you should test exactly the same thing via the command line (CLI) and via Slurm (S).
It is understood that some of these tests will produce different results in the CLI and S cases.
A few notes are:
1) You are not testing exactly the same things in CLI and S.
2) You say that you are "unable to run mpirun -n 1 bin/ua.B.x inputua.data", while the problem is actually with mpirun --host myHost -n 1 bin/ua.B.x inputua.data.
3) The fact that mpirun hostname > output.txt returns an empty file (Problem 2) does not necessarily have the same origin as your main problem, see paragraph above. You can overcome this problem by using scontrol show hostnames
or with the environment variable SLURM_NODELIST (on which scontrol show hostnames is based), but this will not solve Problem 1.
To work around Problem 2, which is not the most important, try a few things via both CLI and S.
The slurm script below may be helpful.
#!/bin/bash
#SBATCH -o slurm_hostname.out # STDOUT
#SBATCH -e slurm_hostname.err # STDERR
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/lib64/openmpi/lib"
mpirun hostname > hostname_mpirun.txt # 1. Returns values ok for me
hostname > hostname.txt # 2. Returns values ok for me
hostname -s > hostname_slurmcontrol.txt # 3. Returns values ok for me
scontrol show hostnames > hostname_scontrol.txt # 4. Returns values ok for me
echo ${SLURM_NODELIST} > hostname_slurmnodelist.txt # 5. Returns values ok for me
(for an explanation of the export command see this).
From what you say, I understand 2, 3, 4 and 5 work ok for you, and 1 does not.
So you could now use mpirun with suitable options --host or --hostfile.
Note the different format of the output of scontrol show hostnames (e.g., for me cnode17<newline>cnode18) and echo ${SLURM_NODELIST} (cnode[17-18]).
The host names could perhaps also be obtained in file names set dynamically with %h and %n in slurm.conf; look, for example, at SlurmdLogFile and SlurmdPidFile.
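In a job script that conversion can be done on the fly; a sketch (the hostfile name is arbitrary):
# expand the compact SLURM_NODELIST (e.g. cnode[17-18]) into one host name per line
scontrol show hostnames "$SLURM_NODELIST" > myhosts.txt
mpirun --hostfile myhosts.txt -n 1 bin/ua.B.x inputua.data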
To diagnose/work around/solve Problem 1, try mpirun with/without --host, in CLI and S.
From what you say, assuming you used the correct syntax in each case, this is the outcome:
1) mpirun, CLI (original post): "Works".
2) mpirun, S (comment?): same error as item 4 below?
Note that mpirun hostname in S should have produced similar output in your slurm.err.
3) mpirun --host, CLI (comment): error
There are no allocated resources for the application bin/ua.B.x that match the requested mapping:
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
4) mpirun --host, S (original post): error (same as item 3 above?)
There are no allocated resources for the application
bin/ua.B.x
that match the requested mapping:
------------------------------------------------------------------
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
...
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
As per the comments, you may have the wrong LD_LIBRARY_PATH set.
You may also need to use mpirun --prefix ...
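For illustration, assuming the Open MPI installation prefix matches the library path from the question (i.e. /usr/lib/openmpi), the --prefix variant would look roughly like:
# --prefix tells the remote daemons where Open MPI is installed (the path is an assumption)
mpirun --prefix /usr/lib/openmpi --host myHost -n 1 bin/ua.B.x inputua.data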
Related?
https://github.com/easybuilders/easybuild-easyconfigs/issues/204

Linux perf tool run issues

I am using the perf tool to benchmark one of my projects. The issue I am facing is that when I run the perf tool on my own machine, everything works fine.
However, when I try to run perf on the automation servers, to make it part of my check-in process, I get the following error:
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Error:
Permission error - are you root?
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv
fp: Terminated
I tried changing /proc/sys/kernel/perf_event_paranoid to -1 and 0 but still see the same issue.
Has anybody seen this before? Why would I need to run the command as root? I am able to run it on my machine without sudo.
By the way, the command is like this:
perf record -m 32 -F 99 -p xxxx -a -g --call-graph fp
You can't use -a (full system profiling) and sample the kernel as a non-root user: http://man7.org/linux/man-pages/man1/perf-record.1.html
Try running it without the -a option and with the event limited to userspace events via the :u suffix:
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cycles:u
Or use a software event for virtualized platforms without PMU passthrough:
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cpu-clock:u
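For completeness, the two sysctls mentioned in the warning can be checked (and, with root access, relaxed) as below; whether the automation servers allow changing them is a separate question:
cat /proc/sys/kernel/perf_event_paranoid   # -1 is the most permissive setting for unprivileged users
cat /proc/sys/kernel/kptr_restrict         # 0 lets perf resolve kernel addresses
# as root, for example:
# echo 0 > /proc/sys/kernel/kptr_restrict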

Hide this OpenMPI message

Anytime I run an MPI program with "mpirun -n 1 myprogram" I get this message:
Reported: 1 (out of 1) daemons - 1 (out of 1) procs
How do I disable this message? I am using Open MPI 1.6.5
For some reason the value of the orte_report_launch_progress MCA parameter is set to true. This could either be coming from the system-wide Open MPI configuration file or from an environment variable named OMPI_MCA_orte_report_launch_progress. In any case, you may override it by passing --mca orte_report_launch_progress 0 to mpirun:
mpirun --mca orte_report_launch_progress 0 -n 1 myprogram
If the value is coming from the system-wide Open MPI configuration, you may also override it by appending the following to $HOME/.openmpi/mca-params.conf (create the file, and the directory, if they don't exist):
orte_report_launch_progress = 0
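If the setting is instead coming in through the environment, another option is to override the variable mentioned above before launching:
export OMPI_MCA_orte_report_launch_progress=0
mpirun -n 1 myprogram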
