Open MPI, determine rank of process to send to

I have two different executables, each with a specific role. One of the two processes sends information to the other by calling MPI_Isend. But how do I know the rank of the other process?
I found that when I launch my stack as follows, exe1, the receiving process, always seems to have rank 0 and exe2 always seems to have rank 1. Therefore, if I send to rank 0 from exe2, the message is received. But am I missing anything here? It seems so complicated.
mpirun -np 1 exe1 : -np 1 exe2

The mapping of processes to ranks in Open MPI can be controlled with various command-line arguments to mpiexec, with newer versions (like 1.7.x) supporting much finer control than older ones. By default, ranks follow the order in which processes are placed in the slots provided. Therefore -np 1 exe1 : -np 1 exe2 will always result in exe1 being rank 0 and exe2 being rank 1 in MPI_COMM_WORLD. If you use -np 3 exe1 : -np 2 exe2 instead, you will get the following:
rank  executable
----  ----------
0     exe1
1     exe1
2     exe1
3     exe2
4     exe2
It is also possible to start exe1 and exe2 as separate MPI jobs and make them connect to each other over an intercommunicator, but that is considered an advanced MPI topic.
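As a rough illustration of the default ordering described above, the following Python sketch models how ranks are handed out when each -np N exe block is taken in command-line order. It is only a model, not part of Open MPI: the function names default_ranks and ranks_of are made up for this example, and any mapping or ranking option passed to mpiexec can change the real assignment.

```python
def default_ranks(app_contexts):
    """Model the default rank order for an MPMD launch.

    app_contexts is a list of (executable, process_count) pairs in the
    order they appear on the command line, e.g. [("exe1", 3), ("exe2", 2)]
    for "-np 3 exe1 : -np 2 exe2". Assumes no mapping options are in use.
    """
    table = []
    for executable, count in app_contexts:
        for _ in range(count):
            # The next free rank is simply the number of processes placed so far.
            table.append((len(table), executable))
    return table


def ranks_of(table, executable):
    """Return all ranks that run the given executable."""
    return [rank for rank, name in table if name == executable]


if __name__ == "__main__":
    table = default_ranks([("exe1", 3), ("exe2", 2)])
    for rank, name in table:
        print(rank, name)
    print("exe2 runs on ranks", ranks_of(table, "exe2"))
```

With a table like this, a sender in exe2 can look up which ranks belong to exe1 instead of hard-coding rank 0.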

Another solution would be to have the receiving process, exe1, send a message containing its rank first. If the second process posts a receive from any source (MPI_ANY_SOURCE) with the tag of that message, it will learn the rank of the first process.

Related

Read from standard input with all MPI processes

So far I've been using OPEN(fid, FILE='IN', ...) and it seems that all MPI processes read the same file IN without interfering with each other.
Furthermore, to allow the input file to be chosen among several, I simply made the IN file a symbolic link pointing to the desired input. This means that when I want to change the input file I have to run ln -sf desired-input IN before running the program (mpirun -n $np ./program).
I'd really like to be able to run the program as mpirun -n $np ./program < input-file. To do so I removed the OPEN statement and the corresponding CLOSE statement, and changed all READ(fid,*) statements to READ(INPUT_UNIT,*) (I'm using the ISO_FORTRAN_ENV module).
But after all these edits, I realized that only one process (always rank 0, I noticed) reads from it, since all the others reach EOF immediately. Here is an MWE, using Open MPI 2.0.1.
! cat main.f90
program main
  use, intrinsic :: iso_fortran_env
  use mpi
  implicit none
  integer :: myid, x, ierr, stat
  x = 12
  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myid, ierr)
  read(input_unit,*, iostat=stat) x
  if (is_iostat_end(stat)) write(output_unit,*) myid, "I'm out"
  if (.not. is_iostat_end(stat)) write(output_unit,*) myid, "I'm in", myid, x
  call mpi_finalize(ierr)
end program main
It can be compiled with mpifort -o main main.f90 and run with mpirun -np 4 ./main, which results in this output:
1 I'm out
2 I'm out
3 I'm out
17 this is my input from keyboard
0 I'm in 0 17
I know that MPI has proper routines to perform parallel I/O, but I've found nothing about reading from standard input.
You are seeing the expected behaviour with Open MPI. By default, mpirun directs UNIX standard input to /dev/null on all processes except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process inherits standard input from mpirun.
The --stdin option can be used to direct standard input to another process, but not to all of them.
Note also that the behaviour of standard input redirection isn't consistent across MPI implementations (it isn't specified by the MPI standard). For example, Intel MPI's mpirun has the -s option: mpirun -np 4 -s all ./main does give all processes access to mpirun's standard input. There's also no guarantee that processes without that redirection will fail, rather than wait, when they try to read.

Cannot allocate exclusively a CPU for my process

I have a simple single-threaded application that does almost pure processing:
It uses two int buffers of the same size
It reads, one by one, all the values of the first buffer
Each value is a random index into the second buffer
It reads the value at that index in the second buffer
It sums all the values taken from the second buffer
It repeats all the previous steps for bigger and bigger buffers
At the end, I print the number of voluntary and involuntary CPU context switches
If the size of the buffers becomes quite big, my PC starts to slow down: why? I have 4 cores with hyper-threading, so 3 cores remain free. Only one is 100% busy. Is it because my process uses almost 100% of the "RAM bus"?
Then, I created a CPU-set that I want to dedicate to my process (my CPU-set contains both CPU-threads of the same core)
$ cat /sys/devices/system/cpu/cpu3/topology/core_id
3
$ cat /sys/devices/system/cpu/cpu7/topology/core_id
3
$ cset set -c 3,7 -s my_cpuset
$ cset set -l
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y   934    1 /
    my_cpuset        3,7 n       0 n     0    0 /my_cpuset
It seems that absolutely no task at all is running on my CPU-set. I can relaunch my process and while it is running, I launch:
$ taskset -c 7 ./TestCpuset # Here, I launch my process
...
$ ps -mo pid,tid,fname,user,psr -p 25244 # 25244 being the PID of my process
PID TID COMMAND USER PSR
25244 - TestCpus phil -
- 25244 - phil 7
PSR = 7: my process is indeed running on the expected CPU-thread. I hoped it would be the only one running there, but at the end my process displays:
Number of voluntary context switch: 2
Number of involuntary context switch: 1231
If I get involuntary context switches, it means that other processes are running on my core. How is that possible? What must I do in order to get Number of involuntary context switch = 0?
Last question: When my process is running, if I launch
$ cset set -l
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y  1031    1 /
    my_cpuset        3,7 n       0 n     0    0 /my_cpuset
Once again I get 0 tasks on my CPU-set. But I know that there is a process running on it: it seems that a task is not a process?
If the size of the buffers becomes quite big, my PC starts to slow down: why? I have 4 cores with hyper-threading, so 3 cores remain free. Only one is 100% busy. Is it because my process uses almost 100% of the "RAM bus"?
You reached the hardware performance limit of a single-threaded application, that is, 100% CPU time on the single CPU your program is allocated to. Your application thread will not run on more than one CPU at a time.
What must I do in order to get Number of involuntary context switch = 0?
Aren't you missing the --cpu_exclusive option in the cset set command?
By the way, if you want to achieve a lower execution time, I suggest you make the application multithreaded and let the operating system and the hardware beneath it parallelize execution instead. Locking a process to a CPU set and preventing it from context-switching might degrade operating system performance and is not a portable solution.
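To watch the counters discussed in this thread from inside a program, here is a minimal Python sketch, assuming Linux. It pins the current process to one CPU it is already allowed to use and reads the voluntary and involuntary context-switch counts before and after a busy loop. The helper names (pin_to_one_cpu, context_switches, busy_sum) are invented for this example. Pinning alone only restricts where this process runs; it does not keep other tasks off that CPU, so a nonzero involuntary count is still to be expected.

```python
import os
import resource


def context_switches():
    """Return (voluntary, involuntary) context-switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def pin_to_one_cpu():
    """Pin this process to a single CPU from its current affinity mask (Linux only)."""
    cpu = max(os.sched_getaffinity(0))  # arbitrary pick, for illustration only
    os.sched_setaffinity(0, {cpu})
    return cpu


def busy_sum(n):
    """CPU-bound loop standing in for the buffer-summing workload."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    cpu = pin_to_one_cpu()
    before = context_switches()
    busy_sum(5_000_000)
    after = context_switches()
    print("pinned to CPU", cpu)
    print("voluntary context switches:", after[0] - before[0])
    print("involuntary context switches:", after[1] - before[1])
```

Comparing the involuntary count with and without an exclusive CPU set is one way to check whether the isolation actually took effect.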

How can I gather the number of threads per process and find the max in a bash terminal?

I want to gather how many threads each process uses (from their PID/status, I guess), then compare them and output the biggest number. For example, I want to gather the thread counts of all of Chromium's processes, compare the numbers, and output the max. Any ideas?
E.g.
2131 Threads : 20 , 2341 Threads : 10 , 2200 Threads : 5
Max Threads = 20
On Ubuntu you can get the number of threads of a process using this command:
ps -o nlwp `pgrep process_name`
nlwp stands for number of lightweight processes (threads).
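The comparison step the question asks for could look like the following Python sketch. It assumes Linux, where /proc/<pid>/status contains a "Threads:" line, and it expects the PIDs to be supplied by the caller (for example from pgrep). The function names are invented for this example.

```python
import os
import re


def thread_count(status_text):
    """Extract the "Threads:" value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_thread_counts(pids):
    """Map each readable PID to its thread count, skipping processes that are gone."""
    counts = {}
    for pid in pids:
        try:
            with open(f"/proc/{pid}/status") as handle:
                count = thread_count(handle.read())
        except OSError:
            continue  # process exited or is not accessible
        if count is not None:
            counts[pid] = count
    return counts


def format_report(counts):
    """Format the counts in the style shown in the question, plus the maximum."""
    per_process = " , ".join(f"{pid} Threads : {n}" for pid, n in counts.items())
    return f"{per_process}\nMax Threads = {max(counts.values())}"


if __name__ == "__main__":
    # The current process is used here only so the sketch runs anywhere on Linux.
    print(format_report(read_thread_counts([os.getpid()])))
```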

OpenMPI mpirun universe size

I may be misunderstanding this, but here is what I want to achieve with Open MPI, starting simply with mpirun:
I want to create a single process using the -np parameter, which sets the world size to 1
I then want to set the universe size to some arbitrary number (say 10). How do I do this?
The following two calls:
MPI_Comm_size(MPI_COMM_WORLD, &world_size);
/* universe_size is an int *; MPI_Comm_get_attr replaces the deprecated MPI_Attr_get */
MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);
yield the output of world_size as 1 and universe_size as 1.
OK, so I found two ways of doing this:
Implicit: mpirun -np 1 -H localhost,localhost,...,localhost executable
Explicit: just assign a value to universe_size in the application itself; it will work fine.
Thank you to anyone who looked at this.

Cygwin top command - See processes for all users

Does anybody know how to see the processes for all users using the top command in Cygwin (part of the procps library under System)?
I know this can be done in *nix, but I am struggling in Cygwin. I have tried using pslist, but it does not behave in a PuTTY SSH console.
I need a solution where I can see a top-like display over SSH. I do not have any NTLM SSO access to the Win2k3 guest at all, so SSH is the only way in.
top only displays Cygwin processes. ps -W will list Windows processes as well.
Many times the command "tasklist" gets the job done more effectively. It is built into Windows; just make sure your System32 folder is part of your bash profile PATH. There is also procps itself. You should also try using mintty for your terminal. You could always try running any of these task apps inside screen, and/or using watch to poll the information.
It seems you can do something like:
wmic process get ProcessId,Name,UserModeTime,KernelModeTime /EVERY:1
The user and kernel mode times there seem to be expressed in units of 1/10,000,000 of a second (100 ns).
You should be able to post-process that output to get the CPU usage per second.
Here using Cygwin's perl:
wmic process get ProcessId,Name,UserModeTime,KernelModeTime /EVERY:1 |
  perl -lne '
    if (/\S/) {
      # columns arrive as KernelModeTime, Name, ProcessId, UserModeTime
      my ($k,$c,$p,$u) = split /\s{2,}/;
      $n{"$p\t$c"}=$k+$u;
    } else {
      # blank line: end of a report, so print the deltas since the previous one
      my %c;
      for my $k (keys %n) {
        $c{$k} = $n{$k} - $o{$k} if defined $o{$k}
      }
      print "$_\t" . $c{$_}/1e5 for (sort {$c{$b}<=>$c{$a}} keys %c)[0..20];
      %o = %n; %n = (); print ""
    }'
Outputs something like:
0 System Idle Process 588.12377
2196 sh.exe 107.00075
248 svchost.exe 85.80055
7140 explorer.exe 26.52017
[...]
every second.
Note that if the System Idle Process shows just under 800% on an idle system, that's because your system has 8 CPU cores (well, at least 8 hardware threads), as the figure sums the CPU time of all CPUs.
Also note that the EVERY:1 above is a lie. wmic doesn't seem to produce that output every second. More likely, it sleeps roughly 1 second between reports and doesn't compensate for the time it takes to compute each report. So in practice it reports every second and a bit, which means those percentages are not very accurate and are slightly overestimated.
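The arithmetic behind the perl one-liner can be written out as a small Python sketch. It assumes two consecutive samples have already been parsed into dictionaries keyed by (pid, name), holding kernel plus user time in 100 ns units. The function name cpu_percent and the interval_seconds parameter are invented for this example. Passing the measured time between samples instead of a flat 1 second is one way to reduce the overestimate mentioned above.

```python
TICKS_PER_SECOND = 10_000_000  # the times are reported in 100 ns units


def cpu_percent(previous, current, interval_seconds=1.0):
    """Return [((pid, name), percent), ...] sorted by CPU usage, highest first.

    previous and current map (pid, name) to kernel + user time in 100 ns
    ticks. Processes missing from the previous sample are skipped because
    no delta can be computed for them.
    """
    usage = {}
    for key, ticks in current.items():
        if key in previous:
            delta = ticks - previous[key]
            usage[key] = delta / (TICKS_PER_SECOND * interval_seconds) * 100
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    before = {(0, "System Idle Process"): 0, (2196, "sh.exe"): 5_000_000}
    after = {(0, "System Idle Process"): 58_812_377, (2196, "sh.exe"): 15_700_075}
    for (pid, name), percent in cpu_percent(before, after):
        print(pid, name, percent)
```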
