How does OpenMPI set a process's rank?

I would like to "override" my processes' ranks, based on some algorithm, after they have been launched via mpirun.
I believed mpirun simply set environment variables, like OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE, for the launched processes, and that MPI_Init or MPI_Comm_rank then retrieved the values of these variables. However, re-setting these variables doesn't affect the rank.
So, how does MPI_Comm_rank(MPI_COMM_WORLD, &rank) actually figure out the rank?
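For illustration, a minimal C sketch of the behavior described: the rank is established by Open MPI's runtime during MPI_Init (it is handed to the process over the launcher's own channel, PMIx/ORTE in recent versions), so overwriting the environment variable is visible to getenv but does not change what MPI_Comm_rank returns.

/* Minimal sketch: compile with mpicc, launch with mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Overwrite the launcher-provided variable before MPI_Init. */
    setenv("OMPI_COMM_WORLD_RANK", "42", 1);

    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The env var now says 42, but the real rank is unaffected. */
    printf("env says %s, MPI_Comm_rank says %d\n",
           getenv("OMPI_COMM_WORLD_RANK"), rank);

    MPI_Finalize();
    return 0;
}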

Related

linux taskset: Does a thread of a multi-thread process always run on a particular core?

I use taskset to run a multi-threaded process on a Linux host as below:
taskset -c 1,2 ./myprocess
Will a particular thread always run on a particular CPU? For example, will thread 1 always run on core 1, or will it run on core 1 or core 2 at different times?
No, the filter is applied to the whole process, and threads can move between (the restricted list of) cores. If you want threads not to move, then you need to set the affinity of each thread separately (e.g. using pthread_setaffinity_np, as sketched below). Note that you can check the affinity of the threads of a given process with the great hwloc tool (hwloc-ps -t).
Note that some libraries/frameworks have ways to do that more easily. This is the case for OpenMP programs where you can use environment variables like OMP_PLACES to set the affinity of each thread.
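For example, a minimal C sketch of per-thread pinning with pthread_setaffinity_np (cores 1 and 2 are assumed to exist on the machine):

/* Pin each thread to its own core so threads no longer migrate
 * within the taskset mask. Compile with: gcc -pthread pin.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long core = (long)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);                 /* allow exactly one core */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    printf("pinned to core %ld, now on core %d\n", core, sched_getcpu());
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1);   /* pin to core 1 */
    pthread_create(&t2, NULL, worker, (void *)2);   /* pin to core 2 */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}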

How to bind a process to a set of CPUs in golang?

I use the os/exec package to run a process. I want to check its CPU affinity and modify it to bind the process to a specific CPU set. I found
func SchedSetaffinity(pid int, set *CPUSet) error
This function is in the golang.org/x/sys/unix package. However, it says it just binds a thread to a specific CPU. I don't know whether it works on a process. And I wonder how to get the CPUSet. Is it a value I need to define?
Taskset: to make a process run on a specific CPU, you can use the taskset command on Linux. You can build your logic around "taskset -p [mask] [pid]", where the mask represents the cores on which the particular process shall run, provided the whole program runs with GOMAXPROCS=1.
pthread_setaffinity_np: you can use cgo and call pthread_setaffinity_np, as Go uses pthreads in cgo mode. (The related pthread_attr_setaffinity_np() function sets the CPU affinity mask attribute of the thread attributes object referred to by attr to the value specified in cpuset.)
SchedSetaffinity: Go incorporates affinity control via SchedSetaffinity, which you can use to confine a thread to specific cores. SchedSetaffinity(pid int, set *CPUSet) sets the CPU affinity mask of the thread specified by pid; if pid is 0, the calling thread is used. (See the C sketch after this answer for the underlying syscall.)
It should be noted that the GOMAXPROCS variable limits the number of operating-system threads that can execute user-level Go code simultaneously. If it is greater than 1, you may use Go's runtime.LockOSThread, which pins the calling goroutine to the thread it is currently running on. The calling goroutine will always execute in that thread, and no other goroutine will execute in it, until the calling goroutine has made as many calls to UnlockOSThread as to LockOSThread.
cgroups: there is also the option of using cgroups, which organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner. The cpuset subsystem assigns individual CPUs (on a multicore system) and memory nodes to the processes in a cgroup. The cpuset.cpus file lists the CPUs to be used by tasks within the cgroup, as comma-separated numbers or ranges. For example:
# cat cpuset.cpus
0-4,6,8-10
A process is confined to run only on the CPUs in the cpuset it belongs to, and to allocate memory only on the memory nodes in that cpuset. It should be noted that every process is put in the cgroup its parent process belongs to at the time of creation, and a process can be migrated to another cgroup. Migrating a process doesn't affect its already existing descendant processes.
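For reference, unix.SchedSetaffinity is a thin wrapper around the sched_setaffinity(2) syscall, and the CPUSet is indeed a mask the caller builds (in Go via set.Zero() and set.Set(cpu)). A C sketch of the underlying call:

/* Restrict the calling thread to CPUs 1 and 2; passing a real PID/TID
 * instead of 0 retargets another task. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);            /* start from an empty mask */
    CPU_SET(1, &set);          /* allow CPU 1 */
    CPU_SET(2, &set);          /* allow CPU 2 */

    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("restricted to CPUs 1-2, currently on CPU %d\n", sched_getcpu());
    return 0;
}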

How can I determine the number of threads Matlab is using?

When I run simply "matlab", maxNumCompThreads returns 4.
When I run "matlab -singleCompThread", maxNumCompThreads returns 1.
However in both instances, ps uH p <PID> | wc -l (which I picked up from another question on SO to determine the number of threads a process is using) returns 35.
What gives? Can somebody explain to me what the 35 represents, and whether or not I can trust maxNumCompThreads as indicating that Matlab is only using one thread?
The number of threads used by MATLAB for computation (maxNumCompThreads) is different from the number of threads MATLAB.exe uses to manage its internal functions: the interpreter, memory manager, command line, who knows what else. If you were writing MATLAB, imagine the number of threads required to manage the various ongoing, independent tasks. Perhaps have a look at the Octave or FreeMat code to get an idea.
Many of the threads you see are used by the JVM that MATLAB launches. You could try the flag "-nojvm" to cut things down further. Obviously, without the JVM, functionality is very limited. "-singleCompThread" limits only the threads used for numeric computation: MATLAB's intrinsic multithreading, as well as threads used by external libraries such as MKL and FFTW.
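To see where a count like 35 comes from: every thread of a process appears as a directory under /proc/<pid>/task, and the ps invocation above also counts its header line. A small C sketch (not MATLAB-specific) that counts threads directly:

/* Count the entries in /proc/<pid>/task; defaults to the current process. */
#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/task", argc > 1 ? argv[1] : "self");

    DIR *dir = opendir(path);
    if (!dir) { perror("opendir"); return 1; }

    int count = 0;
    struct dirent *e;
    while ((e = readdir(dir)) != NULL)
        if (e->d_name[0] != '.')         /* skip "." and ".." */
            count++;
    closedir(dir);

    printf("%s: %d threads\n", path, count);
    return 0;
}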

Are process ids non-negative in Linux?

I am implementing a syscall that is called from user space, let's say by a process foo.
The syscall accesses foo's task_struct (through the global pointer current), prints its name and PID, then goes on to foo's parent process, foo's parent's parent, and so on, printing all their names and PIDs up to and including the init process's.
pid=1 is reserved for init, and pid=0 is reserved for the swapper.
According to the swapper's task_struct, its parent process is itself.
Does the swapper (or sched) always have pid=0, and is it always init's parent process?
Are all PIDs non-negative? Is it OK for me to make that assumption?
To answer your questions more concisely:
IBM's "Inside the Linux boot process" describes the swapper process as the process having a PID value of 0. At startup this process runs /sbin/init (or another program passed as a parameter by the bootloader), which will typically be the process with PID 1.
In Unix systems, PID values are allocated sequentially, starting from the first process and up to a maximum value specified by /proc/sys/kernel/pid_max. Therefore you can safely work under the assumption that all valid PIDs are positive, while negative values boil down to error codes and the like.
A good idea would also be to account for zombie processes, since they can receive "special treatment" in the process tree when/if they are adopted by init.
It is always positive or 0. The kernel sources define it to be of type pid_t which, as far as I know, is conceptually unsigned (although it is defined as a signed type so that calls such as fork can return negative numbers in case of errors).
Yes, they are actually always positive.
You can verify this with several POSIX system calls, such as wait, which use negative values to represent things like "any of your own child processes"; only positive values represent valid PIDs.
Example: http://linux.die.net/man/2/wait
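To make the sign conventions concrete, a short C sketch: fork() returns a negative value on error, 0 in the child, and the child's (positive) PID in the parent, while waitpid() reuses negative arguments to mean things like "any child".

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {                       /* negative: the call failed */
        perror("fork");
        return 1;
    }
    if (pid == 0) {                      /* zero: we are the child */
        printf("child: my pid is %d (always > 0)\n", getpid());
        return 0;
    }
    printf("parent: child pid is %d\n", pid);   /* positive: a real PID */
    waitpid(-1, NULL, 0);  /* -1 means "any child" here, not a real PID */
    return 0;
}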

Is it possible to completely manage the life cycle of a process and its forks?

Consider a system that manages user-defined programs:
A program can be anything. Its command line is defined by non-privileged users in some configuration file. It could be /bin/ls, it could be /usr/sbin/apache; the user may specify whatever he is permitted to start.
Each program is run as a non-root user.
Any given user can configure any number of programs.
Each program runs for as long as it wants.
Each program may call fork(), exec() etc.
Each program may set itself as a session leader (i.e., setsid()).
The system that starts the programs might not run continuously. It starts a program, then quits.
The action "stop all of program P's processes, including children/forks" must be possible.
The action "find all processes belonging to program P" must be possible.
Here's the question: How can one provide such a system within the Linux process model?
The naive method:
Start the program with fork(), exec(), setuid(), etc.
Write the child PID (plus its start timestamp, from /proc/<pid>/stat, to uniquely and permanently identify it) to a file.
To stop a single process, send SIGTERM to its PID.
To find all processes, inspect /proc to build the process hierarchy based on the PID.
This method has a big hole: Any process may fork and break out of its process group. It's not sufficient to look at the process hierarchy. After a program has created new processes, it's not possible to trace their origin back to the original program.
A workaround would be to ensure that each program is started with a unique UID. This is not desirable or particularly workable, since a (human) user may define any number of programs; the system would then have to programmatically create new, unique users for each program.
My only idea so far is to inject a special, reserved environment variable into the program's initial process, i.e., run the program with env PROGRAM=myprogram <command line>. The system could then mandate that all processes must inherit their parent's environment. At regular intervals, the system could trawl /proc and forcibly kill any process missing the PROGRAM environment variable (a sketch of this trawl follows below).
Are there any secrets in the Linux syscall API that I could use?
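For illustration, a hedged C sketch of the /proc trawl proposed above; PROGRAM=myprogram is the hypothetical marker variable from the question, and killing the non-matching processes is left out:

/* Scan every numeric entry in /proc and look for the injected PROGRAM=
 * variable in its environ file (entries are NUL-separated). */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *proc = opendir("/proc");
    if (!proc) { perror("opendir"); return 1; }

    struct dirent *e;
    while ((e = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                    /* not a PID directory */

        char path[64];
        snprintf(path, sizeof(path), "/proc/%s/environ", e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                    /* process gone, or not ours */

        char env[4096];                  /* first 4 KiB is enough for a sketch */
        size_t n = fread(env, 1, sizeof(env) - 1, f);
        fclose(f);
        env[n] = '\0';

        /* Walk the NUL-separated KEY=value entries. */
        for (size_t i = 0; i < n; i += strlen(env + i) + 1)
            if (strncmp(env + i, "PROGRAM=", 8) == 0)
                printf("pid %s belongs to %s\n", e->d_name, env + i + 8);
    }
    closedir(proc);
    return 0;
}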
(1) The action "stop all of program P's processes, including children/forks" must be possible. (2) The action "find all processes belonging to program P" must be possible.
cgroups implement this, and systemd is perhaps the heaviest user to date, making use of (2) to achieve (1). You can break out of process groups, but not cgroups.
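A minimal sketch of that approach, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, root privileges, and a kernel recent enough (5.14+) to have cgroup.kill; the cgroup name programP is hypothetical:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *cg = "/sys/fs/cgroup/programP";   /* one cgroup per program */
    mkdir(cg, 0755);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: enroll ourselves before exec; all forks stay in the cgroup. */
        char path[128];
        snprintf(path, sizeof(path), "%s/cgroup.procs", cg);
        FILE *f = fopen(path, "w");
        if (f) { fprintf(f, "%d\n", (int)getpid()); fclose(f); }
        execl("/bin/ls", "ls", (char *)NULL);     /* the user-defined command */
        _exit(127);
    }

    /* "Find all of P's processes": read programP/cgroup.procs.
     * "Stop all of P's processes": write 1 to programP/cgroup.kill,
     * or signal every PID listed in cgroup.procs on older kernels. */
    return 0;
}

Unlike the PID-file or environment-variable schemes, membership here is enforced by the kernel, which is why systemd relies on it.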
