Running one Open MPI program from another Open MPI program with different number of processes - openmpi

Is it possible to run an Open MPI program with 4 processes from another Open MPI program that was started with 2 processes? For example, I have program1.exe and program2.exe. My goal is to run program1.exe with 2 processes, of which 1 process does some work while the other runs "program2.exe" with 4 processes. To run program1.exe I use "mpirun -np 2 program1.exe". From within this program I want to run program2.exe as "mpirun -np 4 program2.exe". Is this possible?
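For what it's worth, the MPI standard's mechanism for launching one MPI job from another is MPI_Comm_spawn; simply shelling out to a nested mpirun may or may not work, depending on the implementation and how it propagates its environment. As a rough sketch (in Python, with the questioner's program names reused purely as placeholders), the shell-out approach would look like:

```python
import subprocess

def launch_child_job(cmd):
    """Run a child command to completion, capturing its output.

    Called from one rank of program1, cmd would be something like
    ["mpirun", "-np", "4", "program2.exe"] -- but whether a nested
    mpirun works at all depends on the MPI implementation, which may
    propagate environment variables that confuse the inner launch.
    """
    return subprocess.run(cmd, capture_output=True, text=True)
```

The safer, portable route is MPI_Comm_spawn from within program1, which asks the already-running MPI runtime to start the 4 extra processes for you.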

Related

Can threads of a process be made to run on different CPUs?

I would like to know if the threads of a process can be made to run on different sets of CPUs in Linux.
For instance, let's say we start a process with 30 threads; the first 15 threads from this process are made to run on cores 0-14 using the taskset program, and the rest of the threads on cores 15-29.
Is the above configuration possible?

Python 3.6: Will running the same multi-threaded Python script in separate CMD sessions get around Python's GIL issue?

I have a Python 3.6 script which is multi-threaded and runs on a Windows 10 machine with 12 cores.
I want to run multiple instances of the script, but I am worried about Python's GIL issue in terms of the performance of each instance of the script.
Would doing any of the below work around this? I am under the impression that when I run an instance of the script, the Python process it runs within runs on just one CPU core, and each thread started by the script "time slices" on that one core...
Therefore would:
A: Starting each instance of the script in its own CMD window allow the OS to automatically start each script's parent Python process on its own core and prevent any locking from taking place...
or
B: Starting each CMD session via a shortcut that sets its affinity to a specific core and then running the Python script, so that the Python process and its threads run on the specific core the CMD process has been set to use...
or
C: My understanding of how threading and the Python GIL work is not correct and I need to understand...
Any help will be much appreciated.
I want to run multiple instances of the script
Then go ahead and do it!
I am worried about Python's GIL issue in terms of the performance of each instance of the script.
The "GIL issue" is only an issue within any single multi-threaded Python process: No more than one thread belonging to the same Python process will ever execute Python code at the same instant. But, each Python process has its own GIL, and nothing prevents a thread in one Python process from running at the same time as a thread in some other Python process.
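A minimal illustration of that point, using Python's multiprocessing module: each worker below is a separate process with its own interpreter and its own GIL, so the CPU-bound loops can genuinely run in parallel across cores (the function names here are made up for the example):

```python
import multiprocessing as mp

def burn(n):
    # CPU-bound work; a thread doing this would hold its process's GIL,
    # but each Pool worker is a separate process with its own GIL.
    total = 0
    for i in range(n):
        total += i
    return total

def run_parallel(counts):
    # One task per entry; the worker processes run concurrently.
    with mp.Pool() as pool:
        return pool.map(burn, counts)
```

This is exactly what launching the script in several CMD windows gives you, just wrapped in one parent process.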
would (A) Starting each instance of the script in its own CMD window [...run each] Python process on its own core?
No, because it is not normal for any process on an SMP platform (i.e., most multi-processor systems) to have "its own core." Cores are a resource that the operating system uses to run threads/processes, and applications normally have no reason to know or care which core is running which thread/process at any given moment in time.
(B) Starting each CMD session via a shortcut that sets its affinity to a specific core.
Some operating systems let you do that, but usually there is no benefit unless your application has very special needs. Normally, you should just trust the operating system scheduler to schedule processes and threads as efficiently as possible.
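If you do have one of those special needs, Python exposes the affinity calls directly on Linux; a small sketch (the helper name pin_to_cores is invented for this example):

```python
import os

def pin_to_cores(pid, cores):
    """Pin a process to a set of CPU cores, like `taskset -cp` (Linux only).

    Returns the resulting affinity mask, or None where the OS does not
    expose these calls (e.g. Windows or macOS), in which case you would
    fall back to OS-specific tools. pid 0 means "the calling process".
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)
```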

What process number should I put in my supervisor config file?

So how many processes should I run at the same time for a project in my supervisor config file under numprocs?
And what are the advantages of having multiple ones running at the same time? Wouldn't it be faster if there was just 1 process?
numprocs controls how many processes supervisord will run at the same time. If you just want to run a simple program, you'd leave this unset; the default is 1.
This setting would be useful if you have a server process which needs many copies running as the targets of a load balancer, for example. Or if you have a program which can run one instance per CPU core to do some work in parallel. But most programs wouldn't fit this description.
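As a hypothetical example (the program name and command are invented), a supervisord section running four copies of a worker would look something like this; note that when numprocs is greater than 1, process_name must include %(process_num)s so each copy gets a unique name:

```ini
[program:worker]
command=/usr/bin/python3 /srv/app/worker.py
process_name=%(program_name)s_%(process_num)02d
numprocs=4
autostart=true
```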

Hybrid MPI / OpenMP

I've been trying to use Open MPI with OpenMP, and when I try to run 2 MPI processes with 4 threads each on one machine, all the threads are executed on the same core at 25% usage each instead of on 4 separate cores. I was able to fix this by building Open MPI with --enable-mpi-threads; but now I am having an issue because this is a dual-CPU machine.
There are 8 cores per processor and 2 processors in each server. If I run 2 MPI processes with 8 threads each then everything is fine, as long as the 2 processes start on separate processors; but if I try to run 1 MPI process with 16 threads, it reverts to stacking every thread on one core.
Has anyone had any experience running OpenMPI and OpenMP together?

How to run processes piped with bash on multiple cores?

I have a simple bash script that pipes the output of one process to another. Namely:
dostuff | filterstuff
It happens that on my Linux system (openSUSE, if it matters; kernel 2.6.27) both of these processes run on a single core. Running different processes on different cores is supposed to be the default policy, but it doesn't happen to trigger in this case.
What component of the system is responsible for that and what should I do to utilize multicore feature?
Note that there's no such problem on 2.6.30 kernel.
Clarification: Having followed Dennis Williamson's advice, I verified with the top program that the piped processes are indeed always run on the same processor. The Linux scheduler, which usually does a really good job, isn't doing so this time.
I figure that something in bash prevents the OS from doing it. The thing is that I need a portable solution that works on both multi-core and single-core machines. The taskset solution proposed by Dennis Williamson won't work on single-core machines. Currently I'm using:
dostuff | taskset -c 0 filterstuff
but this seems like a dirty hack. Could anyone provide a better solution?
Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.
If your kernel is built with CONFIG_SCHED_DEBUG=y,
# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)
If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.
Give this a try to set the CPU (processor) affinity:
taskset -c 0 dostuff | taskset -c 1 filterstuff
Edit:
Try this experiment:
create a file called proctest and chmod +x proctest with this as the contents:
#!/bin/bash
while true
do
ps
sleep 2
done
start this running:
./proctest | grep bash
in another terminal, start top - make sure it's sorting by %CPU
let it settle for several seconds, then quit
issue the command ps u
start top -p with a list of the PIDs of the highest several processes, say 8 of them, from the list left on-screen by the exited top plus the ones for proctest and grep which were listed by ps - all separated by commas, like so (the order doesn't matter):
top -p 1234,1255,1211,1212,1270,1275,1261,1250,16521,16522
add the processor field - press f then j then Space
set the sort to PID - press Shift+F then a then Space
optional: press Shift+H to turn on thread view
optional: press d and type .09 and press Enter to set a short delay time
now watch as processes move from processor to processor, you should see proctest and grep bounce around, sometimes on the same processor, sometimes on different ones
The Linux scheduler is designed to give maximum throughput, not do what you imagine is best. If you're running processes which are connected with a pipe, in all likelihood, one of them is blocking the other, then they swap over. Running them on separate cores would achieve little or nothing, so it doesn't.
If you have two tasks which are both genuinely ready to run on the CPU, I'd expect to see them scheduled on different cores (at some point).
My guess is that dostuff runs until the pipe buffer fills, at which point it can't run any more, so the "filterstuff" process runs instead; but it runs for such a short time that dostuff doesn't get rescheduled until filterstuff has finished filtering the entire pipe buffer, at which point dostuff gets scheduled again.
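You can observe the buffer that guess hinges on directly. The sketch below (assuming a POSIX system; the function name is made up) fills a pipe in non-blocking mode until the write would block, which measures how much data the writer can queue up before it is forced to stop and let the reader run:

```python
import errno
import fcntl
import os

def pipe_capacity():
    """Measure how many bytes a pipe buffers before the writer would block."""
    r, w = os.pipe()
    # Switch the write end to non-blocking so a full pipe raises EAGAIN
    # instead of putting us to sleep.
    flags = fcntl.fcntl(w, fcntl.F_GETFL)
    fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    total = 0
    chunk = b"x" * 4096
    try:
        while True:
            total += os.write(w, chunk)
    except OSError as e:
        if e.errno != errno.EAGAIN:
            raise
    os.close(r)
    os.close(w)
    return total
```

On a typical Linux kernel this reports 64 KiB, which is how far dostuff can get ahead of filterstuff before the alternation described above begins.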