mpiexec: why is my python program running on only one CPU? - python-3.x

I am trying to parallelize a python program (program_to_parallelize.py) into 16 subprocesses on my 16-core machine. I use this code, which is part of a Python script:
import subprocess
subprocess.call("mpiexec -n 16 python program_to_parallelize.py", shell=True)
This runs without any error, but when I look at CPU usage with "top", I see that all the subprocesses are running on one single CPU. I would prefer that the 16 processes each take 100% of one CPU rather than all sharing the first one.
I am working on a 16-core Ubuntu 16.04.6 LTS machine.
I use version 3.0.3 of mpi4py.
I use version 3.3.2 of mpiexec.

I figured it out, actually. One solution is to bind each process to a CPU after starting the execution. To do this, you can use this command:
taskset -pc [CPU number] [process ID]
For example:
taskset -pc 2 3039
You can find more details about how to assign a process to a CPU on this website: https://www.hecticgeek.com/2012/03/assign-process-cpu-ubuntu-linux/
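If you don't want to bind each of the 16 processes by hand, a small loop can do it. This is only a sketch, and it assumes the 16 ranks are the only running processes whose name is exactly "python":
# bind rank i to CPU i; assumes the mpiexec workers are the only
# processes named "python" on the machine
i=0
for pid in $(pgrep -x python); do
    taskset -pc $i $pid
    i=$((i+1))
done
Depending on your MPI implementation, mpiexec may also accept a binding option (e.g. -bind-to core in MPICH's Hydra launcher), which avoids the manual step entirely.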

Related

Count active processes in Linux Terminal

I'm looking for 2 Ubuntu terminal commands that will:
Show currently running processes (or tasks).
Count how many currently running processes (or tasks) there are.
I am a Windows user and I'm not sure whether they're called tasks or processes, but I'm looking for the same thing that's displayed when I open the Windows Task Manager.
It's very easy using the ps utility, which will return a list of:
PID: Process identification number.
TTY: The type of terminal the process is running on.
TIME: Total amount of CPU usage.
CMD: The name of the command that started the process.
You can also combine/pipe it with wc to count the lines produced by it, hence counting the number of processes, like this:
$ ps -e | wc -l
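Note that ps prints a header line, so this count is off by one. A variant that skips the header (assuming the procps version of ps, which supports --no-headers):
$ ps -e --no-headers | wc -l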

Proper way to use taskset command

I am getting the number of CPUs I have available with this command:
cat /proc/cpuinfo | grep processor | wc -l
It says I have 4 cores available (actually 2 physical cores and the others logical).
Then I run my task with python3 mytask.py from the command line. After starting my program, I want to change the cores it is pinned to, e.g. only core0, or only core3, or only core0 and core2.
I know I can do it with the os.sched_setaffinity() function, but I want to do that using the taskset command.
I am trying this:
taskset -pc 2 <pid>
Can I run this command after checking only my available CPU count, or do I have to check which cores are eligible for my task before running the taskset command?
Will the Linux kernel guarantee to accept my new affinity list if it is between 0 and 4?
For example, I have 4 CPUs available, and when I wanted to change a kworker thread's affinity from core0 to core1, it failed. Then I checked the allowed CPUs for the kworker thread with this command:
cat /proc/6/status | grep "Cpus_allowed_list:"
It says the current affinity list is: 0
Do I need to check "Cpus_allowed_list" before running the taskset command to change the affinity list?
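For what it's worth, here is a sketch of that check followed by the affinity change; the PID 3039 is hypothetical:
# show which cores the process is currently allowed to run on
grep "Cpus_allowed_list" /proc/3039/status
# taskset -p without a new list also prints the current affinity mask
taskset -p 3039
# request a new affinity list (cores 0 and 2)
taskset -pc 0,2 3039
If the kernel refuses the new list (as with the kworker thread above), taskset reports the failure rather than silently ignoring it.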

Two xwin-xdg-menu processes with high CPU consumption

I have a Windows 7 computer with an Intel i7 with 2 cores and hyperthreading, and a Linux virtual machine in a cloud. I don't like VNC (it's laggy), so I use X windowing.
I start my Cygwin XWin with the following command:
C:\cygwin64\bin\run.exe --quote /usr/bin/bash.exe -l -c "cd; /usr/bin/xinit /etc/X11/xinit/startxwinrc -- /usr/bin/XWin :0 -multiwindow -listen tcp"
Otherwise it's working just as intended, but for some reason it's spawning two xwin-xdg-menu processes, one of which is consuming 25% of my CPU. When I kill it, the CPU usage returns to normal and everything works fine, including the other xwin-xdg-menu process.
I also tried this:
C:\cygwin64\bin\XWin.exe :0 -multiwindow -listen tcp
but it makes the application run slowly and with a bad resolution.
Is there a way to start X with -listen tcp, with a resolution adapted to the multiple screens I have, and without having to manually kill the extra process every time?
It seems I'm not the only one with this problem, but so far I haven't found any solution:
https://cygwin.com/ml/cygwin/2017-05/msg00345.html
https://superuser.com/questions/1210325/cygwin-at-spi-bus-launcher-and-xwin-xdg-menu-high-cpu (I don't have problems with at-spi-bus-launcher though)
Solution:
Create a ~/.startxwinrc file, and add one line:
exec sleep infinity
Make ~/.startxwinrc executable by running chmod +x ~/.startxwinrc.
Reason I suspect this worked:
startxwin searches for a ~/.startxwinrc file to execute when launching. If startxwin does not find a ~/.startxwinrc file, startxwin will follow the default routine outlined in /etc/X11/xinit/startxwinrc.
The default routine launches /usr/bin/xwin-xdg-menu, somehow causing me to have two xwin-xdg-menu processes, one of them with very high CPU usage. Creating ~/.startxwinrc bypasses the default routine, preventing /usr/bin/xwin-xdg-menu from launching altogether.
exec sleep infinity keeps the X server alive after launching.
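For reference, the two steps as shell commands (a sketch; run in a Cygwin shell):
# create ~/.startxwinrc with the single line and make it executable
echo "exec sleep infinity" > ~/.startxwinrc
chmod +x ~/.startxwinrc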

Getting Julia SharedArrays to play nicely with Sun Grid Engine

I have been trying to get a Julia program to run correctly in an SGE environment with SharedArrays. I read several threads on Julia and SGE, but most of them seem to deal with MPI. The function bind_pe_procs() from this Gist seems to correctly bind the processes to a local environment. A script like
### define bind_pe_procs() as in Gist
### ...
println("Started julia")
bind_pe_procs()
println("do SharedArrays initialize correctly?")
x = SharedArray(Float64, 3, pids = procs(), init = S -> S[localindexes(S)] = 1.0)
pids = procs(x)
println("number of workers: ", length(procs()))
println("SharedArrays map to ", length(pids), " workers")
yields the following output:
starting qsub script file
Mon Oct 12 15:13:38 PDT 2015
calling mpirun now
exception on 2: exception on exception on 4: exception on exception on 53: : exception on exception on exception on Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-8.local","n"=>"5"}]compute-0-8.local
ASCIIString["compute-0-8.local","compute-0-8.local","compute-0-8.local","compute-0-8.local"]adding machines to current system
done
do SharedArrays initialize correctly?
number of workers: 5
SharedArrays map to 5 workers
Curiously, this doesn't seem to work if I need to load the array from a file and convert to SharedArray format with the command convert(SharedArray, vec(readdlm(FILEPATH))). If the script is
println("Started julia")
bind_pe_procs()
### script reads arrays from file and converts to SharedArrays
println("running script...")
my_script()
then the result is garbage:
starting qsub script file
Mon Oct 19 09:18:29 PDT 2015
calling mpirun now Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-5.local","n"=>"11"}]compute-0-5.local
ASCIIString["compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0- 5.local"]adding machines to current system
done
running script...
Current number of processes: [1,2,3,4,5,6,7,8,9,10,11]
SharedArray y is seen by [1] processes
### tons of errors here
### important one is "SharedArray cannot be used on a non-participating process"
so somehow the SharedArrays do not map correctly to all cores. Does anybody have any suggestions or insights into this problem?
One workaround that I have used in my work is to simply force SGE to submit a job to a particular node and then limit the parallel environment to the number of cores that I want to use.
Below I provide an SGE qsub script for a 24-core node where I want to use only 6 cores.
#!/bin/bash
# lots of available SGE script options, only relevant ones included below
# request processes in parallel environment
#$ -pe orte 6
# use this command to dump job on a particular queue/node
#$ -q all.q#compute-0-13
/share/apps/julia-0.4.0/bin/julia -p 5 MY_SCRIPT.jl
Pro: plays nicely with SharedArrays.
Con: the job will wait in the queue until the node has sufficient cores available.

How to run a MPI task?

I am a newbie in Linux and recently started working with our university's super-computer. I need to install my program (GAMESS Quantum Chemistry Software) in my own allocated space. I have installed it and run it successfully under 'sockets', but I actually need to run it under 'mpi' (otherwise there is little advantage to using a super-computer).
System Setting:
OS: Linux64, Redhat, Intel
MPI: impi
compiler: ifort
modules: slurm, intel/intel-15.0.1, intel/impi-15.0.1
This software runs via 'rungms' and receives arguments as:
rungms [fileName] [Version] [CPU count] (for example: ./rungms Opt 00 4)
Here is my bash file (my feeling is that this is the main culprit for my problem!):
#!/bin/bash
#Based off of Monte's Original Script for Torque:
#https://gist.github.com/mlunacek/6306340#file-matlab_example-pbs
#These are SBATCH directives specifying name of file, queue, the
#Quality of Service, wall time, Node Count, #of CPUS, and the
#destination output file (which appends node hostname and JobID)
#SBATCH -J OptMPI
#SBATCH --qos janus-debug
#SBATCH -t 00-00:10:00
#SBATCH -N2
#SBATCH --ntasks-per-node=1
#SBATCH -o output-OptMPI-%N-JobID-%j
#NOTE: This Module Will Be Replaced With Slurm Specific:
module load intel/impi-15.0.1
mpirun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out
As I said before, the program is compiled for MPI (and not 'sockets').
My problem is that when I run sbatch Opt.sh, I receive this error:
srun: error: PMK_KVS_Barrier duplicate request from task 1
When I change the -N number, sometimes I receive an error saying (4 != 2).
With an odd -N number, I receive an error saying it expects an even number of processes.
What am I missing?
Here is the code from our super-computer website as a bash file example.
The Slurm Workload Manager has a few ways of invoking an Intel MPI process. Likely, all you have to do is use srun rather than mpirun in your case. If errors are still present, refer to Slurm's MPI documentation for alternative ways to invoke Intel MPI jobs; it's rather dependent on how the HPC admins configured the system.
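Concretely, that would be a one-line change to the launch line in the submission script above (a sketch):
# let Slurm start the MPI ranks itself instead of going through mpirun
srun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out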
