I have been trying to get a Julia program to run correctly in an SGE environment with SharedArrays. I read several threads on Julia and SGE, but most of them seem to deal with MPI. The function bind_pe_procs() from this Gist seems to correctly bind the processes to a local environment. A script like
### define bind_pe_procs() as in Gist
### ...
println("Started julia")
bind_pe_procs()
println("do SharedArrays initialize correctly?")
x = SharedArray(Float64, 3, pids = procs(), init = S -> S[localindexes(S)] = 1.0)
pids = procs(x)
println("number of workers: ", length(procs()))
println("SharedArrays map to ", length(pids), " workers")
yields the following output:
starting qsub script file
Mon Oct 12 15:13:38 PDT 2015
calling mpirun now
exception on 2: exception on exception on 4: exception on exception on 53: : exception on exception on exception on Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-8.local","n"=>"5"}]compute-0-8.local
ASCIIString["compute-0-8.local","compute-0-8.local","compute-0-8.local","compute-0-8.local"]adding machines to current system
done
do SharedArrays initialize correctly?
number of workers: 5
SharedArrays map to 5 workers
Curiously, this doesn't seem to work if I need to load the array from a file and convert to SharedArray format with the command convert(SharedArray, vec(readdlm(FILEPATH))). If the script is
println("Started julia")
bind_pe_procs()
### script reads arrays from file and converts to SharedArrays
println("running script...")
my_script()
then the result is garbage:
starting qsub script file
Mon Oct 19 09:18:29 PDT 2015
calling mpirun now Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-5.local","n"=>"11"}]compute-0-5.local
ASCIIString["compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0- 5.local"]adding machines to current system
done
running script...
Current number of processes: [1,2,3,4,5,6,7,8,9,10,11]
SharedArray y is seen by [1] processes
### tons of errors here
### important one is "SharedArray cannot be used on a non-participating process"
so somehow the SharedArrays do not map correctly to all cores. Does anybody have any suggestions or insights into this problem?
One workaround that I have used in my work is to simply force SGE to submit a job to a particular node and then limit the parallel environment to the number of cores that I want to use.
Below I provide an SGE qsub script for a 24-core node where I want to use only 6 cores.
#!/bin/bash
# lots of available SGE script options, only relevant ones included below
# request processes in parallel environment
#$ -pe orte 6
# use this command to dump job on a particular queue/node
#$ -q all.q#compute-0-13
/share/apps/julia-0.4.0/bin/julia -p 5 MY_SCRIPT.jl
Pro: plays nicely with SharedArrays.
Con: the job will wait in the queue until the node has sufficient cores available.
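Since the pinned job waits until that node has free slots, it can help to check the node's current load before submitting. A small sketch using SGE's qhost (the node name comes from the script above):
# Show load and core usage for the node the job is pinned to.
qhost -h compute-0-13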
Related
I am trying to parallelize a python program (program_to_parallelize.py) into 16 subprocesses on my 16-core machine. I use this code, which is part of a Python script:
import subprocess
subprocess.call("mpiexec -n 16 python program_to_parallelize.py", shell=True)
This runs without any errors, but when I look at CPU usage I see that all subprocesses are running on one single CPU. I would prefer that the 16 processes each take 100% of one CPU rather than all sharing the first one.
I am working on a 16-core machine running Ubuntu 16.04.6 LTS.
I use version 3.0.3 of mpi4py
I use version 3.3.2 of mpiexec
I figured it out, actually. One solution is to bind each process to a CPU after starting the execution. To do this, you can use this command:
taskset -pc [CPU number] [process ID]
for example:
taskset -pc 2 3039
You can find more details about how to assign a process to a CPU on this website: https://www.hecticgeek.com/2012/03/assign-process-cpu-ubuntu-linux/
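If many processes need rebinding, the per-process calls can be scripted. Below is a hedged sketch, assuming each rank is visible to pgrep under the script's file name:
#!/bin/bash
# Bind each MPI rank to its own CPU, one rank per core.
cpu=0
for pid in $(pgrep -f program_to_parallelize.py); do
    taskset -pc "$cpu" "$pid"   # pin the rank with PID $pid to CPU $cpu
    cpu=$((cpu + 1))
done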
I found that my Linux workstation with 12 CPUs almost ground to a halt after I executed a tcsh script with a for-loop in which hundreds of iterations were launched simultaneously by appending '&' to the command. Is there any way to control the number of background processes, or their execution time, in a tcsh for-loop?
GNU Parallel is made for this kind of situations.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
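As a minimal sketch of that behaviour (the job script and the argument file are hypothetical), running 32 jobs with at most 4 in flight looks like this:
# Keep 4 jobs running at all times; start the next one as soon as any slot frees up.
parallel -j 4 ./myjob.sh {} < args.txt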
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
When using bsub with LSF, the -o option gave a lot of details, such as when the job started and ended and how much memory and CPU time the job took. With SLURM, all I get is the same standard output that I'd get from running the script without a scheduler.
For example, given this Perl 6 script:
warn "standard error stream";
say "standard output stream";
Submitted thus:
sbatch -o test.o%j -e test.e%j -J test_warn --wrap 'perl6 test.p6'
Resulted in the file test.o34380:
standard output stream
and the file test.e34380:
standard error stream in block <unit> at test.p6:1
With LSF, I'd get all kinds of details in the standard output file, something like:
Sender: LSF System <lsfadmin@my_node>
Subject: Job 347511: <test> Done
Job <test> was submitted from host <my_cluster> by user <username> in cluster <my_cluster_act>.
Job was executed on host(s) <my_node>, in queue <normal>, as user <username> in cluster <my_cluster_act>.
</home/username> was used as the home directory.
</path/to/working/directory> was used as the working directory.
Started at Mon Mar 16 13:10:23 2015
Results reported at Mon Mar 16 13:10:29 2015
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
perl6 test.p6
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 0.19 sec.
Max Memory : 0.10 MB
Max Swap : 0.10 MB
Max Processes : 2
Max Threads : 3
The output (if any) follows:
standard output stream
PS:
Read file <test.e_347511> for stderr output of this job.
Update:
One or more -v flags to sbatch gives more preliminary information, but doesn't change the standard output.
Update 2:
Use seff JOBID for the desired info (where JOBID is the actual number). Just be aware that it collects data once a minute, so it might say that your max memory usage was 2.2GB, even though your job was killed due to using more than the 4GB of memory you requested.
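For example, for the job ID from the output files above:
seff 34380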
For recent jobs, try
sacct -l
Look under the "Job Accounting Fields" section of the documentation for descriptions of each of the three dozen or so columns in the output.
To get just the job ID, maximum RAM used, maximum virtual memory size, start time, end time, CPU time in seconds, and the list of nodes on which the jobs ran, use the format below. By default this just gives info on jobs run the same day (see the --starttime or --endtime options for getting info on jobs from other days):
sacct --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList
This will give you output like:
JobID MaxRSS MaxVMSize Start End CPUTimeRAW NodeList
------------ ------- ---------- ------------------- ------------------- ---------- --------
36511 2015-04-29T11:34:37 2015-04-29T11:34:37 0 c50b-20
36511.batch 660K 181988K 2015-04-29T11:34:37 2015-04-29T11:34:37 0 c50b-20
36514 2015-04-29T12:18:46 2015-04-29T12:18:46 0 c50b-20
36514.batch 656K 181988K 2015-04-29T12:18:46 2015-04-29T12:18:46 0 c50b-20
Use --state COMPLETED for checking previously completed jobs. When checking a state other than RUNNING, you have to give a start or end time.
sacct --starttime 08/01/15 --state COMPLETED --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList,ReqCPUS,ReqMem,Elapsed,Timelimit
You can also get the working directory of the job using scontrol:
scontrol show job 36514
Which will give you output like:
JobId=36514 JobName=sbatch
UserId=username(123456) GroupId=my_group(678)
......
WorkDir=/path/to/work/dir
However, by default, scontrol can only access that information for about five minutes after the job finishes, after which it is purged from memory.
At the end of each job script, I usually insert
sstat -j $SLURM_JOB_ID.batch --format=JobID,MaxVMSize
to add RAM usage to the standard output.
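In a job script, that looks something like the sketch below (the program name is a placeholder):
#!/bin/bash
#SBATCH -J myjob
./my_program    # placeholder for the real work
# Report the batch step's peak memory in the job's standard output.
sstat -j ${SLURM_JOB_ID}.batch --format=JobID,MaxVMSize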
I am a newbie in Linux and recently started working with our university supercomputer. I need to install my program (GAMESS Quantum Chemistry Software) in my own allocated space. I have installed it and run it successfully under 'sockets', but I actually need to run it under 'mpi' (otherwise there is little advantage to using a supercomputer).
System Setting:
OS: Linux64, RedHat, Intel
MPI: impi
compiler: ifort
modules: slurm, intel/intel-15.0.1, intel/impi-15.0.1
This software is launched via 'rungms' and receives arguments as:
rungms [fileName] [Version] [CPU count] (for example: ./rungms Opt 00 4)
Here is my bash file (my feeling is that this is the main culprit for my problem!):
#!/bin/bash
#Based off of Monte's Original Script for Torque:
#https://gist.github.com/mlunacek/6306340#file-matlab_example-pbs
#These are SBATCH directives specifying name of file, queue, the
#Quality of Service, wall time, Node Count, #of CPUS, and the
#destination output file (which appends node hostname and JobID)
#SBATCH -J OptMPI
#SBATCH --qos janus-debug
#SBATCH -t 00-00:10:00
#SBATCH -N2
#SBATCH --ntasks-per-node=1
#SBATCH -o output-OptMPI-%N-JobID-%j
#NOTE: This Module Will Be Replaced With Slurm Specific:
module load intel/impi-15.0.1
mpirun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out
As I said before, the program is compiled for mpi (and not 'sockets').
My problem is that when I run sbatch Opt.sh, I receive this error:
srun: error: PMK_KVS_Barrier duplicate request from task 1
When I change the -N number, sometimes I receive an error saying (4 != 2). With an odd number for -N, I receive an error saying it expects an even number of processes.
What am I missing?
Here is the code from our super-computer website as a bash file example
The Slurm Workload Manager has a few ways of invoking an Intel MPI process. Likely, all you have to do is use srun rather than mpirun in your case. If errors are still present, refer here for alternative ways to invoke Intel MPI jobs; it's rather dependent on how the HPC admins configured the system.
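Concretely, that means changing only the launch line of the script above; a sketch, assuming the site's Slurm is built with PMI support for Intel MPI:
# Last line of Opt.sh: let Slurm wire up the MPI ranks itself.
srun /projects/augenda/gamess/rungms Opt 00 2 > OptMPI.out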
I have seven licenses for a particular piece of software, so I want to run 7 jobs simultaneously. I can do that using '&', but the 'wait' command waits until all 7 of those processes have finished before spawning the next 7. Since some of those 7 jobs take very long while others finish quickly, I don't want to waste time waiting for all of them. Instead, I would like to write a shell script that starts the first seven jobs and then, as each job completes, starts another. Is there a way to do this in Linux? Could you please help me?
Thanks.
GNU parallel is the way to go. It is designed for launching multiple instances of the same command, each with a different argument retrieved either from stdin or an external file.
Let's say your licensed script is called myScript, each instance having the same options --arg1 --arg2 and taking a variable parameter --argVariable for each instance spawned, those parameters being stored in the file myParameters:
cat myParameters | parallel --halt 1 --jobs 7 ./myScript --arg1 --argVariable {} --arg2
Explanations:
--halt 1 tells parallel to halt all jobs if one fails
--jobs 7 will launch 7 instances of myScript
On a Debian-based Linux system, you can install parallel using:
sudo apt-get install parallel
As a bonus, if your licenses allow it, you can even tell parallel to launch these 7 instances amongst multiple computers.
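For instance, 4 of the 7 slots could run on one machine and 3 on another. The hostnames below are hypothetical, and passwordless ssh plus a copy of myScript on each host are assumed:
# Run 4 jobs on server1 and 3 on server2, 7 in flight in total.
cat myParameters | parallel -S 4/server1,3/server2 ./myScript --arg1 --argVariable {} --arg2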
You could check how many are currently running and start more if you have fewer than 7:
while true; do
    if [ "$(ps ax -o comm | grep -c process-name)" -lt 7 ]; then
        process-name &
    fi
    sleep 1
done
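A different technique than polling, if your shell is bash 4.3 or newer, is the built-in wait -n, which blocks until any one background job exits. A sketch with placeholder script and input names:
#!/bin/bash
# Start 7 jobs, then launch a replacement each time any one finishes.
for i in $(seq 1 7); do
    ./myScript "input$i" &
done
for i in $(seq 8 32); do
    wait -n                 # returns as soon as one background job exits
    ./myScript "input$i" &
done
wait                        # wait for the remaining jobs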
Write two scripts: one that restarts a job every time it finishes, and one that starts 7 instances of the first script in the background.
Like:
script1:
./script2 job1 &
...
./script2 job7 &
and
script2:
while ...; do    # loop/termination condition left open in the original
    ./jobX       # the job passed to this instance of script2
done
I found a fairly good solution using make, which is part of the standard distributions.