Multithreading on SLURM - linux

I have a Perl script that forks using the Parallel::ForkManager module.
To my knowledge, if I fork 32 child processes and ask the SLURM scheduler to run the job on 4 nodes with 8 processors per node, each child process will run on its own core across the nodes.
Someone in my lab said that if I run a job on multiple nodes, the other nodes are not used and I'm wasting time and money. Is this accurate?
If I use a script that forks, am I limited to one node with SLURM?
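For reference, the resource request described above would typically look something like this at the top of an sbatch script (a sketch; the exact options depend on your cluster's configuration):
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8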

As far as I know, Parallel::ForkManager doesn't make use of MPI, so if you're launching the script with mpirun I don't see how it's going to communicate across nodes. A simple test is to have each child output its hostname.
One thing that commonly happens with non-MPI software launched with mpirun is that you duplicate all your effort across all nodes, so that they are all doing the exact same thing instead of sharing the work. If you use Parallel::MPI it should work just fine.
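As a minimal sketch of that test (assuming Parallel::ForkManager is installed and the job is launched with sbatch rather than mpirun), each child reports the node it lands on; if forking stays on one node, every child prints the same hostname:
perl -MParallel::ForkManager -e '
  my $pm = Parallel::ForkManager->new(32);   # allow up to 32 concurrent children
  for my $i (1 .. 32) {
      $pm->start and next;                   # parent continues the loop
      print "child $i ran on ", `hostname`;  # child reports the node it is running on
      $pm->finish;                           # child exits
  }
  $pm->wait_all_children;
'
If every line shows the same hostname, the forked children never left the first node allocated to the job.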

Related

How to tell a single threaded process to use alternately multiple cpus

For a specific reason related to my work, I need to configure a process on the Ubuntu 14.10 Linux distro to alternately use different CPUs.
I know that with taskset I can pin a process to specified CPUs, but how do I tell that process to use, let's say, 50% of CPU 0 and 50% of CPU 1?
UPDATE
The process I'm running is single-threaded; maybe I'm wrong to try to tell a single-threaded process to use two CPUs with a round-robin algorithm.
Thank you for your insights and help.
Regards.
I would use taskset to pin two processes to two different cores, then use a message-passing protocol to switch the processing from one core to the other and back again. I'm assuming you don't want to execute a task concurrently across two cores.
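A minimal sketch of that setup (the worker names are hypothetical; the hand-off between them would be whatever message-passing mechanism you choose):
taskset -c 0 ./worker_a &   # pin the first process to CPU 0
taskset -c 1 ./worker_b &   # pin the second process to CPU 1
wait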

How to control the number of threads/cores used?

I am running Spark on a local machine, with 8 cores, and I understand that I can use "local[num_threads]" as the master, and use "num_threads" in the bracket to specify the number of threads used by Spark.
However, it seems that Spark often uses more threads than I requested. For example, if I specify only 1 thread for Spark, then by using the top command on Linux I can still observe that the CPU usage is often more than 100% and even 200%, implying that Spark is actually using more than one thread.
This may be a problem if I need to run multiple programs concurrently. How can I strictly control the number of threads/cores used by Spark?
Spark uses one thread for its scheduler, which explains the usage pattern you see. If you launch n threads in parallel, you'll get n+1 cores used.
For details, see the scheduling doc.
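For example, a local Spark shell restricted to a single worker thread can be started like this (a sketch; the same --master value works for spark-submit):
spark-shell --master 'local[1]'
Even then, top can show more than 100% because of the extra scheduler thread described above.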

SGE low priority single core jobs preventing multicore jobs

I'm running SGE (6.2u5p2) on our Beowulf cluster. I've got a couple of users who submit tens of thousands of short (<15 minute) jobs at low priority (i.e. they've set the jobs to low priority so anyone can jump ahead of them). This works really well for other users running single-core jobs; however, anyone wishing to run something with multiple threads isn't able to. The single-core jobs keep skipping ahead, never allowing, say, 6 cores to become available at once.
I don't really want to separate the users into two queues (i.e. single-core and multi-core), since the multi-core users only need the extra cores briefly and the rest of the time those cores would sit unused.
Is there a way in SGE to allow multi core jobs to reserve slots?
Many thanks,
Rudiga
As "High Performance Mark" eludes to, using the -R option may help. See:
http://www.ace-net.ca/wiki/Scheduling_Policies_and_Mechanics#Reservation
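As a sketch (the parallel environment name smp is an assumption; use whatever your site defines), a multi-core job submitted with reservation enabled would look like:
qsub -R y -pe smp 6 my_multicore_job.sh   # -R y asks SGE to reserve slots so short jobs can't starve this job
Reservation also has to be enabled by the administrator in the scheduler configuration for this to take effect.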

Will forking more workers allow me to balance CPU-heavy work?

I love node.js' evented model, but it only takes you so far - when you have a function (say, a request handler for HTTP connections) that does a lot of heavy work on the CPU, it's still "blocking" until the function returns. That's to be expected. But what if I want to balance this out a bit, so that a given request takes longer to process but the overall response time is shorter, using the operating system's ability to schedule the processes?
My production code uses node's wonderfully simple Cluster module to fork a number of workers equal to the number of cores the system's CPU has. Would it be bad to fork more than this - perhaps two or three workers per core? I know there'll be a memory overhead here, but memory is not my limitation. What reading I did mentioned that you want to avoid "oversubscribing", but surely on a modern system you're not going crazy by having two or three processes vying for time on the processor.
I think your idea sounds like a good one; especially because many processors support hyperthreading. Hyperthreading is not magical and won't suddenly double your application's speed or throughput but it can make sense to have another thread ready to execute in a core when the first thread needs to wait for a memory request to be filled.
Be careful when you start multiple workers: the Linux kernel really prefers to keep processes executing on the same processor for their entire lifetime to provide for strong cache affinity. This makes enough sense. But I've seen several CPU-hungry processes vying for a single core or, worse, a single hyperthread instance, rather than the system re-balancing the processes across all cores or all siblings. Check your processor affinities by running ps -eo pid,psr,comm (or whatever your favorite ps(1) command is; add the psr column).
To combat this you might want to start your workers with an explicitly limited CPU affinity:
taskset -c 0,1 node worker 1
taskset -c 2,3 node worker 2
taskset -c 4,5 node worker 3
taskset -c 6,7 node worker 4
Or perhaps start eight, one per HT sibling, or eight and confine each one to their own set of CPUs, or perhaps sixteen, confine four per core or two per sibling, etc. (You can go nuts trying to micromanage. I suggest keeping it simple if you can.) See the taskset(1) manpage for details.
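For instance, the "one worker per HT sibling" layout might be started with a simple loop (a sketch assuming 8 hardware threads numbered 0-7):
for cpu in $(seq 0 7); do
  taskset -c "$cpu" node worker "$cpu" &   # confine each worker to one hardware thread
done
wait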

using qsub (sge) with multi-threaded applications

I wanted to submit a multi-threaded job to the cluster network I'm working with, but the qsub man page is not clear about how this is done. By default, I guess it just sends it as a normal job regardless of the multi-threading, but this might cause problems, i.e. sending many multi-threaded jobs to the same computer and slowing things down.
Does anyone know how to accomplish this? Thanks.
The batch server system is SGE.
In SGE/UGE the configuration is set by the administrator, so you have to check what they've called the parallel environments:
qconf -spl
make
our_paraq
Here make and our_paraq are example output; look for one whose configuration uses $pe_slots:
qconf -sp make
qconf -sp our_paraq
qsub with that environment and the number of cores you want to use:
qsub -pe our_paraq 8 -cwd ./myscript
If you're using MPI you have more choices for the allocation rule in the config ($pe_slots above), such as $round_robin and $fill_up, but this should get you going.
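For reference, the allocation rule appears in the parallel environment definition; a hypothetical excerpt of the qconf -sp output might look like this (the values shown are assumptions, not your site's actual settings):
pe_name            our_paraq
slots              999
allocation_rule    $pe_slots    # all requested slots are placed on a single host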
If your job is multithreaded, you can harness the advantage of multithreading even in SGE. In SGE a single job can use one or many CPUs. If you submit a job that requests a single processor but your program creates more threads than a single processor can handle, a problem occurs. Verify how many processors your job is requesting and how many threads per CPU your program is creating.
In my case I have a Java program that uses one processor with two threads, and it works pretty efficiently. I submit the same Java program to many CPUs, with 2 threads each, to make it parallel, as I have not used MPI.
The answer by the user "j_m" is very helpful, but in my case I needed to both request multiple cores and submit my job to a specific node. After a copious amount of searching, I finally found a solution that worked for me, and I'm posting it here so that other people who might have a similar problem don't have to go through the same pain (please note that I'm leaving this as an answer instead of a reply because I don't have enough reputation to make replies):
qsub -S /bin/sh -cwd -l h=$NODE_NAME -V -pe $ENV_NAME $N_OF_CORES $SCRIPT_NAME
I think the variables $NODE_NAME, $N_OF_CORES and $SCRIPT_NAME are pretty straightforward. You can easily find $ENV_NAME by following the answer by "j_m".
