Using qsub (SGE) with multi-threaded applications - Linux

I want to submit a multi-threaded job to the cluster I'm working with,
but the qsub man page is not clear on how this is done. By default, I guess, it just sends it off as a normal job regardless of the multi-threading, but this might cause problems, e.g. sending many multi-threaded jobs to the same machine and slowing things down.
Does anyone know how to accomplish this? Thanks.
The batch server system is SGE.

In SGE/UGE the configuration is set by the administrator, so you have to check what they've called the parallel environments:
qconf -spl
make
our_paraq
Look for one with $pe_slots as the allocation rule in its configuration:
qconf -sp make
qconf -sp our_paraq
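For reference, the output of qconf -sp looks something like this (a trimmed, hypothetical example; the exact fields vary between SGE versions) - allocation_rule is the line to check:
pe_name            our_paraq
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE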
Then qsub with that parallel environment and the number of cores you want to use:
qsub -pe our_paraq 8 -cwd ./myscript
If you're using MPI you have more choices for the allocation rule ($pe_slots above), such as $round_robin and $fill_up, but this should get you going.

If your job is multithreaded, you can harness the advantage of multithreading in SGE as well. In SGE a single job can use one or many CPUs. The problem occurs if you submit a job that requests a single processor while your program creates more threads than a single processor can handle. Check how many processors your job requests and how many threads per CPU your program creates.
In my case I have a Java program that uses one processor with two threads, and it works pretty efficiently. Since I don't use MPI, I submit the same Java program to many CPUs, with 2 threads each, to run it in parallel.
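As a rough sketch of that kind of submission (the PE name smp and the jar name are placeholders; use whatever $pe_slots environment exists on your cluster), you reserve as many slots as your program has threads:
qsub -pe smp 2 -cwd -b y java -jar myapp.jar
Here -b y tells qsub that java is a binary to run directly rather than a job script.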

The answer by the user "j_m" is very helpful, but in my case I needed to both request multiple cores and submit my job to a specific node. After a copious amount of searching I finally found a solution that worked for me, and I'm posting it here so that other people with a similar problem don't have to go through the same pain (please note that I'm leaving this as an answer instead of a reply because I don't have enough reputation for making replies):
qsub -S /bin/sh -cwd -l h=$NODE_NAME -V -pe $ENV_NAME $N_OF_CORES $SCRIPT_NAME
I think the variables $NODE_NAME, $N_OF_CORES and $SCRIPT_NAME are pretty straightforward. You can easily find $ENV_NAME by following the answer by "j_m".
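For example, with hypothetical values filled in (a host called node07, the our_paraq environment from the first answer, and a script called myscript.sh):
qsub -S /bin/sh -cwd -l h=node07 -V -pe our_paraq 8 ./myscript.sh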

Related

How to optimize multithreaded program for use in LSF?

I am working on a multithreaded number-crunching app, let's call it myprogram. I plan to run myprogram on IBM's LSF grid. LSF allows a job to be scheduled on CPUs from different machines. For example, bsub -n 3 ... myprogram ... can allocate two CPUs from node1 and one CPU from node2.
I know that I can ask LSF to allocate all 3 cores in the same node, but I am interested in the case where my job is scheduled onto different nodes.
How does LSF manage this? Will myprogram be run in two different processes in node1 and node2?
Does LSF automatically manage data transfer between node1 and node2?
Anything I can do in myprogram to make this easy for LSF to manage? Should I be making use of any LSF libraries?
Answer to Q1
When you submit a job like bsub -n 3 myprogram, all LSF does is allocate 3 slots across 1-3 hosts. One of these hosts will be designated as the "first execution host", and LSF will dispatch and run a single instance of myprogram on that host.
If you want to run myprogram in parallel, LSF has a command called blaunch which will essentially launch one instance of a program per allocated core. For example, submitting your job as bsub -n 3 blaunch myprogram will run 3 instances of myprogram.
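Side by side, the two forms look like this (myprogram stands in for your own executable):
bsub -n 3 ./myprogram
bsub -n 3 blaunch ./myprogram
The first reserves 3 slots but starts a single instance on the first execution host; the second starts one instance per allocated slot.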
Answer to Q2
By "manage data transfer" I assume you mean communication between the instances of myprogram. The answer is no, LSF is a scheduling and dispatching tool. All it does is allocation and dispatch, but it has no knowledge of what the dispatched program is doing. blaunch in turn is simply a task launcher, it just launches multiple instances of a task.
What you're after here is some kind of parallel programming framework like MPI (see for example www.openmpi.org). This provides a set of APIs and commands that allow you to write myprogram in a parallel fashion.
Once you've done that and turned your program into mympiprogram, you can submit it to LSF like bsub -n 3 mpirun mympiprogram. The mpirun tool - at least in the case of OpenMPI (and some others) - integrates with LSF and uses the blaunch interface under the hood to launch your tasks for you.
Answer to Q3
You don't need to use LSF libraries in your program to make anything easier for LSF; as I said, what's going on inside the program is transparent to the system. The LSF libraries just enable your program to become a client of the LSF system (submit jobs, query them, etc.).

Multithreading on SLURM

I have a Perl script that forks using the Parallel::ForkManager module.
To my knowledge, if I fork 32 child processes and ask the SLURM scheduler to run the job on 4 nodes with 8 processors per node, the 32 children will be spread across all of those cores, one child per core.
Someone in my lab said that if I run a job on multiple nodes that the other nodes are not used, and I'm wasting time and money. Is this accurate?
If I use a script that forks am I limited to one node with SLURM?
As far as I know, Parallel::ForkManager doesn't make use of MPI, so if you're using mpirun I don't see how it's going to communicate across nodes. A simple test is to have each child print the hostname it runs on.
One thing that commonly happens with non-MPI software launched with mpirun is that you duplicate all your effort across all the nodes, so they all do the exact same thing instead of sharing the work. If you use Parallel::MPI instead, it should work just fine.
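If you do stay with Parallel::ForkManager, a minimal sketch of a SLURM batch script that keeps all the forked children on one node might look like this (the script name and core count are placeholders):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
perl forker.pl
Submit it with sbatch and the children share the 8 cores of a single node; asking for more nodes won't help a fork-based script.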

Perl Cron Jobs = High Server Load?

I have multiple cron jobs written in Perl and they appear to be causing a high server load.
When I run top through SSH it shows that Perl is using the most CPU and memory. However, since there are multiple cron jobs, I need to know specifically which one is using the most resources.
Is there any way to check which of the Perl files is using the most resources?
Note the PID of the process that top shows is using the CPU. Then do a
ps -ef | grep perl
Match the PID to one of those listed and you'll see the full command line of the Perl process for the high-CPU job.
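Alternatively, you can get the same information in one shot by sorting every process by CPU usage and filtering for Perl (the [p]erl pattern keeps grep itself out of the results):
ps -eo pid,pcpu,pmem,etime,args --sort=-pcpu | grep '[p]erl' | head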
Well, if you look at ps -ef you should see which script maps to that process ID. You could also use strace -fTt -p <pid> to attach to a specific process and trace what it's doing.
Or you could modify the script to change $0 to something meaningful that tells you which script is which.
But it's hard to say without a bit more detail. Is there any chance the script is taking longer to run than your cron job takes to 'fire'? Because if you start backlogging a cron job, things will slowly get worse as more and more runs pile up behind it.
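If overlapping runs turn out to be the cause, one common fix is to wrap the crontab entry in flock so a new run is skipped while the previous one is still going (the schedule, lock file and script path here are placeholders):
*/5 * * * * /usr/bin/flock -n /tmp/myjob.lock /usr/bin/perl /path/to/myjob.pl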

SGE low priority single core jobs preventing multicore jobs

I'm running SGE (6.2u5p2) on our Beowulf cluster. I've got a couple of users who submit tens of thousands of short (<15 minute) jobs at low priority (i.e. they've set the jobs to low priority so anyone can jump ahead of them). This works really well for other users running single-core jobs; however, anyone wishing to run something with multiple threads isn't able to. The single-core jobs keep skipping ahead, never allowing (say) 6 cores to become free at the same time.
I don't really want to separate the users into two queues (i.e. single-core and multicore), since those running multicore jobs only use them briefly and the cores would then sit unused.
Is there a way in SGE to allow multi core jobs to reserve slots?
Many thanks,
Rudiga
As "High Performance Mark" eludes to, using the -R option may help. See:
http://www.ace-net.ca/wiki/Scheduling_Policies_and_Mechanics#Reservation
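In practice that means submitting the multicore jobs with reservation requested, roughly like this (our_paraq stands in for whatever $pe_slots environment your site defines):
qsub -R y -pe our_paraq 6 -cwd ./my_multicore_job.sh
Note that the scheduler also has to permit reservations: the administrator needs max_reservation set to a non-zero value in the scheduler configuration (qconf -msconf) for -R y to have any effect.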

How to run processes piped with bash on multiple cores?

I have a simple bash script that pipes the output of one process to another. Namely:
dostuff | filterstuff
It happens that on my Linux system (openSUSE if it matters, kernel 2.6.27) both of these processes run on a single core. However, running different processes on different cores is the default policy; it just doesn't happen to trigger in this case.
What component of the system is responsible for that and what should I do to utilize multicore feature?
Note that there's no such problem on the 2.6.30 kernel.
Clarification: Having followed Dennis Williamson's advice, I made sure with the top program that the piped processes are indeed always run on the same processor. The Linux scheduler, which usually does a really good job, isn't doing it this time.
I figure that something in bash prevents the OS from doing it. The thing is that I need a portable solution for both multi-core and single-core machines. The taskset solution proposed by Dennis Williamson won't work on single-core machines. Currently I'm using:
dostuff | taskset -c 0 filterstuff
but this seems like a dirty hack. Could anyone provide a better solution?
Suppose dostuff is running on one CPU. It writes data into a pipe, and that data will be in cache on that CPU. Because filterstuff is reading from that pipe, the scheduler decides to run it on the same CPU, so that its input data is already in cache.
If your kernel is built with CONFIG_SCHED_DEBUG=y,
# echo NO_SYNC_WAKEUPS > /sys/kernel/debug/sched_features
should disable this class of heuristics. (See /usr/src/linux/kernel/sched_features.h and /proc/sys/kernel/sched_* for other scheduler tunables.)
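On kernels of that era you can also check the current state, and put the heuristic back afterwards if it made no difference (this assumes debugfs is mounted at /sys/kernel/debug):
# cat /sys/kernel/debug/sched_features
# echo SYNC_WAKEUPS > /sys/kernel/debug/sched_features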
If that helps, and the problem still happens with a newer kernel, and it's really faster to run on separate CPUs than one CPU, please report the problem to the Linux Kernel Mailing List so that they can adjust their heuristics.
Give this a try to set the CPU (processor) affinity:
taskset -c 0 dostuff | taskset -c 1 filterstuff
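If portability to single-core machines is the concern (as in the question), a small wrapper can count CPUs first and only pin when there is more than one to pin to (a sketch, reading /proc/cpuinfo so it works on older systems):
ncpu=$(grep -c '^processor' /proc/cpuinfo)
if [ "$ncpu" -gt 1 ]; then
    taskset -c 0 dostuff | taskset -c 1 filterstuff
else
    dostuff | filterstuff
fi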
Edit:
Try this experiment:
create a file called proctest and chmod +x proctest with this as the contents:
#!/bin/bash
while true
do
ps
sleep 2
done
start this running:
./proctest | grep bash
in another terminal, start top - make sure it's sorting by %CPU
let it settle for several seconds, then quit
issue the command ps u
start top -p with a list of the PIDs of the highest several processes (say 8 of them) from the list left on-screen by the exited top, plus the ones for proctest and grep which were listed by ps, all separated by commas, like so (the order doesn't matter):
top -p 1234,1255,1211,1212,1270,1275,1261,1250,16521,16522
add the processor field - press f then j then Space
set the sort to PID - press Shift+F then a then Space
optional: press Shift+H to turn on thread view
optional: press d and type .09 and press Enter to set a short delay time
now watch as processes move from processor to processor; you should see proctest and grep bounce around, sometimes on the same processor, sometimes on different ones
The Linux scheduler is designed to give maximum throughput, not to do what you imagine is best. If you're running processes which are connected with a pipe, in all likelihood one of them is blocking while the other runs, and then they swap over. Running them on separate cores would achieve little or nothing, so the scheduler doesn't.
If you have two tasks which are both genuinely ready to run on the CPU, I'd expect to see them scheduled on different cores (at some point).
My guess is that what happens is this: dostuff runs until the pipe buffer becomes full, at which point it can't run any more, so the filterstuff process runs; but filterstuff runs for such a short time that dostuff doesn't get rescheduled until filterstuff has drained the entire pipe buffer, at which point dostuff gets scheduled again.
