How do you submit a job on multiple queues with Torque? - qsub

Using qsub I can submit a job to multiple nodes on the same queue:
qsub -I -q normal -l nodes=2:ppn=16
However, I have another queue named hyper. Is it possible to submit a job across two different queues?
Something conceptually like this:
qsub -I -l queue=normal:nodes=2:ppn=16,queue=hyper:nodes=2:ppn=16

No, you cannot associate multiple queues with a job. It sounds like you want to request properties/features, like this:
qsub -I -l nodes=2:ppn=16+nodes=2:ppn=16:fastmem
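For the fastmem part to work, that property has to be defined on the relevant nodes. As a rough illustration (the host names and core counts below are made up, and your site may manage properties through qmgr instead), entries in TORQUE's server_priv/nodes file could look like this:
# $TORQUE_HOME/server_priv/nodes (hypothetical entries)
# format: <hostname> np=<cores> <property> [<property> ...]
node01 np=16 fastmem
node02 np=16 fastmem
node03 np=16
node04 np=16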

Related

qsub slow on cluster with a single computer

Torque is installed on a single computer, which serves as both the head node and the compute node. I didn't install Maui for job scheduling, but use the built-in scheduler of Torque.
I find qsub is slow when submitting many jobs, for example:
for i in *tt.sh
do
    echo "$i"
    qsub "$i"
done
It takes a while to submit the jobs toward the end of the script list. This happens even when the computer is under low load, and the submission is slow even with merely 70 scripts in the list.
Are there some options I could tweak in Torque, or do I have to install Maui for job scheduling?
Thanks!

Automatic qsub job completion status notification

I have a shell script that calls five other scripts. The first script creates 50 qsub jobs on the cluster. Individual job execution time varies from a couple of minutes to an hour. I need to know when all 50 jobs have finished, because only then can I run the second script. How do I find out whether all the qsub jobs are completed? One possible solution is an infinite loop that checks the job status with qstat and the job IDs, but then I need to poll the job status continuously, which is not an elegant solution. Is it possible for a qsub job to notify me by itself after execution, so that I don't need to monitor the job status frequently?
qsub is capable of handling job dependencies, using -W depend=afterok:jobid.
e.g.
#!/bin/bash
# commands to run on the cluster
COMMANDS="script1.sh script2.sh script3.sh"
# initialize the JOBIDS variable
JOBIDS=""
# queue all commands
for CMD in $COMMANDS; do
    # queue the command and append its job id (builds ":id1:id2:id3")
    JOBIDS="$JOBIDS:$(qsub $CMD)"
done
# queue post-processing, dependent on the submitted jobs
# (JOBIDS already starts with a colon, so no extra ":" after afterok)
qsub -W depend=afterok$JOBIDS postprocessing.sh
exit 0
More examples can be found here: http://beige.ucs.indiana.edu/I590/node45.html
I have never heard of a way to do that, and I would be really interested if someone came up with a good answer.
In the meantime, I suggest you use file tricks: either have your script output a marker file at the end, or check for the existence of the log files (assuming they are created only at the end).
while [ ! -e ~/logs/myscript.log-1 ]; do
sleep 30;
done
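For the 50-job case the same idea works if every job writes its own log file when it finishes. A rough sketch, assuming the jobs write ~/logs/job-*.log and that the next stage is script2.sh (both names are placeholders):
#!/bin/bash
# wait until all 50 per-job log files exist, then start the next stage
EXPECTED=50
while [ "$(ls ~/logs/job-*.log 2>/dev/null | wc -l)" -lt "$EXPECTED" ]; do
    sleep 30
done
./script2.sh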

Run several jobs in parallel efficiently

OS: CentOS
I have some 30,000 jobs (or scripts) to run. Each job takes 3-5 minutes. I have 48 CPUs (nproc = 48). I can use 40 of them to run 40 jobs in parallel. Please suggest a script or tool that can handle the 30,000 jobs by running 40 of them in parallel at a time.
What I have done so far:
I created 40 different folders and executed the jobs in parallel by creating a shell script for each directory.
I want to know a better way to handle this kind of job next time.
As Mark Setchell says: GNU Parallel.
find scripts/ -type f | parallel
If you insist on keeping 8 CPUs free:
find scripts/ -type f | parallel -j-8
But usually it is more efficient simply to use nice as that will give you all 48 cores when no one else needs them:
find scripts/ -type f | nice -n 15 parallel
To learn more:
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
I have used REDIS to do this sort of thing - it is very simple to install and the CLI is easy to use.
I mainly used LPUSH to push all the jobs onto a "queue" in REDIS and BRPOP to do a blocking remove of a job from the queue. So you would LPUSH 30,000 jobs (or script names or parameters) at the start, then start 40 processes in the background (1 per CPU), and each process would sit in a loop doing BRPOP to get a job, run it, and move on to the next.
You can add layers of sophistication to log completed jobs in another "queue".
Here is a little demonstration of what to do...
First, start a Redis server on any machine in your network:
./redis-server & # start REDIS server in background
Or, you could put this in your system startup if you use it always.
Now push 3 jobs onto queue called jobs:
./redis-cli # start REDIS command line interface
redis 127.0.0.1:6379> lpush jobs "job1"
(integer) 1
redis 127.0.0.1:6379> lpush jobs "job2"
(integer) 2
redis 127.0.0.1:6379> lpush jobs "job3"
(integer) 3
See how many jobs there are in queue:
redis 127.0.0.1:6379> llen jobs
(integer) 3
Wait with an infinite timeout for a job:
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job1"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job2"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job3"
This last one will wait a LONG time as there are no jobs in queue:
redis 127.0.0.1:6379> brpop jobs 0
Of course, this is readily scriptable:
Put 30,000 jobs in queue:
for ((i=0;i<30000;i++)) ; do
echo "lpush jobs job$i" | redis-cli
done
If your Redis server is on a remote host, just use:
redis-cli -h <HOSTNAME>
Here's how to check progress:
echo "llen jobs" | redis-cli
(integer) 30000
Or, more simply maybe:
redis-cli llen jobs
(integer) 30000
And you could start 40 jobs like this:
#!/bin/bash
for ((i=0;i<40;i++)) ; do
./Keep1ProcessorBusy $i &
done
And then Keep1ProcessorBusy would be something like this:
#!/bin/bash
# Endless loop picking up jobs and processing them
while :
do
    # take the next job name off the queue; the value is the last line of redis-cli's reply
    job=$(redis-cli brpop jobs 0 | tail -1)
    # Set processor affinity here too if you want to force it, using the $1 parameter we were called with
    bash "$job"    # run the job script
done
Of course, the actual script or job you want run could also be stored in Redis.
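To build on the earlier point about logging completed jobs in another "queue", the worker can push each finished job onto a second list. A small sketch (the list name done is my own choice):
# inside the worker loop, right after running the job:
redis-cli lpush done "$job" > /dev/null   # record the finished job on a "done" list
# check progress of completed work at any time:
redis-cli llen done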
As a totally different option, you could look at GNU Parallel, which is here. And also remember that you can run the output of find through xargs with the -P option to parallelise stuff.
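A rough sketch of that find/xargs combination, assuming the 30,000 scripts live under scripts/ as in the parallel examples above:
# run the scripts 40 at a time; -print0/-0 copes with unusual file names
find scripts/ -type f -print0 | xargs -0 -P 40 -n 1 bash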
Just execute those scripts; Linux will internally distribute the tasks among the available CPUs. That is up to the Linux task scheduler. But if you want, you can also execute a task on a particular CPU by using taskset (see man taskset). You can do that from a script to execute your 30K tasks. Remember that with this manual approach you need to be sure about what you are doing.
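For reference, a minimal taskset example (the CPU number and script name are placeholders):
# pin one job to CPU core 3 and run it in the background
taskset -c 3 ./job_0001.sh &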

Better way of running a series of commands simultaneously in UNIX/Linux Shell?

I want to know the good practice for performing a series of commands simultaneously in a UNIX/Linux shell. Suppose that I have a program, program_a, which requires one parameter. I have stored the parameters line by line in a file, so I wrote:
while read line
do
    ./program_a "$line" > "$line".log 2>&1
done < parameter_file
The problem is that the execution of program_a takes a long time. Because the executions of program_a for the different parameters are independent of each other, I think they can be run simultaneously. I don't know whether this is a matter of multithreading or some other technique. My thought is the following: use & to run each execution in the background.
while read line
do
    ./program_a "$line" > "$line".log 2>&1 &
done < parameter_file
Is there any better way of launching multiple tasks?
Did you know that xargs can launch tasks in parallel? Check out the -P and -n parameters!
An example:
xargs -P 4 -n 1 ./program_a < parameter_file
That will start up to 4 (P=4) program_a instances, each processing one line (n=1). You'll probably have to wrap program_a in a shell script or similar so that the child processes' stdout and stderr can be redirected appropriately.
Why this is better than putting the processes in the background: suppose you have 1000 lines in the input file; obviously you wouldn't want 1000 processes launched at once. xargs lets you treat the input as a queue, with P workers each consuming and processing n items from it.
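A minimal sketch of such a wrapper (the name run_one.sh is made up), which xargs then calls instead of program_a directly:
#!/bin/bash
# run_one.sh -- run program_a for a single parameter and capture its output
line="$1"
./program_a "$line" > "$line".log 2>&1
The invocation then becomes: xargs -P 4 -n 1 ./run_one.sh < parameter_file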
With GNU Parallel you can get a logfile for each parameter and run one job per CPU core:
parallel --results logdir ./program_a :::: parameter_file
Watch the intro video for a quick introduction:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

How to change the queue name of all the scheduled jobs in Linux?

I have an application where jobs are fired sequentially with a set priority. The run is now midway through, and there are still a lot of jobs pending in the queue. I want to change the queue of all the scheduled jobs to a queue that is not heavily loaded (I know which one to use). Doing bmod -q <queue_name> <job_id> one job at a time is not practical, as there are hundreds of jobs in line.
Can anyone suggest a quick way to do this? I tried something like this:
bjobs | cut -d' ' -f1 | sed '1d' | xargs bmod -q high
But this is not working. Any suggestions?
