Why is the status of PBS job with dependency 'Hold', not 'Queue'? - pbs

I used following command to submit my dependent job.
qsub current_job_file -W depend=afterany:previous_job_id
Then I find out my current job is under status 'H'. And it won't automatically run after the previous job finished. Is it how it suppose to be or I made a mistake somewhere? How can I let it run automatically after the previous job finish?
I also tried the following command. The result is the same.
qsub -W depend=afterany:previous_job_id current_job_file

That is how it is supposed to be. If your current job is dependent on another job and the dependency is "after" then it will be held until the other job finishes (or starts depending on what kind of dependency you have used. In your case it is "any" so it will wait for the other job to finish) and then move your current job to "Q" (queued) state for PBS scheduler to consider the job for running.

Related

Do submitted jobs take a copy the source? Queued jobs?

When submitting jobs with sbatch, is a copy of my executable taken to the compute node? Or does it just execute the file from /home/user/? Sometimes when I am unorganised I will submit a job, then change the source and re-compile to submit another job. This does not seem like a good idea, especially if the job is still in the queue. At the same time it seems like it should be allowed, and it would be much safer if at the moment of calling sbatch a copy of the source was made.
I ran some tests which confirmed (unsurprisingly) that once a job is running, recompiling the source code has no effect. But when the job is in the queue, I am not sure. It is difficult to test.
edit: man sbatch does not seem to give much insight, other than to say that the job is submitted to the Slurm controller "immediately".
The sbatch command creates a copy of the submission script and a snapshot of the environment and saves it in the directory listed as the StateSaveLocation configuration parameter. It can therefore be changed after submission without effect.
But that is not the case for the files used in the submission script. If your submission script starts an executable, if will see the "version" of the executable at the time it starts.
Modifying the program before it starts will lead to the new version being run, modifying it during the run (i.e. while it has already been read from disk and saved into memory) will lead to the old version being run.

Automatic qsub job completion status notification

I have a shell script that calls five other scripts from it. The first script creates 50 qsub jobs in the cluster. Individual job execution time varies from a couple of minutes to an hour. I need to know when all the 50 jobs get finished because after completing all the jobs I need to run the second script. How to find whether all the qsub jobs are completed or not? One possible solution can be using an infinite loop and check job status by using qstate command with job ID. In this case, I need to check the job status continuously. It is not an excellent solution. Is it possible that after execution, qsub job will notify me by itself. Hence, I don't need to monitor frequently job status.
qsub is capable of handling job dependencies, using -W depend=afterok:jobid.
e.g.
#!/bin/bash
# commands to run on the cluster
COMMANDS="script1.sh script2.sh script3.sh"
# intiliaze JOBID variable
JOBIDS=""
# queue all commands
for CMD in $COMMANDS; do
# queue command and store the job id
JOBIDS="$JOBIDS:`qsub $CMD`"
done
# queue post processing, depended on the submitted jobs
qsub -W depend=afterok:$JOBIDS postprocessing.sh
exit 0
More examples can be found here http://beige.ucs.indiana.edu/I590/node45.html
I never heard about how to do that, and I would be really interested if someone came with a good answer.
In the meanwhile, I suggest that you use file tricks. Either your script outputs a file at the end, or you check for the existence of the log files (assuming they are created only at the end).
while [ ! -e ~/logs/myscript.log-1 ]; do
sleep 30;
done

PBS/Torque - Couldn't delete completed job status information

The command 'qstat -a' outputs lots of lines of information for completed jobs all with status 'C'. It seems that they will stay forever. How to cleanup these unneeded job information since those jobs are already 'completed'? Thanks!
This is controlled by the qmgr parameter keep_completed. keep_completed specifies a number of seconds after completion a job should continue to be visible. If you would like to immediately delete a job without waiting this amount of time, you can execute
qdel -p <jobid>
Type qstat -r to get only the running jobs

Force qsub (PBS) to wait the job's end before exiting

I've been using Sun Grid Engine to run my jobs on a node of a cluster.
Usually I would wait for the job to complete before exiting and I use:
qsub -sync yes perl Script.pl
However now I don't use anymore Sun Grid Engine but PBS Pro 10.4
I'm not able to find a corresponding instruction to -sync.
Could someone help me?
Thanks in advance
PBSPro doesn't have a -sync equivalent but you might be able to use the
-I option combined with the use of expect to tell it what code to run in order to get the same effect.
The equivalent of -sync for PBS is -Wblock=true.
This prevents qsub from exiting until the job has completed. It is perhaps unusual to need this, but I found it useful when using some software that was not designed for HPC. The software executes multiple instances of a worker program, which run simultaneously. However, it then has to wait for one (or sometimes more) of the instances to complete, and do some work with the results, before spawning the next. If the worker program completes without writing a particular file, it is assumed to have failed. I was able to write a wrapper script for the worker program, to qsub it, and used the -Wblock=true option to make it wait for the worker program job to complete.

How to handle overtime crons

Suppose if i have cron tasks running every minute. And if each time, that task takes more than one minute to run, what will happen. Will the next cron wait for the first cron or will it run without any checks.
I want to run a cron task every minute and I don't over lapping cron tasks like that in case of a long running task/situation.
please help.
It depends on what you run. If it's your own script, you can implement a locking/lock checking mechanism to avoid running duplicates.
But that's not cron's job.
Yes, cron will go ahead and start your 1+ minute-running process every minute until something crashes.
You'll want to put a lock of some sort into your job if you can to basically do this at start-up:
if not get_lock()
print "Another process is running"
exit
This, of course, assumes that you own the code running. If you're running a command that you didn't code, then I'd recommend building a shell wrapper that implements the above pseudocoded logic where get_lock() will see if another process like this one is running.
As others have mentioned, CRON will run your script every minute regardless of whether another instance of your script is still running.
If you want to avoid this and don't fancy implementing your own locking mechanism then you could try using a CRON alternative called The Fat Controller which is a daemon that will continually re-run scripts. You can optionally specify an interval between runs and also optionally specify a maximum execution time so if a script goes AWOL then it can be killed.
There's some use cases and more information on the website:
http://fat-controller.sourceforge.net/

Resources