qsub: What is the standard way to get occasional updates on a submitted job?

I have just begun using an HPC, and I'm having trouble adjusting my workflow.
I submit a job using qsub myjob.sh. Then I can view the status of the job by typing qstat -u myusername. This gives me some details about my job, such as how long it has been running for.
My job is a python script that occasionally prints an update to indicate how things are going. I know that this output will end up in the output file once the job is done, but how can I monitor the program while it runs? One way is to print the output to a file instead of printing to screen, but this seems like a bit of a hack.
Any other tips on improving this process would be great.
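For what it's worth, the workaround you mention usually looks something like the following; the file names here are only placeholders. Inside myjob.sh, run Python unbuffered and send the progress messages to a file of your choosing:
python -u myscript.py > progress.log 2>&1
Then, from a login node, you can follow the updates while the job runs:
tail -f progress.log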

Related

Do submitted jobs take a copy of the source? Queued jobs?

When submitting jobs with sbatch, is a copy of my executable taken to the compute node? Or does it just execute the file from /home/user/? Sometimes when I am unorganised I will submit a job, then change the source and re-compile to submit another job. This does not seem like a good idea, especially if the job is still in the queue. At the same time it seems like it should be allowed, and it would be much safer if at the moment of calling sbatch a copy of the source was made.
I ran some tests which confirmed (unsurprisingly) that once a job is running, recompiling the source code has no effect. But when the job is in the queue, I am not sure. It is difficult to test.
edit: man sbatch does not seem to give much insight, other than to say that the job is submitted to the Slurm controller "immediately".
The sbatch command creates a copy of the submission script and a snapshot of the environment and saves them in the directory given by the StateSaveLocation configuration parameter. The submission script can therefore be changed after submission without affecting the job.
That is not the case, however, for the files used by the submission script. If your submission script starts an executable, it will see the version of the executable that exists at the moment it starts.
Modifying the program before the job starts means the new version will run; modifying it while the job is running (i.e. after it has already been read from disk into memory) means the old version keeps running.
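If you want queued jobs to be immune to later recompiles, a common workaround (a sketch only; my_program, job.sh and the PROG variable are made-up names) is to snapshot the executable at submission time and point the batch script at the copy:
# snapshot the binary so recompiling later cannot affect this queued job
snapshot=my_program.$(date +%s)
cp my_program "$snapshot"
sbatch --export=ALL,PROG="$snapshot" job.sh
Inside job.sh you would then run "$PROG" rather than ./my_program.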

Running Cron Job Even When Not Required

I have a Cron Job scheduled to execute a command a few times a day. There are cases where the cron job isn't needed but will automatically run. If that happens the following error message shows:
PM2 [ERROR] Script already launched, add -f option to force re execution
Note: the cron job uses PM2 to run a script.
Is there any negative effect to having the cron job run even if the script is already running?
Please provide detailed information or references, not just your opinion.
Avoid the erroneous error messages by writing a wrapper script that is run from cron instead. Inside the wrapper, query the process table and only run your job if it is not already running.
Assuming ksh, here's a snippet (I'm a tad rusty so syntax may need to be tweaked):
# count matching processes; the [M] trick stops grep from matching itself
running=$(ps -ef | grep -c '[M]Y_PROGRAM')
if [[ "$running" -eq 0 ]]; then
    # not running yet: start your program
else
    # log that it is already running
fi
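The crontab entry would then call the wrapper instead of invoking PM2 directly, something like (the path is a placeholder):
# run the wrapper every six hours rather than calling pm2 directly
0 */6 * * * /home/user/bin/pm2_wrapper.sh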
Not sure what detailed information or references there could be for a situation like this. It's not like someone commissions a study to look at this.
Assuming your command is intelligent enough to only allow one execution at a time (which appears to be the case judging by the error message you posted) then the only ill effect is a few CPU clock cycles (I think).

Automatic Background Perl Execution on Ubuntu

I've been troubleshooting this issue for about a week and I am nowhere, so I wanted to reach out for some help.
I have a perl script that I execute via the command line, usually in a manner of
nohup ./script.pl --param arg --param2 arg2 &
I usually have about ten of these running at once to process the same type of data from different sources (that is specified through parameters). The script works fine and I can see logs for everything in nohup.out and monitor status via ps output. This script also uses a sql database to track status of various tasks, so I can track finishes of certain sources.
However, that was too much work, so I wrote a wrapper script to execute the script automatically and that is where I am running into problems. I want something exactly the same as I have, but automatic.
The getwork.pl script runs ps and parses the output to find out how many other processes are running; if the count is below the configured threshold, it queries the database for the most out-of-date source and kicks off the script.
The problem is that the kicked off jobs aren't running properly, sometimes they terminate without any error messages and sometimes they just hang and sit idle until I kill them.
The getwork script queries SQL and builds the entire execution command via string concatenation, so in the query I am doing something like CONCAT('nohup ./script.pl --arg ',param1,' --arg2 ',param2,' &') to get the command string.
I've tried everything to get these kicked off. I've tried using system(), but again, some jobs kick off, some don't, sometimes it gets stuck, and sometimes jobs start and then die within a minute. If I take the exact command used to start the job and run it in bash, it works fine.
I've tried to also open a pipe to the command like
open my $ca, "| $command" or die ($!);
print $ca $command;
close $ca;
That works just about as well as everything else I've tried. The getwork script used to be executed through cron every 30 minutes, but I scrapped that because I needed another shell wrapper script, so now there is an infinite loop in the getwork script that executes a function every 30 minutes.
I've also tried many variations of the execution command, including redirecting output to different files, etc... nothing seems to be consistent. Any help would be much appreciated, because I am truly stuck here....
EDIT:
Also, I've tried adding separate logging within each script, so that each would start a new log file named with its PID ($$). There was a bunch of weirdness there too: all the log files would get created, but then some of the processes would be running and writing to their file, others would just leave an empty text file, and some would have only one or two log entries. Sometimes the process would still be running and just not doing anything; other times it would die with nothing in the log. Running the command in the shell directly always works, though.
Thanks in advance
You need some kind of job-management framework.
One of the biggest is Gearman: http://www.slideshare.net/andy.sh/gearman-and-perl
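To give a feel for the model (this is only a sketch based on the stock examples for the Gearman command-line tool, so treat the exact flags as an assumption): a worker registers a named function with the job server, and clients submit work to that name instead of spawning processes themselves.
# register a worker that runs "wc -l" for the function named "wc"
gearman -w -f wc -- wc -l
# submit a job to that function from anywhere
gearman -f wc < /etc/passwd
In your case the worker would wrap script.pl, and getwork.pl would just submit jobs rather than building nohup command strings.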

Force qsub (PBS) to wait the job's end before exiting

I've been using Sun Grid Engine to run my jobs on a node of a cluster.
Usually I make qsub wait for the job to complete before exiting, using:
qsub -sync yes perl Script.pl
However, I no longer use Sun Grid Engine but PBS Pro 10.4, and I'm not able to find a corresponding option to -sync.
Could someone help me?
Thanks in advance
PBS Pro doesn't have a -sync equivalent, but you might be able to use the -I (interactive) option combined with expect to tell it what code to run and get the same effect.
The equivalent of -sync for PBS is -Wblock=true.
This prevents qsub from exiting until the job has completed. It is perhaps unusual to need this, but I found it useful when using some software that was not designed for HPC. The software executes multiple instances of a worker program, which run simultaneously. However, it then has to wait for one (or sometimes more) of the instances to complete, and do some work with the results, before spawning the next. If the worker program completes without writing a particular file, it is assumed to have failed. I was able to write a wrapper script for the worker program, to qsub it, and used the -Wblock=true option to make it wait for the worker program job to complete.
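As a rough sketch of that wrapper (the script name and the results-file convention are made up for illustration), the blocking submit looks like:
#!/bin/sh
# submit the worker job and block until it has completed
qsub -Wblock=true worker_job.sh
# then decide success or failure from the file the worker is expected to write
if [ ! -f results.out ]; then
    echo "worker failed: results.out was not produced" >&2
    exit 1
fi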

How can I know that a cron job has started or finished?

I have put a long running python program in a cron job on a server, so that I can turn off my computer without interrupting the job.
Now I would like to know whether the job has started correctly, whether it has finished, whether it stopped at some point for some reason, and so on. How can I do that?
You could have it write to a logfile but, as it sounds like this isn't possible, you could probably have cron email you the output of the job: try adding MAILTO=you@example.com to your crontab. You should also find evidence of cron activity in your system logfiles (try grep cron /var/log/* to find likely logs on your system).
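Concretely, a crontab along these lines covers both suggestions (the address, schedule and paths are placeholders):
MAILTO=you@example.com
# cron mails you anything the job prints on stdout or stderr
0 2 * * * /home/user/long_job.py
# alternatively, append the output to a log file you can check later
# 0 2 * * * /home/user/long_job.py >> /home/user/long_job.log 2>&1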
If you are using cron simply as a way to run processes after you disconnect from a server, consider using screen:
type screen and press return
set your script running
type Ctrl+A Ctrl+D to detach from the screen
The process continues running even if you log off. Later on, simply run
screen -r
and you will be reattached, allowing you to review the script's output.
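Putting those steps together, a typical session might look like this (the session name is arbitrary):
screen -S myjob            # start a named session
./long_running_script.py   # set your script running inside it
# press Ctrl+A then D to detach; you can now log out safely
screen -r myjob            # later, reattach and review the output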
Why not have that cron job write to a log file? Also, just do a ps before shutdown.
