How can I read the PBS launch script of a job that is running?

I am using TORQUE 4.2.5 to schedule my jobs, and I need a way to save a copy of the PBS launch script used for jobs that are currently queued or running. The plan is to place that copy in the job's output folder.

TORQUE has a job logging feature that can be configured to record job scripts at submission time.
EDIT: if you have administrator privileges and want to read the stored file, you can inspect TORQUE_HOME/server_priv/jobid.SC.
TORQUE_HOME is usually /var/spool/torque, but it is configurable.
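For example, a minimal sketch of both steps (the record_job_info/record_job_script qmgr parameters and the exact script path are assumptions to verify against your TORQUE version's documentation):

    # Admin side: ask pbs_server to keep a copy of each submitted job script.
    # record_job_info must be enabled for record_job_script to take effect.
    qmgr -c "set server record_job_info = True"
    qmgr -c "set server record_job_script = True"

    # Read the stored script for a given job (requires admin privileges);
    # replace <jobid> with the numeric job id, e.g. 12345.
    cat /var/spool/torque/server_priv/<jobid>.SC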

Related

Run simple shell commands using Spring Boot

I am new to Spring Boot.
Scenario: I want to run a scheduled batch job in Spring Boot that runs every 5 minutes and executes certain commands on the Linux server. The output of the commands will be stored in a plain CSV file on the server itself for further processing.
I am stuck here, so any help would be appreciated. Thanks!

Recover Slurm job submission script from an old job?

I accidentally removed a job submission script for a Slurm job in the terminal using the rm command. As far as I know there is no (relatively easy) way of recovering that file, and I hadn't saved it anywhere. I have used that job submission script many times before, so there are many Slurm job submissions (all of them finished) that used it. Is it possible to recover the job script from an old finished job somehow?
If Slurm is configured with the ElasticSearch plugin, then you will find the submission script of every completed job in the ElasticSearch instance used in the setup.
Another option, for future jobs, is to install sarchive, which saves a copy of each job script as it is submitted.
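As a rough sketch of what that lookup could look like (the host, port, index name, and field names here are placeholders, since they depend on how the jobcomp/elasticsearch plugin was configured at your site):

    # Query the ElasticSearch instance that Slurm's job completion plugin
    # writes to; completed-job documents include the submission script.
    curl -s 'http://elasticsearch.example.com:9200/slurm/_search?q=job_id:123456&pretty'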

Launching a parallel bsub job in a ClearCase environment

ClearCase does not work in conjunction with LSF distributed multi-host parallel jobs when more than one host is specified.
Reason: ClearCase does not mount its file system on all hosts when multi-host simulations are dispatched to the LSF system,
so the job is terminated because included files are not found, or output cannot be written, on hosts where the file system does not exist.
The ClearCase + LSF integration has to guarantee by construction that the job is dispatched correctly in 100% of all cases, which is currently not the case.
Please help me with this issue.
The LSF/ClearCase integration uses the daemon.wrap program to set the view on the execution host and then launch the job inside the view. That wrapper doesn't support cross-host parallel jobs.
You'll have to work around the limitation in your job script. You can disable the daemon wrapper by making sure $CLEARCASE_ROOT is not set in your job submission environment. Then, in the execution environment, the job script (and each process participating in the parallel job) can call cleartool setview <options> <real job command>.
If you launch your job with blaunch then it might make things easier. Without blaunch, LSF will start a single process on the first execution host. With blaunch, LSF will launch one process per slot, and launch it on all of the allocated execution hosts. With blaunch, each process can then set the view and start the real job.
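A rough sketch of that approach (the bsub options, view tag, and job command are placeholders, and the cleartool setview -exec syntax should be checked against your ClearCase version):

    # Submission shell: disable the daemon wrapper by making sure
    # CLEARCASE_ROOT is not set, then submit the parallel job.
    env -u CLEARCASE_ROOT bsub -n 8 ./parallel_job.sh

    # Inside parallel_job.sh: blaunch starts one process per allocated slot
    # on all execution hosts; each process sets the view and runs the job.
    blaunch cleartool setview -exec "/path/to/real_job_command" my_view_tag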
Good luck!

Node.js and system cron jobs

I am using node-cron to schedule some tasks inside my node app. This package has an API to create, start, and stop cron jobs. However, I can't find these cron jobs when I run the crontab -l command in my OS terminal. I tried this on both macOS and CentOS.
Specific questions:
1. Does such a node package create cron jobs at the OS level?
2. If the answer to 1 is yes, will these cron jobs execute regardless of whether my node app is running?
3. If the answer to 2 is yes, how do I stop and clear out all such scheduled cron jobs?
From a quick look at the node-cron source code, you can see that node-cron does not create any cron jobs at the OS level.
It is essentially a long-timeout mechanism built on JavaScript timers inside the node process.
So if the node process is restarted, the scheduled cron jobs are lost.
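You can confirm this from the shell: with the node app running its node-cron schedules, the user crontab is unchanged (assuming you had no cron entries to begin with):

    # Lists the OS-level cron jobs for the current user; node-cron
    # schedules never appear here because they live inside the node process.
    crontab -l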

PBS automatically restart failed jobs

I use PBS job arrays to submit a number of jobs. Sometimes a small number of jobs fail and do not run successfully. Is there a way to automatically detect the failed jobs and restart them?
pbs_server supports automatic_requeue_exit_code:
an exit code, defined by the admin, that tells pbs_server to requeue the job instead of considering it completed. This allows the user to add checks that the job can run meaningfully; if it cannot, the job script exits with the specified code and is requeued.
There is also a provision for requeuing jobs in the case where the prologue fails (see the prologue/epilogue script documentation).
There are probably more sophisticated ways of doing this, but they would fall outside the realm of built-in Torque options.
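A minimal sketch of how the two pieces fit together (the exit code 99, the health check, and the file/command names are placeholders to validate against your Torque version):

    # Admin side: tell pbs_server which exit code should trigger a requeue.
    qmgr -c "set server automatic_requeue_exit_code = 99"

    # Job script side: run a sanity check before the real work and exit
    # with the requeue code if the job cannot run meaningfully yet.
    if [ ! -r "$INPUT_FILE" ]; then
        exit 99   # pbs_server requeues the job instead of completing it
    fi
    ./real_work "$INPUT_FILE"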
