Modifying an already submitted Slurm batch script

A job was submitted to the Slurm queue using a job script. Now, if I want to update the location of the executable mentioned in one of the lines of the script, is that possible?
scontrol update job ..... does not have an option to update lines in the script.

The job scripts are located at the path configured as StateSaveLocation in the Slurm configuration, but it is probably not a good idea to modify them there; you will most likely break things. And if you are not root on the cluster, you cannot do that anyway.
What you can do is create a symbolic link at the place where your script expects the executable, targeting the actual executable you want to run.
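For example, assuming your script calls /home/user/bin/my_app and the binary you actually want to run now lives at /home/user/build/new/my_app (both paths are made up for illustration), something like this would do it:
# keep the old binary around, then point a symlink at the one you want the queued job to run
mv /home/user/bin/my_app /home/user/bin/my_app.orig
ln -s /home/user/build/new/my_app /home/user/bin/my_app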

Related

Do submitted jobs take a copy of the source? Queued jobs?

When submitting jobs with sbatch, is a copy of my executable taken to the compute node? Or does it just execute the file from /home/user/? Sometimes, when I am unorganised, I will submit a job, then change the source and re-compile to submit another job. This does not seem like a good idea, especially if the job is still in the queue. At the same time it seems like it should be allowed, and it would be much safer if, at the moment of calling sbatch, a copy of the source were made.
I ran some tests which confirmed (unsurprisingly) that once a job is running, recompiling the source code has no effect. But when the job is in the queue, I am not sure. It is difficult to test.
edit: man sbatch does not seem to give much insight, other than to say that the job is submitted to the Slurm controller "immediately".
The sbatch command creates a copy of the submission script and a snapshot of the environment, and saves them in the directory listed as the StateSaveLocation configuration parameter. The script can therefore be changed after submission without any effect on the queued job.
But that is not the case for the files used by the submission script. If your submission script starts an executable, it will see the "version" of the executable at the time it starts.
Modifying the program before it starts will lead to the new version being run; modifying it during the run (i.e. after it has already been read from disk and loaded into memory) will lead to the old version being run.
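If you want the executable to be "frozen" at submission time the way the script is, one option is to copy it somewhere unique before calling sbatch and run that copy; a rough sketch, where my_prog, job.sh and the PROG variable are only example names:
# stage the binary in a unique directory so later recompilations do not affect the queued job
STAGEDIR=$HOME/staged_$(date +%s)
mkdir -p "$STAGEDIR"
cp ./my_prog "$STAGEDIR/"
sbatch --export=ALL,PROG="$STAGEDIR/my_prog" job.sh
# inside job.sh, call "$PROG" instead of ./my_prog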

SGE: run an interactive session (qrsh) within a batch submission (qsub)

I am trying to run a simple code within SGE. However, I get very different results from the same code when running it in an interactive session (qrsh) versus via qsub. Most of the time the codes fail to run via qsub (without any warning or error).
Is there any way to set up an interactive session within a batch submission (running qrsh within qsub)?
qsub test.sh
-V
-cwd
source MYPATH
qrsh
MYCODEHERE
Not sure if what you ask is possible. I can think of two reasons why you are observing different results:
1) Environment differences between cluster nodes.
2) Incomplete outputs: maybe the code runs into an edge case (not enough memory, etc.) and exits silently.
Not exactly what you asked for, but just trying to help.
You could submit a parallel job and then use qrsh -inherit <hostname> <command> to run a command under qrsh. Unfortunately, Grid Engine limits the number of times you can call qrsh -inherit to either the number of slots in the job or one less (depending on the job_is_first_task setting of the PE).
However, it is likely that the problems are caused by a difference between the qrsh environment and the one provided by qsub by default. If you are selecting the shell to interpret your job script in the traditional Unix way (putting #!/bin/bash or similar as the first line of your job script), you could try adding -l to that line to make it a login shell (#!/bin/bash -l), which is likely closer to what you get with qrsh.
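To illustrate the login-shell suggestion (MYPATH and MYCODEHERE are the placeholders from the question; the #$ lines are just the usual way of embedding qsub options in the script), the job script could look like this:
#!/bin/bash -l
#$ -V
#$ -cwd
# the -l above makes bash a login shell, so it reads the same startup
# files (e.g. ~/.bash_profile) that an interactive session would
source MYPATH
MYCODEHERE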

How to schedule a job to run only once in Unix

I am building a Unix package which contains a script; according to the client requirements, the script should run only once (not on a daily basis). I am not sure how this would work.
I mean, do I need to schedule it? If yes, I could not find any cron command for that.
If no, then how will the script get executed when the package is being installed?
Cron is used for repetitive executions of the same command or script. If you are looking for a way to schedule a script to run only once, I would suggest you use the at command.
Schedule the script to run 10 minutes from now:
at now +10 minutes
sh <path_to_my_script>
Then press Ctrl+D to save.
Instead of minutes you can use hours, days, weeks and so on. For reference: https://linux.die.net/man/1/at
List queued jobs:
atq
To remove a given scheduled job:
atrm <job_id>
Note: please ensure you have privileges to run the scheduled script, or run it with sudo.
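If you prefer not to type into the interactive at prompt, you can also pipe the command in from a script (the path here is just an example):
echo "sh /path/to/my_script.sh" | at now + 10 minutes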
There are two general ways to schedule a task in Unix/Linux:
cron is useful for repeatable tasks.
at is useful for single tasks.
So, in your case, I'd use at to schedule your task.
If it only needs to run once, when the package is installed, you do not need cron at all; just have the user execute it. Let's say you instruct the user to execute something like ./install, include the script in the package, and that's it.
But as someone previously said, first define what "once" means.
Regards.
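If "once" means once per machine no matter how the script ends up being started, one common pattern is to guard it with a marker file; a minimal sketch with made-up paths:
#!/bin/sh
# install: run the real work only the first time this script is executed
MARKER=/var/lib/mypackage/.installed
if [ -e "$MARKER" ]; then
    echo "already ran once, nothing to do"
    exit 0
fi
/path/to/actual_task.sh          # the script that must run only once
mkdir -p "$(dirname "$MARKER")"
touch "$MARKER"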

Will conflicts arise between larger and smaller files in CRON job?

I have set up a cron job for a file to run every 6 hours. The file may run for 4 hours.
If I set up cron for another file, will it affect the previous one, which may run for 4 hours?
No. If the jobs are not working on the same resources, they won't conflict even if they run simultaneously.
The cron daemon doesn't check whether anything else by the same name is running, if that is what you mean, so cron will not care. However, if your script creates temporary files, for example, without using helper tools like mktemp, they could conflict with each other - so that will depend on how well written your script is.
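As an illustration, a cron script using mktemp gives each run its own scratch file, so overlapping runs cannot clobber each other (my_command and process_results are placeholders for the real work):
#!/bin/sh
# create a unique temporary file per run instead of a fixed name like /tmp/work.dat
TMPFILE=$(mktemp /tmp/myjob.XXXXXX)
trap 'rm -f "$TMPFILE"' EXIT      # clean up when the script exits
my_command > "$TMPFILE"           # placeholder for the real work
process_results "$TMPFILE"        # placeholder for whatever consumes the file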

Running a script within cron job

We have an hourly.sh script that calls an abc.py script.
1. When I run the abc.py script independently, it runs fine.
2. When I run an empty hourly.sh (without the abc.py call inside), it runs fine too.
But when hourly.sh is run with abc.py inside, it hits memory-related issues ("16214 Segmentation fault (core dumped)"). Just to provide an additional data point, there is no other script running at the same time as this one that could put extra load on the system.
What could cause a script to fail when triggered via cron?
There's always the possibility that differences in the runtime environment cause problems. Take a look at the process limits (number of open files, etc.), which you can inspect using the ulimit command.
Also take a look at quotas for the user the cron job runs as, and maybe at the PATH environment variable.
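One quick way to compare the two environments is to let cron dump its limits and variables once and diff that against your interactive shell (file names are arbitrary):
# temporary crontab entry that records the environment cron actually provides
* * * * * { date; ulimit -a; env; } > /tmp/cron_env.txt 2>&1
# then, from your normal interactive shell:
{ date; ulimit -a; env; } > /tmp/shell_env.txt 2>&1
diff /tmp/cron_env.txt /tmp/shell_env.txt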
