Is the `after_script` always executed, even for cancelled jobs? - gitlab

The documentation isn't clear on whether the after_script is executed for cancelled jobs:
after_script is used to define the command that will be run after all jobs, including failed ones.
I'm doing potentially critical cleanup in the after_script and while cancelled jobs should be rare, I'd like to know that my clean up is guaranteed to happen.

No. I ran some tests and here are the behaviours I observed:
after_script:
  - echo "This is not executed when a job is cancelled."
  - echo "A failing command, like this one, doesn't fail the job." && false
  - echo "This is not executed because the previous command failed."
1. after_script is not executed when a job is cancelled
There's an open issue for this on gitlab.com, so if this is affecting you, head over there and make some noise.
2. If a command in the after_script fails, the rest aren't executed
This is quite easy to work around:
after_script:
  - potentially failing command || true
  - next command
Replace potentially failing command with your command and the next command will execute regardless of whether potentially failing command passed or failed.
One could argue that this behaviour is actually desired, as it gives some flexibility to the user, but it might be counterintuitive to some.

Related

Gitlab: Fail job in "after_script"?

Consider this .gitlab-ci.yml:
variables:
  var1: "bob"
  var2: "bib"

job1:
  script:
    - "[[ ${var1} == ${var2} ]]"

job2:
  script:
    - echo "hello"
  after_script:
    - "[[ ${var1} == ${var2} ]]"
In this example, job1 fails as expected, but job2 succeeds, which I find incomprehensible. Can I force a job to fail in the after_script section?
Note: exit 1 has the same effect as "[[ ${var1} == ${var2} ]]".
The status of a job is determined solely by its script:/before_script: sections (the two are simply concatenated together to form the job script).
after_script: is a completely different construct -- it is not part of the job script. It is mainly for taking actions after a job is completed. after_script: runs even when jobs fail beforehand, for example.
Per the docs: (emphasis added on the last bullet)
Scripts you specify in after_script execute in a new shell, separate from any before_script or script commands. As a result, they:
- Have the current working directory set back to the default (according to the variables which define how the runner processes Git requests).
- Don't have access to changes done by commands defined in the before_script or script, including:
  - Command aliases and variables exported in script scripts.
  - Changes outside of the working tree (depending on the runner executor), like software installed by a before_script or script script.
- Have a separate timeout, which is hard-coded to 5 minutes.
- Don't affect the job's exit code. If the script section succeeds and the after_script times out or fails, the job exits with code 0 (Job Succeeded).
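Given that, if the goal is for the failing check to decide the job's outcome, one option is to run it as the last command of script: instead of after_script:. A minimal sketch, reusing the variables from the example above (this is a workaround, not an officially documented pattern):

job2:
  script:
    - echo "hello"
    # run the check as the final script: command; unlike in after_script:,
    # its exit status now determines whether the job passes or fails
    - "[[ ${var1} == ${var2} ]]"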

How would you check if SLURM or MOAB/Torque is available on an environment?

The title kind of says it all. I'm looking for a command-line test to check whether either SLURM or MOAB/Torque is available for submitting jobs to.
My thought is to check whether the command qstat finishes with exit code zero, or whether squeue finishes with exit code zero. Would this be the best way of doing it?
One of the most lightweight ways to do that is to test for the presence of sbatch, for instance with
which sbatch
The which command exits with a 0 exit code if the command is found in the PATH.
Make sure to test in the right order, as, for instance, a Slurm cluster could have a qsub command available to emulate PBS or Torque.
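Putting that together, a minimal sketch of the detection order described above (using command -v as a POSIX-friendly stand-in for which; the echoed labels are just placeholders):

#!/bin/sh
# Probe for each scheduler's submit command, Slurm first, since a Slurm
# cluster may provide qsub/qstat wrappers that emulate PBS/Torque.
if command -v sbatch > /dev/null 2>&1; then
    echo "slurm"
elif command -v qsub > /dev/null 2>&1; then
    echo "torque/pbs"
else
    echo "no supported scheduler found" >&2
    exit 1
fi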

Concurrency with shell scripts in failure-prone environments

Good morning all,
I am trying to implement concurrency in a very specific environment, and keep getting stuck. Maybe you can help me.
This is the situation:
- I have N nodes that can read/write in a shared folder.
- I want to execute an application on one of them. This can be anything, like a shell script, an installed program, or whatever.
- To do so, I have to send the same command to all of them. The first one should start the execution, and the rest should see that somebody else is running the desired application and exit.
- The execution of the application can be killed at any time. This is important because it means I cannot rely on any cleanup step after the execution.
- If the application gets killed, the user may want to execute it again. He would then send the very same command.
My current approach is to create a shell script that wraps the command to be executed. This could also be implemented in C, but not in Python or other languages, to avoid library dependencies.
#!/bin/sh
# (folder structure simplified for legibility)
mutex(){
    lockdir=".lock"
    firstTask=1 #false
    if mkdir "$lockdir" > /dev/null 2>&1
    then
        controlFile="controlFile"
        # if this is the first node, start the coordinator
        if [ ! -f "$controlFile" ]; then
            firstTask=0 #true
            # tell the rest of the nodes that I am in control
            echo "some info" > "$controlFile"
        fi
        # remove the control file when the script finishes
        trap 'rm $controlFile' EXIT
    fi
    return $firstTask
}

# The basic idea is that one task executes the desired command, stated as arguments to this script. The rest do nothing.
if ! mutex ;
then
    exit 0
fi

# I am the first node and the only one reaching this point, so I execute whatever was passed in
"$@"
If there are no failures, this wrapper works great. The problem is that, if the script is killed before the execution, the trap is not executed and the control file is not removed. Then, when we execute the wrapper again to restart the task, it won't work, because every node will think that somebody else is running the application.
A possible solution would be to remove the control file just before the "$@" call, but that would lead to a race condition.
Any suggestion or idea?
Thanks for your help.
edit: edited with correct solution as future reference
Your trap syntax looks wrong. According to POSIX, it should be:
trap [action condition ...]
e.g.:
trap 'rm $controlFile' HUP INT TERM
trap 'rm $controlFile' 1 2 15
Note that $controlFile will not be expanded until the trap is executed if you use single quotes.
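For reference, a minimal sketch of that trap usage applied to the control file above (this covers normal exits and the common termination signals, but a SIGKILL can never be trapped, so the stale-file problem remains in that case):

#!/bin/sh
controlFile="controlFile"
echo "some info" > "$controlFile"
# clean up on normal exit and on HUP/INT/TERM; because of the single quotes,
# $controlFile is expanded only when the trap actually fires
trap 'rm -f "$controlFile"' EXIT HUP INT TERM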

How to queue up a job

Is it possible to queue up a job that depends on a running job's output, so the new job waits until the running job terminates?
Hypothetical example: You should have run:
./configure && make
but you only ran:
./configure
and now you want to tell make to get on with it once configure (successfully) finishes, while you go do something useful like have a nap? The same scenario occurs with many other time-consuming jobs.
(The basic job control commands -- fg, bg, jobs, kill, &, ctrl-Z -- don't do this, as far as I know. The question arose on bash/Ubuntu, but I'd be interested in a general *nix solution, if it exists.)
I presume you're typing these commands at a shell prompt, and the ./configure command is still running.
./configure
# oops, forgot to type "make"
[ $? -eq 0 ] && make
The command [ $? -eq 0 ] will succeed if and only if the ./configure command succeeds.
(You could also use [ $? = 0 ], which does a string comparison rather than a numeric comparison.)
(If you're willing to assume that the ./configure command will succeed, you can just type make.)
Stealing and updating an idea from chepner's comment, another possibility is to suspend the job by typing Ctrl-Z, then put it in the background (bg), then:
wait %% && make
wait %% waits for "current" job, which is "the last job stopped while it was in the foreground or started in the background". This can be generalized to wait for any job by replacing %% by a different job specification.
You can simplify this to
wait && make
if you're sure you have no other background jobs (wait with no arguments waits for all background jobs to finish).
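Putting those steps together, the interactive sequence looks roughly like this (a sketch; the job specification may differ if you have other jobs running):

$ ./configure        # still running in the foreground
^Z                   # Ctrl-Z suspends it
$ bg                 # resume it in the background
$ wait %% && make    # make runs only if configure exits successfully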
Referring to the previous process return code with $?:
test $? -eq 0 && make
I'm not sure I understand your needs, but I often use batch(1) (from the atd package on Debian) to compile, with a here document like this:
batch << EOJ
make > _make.log 2>&1
EOJ
Of course, this only makes sense if your configure ran successfully and completely.
Then, in some terminal, I can follow the compilation with tail -f _make.log (provided I am in the right directory). You can get a coffee, or lunch, or sleep a whole night (and even log out) during the compilation.

How to handle error/exception in shell script?

Below is the script that I am executing in bash, and it works fine.
fileexist=0
for i in $( ls /data/read-only/clv/daily/Finished-HADOOP_EXPORT_&processDate#.done); do
    mv /data/read-only/clv/daily/Finished-HADOOP_EXPORT_&processDate#.done /data/read-only/clv/daily/archieve-wip/
    fileexist=1
done
Problem statement:
My above shell script has to be run daily via a cron job, and it doesn't have any error/exception handling mechanism. If anything goes wrong, I have no way of knowing what happened.
After the above script is executed, some other scripts depend on the data it provides, so I keep getting complaints from the people who depend on my script's data that something has gone wrong.
So is there any way I can get notified if anything goes wrong in my script? For example, if the cluster is under maintenance while my script runs, it will certainly fail, so can I be notified when my script fails, so that I know something went wrong?
Hope my question is clear enough.
Any thoughts will be appreciated.
You can check for the exit status of each command, as freetx answered, but this is manual error checking rather than exception handling. The standard way to get the equivalent of exception handling in sh is to start the script with set -e. That tells sh to exit with a non-zero status as soon as any executed command fails (i.e. exits with a non-zero exit status).
If it is intended for some command in such a script to (possibly) fail, you can use the construct COMMAND || true, which will force a zero exit status for that expression. For example:
#!/bin/sh
# if any of the following fails, the script fails
set -e
mkdir -p destdir/1/2
mv foo destdir/1/2
touch /done || true # allowed to fail
Another way to ensure that you are notified when things go wrong in a script invoked by cron is to adhere to the Unix convention of printing nothing unless an error occurred. Successful runs will then pass without notice, and unsuccessful runs will cause the cron daemon to notify you of the error via email. Note that local mail delivery must be correctly configured on your system for this to work.
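As a sketch of that setup, a crontab entry along these lines (the schedule, script path, and address are hypothetical, and local mail delivery is assumed to work) will only produce mail when the script prints something, such as an error message:

# mail any output (stdout/stderr) of the jobs below to this address
MAILTO=you@example.com
# run the export script every day at 02:00; if it stays silent on
# success, cron only sends mail when something goes wrong
0 2 * * * /path/to/daily_export.sh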
It's customary for every Unix command-line utility to return 0 upon success and non-zero on failure. Therefore you can use the $? pattern to display the last return value and handle things accordingly.
For instance:
$ ls
file1 file2
$ echo $?
0
$ ls file.no.exist
$ echo $?
1
Therefore, you can use this as rudimentary error detection to see if something goes wrong. So the normal approach would be
some_command
if [ $? -gt 0 ]
then
    handle_error here
fi
Well, if the other scripts are on the same machine, you could have them pgrep for this script; if it is found, sleep for a while and try again later, re-checking until the process is gone.
If the script is on another machine (or even local), another method is to produce a temp file on the remote machine, accessible over HTTP, whose status (i.e. running or complete) the other scripts can check.
You could also wrap the script in another one that looks for these errors and emails you if it finds any; otherwise it sends the result as usual to whoever needs it.
go=0;

function check_running() {
    running=`pgrep -f your_script.sh | wc -l`
    if [ $running -gt 1 ]; then
        echo "already running $0 -- instances found $running ";
        go=1;
    fi
}

check_running;
if [ $go -ge 1 ]; then
    execute your other script
else
    sleep 120;
    check_running;
fi
