Recover after "kill 0" - linux

I have a script that invokes kill 0. I want to invoke that script from another script, and have the outer script continue to execute. (kill 0 sends a signal, defaulting to SIGTERM, to every process in the process group of the calling process; see man 2 kill.)
kill0.sh:
#!/bin/sh
kill 0
caller.sh:
#!/bin/sh
echo BEFORE
./kill0.sh
echo AFTER
The current behavior is:
$ ./caller.sh
BEFORE
Terminated
$
How can I modify caller.sh so it prints AFTER after invoking kill0.sh?
Modifying kill0.sh is not an option. Assume that kill0.sh might read from stdin and write to stdout and/or stderr before invoking kill 0, and I don't want to interfere with that. I still want the kill 0 command to kill the kill0.sh process itself; I just don't want it to kill the caller as well.
I'm using Ubuntu 16.10 x86_64, and /bin/sh is a symlink to dash. That shouldn't matter, and I prefer answers that don't depend on that.
This is of course a simplified version of a larger set of scripts, so I'm at some risk of having an XY problem, but I think that a solution to the problem as stated here should let me solve the actual problem. (I have a wrapper script that invokes a specified command, capturing and displaying its output, with some other bells and whistles.)

One solution
You need to trap the signal in the parent, but enable it in the child. So a script like run-kill0.sh could be:
#!/bin/sh
echo BEFORE
trap '' TERM
(trap 15; exec ./kill0.sh)
echo AFTER
The first trap makes the parent shell ignore SIGTERM; that "ignore" disposition is inherited by the sub-shell and would be inherited by kill0.sh as well. The second trap in the sub-shell therefore resets the signal to its default disposition (using the signal number instead of the name, see below) before running the kill0.sh script, so kill0.sh can still be killed by its own kill 0. Using exec is a minor optimization; you can omit it and it will work the same.
Digression on obscure syntactic details
Why 15 instead of TERM in the sub-shell? Because when I tested it with TERM instead of 15, I got:
$ sh -x run-kill0.sh
+ echo BEFORE
BEFORE
+ trap '' TERM
+ trap TERM
trap: usage: trap [-lp] [arg signal_spec ...]
+ echo AFTER
AFTER
$
When I used 15 in place of TERM (twice), I got:
$ sh -x run-kill0.sh
+ echo BEFORE
BEFORE
+ trap '' 15
+ trap 15
+ exec ./kill0.sh
Terminated: 15
+ echo AFTER
AFTER
$
Using TERM in place of the first 15 would also work.
Bash documentation on trap
Studying the Bash manual for trap shows:
trap [-lp] [arg] [sigspec …]
The commands in arg are to be read and executed when the shell receives signal sigspec. If arg is absent (and there is a single sigspec) or equal to ‘-’, each specified signal’s disposition is reset to the value it had when the shell was started.
A second solution
The second sentence is the key: trap - TERM should (and empirically does) work.
#!/bin/sh
echo BEFORE
trap '' TERM
(trap - TERM; exec ./kill0.sh)
echo AFTER
Running that yields:
$ sh -x run-kill0.sh
+ echo BEFORE
BEFORE
+ trap '' TERM
+ trap - TERM
+ exec ./kill0.sh
Terminated: 15
+ echo AFTER
AFTER
$
I've just re-remembered why I use numbers and not names (but my excuse is that the shell — it wasn't Bash in those days — didn't recognize signal names when I learned it).
POSIX documentation for trap
However, in Bash's defense, the POSIX spec for trap says:
If the first operand is an unsigned decimal integer, the shell shall treat all operands as conditions, and shall reset each condition to the default value. Otherwise, if there are operands, the first is treated as an action and the remaining as conditions.
If action is '-', the shell shall reset each condition to the default value. If action is null ( "" ), the shell shall ignore each specified condition if it arises.
This is clearer than the Bash documentation, IMO. It states why trap 15 works. There's also a minor glitch in the presentation. The synopsis says (on one line):
trap n [condition...]trap [action condition...]
It should say (on two lines):
trap n [condition...]
trap [action condition...]
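As a sanity check of those two rules (ignore versus reset), here is a minimal experiment of mine, run from an interactive shell; the "Terminated" notice is printed by the calling shell and its exact wording may vary on your system:
$ sh -c 'trap "" TERM; kill -TERM $$; echo AFTER'
AFTER
$ sh -c 'trap - TERM; kill -TERM $$; echo AFTER'
Terminated
With the signal ignored, the shell survives its own kill and reaches the echo; with the default disposition restored, it dies before printing anything.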


Arithmetic expression in redirection

What is the difference between these two:
cnt=1
head -n $((++cnt)) /etc/passwd >/dev/null
echo $cnt # prints 2
and
cnt=1
date >$((++cnt)).txt # creates file "2.txt"
echo $cnt # prints 1
My question is why in the second example 1 is printed.
Note:
cnt=1
(cnt=5)
echo $cnt # prints 1
I know why this will print 1. Is the redirection executed in a subshell too? If yes, where is that described?
I don't have a concrete citation for why this behavior exists, but going off the notes in SC2257* there are some interesting points to note in the manual.
When a simple command other than a builtin or shell function is to be executed, it is invoked in a separate execution environment
§3.7.3 Command Execution Environment
This reflects what SC2257 notes, though it is not explicit about which environment the redirection's target word is evaluated in. However §3.1.1 Shell Operation seems to say that redirection happens before this execution (sub)environment is invoked:
Basically, the shell does the following:
...
Performs the various shell expansions....
Performs any necessary redirections and removes the redirection operators and their operands from the argument list.
Executes the command.
We can see that this isn't limited to arithmetic expansion; it also affects other state-changing expansions such as :=:
$ bash -c 'date >"${word:=wow}.txt"; echo "word=${word}"'
word=
$ bash -c 'echo >"${word:=wow}.txt"; echo "word=${word}"'
word=wow
Interestingly, this does not appear to be a (well-defined) subshell environment, because BASH_SUBSHELL remains set to 0:
$ date >"${word:=$BASH_SUBSHELL}.txt"; ls
0.txt
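We can probe a little further with BASHPID, which (unlike $$) reports the PID of the process actually performing the expansion. In my test the redirection target ended up named after a PID different from the shell's own, which is consistent with the expansion happening in a forked child; the exact PIDs below are only illustrative:
$ echo $$
12345
$ date >"pid-${BASHPID}.txt"; ls pid-*.txt
pid-12401.txt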
We can also check some other shells, and see that zsh has the same behavior, though dash does not:
$ zsh -c 'date >"${word:=wow}.txt"; echo "word=${word}"'
word=
$ zsh -c 'echo >"${word:=wow}.txt"; echo "word=${word}"'
word=wow
$ dash -c 'date >"${word:=wow}.txt"; echo "word=${word}"'
word=wow
$ dash -c 'echo >"${word:=wow}.txt"; echo "word=${word}"'
word=wow
I skimmed the zsh guide but didn't find an exact mention of this behavior there either.
Needless to say, this does not appear to be well-documented behavior, so it's fortunate that ShellCheck can help catch it. It does, however, appear to be long-standing behavior: it's reproducible in Bash 3, 4, and 5.
* Unfortunately the commit that added SC2257 doesn't link to an Issue or any other further context.
Shellcheck's advice is sound; sometimes redirections are performed in subshells. However, the crux of this behavior is when expansions occur:
bind_int_variable variables.c:3410 cnt = 2, late binding
expr_bind_variable expr.c:336
exp0 expr.c:1040
exp1 expr.c:1007
exppower expr.c:962
expmuldiv expr.c:887
exp3 expr.c:861
expshift expr.c:837
exp4 expr.c:807
exp5 expr.c:785
expband expr.c:767
expbxor expr.c:748
expbor expr.c:729
expland expr.c:702
explor expr.c:674
expcond expr.c:627
expassign expr.c:512
expcomma expr.c:492
subexpr expr.c:474
evalexp expr.c:439
param_expand subst.c:9498 parameter expansion, including arith subst
expand_word_internal subst.c:9990
shell_expand_word_list subst.c:11335
expand_word_list_internal subst.c:11459
expand_words_no_vars subst.c:10988
redirection_expand redir.c:287 expansions post-fork()
do_redirection_internal redir.c:844
do_redirections redir.c:230 redirections are done in child process
execute_disk_command execute_cmd.c:5418 fork to run date(1)
execute_simple_command execute_cmd.c:4547
execute_command_internal execute_cmd.c:842
execute_command execute_cmd.c:394
reader_loop eval.c:175
main shell.c:805
When execute_disk_command() is called, it forks and then executes date(1). After the fork() and before the execve(), redirections and additional expansions are done (via do_redirections()). Variables expanded and bound post-fork are not reflected in the parent shell.
From Bash's perspective, however, this is just a simple command rather than a subshell command; in effect, it is an implicit subshell.
See execute_disk_command() in execute_cmd.c
Execute a simple command that is hopefully defined in a disk file
somewhere.
1) fork ()
2) connect pipes
3) look up the command
4) do redirections
5) execve ()
6) If the execve failed, see if the file has executable mode set.
If so, and it isn't a directory, then execute its contents as
a shell script.
(references taken from commit 9e49d343e3cd7e20dad1b86ebfb764e8027596a7 [browse tree])
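That fork boundary also explains the original counter example directly; repeating the question's experiment with a builtin versus an external command (output from a run of mine):
$ cnt=1
$ echo hi >$((++cnt)).txt    # builtin: no fork, the increment happens in this shell
$ echo $cnt
2
$ cnt=1
$ date >$((++cnt)).txt       # external command: the increment happens after fork()
$ echo $cnt
1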
Try this:
let cnt=1
echo `date` > "$((++cnt))".txt
echo $cnt
(Editing my answer.) @alaniwi pointed out that strace clearly shows a clone(). I agree with P.P. that this does not by itself indicate a subshell, and having scoured the documentation I did not find a direct reference for one.
EDIT: Command substitution and process substitution each fork a subshell. I do want to quote the following from https://www.gnu.org/software/bash/manual/bash.html#Command-Execution-Environment
Command substitution, commands grouped with parentheses, and asynchronous commands are invoked in a subshell environment ...
These may hold some more answers:
https://www.gnu.org/software/bash/manual/bash.html (search for command substitution)
https://www.tldp.org/LDP/abs/html/subshells.html
https://tldp.org/LDP/abs/html/process-sub.html

How can I "delay" in Zsh with a float-point number?

In Zsh there is a wait (for a process or job) command, a while (Seconds == Delay) command, and a sched (Do later if shell still running) command, but no "delay" command. If there were, I fear it would be limited to whole-second delays. I need a "delay" statement that can essentially cause the procedure/task to do almost nothing for the time specified by a fixed-point number, or until a certain clock time.
Most scripts would use "sleep", but I would like to have the delay timer run without having to open the IO; I am seeking the ideal that nearly anything can be accomplished from within Zsh.
Does anyone know how to perhaps make a function (or maybe builtin/module) perform a floating point idle delay in seconds?
I'll argue that you are making the wrong assumption. zsh is a shell, and therefore its purpose is to be a shell. One important point of being a shell is to be a POSIX-compatible shell. Moreover, since zsh is largely backward compatible with bash, which in turn is backward compatible with the Bourne shell, it should behave as a POSIX shell.
That means that zsh must have access to sleep, since sleep is required for a POSIX shell.
And that is as far as we go with the POSIX compatibility argument. Now for a practical-use argument. Most systems will use GNU coreutils' sleep to implement sleep, which allows floating-point arguments. Therefore the following is POSIX portable:
if ! sleep 0.03; then
    sleep 1
fi
This should work as a fine-grained delay in most cases, while still not breaking in the rare cases where the OS does not use GNU coreutils. As far as I am aware, these rare cases are just AIX and HP-UX systems.
It seems that as long as the I/O is confined to built-ins, the I/O doesn't create a noticeable lag and doesn't depend on anything outside of Zsh. With helpful input from grochmal and a number of experiments I have come up with a simple looped file descriptor for the read built-in command with the : (null) built-in command:
: $(read -u 1 -t 10)
The standard output of the read command is connected to Zsh for expansion as an argument to : (null), so it is guaranteed to receive no input. Knowing that it will never receive input, we have the read command listen to that standard output with -u 1. The timeout option of Zsh's read accepts floating-point numbers; it should be consistent on any system that runs Zsh. Finally, even if the ERREXIT shell option is on, the read timeout-failure status should not be a problem, because read is actually running in a sub-shell, destined to end anyway, and : always returns true. If the ERRRETURN option is on, I don't know that behavior yet, but the fix would be to add ||: to the end of the read command.
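A quick way to convince yourself of the sub-second timing is to make SECONDS a floating-point parameter with typeset -F and reset it just before the read; the measured value below is from one hypothetical run:
% typeset -F SECONDS=0
% : $(read -u 1 -t 0.25)
% print $SECONDS
0.2531470000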
Now it is possible to create a function or alias-to-anonymous-function that interprets any manner of argument or input to reliably create a delay in a floating point number of seconds:
# function sleep {  -- optional switch-out for the system command
#                      after POSIX & GNU compatibility verified.
function delay {
    emulate -LR zsh -o extendedglob -o nullglob
    local Delay=1.
    if [[ $1 == (#b)([[:digit:]](#c1,).(#c0,1)[[:digit:]](#c0,))(s|m|h|d|w|) ]]
    then
        if [[ $match[2] == (s|) ]] Delay=$match[1]
        if [[ $match[2] == (m) ]] Delay=$[ $match[1] * 60. ** 1 ]
        if [[ $match[2] == (h) ]] Delay=$[ $match[1] * 60. ** 2 ]
        if [[ $match[2] == (d) ]] Delay=$[ ($match[1] * 60. ** 2) * 24 ]
        if [[ $match[2] == (w) ]] Delay=$[ (($match[1] * 60. ** 2) * 24) * 7 ]
        : $(read -u 1 -t $Delay)
    else
        print -u 2 "Invalid delay time: $1"
        return 1
    fi
}
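For example, once the function above is defined:
delay 0.25     # quarter of a second
delay 2        # two seconds
delay 1.5m     # ninety seconds
delay 0.5h     # thirty minutes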

Bash and Dash inconsistently check command substitution error codes with `errexit`

I seem to have encountered a very, very strange inconsistency in the way both dash and bash check for error conditions with the errexit option.
Using both dash and bash without the set -e/set -o errexit option, the following program:
foo()
{
    echo pre
    bar=$(fail)
    echo post
}
foo
will print the following (with slightly different error strings for dash):
pre
./foo.sh: line 4: fail: command not found
post
With the errexit option, it will print the following:
pre
./foo.sh: line 4: fail: command not found
Surprisingly, however, if bar is local, the program will always echo both pre and post. More specifically, using both dash and bash with or without the errexit option, the following program:
foo()
{
    echo pre
    local bar=$(fail)
    echo post
}
foo
will print the following:
pre
./foo.sh: line 4: fail: command not found
post
In other words, it seems that the return value of a command substitution that is assigned to a local variable is not checked by errexit, but it is if the variable is global.
I would have been inclined to think this was simply a corner case bug, if it didn't happen with both shells. Since dash is specifically designed to be POSIX conformant I wonder if this behavior is actually specified by the POSIX standard, though I have a hard time imagining how that would make sense.
dash(1) has this to say about errexit:
If not interactive, exit immediately if any untested command fails. The exit status of a command is considered to be explicitly tested if the command is used to control an if, elif, while, or until; or if the command is the left hand operand of an “&&” or “||” operator.
bash(1) is somewhat more verbose, but I have a hard time making sense of it:
Exit immediately if a pipeline (which may consist of a single simple command), a list, or a compound command (see SHELL GRAMMAR above), exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test following the if or elif reserved words, part of any command executed in a && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command's return value is being inverted with !. If a compound command other than a subshell returns a non-zero status because a command failed while -e was being ignored, the shell does not exit. A trap on ERR, if set, is executed before the shell exits. This option applies to the shell environment and each subshell environment separately (see COMMAND EXECUTION ENVIRONMENT above), and may cause subshells to exit before executing all the commands in the subshell.
If a compound command or shell function executes in a context where -e is being ignored, none of the commands executed within the compound command or function body will be affected by the -e setting, even if -e is set and a command returns a failure status. If a compound command or shell function sets -e while executing in a context where -e is ignored, that setting will not have any effect until the compound command or the command containing the function call completes.
TL;DR The exit status of local "hides" the exit status of any command substitutions appearing in one of its arguments.
The exit status of a variable assignment is poorly documented (or at least, I couldn't find any specifics in a quick skim of the various man pages and the POSIX spec). As far as I can tell, the exit status is taken as the exit status of the last command substitution that occurs in the value of the assignment, or 0 if there are no command substitutions. Non-final command substitutions appear to be included in the list of "tested" situations, as an assignment like
foo=$(false)$(true)
does not exit with errexit set.
local, however, is a command itself whose exit status is normally 0, independent of any command substitutions that occur in its arguments. That is, while
foo=$(false)
has an exit status of 1,
local foo=$(false)
will have an exit status of 0, with any command substitutions in an argument seemingly considered "tested" for the purposes of errexit.
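This is easy to confirm even without errexit, just by inspecting $? after each form; the throwaway functions f and g below are only there because local must be used inside a function:
$ f() { bar=$(false); echo "status: $?"; }; f
status: 1
$ g() { local bar=$(false); echo "status: $?"; }; g
status: 0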
Try this:
#!/bin/bash
set -e
foo()
{
    echo pre
    local bar
    bar=$(fail)
    echo post
}
foo
exit
!! OR !!
#!/bin/bash
foo()
{
    set -e
    echo pre
    local bar
    bar=$(fail)
    echo post
}
foo
exit
OUTPUT:
$ ./errexit_function
pre
./errexit_function: line 8: fail: command not found
$ echo $?
127
As far as I can tell this is a workaround for a bug in bash, but try this:
#!/bin/bash
set -e
foo()
{
    echo true || return_value=$?
    echo the command returned a value of ${return_value:-0}
    $(fail) || return_value=$?
    echo the command returned a value of ${return_value:-0}
    echo post
}
foo
exit

Exit code of variable assignment to command substitution in Bash

I am confused about what error code the command will return when executing a variable assignment plainly and with command substitution:
a=$(false); echo $?
It outputs 1, which led me to think that a variable assignment doesn't clear or produce a new exit code on top of the previous one. But when I tried this:
false; a=""; echo $?
It outputs 0; obviously this is what a="" returns, and it overrides the 1 returned by false.
I want to know why this happens. Is there anything particular about variable assignment that makes it differ from other normal commands? Or is it just that a=$(false) is considered a single command, and only the command substitution part counts?
-- UPDATE --
Thanks everyone. From the answers and comments I got the point that "When you assign a variable using command substitution, the exit status is the status of the command" (by @Barmar). This explanation is clear and easy to understand, but it isn't precise enough for programmers; I want to see a reference for this point from an authority such as TLDP or the GNU man page. Please help me find it, thanks again!
Executing a command as $(command) allows the output of the command to replace the command itself.
When you say:
a=$(false) # false fails; the output of false is stored in the variable a
the output produced by the command false is stored in the variable a. Moreover, the exit code is the same as the one produced by the command. help false tells us:
false: false
Return an unsuccessful result.
Exit Status:
Always fails.
On the other hand, saying:
$ false # Exit code: 1
$ a="" # Exit code: 0
$ echo $? # Prints 0
causes the exit code of the assignment to a to be returned, which is 0.
EDIT:
Quoting from the manual:
If one of the expansions contained a command substitution, the exit
status of the command is the exit status of the last command
substitution performed.
Quoting from BASHFAQ/002:
How can I store the return value and/or output of a command in a
variable?
...
output=$(command)
status=$?
The assignment to output has no effect on command's exit status, which
is still in $?.
Note that this isn't the case when combined with local, as in local variable="$(command)". That form will exit successfully even if command failed.
Take this Bash script for example:
#!/bin/bash
function funWithLocalAndAssignmentTogether() {
    local output="$(echo "Doing some stuff.";exit 1)"
    local exitCode=$?
    echo "output: $output"
    echo "exitCode: $exitCode"
}
function funWithLocalAndAssignmentSeparate() {
    local output
    output="$(echo "Doing some stuff.";exit 1)"
    local exitCode=$?
    echo "output: $output"
    echo "exitCode: $exitCode"
}
funWithLocalAndAssignmentTogether
funWithLocalAndAssignmentSeparate
Here is the output of this:
nick.parry@nparry-laptop1:~$ ./tmp.sh
output: Doing some stuff.
exitCode: 0
output: Doing some stuff.
exitCode: 1
This is because local is actually a builtin command, and a command like local variable="$(command)" calls local after substituting the output of command. So you get the exit status from local.
I came across the same problem yesterday (Aug 29 2018).
In addition to local mentioned in Nick P.'s answer and @sevko's comment on the accepted answer, declare in global scope also has the same behavior.
Here's my Bash code:
#!/bin/bash
func1()
{
    ls file_not_existed
    local local_ret1=$?
    echo "local_ret1=$local_ret1"
    local local_var2=$(ls file_not_existed)
    local local_ret2=$?
    echo "local_ret2=$local_ret2"
    local local_var3
    local_var3=$(ls file_not_existed)
    local local_ret3=$?
    echo "local_ret3=$local_ret3"
}
func1
ls file_not_existed
global_ret1=$?
echo "global_ret1=$global_ret1"
declare global_var2=$(ls file_not_existed)
global_ret2=$?
echo "global_ret2=$global_ret2"
declare global_var3
global_var3=$(ls file_not_existed)
global_ret3=$?
echo "global_ret3=$global_ret3"
The output:
$ ./declare_local_command_substitution.sh 2>/dev/null
local_ret1=2
local_ret2=0
local_ret3=2
global_ret1=2
global_ret2=0
global_ret3=2
Note the values of local_ret2 and global_ret2 in the output above. The exit codes are overwritten by local and declare.
My Bash version:
$ echo $BASH_VERSION
4.4.19(1)-release
(not an answer to the original question, but too long for a comment)
Note that export A=$(false); echo $? outputs 0! Apparently the rules quoted in devnull's answer no longer apply. To add a bit of context to that quote (emphasis mine):
3.7.1 Simple Command Expansion
...
If there is a command name left after expansion, execution proceeds as described below. Otherwise, the command exits. If one of the expansions contained a command substitution, the exit status of the command is the exit status of the last command substitution performed. If there were no command substitutions, the command exits with a status of zero.
3.7.2 Command Search and Execution [ — this is the "below" case]
IIUC the manual describes var=foo as a special case of the var=foo command... syntax (pretty confusing!). The "exit status of the last command substitution" rule only applies to the no-command case.
While it's tempting to think of export var=foo as a "modified assignment syntax", it isn't — export is a builtin command (that just happens to take assignment-like args).
=> If you want to export a var AND capture command substitution status, do it in 2 stages:
A=$(false)
# ... check $?
export A
This way also works in set -e mode; the shell exits immediately if the command substitution returns non-zero.
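A quick demonstration of the difference under set -e (a small experiment of mine; the behaviour matches the rules quoted above):
$ bash -c 'set -e; A=$(false); export A; echo exported'; echo "exit: $?"
exit: 1
$ bash -c 'set -e; export A=$(false); echo exported'; echo "exit: $?"
exported
exit: 0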
As others have said, the exit code of the command substitution is the exit code of the substituted command, so
FOO=$(false)
echo $?
---
1
However, unexpectedly, adding export to the beginning of that produces a different result:
export FOO=$(false)
echo $?
---
0
This is because, while the substituted command false fails, the export command succeeds, and that is the exit code returned by the statement.

Unable to run BASH script in current environment multiple times

I have a bash script that I use to move from source to bin directories from anywhere I currently am (I call this script 'teleport'). Since it basically is just a glorified 'cd' command, I have to run it in the current shell (i.e. . ./teleport.sh). I've set up an alias in my .bashrc file so that 'teleport' matches '. teleport.sh'.
The first time I run it, it works fine. But then, if I run it again after it has run once, it doesn't do anything. It works again if I close my terminal and then open a new one, but only the first time. My intuition is that there is something internally going on with BASH that I'm not familiar with, so I thought I would run it through the gurus here to see if I can get an answer.
The script is:
numargs=$#
function printUsage
{
    echo -e "Usage: $0 [-o | -s] <PROJECT>\n"
    echo -e "\tMagically teleports you into the main source directory of a project.\n"
    echo -e "\t PROJECT: The current project you wish to teleport into."
    echo -e "\t -o: Teleport into the objdir.\n"
    echo -e "\t -s: Teleport into the source dir.\n"
}
if [ $numargs -lt 2 ]
then
    printUsage
fi
function teleportToObj
{
    OBJDIR=${HOME}/Source/${PROJECT}/obj
    cd ${OBJDIR}
}
function teleportToSrc
{
    cd ${HOME}/Source/${PROJECT}/src
}
while getopts "o:s:" opt
do
    case $opt in
        o)
            PROJECT=$OPTARG
            teleportToObj
            ;;
        s)
            PROJECT=$OPTARG
            teleportToSrc
            ;;
    esac
done
My usage of it is something like:
sjohnson@corellia:~$ cd /usr/local/src
sjohnson@corellia:/usr/local/src$ . ./teleport -s some-proj
sjohnson@corellia:~/Source/some-proj/src$ teleport -o some-proj
sjohnson@corellia:~/Source/some-proj/src$
<... START NEW TERMINAL ...>
sjohnson@corellia:~$ . ./teleport -o some-proj
sjohnson@corellia:~/Source/some-proj/obj$
The problem is that getopts necessarily keeps a little bit of state so that it can be called in a loop, and you're not clearing that state. Each time it's called, it processes one more argument, and it increments the shell's OPTIND variable so it'll know which argument to process the next time it's called. When it's done with all the arguments, it returns 1 (false) every time it's invoked, which makes the while exit.
The first time you source your script, it works as expected. The second (and third, fourth...) time, getopts does nothing but return false.
Add one line to reset the state before you start looping:
unset OPTIND # clear state so getopts will start over
while getopts "o:s:" opt
do
# ...
done
(I assume there's a typo in your transcript, since it shows you invoking the script -- not sourcing it -- on the second try, but that's not the real problem here.)
The problem is that the first time you call it, you are sourcing the script (that's what ". ./teleport" does), which runs the script in the current shell, thus preserving the cd. The second time you call it, it isn't sourced, so you create a subshell, cd to the appropriate directory, and then exit the subshell, putting you right back where you called the script from!
The way to make this work is simply to make teleportToSrc and teleportToObj aliases or functions in the current shell (i.e. defined outside a script).
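For completeness, here is a minimal sketch of that approach (names reused from the question, not tested beyond the obvious): a shell function defined in ~/.bashrc, which also sidesteps the OPTIND issue from the other answer by making it local:
teleport() {
    local OPTIND=1 opt PROJECT    # local OPTIND resets getopts state on every call
    while getopts "o:s:" opt
    do
        case $opt in
            o) PROJECT=$OPTARG; cd "${HOME}/Source/${PROJECT}/obj" ;;
            s) PROJECT=$OPTARG; cd "${HOME}/Source/${PROJECT}/src" ;;
        esac
    done
}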
