Bash Command Substitution Giving Weird Inconsistent Output - linux

For reasons not relevant to this question, I am running a Java server from a bash script, not directly but via command substitution in a separate sub-shell, and in the background. The intent is for the subcommand to return the process id of the Java server on its standard output. The fragment in question is as follows:
launch_daemon()
{
/bin/bash <<EOF
$JAVA_HOME/bin/java $JAVA_OPTS -jar $JAR_FILE daemon $PWD/config/cl.yml <&- &
pid=\$!
echo \${pid} > $PID_FILE
echo \${pid}
EOF
}
daemon_pid=$(launch_daemon)
echo ${daemon_pid} > check.out
The Java daemon in question prints to standard error and quits if there is a problem in initialization, otherwise it closes standard out and standard err and continues on its way. Later in the script (not shown) I do a check to make sure the server process is running. Now on to the problem.
Whenever I check the $PID_FILE above, it contains the correct process id on one line.
But when I check the file check.out, it sometimes contains the correct id, and other times it contains the process id repeated twice on the same line, separated by a space character, as in:
34056 34056
I am using the variable $daemon_pid in the script above later on in the script to check if the server is running, so if it contains the pid repeated twice this totally throws off the test and it incorrectly thinks the server is not running. Fiddling with the script on my server box running CentOS Linux by putting in more echo statements etc. seems to flip the behavior back to the correct one of $daemon_pid containing the process id just once, but if I think that has fixed it and check in this script to my source code repo and do a build and deploy again, I start seeing the same bad behavior.
For now I have fixed this by assuming that $daemon_pid could be bad and passing it through awk as follows:
mypid=$(echo ${daemon_pid} | awk '{ gsub(" +.*",""); print $0 }')
Then $mypid always contains the correct process id and things are fine, but needless to say I'd like to understand why it behaves the way it does. And before you ask, I have looked and looked but the Java server in question does NOT print its process id to its standard out before closing standard out.
Would really appreciate expert input.

Following the hint by @WilliamPursell, I tracked this down in the bash source code. I honestly don't know whether it is a bug or not; all I can say is that it seems like an unfortunate interaction with a questionable use case.
TL;DR: You can fix the problem by removing <&- from the script.
Closing stdin is at best questionable, not just for the reason mentioned by @JonathanLeffler ("Programs are entitled to have a standard input that's open.") but more importantly because stdin is being used by the bash process itself and closing it in the background causes a race condition.
In order to see what's going on, consider the following rather odd script, which might be called Duff's Bash Device, except that I'm not sure that even Duff would approve: (also, as presented, it's not that useful. But someone somewhere has used it in some hack. Or, if not, they will now that they see it.)
/bin/bash <<EOF
if (($1<8)); then head -n-$1 > /dev/null; fi
echo eight
echo seven
echo six
echo five
echo four
echo three
echo two
echo one
EOF
For this to work, bash and head both have to be prepared to share stdin, including sharing the file position. That means that bash needs to make sure that it flushes its read buffer (or not buffer), and head needs to make sure that it seeks back to the end of the part of the input which it uses.
(The hack only works because bash handles here-documents by copying them into a temporary file. If it used a pipe, it wouldn't be possible for head to seek backwards.)
Now, what would have happened if head had run in the background? The answer is, "just about anything is possible", because bash and head are racing to read from the same file descriptor. Running head in the background would be a really bad idea, even worse than the original hack which is at least predictable.
Now, let's go back to the actual program at hand, simplified to its essentials:
/bin/bash <<EOF
cmd <&- &
echo \$!
EOF
Line 2 of this program (cmd <&- &) forks off a separate process (to run in the background). In that process, it closes stdin and then invokes cmd.
Meanwhile, the foreground process continues reading commands from stdin (its stdin fd hasn't been closed, so that's fine), which causes it to execute the echo command.
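Now here's the rub: bash knows that it needs to share stdin, so it can't just close stdin. It needs to make sure that stdin's file position is pointing to the right place, even though it may have actually read ahead a buffer's worth of input. So just before the background process closes stdin, it seeks backwards to the end of the current command line. [1] Because the two processes share the same open file (and therefore the same file position), that seek also moves the position seen by the foreground bash.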
If that seek happens before the foreground bash executes echo, then there is no problem. And if it happens after the foreground bash is done with the here-document, also no problem. But what if it happens while the echo is working? In that case, after the echo is done, bash will reread the echo command because stdin has been rewound, and the echo will be executed again.
And that's precisely what is happening in the OP. Sometimes, the background seek completes at just the wrong time, and causes echo \${pid} to be executed twice. In fact, it also causes echo \${pid} > $PID_FILE to execute twice, but that line is idempotent; had it been echo \${pid} >> $PID_FILE, the double execution would have been visible.
So the solution is simple: remove <&- from the server start-up line, and optionally replace it with </dev/null if you want to make sure the server can't read from stdin.
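A minimal sketch of the corrected launcher, under some assumptions: `sleep 30` stands in for the Java server, and `PID_FILE` points at a temporary file created just for the demonstration. Unlike the real daemon, `sleep` does not close its own stdout and stderr, so the sketch redirects those as well; otherwise the command substitution would wait for the background process to finish.

```shell
#!/bin/bash
# Assumption: PID_FILE is a throwaway temp file, and `sleep 30` is a
# stand-in for the real Java server command line.
PID_FILE=$(mktemp)

launch_daemon()
{
/bin/bash <<EOF
sleep 30 </dev/null >/dev/null 2>&1 &
pid=\$!
echo \${pid} > $PID_FILE
echo \${pid}
EOF
}

daemon_pid=$(launch_daemon)
echo "daemon pid: ${daemon_pid}"

# Confirm the stand-in is alive, then clean up.
if kill -0 "${daemon_pid}" 2>/dev/null; then daemon_alive=yes; else daemon_alive=no; fi
kill "${daemon_pid}"
```

With `</dev/null` the background process never touches the here-document's file descriptor, so there is no seek to race against the shell that is still reading the script.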
Notes:
Note 1: For those more familiar with bash source code and its expected behaviour than I am, I believe that the seek and close takes place at the end of case r_close_this: in function do_redirection_internal in redir.c, at approximately line 1093:
check_bash_input (redirector);
close_buffered_fd (redirector);
The first call does the lseek and the second one does the close. I saw the behaviour using strace -f and then searched the code for a plausible looking lseek, but I didn't go to the trouble of verifying in a debugger.


continuous bash script - show custom error when process killed

I am trying to write a script that keeps checking if process "XXX" gets killed and if it does, show error message with text from /proc/XXX_ID/fd/1 and start again.
I am trying to make it work on a custom thin-client distro that has no more packages than needed (Linux 4.6.3TS_SMP i686).
I am new to scripting in Bash and I can't seem to get it working. I have been googling and trying different things for the last two days and have gotten nowhere. What am I doing wrong?
#!/bin/bash
while [ true ] ; do
process_ID="$(pgrep XXX)"
tail -f /proc/${process_ID}/fd/1 > /bin/test2.txt
Everything works up to this point. Now I need to somehow check whether test2.txt is empty and, if it's not, use the text from it as the error message and start checking again. I tried something like this:
if [ -s /bin/test2.txt ]
then
err="$(cat /bin/test2.txt)"
notify-send "${err}"
else
fi
done
How about this:
output_filepath=$(readlink /proc/$pid/fd/1)
while ps $pid > /dev/null
do
sleep 1
done
tail "$output_filepath"
The whole idea only works if the stdout (fd 1) of the process is redirected into a file which can be read by someone else. Such a redirect will result in /proc/<pid>/fd/1 being a symlink. We read and memorize the target of that symlink in the first line.
Then we wait until the ps of the given PID fails which typically means the process has terminated. Then we tail the file at the memorized path.
This approach has several weaknesses. The process (or its caller) could remove, modify, or rename the output file at termination time, or reading it can somehow be impossible (permissions, etc.). The output could be redirected to something not being a file (e.g. a pipe, a socket, etc.), then tailing it won't work.
ps not failing does not necessarily mean that the same process is still there. It's next to impossible that a PID is reused that quickly, but not completely impossible.
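The approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: a short `sh -c` job stands in for process XXX, its stdout is redirected to a temporary file, `/proc` is available (Linux), and `kill -0` is swapped in for `ps` as the liveness check because it sends no signal and only tests whether the PID exists.

```shell
#!/bin/bash
# Stand-in for process XXX (assumption): a short job whose stdout is a file.
# The redirection is applied to the brace group, so the job starts with
# fd 1 already pointing at the log file.
log=$(mktemp)
{ sh -c 'echo "something went wrong"; sleep 1' & } > "$log"
pid=$!

# Remember where fd 1 points while the process is still alive.
output_filepath=$(readlink "/proc/$pid/fd/1")

# Poll until the PID disappears; kill -0 sends no signal, it only checks.
while kill -0 "$pid" 2>/dev/null; do
    sleep 1
done

last_output=$(tail -n 5 "$output_filepath")
echo "process $pid ended; last output: $last_output"
```

The same weaknesses apply: if fd 1 is a pipe or a terminal rather than a regular file, `readlink` will not return a path that `tail` can usefully read.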

Bash completion sometimes messes up my terminal when the completion function reads a file

So I've been having a problem with some cli programs. Sometimes when I kill the running process with Ctrl+C, it leaves the terminal in a weird state (e.g. echo is turned off). Now that is to be expected for many cases, as killing a process does not give it a chance to restore the terminal's state. But I've discovered that for many other cases, bash completion is the culprit. As an example, try the following:
Start a new bash session as follows: bash --norc to ensure that no completions are loaded.
Define a completion function: _completion_test() { grep -q foo /dev/null; return 1; }.
Define a completion that uses the above function: complete -F _completion_test rlwrap.
Type exactly the following: r l w r a p Space c a t Tab Enter (i.e. rlwrap cat followed by a Tab and then by an Enter).
Wait for a second and then kill the process with Ctrl+C.
The terminal's echo should now have been turned off, so if you type any character, it will not be echoed by the terminal.
What is really weird is that if I remove the seemingly harmless grep -q foo /dev/null from the completion function, everything works correctly. In fact, adding a grep -q foo /dev/null (or even something even simpler such as cat /dev/null) to any completion function that was installed in my system, causes the same issue. I have also reproduced the problem with programs that don't use readline and without Ctrl+C (e.g. find /varTab| head, with the above completion defined for find).
Why does this happen?
Edit: Just to clarify, the above is a contrived example. In reality, what I am trying to do, is more like this:
_completion_test() {
if grep -q "$1" /some/file; then
#do something
else
#do something else
fi
}
For a more concrete example, try the following:
_completion_test() {
if grep -q foo /dev/null; then
COMPREPLY=(cats)
else
return 1
fi
}
But the mere fact that I am calling grep, causes the problem. I don't see why I can't call grep in this case.
Well, the answer to this is very simple; it's a bug:
This happens when a programmable completion function calls an external command during the execution of a completion function. Bash saves the tty state after every successful job completes, so it can restore it if a job is killed by a signal and leaves the terminal in an undesired state. In this case, we need to suppress that if the job that completes is run during programmable completion, since the terminal settings at that time are as readline sets them for line editing. This fix will be in the release version of bash-4.4.
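On bash versions that predate the fix, one possible workaround is to keep external commands out of the completion function and do the file check with builtins only. This is an assumption drawn from the bug description above, not something the bug report itself recommends. In the sketch below, `_file_contains` and `words_file` are hypothetical names, and the match is a plain substring test rather than a `grep` regular expression.

```shell
#!/bin/bash
# Hypothetical data file standing in for /some/file.
words_file=$(mktemp)
printf '%s\n' "bar" "some foo here" > "$words_file"

# Hypothetical helper: succeed if any line of file $2 contains string $1.
# Uses only builtins (read, [[ ]]), so no external job is started.
_file_contains() {
    local needle=$1 file=$2 line
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line == *"$needle"* ]] && return 0
    done < "$file"
    return 1
}

_completion_test() {
    if _file_contains foo "$words_file"; then
        COMPREPLY=(cats)
    else
        return 1
    fi
}

complete -F _completion_test rlwrap
```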
You're just doing a wrong implementation of the completion function. See the manual
-F function
The shell function function is executed in the current shell environment. When it is executed, $1 is the name of the command
whose arguments are being completed, $2 is the word being completed,
and $3 is the word preceding the word being completed, as described
above (see Programmable Completion). When it finishes, the possible
completions are retrieved from the value of the COMPREPLY array
variable.
For example, the following implementation:
_completion_test() { COMPREPLY=($(cat /dev/null)); return 1; }
doesn't break the terminal.
Regarding your original question why your completion function breaks terminal, I've played a little with strace and I saw that there are ioctl calls with -echo argument. I assume that when you are terminating it with Ctrl+C the ioctl with echo argument just isn't being called in order to restore the original state. Typing stty echo will bring the echo back.

crash-stopping bash pipeline [duplicate]

This question already has answers here:
How do you catch error codes in a shell pipe?
I have a pipeline, say a|b where if a runs into a problem, I want to stop the whole pipeline.
a exiting with status 1 doesn't do this, as b often doesn't care about return codes.
e.g.
echo 1|grep 0|echo $? <-- this shows that grep did exit=1
but
echo 1|grep 0 | wc <--- wc is unfazed by grep's exit here
If I ran the pipeline as a subprocess of an owning process, any of the pipeline processes could kill the owning process. But this seems a bit clumsy -- but it would zap the whole pipeline.
Not possible with basic shell constructs, probably not possible in shell at all.
Your first example doesn't do what you think. echo doesn't use standard input, so putting it on the right side of a pipe is never a good idea. The $? that you're echoing is not the exit value of the grep 0. All commands in a pipeline run simultaneously. echo has already been started, with the existing value of $?, before the other commands in the pipeline have finished. It echoes the exit value of whatever you did before the pipeline.
# The first command is to set things up so that $? is 2 when the
# second command is parsed.
$ sh -c 'exit 2'
$ echo 1|grep 0|echo $?
2
Your second example is a little more interesting. It's correct to say that wc is unfazed by grep's exit status. All commands in the pipeline are children of the shell, so their exit statuses are reported to the shell. The wc process doesn't know anything about the grep process. The only communication between them is the data stream written to the pipe by grep and read from the pipe by wc.
There are ways to find all the exit statuses after the fact (the linked question in the comment by shx2 has examples) but a basic rule that you can't avoid is that the shell will always wait for all the commands to finish.
Early exits in a pipeline sometimes do have a cascade effect. If a command on the right side of a pipe exits without reading all the data from the pipe, the command on the left of that pipe will get a SIGPIPE signal the next time it tries to write, which by default terminates the process. (The two phrases to pay close attention to there are "the next time it tries to write" and "by default". If the writing process spends a long time doing other things between writes to the pipe, it won't die immediately. If it handles the SIGPIPE, it won't die at all.)
In the other direction, when a command on the left side of a pipe exits, the command on the right side of that pipe gets EOF, which does cause the exit to happen fairly soon when it's a simple command like wc that doesn't do much processing after reading its input.
With direct use of pipe(), fork(), and wait3(), it would be possible to construct a pipeline, notice when one child exits badly, and kill the rest of them immediately. This requires a language more sophisticated than the shell.
I tried to come up with a way to do it in shell with a series of named pipes, but I don't see it. You can run all the processes as separate jobs and get their PIDs with $!, but the wait builtin isn't flexible enough to say "wait for any child in this set to exit, and tell me which one it was and what the exit status was".
If you're willing to mess with ps and/or /proc you can find out which processes have exited (they'll be zombies), but you can't distinguish successful exit from any other kind.
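Newer versions of bash (4.3 and later) add `wait -n`, which returns as soon as any one background job exits and reports that job's status. A rough sketch of the "notice the first failure and kill the rest" idea, assuming such a bash is available; the two jobs are stand-ins for pipeline stages and are not actually connected by a pipe, which a real setup would do with a named pipe.

```shell
#!/bin/bash
# Sketch only: needs bash 4.3+ for `wait -n`. The two jobs are stand-ins
# for pipeline stages (assumption), not a real connected pipeline.
( sleep 1; exit 3 ) &      # stage "a": fails after a second
a_pid=$!
sleep 30 &                 # stage "b": would otherwise keep running
b_pid=$!

# Block until the first job exits and capture its status.
first_status=0
wait -n || first_status=$?

if [ "$first_status" -ne 0 ]; then
    # Stop whatever is still running.
    kill "$a_pid" "$b_pid" 2>/dev/null
fi
wait                       # reap the remaining jobs
echo "first exit status: $first_status"
```

Plain `wait -n` does not say which job exited; the script only learns the status, so it kills every stage it started.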
Write
set -e
set -o pipefail
at the beginning of your file.
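-e makes the shell exit when a command fails, and -o pipefail makes a pipeline return the exit status of the last (rightmost) stage that failed, rather than always the status of its final command. Neither option stops the other stages early; the shell still waits for the whole pipeline to finish.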
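A small sketch of what these settings report, using the pipeline from the question. It assumes bash, since `PIPESTATUS` is a bash array; the variable names are made up for the example.

```shell
#!/bin/bash
# Default behaviour: the pipeline's status is that of its last command (wc).
echo 1 | grep 0 | wc -l > /dev/null
default_status=$?

# PIPESTATUS holds the exit status of every stage of the latest pipeline.
echo 1 | grep 0 | wc -l > /dev/null
stage_statuses="${PIPESTATUS[*]}"

# With pipefail, the failing grep determines the pipeline's status.
set -o pipefail
pipefail_status=0
echo 1 | grep 0 | wc -l > /dev/null || pipefail_status=$?
set +o pipefail

echo "default: $default_status"
echo "stages: $stage_statuses"
echo "pipefail: $pipefail_status"
```

In all three runs `wc` still executes to completion; only the reported status changes.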

Concurrency with shell scripts in failure-prone environments

Good morning all,
I am trying to implement concurrency in a very specific environment, and keep getting stuck. Maybe you can help me.
this is the situation:
-I have N nodes that can read/write in a shared folder.
-I want to execute an application in one of them. this can be anything, like a shell script, an installed code, or whatever.
-To do so, I have to send the same command to all of them. The first one should start the execution, and the rest should see that somebody else is running the desired application and exit.
-The execution of the application can be killed at any time. This is important because it means I cannot rely on any cleanup step after the execution.
-if the application gets killed, the user may want to execute it again. He would then send the very same command.
My current approach is to create a shell script that wraps the command to be executed. This could also be implemented in C. Not python or other languages, to avoid library dependencies.
#!/bin/sh
# (folder structure simplified for legibility)
mutex(){
lockdir=".lock"
firstTask=1 #false
if mkdir "$lockdir" > /dev/null 2>&1
then
controlFile="controlFile"
#if this is the first node, start coordinator
if [ ! -f $controlFile ]; then
firstTask=0 #true
#tell the rest of nodes that I am in control
echo "some info" > $controlFile
fi
# remove control File when script finishes
trap 'rm $controlFile' EXIT
fi
return $firstTask
}
#The basic idea is that a task executes the desire command, stated as arguments to this script. The rest do nothing
if ! mutex ;
then
exit 0
fi
#I am the first node and the only one reaching this, so I execute whatever
"$@"
If there are no failures, this wrapper works great. The problem is that, if the script is killed before the execution, the trap is not executed and the control file is not removed. Then, when we execute the wrapper again to restart the task, it won't work as every node will think that somebody else is running the application.
A possible solution would be to remove the control file just before the "$@" call, but that would lead to a race condition.
Any suggestion or idea?
Thanks for your help.
edit: edited with correct solution as future reference
Your trap syntax looks wrong: According to POSIX, it should be:
trap [action condition ...]
e.g.:
trap 'rm $controlFile' HUP INT TERM
trap 'rm $controlFile' 1 2 15
Note that $controlFile will not be expanded until the trap is executed if you use single quotes.
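As a rough sketch of how the pieces might fit together, the wrapper below takes the lock with `mkdir`, releases it from an EXIT trap, and converts HUP, INT, and TERM into a normal exit so the EXIT trap still runs. The names `lockdir` and `run_locked` are hypothetical, and this is not the asker's final solution. SIGKILL cannot be trapped, so a stale lock is still possible if the script is killed that way.

```shell
#!/bin/sh
# Hypothetical lock location, for illustration only.
lockdir="${TMPDIR:-/tmp}/demo-lock.$$"

# Run a command while holding the lock. The function body is a subshell,
# so the traps set here do not leak into the caller.
run_locked() (
    # mkdir is atomic: only one caller can create the directory.
    mkdir "$lockdir" 2>/dev/null || exit 1
    # Single quotes: $lockdir is expanded when the trap runs, not now.
    trap 'rmdir "$lockdir"' EXIT
    # Turn catchable signals into a normal exit so the EXIT trap still runs.
    trap 'exit 129' HUP
    trap 'exit 130' INT
    trap 'exit 143' TERM
    "$@"
)

run_locked echo "holding the lock"
```

A caller that finds the lock already taken gets exit status 1 and the command is not run, which matches the "the rest do nothing" behaviour described in the question.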

Edit shell script while it's running

Can you edit a shell script while it's running and have the changes affect the running script?
I'm curious about the specific case of a csh script I have that batch runs a bunch of different build flavors and runs all night. If something occurs to me mid operation, I'd like to go in and add additional commands, or comment out un-executed ones.
If not possible, is there any shell or batch-mechanism that would allow me to do this?
Of course I've tried it, but it will be hours before I see if it worked or not, and I'm curious about what's happening or not happening behind the scenes.
It does have an effect, at least with bash in my environment, but in a very unpleasant way. Consider the following scripts. First a.sh:
#!/bin/sh
echo "First echo"
read y
echo "$y"
echo "That's all."
b.sh:
#!/bin/sh
echo "First echo"
read y
echo "Inserted"
echo "$y"
# echo "That's all."
Do
$ cp a.sh run.sh
$ ./run.sh
$ # open another terminal
$ cp b.sh run.sh # while 'read' is in effect
$ # Then type "hello."
In my case, the output is always:
hello
hello
That's all.
That's all.
(Of course it's far better to automate it, but the above example is readable.)
[edit] This is unpredictable, and thus dangerous. The best workaround is, as described here, to put everything in braces and put "exit" just before the closing brace. Read the linked answer well to avoid pitfalls.
[added] The exact behavior depends on one extra newline, and perhaps also on your Unix flavor, filesystem, etc. If you simply want to see some influences, simply add "echo foo/bar" to b.sh before and/or after the "read" line.
Try this... create a file called bash-is-odd.sh:
#!/bin/bash
echo "echo yes i do odd things" >> bash-is-odd.sh
That demonstrates that bash is, indeed, interpreting the script "as you go". Indeed, editing a long-running script has unpredictable results, inserting random characters etc. Why? Because bash reads from the last byte position, so editing shifts the location of the current character being read.
Bash is, in a word, very, very unsafe because of this "feature". svn and rsync when used with bash scripts are particularly troubling, because by default they "merge" the results... editing in place. rsync has a mode that fixes this. svn and git do not.
I present a solution. Create a file called /bin/bashx:
#!/bin/bash
source "$1"
Now use #!/bin/bashx on your scripts and always run them with bashx instead of bash. This fixes the issue - you can safely rsync your scripts.
Alternative (in-line) solution proposed/tested by @AF7:
{
# your script
exit $?
}
Curly braces protect against edits, and exit protects against appends. Of course, we'd all be much better off if bash came with an option, like -w (whole file), or something that did this.
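To see the difference the braces make, here is a small experiment, assuming bash is on the PATH. It writes two throwaway scripts (the file names are made up) that each append a line to themselves, one plain and one wrapped in braces with a trailing exit, and captures what each prints.

```shell
#!/bin/bash
# Two throwaway scripts (hypothetical names) that append a line to themselves.
dir=$(mktemp -d)

cat > "$dir/plain.sh" <<'EOF'
echo start
echo 'echo appended' >> "$0"
EOF

cat > "$dir/guarded.sh" <<'EOF'
{
echo start
echo 'echo appended' >> "$0"
exit
}
EOF

# The plain script goes on to read the line it appended to itself; the
# guarded one was parsed as a whole and exits before reading any further.
plain_out=$(bash "$dir/plain.sh")
guarded_out=$(bash "$dir/guarded.sh")
printf 'plain:\n%s\nguarded:\n%s\n' "$plain_out" "$guarded_out"
rm -r "$dir"
```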
Break your script into functions, and each time a function is called you source it from a separate file. Then you could edit the files at any time and your running script will pick up the changes next time it gets sourced.
foo() {
source foo.sh
}
foo
Good question!
Hope this simple script helps
#!/bin/sh
echo "Waiting..."
echo "echo \"Success! Edits to a .sh while it executes do affect the executing script! I added this line to myself during execution\" " >> ${0}
sleep 5
echo "When I was run, this was the last line"
It does seem under linux that changes made to an executing .sh are enacted by the executing script, if you can type fast enough!
An interesting side note - if you are running a Python script it does not change. (This is probably blatantly obvious to anyone who understands how shell runs Python scripts, but thought it might be a useful reminder for someone looking for this functionality.)
I created:
#!/usr/bin/env python3
import time
print('Starts')
time.sleep(10)
print('Finishes unchanged')
Then in another shell, while this is sleeping, edit the last line. When this completes it displays the unaltered line, presumably because Python reads and compiles the whole file before it starts executing it. The same happens on Ubuntu and macOS.
I don't have csh installed, but
#!/bin/sh
echo Waiting...
sleep 60
echo Change didn't happen
Run that, quickly edit the last line to read
echo Change happened
Output is
Waiting...
/home/dave/tmp/change.sh: 4: Syntax error: Unterminated quoted string
Hrmph.
I guess edits to the shell scripts don't take effect until they're rerun.
If this is all in a single script, then no it will not work. However, if you set it up as a driver script calling sub-scripts, then you might be able to change a sub-script before it's called, or before it's called again if you're looping, and in that case I believe those changes would be reflected in the execution.
I'm hearing no... but what about with some indirection:
BatchRunner.sh
Command1.sh
Command2.sh
Command1.sh
runSomething
Command2.sh
runSomethingElse
Then you should be able to edit the contents of each command file before BatchRunner gets to it right?
OR
A cleaner version would have BatchRunner look to a single file where it would consecutively run one line at a time. Then you should be able to edit this second file while the first is running right?
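A minimal sketch of that second idea, assuming bash: the runner reads a command file one line at a time on a separate file descriptor and runs each line with `eval`. The file name `queue` is hypothetical, and the second command appends a new line to the file to imitate an edit made while the batch is running. An editor that replaces the file instead of modifying it in place would not be noticed, because the runner keeps reading the original file.

```shell
#!/bin/bash
# Hypothetical command file; each line is one command to run.
queue=$(mktemp)
printf '%s\n' "echo first" "echo 'echo added-later' >> '$queue'" > "$queue"

ran=()
# Read on fd 3 so the commands themselves can still use stdin.
while IFS= read -r cmd <&3; do
    [ -z "$cmd" ] && continue   # skip blank lines
    ran+=("$cmd")
    eval "$cmd"
done 3< "$queue"

echo "commands run: ${#ran[@]}"
```

Since `eval` runs whatever is in the file, this only makes sense for a file that you alone can write to.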
Use Zsh instead for your scripting.
AFAICT, Zsh does not exhibit this frustrating behavior.
Usually, it's uncommon to edit your script while it's running. All you have to do is put control checks into your operations. Use if/else statements to check for conditions: if something fails, do this, else do that. That's the way to go.
Scripts don't work that way; the executing copy is independent from the source file that you are editing. Next time the script is run, it will be based on the most recently saved version of the source file.
It might be wise to break out this script into multiple files, and run them individually. This will reduce the execution time to failure. (ie, split the batch into one build flavor scripts, running each one individually to see which one is causing the trouble).
