Explanation needed for tee, process substitution, redirect...and different behaviors in Bash and Z shell ('zsh')

Explanation needed for tee, process substitution, redirect...and different behaviors in Bash and Z shell ('zsh') - linux

Recently in my work, I am facing an interesting problem regarding tee and process substitution.
Let's start with examples:
I have three little scripts:
$ head *.sh
File one.sh
#!/bin/bash
echo "one starts"
if [ -p /dev/stdin ]; then
echo "$(cat /dev/stdin) from one"
else
echo "no stdin"
fi
File two.sh
#!/bin/bash
echo "two starts"
if [ -p /dev/stdin ]; then
echo "$(cat /dev/stdin) from two"
else
echo "no stdin"
fi
File three.sh
#!/bin/bash
echo "three starts"
if [ -p /dev/stdin ]; then
sed 's/^/stdin for three: /' /dev/stdin
else
echo "no stdin"
fi
All three scripts read from standard input and print something to standard output.
The one.sh and two.sh are quite similar, but the three.sh is a bit different. It just adds some prefix to show what it reads from the standard input.
Now I am going to execute two commands:
1: echo "hello" | tee >(./one.sh) >(./two.sh) | ./three.sh
2: echo "hello" | tee >(./one.sh) >(./two.sh) >(./three.sh) >/dev/null
First in Bash and then in Z shell (zsh).
Bash (GNU bash, version 5.0.17(1))
$ echo "hello" | tee >(./one.sh) >(./two.sh) |./three.sh
three starts
stdin for three: hello
stdin for three: one starts
stdin for three: two starts
stdin for three: hello from two
stdin for three: hello from one
Why are the outputs of one.sh and two.sh mixed with the origin "hello" and passed to three.sh? I expected to see the output of one and two in standard output and only the "hello" is going to pass to three.sh.
Now the other command:
$ echo "hello" | tee >(./one.sh) >(./two.sh) >(./three.sh) >/dev/null
one starts
two starts
three starts
stdin for three: hello
hello from two
hello from one
<---!!!note here I don't have prompt unless I press Enter or Ctrl-c)
I redirect all standard output to /dev/null. Why do I see all output from all process substitution this time? Does it seem this behavior conflict with the one above?
Why don't I have the prompt after having executed the command?
Why does the command start in order one->two->three, but outputs come in 3->2->1? Even if I added sleep 3 in three.sh, the output is always 3-2-1. I know it should have something to do with standard input blocking, but I'd learn the exact reason.
Zsh (zsh 5.8 (x86_64-pc-linux-gnu))
Both commands,
echo "hello" | tee >(./one.sh) >(./two.sh) >(./three.sh) >/dev/null
echo "hello" | tee >(./one.sh) >(./two.sh) |./three.sh
Give the expected result:
one starts
three starts
two starts
hello from two
hello from one
stdin for three: hello
It works as expected. But the order of the output is random, it seems that Z shell does something non-blocking here, and the order of the output is dependent on how long each script has been running. What exactly leads to the result?

echo "hello"|tee >(./one.sh) >(./two.sh) |./three.sh
There are two possible order of operations for the tee part of the pipeline
First
Redirect standard output to a pipe that's connected to ./three.sh's standard input.
Set up the pipes and subprocesses for the command substitutions. They inherit the same redirected standard output pipe used by tee.
Execute tee.
Second
Set up the pipes and subprocesses for the the command substitutions. They share the same default standard output - to the terminal.
Redirect tee's standard output to a pipe that's connected to ./three.sh's standard input. This redirection doesn't affect the pipes set up in step 1.
Execute tee.
bash uses the first set of operations, zsh uses the second. In both cases, the order you see output from your shell scripts in is controlled by your OS's process scheduler and might as well be random. In the case where you redirect tee's standard output to /dev/null, they both seem to follow the second scenario and set up the subprocesses before the parent tee's redirection. This inconsistency on bash's part does seem unusual and a potential source of subtle bugs.
I can't replicate the missing prompt issue, but that's with bash 4.4.20 - I don't have 5 installed on this computer.

Related

Bash command with pipe('|') alway return exit code of 0, even in error case [duplicate]

I want to execute a long running command in Bash, and both capture its exit status, and tee its output.
So I do this:
command | tee out.txt
ST=$?
The problem is that the variable ST captures the exit status of tee and not of command. How can I solve this?
Note that command is long running and redirecting the output to a file to view it later is not a good solution for me.

There is an internal Bash variable called $PIPESTATUS; it’s an array that holds the exit status of each command in your last foreground pipeline of commands.
<command> | tee out.txt ; test ${PIPESTATUS[0]} -eq 0
Or another alternative which also works with other shells (like zsh) would be to enable pipefail:
set -o pipefail
...
The first option does not work with zsh due to a little bit different syntax.

Dumb solution: Connecting them through a named pipe (mkfifo). Then the command can be run second.
mkfifo pipe
tee out.txt < pipe &
command > pipe
echo $?

using bash's set -o pipefail is helpful
pipefail: the return value of a pipeline is the status of
the last command to exit with a non-zero status,
or zero if no command exited with a non-zero status

There's an array that gives you the exit status of each command in a pipe.
$ cat x| sed 's///'
cat: x: No such file or directory
$ echo $?
0
$ cat x| sed 's///'
cat: x: No such file or directory
$ echo ${PIPESTATUS[*]}
1 0
$ touch x
$ cat x| sed 's'
sed: 1: "s": substitute pattern can not be delimited by newline or backslash
$ echo ${PIPESTATUS[*]}
0 1

This solution works without using bash specific features or temporary files. Bonus: in the end the exit status is actually an exit status and not some string in a file.
Situation:
someprog | filter
you want the exit status from someprog and the output from filter.
Here is my solution:
((((someprog; echo $? >&3) | filter >&4) 3>&1) | (read xs; exit $xs)) 4>&1
echo $?
See my answer for the same question on unix.stackexchange.com for a detailed explanation and an alternative without subshells and some caveats.

By combining PIPESTATUS[0] and the result of executing the exit command in a subshell, you can directly access the return value of your initial command:
command | tee ; ( exit ${PIPESTATUS[0]} )
Here's an example:
# the "false" shell built-in command returns 1
false | tee ; ( exit ${PIPESTATUS[0]} )
echo "return value: $?"
will give you:
return value: 1

So I wanted to contribute an answer like lesmana's, but I think mine is perhaps a little simpler and slightly more advantageous pure-Bourne-shell solution:
# You want to pipe command1 through command2:
exec 4>&1
exitstatus=`{ { command1; printf $? 1>&3; } | command2 1>&4; } 3>&1`
# $exitstatus now has command1's exit status.
I think this is best explained from the inside out - command1 will execute and print its regular output on stdout (file descriptor 1), then once it's done, printf will execute and print icommand1's exit code on its stdout, but that stdout is redirected to file descriptor 3.
While command1 is running, its stdout is being piped to command2 (printf's output never makes it to command2 because we send it to file descriptor 3 instead of 1, which is what the pipe reads). Then we redirect command2's output to file descriptor 4, so that it also stays out of file descriptor 1 - because we want file descriptor 1 free for a little bit later, because we will bring the printf output on file descriptor 3 back down into file descriptor 1 - because that's what the command substitution (the backticks), will capture and that's what will get placed into the variable.
The final bit of magic is that first exec 4>&1 we did as a separate command - it opens file descriptor 4 as a copy of the external shell's stdout. Command substitution will capture whatever is written on standard out from the perspective of the commands inside it - but since command2's output is going to file descriptor 4 as far as the command substitution is concerned, the command substitution doesn't capture it - however once it gets "out" of the command substitution it is effectively still going to the script's overall file descriptor 1.
(The exec 4>&1 has to be a separate command because many common shells don't like it when you try to write to a file descriptor inside a command substitution, that is opened in the "external" command that is using the substitution. So this is the simplest portable way to do it.)
You can look at it in a less technical and more playful way, as if the outputs of the commands are leapfrogging each other: command1 pipes to command2, then the printf's output jumps over command 2 so that command2 doesn't catch it, and then command 2's output jumps over and out of the command substitution just as printf lands just in time to get captured by the substitution so that it ends up in the variable, and command2's output goes on its merry way being written to the standard output, just as in a normal pipe.
Also, as I understand it, $? will still contain the return code of the second command in the pipe, because variable assignments, command substitutions, and compound commands are all effectively transparent to the return code of the command inside them, so the return status of command2 should get propagated out - this, and not having to define an additional function, is why I think this might be a somewhat better solution than the one proposed by lesmana.
Per the caveats lesmana mentions, it's possible that command1 will at some point end up using file descriptors 3 or 4, so to be more robust, you would do:
exec 4>&1
exitstatus=`{ { command1 3>&-; printf $? 1>&3; } 4>&- | command2 1>&4; } 3>&1`
exec 4>&-
Note that I use compound commands in my example, but subshells (using ( ) instead of { } will also work, though may perhaps be less efficient.)
Commands inherit file descriptors from the process that launches them, so the entire second line will inherit file descriptor four, and the compound command followed by 3>&1 will inherit the file descriptor three. So the 4>&- makes sure that the inner compound command will not inherit file descriptor four, and the 3>&- will not inherit file descriptor three, so command1 gets a 'cleaner', more standard environment. You could also move the inner 4>&- next to the 3>&-, but I figure why not just limit its scope as much as possible.
I'm not sure how often things use file descriptor three and four directly - I think most of the time programs use syscalls that return not-used-at-the-moment file descriptors, but sometimes code writes to file descriptor 3 directly, I guess (I could imagine a program checking a file descriptor to see if it's open, and using it if it is, or behaving differently accordingly if it's not). So the latter is probably best to keep in mind and use for general-purpose cases.

(command | tee out.txt; exit ${PIPESTATUS[0]})
Unlike #cODAR's answer this returns the original exit code of the first command and not only 0 for success and 127 for failure. But as #Chaoran pointed out you can just call ${PIPESTATUS[0]}. It is important however that all is put into brackets.

In Ubuntu and Debian, you can apt-get install moreutils. This contains a utility called mispipe that returns the exit status of the first command in the pipe.

Outside of bash, you can do:
bash -o pipefail -c "command1 | tee output"
This is useful for example in ninja scripts where the shell is expected to be /bin/sh.

The simplest way to do this in plain bash is to use process substitution instead of a pipeline. There are several differences, but they probably don't matter very much for your use case:
When running a pipeline, bash waits until all processes complete.
Sending Ctrl-C to bash makes it kill all the processes of a pipeline, not just the main one.
The pipefail option and the PIPESTATUS variable are irrelevant to process substitution.
Possibly more
With process substitution, bash just starts the process and forgets about it, it's not even visible in jobs.
Mentioned differences aside, consumer < <(producer) and producer | consumer are essentially equivalent.
If you want to flip which one is the "main" process, you just flip the commands and the direction of the substitution to producer > >(consumer). In your case:
command > >(tee out.txt)
Example:
$ { echo "hello world"; false; } > >(tee out.txt)
hello world
$ echo $?
1
$ cat out.txt
hello world
$ echo "hello world" > >(tee out.txt)
hello world
$ echo $?
0
$ cat out.txt
hello world
As I said, there are differences from the pipe expression. The process may never stop running, unless it is sensitive to the pipe closing. In particular, it may keep writing things to your stdout, which may be confusing.

PIPESTATUS[#] must be copied to an array immediately after the pipe command returns.
Any reads of PIPESTATUS[#] will erase the contents.
Copy it to another array if you plan on checking the status of all pipe commands.
"$?" is the same value as the last element of "${PIPESTATUS[#]}",
and reading it seems to destroy "${PIPESTATUS[#]}", but I haven't absolutely verified this.
declare -a PSA
cmd1 | cmd2 | cmd3
PSA=( "${PIPESTATUS[#]}" )
This will not work if the pipe is in a sub-shell. For a solution to that problem,
see bash pipestatus in backticked command?

Base on #brian-s-wilson 's answer; this bash helper function:
pipestatus() {
local S=("${PIPESTATUS[#]}")
if test -n "$*"
then test "$*" = "${S[*]}"
else ! [[ "${S[#]}" =~ [^0\ ] ]]
fi
}
used thus:
1: get_bad_things must succeed, but it should produce no output; but we want to see output that it does produce
get_bad_things | grep '^'
pipeinfo 0 1 || return
2: all pipeline must succeed
thing | something -q | thingy
pipeinfo || return

Pure shell solution:
% rm -f error.flag; echo hello world \
| (cat || echo "First command failed: $?" >> error.flag) \
| (cat || echo "Second command failed: $?" >> error.flag) \
| (cat || echo "Third command failed: $?" >> error.flag) \
; test -s error.flag && (echo Some command failed: ; cat error.flag)
hello world
And now with the second cat replaced by false:
% rm -f error.flag; echo hello world \
| (cat || echo "First command failed: $?" >> error.flag) \
| (false || echo "Second command failed: $?" >> error.flag) \
| (cat || echo "Third command failed: $?" >> error.flag) \
; test -s error.flag && (echo Some command failed: ; cat error.flag)
Some command failed:
Second command failed: 1
First command failed: 141
Please note the first cat fails as well, because it's stdout gets closed on it. The order of the failed commands in the log is correct in this example, but don't rely on it.
This method allows for capturing stdout and stderr for the individual commands so you can then dump that as well into a log file if an error occurs, or just delete it if no error (like the output of dd).

It may sometimes be simpler and clearer to use an external command, rather than digging into the details of bash. pipeline, from the minimal process scripting language execline, exits with the return code of the second command*, just like a sh pipeline does, but unlike sh, it allows reversing the direction of the pipe, so that we can capture the return code of the producer process (the below is all on the sh command line, but with execline installed):
$ # using the full execline grammar with the execlineb parser:
$ execlineb -c 'pipeline { echo "hello world" } tee out.txt'
hello world
$ cat out.txt
hello world
$ # for these simple examples, one can forego the parser and just use "" as a separator
$ # traditional order
$ pipeline echo "hello world" "" tee out.txt
hello world
$ # "write" order (second command writes rather than reads)
$ pipeline -w tee out.txt "" echo "hello world"
hello world
$ # pipeline execs into the second command, so that's the RC we get
$ pipeline -w tee out.txt "" false; echo $?
1
$ pipeline -w tee out.txt "" true; echo $?
0
$ # output and exit status
$ pipeline -w tee out.txt "" sh -c "echo 'hello world'; exit 42"; echo "RC: $?"
hello world
RC: 42
$ cat out.txt
hello world
Using pipeline has the same differences to native bash pipelines as the bash process substitution used in answer #43972501.
* Actually pipeline doesn't exit at all unless there is an error. It executes into the second command, so it's the second command that does the returning.

Why not use stderr? Like so:
(
# Our long-running process that exits abnormally
( for i in {1..100} ; do echo ploop ; sleep 0.5 ; done ; exit 5 )
echo $? 1>&2 # We pass the exit status of our long-running process to stderr (fd 2).
) | tee ploop.out
So ploop.out receives the stdout. stderr receives the exit status of the long running process. This has the benefit of being completely POSIX-compatible.
(Well, with the exception of the range expression in the example long-running process, but that's not really relevant.)
Here's what this looks like:
...
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
ploop
5
Note that the return code 5 does not get output to the file ploop.out.

Log command line plus its output in a bash script

Is there a way for a script to log both, the command line run (including piped ones) plus its output without duplicating the line for the command?
The intention is that the script should have a clean output, but should log verbosely into a log file (so no set -x). Apart from the output, it shall also log the command line causing the output, which could be a piped command-one liner.
The most basic approach is to duplicate the command line in the script and then dump it into the log followed by the captured output of the actual command being run:
echo "command argument1 \"quoted argument2\" | grep -oE \"some output\"" >> file.log
output="$(command argument1 "quoted argument2" 2>&1 | grep -oE "some output")"
echo "${output}" >> file.log
This has the side effect that quoted sections would need to be escaped for the log, which can lead to errors resulting in confusion.
If none of the commands were piped, one could store the command line in an array and then "run" the array.
command=(command argument1 "quoted argument2")
echo "${command[#]}" >> file.log
output="$("${command[#]}" 2>&1)"
echo "${output}" >> file.log
Though with this approach "quoted argument2" would become quoted argument2 in the log.
Is there a way (in bash) to realize this without having to duplicate the commands?

You could play with redirections, switch the x option on and off on demand, unset PS4 to get rid of the leading + , and define log_on and log_off functions for easier coding. Something like this:
$ cat script.sh
#!/usr/bin/env bash
function log_on {
exec 3>&1 4>&2
exec &> >( sed -E '/^(set \+x|log_off)$/d' >> file.log )
ps4=$PS4
PS4=
set -x
}
function log_off {
set +x
exec 1>&3 2>&4
PS4=$ps4
}
echo something not logged
log_on
echo something logged
log_off
echo something else not logged
$ rm -f file.log
$ ./script.sh
something not logged
something else not logged
$ cat file.log
echo something logged
something logged
The exec <redirection> commands look a bit cryptic (as most redirections) but they are rather simple:
exec 3>&1 4>&2 makes copies of file descriptors fd1 and fd2 (stdout and stderr by default) to be able to restore these in log_off. After this fd3 and fd4 are copies of fd1 and fd2, respectively. Pick other fd than 3 or 4 if you already use them.
exec &> >( sed ... ) redirect fd1 and fd2 to the standard input of a sed command.
The sed command sed -E '/^(set \+x|log_off)$/d' >> file.log deletes lines containing only set + or log_off and appends its output to file.log. Without this sed command you would always see the two following lines:
log_off
set +x
in your logs, after a group of logged commands.
exec 1>&3 2>&4 restores fd1 and fd2 from their copies in fd3 and fd4.
The rest is straightforward: save PS4 in ps4 such that it can be restored, enable/disable the x option. This should be easy to adapt or extend if needed.
The x option displays the simple commands separately. It breaks pipes, for instance. If you prefer a command log that looks more like the commands you wrote you can replace set -/+x by set -/+v.

IMHO this has already been answered here:
For simplicity the set linux command is what you need.
set -x or set -v

Where does echo output go to, when running a bash script?

The following will output to stdout if the script is run in a terminal:
echo "some message"
If the script is called by another script, where does the output go to? Is there any significant overhead involved?
I'm using GNU bash, version 4.3.33.
Many thanks

The output of
echo "some message"
should go to the stdout except (not exclusive) in cases where
You have a o/p redirection, as given below, which affect the echo statement.
exec 1>/dev/null # 1 is the file descriptor for stdout, this should be before the echo
./script >outfile # The whole output is redirected to a file
You have a do-nothing directive(:) before the echo command
: echo "some message" # Does nothing

awk output when run in background

Im wondering why awk print different output when run in background
My script:
#!/bin/bash
echo "Name of shell is $SHELL"
relase=`uname -r`
echo "Release is: $relase"
if [ $SHELL != "/bin/bash" ] || [ $relase != "3.13.0-32-generic" ] ; then
echo "Warning, different configuration"
fi
if [ $# -eq 0 ] ; then
echo "Insert name of shell"
read sname
else
sname=$1
fi
awk -v sname="$sname" 'BEGIN {FS=":"} {if ($7 == sname) print $1 }' </etc/passwd &
When i run awk without ampersand, output is:
petr#PetrLinux-VirtualBox:~/Documents$ ./script1 /bin/bash
Name of shell is /bin/bash
Release is: 3.13.0-32-generic
root
petr
but when i run awk with ampersand - in background, output is folowing:
petr#PetrLinux-VirtualBox:~/Documents$ ./script1 /bin/bash
Name of shell is /bin/bash
Release is: 3.13.0-32-generic
petr#PetrLinux-VirtualBox:~/Documents$ root
petr
First record (root) is not printed on single line. Please tell me why ańd if there is way how to print on single line while running on background. Thanks.

What you see is a mix of two outputs. The first output is of your shell, printing the command prompt (petr#PetrLinux-VirtualBox:~/Documents$). The second output is root from your script.
As your shell script runs in the background, you now have two processes writing to your terminal window: the bash (printing the prompt), and your script, printing the awk-output. This then just mixes up.
The only way to prevent that is to redirect the output of the script to a file or other device, instead of your console. For example:
$ ./script1 /bin/bash &> output.txt &

The output is the same. It just appears to be different because two processes write on the same channel (your terminal) and mix their output. One process is the awk script and the other is your shell which prints a new prompt.
There is no way to determine the precise point in which the output will switch from one process to the other. It can be different on different systems (with the same software), it can also depend on the load of the computer and lots of other things.
The only decent solution is to redirect the output into a different stream, e. g. a file using > outfile.

How to redirect stdout/stderr when /dev/null is not writable for normal users

How to disable stdout or stderr in bash scripts temporarily?
Of course the most common way is to redirect stdout or stderr to /dev/null.
But on some systems /dev/null may be unwritable for normal users.
I am writing some scripts that is aim to be portable, so I do not prefer using /dev/null
Some blogs/posts say that >&- can close stdout, but when I tried echo 123 >&- in a bash terminal, it just failed with the message "bash: echo: write error: Bad file descriptor"
Surely I can do it by redirecting stdout or stderr to a tmp file like this:
some_command > /tmp/null
But what I want is a more "elegant" way
I think perhaps I can achieve this by using pipe like this:
some_command | :
But in this way, it may "pollutes" the exit code of the original command

Here is a possible way to do what you want:
( my_cmd 3>&1 1>&2 2>&3- ) | :
This temporarily send stdout to a new file handle, 3 and redirect stderr to stdout so that the stderr pipes into the command (in this case, :). Then the new file handle is routed back out to stdout. These avoid piping the stdout of my_cmd into :. The - in closes the handle after it's used.
To check the exist status of my_cmd after the above you examine the environment variable $PIPESTATUS[0]. $PIPESTATUS is a bash environment array variable that holds the exit status of each piped command in the last pipe that was done.
I think the really correct answer is to investigate why /dev/null isn't world writable. Having it not so is an off-standard system configuration and may cause system problems. The above work-around is a little messy by comparison.

Based on what I wrote earlier and #nos's comment above, here's an example:
(assuming you have no file called 'zzz' in your current directory, and that '.' is readable)
#!/bin/bash
set -o pipefail
ls . 2>&1 |:
echo $?
ls zzz 2>&1 |:
echo $?
The pipelines succeed and fail silently and maintain the exit code. Note that you can probably still make a pipeline example where this would not produce the desired results. I haven't come up with one in my head yet, but that doesn't mean it's not out there. The best answer, as many have noted already, is to fix the system so that /dev/null is world writable.
EDIT: Changed /bin/sh to /bin/bash, although this probably isn't necessary. But since I haven't tested this against a true Bourne Shell, I decided to err on the side of caution.
EDIT: Another script, showing several different redirections, and using the |& shortcut for 2>&1 |. If you run this, you'll notice that some of the ls failures return a 141 exit status rather than the expected 2. This is a broken pipe exit status, but still represents a failure.
#!/bin/bash
set -o pipefail
# start with commands that should succeed
# redirect everything to ':'
echo "ls . |& :"
ls . |& :
echo $?
# redirect only stdout to ':'
echo "ls . | :"
ls . | :
echo $?
# redirect only stderr to ':'
echo "((ls . 1>&3) |& : ) 3>&1"
((ls . 1>&3) |& : ) 3>&1
echo $?
# now move to failures
# redirect everything to ':'
echo "ls zzz |& :"
ls zzz |& :
echo $?
# redirect only stdout to ':'
echo "ls zzz |:"
ls zzz |:
echo $?
# redirect only stderr to ':'
echo "((ls zzz 1>&3) |& : ) 3>&1"
((ls zzz 1>&3) |& : ) 3>&1
echo $?
I use two subshells when I'm attempting to destroy stdout but keep stderr. You could do it without the outer one. In fact, that might be better. Instead of getting a broken pipe error, you get a 1 exit status.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Explanation needed for tee, process substitution, redirect...and different behaviors in Bash and Z shell ('zsh') - linux

Related

Bash command with pipe('|') alway return exit code of 0, even in error case [duplicate]

Log command line plus its output in a bash script

Where does echo output go to, when running a bash script?

awk output when run in background

How to redirect stdout/stderr when /dev/null is not writable for normal users

Categories

Resources