by default, does stderr start as a duplicate file descriptor of stdout? - linux

Does stderr start out as a duplicate FD of stdout?
i.e. considering dup(2), is stderr initialized kind of like so?
int stderr = dup(stdout); // stdout = 1
In the BashGuide, there's a code example
$ grep proud file 'not a file' > proud.log 2> proud.log
The author states
We've created two FDs that both point to the same file, independently of each other. The results of this are not well-defined. Depending on how the operating system handles FDs, some information written via one FD may clobber information written through the other FD.
and further says
We need to prevent having two independent FDs working on the same destination or source. We can do this by duplicating FDs
So basically, 2 independent FDs on the same file = broken
Well, I know that stdout & stderr both point to my terminal by default. Since they can both function properly (i.e. I don't see mish-mashed output + error messages), does that mean that they're not independent FDs, and thus that stderr is a duplicate FD of stdout (or vice versa)?

No, stderr is not a duplicate of stdout.
They work in parallel, independently and asynchronously, which means that under race conditions you can indeed get the 'mish-mash' you describe.
One practical difference is that stderr does not pass through a pipe: when you pipe a command's output to a subsequent command, only stdout goes down the pipe, while stderr still goes to the terminal:
Practical example:
$ cat tst.sh
#!/bin/bash
echo "written to stdout"
echo "written to stderr" 1>&2
exit 0
~$ ./tst.sh
written to stdout
written to stderr
~$ ./tst.sh | xargs -n1 -I{} echo "this came through the pipe:{}"
written to stderr
this came through the pipe:written to stdout
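If you want to see for yourself where the two descriptors point, one quick check on Linux (a sketch that assumes a /proc filesystem; the pts name below is only an example) is:
$ ls -l /proc/$$/fd/1 /proc/$$/fd/2
# in an interactive shell both are usually symlinks to the same terminal device,
# e.g. /dev/pts/0, yet they remain two separate descriptor entries that can be
# redirected independently of each other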

Related

Is the redirect in bash hierarchical?

There's an example in the bash manual explaining how the order of redirections affects the final result:
Note that the order of redirections is significant. For example, the command
ls > dirlist 2>&1
directs both standard output (file descriptor 1) and standard error (file descriptor 2) to the file dirlist, while the command
ls 2>&1 > dirlist
directs only the standard output to file dirlist, because the standard error was made a copy of the standard output before the standard output was redirected to dirlist.
In the second case, if my understanding is correct, 2>&1 means everything in stderr will be redirected to stdout and > dirlist means everything in stdout will be redirected to file dirlist. The final result is that only stderr is shown in the console.
My question is: does such a phenomenon mean that redirection in bash is not hierarchical? If it were hierarchical, stderr would already have been merged into stdout and so should end up in the file dirlist together with stdout.
You need to read ">&n" (i.e. "1>&n") as "fd 1 is redirected to wherever fd n is currently going", not "fd 1 will go to wherever fd n goes, even if fd n changes after that". This is why I dislike the term "copy", as it is a bit ambiguous to me: 'n>&m' makes fd n 'point' to wherever fd m currently points, and does not link fd n to fd m in any way, i.e. it does not make them "the same fd". They are still two different fds that can be changed independently.
ex1 : in an interactive shell, where stdout & stderr both go to your terminal:
Every command you type in this shell will, by default, send its STDOUT (fd 1) to the terminal and its STDERR (fd 2) to the terminal as well (unless you tell it to output somewhere else).
So when you redirect a command to output somewhere else:
command 2>&1 >/dev/null
# now `command`'s STDERR (its `fd 2`) goes to: the terminal (again), as by default its `fd 1` goes to terminal
# and then only its `fd 1` (its stdout) is redirected to /dev/null, discarding it.
command >/dev/null 2>&1
# here, fd 1 (stdout) is redirected to /dev/null
# and then fd 2 (stderr) goes to where fd 1 is *currently* going, ie /dev/null too
For most commands, a '>&something' or '>something' redirection is temporary: it applies only to that command. A special case is 'exec >something' or 'exec >&something', which changes the redirection of the current shell or program, and thus stays in effect until another exec changes it again or until the current shell or program finishes. In other words, 'exec someredirection' affects not a subcommand but the current program/shell.
ex2 : in a program launched with : program >/some/log_stdout 2>/some/log_stderr
# beginning of the program
... # anything here will have its stdout go to /some/log_stdout and stderr go to /some/log_stderr
exec 3>&1 # create a new fd, 3, that goes to where stdout currently goes (/some/log_stdout)
exec 4>&2 # create a new fd, 4, that goes to where stderr currently goes (/some/log_stderr)
exec 1>/dev/null # redirect fd 1 to /dev/null. fd 3 still goes to /some/log_stdout
exec 2>/dev/null # redirect fd 2 to /dev/null. fd 4 still goes to /some/log_stderr
something # here, no output: this command "something" outputs to fd1 and fd2, both going to /dev/null
other_thing # same for that one
exec 5>&1 # this new fd, 5, goes to where fd1 currently goes (/dev/null)
somecommand ... >&3
# here, the STDOUT (only) of somecommand goes to where fd 3 currently points to,
# ie to /some/log_stdout (and of course only for the duration of somecommand)
another_command
# no output, as it outputs by default its stdout to fd 1 and stderr to fd 2, both going to /dev/null
# etc.
exec 1>&3 # fd 1 now goes to wherever fd3 was going,
# and thus we "restore" the original fd 1
# (as no 'exec 3>...' was done to change fd 3)
exec 2>&4 # same thing for fd 2, going to where fd 4 is going
# (which was where fd 2 was originally going)
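If you no longer need the helper descriptors after restoring fd 1 and fd 2, you can also close them; a small sketch using the n>&- close syntax:
exec 3>&- 4>&- 5>&- # close the now-unneeded helper fds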
It may help to "imagine" the following table attached to your command (or your shell, or your script):
Your interactive shell will usually have this by default:
fd 1(=stdout) _goes_to_ your terminal
fd 2(=stderr) _goes_to_ your terminal
Each command it launches will also get 'a copy of this table' (or rather, its process inherits the launching command/shell's current fds), unless told otherwise.
When you launch : somecommand >/dev/null 2>&1, before somecommand is launched, fd 1 is first pointed to "/dev/null", and fd 2 is then pointed to where fd 1 now points (/dev/null) and the process is launched with this 'table':
fd 1(=stdout) _goes_to_ /dev/null
fd 2(=stderr) _goes_to_ /dev/null
The shell that launched the command still has its own table, unchanged (unless 'somecommand' was in fact 'exec n>somewhere', which changes the current shell's fd n to point to 'somewhere' "permanently", i.e. until the current shell finishes).
ls 2>&1 > dirlist
Here, the destination of stdout is copied to stderr first, but at that point they both still point to your terminal. After that, stdout is redirected to dirlist. So ls's stderr will still be printed to the terminal, and only its stdout ends up in dirlist.
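A quick way to see the difference between the two orderings for yourself (a sketch; nosuchfile is just a name assumed not to exist):
$ ls nosuchfile > dirlist 2>&1   # the error message ends up inside dirlist
$ ls nosuchfile 2>&1 > dirlist   # the error message still appears on the terminal; dirlist stays empty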

Chronologically capturing STDOUT and STDERR

This very well may fall under KISS (keep it simple) principle but I am still curious and wish to be educated as to why I didn't receive the expected results. So, here we go...
I have a shell script to capture STDOUT and STDERR without disturbing the original file descriptors. This is in hopes of preserving the original order of output (see test.pl below) as seen by a user on the terminal.
Unfortunately, I am limited to using sh, instead of bash (but I welcome examples), as I am calling this from another suite and I may wish to use it in a cron in the future (I know cron has the SHELL environment variable).
wrapper.sh contains:
#!/bin/sh
stdout_and_stderr=$1
shift
command="$@" # the remaining arguments are the command to run
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
${command} >${out} 2>${err}
test.pl contains:
#!/usr/bin/perl
print "1: stdout1\n";
print STDERR "2: stderr1\n";
print "3: stdout2\n";
In the scenario:
sh wrapper.sh /tmp/xxx perl test.pl
STDOUT contains:
1: stdout1
3: stdout2
STDERR contains:
2: stderr1
All good so far...
/tmp/xxx contains:
2: stderr1
1: stdout1
3: stdout2
However, I was expecting /tmp/xxx to contain:
1: stdout1
2: stderr1
3: stdout2
Can anyone explain to me why STDOUT and STDERR are not appending /tmp/xxx in the order that I expected? My guess would be that the backgrounded tee processes are blocking the /tmp/xxx resource from one another since they have the same "destination". How would you solve this?
related: How do I write stderr to a file while using "tee" with a pipe?
It is a feature of the C runtime library (and probably is imitated by other runtime libraries) that stderr is not buffered. As soon as it is written to, stderr pushes all of its characters to the destination device.
By default stdout is buffered: line-buffered when it is connected to a terminal, and fully buffered otherwise (the exact buffer size depends on the C library, typically a few kilobytes).
The buffering for both stderr and stdout can be changed with the setbuf or setvbuf calls.
From the Linux man page for stdout:
NOTES: The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output. The buffering mode of the standard streams (or any other stream) can be changed using the setbuf(3) or setvbuf(3) call. Note that in case stdin is associated with a terminal, there may also be input buffering in the terminal driver, entirely unrelated to stdio buffering. (Indeed, normally terminal input is line buffered in the kernel.) This kernel input handling can be modified using calls like tcsetattr(3); see also stty(1), and termios(3).
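A quick illustration of that buffering difference (a sketch; Perl is used only because the question already uses it, and the exact behaviour depends on the I/O implementation):
$ perl -e 'print "to stdout\n"; print STDERR "to stderr\n"' | cat
# with stdout going into a pipe it is block-buffered, so "to stderr" typically
# shows up before "to stdout", even though it was written second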
After a little bit more searching, inspired by #wallyk, I made the following modification to wrapper.sh:
#!/bin/sh
stdout_and_stderr=$1
shift
command="$@" # the remaining arguments are the command to run
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
script -q -F 2 ${command} >${out} 2>${err}
Which now produces the expected:
1: stdout1
2: stderr1
3: stdout2
The solution was to prefix the $command with script -q -F 2, which makes script quiet (-q) and forces its output to be flushed immediately (-F). (Running the command under script also attaches its stdout to a pseudo-terminal, so stdio line-buffers it instead of block-buffering it, which is likely what restores the chronological order.)
I am now researching to determine how portable this is. I think -F pipe may be Mac and FreeBSD and -f or --flush may be other distros...
related: How to make output of any shell command unbuffered?
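Where GNU coreutils is available, another commonly used approach is stdbuf, which asks the C library to line-buffer the command's stdout; this is only a sketch, and it only helps programs that use C stdio buffering (it may not affect e.g. Perl's own I/O layer):
stdbuf -oL ${command} >${out} 2>${err}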

bash script read pipe or argument

I want my script to read a string either from stdin, if it's piped, or from an argument. So first I want to check whether some text is being piped in, and if not, it should use an argument as input. My code looks something like this:
value=$(cat) # read from stdin
if [ "$value" != "" ]; then # check that the piped input is not empty
    # Do something with the piped string
else
    # Do something with the argument string
fi
The problem is that when nothing is piped in, the script halts at the cat and waits for Ctrl-D, and I don't want that. Any suggestions on how to solve this?
Thanks in advance.
/Tomas
What about checking the argument first?
if (($#)) ; then
    process "$1"
else
    cat | process
fi
Or, just take advantage from the same behaviour of cat:
cat "$#" | process
If you only need to know if it's a pipe or a redirection, it should be sufficient to determine if stdin is a terminal or not:
if [ -t 0 ]; then
    # stdin is a tty: process command line
else
    # stdin is not a tty: process standard input
fi
[ (aka test) with -t is equivalent to the libc isatty() function.
The above will work with both something | myscript and myscript < infile. This is the simplest solution, assuming your script is for interactive use.
The [ command is a builtin in bash and some other shells, and since [/test with -t is in POSIX, it's portable too (not relying on Linux, bash, or GNU utility features).
There's one edge case: test -t also returns false if the file descriptor is invalid, but it would take some slight adversity to arrange that. test -e will detect this, though, assuming you have a filename such as /dev/stdin to use.
The POSIX tty command can also be used, and handles the adversity above. It will print the tty device name and return 0 if stdin is a terminal, and will print "not a tty" and return 1 in any other case:
if tty >/dev/null ; then
    # stdin is a tty: process command line
else
    # stdin is not a tty: process standard input
fi
(with GNU tty, you can use tty -s for silent operation)
A less portable way, though certainly acceptable on a typical Linux, is to use GNU stat with its %F format specifier, this returns the text "character special file", "fifo" and "regular file" in the cases of terminal, pipe and redirection respectively. stat requires a filename, so you must provide a specially-named file of the form /dev/stdin, /dev/fd/0, or /proc/self/fd/0, and use -L to chase symlinks:
stat -L -c "%F" /dev/stdin
This is probably the best way to handle non-interactive use (since you can't make assumptions about terminals then), or to detect an actual pipe (FIFO) distinct from redirection.
There is a slight gotcha with %F in that you cannot use it to tell the difference between a terminal and certain other device files, for example /dev/zero or /dev/null which are also "character special files" and might reasonably appear. An unpretty solution is to use %t to report the underlying device type (major, in hex), assuming you know what the underlying tty device number ranges are... and that depends on whether you're using BSD style ptys or Unix98 ptys, or whether you're on the actual console, among other things. In the simple case %t will be 0 though for a pipe or a redirection of a normal (non-special) file.
More general solutions to this kind of problem are to use bash's read with a timeout (read -t 0 ...) or non-blocking I/O with GNU dd (dd iflag=nonblock).
The latter will allow you to detect lack of input on stdin, dd will return an exit code of 1 if there is nothing ready to read. However, these are more suitable for non-blocking polling loops, rather than a once-off check: there is a race condition when you start two or more processes in a pipeline as one may be ready to read before another has written.
It's easier to check for command line arguments first and fallback to stdin if no arguments. Shell Parameter Expansion is a nice shorthand instead of the if-else:
value=${*:-`cat`}
# do something with $value
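Putting the pieces above together, a minimal sketch of a script that prefers an argument and otherwise falls back to piped or redirected stdin might look like this (the names are illustrative only):
#!/bin/sh
if [ $# -gt 0 ]; then
    value=$1            # an argument was given: use it
elif [ ! -t 0 ]; then
    value=$(cat)        # stdin is not a terminal: read the piped/redirected input
else
    echo "usage: $0 string  (or pipe a string in)" >&2
    exit 1
fi
printf 'got: %s\n' "$value"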

redirecting stdin _and_ stdout to a pipe

I would like to run a program "A", have its output go to the input of another program "B", and also have stdin go to the input of "B". If program "A" closes, I'd like "B" to continue running.
I can redirect A output to B input easily:
./a | ./b
And I can combine stderr into the output if I'd like:
./a 2>&1 | ./b
But I can't figure out how to combine stdin into the output. My guess would be:
./a 0>&1 | ./b
but it doesn't work.
Here's a test that doesn't require us to write any test programs:
$ echo ls 0>&1 | /bin/sh -i
$ a b info.txt
$
/bin/sh: Cannot set tty process group (No such process)
If possible, I'd like to do this using only bash redirection on the command line (I don't want to write a C program to fork off child processes and do something complicated every time I want to redirect stdin to a pipe).
This cannot be done without writing an auxiliary program.
In general, stdin could be a read-only file descriptor (heck, it might refer to a read-only file). So you cannot "insert" anything into it.
You will need to write a "helper" program that monitors two file descriptors (say, 0 and 3) in order to read from both and "merge" them. A simple select or poll loop would be sufficient, and you could write it in most scripting languages, but not the shell, I don't think.
Then you can use shell redirection to feed your program's output to descriptor 3 of the "helper".
Since what you want is basically the opposite of "tee", I might call it "eet"...
[edit]
If only you could launch "cat" in the background...
But that will fail because background processes with a controlling terminal cannot read from stdin. So if you could just detach "cat" from its controlling terminal and run it in the background...
On Linux, "setsid cat" should do it, roughly. But (a) I could not get it to work very well and (b) I really do not have time for this today and (c) it is non-standard anyway.
I would just write the helper program.
[edit 2]
OK, this seems to work:
{ seq 5 ; sleep 2 ; seq 5 ; } | /bin/bash -c 'set -m ; setsid cat ; echo HELLO'
The set -m thing forces bash to enable job control, which apparently is needed to prevent the shell from redirecting stdin from /dev/null.
Here, the echo HELLO represents your "program A". The seq commands (with the sleep in the middle) are just to provide some input. And yes, you can pipe this whole thing to process B.
About as ugly and non-portable a solution as you could ask for...
A pipe has two ends. One is for writing, and that which gets written appears in the other end, which is for reading.
It's a pipe, not a T or Y junction.
I don't think your scenario is possible. Having "stdin going to input of" anything doesn't make sense.
If I understand your requirements correctly, you want this set up (ASCII art to the fore):
o----+----->| A |----+---->| B |---->o
     |               ^
     |               |
     +---------------+
with the additional constraint that if process A closes up shop, process B should be able to continue with the input stream going to B.
This is a non-standard setup, as you realize, and can only be achieved by using an auxiliary program to drive the input to A and B. You end up with some interesting synchronization issues but it will all work remarkably well as long as your messages are short enough.
The plumbing necessary to achieve this is notable - you'll need two pipes, one for the input to A and the other for the input to B, and the output of A will be connected to the input of B as well.
o---->| C |---------->| A |----+---->| B |---->o
       |                       ^
       |                       |
       +-----------------------+
Note that C will be writing the data twice, once to A and once to B. Note, too, that the pipe from A to B is the same pipe as the pipe from C to B.
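In bash (not plain sh), one way to approximate that "C" stage without writing a separate program is tee together with process substitution. This is only a sketch: the usual interleaving caveats apply, and if A exits early tee can be killed by SIGPIPE unless you use something like GNU tee's -p option:
tee >(./a) | ./b
# tee copies stdin both to ./a and down the pipe to ./b; ./a's stdout is inherited
# from the pipeline, so its output goes into the same pipe and ./b sees the merged stream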
To make the given test case work, you have to run a while ... read loop that reads from the controlling terminal device /dev/tty, inside a sh -c '...' construct.
Note the use of eval (could it be avoided here?) and that multi-line commands on input> will fail.
echo 'ls; export var=myval' | (
    stdin="$(</dev/stdin)"
    /bin/sh -i -c '
        eval "$1";
        while IFS="" read -e -r -p "input> " line; do
            history -s "${line}"
            eval "${line}";
        done </dev/tty
    ' argv0 "${stdin}"
)
input> echo $var
For a similar problem and the use of named pipes see here:
BASH: Best architecture for reading from two input streams
This can't be done exactly as shown, but to perform your example you can make use of cat's ability to join files together:
cat <(echo ls) - | /bin/sh
(You can do -i, but then you'll have to have another process kill the /bin/sh, as your attempts to Ctrl-C and Ctrl-D out will fail.)
This assumes that you want to pass in your piped input and then accept from stdin. You can also make it so that it does something after stdin is done, or on both sides -- but it won't merge input character-by-character or line-by-line.
This seems to do what you want:
$ ( ./a <&-; cat ) | ./b
(It's not clear to me if you want a to get input...this solution sends all input to b)
Of course, in this case the inputs to b are strictly ordered: all of the output of a is sent to b first, then a terminates, then input goes to b. If you want things interleaved, try:
$ ( ./a <&- & cat ) | ./b
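As a quick check with stand-ins for a and b (seq playing the role of a, cat -n the role of b; purely illustrative):
$ ( seq 3 <&- & cat ) | cat -n
# seq's output and whatever you type on the terminal both reach cat -n,
# interleaved rather than strictly ordered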

exec n<&m versus exec n>&m -- based on Sobell's Linux book

In Mark Sobell's A Practical Guide to Linux Commands, Editors, and Shell Programming, Second Edition he writes (p. 432):
The <& token duplicates an input file descriptor; >& duplicates an output file descriptor.
This seems to be inconsistent with another statement on the same page:
Use the following format to open or redirect file descriptor n as a duplicate of file descriptor m:
exec n<&m
and with an example also on the same page:
# File descriptor 3 duplicates standard input
# File descriptor 4 duplicates standard output
exec 3<&0 4<&1
If >& duplicates an output file descriptor then should we not say
exec 4>&1
to duplicate standard output?
The example is right in practice. The book's original explanation is an accurate description of what the POSIX standard says, but the POSIX-like shells I have handy (bash and dash, the only ones I believe are commonly seen on Linux) are not that picky.
The POSIX standard says the same thing as the book about input and output descriptors, and goes on to say this: for n<&word, "if the digits in word do not represent a file descriptor already open for input, a redirection error shall result". So if you want to be careful about POSIX compatibility, you should avoid this usage.
The bash documentation also says the same thing about <& and >&, but without the promise of an error. Which is good, because it doesn't actually give an error. Instead, empirically n<&m and n>&m appear to be interchangeable. The only difference between <& and >& is that if you leave off the fd number on the left, <& defaults to 0 (stdin) and >& to 1 (stdout).
For example, let's start a shell with fd 1 pointing at a file bar, then try out exactly the exec 4<&1 example, try to write to the resulting fd 4, and see if it works:
$ sh -c 'exec 4<&1; echo foo >&4' >bar; cat bar
foo
It does, and this holds using either dash or bash (or bash --posix) for the shell.
Under the hood, this makes sense because <& and >& are almost certainly just calling dup2(), which doesn't care whether the fds are opened for reading or writing or appending or what.
[EDIT: Added reference to POSIX after discussion in comments.]
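If you want to check the dup2() guess on Linux, a rough sketch (assuming strace is installed; the shell may also use fcntl(F_DUPFD) for its own bookkeeping):
$ strace -f -e trace=dup2,dup3,fcntl sh -c 'exec 4<&1' 2>&1 | grep -i dup
# typically shows something like dup2(1, 4) = 4, and the same call appears
# whether you write 4<&1 or 4>&1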
If stdout is a tty, then it can safely be cloned for reading or writing. If stdout is a file, then it may not work. I think the example should be 4>&1. I agree with Greg that you can both read and write the clone descriptor, but requesting a redirection with <& is supposed to be done with source descriptors that are readable, and expecting stdout to be readable doesn't make sense. (Although I admit I don't have a reference for this claim.)
An example may make it clearer. With this script:
#!/bin/bash
exec 3<&0
exec 4<&1
read -p "Reading from fd 3: " <&3
echo From fd 3: $REPLY >&2
REPLY=
read -p "Reading from fd 4: " <&4
echo From fd 4: $REPLY >&2
echo To fd 3 >&3
echo To fd 4 >&4
I get the following output (the stuff after the : on "Reading from" lines is typed at the terminal):
$ ./5878384b.sh
Reading from fd 3: foo
From fd 3: foo
Reading from fd 4: bar
From fd 4: bar
To fd 3
To fd 4
$ ./5878384b.sh < /dev/null
From fd 3:
Reading from fd 4: foo
From fd 4: foo
./5878384b.sh: line 12: echo: write error: Bad file descriptor
To fd 4
$ ./5878384b.sh > /dev/null
Reading from fd 3: foo
From fd 3: foo
./5878384b.sh: line 9: read: read error: 0: Bad file descriptor
From fd 4:
To fd 3
Mind the difference between file descriptors and IO streams such as stderr and stdout.
The redirecting operators are just redirecting IO streams via different file descriptors (IO stream handling mechanisms); they do not do any copying or duplicating of IO streams (that's what tee(1) is for).
See: File Descriptor 101
Another test to show that n<&m and n>&m are interchangeable would be "to use either style of 'n<&-' or 'n>&-' for closing a file descriptor, even if it doesn't match the read/write mode that the file descriptor was opened with" (http://www.gnu.org/s/hello/manual/autoconf/File-Descriptors.html).

Resources