Trouble understanding bash piping behaviour - linux

I'm a bit confused about how bash performs pipe redirections.
First on piping behaviour:
cat /dev/random | ls doesn't wait for cat and ends as soon as ls's output is printed; however,
cat /dev/random | grep foo seems to wait for cat before grep runs.
That makes sense, because ls doesn't need cat's output to work while grep does, but I don't understand how it can work. Does bash wait on some of the processes with waitpid calls? Does it wait for EOF on the write end of the pipe for the right-hand side, and on the read end for the left-hand side?
Moreover, I'm not sure which commands are forked and where that is done:
I guess built-in commands are executed in the main process, since most of them are used to modify shell settings. On the other hand (I guess), binaries are always executed in a subprocess, with fork. Am I right?
If I am, it means the pipe redirection doesn't call fork itself, since it doesn't know whether the commands to execute are built-ins or not.
Bash's source code is way beyond my skill level and I don't understand it. Can someone explain to me how it behaves?
I tried to reimplement its behaviour (without success) and asked here, with some code if you want to check.
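Two small experiments illustrate the observable behaviour (a sketch of what you can test yourself, not of bash's actual implementation): every command in a pipeline is started at the same time, the shell waits for all of them, and a writer whose reader has gone away is killed by SIGPIPE.
# The shell waits for every member of the pipeline, not just the last one:
# this takes ~2 seconds even though true exits immediately.
time (sleep 2 | true)
# The left side is killed by SIGPIPE once the right side stops reading:
# head exits after 16 bytes and cat dies on its next write to the pipe.
cat /dev/random | head -c 16 > /dev/null
So in cat /dev/random | ls, bash doesn't decide to skip waiting for cat; ls simply never reads from the pipe, and cat is killed by SIGPIPE as soon as the pipe buffer fills.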

Related

Redirect output from subshell and processing through pipe at the same time

tl;dr: I need a way to process output (with grep) inside a subshell AND redirect all of the original output to the main stdout/stderr at the same time. I am looking for a shell-independent (!) way.
In detail
There is a proprietary binary which I want to grep for some value.
From time to time the proprietary binary may become interactive and ask for a password (depending on its internal logic).
I want to grep the output of the binary AND be able to enter the password if that is required to proceed further.
So the script which is supposed to achieve my task might look like:
#!/bin/sh
user_id=... # some calculated value
list_actions_cmd="./proprietary-binary --actions ${user_id}"
action_item=$(${list_actions_cmd} | grep '^Main:')
Here proprietary-binary might ask for a password through stdin. Since the subshell inside $() captures all the output, an end user won't realize that list_actions_cmd is waiting for input. What I want is either to show all output of list_actions_cmd while grepping it at the same time, or at least to catch the keyword indicating that the user is about to be asked for a password and let them know about it.
What I have figured out so far is to tee the output to a file and grep it there:
#!/bin/sh
user_id=... # some calculated value
list_actions_cmd="./proprietary-binary --actions ${user_id}"
$list_actions_cmd 2>&1 | tee /tmp/.proprietary-binary.log
action_item=$(grep "^Main" /tmp/.proprietary-binary.log)
But I wonder: is there an elegant shell-independent solution (not limited to bash, which is quite powerful) without an intermediate temporary file? Thanks.
What about duplicating the output to stderr if the script is executed in a terminal:
item=$(your_command | tee /dev/stderr | grep 'regexp')
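Applied to the script from the question, that looks like this (a sketch; it assumes /dev/stderr exists, which is true on Linux and most modern Unixes, and that stderr is the terminal the user is watching):
#!/bin/sh
user_id=... # some calculated value
list_actions_cmd="./proprietary-binary --actions ${user_id}"
# tee copies everything to stderr, so the user sees the password prompt,
# while grep extracts the wanted line into the variable.
action_item=$(${list_actions_cmd} 2>&1 | tee /dev/stderr | grep '^Main:')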

crash-stopping bash pipeline [duplicate]

This question already has answers here:
How do you catch error codes in a shell pipe?
I have a pipeline, say a|b where if a runs into a problem, I want to stop the whole pipeline.
a exiting with exit status 1 doesn't do this, since b often doesn't care about return codes.
e.g.
echo 1|grep 0|echo $? <-- this shows that grep did exit=1
but
echo 1|grep 0 | wc <--- wc is unfazed by grep's exit here
If I ran the pipeline as a subprocess of an owning process, any of the pipeline processes could kill the owning process, which would zap the whole pipeline. But this seems a bit clumsy.
Not possible with basic shell constructs, probably not possible in shell at all.
Your first example doesn't do what you think. echo doesn't use standard input, so putting it on the right side of a pipe is never a good idea. The $? that you're echoing is not the exit value of the grep 0. All commands in a pipeline run simultaneously. echo has already been started, with the existing value of $?, before the other commands in the pipeline have finished. It echoes the exit value of whatever you did before the pipeline.
# The first command is to set things up so that $? is 2 when the
# second command is parsed.
$ sh -c 'exit 2'
$ echo 1|grep 0|echo $?
2
Your second example is a little more interesting. It's correct to say that wc is unfazed by grep's exit status. All commands in the pipeline are children of the shell, so their exit statuses are reported to the shell. The wc process doesn't know anything about the grep process. The only communication between them is the data stream written to the pipe by grep and read from the pipe by wc.
There are ways to find all the exit statuses after the fact (the linked question in the comment by shx2 has examples) but a basic rule that you can't avoid is that the shell will always wait for all the commands to finish.
Early exits in a pipeline sometimes do have a cascade effect. If a command on the right side of a pipe exits without reading all the data from the pipe, the command on the left of that pipe will get a SIGPIPE signal the next time it tries to write, which by default terminates the process. (The two phrases to pay close attention to there are "the next time it tries to write" and "by default". If the writing process spends a long time doing other things between writes to the pipe, it won't die immediately. If it handles the SIGPIPE, it won't die at all.)
In the other direction, when a command on the left side of a pipe exits, the command on the right side of that pipe gets EOF, which does cause the exit to happen fairly soon when it's a simple command like wc that doesn't do much processing after reading its input.
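Both cascades are easy to observe with standard tools:
# SIGPIPE cascade: yes would write forever, but it is killed as soon as
# head exits and closes the read end of the pipe.
yes | head -n 1
# EOF cascade: when the left side exits, the right side sees EOF and finishes.
printf 'hello\n' | wc -l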
With direct use of pipe(), fork(), and wait3(), it would be possible to construct a pipeline, notice when one child exits badly, and kill the rest of them immediately. This requires a language more sophisticated than the shell.
I tried to come up with a way to do it in shell with a series of named pipes, but I don't see it. You can run all the processes as separate jobs and get their PIDs with $!, but the wait builtin isn't flexible enough to say "wait for any child in this set to exit, and tell me which one it was and what the exit status was".
If you're willing to mess with ps and/or /proc you can find out which processes have exited (they'll be zombies), but you can't distinguish successful exit from any other kind.
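For what it's worth, newer versions of bash relax this: wait -n (added in bash 4.3) waits for any one job to finish, and bash 5.1 adds -p varname to report which job it was. A rough sketch of a pipeline supervisor built on that, with a and b standing in for the commands from the question:
#!/bin/bash
# connect a and b with a named pipe so they can run as separate jobs
fifo=/tmp/pipe.$$
mkfifo "$fifo"
a > "$fifo" & pid_a=$!
b < "$fifo" & pid_b=$!
wait -n -p first_done         # wait for whichever job exits first (bash 5.1+ for -p)
status=$?
if [ "$status" -ne 0 ]; then
    kill "$pid_a" "$pid_b" 2>/dev/null    # zap the rest of the pipeline
fi
rm -f "$fifo"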
Write
set -e
set -o pipefail
at the beginning of your file.
-e makes the shell exit when a command fails, and -o pipefail makes a pipeline return the exit code of the rightmost command that failed, instead of always returning that of the last command in the pipeline.
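A minimal demonstration (note that pipefail is a bash/ksh/zsh feature, not plain POSIX sh):
#!/bin/bash
set -e
set -o pipefail
false | cat     # without pipefail the pipeline would succeed (cat exits 0);
                # with it, the pipeline returns 1 and set -e aborts the script here
echo "never reached"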

How to launch a process for reading and writing in bash?

Background: I have to revive my old program, which unfortunately fails when it comes to communicating with a subprocess. The program is written in C++ and creates a subprocess for writing, with an open pipe for reading. Nothing crashes, but there is no data to read.
My idea is to recreate the entire scenario in bash, so that I can interactively check what is going on.
Things I used in C++:
mkfifo for creating the pipe; there is a shell equivalent
popen for creating the subprocess (in my case for writing)
espeak -x -q -z 1> /dev/null 2> /tmp/my-pipe
open and read -- for opening the pipe and then reading; I hope a simple cat will suffice
fwrite -- for writing to the subprocess; will plain redirection work?
So I hope open, read and fwrite will be straightforward, but how do I launch a program as a process (what is popen in bash)?
Bash naturally makes piping between processes very easy, so commands to create and open pipes are not normally needed:
program1 | program2
This is the equivalent of program1 running popen("program2","w");
It could also be achieved by program2 running popen("program1","r");
If you explicitly want to use a named pipe:
mkfifo /tmp/mypipe
program1 >/tmp/mypipe &
program2 </tmp/mypipe
rm /tmp/mypipe
A thought that might solve your original problem (and is a consideration for using pipes in shell):
Using stdio functions such as popen, fwrite, etc. involves buffering. If a program on the write end of the pipe only writes a small amount of data, the program on the reading end won't see any of it until a full block of data has been written, after which the block will be pushed along the pipe. If you wish to have the data arrive sooner, you need to either call fflush() on the writing end, or fclose() if you are not planning on sending any more data. Note that within bash itself, I don't believe there is any equivalent of fflush.
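If program1 isn't yours to modify, one practical workaround (assuming GNU coreutils is available; stdbuf only works on dynamically linked programs that use C stdio) is to adjust its buffering from the outside:
mkfifo /tmp/mypipe
stdbuf -oL program1 > /tmp/mypipe &   # force line-buffered stdout
program2 < /tmp/mypipe
rm /tmp/mypipe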
You simply run the process in the background.
espeak -x -q -z >/dev/null 2>/tmp/mypipe &
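If what you want from popen is a handle for both writing to and reading from the same process, bash 4+ has coproc, which is probably the closest shell equivalent. A sketch, reusing the espeak flags from the question (and subject to the same buffering caveat as above):
#!/bin/bash
coproc ESPEAK { espeak -x -q -z 2>&1; }    # both ends connected to the shell by pipes
echo "hello world" >&"${ESPEAK[1]}"        # write to its stdin (the fwrite side)
read -r line <&"${ESPEAK[0]}"              # read from its stdout (the read side)
echo "got: $line"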

Bash Command Substitution Giving Weird Inconsistent Output

For reasons not relevant to this question, I am running a Java server in a bash script, not directly but via command substitution in a separate sub-shell, and in the background. The intent is for the subcommand to return the process id of the Java server on its standard output. The fragment in question is as follows:
launch_daemon()
{
/bin/bash <<EOF
$JAVA_HOME/bin/java $JAVA_OPTS -jar $JAR_FILE daemon $PWD/config/cl.yml <&- &
pid=\$!
echo \${pid} > $PID_FILE
echo \${pid}
EOF
}
daemon_pid=$(launch_daemon)
echo ${daemon_pid} > check.out
The Java daemon in question prints to standard error and quits if there is a problem in initialization, otherwise it closes standard out and standard err and continues on its way. Later in the script (not shown) I do a check to make sure the server process is running. Now on to the problem.
Whenever I check the $PID_FILE above, it contains the correct process id on one line.
But when I check the file check.out, it sometimes contains the correct id, and other times it contains the process id repeated twice on the same line, separated by a space character, as in:
34056 34056
I use the variable $daemon_pid later on in the script to check whether the server is running, so if it contains the pid repeated twice, this totally throws off the test and it incorrectly concludes that the server is not running. Fiddling with the script on my server box running CentOS Linux, by putting in more echo statements etc., seems to flip the behavior back to the correct one of $daemon_pid containing the process id just once; but when I think that has fixed it, check the script in to my source code repo, and build and deploy again, I start seeing the same bad behavior.
For now I have fixed this by assuming that $daemon_pid could be bad and passing it through awk as follows:
mypid=$(echo ${daemon_pid} | awk '{ gsub(" +.*",""); print $0 }')
Then $mypid always contains the correct process id and things are fine, but needless to say I'd like to understand why it behaves the way it does. And before you ask, I have looked and looked but the Java server in question does NOT print its process id to its standard out before closing standard out.
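(A side note: the same defensive trim can be done with POSIX parameter expansion, without spawning awk:
mypid=${daemon_pid%% *}    # drop everything from the first space onward
)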
Would really appreciate expert input.
Following the hint by @WilliamPursell, I tracked this down in the bash source code. I honestly don't know whether it is a bug or not; all I can say is that it seems like an unfortunate interaction with a questionable use case.
TL;DR: You can fix the problem by removing <&- from the script.
Closing stdin is at best questionable, not just for the reason mentioned by @JonathanLeffler ("Programs are entitled to have a standard input that's open.") but more importantly because stdin is being used by the bash process itself, and closing it in the background causes a race condition.
In order to see what's going on, consider the following rather odd script, which might be called Duff's Bash Device, except that I'm not sure that even Duff would approve: (also, as presented, it's not that useful. But someone somewhere has used it in some hack. Or, if not, they will now that they see it.)
/bin/bash <<EOF
if (($1<8)); then head -n-$1 > /dev/null; fi
echo eight
echo seven
echo six
echo five
echo four
echo three
echo two
echo one
EOF
For this to work, bash and head both have to be prepared to share stdin, including sharing the file position. That means that bash needs to make sure that it flushes its read buffer (or not buffer), and head needs to make sure that it seeks back to the end of the part of the input which it uses.
(The hack only works because bash handles here-documents by copying them into a temporary file. If it used a pipe, it wouldn't be possible for head to seek backwards.)
Now, what would have happened if head had run in the background? The answer is, "just about anything is possible", because bash and head are racing to read from the same file descriptor. Running head in the background would be a really bad idea, even worse than the original hack which is at least predictable.
Now, let's go back to the actual program at hand, simplified to its essentials:
/bin/bash <<EOF
cmd <&- &
echo \$!
EOF
Line 2 of this program (cmd <&- &) forks off a separate process (to run in the background). In that process, it closes stdin and then invokes cmd.
Meanwhile, the foreground process continues reading commands from stdin (its stdin fd hasn't been closed, so that's fine), which causes it to execute the echo command.
Now here's the rub: bash knows that it needs to share stdin, so it can't just close stdin. It needs to make sure that stdin's file position is pointing to the right place, even though it may have actually read ahead a buffer's worth of input. So just before it closes stdin, it seeks backwards to the end of the current command line. [1]
If that seek happens before the foreground bash executes echo, then there is no problem. And if it happens after the foreground bash is done with the here-document, also no problem. But what if it happens while the echo is working? In that case, after the echo is done, bash will reread the echo command because stdin has been rewound, and the echo will be executed again.
And that's precisely what is happening in the OP. Sometimes, the background seek completes at just the wrong time, and causes echo \${pid} to be executed twice. In fact, it also causes echo \${pid} > $PID_FILE to execute twice, but that line is idempotent; had it been echo \${pid} >> $PID_FILE, the double execution would have been visible.
So the solution is simple: remove <&- from the server start-up line, and optionally replace it with </dev/null if you want to make sure the server can't read from stdin.
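Concretely, with the variable names from the original fragment, the server start-up line inside the here-document becomes:
$JAVA_HOME/bin/java $JAVA_OPTS -jar $JAR_FILE daemon $PWD/config/cl.yml </dev/null &
The rest of the function is unchanged; the pid captured with \$! is unaffected by the redirection.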
Notes:
Note 1: For those more familiar with bash's source code and its expected behaviour than I am, I believe that the seek and close take place at the end of case r_close_this: in function do_redirection_internal in redir.c, at approximately line 1093:
check_bash_input (redirector);
close_buffered_fd (redirector);
The first call does the lseek and the second one does the close. I saw the behaviour using strace -f and then searched the code for a plausible looking lseek, but I didn't go to the trouble of verifying in a debugger.

Use of tee command promptly even for one command

I am new to using the tee command.
I am trying to run one of my programs, which takes a long time to finish but prints out information as it progresses. I am using tee to save the output to a file as well as to see the output in the shell (bash).
But the problem is that tee doesn't forward the output to the shell until the end of my command.
Is there any way to do that ?
I am using Debian and bash.
This actually depends on the amount of output and the implementation of whatever command you are running. No program is obliged to print straight to stdout or stderr and flush it all the time. So even though most C runtime implementations flush after a certain amount of data has been written via one of the runtime routines, such as printf, this may not be true depending on the implementation.
If tee doesn't output it right away, it is likely only receiving the input at the very end of your command's run. It might be helpful to mention which exact command it is.
The problem you are experienced is most probably related to buffering.
You may have a look at stdbuf command, which does the following:
stdbuf - Run COMMAND, with modified buffering operations for its standard streams.
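For example, to force line-buffered stdout (a sketch assuming GNU coreutils' stdbuf and a hypothetical ./long_program):
stdbuf -oL ./long_program | tee output.log
Note that this only affects programs that use C stdio buffering; a program that buffers output internally has to be changed at the source.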
If you were to post your usage I could give a better answer, but as it is
(for i in `seq 10`; do echo $i; sleep 1s; done) | tee ./tmp
is proper usage of the tee command and seems to work. Replace the part before the pipe with your command and you should be good to go.
