Are Linux shell pipes pipelined?

Given a file input.txt if I do something like
grep pattern1 input.txt | grep pattern2 | wc -l
is the output from the first command continuously passed (as soon as it is generated) as input to the second command?
Or does the pipe wait until the first command finishes to start running the second command?

Yes, they're pipelined -- each component's stdout is connected to the stdin of the next via a pipe, and all components are started in parallel.
This is why
cat some-file | ...tools... >some-file
...typically results in a truncated file: because the whole pipeline is started at once, the last piece (opening some-file for writing, which truncates it) happens before cat has finished, or often even started, reading the file.
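A common workaround, assuming the moreutils package is available, is its sponge tool, which reads all of its input before it opens (and truncates) the output file, so the original contents are still intact while the pipeline reads them:
grep pattern some-file | sponge some-file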

The general answer to your question is "yes".
HOWEVER, some programs, such as grep itself, buffer their results up to some arbitrary point. They may include options to disable this buffering, but you should not rely on them being available.
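For instance, GNU grep has a --line-buffered option, and GNU coreutils ships stdbuf for adjusting the buffering of other commands; both are GNU extensions and may not exist on your system, so treat this as a sketch:
grep --line-buffered pattern1 input.txt | grep --line-buffered pattern2 | wc -l
Here the flags only reduce the latency between the stages; the final count from wc -l is the same either way.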

Piped commands run concurrently.
This is very commonly used to allow the second program to process data as it comes out from the first program, before the first program has completed its operation. For example
grep pattern huge-file | tr a-z A-Z
begins to display the matching lines in uppercase even before grep has finished traversing the large file.
Similarly
grep pattern huge-file | head -n 1
displays the first matching line, and may stop processing well before grep has finished reading its input file.
These two examples show that the commands in a pipeline run concurrently.
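One quick way to see the concurrency for yourself (just a throwaway sketch) is to pipe a deliberately slow producer into a consumer and watch the lines arrive one at a time rather than all at the end:
(for i in 1 2 3; do echo "line $i"; sleep 1; done) | cat
Each line shows up roughly a second apart; if the pipe waited for the left-hand side to finish, all three would appear together after three seconds.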

Related

Bash command "read" behaviour using redirection operator

If I execute the following command:
> read someVariable _ < <(echo "54 41")
and:
> echo $someVariable
The result is: 54.
What does < < (with spaces) do?
Why is _ giving the first word from the result in the "echo" command?
The commands above are just an example.
Thanks a lot
Process Substitution
As tldp.org explains,
Process substitution feeds the output of a process (or processes) into
the stdin of another process.
So in effect this is similar to piping the stdout of one command into another, e.g. echo foobar barfoo | wc. But notice: in the bash manpage you will see that it is denoted as <(list). So basically you can redirect the output of multiple (!) commands.
Note: technically, when you say < < you aren't referring to one thing but two: redirection with a single <, and process substitution of the output from <( . . . ).
Now what happens if we do just process substitution?
$ echo <(echo bar)
/dev/fd/63
As you can see, the shell creates a temporary file descriptor, /dev/fd/63, where the output goes. A < can then redirect that file descriptor as input into a command.
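You can confirm that this path behaves like an ordinary readable file by handing it to any command that expects a filename:
$ cat <(echo bar)
bar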
So a very simple example would be to feed the output of two echo commands into wc via process substitution:
$ wc < <(echo bar;echo foo)
2 2 8
So here we make the shell create a file descriptor for all the output produced inside the parentheses and redirect that as input to wc. As expected, wc receives the stream from the two echo commands, which by themselves would output two lines of one word each, so we get 2 lines, 2 words, and 8 bytes (six characters plus two newlines).
Side note: process substitution may be referred to as a bashism (a construct usable in advanced shells like bash but not specified by POSIX), but it was implemented in ksh before bash existed, as the ksh man page shows. Shells like tcsh and mksh, however, do not have process substitution. So how could we redirect the output of multiple commands into another command without process substitution? Grouping plus piping!
$ (echo foo;echo bar) | wc
2 2 8
Effectively this is the same as the example above. However, it is different under the hood from process substitution, since the stdout of the whole subshell and the stdin of wc are linked with a pipe, whereas process substitution makes the command read from a temporary file descriptor.
So if we can do grouping with piping, why do we need process substitution? Because sometimes we cannot use piping. Consider the example below - comparing outputs of two commands with diff (which needs two files, and in this case we are giving it two file descriptors)
diff <(ls /bin) <(ls /usr/bin)
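Another typical case (the file names here are just placeholders) is comparing two inputs that each need some preprocessing first, which a single pipe cannot express because diff wants two separate inputs:
diff <(sort list1.txt) <(sort list2.txt)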

How to redirect one of several inputs?

In Linux/Unix command line, when using a command with multiple inputs, how can I redirect one of them?
For example, say I'm using cat to concatenate multiple files, but I only want the last few lines of one file, so my inputs are testinput1, testinput2, and tail -n 4 testinput3.
How can I do this in one line without any temporary files?
I tried tail -n 4 testinput3 | cat testinput1 testinput2, but this seems to just take in input 1 and 2.
Sorry for the bad title, I wasn't sure how to phrase it exactly.
Rather than trying to pipe the output of tail to cat, you can use bash's process substitution, where a process is run with its input or output connected to a FIFO or to a file in /dev/fd (much like your terminal's tty). This allows you to treat the output of a process as if it were a file.
In the normal case you will generally redirect the output of the process substitution into a loop, e.g., while read -r line; do ## stuff; done < <(process). In your case, however, cat takes file names as arguments rather than reading from stdin, so you omit the initial redirection, e.g.
cat file1 file2 <(tail -n4 file3)
So be familiar with both forms, < <(process) if you need to redirect a process as input or simply <(process) if you need the result of process to be treated as a file.
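For completeness, here is a small sketch of the first form (file3 being a stand-in for your real file). Because the loop's stdin comes from the process substitution rather than a pipe, the loop body runs in the current shell, so variables it sets survive after the loop:
count=0
while read -r line; do
    count=$((count + 1))
    printf '%s\n' "$line"
done < <(tail -n 4 file3)
echo "read $count lines"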

Continuous grep, output at same spot on console

I use
tail -f file | grep pattern
all the time for continuous grep.
However, is there a way I can make grep output its pattern at the same spot, say at the top of the screen? so that the screen doesn't scroll all the time?
My case is something like this: tail -f log_file | grep Status -A 2 will show the current status and what changed it to that status. The problem is the screen scrolls and it becomes annoying. I'd rather have the output stuck on the first 3 lines in the screen.
Thank you!
You could use the watch command, which repeatedly executes the same command while keeping the output at the same position on the screen. The process might eat some more CPU or memory though:
watch "tail file | grep pattern"
By default watch executes that command every 2 seconds. You can adjust the interval down to 0.1 seconds using:
watch -n 0.1
NOTE
As noted by @etanReisner, this is not exactly the same as tail -f: tail -f updates immediately when something is added to your logfile, whereas watch only notices the change the next time it runs, i.e. every 2 (or 0.1) seconds.
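Applied to your example (the interval and line counts are just guesses you would tune), something like this would keep the latest status block pinned in place:
watch -n 1 'tail -n 200 log_file | grep -A 2 Status | tail -n 3'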
Assuming you are using a vt100 compatible emulator...
This command will tail a file, pipe it into grep, read it a line at a time and then display it in reverse video on the top line of the screen:
TOSL=$(tput sc;tput cup 0 0;tput rev;tput el)
FROMSL=$(tput sgr0; tput rc)
tail -f file | grep --line-buffered pattern | while read -r line
do
echo -n "$TOSL${line}$FROMSL"
done
It assumes your output appears a line at a time. If you want more than one line, you can read more than a line, but you have to decide how you want to buffer the output. You could also use the csr terminfo command to set up an entire separate scrolling region instead of just having one line.
Here is the scrolling region version with a ten line status area at the top:
TOSL=$(tput sc; tput csr 0 10; tput cup 10 0;tput rev;tput el)
FROMSL=$(tput sgr0; tput rc;tput csr 10 50;tput rc)
tail -f file | grep --line-buffered pattern | while read -r line
do
echo -n "$TOSL${line}
$FROMSL"
done
Note that your display may occasionally be corrupted, since output from your main shell and from the background pipeline can get mixed up.
Simply replace the newlines with carriage returns.
tail -f file | grep --line-buffered whatever | tr '\012' '\015'
The line buffering is to avoid jumpy output; see http://mywiki.wooledge.org/BashFAQ/009
This is quick and dirty. As noted in comments, this will leave the previous contents of the line underneath, so a shorter line will not completely overlay a longer line. You could add some control codes to address that, but then you might as well use Curses for the formatting too, like in rghome's answer.
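A slightly less dirty variant (still just a sketch) clears to the end of the line after printing, so leftovers from a longer previous line get wiped:
EL=$(tput el)
tail -f file | grep --line-buffered pattern | while IFS= read -r line; do
    printf '\r%s%s' "$line" "$EL"
done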

How to take advantage of filters

I've read here that
To make a pipe, put a vertical bar (|) on the command line between two commands.
then
When a program takes its input from another program, performs some operation on that input, and writes the result to the standard output, it is referred to as a filter.
So I've first tried the ls command whose output is:
Desktop HelloWord.java Templates glassfish-4.0
Documents Music Videos hs_err_pid26742.log
Downloads NetBeansProjects apache-tomcat-8.0.3 mozilla.pdf
HelloWord Pictures examples.desktop netbeans-8.0
Then I tried ls | echo, which outputs absolutely nothing.
I'm looking for a way to take advantage of pipelines and filters in my bash script. Please help.
echo doesn't read from standard input; it only writes its command-line arguments to standard output. The cat command is what you want: it copies what it reads from standard input to standard output.
ls | cat
(Note that the pipeline above is a little pointless, but does demonstrate the idea of a pipe. The command on the right-hand side must read from standard input.)
Don't confuse command-line arguments with standard input.
echo doesn't read standard input. To try something more useful, try
ls | sort -r
to get the output sorted in reverse,
or
ls | grep '[0-9]'
to only keep the lines containing digits.
In addition to what others have said: if your command (echo in this example) does not read from standard input, you can use xargs to "feed" it arguments taken from standard input, so
ls | echo
doesn't work, but
ls | xargs echo
works fine.

How do I use the filenames output by "grep" as argument to another program

I have this grep command which outputs the names of files (which contain matches to some pattern), and I want to parse those files with some file-parsing program. The pipeline looks like this:
grep -rl "{some-pattern}" . | {some-file-parsing-program} > a.out
How do I get those file names as command line arguments to the file-parsing program?
For example, let's say grep returns the filenames a, b, c. How do I pass the filenames so that it's as if I'm executing
{some-file-parsing-program} a b c > a.out
?
It looks to me as though you're wanting xargs:
grep -rl "{some_pattern" . | xargs your-command > a.out
I'm not convinced a.out is a good output file name, but we can let that slide. The xargs command reads white-space separated file names from standard input and then invokes your-command with those names as arguments. It may need to invoke your-command several times; unless you're using GNU xargs and you specify -r, your-command will be invoked at least once, even if there are no matching file names.
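If the file names might contain spaces or newlines, and you have GNU grep and GNU xargs, you can ask for NUL-separated names instead (both options are GNU extensions):
grep -rlZ "{some-pattern}" . | xargs -0 your-command > a.out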
Without xargs, you could not use sed for this job, and awk would be clumsy. Perl (and Python) could manage it 'trivially'; it would be easy to write code that reads file names from standard input and then processes each file in turn.
I don't know of any Linux programs that cannot read from stdin. Depending on the program, the default input may be stdin, or you may need to specify stdin explicitly with a command-line option (often - by itself). Do you have anything in particular in mind?
