Do 'cat foo.txt | my_cmd' and 'my_cmd < foo.txt' accomplish the same thing?

This question helped me understand the difference between redirection and piping, but the examples focus on redirecting STDOUT (echo foo > bar.txt) and piping STDIN (ls | grep foo).
It would seem to me that any command that could be written my_command < file.txt could also be written cat file.txt | my_command. In what situations is STDIN redirection necessary?
Apart from the fact that using cat spawns an extra process and is less efficient than redirecting STDIN, are there situations in which you have to use the STDIN redirection? Put another way, is there ever a reason to pipe the output of cat to another command?

What's the difference between my_command < file.txt and cat file.txt | my_command?
my_command < file.txt
The redirection symbol can also be written as 0<, as this redirects file descriptor 0 (stdin) to connect to file.txt instead of the current setting, which is probably the terminal. If my_command is a shell built-in then there are NO child processes created; otherwise there is one.
cat file.txt | my_command
This redirects file descriptor 1 (stdout) of the command on the left to the input stream of an anonymous pipe, and file descriptor 0 (stdin) of the command on the right to the output stream of the anonymous pipe.
We see at once that there is a child process, since cat is not a shell built-in. However, in bash, even if my_command is a shell builtin, it is still run in a child process. Therefore we have TWO child processes.
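A quick way to see this from bash itself, as a minimal sketch (BASHPID is bash-specific and, unlike $$, reports the PID of the subshell actually executing the command):
echo "parent shell:     $$"
{ echo "with redirection: $BASHPID"; } < /dev/null   # same PID: no child process
echo | { echo "in a pipeline:    $BASHPID"; }        # different PID: a subshell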
So the pipe, in theory, is less efficient. Whether that difference is significant depends on many factors, including the definition of "significant". One case where a pipe is preferable is as an alternative to this:
command1 > file.txt
command2 < file.txt
Here it is likely that
command1 | command2
is more efficient, remembering that, in practice, the two-command version will probably also need a third child process to run rm file.txt.
However, there are limitations to pipes. They are not seekable (random access, see man 2 lseek) and they cannot be memory mapped (see man 2 mmap). Some applications map files to virtual memory, but it would be unusual to do that to stdin or stdout. Memory mapping in particular is not possible on a pipe (whether anonymous or named) because a range of virtual addresses has to be reserved and for that a size is required.
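A rough illustration of the seekability difference, as a sketch (GNU coreutils assumed; big.log is a hypothetical large file): tail can lseek to near the end of a regular file, but on a pipe it has to read the entire stream first.
time tail -n 1 big.log          # fast: seeks close to the end of the file
time cat big.log | tail -n 1    # slower on large files: reads every byte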
Edit:
As mentioned by @JohnKugelman, a common error and the source of many SO questions is the interaction between child processes and redirection:
Take a file file.txt with 99 lines:
i=0
cat file.txt | while read
do
(( i = i+1 ))
done
echo "$i"
What gets displayed? The answer is 0. Why? Because the count i = i + 1 is done in a subshell which, in bash, is a child process and therefore does not change i in the parent (note: this does not apply to the Korn shell, ksh, where the last element of a pipeline runs in the current shell).
while read
do
(( i = i+1 ))
done < file.txt
echo "$i"
This displays the correct count because no child processes are involved.
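If the producer really is another command (cat merely stands in for one here), two bash-specific alternatives also avoid the subshell; both are sketches:
# Process substitution keeps the loop in the current shell:
i=0
while read
do
(( i = i+1 ))
done < <(cat file.txt)
echo "$i"    # prints 99

# bash 4.2+: run the last element of a pipeline in the current shell
# (requires job control to be off, as it is by default in scripts):
shopt -s lastpipe
i=0
cat file.txt | while read
do
(( i = i+1 ))
done
echo "$i"    # prints 99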

You can of course replace any use of input redirection with a pipe that reads from cat, but it is inefficient to do so, as you are spawning a new process to do something the shell can already do by itself. However, not every instance of cat ... | my_command can be replaced with my_command < ...: when cat is doing its intended job of concatenating two (or more) files, it is perfectly reasonable to pipe its output to another command.
cat file1.txt file2.txt | my_command

Related

Bash command "read" behaviour using redirection operator

If I execute the following command:
> read someVariable _ < <(echo "54 41")
and:
> echo $someVariable
The result is: 54.
What does < < (with spaces) do?
Why is _ giving the first word from the result in the "echo" command?
The commands above are just examples.
Process Substitution
As tldp.org explains,
Process substitution feeds the output of a process (or processes) into
the stdin of another process.
So in effect this is similar to piping the stdout of one command into another, e.g. echo foobar barfoo | wc. But notice: in the bash man page it is denoted as <(list). So basically you can redirect the output of multiple (!) commands.
Note: technically when you say < < you aren't referring to one thing, but to two: a redirection with a single < and process substitution of output from <(...).
Now what happens if we do just process substitution?
$ echo <(echo bar)
/dev/fd/63
As you can see, the shell creates a temporary file descriptor, /dev/fd/63, where the output goes. The < then redirects that file descriptor as input into a command.
So a very simple example would be to feed the output of two echo commands into wc via process substitution:
$ wc < <(echo bar;echo foo)
2 2 8
So here we make the shell create a file descriptor for all the output produced inside the parentheses and redirect that as input to wc. As expected, wc receives that stream from the two echo commands, which by themselves would output two lines, each having a word; appropriately we have 2 words, 2 lines, and 6 characters plus two newlines counted.
Side Note: Process substitution may be referred to as a bashism (a command or structure usable in advanced shells like bash, but not specified by POSIX), but it was implemented in ksh before bash existed (see the ksh man page). Shells like tcsh and mksh, however, do not have process substitution. So how could we redirect the output of multiple commands into another command without process substitution? Grouping plus piping!
$ (echo foo;echo bar) | wc
2 2 8
Effectively this is the same as the example above. However, it is different under the hood from process substitution, since the stdout of the whole subshell and the stdin of wc are linked with a pipe. On the other hand, process substitution makes a command read a temporary file descriptor.
So if we can do grouping with piping, why do we need process substitution? Because sometimes we cannot use piping. Consider the example below - comparing the outputs of two commands with diff (which needs two files; in this case we are giving it two file descriptors):
diff <(ls /bin) <(ls /usr/bin)
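A concrete variant of the same idea, as a sketch (listA.txt and listB.txt are hypothetical file names): each input needs preprocessing of its own, so a single pipe could never supply both.
diff <(sort listA.txt) <(sort listB.txt)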

How to redirect one of several inputs?

In Linux/Unix command line, when using a command with multiple inputs, how can I redirect one of them?
For example, say I'm using cat to concatenate multiple files, but I only want the last few lines of one file, so my inputs are testinput1, testinput2, and tail -n 4 testinput3.
How can I do this in one line without any temporary files?
I tried tail -n 4 testinput3 | cat testinput1 testinput2, but this seems to just take in input 1 and 2.
Sorry for the bad title, I wasn't sure how to phrase it exactly.
Rather than trying to pipe the output of tail to cat, you can use bash's process substitution, in which the substituted process runs with its input or output connected to a FIFO or to a file in /dev/fd. This allows you to treat the output of a process as if it were a file.
In the normal case you will generally redirect the output of the process substitution into a loop, e.g., while read -r line; do ...; done < <(process). However, in your case, cat takes file names as arguments rather than reading from stdin, so you omit the initial redirection, e.g.
cat file1 file2 <(tail -n4 file3)
So be familiar with both forms: < <(process) if you need to redirect a process's output to stdin, or simply <(process) if you need the output of the process to be treated as a file.
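As a combined sketch of the two forms, reusing the files from the answer above:
cat file1 file2 <(tail -n 4 file3)                                        # output treated as a file name
while read -r line; do printf '%s\n' "$line"; done < <(tail -n 4 file3)   # output fed to stdin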

Redirecting linux cout to a variable and the screen in a script

I am currently trying to make a script file that runs multiple other script files on a server. I would like to display the output of these scripts on the screen IN ADDITION to passing it into grep so I can do error testing. Currently I have written this:
status=$(SOMEPROCESS | grep -i "SOMEPROCESS started completed correctly")
I do further error handling below this using the variable status, so I would like to display SOMEPROCESS's output to the screen for error reference. This is a read only server and I can not save the output to a log file.
You need to use the tee command. It will be slightly fiddly, since tee outputs to a file handle; however, you could create a file descriptor using a pipe.
Or (simpler) for your use case.
Start the script without grep and pipe it through tee: SOMEPROCESS | tee /my/safely/generated/filename. Then, separately, use tail -f /my/safely/generated/filename | grep -i "my grep pattern".
You can use process substituion together with tee:
SOMEPROCESS | tee >(grep ...)
This will use an anonymous pipe and pass /dev/fd/... as file name to tee (or a named pipe on platforms that don't support /dev/fd/...).
Because SOMEPROCESS is likely to buffer its output when not talking to a terminal, you might see significant lag in screen output.
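A minimal sketch that keeps the original variable capture while still showing the output on the terminal (on systems that provide /dev/stderr): tee sends a copy to stderr, which is not captured by the command substitution.
status=$(SOMEPROCESS | tee /dev/stderr | grep -i "SOMEPROCESS started completed correctly")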
I'm not sure whether I understood your question exactly.
I think you want to get the output of SOMEPROCESS, test it, and print it out when there are errors. If so, the code below may help you:
s=$(SOMEPROCESS)
grep -q 'SOMEPROCESS started completed correctly' <<< "$s"
if [[ $? -ne 0 ]]; then
    # specified string not found in the output: SOMEPROCESS did not start correctly
    echo "$s"
fi
But this code stores all the output in memory; if the output is big enough, there is an OOM risk.

Bash while read loop extremely slow compared to cat, why?

A simple test script here:
while read LINE; do
LINECOUNT=$(($LINECOUNT+1))
if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done
When I do cat my450klinefile.txt | myscript the CPU locks up at 100% and it can process about 1000 lines a second. About 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.
Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But the speed of even this example is impossibly slow.
Every 1 GB of input lines I need to do a few more complex scripting actions (close and open some pipes that the data is being fed to).
The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:
while read size; do test "$size" -gt 0 || break; dd bs="$size" count=1 of=file$(( i++ )); done
in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
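You can watch the single-byte reads yourself, as a sketch (assuming strace is installed):
echo hello | strace -e trace=read bash -c 'read line' 2>&1 | grep 'read(0'
# expect lines like: read(0, "h", 1) = 1 - one read(2) call per byte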
It's because the bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools such as:
awk 'NR % 1000 == 0 { print NR }' inputFile
which matches your "print the count every 1000 lines" sample.
If you wanted to output, for each line, the length in characters followed by the line itself, and pipe it through another process, you could also do that:
awk '{print length($0)" "$0}' inputFile | someOtherProcess
Tools like awk, sed, grep, cut and the more powerful perl are far more suited to these tasks than an interpreted shell script.
The perl solution for counting the bytes of each line:
perl -p -e '
use Encode;
print length(Encode::encode_utf8($_))."\n";$_=""'
for example:
dd if=/dev/urandom bs=1M count=100 |
perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' |
tail
runs at about 7.7 MB/s for me.
To compare, the raw throughput without the script:
dd if=/dev/urandom bs=1M count=100 >/dev/null
runs at about 9.1 MB/s.
So the script is not that slow :)
Not really sure what your script is supposed to do. So this might not be an answer to your question but more of a generic tip.
Don't cat your file and pipe it to your script; instead, when reading from a file in a bash script, do it like this:
while read -r line
do
echo "$line"
done < file.txt

How to redirect output to a file and stdout

In bash, calling foo would display any output from that command on stdout.
Calling foo > output would redirect any output from that command to the file specified (in this case 'output').
Is there a way to redirect output to a file and have it display on stdout?
The command you want is named tee:
foo | tee output.file
For example, if you only care about stdout:
ls -a | tee output.file
If you want to include stderr, do:
program [arguments...] 2>&1 | tee outfile
2>&1 redirects channel 2 (stderr/standard error) into channel 1 (stdout/standard output), so that both are written as stdout; tee then directs that combined stream both to the screen and to the given output file.
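One related detail worth remembering, sketched below with placeholder names: redirections are processed left to right, so when writing to a file instead of a pipe, the position of 2>&1 matters.
program > out.log 2>&1    # both streams end up in out.log
program 2>&1 > out.log    # stderr still goes to the terminal, because fd 2 was
                          # duplicated from fd 1 before fd 1 was redirected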
Furthermore, if you want to append to the log file, use tee -a as:
program [arguments...] 2>&1 | tee -a outfile
$ program [arguments...] 2>&1 | tee outfile
2>&1 merges the stderr stream into the stdout stream.
tee outfile takes the stream it gets and writes it to the screen and to the file "outfile".
This is probably what most people are looking for. The likely situation is some program or script is working hard for a long time and producing a lot of output. The user wants to check it periodically for progress, but also wants the output written to a file.
The problem (especially when mixing stdout and stderr streams) is that there is reliance on the streams being flushed by the program. If, for example, all the writes to stdout are not flushed, but all the writes to stderr are flushed, then they'll end up out of chronological order in the output file and on the screen.
It's also bad if the program only outputs 1 or 2 lines every few minutes to report progress. In such a case, if the output was not flushed by the program, the user wouldn't even see any output on the screen for hours, because none of it would get pushed through the pipe for hours.
Update: The program unbuffer, part of the expect package, will solve the buffering problem. This will cause stdout and stderr to write to the screen and file immediately and keep them in sync when being combined and redirected to tee. E.g.:
$ unbuffer program [arguments...] 2>&1 | tee outfile
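If unbuffer is not available, GNU coreutils' stdbuf can often force line buffering instead; a sketch that only helps programs using stdio buffering:
$ stdbuf -oL -eL program [arguments...] 2>&1 | tee outfile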
Another way that works for me is:
<command> |& tee <outputFile>
as shown in the GNU bash manual.
Example:
ls |& tee files.txt
If ‘|&’ is used, command1’s standard error, in addition to its standard output, is connected to command2’s standard input through the pipe; it is shorthand for 2>&1 |. This implicit redirection of the standard error to the standard output is performed after any redirections specified by the command.
For more information, refer to the bash manual's section on redirections.
You can primarily use Zoredache's solution, but if you don't want to overwrite the output file, you should invoke tee with the -a option, as follows:
ls -lR / | tee -a output.file
Something to add ...
The package unbuffer has support issues with some packages under Fedora and Red Hat releases.
Setting aside those troubles, the following worked for me:
bash myscript.sh 2>&1 | tee output.log
Thank you ScDF & matthew, your inputs saved me a lot of time.
Using tail -f output should work.
In my case I had a Java process with output logs. The simplest solution to display the output logs and redirect them into a file (named logfile here) was:
my_java_process_run_script.sh |& tee logfile
The result was the Java process running with its output logs displayed on screen and written into the file named logfile.
You can do that for your entire script by using something like this at the beginning of your script:
#!/usr/bin/env bash
test x$1 = x$'\x00' && shift || { set -o pipefail ; ( exec 2>&1 ; $0 $'\x00' "$@" ) | tee mylogfile ; exit $? ; }
# do whatever you want
This redirects both stderr and stdout to the file called mylogfile while letting everything go to stdout at the same time.
It uses some simple tricks:
use exec without a command to set up redirections,
use tee to duplicates outputs,
restart the script with the wanted redirections,
use a special first parameter (a simple NUL character specified by the $'string' special bash notation) to indicate that the script has been restarted (no equivalent parameter should be used by your original script),
try to preserve the original exit status when restarting the script using the pipefail option.
Ugly but useful for me in certain situations.
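For comparison, a shorter way to log a whole script from inside the script, as a sketch using process substitution (note that tee may still be flushing when the script exits):
#!/usr/bin/env bash
exec > >(tee mylogfile) 2>&1
# everything below now goes to both the screen and mylogfile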
Bonus answer since this use-case brought me here:
In the case where you need to do this as some other user
echo "some output" | sudo -u some_user tee /some/path/some_file
Note that the echo will happen as you, and the file write will happen as "some_user". What will NOT work is running the echo as "some_user" and redirecting the output with >> "some_file", because the file redirection would happen as you.
Hint: tee also supports appending with the -a flag. If you need to replace a line in a file as another user, you could execute sed as the desired user.
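Sketching that sed idea (the path, pattern, and replacement are hypothetical):
sudo -u some_user sed -i 's/old text/new text/' /some/path/some_file   # the rewrite happens as some_user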
command |& tee filename    # creates the file "filename" with the command's output (stdout and stderr) as its content; if the file already exists, its previous content is overwritten, and the output is also printed on screen.
command | tee >> filename  # appends the output to the file, but does not print it on standard output (the screen), because tee's own stdout has been redirected; use tee -a filename to append and still see the output.
I want to print something using echo on screen and append that echoed data to a file:
echo "hi there, Have to print this on screen and append to a file"
tee is perfect for this, but something like the following will also do the job:
ls -lr / > output; cat output
(though unlike tee, nothing appears on screen until the ls has finished).
