Why am I getting "cat: write error: Broken pipe" rarely and not always - linux

I am running some scripts whose commands pipe cat into grep, like:
cat file.txt | grep "pattern"
Most of the time there is no problem, but sometimes I get the error below:
cat: write error: Broken pipe
So how do I find out which command is causing this problem, and why?

The reason is that grep closes the read end of the pipe (by exiting) while cat still has data left to write. cat then receives SIGPIPE and exits.
What usually happens in a pipeline is that the shell runs cat in one process and grep in another. The stdout of cat is connected to the write end of the pipe and the stdin of grep to the read end. Here grep exited before cat was done writing (for example because it was given an option such as -q or -m 1 that lets it stop early, or because it was killed), which closed the read end of the pipe while cat still had more data to write out. Since cat's next write goes to a pipe whose other end has been closed, the kernel sends cat a SIGPIPE and it exits immediately.
For such a trivial case you could drop the pipeline altogether and run it as grep "pattern" file.txt, so that grep reads the file directly instead of from a pipe on its stdin.
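You can reproduce the same effect deliberately with a consumer that is known to exit early; whether the producer dies silently or prints a write error like yours depends on how SIGPIPE is being handled in your environment:
# grep stops after the first match (-m 1) and closes its end of the pipe;
# the producer (seq here, cat in your case) then takes a SIGPIPE on its next write
seq 1000000 | grep -m 1 5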

You can simply use grep without the pipe, like this:
grep "pattern" file.txt
I think that is the better way to resolve this problem.

Related

"cat a | cat b" ignoring contents of a

The formal definition of a pipe states that the STDOUT of the left command will be immediately piped to the STDIN of the right command. I have two files, hello.txt and human.txt. cat hello.txt returns Hello and cat human.txt returns I am human. Now if I do cat hello.txt | cat human.txt, shouldn't that return Hello I am human? Instead I'm seeing command not found. I am new to shell scripting. Can someone explain?
Background: A pipe arranges for the output of the command on the left (that is, contents written to FD 1, stdout) to be delivered as input to the command on the right (on FD 0, stdin). It does this by connecting the processes with a "pipe", or FIFO, and executing them at the same time; attempts to read from the FIFO will wait until the other process has written something, and attempts to write to the FIFO will wait until the other process is ready to read.
cat hello.txt | cat human.txt
...feeds the content of hello.txt into the stdin of cat human.txt, but cat human.txt isn't reading from its stdin; instead, it's been directed by its command line arguments to read only from human.txt.
Thus, that content on the stdin of cat human.txt is ignored and never read, and cat hello.txt may receive a SIGPIPE if it is still writing when cat human.txt exits, in which case it too exits.
cat hello.txt | cat - human.txt
...by contrast will have the second cat read first from stdin (you could also use /dev/stdin in place of - on many operating systems, including Linux), then from a file.
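With the two files described in the question, the - form prints the piped content first and then the named file:
cat hello.txt | cat - human.txt
# Hello
# I am human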
You don't need to pipe them; rather, you can read from multiple files as below, which concatenates the contents of both files:
cat hello.txt human.txt
| is generally used when you want to feed the output of the first command into the second command of the pipe. In this case your second command reads from a file, so it doesn't need to be fed through a pipe. If you want to, you can do something like
echo "Hello" | cat - human.txt
First of all, the command will not give an error; it will print I am human, i.e. the contents of human.txt.
You are right about the definition of a pipe, but on the right side of the pipe there should be a command.
If that command reads its input and produces output from it, you will get the combined result; otherwise the command just carries out its own behaviour.
Here there is a command on the right side, cat human.txt, but it only prints its own file's contents and performs no operation on the received input.
Also, the command not found error appears when you write something like
cat hello.txt | human.txt
bash will give you this error:
human.txt: command not found

Are linux shell pipes pipelined?

Given a file input.txt, if I do something like
grep pattern1 input.txt | grep pattern2 | wc -l
is the output from the first command continuously passed (as soon as it is generated) as input to the second command?
Or does the pipe wait until the first command finishes to start running the second command?
Yes, they're pipelined -- each component's stdout is connected to the stdin of the next via a FIFO, and all components are started in parallel.
This is why
cat some-file | ...tools... >some-file
...typically results in a truncated file: because the pipeline is started all at once, the last piece (truncating some-file for writing) happens before cat has finished (or often, even started) reading the file.
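A quick, destructive way to see this on a throwaway file:
printf 'a\nb\nc\n' > some-file
cat some-file | grep . > some-file
cat some-file    # usually prints nothing: some-file was truncated before cat got to read it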
The general answer to your question is "yes".
HOWEVER, some programs, such as grep itself, buffer their results up to some arbitrary point. They may include options to disable this buffering, but you should not rely on them being available.
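GNU grep, for example, has a --line-buffered option. In your exact pipeline it makes no visible difference, because wc -l only prints a single number at the end anyway, but it matters when the downstream stage does something incremental with each line:
grep --line-buffered pattern1 input.txt | grep pattern2 | wc -l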
Piped commands run concurrently.
This is very commonly used to allow the second program to process data as it comes out from the first program, before the first program has completed its operation. For example
grep pattern huge-file | tr a-z A-Z
begins to display the matching lines in uppercase even before grep has finished traversing the large file.
Similarly
grep pattern huge-file | head -n 1
displays the first matching line, and may stop processing well before grep has finished reading its input file.
These two examples show that the commands in a pipeline run concurrently.
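A rough way to observe the overlap yourself is to timestamp each line as the right-hand side receives it; the one-second gaps show the consumer running while the producer is still working:
(echo a; sleep 1; echo b; sleep 1; echo c) | while read -r line; do echo "$(date +%T) $line"; done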

Redirecting linux cout to a variable and the screen in a script

I am currently trying to make a script file that runs multiple other script files on a server. I would like to display the output of these scripts on the screen IN ADDITION to passing it into grep so I can do error testing. Currently I have written this:
status=$(SOMEPROCESS | grep -i "SOMEPROCESS started completed correctly")
I do further error handling below this using the variable status, so I would like to display SOMEPROCESS's output on the screen for error reference. This is a read-only server and I cannot save the output to a log file.
You need to use the tee command. It will be slightly fiddly, since tee writes to a file; however, you could create a file descriptor for it using a pipe.
Or, simpler for your use case:
Start the script without grep and pipe it through tee: SOMEPROCESS | tee /my/safely/generated/filename. Then, separately, run tail -f /my/safely/generated/filename | grep -i "my grep pattern".
You can use process substitution together with tee:
SOMEPROCESS | tee >(grep ...)
This will use an anonymous pipe and pass /dev/fd/... as file name to tee (or a named pipe on platforms that don't support /dev/fd/...).
Because SOMEPROCESS is likely to buffer its output when not talking to a terminal, you might see significant lag in screen output.
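If you also want the grep result back in a variable, as in your original line, one possible sketch (assuming a Linux-style /dev/stderr) is to copy the output to the screen via stderr, since the command substitution only captures stdout:
# the tee copy goes to the screen on stderr; grep still sees everything on stdout
status=$(SOMEPROCESS | tee /dev/stderr | grep -i "SOMEPROCESS started completed correctly")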
I'm not sure whether I understood your question exactly.
I think you want to get the output of SOMEPROCESS, test it, and print it out when there are errors. If so, I think the code below may help you:
s=$(SOMEPROCESS)
grep -q 'SOMEPROCESS started completed correctly' <<< "$s"
if [[ $? -ne 0 ]]; then
    # specified string not found in the output, meaning SOMEPROCESS failed to start
    echo "$s"
fi
But this code stores all of the output in memory; if the output is big enough, there is an OOM risk.

How to get "instant" output of "tail -f" as input?

I want to monitor a log file; when a new log message matches my defined pattern (say, contains "error"), I want to send out an email to myself.
To do that, I wrote a python script monitor.py, the main part looks like:
import sys

for line in sys.stdin:
    if "error" in line:
        print line
It works well when I use tail my.log | python monitor.py, but when I switch to tail -f my.log | python monitor.py it doesn't work, at least not immediately.
I have done some tests: once the new content in the log accumulates to about 8 KB, my Python script gets the output from tail. So I strongly suspect this is controlled by the stdin/stdout buffer size. How can I get the output immediately?
One more question: when I use tail -f my.log or tail -f my.log | grep error, why does it show me the output immediately?
Most Linux programs will use line buffering if stdout is connected to a TTY and full buffering otherwise. You can use stdbuf to force line buffering.
stdbuf -oL tail -f my.log | python monitor.py
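If there are more stages between tail and your script, each of them may buffer in the same way; stdbuf (or, for grep, its own --line-buffered flag) can be applied per stage. The grep -v DEBUG here is just a hypothetical intermediate filter:
stdbuf -oL tail -f my.log | grep --line-buffered -v DEBUG | python monitor.py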
There's a patch to add unbuffered output to tail, dating from 2008, which appears to have been rejected; my own (BSD) manpage does not mention such an option. Perhaps you could download coreutils, apply the patch, and compile tail yourself; it may still work.

Special Piping/redirection in bash

(First of all, I've been looking for an hour, so I'm pretty sure this isn't a repeat.)
I need to write a script that executes one command, one time, and then does the following:
Saves both the stdout and stderr to a file (while maintaining their proper order)
Saves stderr only to a variable.
Elaboration on point 1: if I have a file like this
echo "one"
thisisanerrror
echo "two"
thisisanotherError
I should expect to see output, followed by error, followed by output, followed by more error (thus simply concatenating the two streams is not sufficient).
The closest I've come is the following, which seems to corrupt the log file:
errs=`((./someCommand.sh 2>&1 1>&3) | tee /dev/stderr ) 3>file.log 2>&3 `
This might be a starting point:
How do I write stderr to a file while using "tee" with a pipe?
Edit:
This seems to work:
((./foo.sh) 2> >(tee >(cat) >&2)) > foo.log
Split stderr with tee: write one copy to stdout (via cat) and the other to stderr. Afterwards you can grab all of stdout and write it to a file.
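A quick check, using the sample script from the question saved as foo.sh (a hypothetical name): if the construct works as described, foo.log should end up with both the echoed lines and the error messages, while the errors also appear on the terminal. Note that the exact interleaving of the two streams is not strictly guaranteed, since stderr takes the longer path through tee and cat:
((./foo.sh) 2> >(tee >(cat) >&2)) > foo.log
cat foo.log    # expect the "one"/"two" lines plus the two "command not found" errors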
Edit: to store the output in a variable
varx=`((./foo.sh) 2> >(tee >(cat) >&2))`
I also saw the command enclosed in additional double quotes, but I have no clue what that might be good for.

Resources