linux bash and grep bug?

Doing the following:
First console
touch /tmp/test
Second console
tail -f /tmp/test |grep propo |grep -v miles
Third console
echo propo >> /tmp/test
The second console should show "propo", but it doesn't show anything. If you run this in the second console instead:
tail -f /tmp/test |grep propo
and then do echo propo >> /tmp/test, it will show propo. But the grep -v is for miles, not for propo.
Why?
Test it in your own environment if you want; it seems obvious that it should work, but it doesn't.

Why?
Most probably because the output of a command, when piped to another command, is fully buffered rather than line buffered. Here it is most likely the first grep that is buffering its output, since it is writing to a pipe instead of a terminal.
Use stdbuf -oL to force line buffering, or grep --line-buffered for a line-buffered grep.
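For example, with stdbuf applied to the first grep of the original pipeline (the second grep writes to the terminal, which is already line buffered):
tail -f /tmp/test | stdbuf -oL grep propo | grep -v miles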

The problem is that grep does not use line buffering by default when writing to a pipe, so its output will be buffered. You could use grep --line-buffered:
tail -f /tmp/test | grep --line-buffered propo | grep -v miles

Related

Filter followed tail to file using grep and redirect

I want to get the output of
tail -f /var/log/apache2/error.log | grep "trace1"
into a file. But
tail -f /var/log/apache2/error.log | grep "trace1" > output.txt
does not work, while the first command gives an output in my terminal window as expected.
I guess it has to do with the follow-parameter, because if I omit the "-f", the output file is created.
But why is this so and how can I achieve my goal?
Regards,
Axel
Can you please try:
tail -f /var/log/apache2/error.log | grep "trace1" | tee -a output.txt
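If output.txt still only fills up in large chunks, the same buffering issue as in the thread above applies: grep buffers its output when writing to a pipe or file. Forcing line buffering should make matching lines appear as they arrive:
tail -f /var/log/apache2/error.log | grep --line-buffered "trace1" | tee -a output.txt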

Strange grep behaviour in scripts

One of my tools needs the PID of a specific process in the system. I try to do this with the following command:
parasit@host:~/# ps -ef | grep beam.smp |grep -v grep |awk '{ print $2 }' |head -n1
11982
Works fine, but when I try to use the same command in a script, in the vast majority of cases I get the PID of grep instead of the target process (beam.smp in this case), despite the `grep -v grep`.
parasit@host:~/# cat getPid.sh
#!/bin/bash
PROC=$1
#GET PID
CMD="ps -ef | grep $PROC |grep -v grep |awk '{ print \$2 }' |head -n1"
P=`eval $CMD`
parasit@host:~/# bash -x ./getPid.sh beam.smp
+ PROC=beam.smp
+ CMD='ps -ef |grep beam.smp |grep -v grep |awk '\''{ print $2 }'\'' |head -n1'
++ eval ps -ef '|grep' beam.smp '|grep' -v grep '|awk' ''\''{' print '$2' '}'\''' '|head' -n1
+++ head -n1
+++ awk '{ print $2 }'
+++ grep -v grep
+++ grep beam.smp
+++ ps -ef
+ P=2189
Interestingly, it is not deterministic. I know it sounds strange, but sometimes it works OK and sometimes it doesn't, and I have no idea what it depends on.
How is this possible? Is there a better method to get rid of "grep" from the results?
BR
Parasit
pidof -s is made for that (-s: single ID is returned):
pidof -s "beam.smp"
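In the original getPid.sh that would simply be (using the script's own $PROC variable):
P=$(pidof -s "$PROC")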
However, pidof also returns defunct (zombie, dead) processes. So here's a way to get PID of the first alive-and-running process of a specified command:
# function in bash
function _get_first_pid() {
    ps -o pid=,comm= -C "$1" | \
        sed -n '/'"$1"' *$/{s:^ *\([0-9]*\).*$:\1:;p;q}'
}
# example
_get_first_pid "beam.smp"
-o pid=,comm=: list only the PID and COMMAND columns, i.e. only what we need to check; listing everything would make the output harder to process later on
-C "$1": only list processes of the command specified, i.e. only find processes of that specific command, not everything
sed: print only the PID from the first line that does not have "defunct" or anything else after the base command name
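As a sketch, the original getPid.sh could then call this helper directly instead of building the ps | grep pipeline with eval:
#!/bin/bash
PROC=$1
# helper as defined above
function _get_first_pid() {
    ps -o pid=,comm= -C "$1" | \
        sed -n '/'"$1"' *$/{s:^ *\([0-9]*\).*$:\1:;p;q}'
}
# get PID of the first alive process matching $PROC
P=$(_get_first_pid "$PROC")
echo "$P"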

linux strace: How to filter system calls that take more than a second

I'm using "strace -f -T -tt -o foo.txt -p 1234" to print the time spent in system calls. This makes the output file huge, is there a way to just print the system calls that took greater than 1second. I can grep it out from the file later, but is there a better way?
If we simply omit the -o foo.txt argument, the output goes to standard error. We can redirect it into the pipe, run it through grep, and redirect the result to the file:
strace -f -T -tt -p 1234 2>&1 | grep pattern > foo.txt
To watch the output at the same time:
strace -f -T -tt -p 1234 2>&1 | grep pattern | tee foo.txt
If a command prints only to a file that is passed as an argument, and we want to filter/redirect its output, the first step is to check whether it implements the dash convention: can you specify standard input or output using - as a filename argument:
some_command - | our_pipe > file.txt
If not, then the recourse is to use Bash process substitution syntax: >(output command) and <(input command):
some_command >(our_pipe > file.txt)
The process substitution syntax expands into a token that is suitable as a filename argument for a command or function. When the program opens that token, it gets a file descriptor to the command's input or output, depending on direction.
With process substitution, we can redirect the input or output of stubborn programs which only work with files passed by name as arguments, and which do not support any convention for requesting that standard input or output be used in place of a file.
The token used by process substitution is platform-dependent; we can see what it is using echo. For instance on GNU/Linux, Bash takes advantage of the /dev/fd operating system feature:
$ echo <(ls -l)
/dev/fd/63
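Applied to the original strace command, that could look something like this (a sketch; it assumes the strace build is happy to write to the /dev/fd path handed to -o):
strace -f -T -tt -o >(grep pattern > foo.txt) -p 1234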
You can use the following command:
strace -T command 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)}$NF+.0>1.0'
Explanation:
strace -T adds the time spent in the syscall at the end of the line, enclosed in <...>
2>&1 >/dev/null | awk pipes stderr to awk and discards the traced command's stdout. (strace writes its output to stderr!)
The awk command removes the <> from the last field $NF and prints lines where the time spent is higher than a second.
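For reference, a line of strace -T output looks roughly like this, so $NF is the <...> field at the end:
read(3, "data"..., 4096) = 4096 <0.000042>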
Probably you'll also want to pass the threshold as a variable to the awk command:
strace -T command 2>&1 >/dev/null \
| awk -v thres=0.001 '{gsub(/[<>]/,"",$NF)}$NF+.0>thres+.0'
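For the original use case of attaching to a running process, the same filter can be fed from strace's stderr directly, for example with a 1-second threshold (a sketch):
strace -f -T -tt -p 1234 2>&1 | awk -v thres=1.0 '{gsub(/[<>]/,"",$NF)}$NF+.0>thres+.0' > foo.txt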

Redirecting tail output into a program

I want to send a program the most recent lines from a text file using tail as stdin.
First, I echo to the program some input that will be the same every time, then send in tail input from an inputfile which should first be processed through sed. The following is the command line that I expect to work. But when the program runs it only receives the echo input, not the tail input.
(echo "new" && tail -f ~/inputfile 2> /dev/null | sed -n -r 'some regex' && cat) | ./program
However, the following works exactly as expected, printing everything out to the terminal:
echo "new" && tail -f ~/inputfile 2> /dev/null | sed -n -r 'some regex' && cat
So I tried with another type of output, and again, while the echoed text shows up, the tail text does not appear anywhere:
(echo "new" && tail -f ~/inputfile 2> /dev/null | sed -n -r 'some regex') | tee out.txt
This made me think it is a problem with buffering, but I tried the unbuffer program and all other advice here (https://superuser.com/questions/59497/writing-tail-f-output-to-another-file) without results. Where is the tail output going and how can I get it to go into my program as expected?
The buffering problem was resolved when I prefixed the sed command with the following:
stdbuf -i0 -o0 -e0
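Applied to the original pipeline, that prefix goes in front of sed, something like:
(echo "new" && tail -f ~/inputfile 2> /dev/null | stdbuf -i0 -o0 -e0 sed -n -r 'some regex' && cat) | ./program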
Much more preferable to using unbuffer, which didn't even work for me. Dave M's suggestion of using sed's relatively new -u also seems to do the trick.
One thing you may be getting confused by -- | (pipeline) is higher precedence than && (consecutive execution). So when you say
(echo "new" && tail -f ~/inputfile 2> /dev/null | sed -n -r 'some regex' && cat) | ./program
that is equivalent to
(echo "new" && (tail -f ~/inputfile 2> /dev/null | sed -n -r 'some regex') && cat) | ./program
So the cat isn't really doing anything, and the sed output is probably buffered a bit. You can try using the -u option to sed to get it to use unbuffered output:
(echo "new" && (tail -f ~/inputfile 2> /dev/null | sed -n -u -r 'some regex')) | ./program
I believe some versions of sed default to -u when the output is a terminal and not when it is a pipe, so that may be the source of the difference you're seeing.
You can use the i command in sed (see the command list in the manpage for details) to do the inserting at the beginning:
tail -f inputfile | sed -e '1inew file' -e 's/this/that/' | ./program

listening on netcat works but not grep'able (or several other utilities)

I'm testing some netcat udp shell tools and was trying to take the output and send it through standard pipe stuff. In this example, I have a netcat client which sends 'foo', then 'bar', then 'foo', each on its own line, one for each attempt at reading from the listener:
[root@localhost ~ 05:40:20 ]# exec 5< <(nc -d -w 0 -6 -l -k -u 9801)
[root@localhost ~ 05:40:25 ]# awk '{print}' <&5
foo
bar
foo
^C
[root@localhost ~ 05:40:48 ]# awk '{print}' <&5 | grep foo
^C
[root@localhost ~ 05:41:12 ]# awk '{print}' <&5 | grep --line-buffered foo
^C
[root@localhost ~ 05:41:37 ]#
[root@localhost ~ 05:44:38 ]# grep foo <&5
foo
foo
^C
[root@localhost ~ 05:44:57 ]#
I've checked the --line-buffered... and I also get the same behavior from 'od -bc', i.e. nothing at all. grep works on the fd... but if I pipe that grep to anything (like od -bc or cut), I get the same (nothing). I tried prefixing the last grep with stdbuf -oL to no avail. I could redirect > to a file and then tail -f it, but I feel like I'm missing something.
Update:
Appears to be something descriptor/order/timing related. I created a file 'tailSource' and used this instead, which produced the same issue (no output) when I ran echo -e "foo\nfoo\nbar\nfoo" >> tailSource
[root@localhost shm 07:16:18 ]# exec 5< <(tail -n 0 -f tailSource)
[root@localhost shm 07:16:32 ]# awk '{print}' <&5 | grep foo
... and when I run without the '| grep foo', I get the output I'd expect.
(GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu))
awk is buffering when its output is not going to a terminal. If you have GNU awk, you can use its fflush() function to flush after every print:
gawk '{print; fflush()}' <&5 | grep foo
In this particular case, though, you don't need both awk and grep; either one will do:
awk /foo/ <&5
grep foo <&5
See BashFAQ 9 for more on buffering and how to work around it.
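The same applies if you want to chain further stages after the grep, as in the question: each stage that writes to a pipe needs its own line buffering, for example:
gawk '{print; fflush()}' <&5 | grep --line-buffered foo | tee out.txt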
For what it's worth
$ exec 5< <(echo -e "foo\nbar\nfoo")
$ awk '{print}' <&5 | grep foo
prints
foo
foo
whereas
$ exec 5< <(echo -e "foo\nbar\nfoo")
$ awk '{print}' <&5
foo
bar
foo
$ awk '{print}' <&5
$
the second call does not output anything.
I think this is because the file descriptor is at its end and you need to rewind it; see also this answer to this question.
In order to use the output more than once, you could either capture it in a temporary file or perform all the transformations in a single pipeline.
