Bash commands piped to awk are sometimes buffered - linux

System: Linux 4.13.0-43-generic #48~16.04.1-Ubuntu
BASH_VERSION='4.3.48(1)-release'
The command:
while sleep 5
do
date +%T
done | awk -F: '{print $3}'
should print the 3rd field (seconds) of the date output, one line every 5 seconds. Problem: awk reads from the pipe and processes its input only when the pipe's buffer is full, i.e. when more than 4K of input has been generated.
When awk is replaced by cat, a line is printed every 5 seconds as expected.
This code snippet is simplified from a shell script which had worked ok on other systems, so there must be something about bash, awk and their configuration in this system.
In short, is there a way to convince awk to behave like cat when reading from a pipe?
@Ed Morton: I did try to add fflush() after each print, but it does not work -- that's what showed that the problem is with awk's input, not its output.
I also tried to add calls to system("date"), which showed that indeed awk gets all input lines at once, not immediately when they are produced.
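For reference, the debugging attempts looked roughly like this (a reconstruction, not the exact commands used):
while sleep 5; do date +%T; done | awk -F: '{print $3; fflush()}'
# still stalls: fflush() flushes awk's output, but the input side is what is buffered
while sleep 5; do date +%T; done | awk -F: '{system("date"); print $3}'
# the timestamps printed by system("date") show all input lines arriving in one burst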
For those who asked:
$ awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
compiled limits:
max NF 32767
sprintf buffer 2040

While trying to find out how to make awk print its version, I discovered that it is really mawk, and that it has the following flag:
-W interactive -- sets unbuffered writes to stdout and line buffered reads from stdin.
Records from stdin are lines regardless of the value of RS.
This seems to solve the problem!
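Presumably the fixed pipeline then looks like this (same loop as above, with mawk's flag added):
while sleep 5
do
date +%T
done | awk -W interactive -F: '{print $3}'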
Thanks to all repliers.

Related

Trying to add a user inputted variable to end of file

Using Ubuntu.
Currently I'm trying to add a user inputted variable to the end of a file.
In short, this allows me to use a BASH script to automate adding VSFTPD users.
Currently I have used awk & sed.
I don't have the sed part yet, but here is the awk that I have put together so far.
awk '{$centre_name}' /etc/vsftpd-users
GNU AWK solution
You might use -v to pass a variable into GNU AWK. I would do it the following way: let file.txt content be
1
2
3
then
var1="four"
awk -v var="${var1}" '{print}END{print var}' file.txt
gives output
1
2
3
four
Explanation: I use -v to set awk's variable var to the value of the shell variable var1. Each line of the file is printed as-is, and after processing is done the value of var is printed.
(tested in gawk 4.2.1)
GNU sed solution
You might use $ to target the last line and the command a to append a line of text, as follows, for the file.txt shown earlier:
var1="four"
sed "$ a ${var1}" file.txt
gives same output as above
(tested in GNU sed 4.5)
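As a rough sketch of how either answer could plug into the VSFTPD automation from the question (the prompt text and the in-place sed -i are assumptions, not part of the answers above):
#!/bin/bash
# ask for the value to append (variable name taken from the question's attempt)
read -r -p "Centre name: " centre_name
# GNU sed: append it as a new last line of the users file, editing in place
sed -i "$ a ${centre_name}" /etc/vsftpd-users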

How can I cat some continuous logs and grep a word in real time?

In Linux, I want to monitor the output of some tool, e.g. dbus-monitor's output. I want to pick out a particular keyword from its output and then use that keyword as an input argument to another program. Something like the below, but it is not good.
dbus-monitor --system > d.log &
var=`cat d.log | grep some-key-word`
my_script.sh $var
I want to monitor the output flow in real time, not cat the whole log from the beginning -- just its latest changes. E.g. dmesg provides an option, dmesg -w, which does what I want.
-w, --follow wait for new messages
So how to make such script? To cat the latest new output and use it continuously.
Instead of cat, use tail -F <file> | grep <something>. The -F option makes tail wait for and output all incoming data. Most likely, you will also need to modify the buffering mode of the standard streams with stdbuf -oL (by default, stdout is fully buffered when writing to a pipe or file, meaning data is written every couple of kilobytes rather than after each line).
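A minimal sketch of the whole flow under those suggestions (the keyword, d.log, and my_script.sh are placeholders from the question):
stdbuf -oL dbus-monitor --system > d.log &      # line-buffer the writer so lines reach the file promptly
tail -F d.log | stdbuf -oL grep 'some-key-word' | while read -r line
do
    my_script.sh "$line"                        # handle each matching line as soon as it arrives
done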

Grep versus Awk: How do the search mechanisms differ

I am writing a script that must loop; in each loop, different scripts pull variables from external files and the last step compiles them. I am trying to maximize the speed at which this loop can run, and thus trying to find the best programs for the job.
The rate-limiting step right now is searching through a file which has 2 columns and 4.5 million lines. Column one is a key and column two is the value I am extracting.
The two programs I am evaluating are awk and grep. I have put the two scripts and their run times to find the last value below.
time awk -v a=15 'BEGIN{B=10000000}$1==a{print $2;B=NR}NR>B{exit}' infile
T
real 0m2.255s
user 0m2.237s
sys 0m0.018s
time grep "^15 " infile |cut -d " " -f 2
T
real 0m0.164s
user 0m0.127s
sys 0m0.037s
This brings me to my question... how does grep search? I understand awk runs line by line and field by field, which is why it takes longer as the file gets longer and I have to search further into it.
How does grep search? Clearly not line by line, or if it is, it's in a much different manner than awk, considering the almost 20x time difference.
(I have noticed awk runs faster than grep for short files, and I've yet to find where they diverge, but at those sizes it really doesn't matter nearly as much!)
I'd like to understand this so I can make good decisions for future program usage.
The awk command you posted does far more than the grep+cut:
awk -v a=15 'BEGIN{B=10000000}$1==a{print $2;B=NR}NR>B{exit}' infile
grep "^15 " infile |cut -d " " -f 2
so a time difference is very understandable. Try this awk command, which IS equivalent to the grep+cut, and see what results you get so we can compare apples to apples:
awk '/^15 /{print $2}' infile
or even:
awk '$1==15{print $2}' infile
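To compare like with like, one could then time the equivalent commands against the same infile, along these lines:
time awk '$1==15{print $2}' infile
time grep '^15 ' infile | cut -d ' ' -f 2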

Why is `read -t` not timing out in bash on RHEL?

Why doesn't read -t time out when reading from a pipe on RHEL5 or RHEL6?
Here is my example, which doesn't time out on my RHEL boxes while reading from the pipe:
tail -f logfile.log | grep 'something' | read -t 3 variable
If I'm correct, read -t 3 should time out after 3 seconds?
Many thanks in advance.
Chris
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The solution given by chepner should work.
An explanation of why your version doesn't is simple: when you construct a pipeline like yours, the data flows through the pipe from left to right. When your read times out, however, the programs on the left side will keep running until they notice that the pipe is broken, and that happens only when they try to write to the pipe.
A simple example is this:
cat | sleep 5
After five seconds the pipe will be broken because sleep will exit, but cat will nevertheless keep running until you press return.
In your case that means, until grep produces a result, your command will keep running despite the timeout.
While not a direct answer to your specific question, you will need to run something like
read -t 3 variable < <( tail -f logfile.log | grep "something" )
in order for the newly set value of variable to be visible after the pipeline completes. See if this times out as expected.
Since you are simply using read as a way of exiting the pipeline after a fixed amount of time, you don't have to worry about the scope of variable. However, grep may find a match without printing it within your timeout due to its own internal buffering. You can disable that (with GNU grep, at least), using the --line-buffered option:
tail -f logfile.log | grep --line-buffered "something" | read -t 3
Another option, if available, is the timeout command as a replacement for the read:
timeout 3 tail -f logfile.log | grep -q --line-buffered "something"
Here, we kill tail after 3 seconds, and use the exit status of grep in the usual way.
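For instance, a sketch of how that exit status might then be used (the surrounding if is an illustration, not from the question):
if timeout 3 tail -f logfile.log | grep -q --line-buffered "something"
then
    echo "match appeared within 3 seconds"
else
    echo "no match before the timeout"
fi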
I don't have a RHEL server to test your script right now, but I would bet that read is exiting on timeout and working as it should. Try running:
grep 'something' | strace bash -c "read -t 3 variable"
and you can confirm that.

Line Buffered Cat

Is there a way to do line-buffered cat? For example, I want to watch a UART device, and I only want to see its messages when there is a whole line. Can I do something like:
cat --line-buffered /dev/crbif0rb0c0ttyS0
Thanks.
You can also use bash to your advantage here:
cat /dev/crbif0rb0c0ttyS0 | while IFS= read -r line; do echo "$line"; done
Since the read command reads a line at a time, it will perform the line buffering that cat does not.
No, but GNU grep with --line-buffered can do this. Just search for something every line has, such as '^'.
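That is, something along the lines of:
grep --line-buffered '^' /dev/crbif0rb0c0ttyS0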
Pipe it through perl in a no-op line-buffered mode:
perl -pe 1 /dev/whatever
