inotify + stdout piping - output being lost in a pipe - linux

I have a one-liner generating events for inotify:
while true; do for i in $(seq 1 100); do touch /tmp/ino/foo$i; sleep 1s; done; rm /tmp/ino/foo*; done
I then set up a small bash pipeline to watch that folder, ignoring events about ISDIR (maybe I could do that with inotifywait, but that's not relevant):
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR
And that works fine, I see lines like /tmp/ino/ CLOSE_WRITE,CLOSE foo57.
But if I add an extra pipe at the end, I don't get any output. To keep it simple, let's exploit the fact that filtering through the same grep pattern twice is idempotent.
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR | grep -v ISDIR
This produces no output. I know my generator is still running, and a pipeless inotifywait -m -e close /tmp/ino in another terminal is still producing output.
After a bit of thinking, I assumed it was probably a buffering problem (issues like this often seem to be). I changed my pipeline to
inotifywait -m -e close /tmp/ino 2>/dev/null | grep -v ISDIR --line-buffered | grep -v ISDIR
And now I'm getting output again, so that fixes the problem.
However, I don't really understand why it failed to work without forcing line buffering. I've never experienced issues like this with grep, even with 'slow producing' outputs.
However, I have hit this before with other programs in a pipeline: it forced me to turn sed into sed -u, and to add an fflush() at the end of each awk block.
So, what's forcing the strange buffering here, and how can I fix it (without having to scrabble around in man pages looking for esoteric force-line-buffering options)?

inotifywait is probably buffering. I would have suggested using stdbuf:
stdbuf -oL inotifywait -m -e close /tmp/ino 2>/dev/null | grep ...
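Combining the two fixes, here is a sketch of the full pipeline with line buffering forced on every stage that writes into a pipe (this assumes GNU stdbuf and GNU grep; the final grep writes to the terminal, so it line-buffers on its own):
stdbuf -oL inotifywait -m -e close /tmp/ino 2>/dev/null | grep --line-buffered -v ISDIR | grep -v ISDIR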

Related

linux bash and grep bug?

Doing the following:
First console
touch /tmp/test
Second console
tail -f /tmp/test | grep propo | grep -v miles
Third console
echo propo >> /tmp/test
The second console should show "propo", but it shows nothing. If you instead run in the second console:
tail -f /tmp/test | grep propo
and then do echo propo >> /tmp/test, it will show propo, even though the grep -v filters miles, not propo.
Why?
Test it in your own environment if you want; it looks like it obviously should work, but it doesn't.
Most probably because the output of a command, when piped to another command, is fully buffered rather than line buffered. Here the buffering happens inside the first grep, whose stdout is a pipe rather than a terminal.
Use stdbuf -oL to force line buffering and grep --line-buffered for line buffered grep.
The problem is that grep does not use line buffering by default when its output is not a terminal, so its output sits in a block buffer. You could use grep --line-buffered:
tail -f /tmp/test | grep --line-buffered propo | grep -v miles
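If your grep lacks --line-buffered, forcing the first grep's stdio into line-buffered mode with GNU stdbuf should work just as well (untested sketch; tail -f itself already flushes each line as it arrives):
tail -f /tmp/test | stdbuf -oL grep propo | grep -v miles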

Allow sh to be run from anywhere

I have been monitoring the performance of my Linux server with ioping (had some performance degradation last year). For this purpose I created a simple script:
echo $(date) | tee -a ../sb-output.log | tee -a ../iotest.txt
./ioping -c 10 . 2>&1 | tee -a ../sb-output.log | grep "requests completed in\|ioping" | grep -v "ioping statistics" | sed "s/^/IOPing I\/O\: /" | tee -a ../iotest.txt
./ioping -RD . 2>&1 | tee -a ../sb-output.log | grep "requests completed in\|ioping" | grep -v "ioping statistics" | sed "s/^/IOPing seek rate\: /" | tee -a ../iotest.txt
etc
The script calls ioping in the folder /home/bench/ioping-0.6. Then it saves the output in readable form in /home/bench/iotest.txt. It also adds the date so I can compare points in time.
Unfortunately I am not an experienced programmer, and this version of the script only works if you first cd into the right directory (/home/bench/ioping-0.6).
I would like to call this script from anywhere. For example by calling
sh /home/bench/ioping.sh
Googling this and reading about path variables was a bit over my head. I kept ending up with different versions of
line 3: ./ioping: No such file or directory
Any thoughts on how to upgrade my scripts so that it works anywhere?
The trick is the shell's $0 variable. This is set to the path of the script.
#!/bin/sh
set -x
# Option 1: use dirname to strip the script name from $0.
cd "$(dirname "$0")"
pwd
# Option 2: plain parameter expansion, no external command needed.
cd "${0%/*}"
pwd
If dirname isn't available for some reason, like some limited busybox distributions, you can try using shell parameter expansion tricks like the second one in my example.
Isn't it obvious? ioping is not in the current directory, so you can't run it as ./ioping.
The easiest solution is to set PATH to include the directory where ioping is. Perhaps more robust: figure out the path from $0 and use that as the location for ioping (assuming your script sits next to ioping).
If ioping itself depends on being run in a certain directory, you might have to make your script cd to the ioping directory before running it.
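Putting those pieces together, a minimal sketch of how ioping.sh could start, assuming the script sits in /home/bench/ioping-0.6 right next to the ioping binary:
#!/bin/sh
# cd to the directory this script lives in, so that ./ioping (and the
# ../sb-output.log paths) resolve no matter where the script is called from.
cd "$(dirname "$0")" || exit 1
echo $(date) | tee -a ../sb-output.log | tee -a ../iotest.txt
./ioping -c 10 . 2>&1 | tee -a ../sb-output.log | grep "requests completed in\|ioping" | grep -v "ioping statistics" | sed "s/^/IOPing I\/O\: /" | tee -a ../iotest.txt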

linux strace: How to filter system calls that take more than a second

I'm using "strace -f -T -tt -o foo.txt -p 1234" to print the time spent in system calls. This makes the output file huge, is there a way to just print the system calls that took greater than 1second. I can grep it out from the file later, but is there a better way?
If we simply omit the -o foo.txt argument, the output goes to standard output. We can pipe it through grep and redirect to the file:
strace -f -T -tt -p 1234 | grep pattern > foo.txt
To watch the output at the same time:
strace -f -T -tt -p 1234 | grep pattern | tee foo.txt
If a command prints only to a file that is passed as an argument, and we want to filter/redirect its output, the first step is to check whether it implements the dash convention: can you specify standard input or output using - as a filename argument:
some_command - | our_pipe > file.txt
If not, then the recourse is to use Bash's process substitution syntax, >(output command) and <(input command):
some_command >(our_pipe > file.txt)
The process substitution syntax expands into a token that is suitable as a filename argument for a command or function. When the program opens that token, it gets a file descriptor to the command's input or output, depending on direction.
With process substitution, we can redirect the input or output of stubborn programs which work only with files passed by name as arguments, and which do not support any convention for requesting that standard input or output be used in place of a file.
The token used by process substitution is platform-dependent; we can see what it is using echo. For instance on GNU/Linux, Bash takes advantage of the /dev/fd operating system feature:
$ echo <(ls -l)
/dev/fd/63
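Applied to the strace question, that gives something like the following untested sketch: strace opens the /dev/fd path that >() expands to as its -o argument, and the filtered trace lands in foo.txt:
strace -f -T -tt -o >(grep pattern > foo.txt) -p 1234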
You can use the following command:
strace -T command 2>&1 >/dev/null | awk '{gsub(/[<>]/,"",$NF)} $NF+0 > 1.0'
Explanation:
strace -T appends the time spent in the syscall to the end of each line, enclosed in <...>.
2>&1 >/dev/null | awk pipes stderr to awk (strace writes its output to stderr!).
The awk command removes the <> from the last field $NF and prints lines where the time spent is higher than a second.
Probably you'll also want to pass the threshold as a variable to the awk command:
strace -T command 2>&1 >/dev/null \
| awk -v thres=0.001 '{gsub(/[<>]/,"",$NF)} $NF+0 > thres+0'
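Combined with the -p usage from the question, something along these lines should work (untested sketch; strace still writes the trace to stderr, hence the 2>&1 >/dev/null shuffle):
strace -f -T -tt -p 1234 2>&1 >/dev/null \
| awk -v thres=1.0 '{gsub(/[<>]/,"",$NF)} $NF+0 > thres+0'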

How does data get processed across pipes?

I used this command-line program that I found in another post on SO describing how to spider a website.
wget --spider --force-html -r -l2 http://example.com 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' > wget.out
When I crawl a large site, it takes a long time to finish. Meanwhile the wget.out file on disk shows zero size. So when does the piped data get processed and written to the file on disk? Is it after each stage in the pipe has run to completion? In that case, will wget.out only fill up after the entire crawl is over?
How do I make the program write intermittently to disk, so that, even if the crawling stage is interrupted, I have some output saved ?
There is buffering in each pipe, and maybe in the stdio layers of each program. Data will not make it to the disk until the final grep has processed enough lines to cause its buffers to fill to the point of being spilled to disk.
If you run your pipeline on the command line and then hit Ctrl-C, SIGINT is sent to every process, terminating each of them and losing any pending buffered output.
Either:
Ignore SIGINT in all processes but the first. Bash hackery follows:
$ wget --spider --force-html -r -l2 http://example.com 2>&1 |
{ trap '' int; grep '^--'; } |
{ trap '' int; awk '{ print $3 }'; } |
∶
Simply deliver the keyboard interrupt to the first process. Interactively you can discover the pid with jobs -l and then kill that. (Run the pipeline in the background.)
$ jobs -l
[1]+ 10864 Running wget
3364 Running | grep
13500 Running | awk
∶
$ kill -INT 10864
Play around with the disown bash builtin.
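Alternatively, you can attack the buffering itself, so that wget.out fills up while the crawl is still running. A sketch, assuming GNU stdbuf, GNU grep, and an awk that supports fflush(); each stage that writes into a pipe is forced to flush per line:
wget --spider --force-html -r -l2 http://example.com 2>&1 |
stdbuf -oL grep '^--' |
awk '{ print $3; fflush() }' |
grep --line-buffered -v '\.\(css\|js\|png\|gif\|jpg\)$' > wget.out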

How to tail -f the latest log file with a given pattern

I work with some log system which creates a log file every hour, like follows:
SoftwareLog.2010-08-01-08
SoftwareLog.2010-08-01-09
SoftwareLog.2010-08-01-10
I'm trying to tail (follow) the latest log file matching a given pattern (e.g. SoftwareLog*), and I realize there's:
tail -F (tail --follow=name --retry)
but that only follows one specific name, and these files get different names by date and hour. I tried something like:
tail --follow=name --retry SoftwareLog*(.om[1])
but the wildcard is resolved before it gets passed to tail, and it isn't re-evaluated each time tail retries.
Any suggestions?
I believe the simplest solution is as follows:
tail -f `ls -tr | tail -n 1`
Now, if your directory contains other log files like "SystemLog" and you only want the latest "SoftwareLog" file, then you would simply include a grep as follows:
tail -f `ls -tr | grep SoftwareLog | tail -n 1`
[Edit: after a quick googling for a tool]
You might want to try out multitail - http://www.vanheusden.com/multitail/
If you want to stick with Dennis Williamson's answer (and I've +1'ed him accordingly) here are the blanks filled in for you.
In your shell, run the following script (or its zsh equivalent; I whipped this up in bash before I saw the zsh tag):
#!/bin/bash

TARGET_DIR="some/logfiles/"
SYMLINK_FILE="SoftwareLog.latest"
SYMLINK_PATH="$TARGET_DIR/$SYMLINK_FILE"

function getLastModifiedFile {
    # Newest file in the target directory, ignoring the symlink itself.
    ls -t "$TARGET_DIR" | grep -v "$SYMLINK_FILE" | head -1
}

function getCurrentlySymlinkedFile {
    if [[ -h "$SYMLINK_PATH" ]]
    then
        # The last field of ls -l on a symlink is its target.
        ls -l "$SYMLINK_PATH" | awk '{print $NF}'
    else
        echo ""
    fi
}

symlinkedFile=$(getCurrentlySymlinkedFile)
while true
do
    sleep 10
    lastModified=$(getLastModifiedFile)
    if [[ "$symlinkedFile" != "$lastModified" ]]
    then
        ln -nsf "$lastModified" "$SYMLINK_PATH"
        symlinkedFile="$lastModified"
    fi
done
Background that process using the normal method (again, I don't know zsh, so it might be different):
./updateSymlink.sh > /dev/null 2>&1 &
Then tail -F $SYMLINK_PATH, so that tail handles the changing of the symbolic link or a rotation of the file.
This is slightly convoluted, but I don't know of another way to do it with tail. If anyone knows of a utility that handles this, please step forward, because I'd love to see it myself; applications like Jetty log this way by default, and I always end up running a symlinking script on a cron to compensate for it.
[Edit: Removed an erroneous 'j' from the end of one of the lines. There was also a bad variable name: "lastModifiedFile" didn't exist; the name actually set is "lastModified".]
I haven't tested this, but an approach that may work would be to run a background process that creates and updates a symlink to the latest log file and then you would tail -f (or tail -F) the symlink.
#!/bin/bash
PATTERN="$1"

# Try to make sure sub-shells exit when we do.
trap "kill -9 -- -$BASHPID" SIGINT SIGTERM EXIT

PID=0
OLD_FILES=""
while true; do
    # Expand the pattern; if nothing matches, it is left as-is.
    FILES="$(echo $PATTERN)"
    if test "$FILES" != "$OLD_FILES"; then
        # The set of matching files changed: restart tail.
        if test "$PID" != "0"; then
            kill $PID
            PID=0
        fi
        if test "$FILES" != "$PATTERN" || test -f "$PATTERN"; then
            # --pid=$$ makes tail exit when this script does.
            tail --pid=$$ -n 0 -F $PATTERN &
            PID=$!
        fi
    fi
    OLD_FILES="$FILES"
    sleep 1
done
Then run it as: tail.sh 'SoftwareLog*'
The script will lose some log lines if the logs are written to between checks. But at least it's a single script, with no symlinks required.
We have daily rotating log files such as /var/log/grails/customer-2020-01-03.log. To tail the latest one, the following command worked fine for me:
tail -f /var/log/grails/customer-`date +'%Y-%m-%d'`.log
(NOTE: no space after the + sign in the expression)
So, for you, the following should work (if you are in the same directory of the logs):
tail -f SoftwareLog.`date +'%Y-%m-%d-%H'`
I believe the easiest way is to use tail together with ls and head; try something like this:
tail -f `ls -t SoftwareLog* | head -1`
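Since the question is tagged zsh, it's worth noting that zsh glob qualifiers can pick the newest match directly, much as the question itself attempted: om orders matches by modification time (newest first) and [1] keeps only the first. The glob is still expanded just once, before tail starts, so this won't follow across rotations:
tail -f SoftwareLog*(.om[1])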
