Pipe continuous stream to multiple files - linux

I have a program which will be running continuously writing output to stdout, and I would like to continuously take the last set of output from it and write this to a file. Each time a new set of output comes along, I'd like to overwrite the same file.
An example would be the following command:
iostat -xkd 5
Every 5 seconds this prints some output, with a blank line after each one. I'd like the last "set" to be written to a file, and I think this should be achievable with something simple.
So far I've tried using xargs, which I could only get to do this with a single line, rather than a group of lines delimited by something.
I think it might be possible with awk, but I can't figure out how to get it to either buffer the data as it goes along so it can write it out using system, or get it to close and reopen the same output file.
EDIT: To clarify, this is a single command which will run continuously. I do NOT want to start a new command and read the output of that.
SOLUTION
I adapted the answer from @Bittrance to get the following:
iostat -xkd 5 | (while IFS= read -r r; do
    if [ -z "$r" ]; then
        # blank line: the section is complete, publish it
        mv -f /tmp/foo.out.tmp /tmp/foo.out
    else
        printf '%s\n' "$r" >> /tmp/foo.out.tmp
    fi
done)
This was basically the same, apart from detecting the end of a "section" and writing to a temp file so that whenever an external process tried to read the output file it would be complete.
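The same idea can be packaged as a reusable function. This is only a minimal sketch: the name keep_last_block and the canned demo input are hypothetical, and it assumes sections are separated by blank lines as in the iostat output. Unlike the loop above, it skips empty sections, so two consecutive blank lines do not trigger an mv on a missing temp file.

```shell
#!/bin/sh
# keep_last_block OUT (hypothetical helper): read blank-line-delimited blocks
# from stdin and leave only the most recent complete block in OUT.
keep_last_block() {
    out=$1
    tmp=$out.tmp
    : > "$tmp"
    while IFS= read -r line; do
        if [ -z "$line" ]; then
            # Blank line: the block is complete. Publish it with a rename so
            # a reader never sees a half-written file.
            if [ -s "$tmp" ]; then
                mv -f "$tmp" "$out"
                : > "$tmp"
            fi
        else
            printf '%s\n' "$line" >> "$tmp"
        fi
    done
    rm -f "$tmp"
}

# Demo with canned input standing in for:
#   iostat -xkd 5 | keep_last_block /tmp/foo.out
demo_out=$(mktemp)
printf 'one\ntwo\n\nthree\nfour\n\n' | keep_last_block "$demo_out"
last_block=$(cat "$demo_out")
rm -f "$demo_out"
printf '%s\n' "$last_block"
```

The rename only helps readers if the temp file and the output file live on the same filesystem, which is the case here since the temp name is derived from the output name.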

Partial answer:
./your_command | (while IFS= read -r r; do
    if ... ; then
        rm -f foo.out
    fi
    printf '%s\n' "$r" >> foo.out
done)
If there is a condition (the ...) such that you know that you are receiving the first line of the "set" this would work.

Why not:
while :; do
iostat -xkd > FILE
sleep 5
done
If you're set on using awk the following writes each output from iostat to a numbered file:
iostat -xkd 5 | awk -v RS=$'\n'$'\n' '{ print >NR; close(NR) }'
Note: the first record is the header that iostat outputs.
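If the goal is to keep overwriting one file rather than writing numbered files, awk can close and reopen the same output file for each record. A hedged sketch: it sets RS to the empty string (awk's paragraph mode, where blank lines separate records) instead of the two-newline RS above, and the snapshot path and canned input are placeholders. The file is truncated and rewritten in place, so a reader could briefly see a partial section.

```shell
#!/bin/sh
snapshot=$(mktemp)   # placeholder for the real output file

# Canned input stands in for: iostat -xkd 5
printf 'a 1\nb 2\n\nc 3\nd 4\n\n' |
    awk -v RS= -v out="$snapshot" '{ print > out; close(out) }'

latest=$(cat "$snapshot")
rm -f "$snapshot"
printf '%s\n' "$latest"
```

Because the file is closed after every record, the next print reopens it with truncation, which is what leaves only the latest record behind.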

Edit: In reply to the comment, I tested this Perl solution and it works:
#!/usr/bin/perl
use strict;
use warnings;
$|=1;
open(STDOUT, '>', '/tmp/iostat.running') or die $!;
while(<>)
{
seek (STDOUT, 0, 0) if (m/^$/gio);
print
}
So now you can
iostat -xkd 5 | myscript.pl&
And watch the snapshots:
watch -n10 cat /tmp/iostat.running
If you wanted to make sure that enough space is overwritten at the end (since the output length might vary slightly each time), you could print some padding at the end, e.g.:
print "***"x40 and seek (STDOUT, 0, 0) if (m/^$/gio);
(snipped old answer text since apparently it was confusing people)

Related

Use I/O redirection between two scripts without waiting for the first to finish

I have two scripts, let's say long.sh and simple.sh: one is very time consuming, the other is very simple. The output of the first script should be used as input of the second one.
As an example, the "long.sh" could be like this:
#!/bin/sh
for line in `cat LONGIFLE.dat` do;
# read line;
# do some complicated processing (time consuming);
echo $line
done;
And the simple one is:
#!/bin/sh
while read a; do
# simple processing;
echo $a + "other stuff"
done;
I want to pipeline the two scripts like this:
sh long.sh | sh simple.sh
Using pipelines, simple.sh has to wait for the end of the long script before it can start.
I would like to know if in the bash shell it is possible to see the output of simple.sh per current line, so that I can see at runtime what line is being processed at this moment.
I would prefer not to merge the two scripts together, nor to call the simple.sh inside long.sh.
Thank you very much.
stdout is normally block-buffered when it is not connected to a terminal. You want it line-buffered. Try
stdbuf -oL sh long.sh | sh simple.sh
Note that this loop
for line in `cat LONGIFLE.dat`; do # see where I put the semi-colon?
reads words from the file. If you only have one word per line, you're OK. Otherwise, to read by lines, use while IFS= read -r line; do ...; done < LONGFILE.dat
Always quote your variables (echo "$line") unless you know specifically when not to.
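To make the words-versus-lines difference concrete, here is a small sketch using a throwaway sample file (the file contents and variable names are made up for the demo). The for loop sees four items because the command substitution is split on whitespace, while the while loop sees two lines.

```shell
#!/bin/sh
sample=$(mktemp)
printf 'first line\nsecond line\n' > "$sample"

# Unquoted command substitution is split on whitespace: one item per word.
words=0
for item in $(cat "$sample"); do
    words=$((words + 1))
done

# read consumes one whole line per iteration.
lines=0
while IFS= read -r line; do
    lines=$((lines + 1))
done < "$sample"

rm -f "$sample"
echo "for-in-cat items: $words"
echo "while-read lines: $lines"
```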

Looping through lines in a file in bash, without using stdin

I am foxed by the following situation.
I have a file list.txt that I want to run through line by line, in a loop, in bash. A typical line in list.txt has spaces in. The problem is that the loop contains a "read" command. I want to write this loop in bash rather than something like perl. I can't do it :-(
Here's how I would usually write a loop to read from a file line by line:
while read p; do
echo $p
echo "Hit enter for the next one."
read x
done < list.txt
This doesn't work though, because of course "read x" will be reading from list.txt rather than the keyboard.
And this doesn't work either:
for i in `cat list.txt`; do
echo $i
echo "Hit enter for the next one."
read x
done
because the lines in list.txt have spaces in.
I have two proposed solutions, both of which stink:
1) I could edit list.txt, and globally replace all spaces with "THERE_SHOULD_BE_A_SPACE_HERE" . I could then use something like sed, within my loop, to replace THERE_SHOULD_BE_A_SPACE_HERE with a space and I'd be all set. I don't like this for the stupid reason that it will fail if any of the lines in list.txt contain the phrase THERE_SHOULD_BE_A_SPACE_HERE (so malicious users can mess me up).
2) I could use the while loop with stdin and then in each loop I could actually launch e.g. a new terminal, which would be unaffected by the goings-on involving stdin in the original shell. I tried this and I did get it to work, but it was ugly: I want to wrap all this up in a shell script and I don't want that shell script to be randomly opening new windows. What would be nice, and what might somehow be the answer to this question, would be if I could figure out how to somehow invoke a new shell in the command and feed commands to it without feeding stdin to it, but I can't get it to work. For example this doesn't work and I don't really know why:
while read p; do
bash -c "echo $p; echo ""Press enter for the next one.""; read x;";
done < list.txt
This attempt seems to fail because "read x", despite being in a different shell somehow, is still seemingly reading from list.txt. But I feel like I might be close with this one -- who knows.
Help!
You must open as a different file descriptor
while read p <&3; do
echo "$p"
echo 'Hit enter for the next one'
read x
done 3< list.txt
Update: Just ignore the lengthy discussion in the comments below. It has nothing to do with the question or this answer.
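A self-contained way to check that the two reads really use different streams is sketched below. The function name page_through and the sample data are hypothetical. In the demo, stdin is a pipe carrying two newlines instead of a keyboard, which shows that the inner read consumes stdin while the file arrives on descriptor 3.

```shell
#!/bin/sh
# page_through FILE (hypothetical helper): print each line of FILE, pausing
# for a line of input on stdin between lines.
page_through() {
    while IFS= read -r p <&3; do
        printf '%s\n' "$p"
        IFS= read -r reply || break   # reads stdin, not FILE
    done 3< "$1"
}

list=$(mktemp)
printf 'alpha beta\ngamma delta\n' > "$list"
# Two newlines on stdin play the role of pressing Enter twice.
shown=$(printf '\n\n' | page_through "$list")
rm -f "$list"
printf '%s\n' "$shown"
```

Run interactively as page_through list.txt, the inner read waits on the terminal as the question wanted.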
I would probably count the lines in the file and print each one in turn using e.g. sed. It is also possible to read indefinitely from stdin by changing the while condition to while true; and stopping with Ctrl+C.
line=0 lines=$(sed -n '$=' in.file)
while [ $line -lt $lines ]
do
let line++
sed -n "${line}p" in.file
echo "Hit enter for the next ${line} of ${lines}."
read -s x
done
AWK is also a great tool for this. A simple way to iterate through the input would be:
awk '{ print $0; printf "%s", "Hit enter for the next"; getline < "-" }' file
As an alternative, you can read from stderr, which by default is connected to the tty as well. The following then also includes a test for that assumption:
(
tty -s <&2 || exit 1
while read -r line; do
echo "$line"
echo 'Hit enter'
read x <& 2
done < file
)

Bash while read loop extremely slow compared to cat, why?

A simple test script here:
while read LINE; do
LINECOUNT=$(($LINECOUNT+1))
if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done
When I do cat my450klinefile.txt | myscript the CPU locks up at 100% and it can process about 1000 lines a second. About 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.
Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But the speed of even this example is impossibly slow.
Every 1Gb of input lines I need to do a few more complex scripting actions (close and open some pipes that the data is being fed to).
The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:
while read size; do test "$size" -gt 0 || break; dd bs="$size" count=1 of=file$(( i++ )); done
in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
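The sharing of one stream between read and a later command can be seen in a tiny sketch (the input text is made up). read takes exactly the first line from the pipe, and cat then receives everything after it, which only works because read did not buffer ahead.

```shell
#!/bin/sh
# read takes exactly one line from the pipe; cat then sees the rest.
result=$(printf 'header\nbody 1\nbody 2\n' | {
    IFS= read -r first
    printf 'first=%s\n' "$first"
    cat
})
printf '%s\n' "$result"
```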
It's because the bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools such as:
awk 'NR%1000==0{print}' inputFile
which matches your "print every 1000 lines" sample.
If you wanted to (for each line) output the line count in characters followed by the line itself, and pipe it through another process, you could also do that:
awk '{print length($0)" "$0}' inputFile | someOtherProcess
Tools like awk, sed, grep, cut and the more powerful perl are far more suited to these tasks than an interpreted shell script.
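The periodic action from the question (doing something after every so many bytes) can also be tracked inside awk. A rough sketch, with assumptions: the limit of 10 bytes is a stand-in for the real threshold so the demo triggers, LC_ALL=C is set so that length counts bytes rather than characters, and every line is assumed to end in a newline. The print statement marks where the real rotation logic would go.

```shell
#!/bin/sh
marks=$(printf 'aaaa\nbbbb\ncccc\ndddd\n' |
    LC_ALL=C awk -v limit=10 '
        { bytes += length($0) + 1 }    # +1 for the newline awk stripped
        bytes >= limit {
            # A real script would close and reopen its pipes here.
            print "rotate after line " NR
            bytes = 0
        }')
printf '%s\n' "$marks"
```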
A Perl solution for counting the bytes of each line:
perl -p -e '
use Encode;
print length(Encode::encode_utf8($_))."\n";$_=""'
for example:
dd if=/dev/urandom bs=1M count=100 |
perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' |
tail
This runs at 7.7Mb/s for me.
For comparison, the same dd run without the script:
dd if=/dev/urandom bs=1M count=100 >/dev/null
runs at 9.1Mb/s.
So the script does not seem that slow :)
Not really sure what your script is supposed to do. So this might not be an answer to your question but more of a generic tip.
Don't cat your file and pipe it to your script, instead when reading from a file with a bash script do it like this:
while read line
do
echo $line
done <file.txt

Grep filtering output from a process after it has already started?

Normally when one wants to look at specific output lines from running something, one can do something like:
./a.out | grep IHaveThisString
but what if IHaveThisString is something which changes every time so you need to first run it, watch the output to catch what IHaveThisString is on that particular run, and then grep it out? I can just dump to file and later grep but is it possible to do something like background it and then bring it to foreground and bringing it back but now piped to some grep? Something akin to:
./a.out
Ctrl-Z
fg | grep NowIKnowThisString
just wondering..
No, it is only in your screen buffer if you didn't save it in some other way.
Short form: You can do this, but you need to know that you need to do it ahead-of-time; it's not something that can be put into place interactively after-the-fact.
Write your script to determine what the string is. We'd need a more detailed example of the output format to give a better example of usage, but here's one for the trivial case where the entire first line is the filter target:
run_my_command | { read -r string_to_filter_for; grep -F -e "$string_to_filter_for"; }
Replace the read string_to_filter_for with as many commands as necessary to read enough input to determine what the target string is; this could be a loop if necessary.
For instance, let's say that the output contains the following:
Session id: foobar
and thereafter, you want to grep for lines containing foobar.
...then you can pipe through the following script:
re='Session id: (.*)'
while read; do
if [[ $REPLY =~ $re ]] ; then
target=${BASH_REMATCH[1]}
break
else
# if you want to print the preamble; leave this out otherwise
printf '%s\n' "$REPLY"
fi
done
[[ $target ]] && grep -F -e "$target"
If you want to manually specify the filter target, this can be done by having the loop check for a file being created with filter contents, and using that when starting up grep afterwards.
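One way the file-based variant might look is sketched below. Everything here is an assumption made for illustration: the function name filter_when_ready, the pattern file, and the sample lines. Lines are passed through unchanged until the pattern file exists and is non-empty; after that the rest of the stream goes to grep -F.

```shell
#!/bin/sh
# filter_when_ready PATTERN_FILE (hypothetical helper): echo stdin unchanged
# until PATTERN_FILE exists and is non-empty, then hand the rest of the
# stream to grep -F using the file's lines as fixed-string patterns.
filter_when_ready() {
    pattern_file=$1
    while [ ! -s "$pattern_file" ] && IFS= read -r line; do
        printf '%s\n' "$line"
    done
    if [ -s "$pattern_file" ]; then
        grep -F -f "$pattern_file"
    fi
}

# Demo: the pattern file already exists, so filtering starts immediately.
pat=$(mktemp)
printf 'foobar\n' > "$pat"
kept=$(printf 'x foobar\ny baz\nz foobar\n' | filter_when_ready "$pat")
rm -f "$pat"
printf '%s\n' "$kept"
```

In real use you would start the pipeline first and write the target string into the pattern file from another terminal once you have spotted it. The file is only checked between lines, so the line being read when the file appears is still passed through unfiltered.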
What you need is a little unusual, but you can do it this way:
first start a script session;
then use the shell as usual;
then start and interrupt your program;
then run grep over the typescript file.
Example:
$ script
$ ./a.out
Ctrl-Z
$ fg
$ grep NowIKnowThisString typescript
You could use a stream editor such as sed instead of grep. Here's an example of what I mean:
$ cat list
Name to look for: Mike
Dora 1
John 2
Mike 3
Helen 4
Here we find the name to look for in the first line and want to grep for it. Now piping the command to sed:
$ cat list | sed -ne '1{s/Name to look for: //;h}' \
> -e ':r;n;G;/^.*\(.\+\).*\n\1$/P;s/\n.*//;br'
Mike 3
Note: sed itself can take file as a parameter, but you're not working with text files, so that's how you'd use it.
Of course, you'd need to modify the command for your case.
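If the sed program is hard to follow, the same first-line-is-the-target idea can be written in awk. This is a swapped-in alternative rather than a translation of the sed script: it strips the "Name to look for: " prefix from line one, remembers the remainder, and prints later lines that contain it as a fixed string. The input is the sample list from above, fed through printf.

```shell
#!/bin/sh
matches=$(printf '%s\n' 'Name to look for: Mike' 'Dora 1' 'John 2' 'Mike 3' 'Helen 4' |
    awk 'NR == 1 { sub(/^Name to look for: /, ""); target = $0; next }
         index($0, target)')
printf '%s\n' "$matches"
```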

Pull fields/attributes from lsof (Linux command line)

With the recent move to Flash 10 (or maybe it was a distro choice), I and many others are no longer able to copy Flash videos from /tmp. I have, however, found a workaround in the following:
First, execute:
lsof | grep Flash
which should return output like this:
plugin-co 8935 richard 16w REG 8,1 4139180 8220 /tmp/FlashXXq4KyOZ (deleted)
Note: You can see the problem here....the /tmp file has the file pointer released.
You are, however, able to grab the file by using the cp command thusly:
cp /proc/#/fd/# video.flv
where the 1st # is the process ID (8935) and the second is the next number (16, from 16w).
Currently, this works, but it requires a few manual steps. To automate this, I figure I could pull the PID and the fd number and insert them dynamically into the cp command.
My question is how do I pull the appropriate fields into variables? I know you can use $1, etc. for grabbing input arguments, but how do you retrieve outputs?
Note: I could use pidof plugin-container to find the PID, but I still need the other number (since it tells which specific flash video to save).
The following command will return PIDs and FDs for all the files in /tmp that have filenames that begin with "Flash"
lsof -F pfn /tmp/Flash*
and the output will look something like this:
p16471
f16
n/tmp/FlashXXq4KyOZ
f17
n/tmp/FlashXXq4KyOZ
p26588
f16
n/tmp/FlashYYh3JwIW
f17
Where the field identifiers are p: PID, f: FD, n: NAME. The -F option is designed to make the output of lsof easy to parse.
Iterating over these and removing the field identifiers is trivial.
#!/bin/bash
c=-1
while read -r line
do
case $line in
f*)
fds[pids[c]]+=${line:1}" "
;;
n*)
names[pids[c]]+=${line:1}" "
;;
p*)
pids[++c]=${line:1}
;;
esac
done < <(lsof -F pfn -- /tmp/Flash*)
for ((i=0; i<=c; i++))
do
for name in ${names[pids[i]]}
do
for fd in ${fds[pids[i]]}
do
echo "File: $name, Process ID: ${pids[i]}, File Descriptor: $fd"
done
done
done
Lines like this:
fds[pids[c]]+=${line:1}" "
accumulate file descriptors in a string stored in an array indexed by the PID. Doing this for file names will fail for filenames which contain spaces. That could be worked around if necessary.
The line is stripped of the leading field descriptor character by using a substring operator: ${line:1} starts at position one and includes the rest of the string so it drops character zero.
The second loop is just a demo to show iterating over the arrays.
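If you only need the /proc path for each open file, a simpler pass over the same -F output may be enough. A sketch under assumptions: the function name parse_lsof_fields is made up, and the here-document holds sample records in the format shown earlier instead of calling lsof. It remembers the most recent p and f fields and emits one line per n field, so file names containing spaces are kept intact after the first space-separated column.

```shell
#!/bin/sh
# parse_lsof_fields (hypothetical helper): turn lsof -F pfn output on stdin
# into lines of the form "/proc/PID/fd/FD NAME".
parse_lsof_fields() {
    pid=
    fd=
    while IFS= read -r line; do
        case $line in
            p*) pid=${line#p} ;;
            f*) fd=${line#f} ;;
            n*) printf '/proc/%s/fd/%s %s\n' "$pid" "$fd" "${line#n}" ;;
        esac
    done
}

# Sample records stand in for: lsof -F pfn -- /tmp/Flash*
pairs=$(parse_lsof_fields <<'EOF'
p16471
f16
n/tmp/FlashXXq4KyOZ
p26588
f16
n/tmp/FlashYYh3JwIW
EOF
)
printf '%s\n' "$pairs"
```

The first column of each output line is the path you would hand to cp.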
var=$(lsof | awk '/Flash/{gsub(/[^0-9]/,"",$4); print $2 FS $4; exit}')
set -- $var
pid=$1
number=$2
Completed Script:
#!/bin/sh
if [ -n "$1" ]; then
    # lsof | grep Flash | awk '{print $2}' also works for the PID
    pid=$(pidof plugin-container)
    # Take the first matching file and keep only the digits of the FD column (16w -> 16)
    file_num=$(lsof -p "$pid" | grep /tmp/Flash | awk '{sub(/[^0-9]+$/, "", $4); print $4; exit}')
    cp "/proc/$pid/fd/$file_num" ~/Downloads/"$1".flv
else
    echo "Please enter video name as argument."
fi
Avoid using lsof because it takes too long (>30 seconds) to return the path. The below .bashrc line will work with vlc, mplayer, or whatever you put in and return the path to the deleted temp file in milliseconds.
flashplay () {
vlc $(stat -c %N /proc/*/fd/* 2>&1|awk -F[\`\'] '/lash/{print$2}')
}
