How can I close a program if a pipe stream is idle for a period of time?
Say for instance:
someprogram | closeidlepipe -t 500 | otherprogram
Is there some program closeidlepipe that can close if idle for a period (-t 500)?
timeout can close after a period, but not with the "idle" distinction.
UPDATE
It is important to note that someprogram outputs an endless stream of binary data. The data may contain the null character \0 and should be piped verbatim.
Here's the general form of the heart of a program that does this.
while (1) {
    fd_set rfds;
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 500000;              /* 500 ms idle timeout */
    FD_ZERO(&rfds);
    FD_SET(0, &rfds);                 /* watch stdin */
    int r = select(1, &rfds, NULL, NULL, &tv);
    if (r == 0) exit(1);              /* timed out: the pipe went idle */
    if (r < 0) exit(2);
    char buf[8192];
    ssize_t n = read(0, buf, sizeof buf);
    if (n < 0) exit(2);
    if (n == 0) exit(0);              /* EOF from upstream */
    char *b = buf;
    while (n) {
        ssize_t l = write(1, b, n);
        if (l <= 0) exit(3);
        b += l;
        n -= l;
    }
}
The shell builtin read has a timeout option, -t.
someprogram |
while :; do
IFS= read -d '' -r -t 500 line
res=$?
if [[ $res -eq 0 ]]; then
# Normal read up to delimiter
printf '%s\0' "$line"
else
# Either read timed out without reading another null
# byte, or there was some other failure meaning we
# should break. In neither case did we read a trailing null byte
# that read discarded.
[[ -n $line ]] && printf '%s' "$line"
break
fi
done |
otherprogram
If read times out after 500 seconds, the while loop exits and the middle part of the pipeline closes. someprogram will receive a SIGPIPE signal the next time it tries to write to its end of that pipe, allowing it to exit.
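The same mechanism can be seen with any pipeline whose reader exits first; a minimal illustration using the standard yes and head utilities:

```shell
# 'yes' writes "y" forever; when 'head' exits and closes the read end of
# the pipe, the next write in 'yes' raises SIGPIPE and terminates it.
yes | head -n 1           # prints a single "y"
echo "${PIPESTATUS[0]}"   # 141 in bash: 128 + SIGPIPE(13)
```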
Related
I am working on a script to test new-to-me hard drives in the background (so I can close the terminal window) and log the outputs. My problem is in getting badblocks to print stdout to the log file so I can monitor its multi-day progress and create properly formatted update emails.
I have been able to print stdout to a log file with the following: (flags are r/w, % monitor, verbose)
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tee sdx.log
Normally the output would look like:
Testing with pattern 0xaa: 2.23% done, 7:00 elapsed. (0/0/0 errors)
No newline character is used; the ^H (backspace) control character moves the cursor back, and then the new updated status overwrites the previous status.
Unfortunately, the control character is not processed but saved as a character in the file, producing the above output followed by 43 copies of ^H, the new updated stats, 43 copies of ^H, etc.
Since the output is updated at least once per second, this produces a much larger file than necessary, and makes it difficult to retrieve the current status.
While working in the terminal, the solution cat sdx.log && echo "" prints the expected/wanted results by letting the terminal interpret the control characters (and then inserting a newline so the last status is not immediately printed over by the next prompt), but using cat sdx.log > some.file or cat sdx.log | mail both still include all of the extra characters (though in email they are interpreted as spaces). This solution (or ones like it which decode or remove the control characters at the time of access) still produces a huge, unnecessary output file.
I have worked my way through the following similar questions, but none has produced (at least as far as I can tell) a solution which works in real time with the output to update the file. They instead require that the saved log file be processed separately after the task has finished writing, or that the log file not be written until the process is done; both defeat the stated goal of monitoring progress.
Bash - process backspace control character when redirecting output to file
How to "apply" backspace characters within a text file (ideally in vim)
Thank you!
The main place I've run into this in real life is processing man pages. In the past, I've always used a simple script that post-processes the output by stripping out the backspaces appropriately. One could probably do this sort of thing in 80 characters of Perl, but here's an approach that handles backspace and CR/NL fairly well. I've not tested it extensively, but it produces good output for simple cases. e.g.:
$ printf 'xxx\rabclx\bo\rhel\nworld\n' | ./a.out output
hello
world
$ cat output
hello
world
$ xxd output
00000000: 6865 6c6c 6f0a 776f 726c 640a hello.world.
If your output starts to have a lot of CSI sequences, this approach just isn't worth the trouble. cat will produce nice human-consumable output for those cases.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
FILE * xfopen(const char *path, const char *mode);
off_t xftello(FILE *stream, const char *name);
void xfseeko(FILE *stream, off_t offset, int whence, const char *name);
int
main(int argc, char **argv)
{
    const char *mode = "w";
    char *name = strrchr(argv[0], '/');
    off_t last = 0, max = 0, curr = 0;

    name = name ? name + 1 : argv[0];
    if( argc > 1 && ! strcmp(argv[1], "-a") ) {
        argv += 1;
        argc -= 1;
        mode = "a";
    }
    if( argc > 1 && ! strcmp(argv[1], "-h") ) {
        printf("usage: %s [-a] [-h] file [ file ...]\n", name);
        return EXIT_SUCCESS;
    }
    if( argc < 2 ) {
        fprintf(stderr, "Missing output file. -h for usage\n");
        return EXIT_FAILURE;
    }
    assert( argc > 1 );
    argc -= 1;
    argv += 1;
    FILE *ofp[argc];
    for( int i = 0; i < argc; i++ ) {
        ofp[i] = xfopen(argv[i], mode);
    }
    int c;
    while( ( c = fgetc(stdin) ) != EOF ) {
        fputc(c, stdout);
        for( int i = 0; i < argc; i++ ) {
            if( c == '\b' ) {
                xfseeko(ofp[i], -1, SEEK_CUR, argv[i]);
            } else if( isprint(c) ) {
                fputc(c, ofp[i]);
            } else if( c == '\n' ) {
                xfseeko(ofp[i], max, SEEK_SET, argv[i]);
                fputc(c, ofp[i]);
                last = curr + 1;
            } else if( c == '\r' ) {
                xfseeko(ofp[i], last, SEEK_SET, argv[i]);
            }
        }
        curr = xftello(ofp[0], argv[0]);
        if( curr > max ) {
            max = curr;
        }
    }
    return 0;
}
off_t
xftello(FILE *stream, const char *name)
{
    off_t r = ftello(stream);
    if( r == -1 ) {
        perror(name);
        exit(EXIT_FAILURE);
    }
    return r;
}

void
xfseeko(FILE *stream, off_t offset, int whence, const char *name)
{
    if( fseeko(stream, offset, whence) ) {
        perror(name);
        exit(EXIT_FAILURE);
    }
}

FILE *
xfopen(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if( fp == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return fp;
}
You can delete the ^H characters with tr:
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | tr -d '\b' | tee sdx.log
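Note that tr simply deletes the backspaces rather than applying them, so both the old and the new status text survive in the log; a quick sketch:

```shell
# "12%" is printed, then three backspaces, then "45%": tr drops the
# backspaces, so both statuses end up concatenated in the output.
printf '12%%\b\b\b45%%\n' | tr -d '\b'
# -> 12%45%
```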
I have found col -b and colcrt useful, though neither worked perfectly for me. These apply the control characters instead of just dropping them:
sudo badblocks -b 4096 -wsv /dev/sdx 2>&1 | col -b | tee sdx.log
I'm trying to understand what happens with stdout and stderr of background processes when exiting an SSH session. I understand about SIGHUP, child processes and all that, but I'm puzzled about the following:
If I run:
(while true; do date; sleep 0.5; done) | tee foo | cat >bar
and then kill the cat process, the tee process terminates because it can no longer write into the pipe. You can observe this using ps.
But if I run:
(while true; do date; sleep 0.5; done) | tee foo & disown
and then log out of my SSH session, I can observe that everything continues running just fine "forever". So somehow the stdout of the tee process must "keep going" even though my pty must be gone.
Can anyone explain what happens in the second example?
(Yes, I know I could explicitly redirect stdout/stderr/stdin of the background process.)
This is the crucial loop where tee sends output to stdout and opened files:
while (1)
  {
    bytes_read = read (0, buffer, sizeof buffer);
    if (bytes_read < 0 && errno == EINTR)
      continue;
    if (bytes_read <= 0)
      break;

    /* Write to all NFILES + 1 descriptors.
       Standard output is the first one.  */
    for (i = 0; i <= nfiles; i++)
      if (descriptors[i]
          && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
        {
          error (0, errno, "%s", files[i]);
          descriptors[i] = NULL;
          ok = false;
        }
  }
Pay closer attention to this part:
if (descriptors[i]
    && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
  {
    error (0, errno, "%s", files[i]);
    descriptors[i] = NULL;
    ok = false;
  }
It shows that when an error occurs, tee does not exit; it just unsets the file descriptor (descriptors[i] = NULL) and keeps reading until EOF, or until an error other than EINTR occurs on input.
The date command, or anything else that sends output to the pipe connected to tee, is not terminated, since tee still reads its data. The data simply goes nowhere except to the file foo. Even if no file argument had been given, tee would still read the input.
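On Linux this behaviour is easy to observe with /dev/full, which fails every write with ENOSPC. A sketch (assuming GNU tee): tee reports the error on stdout, unsets that descriptor, but still fills the file:

```shell
# Every write to stdout fails, yet tee keeps feeding the file.
seq 3 | tee /tmp/tee_keeps_going > /dev/full 2>/dev/null
cat /tmp/tee_keeps_going
# -> the file still contains the lines 1, 2, 3
```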
This is what /proc/<pid>/fd looks like for tee when disconnected from the terminal:
0 -> pipe:[431978]
1 -> /dev/pts/2 (deleted)
2 -> /dev/pts/2 (deleted)
And this one's from the process that connects to its pipe:
0 -> /dev/pts/2 (deleted)
1 -> pipe:[431978]
2 -> /dev/pts/2 (deleted)
You can see that tee's stdout and stderr no longer point anywhere useful (the pty has been deleted), but tee is still running.
Usually I use wc -l to count the lines of a file. However for a file with 5*10^7 lines, I get only 10^7 as an answer.
I've tried everything proposed here:
How to count lines in a document?
But they all take much more time than wc -l.
Is there any other option?
Anyone serious about speed line counting can just create their own implementation:
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#define BUFFER_SIZE (1024 * 16)
char BUFFER[BUFFER_SIZE];
int main(int argc, char** argv) {
    unsigned int lines = 0;
    int fd, r;

    if (argc > 1) {
        char* file = argv[1];
        if ((fd = open(file, O_RDONLY)) == -1) {
            fprintf(stderr, "Unable to open file \"%s\".\n", file);
            return 1;
        }
    } else {
        fd = fileno(stdin);
    }
    while ((r = read(fd, BUFFER, BUFFER_SIZE)) > 0) {
        char* p = BUFFER;
        while ((p = memchr(p, '\n', (BUFFER + r) - p))) {
            ++p;
            ++lines;
        }
    }
    close(fd);
    if (r == -1) {
        fprintf(stderr, "Read error.\n");
        return 1;
    }
    printf("%u\n", lines);
    return 0;
}
Usage
a < input
... | a
a file
Example:
# time ./wc temp.txt
10000000
real 0m0.115s
user 0m0.102s
sys 0m0.014s
# time wc -l temp.txt
10000000 temp.txt
real 0m0.120s
user 0m0.103s
sys 0m0.016s
* Code compiled with -O3 natively on a system with AVX and SSE4.2 using GCC 4.8.2.
You could try sed
sed -n '$=' file
The = says to print the current line number, and the $ says to do it only on the last line. The -n says not to print anything else.
Or here's a way in Perl, save this as wc.pl and do chmod +x wc.pl:
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift @ARGV;
my $lines = 0;
my $buffer;
open(FILE, $filename) or die "ERROR: Can not open file: $!";
while (sysread FILE, $buffer, 65536) {
$lines += ($buffer =~ tr/\n//);
}
close FILE;
print "$lines\n";
Run it like this:
wc.pl yourfile
Basically it reads your file in chunks of 64 kB at a time, and then takes advantage of the fact that tr returns the number of substitutions it has made when asked to delete all newlines.
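A similar count-the-newlines trick can be sketched in plain shell with tr -cd and wc -c (a sketch using a throwaway sample file):

```shell
printf 'one\ntwo\nthree\n' > /tmp/sample.txt
# Delete everything except newlines, then count the bytes that remain
tr -cd '\n' < /tmp/sample.txt | wc -c
# -> 3
```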
Try with nl and see what happens...
You can get the line count using awk as well like below
awk 'END {print NR}' names.txt
(OR) Using while .. do .. done bash loop construct like
CNT=0; while read -r LINE; do (( CNT++ )); done < names.txt; echo $CNT
It depends on how you open the file, but reading it from stdin instead may fix it:
wc -l < file
Hi, I need a script to read the number of eth interrupts from the /proc/interrupts file with awk and find the total number of interrupts per CPU core. Then I want to use them in bash. The content of the file is:
           CPU0      CPU1      CPU2      CPU3
 47:      33568     45958     46028     49191   PCI-MSI-edge   eth0-rx-0
 48:          0         0         0         0   PCI-MSI-edge   eth0-tx-0
 49:          1         0         1         0   PCI-MSI-edge   eth0
 50:      28217     42237     65203     39086   PCI-MSI-edge   eth1-rx-0
 51:          0         0         0         0   PCI-MSI-edge   eth1-tx-0
 52:          0         1         0         1   PCI-MSI-edge   eth1
 59:     114991    338765     77952    134850   PCI-MSI-edge   eth4-rx-0
 60:     429029    315813    710091     26714   PCI-MSI-edge   eth4-tx-0
 61:          5         2         1         5   PCI-MSI-edge   eth4
 62:    1647083    208840   1164288    933967   PCI-MSI-edge   eth5-rx-0
 63:     673787   1542662    195326   1329903   PCI-MSI-edge   eth5-tx-0
 64:          5         6         7         4   PCI-MSI-edge   eth5
I am reading this file with awk in this code:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
print core_count
next
}
/eth/ {
for (i = 2; i <= core_count+1; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%d\n", totals[i])
}
' $FILE)
core_count=$(echo $output | cut -d' ' -f1)
output=$(echo $output | sed 's/^[0-9]*//')
totals=(${output// / })
In this approach, I get the total core count and then the total interrupts per core, in order to sort them in my script. But I can only handle the numbers in the totals array, like this:
totals[0]=22222
totals[1]=33333
But I need to handle them as tuples with the names of the CPU cores:
totals[0]=(CPU0,2222)
totals[1]=(CPU1,3333)
I think I must assign the names to an array and read them into bash as tuples in my sed. How can I achieve this?
First of all, there's no such thing as a 'tuple' in bash, and arrays are completely flat. That means you either have a 'scalar' variable, or a one-level array of scalars.
There are a number of approaches to the task you're facing. Either:
If you're using a new enough bash (4.0, AFAIR), you can use an associative array (hash, map or however you call it). Then, the CPU names will be keys and the numbers will be values;
Create a plain array (a perl-like flattened hash) where odd indexes will have the keys (CPU names) and even ones the values;
Create two separate arrays, one with the CPU names and the other with the values;
Create just a single array, with CPU names separated from values by some symbol (e.g. = or :).
Let's cover approach 2 first:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i <= core_count+1; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%s %d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
totals=(${output})
Note a few things I've changed to make the script simpler:
awk now outputs `cpu-name number', one pair per line, separated by a single space;
the core count is not output by awk (to avoid preprocessing the output) but instead deduced from the number of lines in the output,
the totals array is created by flattening the output: both spaces and newlines are treated as whitespace and used to separate the values.
The resulting array looks like:
totals=( CPU0 12345 CPU1 23456 ) # ...
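Word splitting is what does the flattening here; a toy example with made-up numbers:

```shell
# Hypothetical awk output: two lines of space-separated pairs
output=$'CPU0 12345\nCPU1 23456'
totals=( $output )        # unquoted on purpose: split on spaces AND newlines
echo "${#totals[@]}"      # -> 4
echo "${totals[2]}"       # -> CPU1
```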
To iterate over it, you could use something like (the simple way):
set -- "${totals[@]}"
while [[ $# -gt 0 ]]; do
cpuname=${1}
value=${2}
# ...
shift;shift
done
Now let's modify it for approach 1:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i <= core_count+1; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("[%s]=%d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
declare -A totals
eval totals=( ${output} )
Note that:
the awk output format has been changed to suit the associative array semantics,
totals is declared as an associative array (declare -A),
sadly, eval must be used to let bash directly handle the output.
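The eval step can be sketched with a fixed string standing in for the awk output (made-up numbers, purely illustrative):

```shell
# Hypothetical awk output, hard-coded here for illustration
output='[CPU0]=12345 [CPU1]=23456'
declare -A totals
eval totals=( ${output} )
echo "${totals[CPU1]}"    # -> 23456
```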
The resulting array looks like:
declare -A totals=( [CPU0]=12345 [CPU1]=23456 )
And now you can use:
echo ${totals[CPU0]}
for cpu in "${!totals[@]}"; do
echo "For CPU ${cpu}: ${totals[${cpu}]}"
done
The third approach can be done a number of different ways. Assuming you can allow two reads of /proc/interrupts, you could even do:
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
next
}
/eth/ {
for (i = 2; i <= core_count+1; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%d\n", totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
names=( $(head -n 1 /proc/interrupts) )
totals=( ${output} )
So now awk is once again only outputting the counts, and the names are obtained by bash directly from the first line of /proc/interrupts. Alternatively, you could create the two split arrays from the single array obtained in approach (2), or parse the awk output some other way.
The result would be in two arrays:
names=( CPU0 CPU1 )
totals=( 12345 23456 )
And output:
for (( i = 0; i < core_count; i++ )); do
echo "${names[$i]} -> ${totals[$i]}"
done
And the last approach:
#!/bin/bash
FILE="/proc/interrupts"
output=$(awk 'NR==1 {
core_count = NF
for (i = 1; i <= core_count; i++)
names[i-1] = $i
next
}
/eth/ {
for (i = 2; i <= core_count+1; i++)
totals[i-2] += $i
}
END {
for (i = 0; i < core_count; i++)
printf("%s=%d\n", names[i], totals[i])
}
' ${FILE})
core_count=$(echo "${output}" | wc -l)
totals=( ${output} )
Now the (regular) array looks like:
totals=( CPU0=12345 CPU1=23456 )
And you can parse it like:
for x in "${totals[@]}"; do
name=${x%=*}
value=${x#*=}
echo "${name} -> ${value}"
done
(note that splitting the CPU name from the value now occurs in the loop).
How can I generate random numbers using AShell (restricted bash)? I am using a BusyBox binary on the device which does not have od or $RANDOM. My device has /dev/urandom and /dev/random.
$RANDOM and od are optional features in BusyBox, I assume given your question that they aren't included in your binary. You mention in a comment that /dev/urandom is present, that's good, it means what you need to do is retrieve bytes from it in a usable form, and not the much more difficult problem of implementing a random number generator. Note that you should use /dev/urandom and not /dev/random, see Is a rand from /dev/urandom secure for a login key?.
If you have tr or sed, you can read bytes from /dev/urandom and discard any byte that isn't a desirable character. You'll also need a way to extract a fixed number of bytes from a stream: either head -c (requiring FEATURE_FANCY_HEAD to be enabled) or dd (requiring dd to be compiled in). The more bytes you discard, the slower this method will be. Still, generating random bytes is usually rather fast in comparison with forking and executing external binaries, so discarding a lot of them isn't going to hurt much. For example, the following snippet will produce a random number between 0 and 65535:
n=65536
while [ $n -ge 65536 ]; do
n=1$(</dev/urandom tr -dc 0-9 | dd bs=5 count=1 2>/dev/null)
n=$((n-100000))
done
Note that due to buffering, tr is going to process quite a few more bytes than what dd will end up keeping. BusyBox's tr reads a bufferful (at least 512 bytes) at a time, and flushes its output buffer whenever the input buffer is fully processed, so the command above will always read at least 512 bytes from /dev/urandom (and very rarely more since the expected take from 512 input bytes is 20 decimal digits).
If you need a unique printable string, just discard non-ASCII characters, and perhaps some annoying punctuation characters:
nonce=$(</dev/urandom tr -dc 'A-Za-z0-9_-' | head -c 22)
In this situation, I would seriously consider writing a small, dedicated C program. Here's one that reads four bytes and outputs the corresponding decimal number. It doesn't rely on any libc function other than the wrappers for the system calls read and write, so you can get a very small binary. Supporting a variable cap passed as a decimal integer on the command line is left as an exercise; it'll cost you hundreds of bytes of code (not something you need to worry about if your target is big enough to run Linux).
#include <stddef.h>
#include <unistd.h>
int main () {
    int n;
    unsigned long x = 0;
    unsigned char buf[4];
    char dec[11]; /* Must fit 256^sizeof(buf) in decimal plus one byte */
    char *start = dec + sizeof(dec) - 1;

    n = read(0, buf, sizeof(buf));
    if (n < (int)sizeof(buf)) return 1;
    for (n = 0; n < (int)sizeof(buf); n++) x = (x << 8 | buf[n]);
    *start = '\n';
    if (x == 0) *--start = '0';
    else while (x != 0) {
        --start;
        *start = '0' + (x % 10);
        x = x / 10;
    }
    while (n = write(1, start, dec + sizeof(dec) - start),
           n > 0 && n < dec + sizeof(dec) - start) {
        start += n;
    }
    return n < 0;
}
</dev/urandom sed 's/[^[:digit:]]\+//g' | head -c10
/dev/random or /dev/urandom are likely to be present.
Another option is to write a small C program that calls srand(), then rand().
I tried Gilles' first snippet with BusyBox 1.22.1 and I have some patches, which didn't fit into a comment:
n=65536   # start above the maximum so the loop body runs at least once
while [ $n -gt 65535 ]; do
    n=$(</dev/urandom tr -dc 0-9 | dd bs=5 count=1 2>/dev/null | sed -e 's/^0\+//' )
done
The loop condition should check for greater than the maximum value, otherwise there will be 0 executions.
I silenced dd's stderr
Leading zeros removed, which could lead to surprises in contexts where interpreted as octal (e.g. $(( )))
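The octal pitfall from the last point is easy to reproduce:

```shell
n=010
echo $(( n ))   # -> 8, not 10: a leading zero makes $(( )) parse the value as octal
# n=09 would even be an error inside $(( )): "09" is not a valid octal constant
```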
hexdump and dc are both available with BusyBox. Use /dev/urandom for mostly-random, or /dev/random for better randomness. Either of these options is better than $RANDOM, and both are faster than looping looking for printable characters.
32-bit decimal random number:
CNT=4
RND=$(dc 10 o 0x$(hexdump -e '"%02x" '$CNT' ""' -n $CNT /dev/random) p)
24-bit hex random number:
CNT=3
RND=0x$(hexdump -e '"%02x" '$CNT' ""' -n $CNT /dev/random)
To get smaller numbers, change the format of the hexdump format string and the count of bytes that hexdump reads.
Trying escitalopram's solution didn't work on BusyBox v1.29.0, but it inspired me to write a function.
I did actually come up with a portable random number generation function that asks for the number of digits and should work fairly well (tested on Linux, WinNT10 bash, BusyBox and msys2 so far).
# Get a random number on Windows BusyBox alike, also works on most Unixes
function PoorMansRandomGenerator {
    local digits="${1}" # The number of digits of the number to generate
    local minimum=1
    local maximum
    local n=0

    if [ "$digits" == "" ]; then
        digits=5
    fi

    # Minimum already has a digit
    for n in $(seq 1 $((digits-1))); do
        minimum=$minimum"0"
        maximum=$maximum"9"
    done
    maximum=$maximum"9"

    #n=0; while [ $n -lt $minimum ]; do n=$n$(dd if=/dev/urandom bs=100 count=1 2>/dev/null | tr -cd '0-9'); done; n=$(echo $n | sed -e 's/^0//')

    # bs=19 since a number longer than 19 digits would not fit into
    # 64-bit shell arithmetic if real randomness strikes
    while [ $n -lt $minimum ] || [ $n -gt $maximum ]; do
        if [ $n -lt $minimum ]; then
            # Add numbers
            n=$n$(dd if=/dev/urandom bs=19 count=1 2>/dev/null | tr -cd '0-9')
            n=$(echo $n | sed -e 's/^0//')
            if [ "$n" == "" ]; then
                n=0
            fi
        elif [ $n -gt $maximum ]; then
            n=$(echo $n | sed 's/.$//')
        fi
    done
    echo $n
}
The following gives a number between 1000 and 9999
echo $(PoorMansRandomGenerator 4)
Improved the above reply into a simpler version that also runs much faster, and is still compatible with BusyBox, Linux, msys and WinNT10 bash.
function PoorMansRandomGenerator {
    local digits="${1}" # The number of digits to generate
    local number

    # Some read bytes can't be used, so we read twice the number of required bytes
    dd if=/dev/urandom bs=$digits count=2 2>/dev/null | while read -r -n1 char; do
        number=$number$(printf "%d" "'$char")
        if [ ${#number} -ge $digits ]; then
            echo ${number:0:$digits}
            break
        fi
    done
}
Use with
echo $(PoorMansRandomGenerator 5)