Linux tee command occasionally fails if followed by a pipe

I run a find command, with tee writing a log and xargs processing the output; by accident I forgot to add xargs in the second pipe, and that led to this question.
The example:
% tree
.
├── a.sh
└── home
    └── localdir
        ├── abc_3
        ├── abc_6
        ├── mydir_1
        ├── mydir_2
        └── mydir_3

7 directories, 1 file
and the content of a.sh is:
% cat a.sh
#!/bin/bash
LOG="/tmp/abc.log"
find home/localdir -name "mydir*" -type d -print | tee $LOG | echo
If I add a second pipe with some command, such as echo or ls, writing the log file occasionally fails.
These are some examples when I ran ./a.sh many times:
% bash -x ./a.sh; cat /tmp/abc.log // this tee failed
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
% bash -x ./a.sh; cat /tmp/abc.log // this tee ok
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
home/localdir/mydir_2 // this is cat /tmp/abc.log output
home/localdir/mydir_3
home/localdir/mydir_1
Why is it that if I add a second pipe with some command (and forget xargs), the tee command will fail occasionally?

The problem is that, by default, tee exits when a write to a pipe fails. So, consider:
find home/localdir -name "mydir*" -type d -print | tee $LOG | echo
If echo completes first, the pipe will fail and tee will exit. The timing, though, is imprecise. Every command in the pipeline is in a separate subshell. Also, there are the vagaries of buffering. So, sometimes the log file is written before tee exits and sometimes it isn't.
For clarity, let's consider a simpler pipeline:
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="0" [2]="0")'
1
2
3
4
5
6
7
8
9
10
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="141" [2]="0")'
$
In the first execution, each process in the pipeline exits with a success status and the log file is written. In the second execution of the same command, tee fails with exit code 141 and the log file is not written.
I used true in place of echo to illustrate the point that there is nothing special here about echo. The problem exists for any command that follows tee that might reject input.
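If you mainly want the script to notice when this happens, bash's pipefail option makes the whole pipeline report the failing stage's status; a small sketch reusing the example above:
set -o pipefail
seq 10 | tee abc.log | true
echo "pipeline status: $?"   # 141 if tee was killed by SIGPIPE, 0 otherwise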
Documentation
Very recent versions of tee have an option to control the pipe-fail-exit behavior. From man tee from coreutils-8.25:
--output-error[=MODE]
       set behavior on write error. See MODE below

The possibilities for MODE are:

MODE determines behavior with write errors on the outputs:
  'warn'         diagnose errors writing to any output
  'warn-nopipe'  diagnose errors writing to any output not a pipe
  'exit'         exit on error writing to any output
  'exit-nopipe'  exit on error writing to any output not a pipe

The default MODE for the -p option is 'warn-nopipe'. The default operation
when --output-error is not specified, is to exit immediately on error writing
to a pipe, and diagnose errors writing to non pipe outputs.
As you can see, the default behavior is "to exit immediately
on error writing to a pipe". Thus, if the attempt to write to the process that follows tee fails before tee wrote the log file, then tee will exit without writing the log file.

Right, piping from tee to something that exits early (not dependent on reading the input from tee in your case) will cause intermittent errors.
For a summary of this gotcha see:
http://www.pixelbeat.org/docs/coreutils-gotchas.html#tee

I traced through the tee source code, though I'm not very familiar with C on Linux, so I may have some details wrong.
tee belongs to the coreutils package; the source is in src/tee.c.
First, it disables buffering:
setvbuf (stdout, NULL, _IONBF, 0);          // for standard output
setvbuf (descriptors[i], NULL, _IONBF, 0);  // for each file descriptor
So both standard output and the file streams are unbuffered.
Second, tee puts stdout as the first item in its descriptor array and writes to every descriptor in a for loop:
/* In the array of NFILES + 1 descriptors, make
   the first one correspond to standard output.   */
descriptors[0] = stdout;
files[0] = _("standard output");
setvbuf (stdout, NULL, _IONBF, 0);
...
for (i = 0; i <= nfiles; i++)
  {
    if (descriptors[i]
        && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1) // failed!!!
      {
        error (0, errno, "%s", files[i]);
        descriptors[i] = NULL;
        ok = false;
      }
  }
For example, with tee a.log, descriptors[0] is stdout and descriptors[1] is a.log.
As @John1024 said, the commands in a pipeline run in parallel (which I had misunderstood before). The second command in the pipe, such as echo, ls, or true, does not accept input, so it does not "wait" for input; if it finishes first, it closes the read end of the pipe before tee writes to the write end. When that happens, the fwrite marked above fails, and tee stops writing to that descriptor.
In addition, here is the strace result showing tee being killed by SIGPIPE:
write(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 21) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=22649, si_uid=1000} ---
+++ killed by SIGPIPE +++
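The 141 exit status seen earlier matches this: it is 128 + 13, and 13 is SIGPIPE. A quick way to check the mapping with bash's kill builtin:
kill -l 141    # prints: PIPE
kill -l PIPE   # prints: 13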

Related

Insert input to exe program with linux script

I have a script that asks 1000 times for an input of 1-5; it looks like this:
insert1:
insert2:
insert3:
insert4:
insert5:
//and again 1-5
insert 1:
...in total it will get 1000 inputs
I want to write a one-line script that runs the script I just described and inserts the needed input each time.
This is what I tried:
#!/bin/bash
./my_script.exe -l | for i in {1..200}; do for j in {1..5}; do j; done; done
You are nearly there, but do it the other way around:
for ((i=1;i<=200;i++)) ; do
    for ((j=1;j<=5;j++)) ; do
        echo $j
    done
done | ./myscript.exe -l
You can put a # before the | to comment it out and see what the script sends to your program.
You need to differentiate between parameters which are specified after the program name like this:
program param1 param2 param3
and inputs, which a program gets by reading its stdin and are supplied like this:
printf "input1\ninput2\ninput3\n" | program
Alternative version of second command:
{ echo input1; echo input2; echo input3; } | program
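Applied to the 1000-input case from the question, a compact variant might look like this (assuming each round of answers really is just 1 through 5):
for i in {1..200}; do printf '%s\n' 1 2 3 4 5; done | ./my_script.exe -l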

How to send stdout/stderr to grandparent process?

I have this node.js code:
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
echo; echo; echo 'du results:';
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
The problem is that multiple grandchildren will write all of their stdout/stderr to the parent.
I want to do something like this instead:
const pid = process.pid;
const k = cp.spawn('bash');
k.stdin.end(`
set -e;
exec 3<>/proc/${pid}/fd/1
echo >&3 ; echo >&3; echo 'du results:' >&3;
( cd "${extractDir}" && tar -xzvf "${createTarball.pack.value}" ) > /dev/null;
find "${extractDir}/package" -type f | xargs du --threshold=500KB ;
`);
k.stdout.pipe(pt(chalk.redBright('this is a big file (>500KB) according to du: '))).pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);
but that technique won't work on MacOS because Mac doesn't have the /proc/<pid> filesystem feature. Does anyone understand what I am trying to do, and know a good workaround?
Install libsys or some other means of making raw syscalls from node. Then, do the syscall dup(2) with 1 as its argument, and store the result to variable fd. Once you do that, fd will be a number of a file descriptor that's a duplicate of node's stdout, and it will be inherited by child processes. At that point, just remove exec 3<>/proc/${pid}/fd/1 from bash, and replace all of your >&3s with >&${fd}s.
I solved this problem a different way; my question was a bit of an XY problem. I basically use mkfifo, since I am on MacOS and don't have a /proc/<pid> filesystem.
First we create the named pipe:
mkfifo "$fifoDir/fifo"
and then we have
const fifo = cp.spawn('bash');
fifo.stdout.setEncoding('utf8');
fifo.stdout
  .pipe(pt(chalk.yellow.bold('warning: this is a big file (>5KB) according to du: ')))
  .pipe(process.stdout);

fifo.stdin.end(`
  while true; do
    cat "${fifoDir}/fifo"
  done
`);

const k = cp.spawn('bash');
k.stdin.end(`
  set -e;
  echo; echo 'du results:';
  tar -xzvf "${createTarball.pack.value}" -C "${extractDir}" > /dev/null;
  ${cmd} > "${fifoDir}/fifo";
  kill -INT ${fifo.pid};
`);

k.stdout.pipe(process.stdout);
k.stderr.pipe(pt(chalk.magenta('du stderr: '))).pipe(process.stderr);

k.once('exit', code => {
  if (code > 0) {
    log.error('Could not run cmd:', cmd);
  }
  cb(code);
});
the "fifo" child process is a sibling process to process k, which is obvious when you look at the above code block, and the "fifo" process ends up handling the stdout of one of the subprocesses of the k process - they intercommunicate by way of the named pipe (mkfifo).

how to redirect this perl script's output to file?

I don't have much experience with perl, and would appreciate any/all feedback....
[Before I start: I do not have access/authority to change the existing perl scripts.]
I run a couple perl scripts several times a day, but I would like to begin capturing their output in a file.
The first perl script does not take any arguments, and I'm able to "tee" its output without issue:
/asdf/loc1/rebuild-stuff.pl 2>&1 | tee $mytmpfile1
The second perl script hangs with this command:
/asdf/loc1/create-site.pl --record=${newsite} 2>&1 | tee $mytmpfile2
FYI, the following command does NOT hang:
/asdf/loc1/create-site.pl --record=${newsite} 2>&1
I'm wondering if /asdf/loc1/create-site.pl is trying to process the | tee $mytmpfile2 as additional command-line arguments? I'm not permitted to share the entire script, but here's the beginning of its main routine:
...
my $fullpath = $0;
$0 =~ s%.*/%%;

# Parse command-line options.
...
Getopt::Long::config ('no_ignore_case','bundling');
GetOptions ('h|help'               => \$help,
            'n|dry-run|just-print' => \$preview,
            'q|quiet|no-mail'      => \$quiet,
            'r|record=s'           => \$record,
            'V|noverify'           => \$skipverify,
            'v|version'            => \$version) or exit 1;
...
Does the above code provide any clues? Other than modifying the script, do you have any tips for allowing me to capture its output in a file?
It's not hanging; you are "suffering from buffering". Like most programs, Perl buffers STDOUT by default: it is line-buffered (flushed on each newline) when connected to a terminal, and block-buffered otherwise. When STDOUT isn't connected to a terminal, you won't get any output until 4 KiB or 8 KiB of output has accumulated (depending on your version of Perl) or the program exits.
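A quick way to see the difference for yourself, assuming a perl interpreter is on PATH (these one-liners are just an illustration, not the scripts from the question):
# on a terminal: one line appears per second
perl -e 'for (1..3) { print "tick $_\n"; sleep 1 }'
# piped: nothing appears until the program exits, because STDOUT is now block-buffered
perl -e 'for (1..3) { print "tick $_\n"; sleep 1 }' | cat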
You could add $| = 1; to the script to disable buffering for STDOUT. If your program ends with a true value or exits using exit, you can do that without changing the .pl file. Simply use the following wrapper:
perl -e'
    $| = 1;
    $0 = shift;
    do($0);
    my $e = $@ || $! || "$0 didn\x27t return a true value\n";
    die($e) if $e;
' -- prog args | ...
Or you could fool the program into thinking it's connected to a terminal using unbuffer.
unbuffer prog args | ...
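For the command from the question, that would look something like this (assuming the unbuffer utility, which ships with the expect package, is installed; the 2>&1 from the original command may behave differently under unbuffer, so stderr handling might need adjusting):
unbuffer /asdf/loc1/create-site.pl --record=${newsite} | tee $mytmpfile2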

Two way communication with process

I have been given a compiled program. I want to communicate with it from my bash script via the program's stdin and stdout. I need two-way communication, and the program cannot be killed between exchanges of information. How can I do that?
Simple example:
Let the given program be a compiled partial-summation program (C++), and let the script's results be the squares of those sums. The program:
#include <iostream>

int main() {
    int num, sum = 0;
    while (true) {
        std::cin >> num;
        sum += num;
        std::cout << sum << std::endl;
    }
}
My script should look something like this (pseudocode):
for i in 1 2 3 4; do
    echo "$i" > program
    read line from program
    echo $((line * line))
done
If the program instead had for (int i = 1; i <= 4; ++i), then I could do something like this:
exec 4< <(./program);  # Just read from program
for i in 1 2 3 4; do
    read <&4 line;
    echo "sh: $((line * line))";
done
For more, look here. On the other hand, if the program had std::cout << sum * sum;, then a solution could be:
exec 3> >(./program);  # Just write to program
for i in 1 2 3 4; do
    echo "$i" >&3
done
My problem is two-way communication with another process/program. I don't have to use exec. I cannot install third-party software, and a bash-only solution without temporary files would be nice.
If I run another process, it would also be nice to know its PID so I can kill it at the end of the script.
In the future I am thinking about communication between two or maybe three processes, where the output of the first program may depend on the output of the second and vice versa, like a coordinator of processes.
However, I cannot recompile the programs or change anything in them; I only have stdin and stdout communication with them.
If your bash is 4.0 or newer, you can use coproc.
However, don't forget that the input/output of the command you want to communicate with might be buffered; in that case you should wrap the command with something like stdbuf -i0 -o0.
Reference: How to make output of any shell command unbuffered?
Here's an example
#!/bin/bash
coproc mycoproc {
    ./a.out   # your C++ code
}

# input to "std::cin >> num;"
echo "1" >&${mycoproc[1]}

# get output from "std::cout << sum << std::endl;"
# "-t 3" means that it waits for 3 seconds
read -t 3 -u ${mycoproc[0]} var

# print it
echo $var

echo "2" >&${mycoproc[1]}
read -t 3 -u ${mycoproc[0]} var
echo $var

echo "3" >&${mycoproc[1]}
read -t 3 -u ${mycoproc[0]} var
echo $var

# you can get PID
kill $mycoproc_PID
output will be
1
3
6
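If the program's output were block-buffered when piped (the C++ example above is fine because std::endl flushes), a hedged variant that wraps the command with stdbuf, as suggested above, could look like:
coproc mycoproc {
    stdbuf -i0 -o0 ./a.out
}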
If your bash is older than 4.0, mkfifo can do the same thing:
#!/bin/bash
mkfifo f1 f2
exec 4<> f1
exec 5<> f2
./a.out < f1 > f2 &
echo "1" >&4
read -t 3 -u 5 var
echo $var
rm f1 f2
Considering that your C++ program reads from standard input and prints to standard output, it's easy to put it inside a simple chain of pipes:
command_that_writes_output | your_cpp_program | command_that_handle_output
In your specific case you would probably need to modify the program to handle one single input and write one single output, removing the loop. Then you could do it very simply, like this:
for i in 1 2 3 4; do
    result=`echo $i | ./program`
    echo $((result * result))
done

How does grep know it is writing to the input file?

If I try to redirect the output of grep to the same file that it is reading from, like so:
$ grep stuff file.txt > file.txt
I get the error message grep: input file 'file.txt' is also the output. How does grep determine this?
According to the GNU grep source code, grep checks the i-nodes of the input and the output:
if (!out_quiet && list_files == 0 && 1 < max_count
    && S_ISREG (out_stat.st_mode) && out_stat.st_ino
    && SAME_INODE (st, out_stat)) /* <------------------ */
  {
    if (! suppress_errors)
      error (0, 0, _("input file %s is also the output"), quote (filename));
    errseen = 1;
    goto closeout;
  }
The out_stat is filled by calling fstat against STDOUT_FILENO.
if (fstat (STDOUT_FILENO, &tmp_stat) == 0 && S_ISREG (tmp_stat.st_mode))
  out_stat = tmp_stat;
Looking at the source code, you can see that it checks for this case (the output file is the same file grep has open for reading) and reports it; see the SAME_INODE check below:
/* If there is a regular file on stdout and the current file refers
   to the same i-node, we have to report the problem and skip it.
   Otherwise when matching lines from some other input reach the
   disk before we open this file, we can end up reading and matching
   those lines and appending them to the file from which we're reading.
   Then we'd have what appears to be an infinite loop that'd terminate
   only upon filling the output file system or reaching a quota.
   However, there is no risk of an infinite loop if grep is generating
   no output, i.e., with --silent, --quiet, -q.
   Similarly, with any of these:
     --max-count=N (-m) (for N >= 2)
     --files-with-matches (-l)
     --files-without-match (-L)
   there is no risk of trouble.
   For --max-count=1, grep stops after printing the first match,
   so there is no risk of malfunction. But even --max-count=2, with
   input==output, while there is no risk of infloop, there is a race
   condition that could result in "alternate" output. */
if (!out_quiet && list_files == 0 && 1 < max_count
    && S_ISREG (out_stat.st_mode) && out_stat.st_ino
    && SAME_INODE (st, out_stat))
  {
    if (! suppress_errors)
      error (0, 0, _("input file %s is also the output"), quote (filename));
    errseen = true;
    goto closeout;
  }
Here is how to write back to some file:
grep stuff file.txt > tmp && mv tmp file.txt
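If the moreutils package happens to be installed (an extra assumption; it is not part of coreutils), its sponge utility does the same thing in one pipeline by soaking up all input before opening the output file:
grep stuff file.txt | sponge file.txt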
Try a pipeline with cat or tac:
cat file | grep 'searchpattern' > newfile
It's a short and simple way to do it.
