How to make line buffering work when pipelining stderr?

How to make line buffering work when pipelining stderr? - linux

What I want to achieve is adding timestamps before each line in a log file. The log file receives both the stdout and stderr.
#!/bin/sh
stdbuf -o0 -e0 continuously_running_command 2>&1 | stdbuf -o0 -e0 ts >> log_file
The utility ts adds the timestamps (I've tried to achieve this with bash code as well). stdbuf does not operate when using it in this pipe. When removing the pipe and redirecting only the stderr without adding the timestamps it works fine.
Any idea on how to fix it?

Perhaps the problem is with the buffering modes used:
stdbuf -o0 -e0 continuously_running_command 2>&1 | stdbuf -o0 -e0 ts >> log_file
According to the stdbuf manual page, your choice of 0 may allow the standard out and standard error to mix together in an unexpected manner:
If MODE is 'L' the corresponding stream will be line buffered. This option is invalid with standard input.
If MODE is '0' the corresponding stream will be unbuffered.
When unbuffered, it means that writes from the application in anything but full lines is more likely to result in partial lines being written on either stream. It is not uncommon for programs to write error messages in parts, e.g., a filename, then a message. Using line-buffering reduces the likelihood of mingling incomplete lines.

Related

Chronologically capturing STDOUT and STDERR

This very well may fall under KISS (keep it simple) principle but I am still curious and wish to be educated as to why I didn't receive the expected results. So, here we go...
I have a shell script to capture STDOUT and STDERR without disturbing the original file descriptors. This is in hopes of preserving the original order of output (see test.pl below) as seen by a user on the terminal.
Unfortunately, I am limited to using sh, instead of bash (but I welcome examples), as I am calling this from another suite and I may wish to use it in a cron in the future (I know cron has the SHELL environment variable).
wrapper.sh contains:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$#
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
${command} >${out} 2>${err}
test.pl contains:
#!/usr/bin/perl
print "1: stdout1\n";
print STDERR "2: stderr1\n";
print "3: stdout2\n";
In the scenario:
sh wrapper.sh /tmp/xxx perl test.pl
STDOUT contains:
1: stdout1
3: stdout2
STDERR contains:
2: stderr1
All good so far...
/tmp/xxx contains:
2: stderr1
1: stdout1
3: stdout2
However, I was expecting /tmp/xxx to contain:
1: stdout1
2: stderr1
3: stdout2
Can anyone explain to me why STDOUT and STDERR are not appending /tmp/xxx in the order that I expected? My guess would be that the backgrounded tee processes are blocking the /tmp/xxx resource from one another since they have the same "destination". How would you solve this?
related: How do I write stderr to a file while using "tee" with a pipe?

It is a feature of the C runtime library (and probably is imitated by other runtime libraries) that stderr is not buffered. As soon as it is written to, stderr pushes all of its characters to the destination device.
By default stdout has a 512-byte buffer.
The buffering for both stderr and stdout can be changed with the setbuf or setvbuf calls.
From the Linux man page for stdout:
NOTES: The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output. The buffering mode of the standard streams (or any other stream) can be changed using the setbuf(3) or setvbuf(3) call. Note that in case stdin is associated with a terminal, there may also be input buffering in the terminal driver, entirely unrelated to stdio buffering. (Indeed, normally terminal input is line buffered in the kernel.) This kernel input handling can be modified using calls like tcsetattr(3); see also stty(1), and termios(3).

After a little bit more searching, inspired by #wallyk, I made the following modification to wrapper.sh:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$#
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
script -q -F 2 ${command} >${out} 2>${err}
Which now produces the expected:
1: stdout1
2: stderr1
3: stdout2
The solution was to prefix the $command with script -q -F 2 which makes script quite (-q) and then forces file descriptor 2 (STDOUT) to flush immediately (-F 2).
I am now researching to determine how portable this is. I think -F pipe may be Mac and FreeBSD and -f or --flush may be other distros...
related: How to make output of any shell command unbuffered?

Writing Block buffered data to a File without fflush(stdout)

From what I understood about buffers: a buffer is a temporarily stored data.
For example: let's assume that you wanted to implement an algorithm for determining whether something is speech or just noise. How would you do this using a constant stream flow of sound data? It would be very difficult. Therefore, by storing this into an array you can perform analysis on this data.
This array of data is called a buffer.
Now, I have a Linux command where the output is continuous:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0
If I were to write the output of this command to a file, I won't be able to write it, because the output is probably block buffered and only an empty text file will be generated when I terminate the output of the above command (CTRL+C).
Here is what i mean by Block Buffered.
The three types of buffering available are unbuffered, block
buffered, and line buffered. When an output stream is unbuffered,
information appears on the destination file or terminal as soon as
written; when it is block buffered many characters are saved up and
written as a block; when it is line buffered characters are saved
up until a newline is output or input is read from any stream
attached to a terminal device (typically stdin). The function
fflush(3) may be used to force the block out early. (See
fclose(3).) Normally all files are block buffered. When the first
I/O operation occurs on a file, malloc(3) is called, and a buffer
is obtained. If a stream refers to a terminal (as stdout normally
does) it is line buffered. The standard error stream stderr is
always unbuffered by default.
Now, executing this command,
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0 > outputfile.txt
An empty file will be generated because the buffer block might have not been completed when I terminated the process, and since i don't know the block buffer size, there is no way to wait for the block is complete.
In order to write the output of this command to a file I have to use fflush() inside awk, which would successfully write the output into the text file, which I have already done successfully.
Here it goes:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/GGA/ {print "Latitude:",$3,$4,"Longitude:",$5,$6,"Altitude:",$10,$11,"Time:",substr($2+50000,1,2),".",substr($2,3,2),".",substr($2,5,2); fflush(stdout) }' < /dev/ttyUSB0 | head -n 2 > GPS_data.txt
But my question is:
Is there any way to declare the buffer block size so that I would know when the buffer block in generated, so eliminating the need of using fflush()?
OR
Is there anyway to change buffer type from Block buffered to unbuffered or line buffered ?

You can use stdbuf to run a command with a modified buffer size.
For example, stdbuf -o 100 awk ... will run awk with a 100 byte standard output buffer.

What is the order of redirection in terminal?

I want to take input from file input.txt and write output of execution to output.txt What is the right order? The below does not work.
./a.out < input.txt > output.txt
EDIT
Do I have to wait for execution to complete for it to be written? I usually break in the middle to see if o/p is getting written as run time is very high.
CLARIFICATION:
This C program (P1) iterates through a loop and feeds the loop value x to a system() call which calls another C program (P2) using ./P2 < x. Program P2 executes for each value of x and outputs to screen. I want to the complete output of both programs to output.txt.

If you're killing the command before it finishes, this is probably a buffering issue. Line-buffered terminal output and block-buffered file output are default behaviors in the C stdio library, so redirection can cause output to be buffered until a few kilobytes have been written.
Some programs have a command line option to force line-buffered or unbuffered output. They do this by calling setvbuf. If that a.out is a program you wrote, you could addsetvbuf(stdout, NULL, _IOLBF, 0);
If the program is not yours and you can't recompile it, there is a utility called stdbuf that might help, as in stdbuf -oL ./a.out < in > out
stdbuf is kind of a kludge though. I wouldn't use it unless there is no other option.

How do you get GHC to output compile errors to a file instead of standard output?

I'm trying to compile a haskell file that has a HUGE number of errors in it. I want to start debugging the first one but unfortunately there are so many that they go off the screen.
I want to pipe the error messages to a file so that I can read from the top, but normal methods don't seem to work.
I've tried:
ghc File.hs > errors.log
ghc File.hs >> errors.log
ghc File.hs | more
None of them work. Using > and >> only writes the first couple of lines to the file and then the rest to standard output. Using more, less, cat etc doesn't make any difference at all.
Is there a flag for GHC that will let me output to a file?
(I should probably let you know that I'm working on a Windows machine with Cygwin.)

Most programs write output to the standard output (file descriptor 1) and error messages to the standard error (file descriptor 2).
You can ask your shell to redirect the standard error to another location like this:
ghc File.hs > output.log 2> errors.log
or if you want them in the same file:
ghc File.hs > output.log 2>&1
See your shell's manpage section of redirections for full details. Note that the shells are picky about the order of the redirections.

You can also view the output directly, using the same redirect as sarnold's solution, but without the intermediate output file:
ghc File.hs 2>&1 | less
(same goes for more instead of less, etc.)

on-the-fly output redirection, seeing the file redirection output while the program is still running

If I use a command like this one:
./program >> a.txt &
, and the program is a long running one then I can only see the output once the program ended. That means I have no way of knowing if the computation is going well until it actually stops computing. I want to be able to read the redirected output on file while the program is running.
This is similar to opening a file, appending to it, then closing it back after every writing. If the file is only closed at the end of the program then no data can be read on it until the program ends. The only redirection I know is similar to closing the file at the end of the program.
You can test it with this little python script. The language doesn't matter. Any program that writes to standard output has the same problem.
l = range(0,100000)
for i in l:
if i%1000==0:
print i
for j in l:
s = i + j
One can run this with:
./python program.py >> a.txt &
Then cat a.txt .. you will only get results once the script is done computing.

From the stdout manual page:
The stream stderr is unbuffered.
The stream stdout is line-buffered
when it points to a terminal.
Partial lines will not appear until
fflush(3) or exit(3) is called, or
a new‐line is printed.
Bottom line: Unless the output is a terminal, your program will have its standard output in fully buffered mode by default. This essentially means that it will output data in large-ish blocks, rather than line-by-line, let alone character-by-character.
Ways to work around this:
Fix your program: If you need real-time output, you need to fix your program. In C you can use fflush(stdout) after each output statement, or setvbuf() to change the buffering mode of the standard output. For Python there is sys.stdout.flush() of even some of the suggestions here.
Use a utility that can record from a PTY, rather than outright stdout redirections. GNU Screen can do this for you:
screen -d -m -L python test.py
would be a start. This will log the output of your program to a file called screenlog.0 (or similar) in your current directory with a default delay of 10 seconds, and you can use screen to connect to the session where your command is running to provide input or terminate it. The delay and the name of the logfile can be changed in a configuration file or manually once you connect to the background session.
EDIT:
On most Linux system there is a third workaround: You can use the LD_PRELOAD variable and a preloaded library to override select functions of the C library and use them to set the stdout buffering mode when those functions are called by your program. This method may work, but it has a number of disadvantages:
It won't work at all on static executables
It's fragile and rather ugly.
It won't work at all with SUID executables - the dynamic loader will refuse to read the LD_PRELOAD variable when loading such executables for security reasons.
It's fragile and rather ugly.
It requires that you find and override a library function that is called by your program after it initially sets the stdout buffering mode and preferably before any output. getenv() is a good choice for many programs, but not all. You may have to override common I/O functions such as printf() or fwrite() - if push comes to shove you may just have to override all functions that control the buffering mode and introduce a special condition for stdout.
It's fragile and rather ugly.
It's hard to ensure that there are no unwelcome side-effects. To do this right you'd have to ensure that only stdout is affected and that your overrides will not crash the rest of the program if e.g. stdout is closed.
Did I mention that it's fragile and rather ugly?
That said, the process it relatively simple. You put in a C file, e.g. linebufferedstdout.c the replacement functions:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
char *getenv(const char *s) {
static char *(*getenv_real)(const char *s) = NULL;
if (getenv_real == NULL) {
getenv_real = dlsym(RTLD_NEXT, "getenv");
setlinebuf(stdout);
}
return getenv_real(s);
}
Then you compile that file as a shared object:
gcc -O2 -o linebufferedstdout.so -fpic -shared linebufferedstdout.c -ldl -lc
Then you set the LD_PRELOAD variable to load it along with your program:
$ LD_PRELOAD=./linebufferedstdout.so python test.py | tee -a test.out
0
1000
2000
3000
4000
If you are lucky, your problem will be solved with no unfortunate side-effects.
You can set the LD_PRELOAD library in the shell, if necessary, or even specify that library system-wide (definitely NOT recommended) in /etc/ld.so.preload.

If you're trying to modify the behavior of an existing program try stdbuf (part of coreutils starting with version 7.5 apparently).
This buffers stdout up to a line:
stdbuf -oL command > output
This disables stdout buffering altogether:
stdbuf -o0 command > output

Have you considered piping to tee?
./program | tee a.txt
However, even tee won't work if "program" doesn't write anything to stdout until it is done. So, the effectiveness depends a lot on how your program behaves.

If the program writes to a file, you can read it while it is being written using tail -f a.txt.

Your problem is that most programs check to see if the output is a terminal or not. If the output is a terminal then output is buffered one line at a time (so each line is output as it is generated) but if the output is not a terminal then the output is buffered in larger chunks (4096 bytes at a time is typical) This behaviour is normal behaviour in the C library (when using printf for example) and also in the C++ library (when using cout for example), so any program written in C or C++ will do this.
Most other scripting languages (like perl, python, etc.) are written in C or C++ and so they have exactly the same buffering behaviour.
The answer above (using LD_PRELOAD) can be made to work on perl or python scripts, since the interpreters are themselves written in C.

The unbuffer command from the expect package does exactly what you are looking for.
$ sudo apt-get install expect
$ unbuffer python program.py | cat -
<watch output immediately show up here>

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to make line buffering work when pipelining stderr? - linux

Related

Chronologically capturing STDOUT and STDERR

Writing Block buffered data to a File without fflush(stdout)

What is the order of redirection in terminal?

How do you get GHC to output compile errors to a file instead of standard output?

on-the-fly output redirection, seeing the file redirection output while the program is still running

Categories

Resources