Writing block-buffered data to a file without fflush(stdout) - Linux

From what I understand about buffers: a buffer is temporarily stored data.
For example: suppose you wanted to implement an algorithm for determining whether something is speech or just noise. How would you do this on a continuous stream of sound data? It would be very difficult. By storing the incoming data in an array, you can perform the analysis on it.
This array of data is called a buffer.
Now, I have a Linux command where the output is continuous:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0
If I redirect the output of this command to a file, nothing gets written: the output is probably block buffered, and only an empty text file has been generated by the time I terminate the command (Ctrl+C).
Here is what I mean by block buffered:
The three types of buffering available are unbuffered, block buffered, and line buffered. When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin). The function fflush(3) may be used to force the block out early. (See fclose(3).) Normally all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called, and a buffer is obtained. If a stream refers to a terminal (as stdout normally does) it is line buffered. The standard error stream stderr is always unbuffered by default.
Now, executing this command,
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0 > outputfile.txt
an empty file is generated, because the buffer block may not have been filled when I terminated the process, and since I don't know the block buffer size, there is no way to wait for the block to be completed.
In order to write the output of this command to a file, I had to use fflush() inside awk, which successfully writes the output into the text file. Here is what I have already done:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/GGA/ {print "Latitude:",$3,$4,"Longitude:",$5,$6,"Altitude:",$10,$11,"Time:",substr($2+50000,1,2),".",substr($2,3,2),".",substr($2,5,2); fflush(stdout) }' < /dev/ttyUSB0 | head -n 2 > GPS_data.txt
But my question is:
Is there any way to set the buffer block size so that I know when a block will be written out, eliminating the need for fflush()?
OR
Is there any way to change the buffering type from block buffered to unbuffered or line buffered?

You can use stdbuf to run a command with a modified buffer size.
For example, stdbuf -o 100 awk ... will run awk with a 100-byte standard output buffer.
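Applied to the pipeline from the question, a sketch (assuming your awk honours stdio buffering, which is what stdbuf adjusts via LD_PRELOAD) might use line buffering instead of a fixed size, so each matching record reaches the file as soon as it is printed:
stty -F /dev/ttyUSB0 ispeed 4800 && stdbuf -oL awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0 > outputfile.txt
If your awk does its own output buffering (some builds do), the fflush() call shown in the question remains the reliable fallback.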

Related

gzip: unexpected end of file when using gzip

I have to process a file on my Linux machine.
When I try to write my output to a CSV file and then gzip it on the same line of the script:
processing > output.csv | gzip -f output.csv
I get an 'unexpected end of file' error. Even when I download the file using the Linux machine, I get the same error.
When I do not gzip via the terminal (i.e. not in a single line), everything works fine.
Why does it fail like this when the commands are all on a single line?
You should remove > output.csv. For the same stream (stdout) you can either use a pipe (|) or redirect to a file, not both. You can still redirect errors from stderr to a file with 2>errors.txt, otherwise they will be displayed on the screen.
When you redirect a process' IO with the > operator, its output cannot be used by a pipe afterwards (because there's no "output" anymore to be piped). You have two options:
processing > output.csv &&
gzip output.csv
Writes the unprocessed output of your program to the file output.csv and then, in a second step, gzips this file, replacing it with output.csv.gz. Depending on the amount of data, this might not be feasible (the storage requirement is the full uncompressed output PLUS the compressed file).
processing | gzip > output.csv.gz
This will compress the output of your process in-line and write it directly to the output file, without storing the uncompressed output in an intermediate file.
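A quick way to see why combining > and a pipe leaves the downstream command with nothing to read (hypothetical throwaway file name):
# stdout goes to out.txt, so wc reads an empty stream and prints 0
printf 'hello\n' > out.txt | wc -c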

Chronologically capturing STDOUT and STDERR

This very well may fall under the KISS (keep it simple) principle, but I am still curious and wish to be educated as to why I didn't receive the expected results. So, here we go...
I have a shell script to capture STDOUT and STDERR without disturbing the original file descriptors. This is in hopes of preserving the original order of output (see test.pl below) as seen by a user on the terminal.
Unfortunately, I am limited to using sh, instead of bash (but I welcome examples), as I am calling this from another suite and I may wish to use it in a cron in the future (I know cron has the SHELL environment variable).
wrapper.sh contains:
#!/bin/sh
stdout_and_stderr=$1
shift
command="$@"
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
${command} >${out} 2>${err}
test.pl contains:
#!/usr/bin/perl
print "1: stdout1\n";
print STDERR "2: stderr1\n";
print "3: stdout2\n";
In the scenario:
sh wrapper.sh /tmp/xxx perl test.pl
STDOUT contains:
1: stdout1
3: stdout2
STDERR contains:
2: stderr1
All good so far...
/tmp/xxx contains:
2: stderr1
1: stdout1
3: stdout2
However, I was expecting /tmp/xxx to contain:
1: stdout1
2: stderr1
3: stdout2
Can anyone explain to me why STDOUT and STDERR are not appended to /tmp/xxx in the order that I expected? My guess would be that the backgrounded tee processes are blocking the /tmp/xxx resource from one another since they have the same destination. How would you solve this?
related: How do I write stderr to a file while using "tee" with a pipe?
It is a feature of the C runtime library (and probably is imitated by other runtime libraries) that stderr is not buffered. As soon as it is written to, stderr pushes all of its characters to the destination device.
By default stdout has a block buffer of a fixed size (BUFSIZ, historically 512 bytes, typically several kilobytes on modern systems).
The buffering for both stderr and stdout can be changed with the setbuf or setvbuf calls.
From the Linux man page for stdout:
NOTES: The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output. The buffering mode of the standard streams (or any other stream) can be changed using the setbuf(3) or setvbuf(3) call. Note that in case stdin is associated with a terminal, there may also be input buffering in the terminal driver, entirely unrelated to stdio buffering. (Indeed, normally terminal input is line buffered in the kernel.) This kernel input handling can be modified using calls like tcsetattr(3); see also stty(1), and termios(3).
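A quick way to observe the difference described in that note, using a perl one-liner (perl is already used by test.pl above); the exact timing is illustrative:
# stdout is a terminal: line-buffered, one line appears per second
perl -e 'for (1..3) { print "tick $_\n"; sleep 1 }'
# stdout is a pipe: block-buffered, all three lines arrive together at the end
perl -e 'for (1..3) { print "tick $_\n"; sleep 1 }' | cat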
After a little more searching, inspired by @wallyk, I made the following modification to wrapper.sh:
#!/bin/sh
stdout_and_stderr=$1
shift
command="$@"
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
script -q -F 2 ${command} >${out} 2>${err}
Which now produces the expected:
1: stdout1
2: stderr1
3: stdout2
The solution was to prefix the $command with script -q -F 2, which makes script quiet (-q) and forces the output to be flushed immediately (-F 2).
I am now researching how portable this is. I think -F pipe may be a Mac and FreeBSD thing, and -f or --flush may be what other distros use...
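For comparison, a sketch of what I would expect the util-linux version of script (common on Linux) to look like; treat the flags as an assumption to verify, and note that running the command under a pseudo-terminal merges its stderr into script's stdout:
# util-linux script: -q quiet, -f/--flush write output as it happens, -c run a command
script -q -f -c "${command}" /dev/null >${out} 2>${err}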
related: How to make output of any shell command unbuffered?

How to make line buffering work when pipelining stderr?

What I want to achieve is adding timestamps before each line in a log file. The log file receives both the stdout and stderr.
#!/bin/sh
stdbuf -o0 -e0 continuously_running_command 2>&1 | stdbuf -o0 -e0 ts >> log_file
The utility ts adds the timestamps (I've tried to achieve the same with bash code as well). stdbuf appears to have no effect when used in this pipe. When I remove the pipe and redirect only stderr, without adding the timestamps, it works fine.
Any idea on how to fix it?
Perhaps the problem is with the buffering modes used:
stdbuf -o0 -e0 continuously_running_command 2>&1 | stdbuf -o0 -e0 ts >> log_file
According to the stdbuf manual page, your choice of 0 may allow the standard out and standard error to mix together in an unexpected manner:
If MODE is 'L' the corresponding stream will be line buffered. This option is invalid with standard input.
If MODE is '0' the corresponding stream will be unbuffered.
When the streams are unbuffered, any write from the application that is not a full line is more likely to result in partial lines being written to either stream. It is not uncommon for programs to write error messages in parts, e.g., a filename, then a message. Using line buffering reduces the likelihood of mingling incomplete lines.
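Concretely, that would mean switching the mode from 0 to L, e.g. as below (whether ts itself responds to stdbuf depends on how it is implemented, so treat the second stdbuf as best-effort):
stdbuf -oL -eL continuously_running_command 2>&1 | stdbuf -oL ts >> log_file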

What is the order of redirection in terminal?

I want to take input from the file input.txt and write the output of the execution to output.txt. What is the right order? The command below does not work.
./a.out < input.txt > output.txt
EDIT
Do I have to wait for the execution to complete for the output to be written? I usually break in the middle to see if the output is getting written, as the run time is very high.
CLARIFICATION:
This C program (P1) iterates through a loop and feeds the loop value x to a system() call, which calls another C program (P2) using ./P2 < x. Program P2 executes for each value of x and prints to the screen. I want the complete output of both programs in output.txt.
If you're killing the command before it finishes, this is probably a buffering issue. Line-buffered terminal output and block-buffered file output are default behaviors in the C stdio library, so redirection can cause output to be buffered until a few kilobytes have been written.
Some programs have a command line option to force line-buffered or unbuffered output. They do this by calling setvbuf. If that a.out is a program you wrote, you could add setvbuf(stdout, NULL, _IOLBF, 0); before writing any output.
If the program is not yours and you can't recompile it, there is a utility called stdbuf that might help, as in stdbuf -oL ./a.out < in > out
stdbuf is kind of a kludge though. I wouldn't use it unless there is no other option.
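If there really is no other option, applied to the file names from the question it might look like the sketch below (assuming a.out writes through stdio, which is what stdbuf adjusts). Since stdbuf works by preloading a small library through the environment, the setting is typically inherited by child processes such as P2 started via system():
# line-buffer a.out's stdout so output.txt fills in as lines are printed
stdbuf -oL ./a.out < input.txt > output.txt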

Bash output stream write to a file

So I am running this in bash:
# somedevice -getevent
What this command does is it just keeps running, and every time my device sends certain data, say it detects a change in temperature, it outputs something like this:
/dev/xyz: 123 4567 8910112238 20
/dev/xyz: 123 4567 8915712347 19
/dev/xyz: 123 4567 8916412345 22
/dev/xyz: 123 4567 8910312342 25
/dev/xyz: 123 4567 8910112361 18
/dev/xyz: 123 4567 8910112343 20
And this just keeps running, and as soon as anything triggers it, it outputs something. So there is no end to the execution.
Now, the echo to the terminal works perfectly; however, when I try to use the '>' operator, it doesn't seem to write to the file.
So, for instance:
#somedevice -getevent > my_record_file
this doesn't work properly; my_record_file only gets data written to it at intervals, whereas I want it to be written immediately.
Any ideas?
The output is being buffered because the C standard library changes the output buffering mode depending on whether or not stdout is a terminal device. If it's a terminal device (according to isatty(3)), then stdout is line-buffered: it gets flushed every time a newline character gets written. If it's not a terminal device, then it's fully buffered: it only gets flushed whenever a certain amount of data (usually something on the order of 4 KB to 64 KB) gets written.
So, when you redirect the command's output to a file using the shell's > redirection operator, it's no longer outputting to a terminal and it buffers its output. A program can change its buffering mode with setvbuf(3) and friends, but the program has to cooperate to do this. Many programs have command line options to make them line-buffered, e.g. grep(1)'s --line-buffered option. See if your command has a similar option.
If you don't have such an option, you can try using a tool such as unbuffer(1) to unbuffer the output stream, but it doesn't always work and isn't a standard utility, so it's not always available.
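If unbuffer (from the expect package) is installed, a sketch of how it might be used here; it runs the command under a pseudo-terminal, so the command's stdout stays line-buffered:
unbuffer somedevice -getevent > my_record_file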
The command somedevice probably uses the standard input/output library (stdio), and in that library the buffering is on by default. It is switched to line buffering when the output goes to a terminal/console.
Can you modify the somedevice program? If not, you can still hack around it. See http://www.pixelbeat.org/programming/stdio_buffering/ for details.
You can try 'tee':
somedevice -getevent | tee -a my_record_file
The '-a' option is to append instead of just replacing the content.
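If the records still only show up in bursts, the buffering of somedevice's own stdout is the likely cause (see the other answers); a sketch combining the two suggestions, assuming somedevice buffers through stdio:
stdbuf -oL somedevice -getevent | tee -a my_record_file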
This is probably because your "somedevice -getevent" command's stdout is being block-buffered. According to this, stdout is by default line-buffered (i.e. what you want) if stdout is a terminal, and block-buffered otherwise.
I'd have a look at the manual for your somedevice command to see if you can force the output to be unbuffered or line-buffered. If not, stdbuf -oL somedevice -getevent > my_record_file should do what you want.

Resources