Linux FIFO: when the read peer returns, does the write peer also return immediately?

I'm on Linux, and I originally expected that for a FIFO file, when I dump some content into it from a writer, the writer would wait for the read peer to read all of its content before returning. But it seems this is not the case. I ran a simple experiment.
First of all, I created a FIFO file:
$mkfifo hfifo.txt
Then I had a "my.txt" file, having several lines":
$cat my.txt
line1
line2
line3
line4
I open a "tty1" to write to hfifo.txt with my.txt:
cat my.txt >hfifo.txt
I open another terminal "tty2",to read one line from it:
$read l1<hfifo.txt
Well, to my surprise, as soon as "read" finishes, the "cat" in my "tty1" also returns immediately. Running "echo $l1" prints "line1". This is quite weird to me, because I expected the reader peer to read all of the content written to the FIFO before the writer peer (tty1) returns. But the actual result is that once the reader peer ends, the writer peer ends too.
I am just curious:
(1) How does the writer peer know that there is no more reader on the FIFO, so that it finishes? I could have been in a loop calling the "read" command to print each line of the file.
(2) Besides using the "cat" command as a reader to dump the FIFO, is there a way in shell programming to read this FIFO line by line?
Please kindly suggest, thanks!

'strace' comes in handy here. For a file containing 3 characters plus a newline, you can see the following lines:
read(3, "qqq\n", 131072) = 4
write(1, "qqq\n", 4) = 4
read(3, "", 131072) = 0
As you can see, read() and write() return the number of bytes read or written, and the final read() returns zero, which signals end of file and tells cat it is done. The write to the FIFO succeeds in a single call because the kernel's pipe buffer (64 KiB by default on Linux) easily holds the whole file, so cat never has to wait for the reader to consume everything before it exits.
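The picture changes when the writer has more data than the FIFO buffer can hold. A small experiment you can try (a sketch; the exact messages depend on your shell):
# tty1: keep writing into the FIFO (open() blocks until a reader appears,
# and write() blocks once the ~64 KiB buffer is full)
$ yes > hfifo.txt
# tty2: read a single line and close the FIFO, as in the question
$ read l1 <hfifo.txt
# back in tty1: yes returns to the prompt, killed by SIGPIPE, because its
# next write() found no reader left on the FIFO
So the writer does not actively "know" that the reader is gone; it simply gets SIGPIPE (or EPIPE) the next time it writes after the last reader has closed. In your experiment, cat never reached that point because its single write fit entirely in the buffer.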
Regarding (2): there are commands that do other things, like sed, awk, and egrep, which also read the file line by line, but to just read the file, AFAIK only cat.
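If you want to read the FIFO line by line from the shell itself, though, a plain while/read loop is enough; a minimal sketch using the hfifo.txt from the question:
#!/bin/bash
# Open the FIFO once and consume it line by line; the loop ends when the
# writer closes its end and read finally hits end-of-file.
while IFS= read -r line; do
    printf 'got: %s\n' "$line"
done < hfifo.txt
Because the whole loop is redirected from hfifo.txt, the FIFO is opened once and stays open for the duration of the loop, so every line the writer sent is consumed; a standalone read l1 <hfifo.txt opens and closes the FIFO each time, which is why only line1 survived in the experiment above.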

Related

python3 reset stdin after end of file

On Linux, I want to read from stdin, where stdin comes first from a pipe and then from user input.
So the command looks like:
cat my-file | ./my-prog.py
After reading all the lines from the pipe:
inf = open(0, "r")
inf.readlines()
I want to get further input from user. I do it with input(). But I get EOFError: EOF when reading a line.
I need a way to reset the stdin before the call to input().
Trying sys.stdin.seek(0) gives
io.UnsupportedOperation: underlying stream is not seekable
I read that in C there is clearerr() that does this, but I was not able to find how to do it in Python.

by default, does stderr start as a duplicate file descriptor of stdout?

Does stderr start out as a duplicate FD of stdout?
i.e. considering dup(2), is stderr initialized kind of like so?
int stderr = dup(stdout); // stdout = 1
In the BashGuide, there's a code example
$ grep proud file 'not a file' > proud.log 2> proud.log
The author states
We've created two FDs that both point to the same file, independently of each other. The results of this are not well-defined. Depending on how the operating system handles FDs, some information written via one FD may clobber information written through the other FD.
and further says
We need to prevent having two independent FDs working on the same destination or source. We can do this by duplicating FDs
So basically, 2 independent FDs on the same file = broken
Well, I know that stdout & stderr both point to my terminal by default. Since they can both function properly (i.e. I don't see mish-mashed output + error messages), does that mean that they're not independent FDs, and thus that stderr is a duplicate FD of stdout (or vice versa)?
No, stderr is not a duplicate of stdout.
They work in parallel, independently and asynchronously, which means that in a race condition you might even get the 'mish-mash' you describe.
One practical difference is that stderr output is not passed down a pipe to a subsequent command; only stdout is:
Practical example:
$ cat tst.sh
#!/bin/bash
echo "written to stdout"
echo "written to stderr" 1>&2
exit 0
~$ ./tst.sh
written to stdout
written to stderr
~$ ./tst.sh | xargs -n1 -I{} echo "this came through the pipe:{}"
written to stderr
this came through the pipe:written to stdout
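Conversely, when you do want both streams to end up in the same file, the fix the quoted BashGuide passage is pointing at is to duplicate the descriptor instead of opening the file twice. A minimal sketch reusing tst.sh (both.log is just an example name):
$ ./tst.sh > both.log 2>&1
$ wc -l both.log
2 both.log
Here 2>&1 makes fd 2 a duplicate of fd 1, so both descriptors share one open file description and one file offset, and the two lines cannot clobber each other the way the grep ... > proud.log 2> proud.log example can.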

Writing Block buffered data to a File without fflush(stdout)

From what I understood about buffers: a buffer is temporarily stored data.
For example: let's assume that you wanted to implement an algorithm for determining whether something is speech or just noise. How would you do this with a continuous stream of sound data? It would be very difficult. Therefore, by storing this data into an array you can perform analysis on it.
This array of data is called a buffer.
Now, I have a Linux command where the output is continuous:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0
If I were to write the output of this command to a file, I wouldn't be able to, because the output is probably block buffered, and only an empty text file is generated when I terminate the command (CTRL+C).
Here is what I mean by block buffered.
The three types of buffering available are unbuffered, block
buffered, and line buffered. When an output stream is unbuffered,
information appears on the destination file or terminal as soon as
written; when it is block buffered many characters are saved up and
written as a block; when it is line buffered characters are saved
up until a newline is output or input is read from any stream
attached to a terminal device (typically stdin). The function
fflush(3) may be used to force the block out early. (See
fclose(3).) Normally all files are block buffered. When the first
I/O operation occurs on a file, malloc(3) is called, and a buffer
is obtained. If a stream refers to a terminal (as stdout normally
does) it is line buffered. The standard error stream stderr is
always unbuffered by default.
Now, executing this command,
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/SUF/ {print $3,$4,$5,$6,$10,$11,substr($2,1,2),".",substr($2,3,2),".",substr($2,5,2)}' < /dev/ttyUSB0 > outputfile.txt
An empty file is generated because the buffer block might not have been filled when I terminated the process, and since I don't know the block buffer size, there is no way to wait for the block to be complete.
In order to write the output of this command to a file, I have to use fflush() inside awk, which does write the output into the text file; I have already done this successfully.
Here it goes:
stty -F /dev/ttyUSB0 ispeed 4800 && awk -F"," '/GGA/ {print "Latitude:",$3,$4,"Longitude:",$5,$6,"Altitude:",$10,$11,"Time:",substr($2+50000,1,2),".",substr($2,3,2),".",substr($2,5,2); fflush(stdout) }' < /dev/ttyUSB0 | head -n 2 > GPS_data.txt
But my question is:
Is there any way to declare the buffer block size so that I would know when the buffer block is generated, eliminating the need for fflush()?
OR
Is there any way to change the buffering type from block buffered to unbuffered or line buffered?
You can use stdbuf to run a command with a modified buffer size.
For example, stdbuf -o 100 awk ... will run awk with a 100 byte standard output buffer.
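stdbuf also accepts L (line buffered) and 0 (unbuffered) as the mode, which answers the second question directly. A sketch based on the command from the question, assuming GNU coreutils' stdbuf (the awk program is abbreviated here):
# Line-buffer awk's stdout so every printed record reaches the file immediately,
# with no fflush() needed inside the awk program.
stty -F /dev/ttyUSB0 ispeed 4800 && \
  stdbuf -oL awk -F"," '/SUF/ {print $3,$4,$5,$6}' < /dev/ttyUSB0 > outputfile.txt
stdbuf -o0 would make stdout fully unbuffered instead, and stdbuf -o 100 (as above) sets an explicit block size.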

command line "cat >> " behavior

What does the command cat t.txt >> t.txt do? Let's say t.txt has only one line of text, "abc123". I assume "abc123" is appended to t.txt, so I should end up with 2 lines of "abc123". However, it just goes into an infinite loop and doesn't stop until I hit Control-C. Is this the expected behavior of >>?
cat program opens the file for reading, reads the file and writes to standard out.
>> is a shell append redirect.
What you are seeing is the following cycle:
cat reads a line from t.txt
cat writes that line to its standard output
the shell's >> redirection appends the line to t.txt
cat checks whether it has reached the end of the file
That check in step 4 never finds end-of-file, because by the time it runs a new line has already been appended; the write always lands before the check.
If you want to prevent that behavior, you can add a buffer in between:
$ cat t.txt | cat >> t.txt
In this way, the write occurs after cat t.txt checks for EOF
What you are trying to do by:
cat t.txt >> t.txt
is like telling your system to read t.txt line by line and append each line back to t.txt, or in other words, "append the file to itself". The file is gradually filled up with repetitions of its original contents -- the reason behind your infinite loop.
Generally speaking, try to stay away from reading and writing the same file through redirections. Is it not possible to break this down into two steps: 1. read from the file and write to a temporary file, 2. append the temporary file to the original file?
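A sketch of that two-step approach (the temporary file name is arbitrary):
# 1. Copy the original contents to a temporary file.
cp t.txt /tmp/t.copy
# 2. Append the copy to the original; cat now reads a file that is not being
#    written to, so it reaches a real end-of-file and terminates.
cat /tmp/t.copy >> t.txt
rm /tmp/t.copy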
cat is a command in unix-like systems that concatenates multiple input files and sends their result to the standard output. If only one file is specified, it just outputs that one file. The >> part redirects the output to the given file name, in your case t.txt.
But what you have written says: append t.txt to itself while it is being read. I don't think this behavior is well defined, so I'm not surprised that you get an infinite loop!

exec n<&m versus exec n>&m -- based on Sobell's Linux book

In Mark Sobell's A Practical Guide to Linux Commands, Editors, and Shell Programming, Second Edition he writes (p. 432):
The <& token duplicates an input file descriptor; >& duplicates an output file descriptor.
This seems to be inconsistent with another statement on the same page:
Use the following format to open or redirect file descriptor n as a duplicate of file descriptor m:
exec n<&m
and with an example also on the same page:
# File descriptor 3 duplicates standard input
# File descriptor 4 duplicates standard output
exec 3<&0 4<&1
If >& duplicates an output file descriptor then should we not say
exec 4>&1
to duplicate standard output?
The example is right in practice. The book's original explanation is an accurate description of what the POSIX standard says, but the POSIX-like shells I have handy (bash and dash, the only ones I believe are commonly seen on Linux) are not that picky.
The POSIX standard says the same thing as the book about input and output descriptors, and goes on to say this: for n<&word, "if the digits in word do not represent a file descriptor already open for input, a redirection error shall result". So if you want to be careful about POSIX compatibility, you should avoid this usage.
The bash documentation also says the same thing about <& and >&, but without the promise of an error. Which is good, because it doesn't actually give an error. Instead, empirically n<&m and n>&m appear to be interchangeable. The only difference between <& and >& is that if you leave off the fd number on the left, <& defaults to 0 (stdin) and >& to 1 (stdout).
For example, let's start a shell with fd 1 pointing at a file bar, then try out exactly the exec 4<&1 example, try to write to the resulting fd 4, and see if it works:
$ sh -c 'exec 4<&1; echo foo >&4' >bar; cat bar
foo
It does, and this holds using either dash or bash (or bash --posix) for the shell.
Under the hood, this makes sense because <& and >& are almost certainly just calling dup2(), which doesn't care whether the fds are opened for reading or writing or appending or what.
[EDIT: Added reference to POSIX after discussion in comments.]
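If you want to see this for yourself, tracing the shell is one way; a sketch, assuming strace is available (the shell might also use fcntl(F_DUPFD) instead of dup2() in some cases):
$ strace -e trace=dup2 bash -c 'exec 4<&1; exec 5>&0'
# Expect to see something like dup2(1, 4) and dup2(0, 5) in the trace:
# the same system call is issued whether the <& or the >& form was used.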
If stdout is a tty, then it can safely be cloned for reading or writing. If stdout is a file, then it may not work. I think the example should be 4>&1. I agree with Greg that you can both read and write the clone descriptor, but requesting a redirection with <& is supposed to be done with source descriptors that are readable, and expecting stdout to be readable doesn't make sense. (Although I admit I don't have a reference for this claim.)
An example may make it clearer. With this script:
#!/bin/bash
exec 3<&0
exec 4<&1
read -p "Reading from fd 3: " <&3
echo From fd 3: $REPLY >&2
REPLY=
read -p "Reading from fd 4: " <&4
echo From fd 4: $REPLY >&2
echo To fd 3 >&3
echo To fd 4 >&4
I get the following output (the stuff after the : on "Reading from" lines is typed at the terminal):
$ ./5878384b.sh
Reading from fd 3: foo
From fd 3: foo
Reading from fd 4: bar
From fd 4: bar
To fd 3
To fd 4
$ ./5878384b.sh < /dev/null
From fd 3:
Reading from fd 4: foo
From fd 4: foo
./5878384b.sh: line 12: echo: write error: Bad file descriptor
To fd 4
$ ./5878384b.sh > /dev/null
Reading from fd 3: foo
From fd 3: foo
./5878384b.sh: line 9: read: read error: 0: Bad file descriptor
From fd 4:
To fd 3
Mind the difference between file descriptors and IO streams such as stderr and stdout.
The redirecting operators are just redirecting IO streams via different file descriptors (IO stream handling mechanisms); they do not do any copying or duplicating of IO streams (that's what tee(1) is for).
See: File Descriptor 101
Another test to show that n<&m and n>&m are interchangeable would be "to use either style of 'n<&-' or 'n>&-' for closing a file descriptor, even if it doesn't match the read/write mode that the file descriptor was opened with" (http://www.gnu.org/s/hello/manual/autoconf/File-Descriptors.html).
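For instance, all four of the following succeed in bash:
exec 3<&0    # open fd 3 as a duplicate of stdin, using the input form
exec 3>&-    # close it with the output form -- accepted
exec 4>&1    # open fd 4 as a duplicate of stdout, using the output form
exec 4<&-    # close it with the input form -- also accepted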
