How exactly stdin, stdout, stderr are implemented in linux? - linux

How exactly stdin, stderr, stdout are implemented in LINUX?
They are certainly not physical files. They must be some sort of temporary storage arrangement made by OS in the RAM for every process.
Are these array data structures attached to every process separately?

stdin, stderr, and stdout are file descriptors (or FILE* wrappers around them, if you mean the C stdio objects bearing those names). File descriptors are numbers that index a per-process data structure in the kernel. That data structure records which I/O channels a process has open, I/O channel being my ad-hoc term for a file, device, socket, or pipe.
By convention, the first entry in the table has index 0 and is called the standard input, 1 is the standard output and 2 is the standard error channel. This is just a convention in Unix programs; as far as the kernel is concerned, nothing's special about these numbers.
Each I/O system call (read, write, etc.) takes a file descriptor that indicates which channel the call should operate on.

Related

Do I need to flush named pipes?

I cannot find whether named pipes are buffered, hence the question.
The manpage says https://linux.die.net/man/3/mkfifo:
A FIFO special file is similar to a pipe ... any process can open it for reading or writing, in the same way as an ordinary file.
Pipes are not buffered, no need to flush. But in a ordinary file, I would fflush (or fsync) the file descriptor.
How about named pipe?
Pipes are not buffered, no need to flush.
I'd actually put that the other way around: for most intents and purposes, pipes are nothing but buffer. It is not meaningful to flush them because there is no underlying device to receive the data.
Moreover, although POSIX does not explicitly forbid additional buffering of pipe I/O, it does place sufficient behavioral requirements that I don't think there's any way to determine from observation whether such buffering occurs, except possibly by whether fsync() succeeds. In other words, even if there were extra buffering, it should not be necessary to fsync() a pipe end.
But in a ordinary file, I
would fflush (or fsync) the file descriptor.
Well no, you would not fflush() a file descriptor. fflush() operates on streams, represented by FILE objects, not on file descriptors. This is a crucial distinction, because most streams are buffered at the C library level, independent of the nature of the file underneath. It is this library-level buffer that fflush() interacts with. You can control the library-level buffering mode of a stream via the setvbuf() function.
On those systems that provide it, fsync() operates at a different, lower level. It instructs the OS to ensure that all data previously written to the specified file descriptor has been delivered to the underlying storage device. In other words, it flushes OS-level buffers.
Note well that you can wrap a stream around a pipe-end file descriptor via the fdopen() function. That doesn't make the pipe require flushing any more than it did before, but the stream will be buffered by default, so flushing will be relevant to it.
Note, too, that some storage devices perform their own buffering, so that even after the data have been handed off to a storage device, it is not certain that they are immediately persistent.
How about named pipe?
The discussion above about stream I/O vs. POSIX descriptor-based I/O applies here, too. If you access a named pipe via a stream, then its interaction with fflush() will depend on the buffering of that stream.
But I suppose your question is more about os-level buffering and flushing. POSIX does not appear to say much concrete, but since you tag [linux] and refer to a Linux manual page in your question, I offer this in response:
The only difference between pipes and FIFOs is the manner in which
they are created and opened. Once these tasks have been accomplished,
I/O on pipes and FIFOs has exactly the same semantics.
(Linux pipe(7) manual page.)
I don't understand quite well what you try to ask, but as you have been already told, pipes are not more than buffer.
Historically, fifos (or pipes) consumed the direct blocks of the inode used to maintain them, and they tied to a file (having it a name or not) in some filesystem.
Today, I don't know the exact implementation details for a fifo, but basically, the kernel buffers all data the writers have already written, but the readers haven't read yet. The fifo has an upper limit (system defined) for the amount of buffer they can support, but normally that fails around 10-20kb of data.
The kernel buffers, but there's no delay between writers and readers, because as soon a writer writes on a pipe, the kernel awakens all the readers waiting for it to have data. The reverse is also true, in the case the pipe gets full of data, as soon as a reader consumes it, all the writers are awaken to allow for filling it again.
Anyway, your question about flushing has nothing to do with pipes (well, not like, let me explain myself) but with <stdio.h> package. <stdio.h> does buffer, and it handles buffering on each FILE * individually, so you have calls for flushing buffers when you want them to be write(2)n to disk.
has a dynamic behaviour that allows to optimize buffering and not force programmers to have to flush at each time. That depends on the type of file descriptor associated with a FILE * pointer.
When the FILE * pointer is associated to a serial tty (it does check that calling to isatty(3) call, which internally makes an ioctl(2) call, that allow <stdio.h> to see if you are against a serial device, a char device. If this happens, then <stdio.h> does line buffering that means that always when a '\n' char is output to a device, the buffer is automatically buffered.
This supposes an optimization problem, because when, for example you are using cat(1) to copy a file, the largest the buffer normally supposes the most efficient approach. Well, <stdio.h> comes to solve the problem, because when output is not a tty device, it makes full buffering, and only flushes the internal buffers of the FILE * pointer when it is full of data.
So the question is: How does <stdio.h> behave with a fifo (or pipe) node? The answer is simple.... is is not a char device (or a tty) so <stdio.h> does full buffering on it. If you are communicating data between two processes and you want the reader to receive the data as soon as you have printf(3)ed it, then you have better to fflush(3), because if you don't, you can be waiting for a response that never comes, because what you have written, has not yet been written (not by the kernel, but by the <stdio.h> library)
As I said, I don't know if this is exactly the answer to your question, but for sure it can give you a hint on where the problem could be.

Is file object local to every process or System wide?

As a Linux device driver developer i was in the idea that file object is local structure to every process and its address is available in the fd table for the corresponding fd. But when i came across section 5.6 in Linux Programming interface by Michale Kerrisk which states that
Two different file descriptors that refer to the same open file description share
a file offset value. Therefore, if the file offset is changed via one file descriptor
(as a consequence of calls to read(), write(), or lseek()), this change is visible
through the other file descriptor. This applies both when the two file descrip
tors belong to the same process and when they belong to different processes.
I am befuddled...Kindly some one help me improve my understanding.
Each process does have its own file descriptor table, and each time a file is open()ed yields a separate file description. So there is sanity there!
The exception is when a file descriptor is duplicated, either within a process (via dup()) or across processes (by one process fork()ing a copy with all the same FDs, or by passing a file descriptor through a UNIX domain socket). When this happens, the two descriptors end up sharing some properties with each other, including the offset.
This is not necessarily a bad thing. It means, for instance, that two processes that are both writing to a shared file descriptor will not end up overwriting each other's output. It can sometimes have unexpected results, though. But it's not usually something that you'd end up with without knowing about it.

"Thread safety" of appending to single file from multiple processes?

Let's say I have X processes opening file Y for appending. Each process only writes a single line (with a \n) at the time (really log entries).
Is each line guaranteed not to be interleaved incorrectly in file Y ?
UPDATE: local attached file system.
The question depends on what type of write is going on. If you are using standard I/O with buffering, which is typically most program's default, then the buffer will only be flushed after several lines have been written and will when flushed will not necessarily be a integral number of lines. If you are using write(2) or have changed the default stdio buffering to be line or unbuffered, then it will PROBABLY be interleaved correctly as long as the lines are reasonable sized (certainly if lines are less than 512 bytes).

Does each Unix file description have its own read/write buffers?

In reference to this question about read() and write(), I'm wondering if each open file description has its own read and write buffers or if perhaps there's a single read and write buffer for a file when it has been opened multiple times at once. I'm curious because this would have an effect on what exactly happens with overlapping writes to the same file. Perhaps this is something that varies among Unixes?
(To my understanding, "file description" refers to the info/options about an open file, such as the current marker position. "File descriptor", in contrast, refers to just the number used in a process to refer to a description.)
This depends a bit on whether you are talking about sockets or actual files.
Strictly speaking, a descriptor never has its own buffers; it's just a handle to a deeper abstraction.
File system objects have their "own" buffers, at least when they are required. That is, if a program writes less than the file system block size, the kernel has no choice but to read a FS block and merge the write with the existing data.
This buffer is attached to the vnode and at a lower level, possibly an inode. It's owned by the file and not the descriptor. It may be kept around for a long time if memory is available.
In the case of a socket, then a stream, but not specifically a single descriptor, does actually have buffers that it owns.
If the files were open in blocking mode then yes there should only be one buffer. I would bet the default is non blocking for performance reasons.

Bash Pipe Handling

Does anyone know how bash handles sending data through pipes?
cat file.txt | tail -20
Does this command print all the contents of file.txt into a buffer, which is then read by tail? Or does this command, say, print the contents of file.txt line by line, and then pause at each line for tail to process, and then ask for more data?
The reason I ask is that I'm writing a program on an embedded device that basically performs a sequence of operations on some chunk of data, where the output of one operation is send off as the input of the next operation. I would like to know how linux (bash) handles this so please give me a general answer, not specifically what happens when I run "cat file.txt | tail -20".
EDIT: Shog9 pointed out a relevant Wikipedia Article, this didn't lead me directly to the article but it helped me find this: http://en.wikipedia.org/wiki/Pipeline_%28Unix%29#Implementation which did have the information I was looking for.
I'm sorry for not making myself clear. Of course you're using a pipe and of course you're using stdin and stdout of the respective parts of the command. I had assumed that was too obvious to state.
What I'm asking is how this is handled/implemented. Since both programs cannot run at once, how is data sent from stdin to stdout? What happens if the first program generates data significantly faster than the second program? Does the system just run the first command until either it's terminated or it's stdout buffer is full, and then move on to the next program, and so on in a loop until no more data is left to be processed or is there a more complicated mechanism?
I decided to write a slightly more detailed explanation.
The "magic" here lies in the operating system. Both programs do start up at roughly the same time, and run at the same time (the operating system assigns them slices of time on the processor to run) as every other simultaneously running process on your computer (including the terminal application and the kernel). So, before any data gets passed, the processes are doing whatever initialization necessary. In your example, tail is parsing the '-20' argument and cat is parsing the 'file.txt' argument and opening the file. At some point tail will get to the point where it needs input and it will tell the operating system that it is waiting for input. At some other point (either before or after, it doesn't matter) cat will start passing data to the operating system using stdout. This goes into a buffer in the operating system. The next time tail gets a time slice on the processor after some data has been put into the buffer by cat, it will retrieve some amount of that data (or all of it) which leaves the buffer on the operating system. When the buffer is empty, at some point tail will have to wait for cat to output more data. If cat is outputting data much faster than tail is handling it, the buffer will expand. cat will eventually be done outputting data, but tail will still be processing, so cat will close and tail will process all remaining data in the buffer. The operating system will signal tail when their is no more incoming data with an EOF. Tail will process the remaining data. In this case, tail is probably just receiving all the data into a circular buffer of 20 lines, and when it is signalled by the operating system that there is no more incoming data, it then dumps the last twenty lines to its own stdout, which just gets displayed in the terminal. Since tail is a much simpler program than cat, it will likely spend most of the time waiting for cat to put data into the buffer.
On a system with multiple processors, the two programs will not just be sharing alternating time slices on the same processor core, but likely running at the same time on separate cores.
To get into a little more detail, if you open some kind of process monitor (operating system specific) like 'top' in Linux you will see a whole list of running processes, most of which are effectively using 0% of the processor. Most applications, unless they are crunching data, spend most of their time doing nothing. This is good, because it allows other processes to have unfettered access to the processor according to their needs. This is accomplished in basically three ways. A process could get to a sleep(n) style instruction where it basically tells the kernel to wait n milliseconds before giving it another time slice to work with. Most commonly a program needs to wait for something from another program, like 'tail' waiting for more data to enter the buffer. In this case the operating system will wake up the process when more data is available. Lastly, the kernel can preempt a process in the middle of execution, giving some processor time slices to other processes. 'cat' and 'tail' are simple programs. In this example, tail spends most of it's time waiting for more data on the buffer, and cat spends most of it's time waiting for the operating system to retrieve data from the harddrive. The bottleneck is the speed (or slowness) of the physical medium that the file is stored on. That perceptible delay you might detect when you run this command for the first time is the time it takes for the read heads on the disk drive to seek to the position on the harddrive where 'file.txt' is. If you run the command a second time, the operating system will likely have the contents of file.txt cached in memory, and you will not likely see any perceptible delay (unless file.txt is very large, or the file is no longer cached.)
Most operations you do on your computer are IO bound, which is to say that you are usually waiting for data to come from your harddrive, or from a network device, etc.
Shog9 already referenced the Wikipedia article, but the implementation section has the details you want. The basic implementation is a bounded buffer.
cat will just print the data to standard out, which happens to be redirected to the standard in of tail. This can be seen in the man page of bash.
In other words, there is no pausing going on, tail is just reading from standard in and cat is just writing to standard out.

Resources