I have gone through a variety of different Linux named pipe client/server implementations, but most of them use the blocking defaults on reads/writes.
As I am already using poll() to check other flags, I thought it would be a good idea to check for incoming FIFO data via poll() as well...
After all the research, I think that opening the pipe in O_RDWR mode is the only way to prevent an indefinite number of EOF events on a pipe when no writer has opened it.
This way both ends of the pipe are held open, and other clients can open the writable end as well. To respond back I would use separate pipes...
My problem is that although I have found some examples that use the O_RDWR flag, the open() man pages describe this flag's behavior as undefined when applied to a FIFO. (http://linux.die.net/man/3/open)
But how would you use poll() on a pipe without O_RDWR? Do you think O_RDWR is a legitimate way to open pipes?
First, some preliminaries:
Using O_NONBLOCK together with poll() is common practice; it is blocking I/O under poll() that is the unusual combination. To make this work successfully, you need to be sure to handle all poll() and read() return states correctly:
read() return value of 0 means EOF -- the other side has closed its connection. This corresponds (usually, but not on all OSes) to poll() returning a POLLHUP revent. You may want to check for POLLHUP before attempting read(), but it is not absolutely necessary since read() is guaranteed to return 0 after the writing side has closed.
If you call read() before a writer has connected, and you have O_RDONLY | O_NONBLOCK, you will get EOF (read() returning 0) repeatedly, as you've noticed. However, if you use poll() to wait for a POLLIN event before calling read(), it will wait for the writer to connect, and not produce the EOFs.
read() return value -1 usually means error. However, if errno == EAGAIN, this simply means there is no more data available right now and you're not blocking, so you can go back to poll() in case other devices need handling. If errno == EINTR, then read() was interrupted before reading any data, and you can either go back to poll() or simply call read() again immediately.
Now, for Linux:
If you open on the reading side with O_RDONLY, then:
The open() will block until there is a corresponding writer open.
poll() will give a POLLIN revent when data is ready to be read, or EOF occurs.
read() will block until either the requested number of bytes is read, the connection is closed (returns 0), it is interrupted by a signal, or some fatal IO error occurs. This blocking sort of defeats the purpose of using poll(), which is why poll() almost always is used with O_NONBLOCK. You could use an alarm() to wake up out of read() after a timeout, but that's overly complicated.
If the writer closes, then the reader will receive a poll() POLLHUP revent and read() will return 0 indefinitely afterwards. At this point, the reader must close its filehandle and reopen it.
If you open on the reading side with O_RDONLY | O_NONBLOCK, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read, or EOF occurs. poll() will also block until a writer is available, if none is present.
After all currently available data is read, read() will either return -1 and set errno == EAGAIN if the connection is still open, or it will return 0 if the connection is closed (EOF) or not yet opened by a writer. When errno == EAGAIN, this means it's time to return to poll(), since the connection is open but there is no more data. When errno == EINTR, read() has read no bytes yet and was interrupted by a signal, so it can be restarted.
If the writer closes, then the reader will receive a poll() POLLHUP revent, and read() will return 0 indefinitely afterwards. At this point the reader must close its filehandle and reopen it.
(Linux-specific:) If you open on the reading side with O_RDWR, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read. However, for named pipes, EOF will not cause POLLIN or POLLHUP revents.
read() will block until the requested number of bytes is read, it is interrupted by a signal, or some other fatal IO error occurs. For named pipes, it will not return errno == EAGAIN, nor will it even return 0 on EOF. It will just sit there until the exact number of bytes requested is read, or until it receives a signal (in which case it will return the number of bytes read so far, or return -1 and set errno == EINTR if no bytes were read so far).
If the writer closes, the reader will not lose the ability to read the named pipe later if another writer opens the named pipe, but the reader will not receive any notification either.
(Linux-specific:) If you open on the reading side with O_RDWR | O_NONBLOCK, then:
The open() will not block.
poll() will give a POLLIN revent when data is ready to be read. However, EOF will not cause POLLIN or POLLHUP revents on named pipes.
After all currently available data is read, read() will return -1 and set errno == EAGAIN. This is the time to return to poll() to wait for more data, possibly from other streams.
If the writer closes, the reader will not lose the ability to read the named pipe later if another writer opens the named pipe. The connection is persistent.
As you are rightly concerned, using O_RDWR with pipes is not standard, POSIX or elsewhere.
However, since this question seems to come up often, the best way on Linux to make "resilient named pipes" which stay alive even when one side closes, and which don't cause POLLHUP revents or return 0 for read(), is to use O_RDWR | O_NONBLOCK.
I see three main ways of handling named pipes on Linux:
(Portable.) Without poll(), and with a single pipe:
open(pipe, O_RDONLY);
Main loop:
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, the connection is closed, and all data has been received.
(Portable.) With poll(), and with the expectation that pipes, even named ones, are only opened once, and that once they are closed, must be reopened by both reader and writer, setting up a new pipeline:
open(pipe, O_RDONLY | O_NONBLOCK);
Main loop:
poll() for POLLIN events, possibly on multiple pipes at once. (Note: This prevents read() from getting multiple EOFs before a writer has connected.)
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EAGAIN, go back to poll() step.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, the connection is closed, and you must terminate, or close and reopen the pipe.
(Non-portable, Linux-specific.) With poll(), and with the expectation that named pipes never terminate, and may be connected and disconnected multiple times:
open(pipe, O_RDWR | O_NONBLOCK);
Main loop:
poll() for POLLIN events, possibly on multiple pipes at once.
read() as much data as needed, possibly looping on read() calls.
If read() == -1 and errno == EAGAIN, go back to poll() step.
If read() == -1 and errno == EINTR, read() all over again.
If read() == 0, something is wrong -- it shouldn't happen with O_RDWR on named pipes, but only with O_RDONLY or unnamed pipes; it indicates a closed pipe which must be closed and re-opened. If you mix named and unnamed pipes in the same poll() event-handling loop, this case may still need to be handled.
According to the open(2) man page, you can pass O_RDONLY|O_NONBLOCK or O_WRONLY|O_NONBLOCK to keep the open() syscall from blocking (in the O_WRONLY|O_NONBLOCK case, open() fails with errno == ENXIO if no reader has the FIFO open).
As I commented, also read the fifo(7) and mkfifo(3) man pages.
Just keep an open O_WRONLY file descriptor in the reading process alongside the O_RDONLY one. This will achieve the same effect, ensuring that read() never returns end-of-file and that poll() and select() will block.
And it's 100% POSIX.
Related
The question is straightforward. shutdown(..., SHUT_RD) was called on the socket after read() had returned zero, yet epoll_wait() keeps reporting an EPOLLIN event for the socket.
I know that I can, and in my situation must, call epoll_ctl() to modify the event mask. Naively, I assumed that, just as close() removes the record from the epoll list, shutdown() would mark the appropriate end of the socket.
The question is: semantically, why does epoll_wait() report the socket as readable, given that read() returned 0 and the socket was shut down for reading?
epoll_wait() will report EPOLLIN if trying to read from the socket would return immediately rather than blocking.
Once you do shutdown(..., SHUT_RD); all calls to read() return 0 immediately. Since it won't block, epoll_wait() reports that the socket is readable.
According to the manual page of pipe:
If a process attempts to read from an empty pipe, then read(2) will
block until data is available. If a process attempts to write to a
full pipe (see below), then write(2) blocks until sufficient data has
been read from the pipe to allow the write to complete. Nonblocking
I/O is possible by using the fcntl(2) F_SETFL operation to enable the
O_NONBLOCK open file status flag.
I have such a question:
Provided that a pipe buffer is empty;
A read is blocking on the pipe;
Now, if the write end is being closed, will the read be unblocked automatically?
Yes, the read will be unblocked and return EOF (0).
Has anyone tried to create a socket in non-blocking mode and use a dedicated thread to write to the socket, while using the select() system call to identify whether there is data available to read?
If the socket is non-blocking, the write call will return immediately and the application will not know the status of the write (whether it passed or failed).
Is there a way of knowing the status of the write call without having to block on it?
Has anyone tried to create a socket in non-blocking mode and use a dedicated thread to write to the socket, while using the select() system call to identify whether there is data available to read?
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
if the socket is non blocking, the write call will
return immediately and the application will not
know the status of the write (if it passed or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
is there a way of knowing the status of the write
call without having to block on it.
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
is there a way of knowing the status of the write call without having
to block on it.
If the result of the write call is -1, then check errno for EAGAIN or EWOULDBLOCK. If it's one of those errors, then it's benign and you can go back to waiting on a select call. Sample code below.
int result = write(sock, buffer, size);
if ((result == -1) && ((errno == EAGAIN) || (errno == EWOULDBLOCK)))
{
    // write failed because the socket isn't ready to handle more data. Try again later (or wait for select)
}
else if (result == -1)
{
    // fatal socket error
}
else
{
    // result == number of bytes sent.
    // TCP - may be less than the number of bytes passed in to the write/send call.
    // UDP - number of bytes sent (should be the entire datagram)
}
Does epoll guarantee that the first (or ongoing) call to epoll_wait after a file is registered with epoll_ctl for EPOLLIN and EPOLLET returns immediately in the case that the file was already readable prior to the epoll_ctl call? From my experiments with test programs, it appears that the answer is yes. Here are a couple examples to clarify my question:
Suppose we have initialized an epoll file efd and a file fd and the following event definition:
event.data.fd = fd;
event.events = EPOLLIN | EPOLLET;
Now consider this scenario:
thread1: writes data to fd
thread2: epoll_ctl (efd, EPOLL_CTL_ADD, fd, &event);
thread2: epoll_wait (efd, events, MAXEVENTS, -1);
Now does the call in step 3 return immediately? In my experience it does. Is this guaranteed?
Now consider a second scenario, extending the first:
thread1: writes data to fd
thread2: epoll_ctl (efd, EPOLL_CTL_ADD, fd, &event);
thread2: epoll_wait (efd, events, MAXEVENTS, -1);
thread2: epoll_ctl (efd, EPOLL_CTL_MOD, fd, &event);
thread2: epoll_wait (efd, events, MAXEVENTS, -1);
Does the call in step 5 return immediately? In my experience it does. Is it guaranteed?
The epoll man pages are not entirely clear on this issue. In particular, the man pages suggest that you should always read from a file until EAGAIN is returned when using edge-triggered mode. But it seems like those comments are assuming that you are not re-registering the file whenever you want to wait on the file.
what is the purpose of epoll's edge triggered option? is a related discussion. The first two comments to the first answer seem to confirm that the behavior that I see is expected.
https://gist.github.com/3900742 is a C test program that illustrates that epoll with a pipe seems to behave as I've described it.
As epoll is Linux specific there is no real specification, so it's pretty much up to what is actually implemented (the man pages try to describe that in some more user-friendly way, but don't provide all the details for edge-cases).
Looking at the kernel source, both ep_insert and ep_modify check the current event bits (irrespective of EPOLLET):
/*
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/
revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt);
So that explains the behaviour you are seeing and it seems to have been done deliberately. But as there is no specification, there is no cast-iron guarantee that the behaviour won't change in the future.
If a socket has data to be read and the select() function is called, will select():
Return immediately, indicating the socket is ready for reading, or
Block until more data is received on the socket?
It can easily be tested, but I assure you select() will never block if there is data already available to read on one of the readfds. If it did block in that case, it wouldn't be very useful for programming with non-blocking I/O. Take the example where you are looping on select(), you see that there is data to be read, and you read it. Then while you are processing the data read, more data comes in. When you return to select() it blocks, waiting for more data. However your peer on the other side of the connection is waiting for a response to the data already sent. Your program ends up blocking forever. You could work around it with timeouts and such, but the whole point is to make non-blocking I/O efficient.
If an fd is at EOF, select() will never block even if called multiple times.
man 2 select seems to answer this question pretty directly:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
So at least according to the manual, it would return immediately if there is any data available.