Can dup2 really return EINTR? - linux

In the spec and two implementations:
According to POSIX, dup2() may return EINTR.
The Linux man pages list it as permitted.
The FreeBSD man pages indicate it is never returned. Is this a bug, given that FreeBSD's close() implementation can return EINTR (at least for TCP linger, if nothing else)?
In reality, can Linux return EINTR for dup2()? Presumably if so, it would be because close() decided to wait and a signal arrived (TCP linger or dodgy file system drivers that try to sync when closing).
In reality, does FreeBSD guarantee not to return EINTR for dup2()? In that case, it must be that it doesn't bother waiting for any outstanding operations on the old fd and simply unlinks the fd.
What does POSIX dup2() mean when it refers to "closing" (not in italics) rather than referencing the actual close() function? Are we to understand that it is just talking about "closing" in an informal way (unlinking the file descriptor), or is it attempting to say that the effect should be as if close() were called first and then the duplication were performed, atomically?
If fildes2 is already a valid open file descriptor, it shall be closed first, unless fildes is equal to fildes2 in which case dup2() shall return fildes2 without closing it.
If dup2() does have to close, wait, then atomically dup, it's going to be a nightmare for implementors! It's much worse than the EINTR with close() fiasco. Cowardly POSIX doesn't even say if the dup took place in the case of EINTR...

Here's the relevant information from the C/POSIX library documentation with respect to the standard Linux implementation:
If OLD and NEW are different numbers, and OLD is a valid
descriptor number, then `dup2' is equivalent to:
close (NEW);
fcntl (OLD, F_DUPFD, NEW)
However, `dup2' does this atomically; there is no instant in the
middle of calling `dup2' at which NEW is closed and not yet a
duplicate of OLD.
It lists the possible error values returned by dup and dup2 as EBADF, EINVAL, and EMFILE, and no others. The documentation states that all functions that can return EINTR are listed as such, which indicates that these don't. Note that these are implemented via fcntl, not a call to close.

8 years later this still seems to be undocumented.
I looked at the Linux sources, and my conclusion is that dup2 can't return EINTR in a current version of Linux.
In particular, the function do_dup2 in fs/file.c ignores the return value of filp_close, which is what can cause close to return EINTR in some cases (see fs/open.c and fs/file.c).
The way dup2 works is it first makes the atomic file descriptor update, and then waits for any flushing that needs to happen on close. Any errors happening on flush are simply ignored.
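Since POSIX still permits EINTR even if the Linux code path described above never produces it, portable code can retry defensively. The following is only a minimal sketch; dup2_retry is a hypothetical helper name, not part of any library. Retrying looks safe here because it does not matter whether the interrupted attempt already performed the duplication: a second dup2(oldfd, newfd) ends in the same state.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: call dup2() again if it reports EINTR.
 * Whether or not the interrupted attempt already made newfd a
 * duplicate of oldfd, repeating the call leaves the same result. */
int dup2_retry(int oldfd, int newfd)
{
    int rc;

    assert(oldfd >= 0 && newfd >= 0);
    do {
        rc = dup2(oldfd, newfd);
    } while (rc == -1 && errno == EINTR);
    return rc;   /* newfd on success, -1 with errno set otherwise */
}
```

On Linux the loop body would be expected to run exactly once; the retry only matters on systems that actually return EINTR.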

Related

Why linux omits driver close function

I have my module - character driver for several DMA channels. It has open/close/ioctl functions for a user for each DMA.
Everything worked fine when used from one application thread - 3 DMAs worked as required.
When I added another thread to utilize one of the DMAs separately, I ran into a very strange case: the application's call to close() returns zero WITHOUT(!) entering my driver's close function (the printk text is absent). Just to test, I called close() again immediately, and it returned with errno=9 (bad file descriptor). Closing the next channel in the next line of code works fine.
I inserted all possible semaphores to protect the code both in the application and in the driver - no success.
I clearly understand that the issue is some race condition: delaying the second thread a bit solves the problem. But I can't catch it.
What drives me crazy is this: how can a call to close() not reach my driver? I traced the call down to the assembler "svc 0"; the file handle is correct and the return code is 0.
So, under which conditions may Linux skip calling the driver's close function?
UPDATE: I inserted a call to the IOCTL function immediately after "not working" close() - the return code was -9 - bad file descriptor. This means that Linux really closed this handler without entering my "close" function!
UPDATE2: I added a new IOCTL to the driver, which only calls the close() driver function directly. In the application: I called this new IOCTL just one line before calling close(). Everything now works fine!
More than that - I removed all semaphores for testing (I wrote everything reentrant, but added semaphores supposing an error)! Still everything is fine!
Thanks to Ian Abbot the problem is found.
My driver needs one large memory region for control and one huge region for each channel buffer. For this, after the first open() call I called mmap() for the control region. All subsequent calls to open() mmapped only their corresponding buffer.
This effectively led to the reference counter of the first file handle being set to 2. When I tried to close it "normally", my driver's release function was not called! And because it wasn't called, it did not clean up the driver state, which blocked further reopening.
SUMMARY OF THE THINGS I MISSED/didn't know:
mmap system call increases the reference count of the file.
mmap system call is per file and not per driver.
Again, thousands of thanks to Ian Abbot!
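The same rule can be observed from user space without a driver. This is a sketch under stated assumptions: an ordinary temporary file stands in for the device, and the function name and temp-file template are made up for illustration. The mapping keeps its own reference to the open file, so close() succeeds while the file stays alive until munmap(); for a character device, that last drop is the point at which release would run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a shared mapping is still readable after its file
 * descriptor has been closed, 0 if not, -1 if the setup failed. */
int mapping_survives_close(void)
{
    char path[] = "/tmp/mmap_ref_XXXXXX";   /* hypothetical temp file */
    const char msg[] = "hello";
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);                            /* name gone, file still open */

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, sizeof msg, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The descriptor goes away, but the mapping still references the file. */
    if (close(fd) != 0) {
        munmap(map, sizeof msg);
        return -1;
    }

    int ok = (memcmp(map, msg, sizeof msg) == 0);
    munmap(map, sizeof msg);                 /* last reference dropped here */
    return ok;
}
```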

what will happen if we close a closed socket

I wonder what will happen if we close a closed socket or a non-existing socket?
Will the exception affect the other sockets which are sending/receiving packets?
Edit:
Sorry, I didn't say it clearly. I know what the close or shutdown function will return and what the return value means, but I don't know how it affects the existing sockets.
Potentially, yes. If you call close on a random integer which used to be an fd, you might break some other part of your code that's just opened another connection that got given the same fd number. Therefore, you should never double-close an fd: although it's perfectly safe from the kernel's point of view (you harmlessly get EBADF), it can seriously mess up your application.
As for close(): per http://pubs.opengroup.org/onlinepubs/000095399/functions/close.html, it will return -1 and set errno to EBADF ("The fildes argument is not a valid file descriptor").
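Both points can be checked with a small single-threaded sketch. The function names are hypothetical, and a pipe is used instead of a socket only to keep the example short. The first function shows the harmless EBADF; the second shows why a stale close is dangerous: the kernel hands out the lowest free number, so the closed number is immediately reused.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Close fd twice. Returns the errno set by the second close(),
 * 0 if the second close() unexpectedly succeeded, -1 if the first failed. */
int errno_of_second_close(int fd)
{
    assert(fd >= 0);
    if (close(fd) != 0)
        return -1;
    return close(fd) == -1 ? errno : 0;
}

/* Returns 1 if a newly created descriptor gets the number that was
 * just closed, 0 if it gets a different one, -1 if pipe() failed. */
int closed_number_is_reused(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;

    int old = p[0];
    close(p[0]);

    int fresh = dup(p[1]);        /* lowest free number is handed out */
    int reused = (fresh == old);

    if (fresh != -1)
        close(fresh);
    close(p[1]);
    return reused;
}
```

In a multi-threaded program the reuse can happen between the first and second close(), in which case the second call closes another thread's descriptor instead of failing with EBADF.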

Will select() block if called while there is still data to be read?

If a socket has data to be read and the select() function is called, will select():
Return immediately, indicating the socket is ready for reading, or
Block until more data is received on the socket?
It can easily be tested, but I assure you select() will never block if there is data already available to read on one of the readfds. If it did block in that case, it wouldn't be very useful for programming with non-blocking I/O. Take the example where you are looping on select(), you see that there is data to be read, and you read it. Then while you are processing the data read, more data comes in. When you return to select() it blocks, waiting for more data. However your peer on the other side of the connection is waiting for a response to the data already sent. Your program ends up blocking forever. You could work around it with timeouts and such, but the whole point is to make non-blocking I/O efficient.
If an fd is at EOF, select() will never block even if called multiple times.
man 2 select seems to answer this question pretty directly:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
So at least according to the manual, it would return immediately if there is any data available.
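A quick way to test it is with a pipe, as in the sketch below. The helper name readable_now is an assumption made for this example, and it uses a zero timeout so that the test itself can never hang; the readiness answer is the same one a blocking select() would act on.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Ask select() about fd with a zero timeout.
 * Returns 1 if fd is readable right now, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};   /* poll: do not wait at all */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

With unread data in the pipe, repeated calls keep reporting the descriptor as ready until the data is consumed, and a descriptor at EOF is reported ready as well.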

epoll_create cleanup?

I'm using epoll_create to wait on a socket.
What is the life-cycle of the returned resource tied to? Is there something like an epoll_destroy, or is it tied to the socket's close or destroy call?
Can I re-use the result of epoll_create if I close my socket and open a new one? Or should I just call epoll_create again and forget about the previous result?
epoll_create(2) returns a file descriptor, so you just use close(2) on it when done.
Then, the idea of I/O multiplexing (sometimes loosely called asynchronous I/O) is to wait for multiple events and handle them one at a time. That means you generally need only one polling file descriptor.
The epoll(7) manual page contains a basic example of the suggested API usage.
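To make the life-cycle concrete, here is a minimal Linux-only sketch. The helper names watch_for_read and wait_one are invented for this example; only the epoll_* calls and close() are real API. One epoll descriptor is created once, reused for whatever descriptors come and go, and finally released with close().

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for read readiness on an existing epoll instance.
 * Returns 0 on success, -1 on error (as epoll_ctl does). */
int watch_for_read(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

    assert(epfd >= 0 && fd >= 0);
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for one event.
 * Returns the ready descriptor, or -1 on timeout or error. */
int wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    return n == 1 ? ev.data.fd : -1;
}
```

A typical sequence would be: epoll_create1(0) once, watch_for_read() for the first socket, close that socket when done (closing the last descriptor for it also removes it from the epoll set), watch_for_read() for the replacement socket on the same epoll descriptor, and close() the epoll descriptor at the very end.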

When does the write() system call write all of the requested buffer versus just doing a partial write?

If I am counting on my write() system call to write say e.g., 100 bytes, I always put that write() call in a loop that checks to see if the length that gets returned is what I expected to send and, if not, it bumps the buffer pointer and decreases the length by the amount that was written.
So once again I just did this, but now that there's StackOverflow, I can ask you all if people know when my writes will write ALL that I ask for versus give me back a partial write?
Additional comments: X-Istence's reply reminded me that I should have noted that the file descriptor was blocking (i.e., not non-blocking). I think he is suggesting that the only way a write() on a blocking file descriptor will not write all the specified data is when the write() is interrupted by a signal. This seems to make at least intuitive sense to me...
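write may return a partial write, especially on sockets or when internal buffers are full. So a good approach is the following:
/* buf: const char * into the data, size: size_t bytes left, res: ssize_t */
while (size > 0) {
    res = write(fd, buf, size);
    if (res < 0 && errno == EINTR)
        continue;           /* interrupted before anything was written: retry */
    if (res < 0) {
        /* real error processing */
        break;
    }
    size -= res;            /* partial write: skip what was already written */
    buf += res;
}
Never rely on what usually happens...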
Note: in case of full disk you would get ENOSPC not partial write.
You need to check errno to see if your call got interrupted, or why write() returned early, and why it only wrote a certain number of bytes.
From man 2 write
When using non-blocking I/O on objects such as sockets that are subject to flow control, write() and writev() may write fewer bytes than requested; the return value must be noted, and the remainder of the operation should be retried when possible.
Basically, unless you are writing to a non-blocking socket, the only other time this will happen is if you get interrupted by a signal.
[EINTR] A signal interrupted the write before it could be completed.
See the Errors section in the man page for more information on what can be returned, and when it will be returned. From there you need to figure out if the error is severe enough to log an error and quit, or if you can continue the operation at hand!
This is all discussed in the book: Advanced Unix Programming by Marc J. Rochkind, I have written countless programs with the help of this book, and would suggest it while programming for a UNIX like OS.
Writes shouldn't have any reason to ever write a partial buffer afaik. Possible reasons I could think of for a partial write are running out of disk space, writing past the end of a block device, or writing to a char device / some other sort of device.
However, the plan to retry writes blindly is probably not such a good one - check errno to see whether you should be retrying first.
