Is it possible to use epoll in one-shot level-triggered mode?
I couldn't find any information on it when I searched; it seems everyone uses edge-triggered mode.
When the EPOLLONESHOT flag is set and you have pulled an event for a socket, the socket is not removed from the epoll set, as many think; instead, its events are disabled. You can enable them again using epoll_ctl / EPOLL_CTL_MOD.
An example of when the EPOLLONESHOT behavior comes in handy is when you've read the available data from a socket into a buffer. That buffer is emptied independently, and until it is empty you have to keep the socket's events disabled, even if the socket has additional data. Once the buffer has been used and emptied, you can re-enable the socket.
The difference between the edge- and level-triggered "one shot" behaviors shows up only when you re-enable the socket. An example:
The socket receives 7K of data (for now it is stored in a kernel buffer).
You wait for an input event; the socket's events then get disabled due to EPOLLONESHOT.
You read 4K into an application-level buffer.
Later the application buffer gets used and emptied. You re-enable the socket with epoll_ctl / EPOLL_CTL_MOD.
Level-triggered EPOLLONESHOT:
Since 3K of data is still present in the kernel buffer, the event is triggered again.
Edge-triggered EPOLLONESHOT:
It won't trigger an event again for the data that is already available. You have to test for it by reading until you get EAGAIN / EWOULDBLOCK.
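A minimal sketch of the re-arm step described above (the epoll fd `epfd` and the socket `sock` are placeholder names). With a level-triggered one-shot registration, re-arming immediately reports the 3K still sitting in the kernel buffer; adding EPOLLET to the mask would not:

```c
#include <sys/epoll.h>

/* Re-enable a socket whose events were disabled by EPOLLONESHOT.
 * 'epfd' and 'sock' are assumed to exist already. */
static int rearm_oneshot(int epfd, int sock)
{
    struct epoll_event ev;
    ev.events  = EPOLLIN | EPOLLONESHOT;             /* level-triggered one-shot */
    /* ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET;    edge-triggered one-shot  */
    ev.data.fd = sock;

    /* EPOLL_CTL_MOD, not EPOLL_CTL_ADD: the fd is still in the epoll set,
     * only its events were disabled after the last wakeup. */
    return epoll_ctl(epfd, EPOLL_CTL_MOD, sock, &ev);
}
```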
If you want epoll to stop listening on a socket, then you should use EPOLLONESHOT. If you do use EPOLLONESHOT, you will have to re-arm the socket (epoll_ctl / EPOLL_CTL_MOD) after epoll signals on it. Thus EPOLLONESHOT is not EPOLLET. You can use EPOLLONESHOT without EPOLLET, but it might not be as efficient. If you use both flags, then you will have to use non-blocking sockets and re-arm a socket only once recv and send return with an EAGAIN error. Please refer to the man page for details.
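A rough sketch of the EPOLLONESHOT + EPOLLET pattern this answer describes, assuming a non-blocking socket that was registered with EPOLLIN | EPOLLET | EPOLLONESHOT: drain the socket until recv() reports EAGAIN / EWOULDBLOCK, then re-arm it (error handling is trimmed):

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Drain a non-blocking socket after an EPOLLIN wakeup, then re-arm it. */
static void handle_readable(int epfd, int sock)
{
    char buf[4096];
    ssize_t n;

    for (;;) {
        n = recv(sock, buf, sizeof(buf), 0);
        if (n > 0) {
            /* process buf[0..n) here */
            continue;
        }
        if (n == 0)                            /* peer closed the connection */
            break;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                             /* kernel buffer is drained   */
        return;                                /* real error: drop the socket */
    }

    struct epoll_event ev = { .events  = EPOLLIN | EPOLLET | EPOLLONESHOT,
                              .data.fd = sock };
    epoll_ctl(epfd, EPOLL_CTL_MOD, sock, &ev);
}
```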
How do I trigger an EPOLLRDHUP event on my TCP socket from another thread, programmatically?
I have added the FD to the epoll instance with the EPOLLRDHUP event and tried to generate the event, but epoll_ctl only modifies the events registered for that FD; it does not trigger them.
I want my first thread, which is continuously waiting for events with epoll_wait(), to receive the EPOLLRDHUP event as soon as the other thread triggers it. I cannot figure out how to trigger that event; I tried using the write system call in another thread, but that does not trigger the event on the socket FD either. My requirement is that the poll should come out of its blocking loop. Please help, thanks.
You can't generate epoll events on the same file descriptor from another thread; EPOLLRDHUP is generated based on something happening at the other end of the TCP connection.
If you have one thread waiting in epoll_wait() and you want to wake that thread up from another thread, you should create a pipe() and have your epoll_wait() wait for read events on the reading side of the pipe, in addition to any TCP sockets. When you want to wake up your thread, write a byte to the writing side of the pipe.
(An eventfd could be used instead of the pipe to achieve the same thing.)
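A minimal sketch of that wake-up mechanism using an eventfd (a pipe works the same way, with the read end in the epoll set and the write end used by the other thread); `epfd` is assumed to be your existing epoll instance:

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Called once at setup: create the eventfd and register it alongside
 * the TCP sockets in the existing epoll instance 'epfd'. */
static int add_wakeup_fd(int epfd)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
    return efd;
}

/* Called from the other thread to wake the epoll_wait() thread. */
static void wake(int efd)
{
    uint64_t one = 1;
    write(efd, &one, sizeof(one));
}

/* In the epoll_wait() thread, after an event on 'efd':
 * drain the counter so the fd stops reporting readable. */
static void drain(int efd)
{
    uint64_t val;
    read(efd, &val, sizeof(val));
}
```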
I'm doing select() on a blocking socket with no timeout: select(sock+1, &rfd, NULL, NULL, NULL).
This happens in a thread whose objective is to dispatch incoming data. A separate surveillance thread manages a keep-alive with the peer, and when it detects a dead connection it closes the socket.
I was expecting select() to return -1 in that case. It does on Windows, but never on Linux, so the dispatch thread is locked forever when the peer disappears non-gracefully. For completeness: there is pending data to be transmitted on that socket; I've tried to play with SO_LINGER, but that does not change anything.
The problem can be solved by setting a timeout in select(); in that case, after the close and the timeout, select() eventually exits with -1. But from reading the docs I thought select() with no timeout would still exit on a close, even when the peer is not responding.
Do I misuse select(), or is there a better way to handle half-open sockets?
Yes, you are misusing select(). The select(2) man page states:
If a file descriptor being monitored by select() is closed in another thread, the result is unspecified. On some UNIX systems, select() unblocks and returns, with an indication that the file descriptor is ready (a subsequent I/O operation will likely fail with an error, unless the file descriptor was reopened between the time select() returned and the I/O operation was performed). On Linux (and some other systems), closing the file descriptor in another thread has no effect on select(). In summary, any application that relies on a particular behavior in this scenario must be considered buggy.
So you cannot close the connection from another thread. Unfortunately, poll has the same issue.
EDIT
There are several possible solutions, but I don't have sufficient information about your application. The following changes can be considered:
Use epoll instead of select if you are on Linux, or another modern polling mechanism if you are on another OS. select is quite an old function; it was designed at a time when threading was not taken seriously.
Establish a communication channel between the select thread and the keep-alive thread. When the keep-alive thread detects a dead peer, it does not close the socket itself but instructs the select thread to do so. Typically this is done through a local socket: the local socket is added to the select descriptor set, and when the keep-alive thread writes something to it, the select thread wakes up and can take action.
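A rough sketch of the second suggestion, assuming a pipe shared between the two threads: the read end sits in the select set, and the keep-alive thread writes to the other end instead of closing the socket itself:

```c
#include <sys/select.h>
#include <unistd.h>

int wake_pipe[2];            /* created once with pipe(wake_pipe) */

/* Keep-alive thread: don't close the socket here, just signal. */
static void request_close(void)
{
    char c = 'x';
    write(wake_pipe[1], &c, 1);
}

/* Select thread: wait on both the TCP socket and the pipe's read end. */
static void dispatch_loop(int sock)
{
    for (;;) {
        fd_set rfd;
        FD_ZERO(&rfd);
        FD_SET(sock, &rfd);
        FD_SET(wake_pipe[0], &rfd);
        int maxfd = sock > wake_pipe[0] ? sock : wake_pipe[0];

        if (select(maxfd + 1, &rfd, NULL, NULL, NULL) < 0)
            break;

        if (FD_ISSET(wake_pipe[0], &rfd)) {
            char c;
            read(wake_pipe[0], &c, 1);
            close(sock);             /* the select thread owns the close */
            break;
        }
        if (FD_ISSET(sock, &rfd)) {
            /* read and dispatch incoming data here */
        }
    }
}
```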
I'm working on an IMAP server, and one of the operations is to upgrade the connection to use TLS (via the STARTTLS command). Our current architecture has one goroutine reading data from the socket, parsing the commands, and then sending logical commands over a channel. Another goroutine reads from that channel and executes the commands. This works great in general.
When executing STARTTLS, though, we need to stop the current in-progress Read() call, otherwise that Read() will consume bytes from the TLS handshake. We can insert another class in between, but then that class will be blocked on the Read() call and we have the same problem. If the network connection were a channel, we could add another signal channel and use a select{} block to stop reading, but network connections aren't channels (and simply wrapping it in a goroutine and channel just moves the problem to that goroutine).
Is there any way to stop a Read() call once it's begun, without waiting for a timeout to expire or something similar?
The Read() call relies on your operating system's behaviour under the hood, and that behaviour depends on the socket's behaviour.
If you're familiar with the socket interface (which is almost a standard across operating systems, with some small differences), you'll see that when a socket is in synchronous (blocking) mode, the read system call blocks the thread's execution until data arrives or the timeout value expires, and you can't change this behaviour.
Go exposes synchronous I/O for all its needs because goroutines make explicit asynchronous communication unnecessary by design.
There's also a way to break the read: shutting down the socket manually, which is not the best design decision for one's code, nor in your specific case. So you would be better off playing with smaller timeouts, I think, or redesigning your code to work some other way.
There's really not much you can do to stop a Read call, unless you SetReadDeadline, or Close the connection.
One thing you could do is buffer it with the bufio package. This will allow you to Peek without actually reading something off the buffer. Peek will block just like Read does, but will allow you to decide what to do when something is available to read.
I have a memory that when we want to use select() on a socket descriptor, the socket should be set to NON-BLOCKING in advance.
But today I read a source file where there seem to be no lines setting the socket to NON-BLOCKING.
Is my memory correct or not?
thanks!
duskwuff has the right idea when he says
In general, you do not need to set a socket as non-blocking to use it
in select().
This is true if your kernel is POSIX compliant with regard to select(). Unfortunately, some people use Linux, which is not, as the Linux select() man page says:
Under Linux, select() may report a socket file descriptor as "ready for
reading", while nevertheless a subsequent read blocks. This could for
example happen when data has arrived but upon examination has wrong
checksum and is discarded. There may be other circumstances in which a
file descriptor is spuriously reported as ready. Thus it may be safer
to use O_NONBLOCK on sockets that should not block.
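Following the man page's advice is a one-liner with fcntl(); a read that then hits one of those spurious-readiness cases returns -1 with EAGAIN instead of blocking (a generic sketch, not tied to any particular application):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a socket into non-blocking mode. */
static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* After select() reports fd readable: */
static ssize_t safe_read(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;    /* spurious readiness: nothing to read after all */
    return n;
}
```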
There was a discussion of this on lkml on or about Sat, 18 Jun 2011. One kernel hacker tried to justify the non-POSIX compliance. They honor POSIX when it's convenient and desecrate it when it's not.
He argued "there may be two readers and the second will block." But such an application flaw is a non sequitur. The kernel is not expected to prevent application flaws. The kernel has a clear duty: in all cases of the first read() after select(), the kernel must return at least 1 byte, EOF, or an error; but NEVER block. As for write(), you should always test whether the socket is reported writable by select() before writing. This guarantees you can write at least one byte, or get an error; but NEVER block. Let select() help you; don't write blindly, hoping you won't block. The Linux hackers' grumblings about corner cases, etc., are euphemisms for "we're too lazy to work on hard problems."
Suppose you read a serial port set for:
min N; with -icanon, set N characters minimum for a completed read
time N; with -icanon, set read timeout of N tenths of a second
min 250 time 1
Here you want blocks of 250 characters, or a one-tenth-second timeout. When I tried this on Linux in non-blocking mode, the read returned for every single character, hammering the CPU. It was NECESSARY to leave it in blocking mode to get the documented behavior.
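For reference, here is the same `min 250 time 1` configuration expressed through termios (a sketch assuming the serial port is already open as `fd`); in blocking mode this gives the documented behaviour:

```c
#include <termios.h>

/* Configure a serial fd for "min 250 time 1":
 * read() returns after 250 bytes, or 0.1 s after the first byte. */
static int set_min_time(int fd)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) < 0)
        return -1;
    tio.c_lflag &= ~(ICANON | ECHO);   /* -icanon: non-canonical mode */
    tio.c_cc[VMIN]  = 250;
    tio.c_cc[VTIME] = 1;               /* tenths of a second          */
    return tcsetattr(fd, TCSANOW, &tio);
}
```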
So there are good reasons to use blocking mode with select() and expect your kernel to be POSIX compliant.
But if you must use Linux, Jeremy's advice may help you cope with some of its kernel flaws.
It depends. Setting a socket as non-blocking does several things:
Makes read() / recv() return immediately with no data, instead of blocking, if there is nothing available to read on the socket.
If you are using select(), this is probably a non-issue. So long as you only read from a socket when select() tells you it is readable, you're fine.
Makes write() / send() return partial (or zero) writes, instead of blocking, if not enough space is available in kernel buffers.
This one is tricky. If your application is written to handle this situation, it's great, because it means your application will not block when a client is reading slowly. However, it means that your application will need to temporarily store writable data in its own application-level buffers, rather than writing directly to sockets, and selectively place sockets with pending writes in the writefds set. Depending on what your application is, this may either be a lifesaver or a huge added complication. Choose carefully.
If set before the socket is connected, makes connect() return immediately, before a connection is actually made.
Similarly, this is sometimes useful if your application needs to make connections to hosts that may respond slowly while continuing to respond on other sockets, but can cause issues if you aren't careful about how you handle these half-connected sockets. It's usually best avoided (by only setting sockets as non-blocking after they are connected, if at all).
In general, you do not need to set a socket as non-blocking to use it in select(). The system call already lets you handle sockets in a basic non-blocking fashion. Some applications will need non-blocking writes, though, and that's what the flag is still needed for.
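As a rough illustration of what handling those non-blocking writes looks like (the `struct conn` with an application-level output buffer is hypothetical; only the flush step is shown):

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-connection state with an application-level out buffer. */
struct conn {
    int    fd;
    char   outbuf[8192];
    size_t outlen;           /* bytes still waiting to be written */
};

/* Called when select() reports c->fd writable. */
static void flush_pending(struct conn *c)
{
    while (c->outlen > 0) {
        ssize_t n = send(c->fd, c->outbuf, c->outlen, 0);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return;      /* kernel buffer full again: keep fd in writefds  */
            return;          /* real error: caller should drop the connection  */
        }
        memmove(c->outbuf, c->outbuf + n, c->outlen - (size_t)n);
        c->outlen -= (size_t)n;
    }
    /* outlen == 0: remove c->fd from writefds until more data is queued */
}
```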
send() and write() block if you provide more data than can fit into the socket send buffer. Normally in select() programming you don't want to block anywhere except in select(), so you use non-blocking mode.
With certain Windows APIs it is indeed essential to use non-blocking mode.
Usually when you are using select(), you are using it as the basis of an event loop; and when using an event loop you want the event loop to block only inside select() and never anywhere else. (The reason for that is so that your program will always wake up whenever there is something to do on any of the sockets it is handling -- if, for example, your program were blocked inside recv() for socket A, it would be unable to handle any data coming in on socket B until it got some data from socket A first to wake it up, and vice versa.)
Therefore it is best to set all sockets non-blocking when using select(). That way there is no chance of your program getting blocked on a single socket and ignoring the other ones for an extended period of time.
I believe that if we call the close system call on a non-blocking socket, it returns immediately. In that case, how do we handle the result, i.e. how do we know whether the socket was actually closed or not?
In other words, what is the behavior of the close system call on a non-blocking socket?
It's not the blocking state of the socket, it's the SO_LINGER option that matters. From getsockopt(2):
SO_LINGER controls the action taken when unsent messages are queued on
socket and a close(2) is performed. If the socket promises reliable
delivery of data and SO_LINGER is set, the system will block the process
on the close(2) attempt until it is able to transmit the data or until it
decides it is unable to deliver the information (a timeout period, termed
the linger interval, is specified in seconds in the setsockopt() system
call when SO_LINGER is requested). If SO_LINGER is disabled and a
close(2) is issued, the system will process the close in a manner that
allows the process to continue as quickly as possible.
That is, with SO_LINGER enabled, an error from close(2) on a TCP socket would mean that the kernel was not able to deliver the data within the linger interval (not counting other errors like an invalid file descriptor, etc.). With lingering disabled, you never know. Also see The ultimate SO_LINGER page, or why is my tcp not reliable.
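A short sketch of enabling lingering before close, along the lines of the quoted text (the 5-second interval is just an illustrative value):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_LINGER with a 5-second linger interval, then close.
 * With lingering enabled, an error from close() means the kernel could
 * not deliver the remaining data within the interval. */
static int linger_close(int sock)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 5 };
    if (setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;
    return close(sock);
}
```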
if we call the close system call on a non-blocking socket, it returns immediately
The socket is always closed immediately; the connection may still be flushing remaining data to the peer in the background. But your question embodies a fallacy: if you call close() on any socket, blocking or not, it returns immediately. The close and the remaining writes to the peer happen asynchronously. You can control that with SO_LINGER, as per the other answer, although I suspect that only applies to blocking mode. You should probably put the socket back into blocking mode before closing with a positive SO_LINGER, if that's what you need to do.