Question about epoll and splice - linux

My application is going to send huge amount of data over network, so I decided (because I'm using Linux) to use epoll and splice. Here's how I see it (pseudocode):
epoll_ctl (file_fd, EPOLL_CTL_ADD); // waiting for EPOLLIN event
while(1)
{
epoll_wait (tmp_structure);
if (tmp_structure->fd == file_descriptor)
{
epoll_ctl (file_fd, EPOLL_CTL_DEL);
epoll_ctl (tcp_socket_fd, EPOLL_CTL_ADD); // wait for EPOLLOUT event
}
if (tmp_structure->fd == tcp_socket_descriptor)
{
splice (file_fd, tcp_socket_fd);
epoll_ctl (tcp_socket_fd, EPOLL_CTL_DEL);
epoll_ctl (file_fd, EPOLL_CTL_ADD); // waiting for EPOLLIN event
}
}
I assume, that my application will open up to 2000 TCP sockets. I want o ask you about two things:
There will be quite a lot of epoll_ctl calls, won't wit be slow when I will have so many sockets?
File descriptor has to become readable first and there will be some interval before socket will become writable. Can I be sure, that at the moment when socket becomes writable file descriptor is still readable (to avoid blocking call)?

1st question
You can use edge triggered rather then even triggered polling thus you do not have to delete socket each time.
You can use EPOLLONESHOT to prevent removing socket
File descriptor has to become readable first and there will be some interval before socket will become writable.
What kind of file descriptor? If this file on file system you can't use select/poll or other tools for this purpose, file will be always readable or writeable regardless the state if disk and cache. If you need to do staff asynchronous you may use aio_* API but generally just read from file write to file and assume it is non-blocking.
If it is TCP socket then it would be writeable most of the time. It is better to use
non-blocking calls and put sockets to epoll when you get EWOULDBLOCK.

Consider using EPOLLET flag. This is definitely for that case. When using this flag you can use event loop in a proper way without deregistering (or modifying mode on) file descriptors since first registration in epoll. :) enjoy!

Related

Group multiple file descriptors to one "virtual" file descriptor for exporting an FD over an API

If a subsystem has event handling capabilities, then it is common in the Unix/Linux world to add an API call to that subsystem to allow for exposing a file descriptor so that said event handling can be integrated into existing mainloops that use something like poll() or select(). For example, in Wayland, there's wl_display_get_fd(). If that FD shows activity, wl_display_read_events() and friends can be called.
This works trivially if that subsystem internally has exactly one FD. But what if there are multiple FDs that need to be watched for events?
I only see two solutions:
Expose all FDs. However, I am not aware of any API that does that.
Expose some sort of "virtual" FD that is in some way coupled to the internal, "real" FDs. Once a real FD receives data and is marked as readable, then so is that virtual FD. Once a real FD can be written to, then the virtual FD is automatically marked as writable etc.
#2 sounds cleaner to me. Is it possible to do that? Or are there better ways to deal with this?
If you're specifically working with Linux, then you can use the epoll mechanism. You first create an epoll instance with
int fd;
fd = epoll_create(1); // The argument is legacy and doesn't matter. It just has to be positive.
After that, you can add selectors that you care about.
if ( epoll_ctl(fd, EPOLL_CTL_ADD, some_file_descriptor, NULL) != 0 ) {
// handle error
}
That last argument can actually contain data that want passed back to you later when one of your file descriptors becomes ready. Check the man page for the specifics.
You can inquire about any ready descriptors using epoll_wait or epoll_pwait.

Using thread to write and select to read

Has any one tried to create a socket in non blocking mode and use a dedicated thread to write to the socket, but use the select system call to identify if data is available to read data.
if the socket is non blocking, the write call will return immediately and the application will not know the status of the write (if it passed or failed).
is there a way of knowing the status of the write call without having to block on it.
Has any one tried to create a socket in non blocking mode and use a dedicated thread to write to the socket, but use the select system call to identify if data is available to read data.
Yes, and it works fine. Sockets are bi-directional. They have separate buffers for reading and writing. It is perfectly acceptable to have one thread writing data to a socket while another thread is reading data from the same socket at the same time. Both threads can use select() at the same time.
if the socket is non blocking, the write call will
return immediately and the application will not
know the status of the write (if it passed or failed).
The same is true for blocking sockets, too. Outbound data is buffered in the kernel and transmitted in the background. The difference between the two types is that if the write buffer is full (such as if the peer is not reading and acking data fast enough), a non-blocking socket will fail to accept more data and report an error code (WSAEWOULDBLOCK on Windows, EAGAIN or EWOULDBLOCK on other platforms), whereas a blocking socket will wait for buffer space to clear up and then write the pending data into the buffer. Same thing with reading. If the inbound kernel buffer is empty, a non-blocking socket will fail with the same error code, whereas a blocking socket will wait for the buffer to receive data.
select() can be used with both blocking and non-blocking sockets. It is just more commonly used with non-blocking sockets than blocking sockets.
is there a way of knowing the status of the write
call without having to block on it.
On non-Windows platforms, about all you can do is use select() or equivalent to detect when the socket can accept new data before writing to it. On Windows, there are ways to receive a notification when a pending read/write operation completes if it does not finish right away.
But either way, outbound data is written into a kernel buffer and not transmitted right away. Writing functions, whether called on blocking or non-blocking sockets, merely report the status of writing data into that buffer, not the status of transmitting the data to the peer. The only way to know the status of the transmission is to have the peer explicitly send back a reply message once it has received the data. Some protocols do that, and others do not.
is there a way of knowing the status of the write call without having
to block on it.
If the result of the write call is -1, then check errno to for EAGAIN or EWOULDBLOCK. If it's one of those errors, then it's benign and you can go back to waiting on a select call. Sample code below.
int result = write(sock, buffer, size);
if ((result == -1) && ((errno == EAGAIN) || (errno==EWOULDBLOCK)) )
{
// write failed because socket isn't ready to handle more data. Try again later (or wait for select)
}
else if (result == -1)
{
// fatal socket error
}
else
{
// result == number of bytes sent.
// TCP - May be less than the number of bytes passed in to write/send call.
// UDP - number of bytes sent (should be the entire thing)
}

Will select() block if called while there is still data to be read?

If a socket has data to be read and the select() function is called, will select():
Return immediately, indicating the socket is ready for reading, or
Block until more data is received on the socket
??
It can easily be tested, but I assure you select() will never block if there is data already available to read on one of the readfds. If it did block in that case, it wouldn't be very useful for programming with non-blocking I/O. Take the example where you are looping on select(), you see that there is data to be read, and you read it. Then while you are processing the data read, more data comes in. When you return to select() it blocks, waiting for more data. However your peer on the other side of the connection is waiting for a response to the data already sent. Your program ends up blocking forever. You could work around it with timeouts and such, but the whole point is to make non-blocking I/O efficient.
If an fd is at EOF, select() will never block even if called multiple times.
man 2 select seems to answer this question pretty directly:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
So at least according to the manual, it would return immediately if there is any data available.

epoll_create cleanup?

I'm using epoll_create to wait on a socket.
What is the life-cycle of the returned resource tied to? Is there something like an epoll_destroy or is it tied to the socket's close or destory call?
Can I re-use the result of epoll_create if close my socket and re-open a new one. Or should I just call epoll_create and forget about the previous result of epoll_create.
epoll_create(2) returns a file descriptor, so you just use close(2) on it when done.
Then, the idea of I/O multiplexing, often called Asynchronous I/O, is to wait for multiple events, and handle them one at a time. That means you generally need only one polling file descriptor.
epoll(7) manual page contains basic example of suggested API usage.

Client connections with epoll

I'm programming an application(client/server) in C++ for linux using epoll y pthreads but I don't know how to handle the connect() calls for attach a new connection in the descriptor list if a loop with epoll_wait() is running(Edge-triggered), How to can I do it?... I could to use a dummy file descriptor to trigger an event and scape of wait?, or a simple call to connect() could fire the event??...
Sorry for my bad english...
Yes, you can use another file descriptor that's just for waking up your epoll_wait() loop. Use pipe() to create the file descriptor. Add the reading end of the pipe to your epoll list, and write a single byte to the writing end when you want to wake it up. The reading side can just read that byte and discard it.

Resources