I'm using epoll_create to wait on a socket.
What is the life-cycle of the returned resource tied to? Is there something like an epoll_destroy, or is it tied to the socket's close or destroy call?
Can I re-use the result of epoll_create if I close my socket and open a new one, or should I just call epoll_create again and forget the previous result?
epoll_create(2) returns a file descriptor, so you just use close(2) on it when done.
The idea of I/O multiplexing, often (loosely) called asynchronous I/O, is to wait for multiple events and handle them one at a time. That means you generally need only one polling file descriptor.
The epoll(7) manual page contains a basic example of the suggested API usage.
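A minimal sketch of that life-cycle (error handling omitted; sock_fd and new_sock_fd are hypothetical sockets created elsewhere):

#include <sys/epoll.h>
#include <unistd.h>

void epoll_lifecycle_sketch(int sock_fd, int new_sock_fd)
{
    int epfd = epoll_create1(0);   /* one epoll instance for the whole program */

    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = sock_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ev);  /* register the socket */

    /* ... epoll_wait() loop ... */

    /* Closing the last fd referring to the socket also removes it
       from every epoll interest list it belongs to. */
    close(sock_fd);

    /* The epoll fd is still valid: register the replacement socket
       instead of creating a new epoll instance. */
    ev.data.fd = new_sock_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, new_sock_fd, &ev);

    close(epfd);  /* this is the "epoll_destroy": just close the fd */
}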
I can imagine a situation where 100 requests come in to a single Node.js server. Each of them requires some DB interaction, which is implemented as natively async code using a task queue or at least a microtask queue (e.g. the DB driver interface is promisified).
How does Node.js return a response once the request handler has stopped being synchronous? What happens to the connection from the API/web client where these 100 requests originated?
This feature is available at the OS level and is called (funnily enough) asynchronous I/O or non-blocking I/O (Windows calls it overlapped I/O).
At the lowest level, in C (or C#/Swift), the operating system provides an API to keep track of requests and responses. There are various APIs available depending on the OS you're on, and Node.js uses libuv to automatically select the best available API at compile time. But for the sake of understanding how an asynchronous API works, let's look at the API that is available on all platforms: the select() system call.
The select() function looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
The fd_set data structure is a set/list of file descriptors that you are interested in watching for I/O activity. And remember, in POSIX sockets are also file descriptors. The way you use this API is as follows:
// Pseudocode:
// Say you just sent a request to a mysql database and also sent a http
// request to google maps. You are waiting for data to come from both.
// Instead of calling `read()` which would block the thread you add
// the sockets to the read set:
add mysql_socket to readfds
add maps_socket to readfds
// Now you have nothing else to do so you are free to wait for network
// I/O. Great, call select:
select(max_fd + 1, &readfds, NULL, NULL, NULL); // nfds is the highest fd + 1, not a count
// Select is a blocking call. Yes, non-blocking I/O involves calling a
// blocking function. Yes it sounds ironic but the main difference is
// that we are not blocking waiting for each individual I/O activity,
// we are waiting for ALL of them
// At some point select returns. This is where we check which request
// matches the response:
check readfds if mysql_socket is set {
then call mysql_handler_callback()
}
check readfds if maps_socket is set {
then call maps_handler_callback()
}
go to beginning of loop
So basically the answer to your question is: we check a data structure to see which socket/file just triggered an I/O event and execute the appropriate code.
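To make that concrete, here is a minimal compilable sketch of the same loop using the real fd_set macros; mysql_fd, maps_fd and the two handler callbacks are hypothetical stand-ins for whatever your program uses:

#include <sys/select.h>

void mysql_handler_callback(void);  /* hypothetical handlers */
void maps_handler_callback(void);

void event_loop(int mysql_fd, int maps_fd)
{
    for (;;) {
        /* select() overwrites the sets, so rebuild them each iteration. */
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(mysql_fd, &readfds);
        FD_SET(maps_fd, &readfds);

        /* nfds is the highest-numbered fd plus one, not a count. */
        int nfds = (mysql_fd > maps_fd ? mysql_fd : maps_fd) + 1;

        /* Blocks until at least one socket becomes readable. */
        if (select(nfds, &readfds, NULL, NULL, NULL) < 0)
            break;  /* real code would check errno (e.g. EINTR) */

        if (FD_ISSET(mysql_fd, &readfds))
            mysql_handler_callback();
        if (FD_ISSET(maps_fd, &readfds))
            maps_handler_callback();
    }
}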
You can no doubt easily spot how to generalize this code pattern: instead of manually setting and checking the file descriptors, you can keep all pending async requests and their callbacks in a list or array and loop through it before and after the select(). This is in fact what Node.js (and JavaScript in general) does. And it is this list of callbacks/file descriptors that is sometimes called the event queue - it is not a queue per se, just a collection of things you are waiting to execute.
The select() function also has a timeout parameter at the end, which can be used to implement setTimeout() and setInterval() and, in browsers, to process GUI events so that we can run code while waiting for I/O. Because remember, select() is blocking: we can only run other code after select() returns. With careful management of timers we can calculate the appropriate value to pass as the timeout to select().
The fd_set data structure is not actually a linked list. In older implementations it is a bitfield. More modern implementations can improve on the bitfield as long as they comply with the API. But this partly explains why there are so many competing async APIs like poll, epoll, kqueue etc.: they were created to overcome the limitations of select. Different APIs keep track of the file descriptors differently - some use linked lists, some hash tables, some cater to scalability (being able to listen to tens of thousands of sockets) and some cater to speed - and most try to do both better than the others. Whatever they use, in the end what stores the request is just a data structure that keeps track of file descriptors.
I have a driver, which handles several TCP connections.
Is there a way to perform something similar to the user-space select()/poll()/epoll() APIs in the kernel, given a list of struct sock's?
Thanks
You may want to write your own custom sk_buff handler, which calls the kernel_select() that tries to lock the semaphore and does a blocking wait when the socket is open.
Not sure if you have already gone through this link Simulate effect of select() and poll() in kernel socket programming
On the kernel side it's easy to avoid the sys_epoll() interface outright. After all, you have direct access to kernel objects; no need to jump through hoops.
Each file object, sockets included, "overrides" a poll method in its file_operations "vtable". You can simply loop around all your sockets, calling ->poll() on each of them and yielding periodically or when there's no data available.
If the sockets are fairly high traffic, you won't need anything more than this.
A note on the API:
The poll() method requires a poll_table argument; however, if you do not intend to wait on it, it can safely be initialized with a NULL queueing callback:
poll_table pt;
init_poll_funcptr(&pt, NULL);
...
// struct socket *sk;
...
unsigned event_mask = sk->ops->poll(sk->file, sk, &pt);
If you do want to wait, just play around with the callback set into poll_table by init_poll_funcptr().
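Putting the pieces together, the polling loop might look roughly like this. This is a sketch against older in-kernel interfaces (the poll signature and return type have changed across kernel versions), and the socks array and handle_readable() helper are hypothetical:

#include <linux/types.h>
#include <linux/net.h>
#include <linux/poll.h>
#include <linux/sched.h>

static void handle_readable(struct socket *sock);  /* hypothetical */

/* socks[] is a hypothetical array of sockets owned by the driver. */
static void poll_my_sockets(struct socket **socks, int n)
{
    poll_table pt;
    int i;

    /* NULL queueing callback: we only query readiness, we never
       sleep on the sockets' wait queues. */
    init_poll_funcptr(&pt, NULL);

    for (;;) {
        bool busy = false;

        for (i = 0; i < n; i++) {
            unsigned int mask = socks[i]->ops->poll(socks[i]->file,
                                                    socks[i], &pt);
            if (mask & (POLLIN | POLLRDNORM)) {
                handle_readable(socks[i]);
                busy = true;
            }
        }

        /* Yield when there was nothing to do, as suggested above. */
        if (!busy)
            schedule_timeout_interruptible(HZ / 100);
    }
}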
If a socket has data to be read and the select() function is called, will select():
return immediately, indicating the socket is ready for reading, or
block until more data is received on the socket?
It can easily be tested, but I assure you select() will never block if there is data already available to read on one of the readfds. If it did block in that case, it wouldn't be very useful for programming with non-blocking I/O. Take the example where you are looping on select(), you see that there is data to be read, and you read it. Then while you are processing the data read, more data comes in. When you return to select() it blocks, waiting for more data. However your peer on the other side of the connection is waiting for a response to the data already sent. Your program ends up blocking forever. You could work around it with timeouts and such, but the whole point is to make non-blocking I/O efficient.
If an fd is at EOF, select() will never block even if called multiple times.
man 2 select seems to answer this question pretty directly:
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
So at least according to the manual, it would return immediately if there is any data available.
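This is easy to verify with a small self-contained test (not from the answer above): write a byte into a pipe first, then select() on its read end with no timeout. If select() ignored already-pending data, this program would hang; instead it returns immediately:

#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>

int main(void)
{
    int fds[2];
    pipe(fds);
    write(fds[1], "x", 1);   /* make the read end readable first */

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fds[0], &readfds);

    /* NULL timeout: select() would block forever if it ignored pending data. */
    int n = select(fds[0] + 1, &readfds, NULL, NULL, NULL);
    printf("select returned %d, fd readable: %d\n",
           n, FD_ISSET(fds[0], &readfds));
    return 0;
}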
My application is going to send a huge amount of data over the network, so I decided (because I'm using Linux) to use epoll and splice. Here's how I see it (pseudocode):
epoll_ctl (file_fd, EPOLL_CTL_ADD); // waiting for EPOLLIN event
while(1)
{
epoll_wait (tmp_structure);
if (tmp_structure->fd == file_descriptor)
{
epoll_ctl (file_fd, EPOLL_CTL_DEL);
epoll_ctl (tcp_socket_fd, EPOLL_CTL_ADD); // wait for EPOLLOUT event
}
if (tmp_structure->fd == tcp_socket_descriptor)
{
splice (file_fd, tcp_socket_fd);
epoll_ctl (tcp_socket_fd, EPOLL_CTL_DEL);
epoll_ctl (file_fd, EPOLL_CTL_ADD); // waiting for EPOLLIN event
}
}
I assume that my application will open up to 2000 TCP sockets. I want to ask you about two things:
There will be quite a lot of epoll_ctl calls; won't it be slow when I have so many sockets?
The file descriptor has to become readable first, and there will be some interval before the socket becomes writable. Can I be sure that at the moment the socket becomes writable the file descriptor is still readable (to avoid a blocking call)?
1st question:
You can use edge-triggered rather than level-triggered polling, so you do not have to delete and re-add the socket each time.
Alternatively, you can use EPOLLONESHOT and rearm the socket with EPOLL_CTL_MOD, which avoids removing it from the set.
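A minimal sketch of that one-shot pattern, assuming epfd and fd come from your existing setup:

#include <stdint.h>
#include <sys/epoll.h>

/* Register once with EPOLLONESHOT: the fd is disarmed after each
   delivered event instead of being removed from the set. */
void arm_once(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLONESHOT };
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* After handling an event, rearm with MOD rather than DEL/ADD. */
void rearm(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLONESHOT };
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}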
File descriptor has to become readable first and there will be some interval before socket will become writable.
What kind of file descriptor? If it is a regular file on a file system, you can't use select/poll or similar tools for this purpose: a file is always reported readable or writable regardless of the state of the disk and cache. If you need truly asynchronous file I/O you may use the aio_* API, but generally you just read from and write to the file and assume it is non-blocking.
If it is a TCP socket then it will be writable most of the time. It is better to use non-blocking calls and add the socket to epoll only when you get EWOULDBLOCK.
Consider using the EPOLLET flag. It is designed for exactly this case. With this flag you can run the event loop properly without deregistering (or modifying the mode of) file descriptors after their first registration in epoll. :) enjoy!
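A rough sketch of the edge-triggered setup, combined with the non-blocking EWOULDBLOCK handling from the previous answer; register_et() and write_some() are hypothetical names, and the fd is assumed to be non-blocking (edge-triggered mode requires it):

#include <errno.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Register both interests once, edge-triggered; no EPOLL_CTL_DEL or
   EPOLL_CTL_MOD needed afterwards. */
void register_et(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT | EPOLLET };
    ev.data.fd = fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Write as much as possible; stop on EWOULDBLOCK and wait for the
   next EPOLLOUT edge. Returns bytes written, or -1 on a real error. */
ssize_t write_some(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EWOULDBLOCK || errno == EAGAIN)
                break;  /* socket buffer full: wait for EPOLLOUT */
            return -1;
        }
        off += (size_t)n;
    }
    return (ssize_t)off;
}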
I'm programming a client/server application in C++ for Linux using epoll and pthreads, but I don't know how to handle connect() calls to attach a new connection to the descriptor list while a loop with epoll_wait() is running (edge-triggered). How can I do it? Could I use a dummy file descriptor to trigger an event and escape the wait, or would a simple call to connect() fire the event?
Sorry for my bad English...
Yes, you can use another file descriptor that's just for waking up your epoll_wait() loop. Use pipe() to create the file descriptor. Add the reading end of the pipe to your epoll list, and write a single byte to the writing end when you want to wake it up. The reading side can just read that byte and discard it.
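A small sketch of that wake-up pipe, assuming epfd is your existing epoll descriptor:

#include <unistd.h>
#include <sys/epoll.h>

static int wakeup_pipe[2];

/* Call once at startup: add the read end to the epoll set. */
void wakeup_init(int epfd)
{
    pipe(wakeup_pipe);
    struct epoll_event ev = { .events = EPOLLIN };
    ev.data.fd = wakeup_pipe[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, wakeup_pipe[0], &ev);
}

/* Call from any thread to make epoll_wait() return. */
void wakeup(void)
{
    write(wakeup_pipe[1], "x", 1);
}

/* In the epoll_wait() loop, when wakeup_pipe[0] fires: drain the
   byte, then pick up whatever work another thread queued. */
void on_wakeup(void)
{
    char c;
    read(wakeup_pipe[0], &c, 1);
}

On modern Linux an eventfd(2) descriptor does the same job with a single fd, but the pipe approach is portable.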