Asynchronous Sockets in Linux -- Polling vs. Callback via SIGIO

In deciding to implement asynchronous sockets in my simple Linux server, I have run into a problem. I was going to call poll() continually and do some cleanup and caching between calls. Now this seems wasteful, so I did more digging and found a way to possibly implement callbacks on I/O.
Would I incur a performance penalty, and more importantly would it work, if I created a socket with O_NONBLOCK, used the SIOCSPGRP ioctl() to have SIGIO sent on I/O, and used sigaction() to install a callback function that runs on I/O?
In addition, can I define different functions for different sockets?
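For reference, a minimal sketch of the setup being asked about (the function name is illustrative and error handling is minimal; note that on Linux, SIGIO delivery also requires O_ASYNC or the FIOASYNC ioctl in addition to SIOCSPGRP):
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

static void on_io(int signo)
{
    (void)signo;
    // a socket is ready; note that plain SIGIO does not tell you which one
}

int setup_sigio(int fd)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_io;
    if (sigaction(SIGIO, &sa, NULL) < 0)
        return -1;

    int pid = getpid();
    if (ioctl(fd, SIOCSPGRP, &pid) < 0)   // deliver SIGIO for this fd to this process
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    // O_ASYNC (or the FIOASYNC ioctl) is what actually enables SIGIO delivery
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK | O_ASYNC);
}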

"I was going to continually poll(), and do some cleanup and caching between calls. Now this seems wasteful"
Wasteful how? Did you actually try and implement this?
You have your fd list. You call poll() or (better) epoll() with the list. When it triggers, you walk the fd list and deal with each one appropriately.
You need to cache incoming and outgoing data, so each fd needs some kind of struct. When I've done this, I've used a hash table for the fd structs (generating a key from the fd), but you are probably fine, at least initially, just using a fixed-length array and checking in case the OS issues you a weirdly high fd (nb, I have never seen that happen and I've squinted through more logs than I can count). The structs hold pointers to incoming and outgoing buffers, and perhaps a state variable, e.g.:
struct connection {
    int fd;                  // mandatory for the hash table version
    unsigned char *dataOut;
    unsigned char *dataIn;
    int state;               // probably from an enum
};

struct connection connected[1000]; // your array, or...
...probably a linked list is actually best for the fds; I had an unrelated requirement for the hash table.
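For illustration, a rough epoll() version of that loop (error handling trimmed; connected[] is the array above, and the buffer/state handling is only hinted at in the comments):
#include <sys/epoll.h>
#include <sys/socket.h>

void event_loop(int listen_fd)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event events[64];
        int n = epoll_wait(ep, events, 64, 1000);     // 1s timeout for housekeeping
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {                    // new connection
                int client = accept(listen_fd, NULL, NULL);
                ev.events = EPOLLIN;
                ev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &ev);
                // initialise connected[client] here
            } else {
                // read into connected[fd].dataIn, advance connected[fd].state,
                // write out connected[fd].dataOut when ready
            }
        }
        // n == 0: timeout expired; do the cleanup and caching work here
    }
}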
Start there and refine stepwise. I think you are just trying to find an easy way out -- that you may pay for later by making other things harder ;) $0.02.

Related

How are multiple simultaneous requests handled in Node.js when the response is async?

I can imagine a situation where 100 requests come to a single Node.js server. Each of them requires some DB interaction, which is implemented with some natively async code (using a task queue, or at least the microtask queue; e.g. the DB driver interface is promisified).
How does Node.js return a response once the request handler has stopped being synchronous? What happens to the connection from the API/web client where these 100 requests originated?
This feature is available at the OS level and is called (funnily enough) asynchronous I/O or non-blocking I/O (Windows also calls/called it overlapped I/O).
At the lowest level, in C (C#/Swift), the operating system provides an API to keep track of requests and responses. There are various APIs available depending on the OS you're on, and Node.js uses libuv to automatically select the best available API at compile time. But for the sake of understanding how an asynchronous API works, let's look at the API that is available on all platforms: the select() system call.
The select() function looks something like this:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
The fd_set data structure is a set/list of file descriptors that you are interested in watching for I/O activity. And remember, in POSIX sockets are also file descriptors. The way you use this API is as follows:
// A sketch in C (mysql_socket and maps_socket are sockets you already have):
// Say you just sent a request to a MySQL database and also sent an HTTP
// request to Google Maps. You are waiting for data to come from both.
// Instead of calling read(), which would block the thread, you add
// the sockets to the read set:
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(mysql_socket, &readfds);
FD_SET(maps_socket, &readfds);

// Now you have nothing else to do, so you are free to wait for network
// I/O. Great, call select. Note that nfds is the highest-numbered fd
// plus one, not the number of fds being watched:
int maxfd = (mysql_socket > maps_socket) ? mysql_socket : maps_socket;
select(maxfd + 1, &readfds, NULL, NULL, NULL);

// select is a blocking call. Yes, non-blocking I/O involves calling a
// blocking function. It sounds ironic, but the main difference is that
// we are not blocking waiting for each individual I/O activity,
// we are waiting for ALL of them at once.

// At some point select returns. This is where we check which request
// matches the response:
if (FD_ISSET(mysql_socket, &readfds))
    mysql_handler_callback();
if (FD_ISSET(maps_socket, &readfds))
    maps_handler_callback();

// ...then go back to the beginning of the loop and rebuild the fd sets.
So basically the answer to your question is: we check a data structure to see which socket/file just triggered I/O activity, and execute the appropriate code.
You can no doubt easily spot how to generalize this code pattern: instead of manually setting and checking the file descriptors, you can keep all pending async requests and their callbacks in a list or array and loop through it before and after the select(). This is in fact what Node.js (and JavaScript in general) does. And it is this list of callbacks/file descriptors that is sometimes called the event queue; it is not a queue per se, just a collection of things you are waiting to execute.
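A hedged sketch of that generalization in C (the pending_io table, io_callback type, and event_loop() name are all illustrative):
#include <sys/select.h>

typedef void (*io_callback)(int fd);

struct pending {
    int fd;
    io_callback cb;
};

static struct pending pending_io[FD_SETSIZE];
static int npending;

void event_loop(void)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = -1;

        for (int i = 0; i < npending; i++) {          // loop before select()
            FD_SET(pending_io[i].fd, &readfds);
            if (pending_io[i].fd > maxfd)
                maxfd = pending_io[i].fd;
        }

        select(maxfd + 1, &readfds, NULL, NULL, NULL);

        for (int i = 0; i < npending; i++)            // loop after select()
            if (FD_ISSET(pending_io[i].fd, &readfds))
                pending_io[i].cb(pending_io[i].fd);   // dispatch the callback
    }
}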
The select() function also has a timeout parameter at the end, which can be used to implement setTimeout() and setInterval() and, in browsers, to process GUI events, so that we can run code while waiting for I/O. Because remember, select is blocking: we can only run other code if select returns. With careful management of timers we can calculate the appropriate value to pass as the timeout to select.
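For illustration, a rough sketch of driving timers off that timeout parameter (next_timer_ms() and run_due_timers() are hypothetical helpers; maxfd and readfds come from a loop like the one above):
#include <sys/select.h>

long next_timer_ms(void);   // hypothetical: ms until the nearest pending timer
void run_due_timers(void);  // hypothetical: fire the timer callbacks that are due

void wait_for_io_or_timer(int maxfd, fd_set *readfds)
{
    struct timeval tv;
    long ms = next_timer_ms();
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    int ready = select(maxfd + 1, readfds, NULL, NULL, &tv);
    if (ready == 0)
        run_due_timers();   // select() timed out: a timer is due, run its callback
}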
The fd_set data structure is not actually a linked list. In older implementations it is a bitfield. More modern implementations can improve on the bitfield as long as they comply with the API. But this partly explains why there are so many competing async APIs like poll, epoll, kqueue etc. They were created to overcome the limitations of select. Different APIs keep track of the file descriptors differently: some use linked lists, some hash tables, some cater for scalability (being able to listen to tens of thousands of sockets), some cater for speed, and most try to do both better than the others. Whatever they use, in the end what is used to store the request is just a data structure that keeps track of file descriptors.

Is read guaranteed to return as soon as data is available?

I want to implement a simple notification protocol using TCP sockets. The server will write a byte to a socket to notify the client, and the client reads from the socket, waiting until some data arrives, at which point it can return from the read call and perform some work.
while (1) {
    /* Wait for any notifications */
    char buf[32];
    if (read(fd, buf, sizeof(buf)) <= 0) {
        break;
    }
    /* Received notification */
    do_work();
}
My question is, is read guaranteed to return as soon as any data is available to read, or is the kernel allowed to keep waiting until some condition is met (e.g. some minimum number of bytes received, not necessarily the count which I pass into read) before it returns from the read call? If the latter is true, is there a flag that will disable that behavior?
I am aware that I could use the O_NONBLOCK flag and call read in a loop, but the purpose of this is to use as little CPU time as possible.
There are multiple implicit questions here:
Is read guaranteed to return immediately or shortly after a relevant event?
No. The kernel is technically allowed to make you wait as long as it wants.
In practice, it'll return immediately (modulo rescheduling delays).
This is true for poll and O_NONBLOCK as well. Linux is not a realtime OS, and offers no hard timing guarantees, just its best effort.
Is read allowed to wait indefinitely for multiple bytes to become available?
No, this would cause deadlocks. read needs to be able to return with a single byte, even if there are no guarantees about when it will do so.
In practice, Linux makes the same effort for 1 byte as it does for 1,048,576.
Is sending a single byte on a socket a practical way of waking up a remote process as soon as possible?
Yes, your example is perfectly fine.

Using select()/poll() in device driver

I have a driver, which handles several TCP connections.
Is there a way to perform something similar to the user-space select()/poll()/epoll() APIs in the kernel, given a list of struct sock's?
Thanks
You may want to write your own custom sk_buff handler, which calls the kernel_select() that tries to lock the semaphore and does a blocking wait when the socket is open.
Not sure if you have already gone through this link: Simulate effect of select() and poll() in kernel socket programming
On the kernel side it's easy to avoid using the sys_epoll() interface outright. After all, you've got direct access to kernel objects, no need to jump through hoops.
Each file object, sockets included, "overrides" a poll method in its file_operations "vtable". You can simply loop around all your sockets, calling ->poll() on each of them and yielding periodically or when there's no data available.
If the sockets are fairly high traffic, you won't need anything more than this.
A note on the API:
The poll() method requires a poll_table argument; however, if you do not intend to wait on it, its callback can safely be initialized to NULL:
poll_table pt;
init_poll_funcptr(&pt, NULL);
...
// struct socket *sk;
...
unsigned event_mask = sk->ops->poll(sk->file, sk, &pt);
If you do want to wait, just play around with the callback set into poll_table by init_poll_funcptr().
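Putting that together, a rough kernel-side sketch of the "loop over the sockets and call ->poll()" approach described above (socks/nsocks are illustrative, the receive path is left as a comment, and an exit condition is omitted):
#include <linux/delay.h>
#include <linux/net.h>
#include <linux/poll.h>
#include <linux/sched.h>

static void poll_my_sockets(struct socket **socks, int nsocks)
{
    poll_table pt;
    int i, ready;

    init_poll_funcptr(&pt, NULL);   // NULL callback: just query readiness, never sleep

    for (;;) {
        ready = 0;
        for (i = 0; i < nsocks; i++) {
            unsigned int mask = socks[i]->ops->poll(socks[i]->file, socks[i], &pt);
            if (mask & (POLLIN | POLLRDNORM)) {
                ready = 1;
                // data available: receive from socks[i] here
            }
        }
        if (!ready)
            msleep(10);             // nothing to read: yield for a while
        cond_resched();
    }
}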

Golang: Best way to read from a hashmap w/ mutex

This is a continuation from here: Golang: Shared communication in async http server
Assuming I have a hashmap w/ locking:
//create async hashmap for inter request communication
type state struct {
    *sync.Mutex                      // inherits locking methods
    AsyncResponses map[string]string // map ids to values
}

var State = &state{&sync.Mutex{}, map[string]string{}}
Functions that write to this will place a lock. My question is, what is the best / fastest way to have another function check for a value without blocking writes to the hashmap? I'd like to know the instant a value is present on it.
MyVal = State.AsyncResponses[MyId]
Reading a shared map without blocking writers is the very definition of a data race. Actually, semantically it is a data race even when the writers are blocked during the read! Because as soon as you finish reading the value and unblock the writers, the value may no longer exist in the map.
Anyway, it's not very likely that proper syncing will be a bottleneck in many programs. An uncontended lock of a {RW,}Mutex is probably on the order of < 20 ns even on mid-range CPUs. I suggest postponing optimization not only until after making the program correct, but also until after measuring where the major part of the time is being spent.

Proper handling of context data in libaio callbacks?

I'm working with kernel-level async I/O (i.e. libaio.h). Prior to submitting a struct iocb using io_submit I set the callback using io_set_callback that sticks a function pointer in iocb->data. Finally, I get the completed events using io_getevents and run each callback.
I'd like to be able to use some context information within the callback (e.g. a submission timestamp). The only method I can think of for doing this is to keep using io_getevents, but have iocb->data point to a struct holding both the context and the callback.
Are there any other methods for doing something like this, and is iocb->data guaranteed to be untouched when using io_getevents? My understanding is that there is another method by which libaio will automatically run callbacks, which would be an issue if iocb->data wasn't pointing to a function.
Any clarification here would be nice. The documentation on libaio seems to really be lacking.
One solution, which I would imagine is typical, is to "derive" from iocb, and then cast the pointer you get back from io_getevents() to your struct. Something like this:
struct my_iocb {
    iocb cb;
    void* userdata;
    // ... anything else
};
When you issue your jobs, whether you do it one at a time or in a batch, you provide an array of pointers to iocb structs, which means they may point to my_iocb as well.
When you retrieve the notifications back from io_getevents(), you simply cast the io_event::obj pointer to your own type:
io_event events[512];
int num_events = io_getevents(ioctx, 1, 512, events, NULL);
for (int i = 0; i < num_events; ++i) {
    my_iocb* job = (my_iocb*)events[i].obj;
    // .. do stuff with job
}
If you don't want to block in io_getevents, but instead be notified via a file descriptor (so that you can block in select() or epoll(), which might be more convenient), I would recommend using the (undocumented) eventfd integration.
You can tie an iocb to an eventfd file descriptor with io_set_eventfd(struct iocb *iocb, int eventfd). Whenever the job completes, it increments the eventfd by one.
Note that if you use this mechanism, it is very important to never read more jobs from the io context (with io_getevents()) than the eventfd counter says there are; otherwise you introduce a race condition between reading the eventfd counter and reaping the jobs.
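A hedged sketch of that eventfd flow (ioctx, data_fd, buf, and an eventfd created with eventfd(0, 0) are assumed to already exist; error handling is omitted):
#include <libaio.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct my_iocb {            // the struct from the answer, redeclared here so the sketch compiles as C
    struct iocb cb;
    void *userdata;
};

void submit_with_eventfd(io_context_t ioctx, int data_fd, void *buf, int efd)
{
    static struct my_iocb job;                 // must stay alive until completion
    io_prep_pread(&job.cb, data_fd, buf, 4096, 0);
    io_set_eventfd(&job.cb, efd);              // completion will add 1 to the eventfd
    struct iocb *list[1] = { &job.cb };
    io_submit(ioctx, 1, list);
}

void reap_completions(io_context_t ioctx, int efd)
{
    uint64_t completed;
    read(efd, &completed, sizeof(completed));  // how many jobs have finished
    if (completed > 64)
        completed = 64;                        // reap the rest on the next pass

    struct io_event events[64];
    int n = io_getevents(ioctx, completed, completed, events, NULL);
    for (int i = 0; i < n; ++i) {
        struct my_iocb *job = (struct my_iocb *)events[i].obj;
        // handle the completion, using job->userdata for context
    }
}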
