On Windows there is a threading model where a thread pool is associated with an I/O completion port: a thread is released each time an asynchronous I/O completes, and the released thread is then used to deal with that I/O completion.
While Linux's select() can be used for asynchronous I/O, it does not seem to support the I/O completion / thread pool logic.
Is there any Linux equivalent to the I/O completion / thread pool threading model described above?
I don't know of anything that does that directly, but you can combine a select() loop with a thread pool of your own to get similar behavior. When select() returns and you examine your fd_sets to see which file descriptors are ready, push each of those descriptors to a thread pool for processing. You'll still need a main thread to run the select() loop, independent of the thread pool for processing the I/O events.
Most of the complexity in this approach will be in how you keep track of which descriptors need to be included in the select() call at each iteration, and in the thread pool itself (since POSIX provides threads but not a standard thread pool API).
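A rough sketch of that arrangement, where pool_submit() and handle_io() are hypothetical placeholders for your own thread pool and I/O handler:

    /* Sketch: a select() loop that hands ready descriptors to a thread pool.
     * pool_submit() and handle_io() are hypothetical; substitute your own
     * thread-pool API (e.g. GLib's GThreadPool). */
    #include <stdint.h>
    #include <sys/select.h>

    extern void pool_submit(void (*fn)(void *), void *arg); /* your thread pool */
    extern void handle_io(void *arg);                        /* processes one fd */

    void io_loop(const int fds[], int nfds_count)
    {
        for (;;) {
            fd_set readfds;
            int maxfd = -1;

            FD_ZERO(&readfds);
            for (int i = 0; i < nfds_count; i++) {
                FD_SET(fds[i], &readfds);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }

            if (select(maxfd + 1, &readfds, NULL, NULL, NULL) <= 0)
                continue; /* handle errors/EINTR as appropriate */

            for (int i = 0; i < nfds_count; i++) {
                if (FD_ISSET(fds[i], &readfds)) {
                    /* Hand the ready descriptor to a worker. In a real program,
                     * stop select()ing on this fd until the worker is done with
                     * it, or you may dispatch the same fd twice. */
                    pool_submit(handle_io, (void *)(intptr_t)fds[i]);
                }
            }
        }
    }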
You might be interested in using a library like GLib, which provides a modular main loop that helps with polling on a varying set of file descriptors, as well as a thread pool implementation.
I have a multi-threaded system in which the main thread has to block waiting for any one of the following four events to happen:
inter-process semaphore (sem_wait())
pthread condition (pthread_cond_wait())
recv() from socket
timeout expiring
Ideally I'd like a mechanism to unblock the main thread when any of the above occurs, something like a ppoll() with a suitable timeout parameter. Non-blocking calls and busy polling are out of the picture due to their impact on CPU usage, while having separate threads blocking on different events is not ideal due to the increased latency (a thread unblocking on one of the events would still have to wake up the main one).
The code will be compiled almost exclusively under Linux with the gcc toolchain, if that helps, but some portability would be good, if at all possible.
Thanks in advance for any suggestions.
The mechanisms for waiting on multiple types of objects on Unix-like systems are not that great. In general, the idea is to use file descriptors for IPC wherever possible, rather than a mix of different IPC mechanisms.
From your comment, it sounds like you can change the condition variable, but not the code that signals the semaphore. So what I'd recommend is something like the following.
Change the condition variable to either a pipe (for more portability) or an eventfd(2) object (Linux-specific). The notifying thread writes to the pipe whenever it wants to signal the main thread. This will allow you to select(2) or poll(2) or whatever in the main thread on both that pipe and the socket.
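A minimal sketch of the eventfd(2) variant (Linux-only; swap in a pipe(2) for portability):

    /* Sketch: replacing the condition variable with an eventfd (Linux-specific). */
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int notify_fd;  /* created once at startup */

    void setup(void)
    {
        notify_fd = eventfd(0, EFD_CLOEXEC);  /* add EFD_NONBLOCK if desired */
    }

    /* Called by the notifying thread instead of pthread_cond_signal(). */
    void notify_main_thread(void)
    {
        uint64_t one = 1;
        write(notify_fd, &one, sizeof one);
    }

    /* Called by the main thread after select()/poll() reports notify_fd readable. */
    void drain_notification(void)
    {
        uint64_t count;
        read(notify_fd, &count, sizeof count);  /* resets the counter to zero */
    }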
Because you're stuck with the semaphore, I think the best option would be to create another thread, whose sole purpose is to wait for the semaphore using sem_wait(), and then write to another pipe or eventfd(2) object when it is notified by whatever process is doing sem_post(). In the main thread, just add this other file descriptor to your select(2) set.
So you'll have three descriptors: one for the socket, one taking the place of the condition variable, and one which is written to when the semaphore is incremented. You can then wait on all three using your favorite I/O multiplexing method, and include directly whatever timeout you'd like.
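Roughly, the proxy thread and main loop might look like this (error handling omitted; sem, sock_fd and cond_fd are assumed to be set up elsewhere, cond_fd being the eventfd or pipe that replaced the condition variable):

    /* Sketch: a helper thread turns sem_post() into a readable descriptor,
     * and the main thread multiplexes all three descriptors with select(). */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <sys/select.h>
    #include <unistd.h>

    extern sem_t *sem;   /* the inter-process semaphore you cannot change */
    extern int sock_fd;  /* the socket */
    extern int cond_fd;  /* eventfd/pipe standing in for the condition variable */
    int sem_fd;          /* written by the proxy thread below */

    static void *sem_proxy(void *unused)
    {
        (void)unused;
        for (;;) {
            sem_wait(sem);                    /* block like the old code did */
            uint64_t one = 1;
            write(sem_fd, &one, sizeof one);  /* wake the main thread */
        }
        return NULL;
    }

    void main_loop(void)
    {
        pthread_t tid;
        sem_fd = eventfd(0, EFD_CLOEXEC);
        pthread_create(&tid, NULL, sem_proxy, NULL);

        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(sock_fd, &rfds);
            FD_SET(cond_fd, &rfds);
            FD_SET(sem_fd, &rfds);
            int maxfd = sock_fd;
            if (cond_fd > maxfd) maxfd = cond_fd;
            if (sem_fd > maxfd)  maxfd = sem_fd;

            struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };  /* your timeout */
            int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
            if (n == 0) { /* timeout expired */ }
            if (n > 0) {
                if (FD_ISSET(sock_fd, &rfds)) { /* recv() will not block now */ }
                if (FD_ISSET(cond_fd, &rfds)) { /* condition was signalled   */ }
                if (FD_ISSET(sem_fd, &rfds))  { /* semaphore was posted      */ }
            }
        }
    }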
I heard from someone that he chose asynchronous I/O instead of synchronous I/O in the design of his system, because there would be fewer threads and threads are expensive.
But isn't asynchronous I/O's advantage that it frees the CPU from waiting for the I/O to finish, so it can do something else until it is interrupted when the I/O operation completes? Does that change the number of threads or processes? I don't know.
Thanks.
The CPU is released in any case. It does not make sense to keep a CPU core idle while an IO is in progress. Another thread can be scheduled.
Async I/O really is about having fewer threads. An I/O is just a kernel data structure; it does not need to be "on a thread". Async I/O decouples threads and I/Os.
If you have 1M connected sockets and are reading on all of them you really don't want 1M threads.
I suggest you read this: https://stackoverflow.com/a/14797071/2241168 . It's a practical example that illustrates the difference between the async and sync models.
The root of the confusion is that you actually have three options:
In a single thread, start I/O and wait for it to finish ("blocking" or "synchronous" I/O)
In a single thread, start I/O and do something else while it is running ("asynchronous" I/O)
In multiple threads, start I/O and "wait" for it to finish ("blocking" or "synchronous" I/O, but with multiple threads). In this case only one thread waits while the others go on doing useful work.
So if you use async I/O, you avoid both alternatives: blocking while the I/O completes, and using multiple threads.
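To make the single-threaded alternative concrete, here is a rough sketch of one thread servicing many sockets with a readiness loop, using epoll (Linux-specific) and assuming the sockets were already added to the epoll instance and set non-blocking:

    /* Sketch: one thread, many sockets, no blocking on any single read. */
    #include <sys/epoll.h>
    #include <unistd.h>

    void serve(int epfd)   /* epfd: an epoll instance the sockets were added to */
    {
        struct epoll_event events[64];
        char buf[4096];

        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);     /* wait for readiness */
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                ssize_t got = read(fd, buf, sizeof buf);  /* fd is ready, won't block */
                if (got > 0) {
                    /* process buf[0..got) */
                } else {
                    close(fd);  /* EOF or error: drop the connection */
                }
            }
        }
    }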
I am using Linux AIO (io_submit() / io_getevents()) for file I/O. Since some operations do not have AIO equivalents (open(), fsync(), fallocate()), I use a worker thread that may block without impacting the main thread. My question is, should I add close() to this list?
All files are opened with O_DIRECT on XFS, but I am interested in both the general answer to the question, and on the specific answer with regard to my choice of filesystem and open mode.
Note that using a worker thread for close() is not trivial since close() is often called in cleanup paths, which aren't good places to launch a worker thread request and wait for it. So I'm hoping that close() is non-blocking in this scenario.
For this question, "blocking" means waiting on an I/O operation, or on some lock that may only be released when an I/O operation completes, but excluding page fault servicing.
close() may block on some filesystems. When possible, code should be written as portably as is practical. As such, you should definitely add close() to the list of calls that are called only from your blocking worker thread.
However, you mention that you often have to call close() in cleanup paths. If those cleanup paths run at the termination of your application, it may not matter much even if close() does block when you call it directly.
Alternatively, what you could do is have a queue that is fed to a pool of workers. In glibc AIO, this is what is done for many calls. When you initialize AIO with aio_init(), glibc sets up a queue and a pool of worker threads. Every time an AIO call is made, glibc simply adds the relevant task and data to the queue. In the background, the worker threads wait on the queue, execute the blocking calls, and then perform any relevant follow-up actions.
If you really do need non-blocking close() (and other) calls, it may be to your advantage to set up a task queue and a thread pool, submit the specific calls to the queue, and have the thread pool execute the calls as they come in.
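A minimal sketch of such a setup, reduced to a single worker and a fixed-size queue (no shutdown or overflow handling):

    /* Sketch: a tiny work queue so close() (and similar blocking calls) run on
     * a worker thread instead of the main thread. Simplified: fixed-size ring,
     * single worker. */
    #include <pthread.h>
    #include <unistd.h>

    #define QSIZE 128

    static int queue[QSIZE];
    static int head, tail;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    /* Called from the main thread: enqueue an fd to be closed later. */
    void async_close(int fd)
    {
        pthread_mutex_lock(&lock);
        queue[tail] = fd;
        tail = (tail + 1) % QSIZE;          /* assumes the queue never fills up */
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    /* Worker thread: performs the potentially blocking close() calls. */
    static void *closer(void *unused)
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail)
                pthread_cond_wait(&nonempty, &lock);
            int fd = queue[head];
            head = (head + 1) % QSIZE;
            pthread_mutex_unlock(&lock);

            close(fd);                       /* may block; that's fine here */
        }
        return NULL;
    }

    void start_closer_thread(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, closer, NULL);
    }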
We have a thread reading data from multiple sockets using async I/O with WSARecvFrom() and an I/O completion port.
The received data (packet size about 1500 bytes) should be processed by the main thread. The main thread also handles all the other synchronous work.
If the main thread is associated with a window and we use PostMessage() to send a message to that window, it takes a long time until the main window thread receives the message via GetMessage() and can process the data. If we have to process a lot of network messages, this method is not usable.
Would PostThreadMessage() in the socket thread and GetMessage() in the main thread have better performance?
Would SetEvent() in the socket thread and WaitForMultipleObjects() in the main thread have better performance?
Is there a more efficient way of signaling the arrival of the data to the main thread?
Are there any Win32 synchronisation functions that could also directly pass the data to be processed by the main thread, without an application buffer guarded by semaphores?
Essentially, no. There is no faster mechanism available than the PostMessage(), PostThreadMessage(), QueueUserAPC() and similar inter-thread communication calls.
The time taken for such calls to transfer a buffer pointer or 'this' instance pointer is normally swamped by the large amount of time required to render such data onto a GUI component.
If you have loaded up your main thread with a pile of non-GUI activity that prevents the prompt handling of messages, or you are posting messages faster than the GUI thread can reasonably consume them, then you have a big problem.
If the latency of PostMessage() is actually the root cause of your problem, (which I doubt), then you could maybe aggregate the buffers into a container object and PostMessage that instead, so improving the overall bandwidth of the posted messages.
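For example, a rough sketch of that aggregation idea, with a hypothetical WM_APP-based message and Batch structure (the receiving window procedure owns and frees the batch):

    /* Sketch: batch several received packets and post one pointer to the GUI
     * thread, instead of one PostMessage() per packet. */
    #include <windows.h>
    #include <stdlib.h>
    #include <string.h>

    #define WM_PACKET_BATCH (WM_APP + 1)   /* hypothetical message id */
    #define MAX_BATCH 32

    typedef struct {
        int  count;
        char data[MAX_BATCH][1500];
        int  len[MAX_BATCH];
    } Batch;

    /* Socket thread: called for each completed WSARecvFrom(). */
    void on_packet(HWND hwnd, const char *pkt, int len)
    {
        static Batch *batch;                  /* packets accumulated so far */
        if (!batch)
            batch = (Batch *)calloc(1, sizeof *batch);

        memcpy(batch->data[batch->count], pkt, len);
        batch->len[batch->count++] = len;

        if (batch->count == MAX_BATCH) {      /* or flush on a timer */
            PostMessage(hwnd, WM_PACKET_BATCH, 0, (LPARAM)batch);
            batch = NULL;                     /* ownership passes to GUI thread */
        }
    }

    /* In the main thread's window procedure:
     *   case WM_PACKET_BATCH: {
     *       Batch *b = (Batch *)lParam;
     *       ...process b->count packets...
     *       free(b);
     *       return 0;
     *   }
     */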
Would SetEvent() in the socket thread and WaitForMultipleObjects() in the main thread have better performance?
I suspect not noticeably. It would have to be MsgWaitForMultipleObjects(Ex), to handle the 'normal' GUI messages as well as your own notifications. It won't help if the root cause is a main thread that is overloaded for other reasons, or if messages are posted faster than you can handle them.
For example, on Windows there is MsgWaitForMultipleObjects(), which lets you wait simultaneously for window messages, socket events, asynchronous I/O (IOCompletionRoutine), AND mutex handles.
On Unix you have select()/poll(), which gives you everything except the possibility to break out when some pthread_mutex is unlocked.
The story:
I have an application whose main thread does something with multiple sockets, pipes, or files. Now, from time to time there is a side job (a db transaction) that might take a longer time and, if done synchronously in the main thread, would disrupt the normal servicing of the sockets. So I want to do the db operation in a separate thread. That thread would wait on some mutex when idle, until the main thread decides to give it some job and unlocks the mutex so the db thread can grab it.
The problem is how the db thread can notify the main thread that it has finished the job. The main thread has to process sockets, so it cannot afford to sleep in pthread_mutex_lock(). Doing a periodic pthread_mutex_trylock() is the last thing I would want to do. Currently I'm considering using a pipe, but is this the best way?
Using a pipe is a good idea here. Make sure that no other process has the write end of the pipe open, and then select() or poll() the read end for reading in the main thread. Once your worker thread is done with the work, close() the write end. The select() in the main thread wakes up immediately.
I don't think waiting on a mutex and something else would be possible, because on Linux, mutexes are implemented with the futex(2) system call, which doesn't support file descriptors.
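A minimal sketch of the pipe approach described above, assuming one db thread (and one pipe) per job; a long-lived worker would instead write a byte per completed job and keep the pipe open:

    /* Sketch: the main thread learns of db-job completion through a pipe,
     * using the close()-the-write-end scheme described above. */
    #include <pthread.h>
    #include <unistd.h>

    struct job {
        int done_wr;              /* write end of the notification pipe */
        /* ...the actual db work description... */
    };

    static void *db_worker(void *arg)
    {
        struct job *j = arg;
        /* ...perform the long db transaction... */
        close(j->done_wr);        /* makes the read end readable (EOF) */
        return NULL;
    }

    /* Main thread: start a job and get back an fd to add to its select() set. */
    int start_db_job(struct job *j)
    {
        int p[2];
        pipe(p);                  /* p[0] = read end, p[1] = write end */
        j->done_wr = p[1];

        pthread_t tid;
        pthread_create(&tid, NULL, db_worker, j);
        pthread_detach(tid);

        return p[0];              /* select() on this; readable == job finished */
    }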
I don't know how well it applies to your specific problem, but POSIX has message queues.
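A small sketch, with a hypothetical queue name; note that on Linux an mqd_t is actually a file descriptor and can be monitored with select()/poll(), although POSIX does not guarantee that:

    /* Sketch: a POSIX message queue for worker -> main-thread notifications. */
    #include <fcntl.h>
    #include <mqueue.h>

    #define QUEUE_NAME "/db_done"   /* hypothetical queue name */

    mqd_t open_queue(void)
    {
        struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = 64 };
        return mq_open(QUEUE_NAME, O_CREAT | O_RDWR, 0600, &attr);
    }

    /* Worker thread: announce completion. */
    void notify_done(mqd_t q)
    {
        mq_send(q, "done", 5, 0);
    }

    /* Main thread: on Linux, add q to the select()/poll() set, then receive. */
    void consume(mqd_t q)
    {
        char msg[64];                         /* must be >= mq_msgsize */
        mq_receive(q, msg, sizeof msg, NULL);
    }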