Designing a multithreaded application (looking for design patterns) - multithreading

I'm preparing to write a multithreaded network application. At the moment I'm wondering what the best thread pattern for my program is. The whole application will handle up to 1000 descriptors (local files, network connections over various protocols, and additional descriptors for timer and signal handling). The application will be optimized for Linux. The program will run on regular personal computers, so I assume they will have at least a Pentium 4.
Here's my current idea:
- One thread will handle network I/O using epoll.
- A second thread will handle local I/O (disk I/O, timers, signal handling), also using epoll.
- A third thread will handle the UI (CLI, GTK+ or Qt).
Handling each network connection in a separate thread would kill the CPU because of too many context switches.
Maybe there's better way to do this?
Do you know of any documents/books about designing multithreaded applications? I'm looking for answers to questions like: what's a reasonable number of threads? etc.

You're on the right track. You want to use a thread pool pattern to handle the networking rather than one thread per network connection.
This website may also be helpful to you; it lists the most common design patterns and the situations in which they can be used.
http://sourcemaking.com/design_patterns/
To handle the disk I/O you might like to consider using mmap under Linux. It's very fast and efficient. That way, you let the kernel do the work and you probably won't need a separate thread for that.
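For illustration, here is a minimal sketch of reading a file through mmap; the file name is just a placeholder and error handling is abbreviated:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "data.bin";               // placeholder file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the whole file read-only; the kernel pages it in on demand.
    void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    const char* data = static_cast<const char*>(addr);
    // ... process data[0 .. st.st_size) directly, with no read() calls ...
    std::printf("mapped %lld bytes\n", (long long)st.st_size);

    munmap(addr, st.st_size);
    close(fd);
    return 0;
}
```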
I'm currently playing with Boost.Asio, which seems to be quite good. It uses epoll on Linux. Since it appears you are considering a cross-platform GUI toolkit like Qt, Boost.Asio will also give you cross-platform support, so you will be able to use it on Windows or Linux. I think there might be a cross-platform mmap equivalent too.

Related

Is there a way to get notified when a packet arrives over a socket rather than keep polling with recv()?

I have an application which keeps waiting for a packet over UDP. I do this using a non-blocking recv() call.
The application is multi-threaded; the purpose of the other threads is to do some processing when a particular packet is received.
Since one thread keeps polling for a packet even when idle, the CPU usage of one core is near 100%.
Therefore, to remove this intensive polling (and in general, for information): is there a way to get notified when a packet is received? I.e. something similar to registering a parse callback which is called when any packet is received on that socket.
P.S. I cannot have a delay of more than 5 ms between successive recv() calls.
OS Info : Debian 8u2, Kernel 3.16
Platform : Intel i3, x86_64
There are several ways to get notified about received data.
select()
As mentioned in the comments above, select is an old and highly portable mechanism for waking up a thread when a socket is ready for reading or writing. select performs badly when the number of sockets is high, because the fd sets cannot be reused between calls and the whole set must be iterated to find out which sockets are readable or writable. Sockets added to a set used by select should not be written to or read from another thread, so it is difficult to use in a multithreaded application. An example of how to use it is in man select.
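As a rough sketch of that approach for the UDP case above (the port number and buffer size are illustrative), select() blocks until data arrives, so there is no busy polling; the optional timeout here is set to the 5 ms budget from the question:

```cpp
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);                 // hypothetical port
    bind(sock, (sockaddr*)&addr, sizeof(addr));

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);                     // the set must be rebuilt before every call
        timeval tv{0, 5000};                     // wake up at least every 5 ms

        // Returns immediately once the socket is readable, or after the timeout.
        int ready = select(sock + 1, &rfds, nullptr, nullptr, &tv);
        if (ready > 0 && FD_ISSET(sock, &rfds)) {
            char buf[1500];
            ssize_t n = recv(sock, buf, sizeof(buf), 0);
            std::printf("received %zd bytes\n", n);
        }
        // ready == 0: timeout, nothing to read; ready < 0: error
    }
}
```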
poll()
It is a newer mechanism than select. It eliminates some of select's performance drawbacks, but some are still present, such as iterating through the set of sockets to find which one is readable or writable. poll is portable across Unixes, and Windows has supported it since Vista.
epoll()
epoll is a modern Linux-specific polling method. It is relatively new (added to the kernel in 2002). It eliminates almost all of the performance problems of poll and select. The only drawback is that it is not portable outside the Linux ecosystem. Some other OSes have their own polling mechanisms as well; for example, FreeBSD has kqueue.
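A minimal sketch of the same wait using epoll; the function name is made up for illustration and `sock` is assumed to be an already bound UDP socket as in the select() example above:

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

void wait_with_epoll(int sock) {
    int epfd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;                         // interested in "readable" events
    ev.data.fd = sock;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev);   // registered once, reused across waits

    epoll_event events[16];
    for (;;) {
        // Blocks until at least one registered fd is readable; no busy polling.
        int n = epoll_wait(epfd, events, 16, -1);
        for (int i = 0; i < n; ++i) {
            char buf[1500];
            ssize_t len = recv(events[i].data.fd, buf, sizeof(buf), 0);
            std::printf("received %zd bytes on fd %d\n", len, events[i].data.fd);
        }
    }
}
```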
library based polling
Low-level access to select, poll, and epoll can be encapsulated, and a library may provide a unified API for all of these methods. A well-known library providing that is http://libevent.org/
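For example, a rough sketch with libevent 2.x (again assuming an already bound socket `sock`; the function name is made up for illustration):

```cpp
#include <event2/event.h>
#include <sys/socket.h>
#include <cstdio>

// Called by libevent whenever the socket becomes readable.
static void on_readable(evutil_socket_t fd, short /*what*/, void* /*arg*/) {
    char buf[1500];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    std::printf("received %zd bytes\n", n);
}

void run_event_loop(int sock) {
    event_base* base = event_base_new();              // picks epoll on Linux
    event* ev = event_new(base, sock, EV_READ | EV_PERSIST, on_readable, nullptr);
    event_add(ev, nullptr);                           // no timeout: wait until readable
    event_base_dispatch(base);                        // run the event loop
    event_free(ev);
    event_base_free(base);
}
```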

How to benchmark Linux threaded programs?

I'm trying to compare the performance of threaded programs (on Linux). Since the programs use different thread synchronization methods and different lock granularity, running the programs on a shared server or desktop would not be good, since the other tasks may interfere with the scheduling of my programs. I don't have dedicated hosts, so I thought that using qemu would be a good option.
What I want to know is:
Are there any alternatives for this task?
I suppose that there is no way to reproduce the scheduling done by the guest Linux system on qemu, if I need to? (Suppose my program runs unusually slowly or fast -- I'd like to know whether I can run it again while keeping exactly the same scheduling for its threads.) Or is there a way?

Looking for a Linux threadpool api with OS scheduler support

I'm looking for a thread pool abstraction in Linux that provides the same level of kernel scheduler support that the Win32 thread pool provides. Specifically, I'm interested in finding a thread pool that maintains a certain number of running threads. When a running pool thread blocks on I/O, I want the thread pool to be smart enough to start another thread running.
Anyone know of anything like this for linux?
You really can't do this without OS support. There's no good way to tell that a thread is blocked on I/O. You wind up having to atomically increment a counter before each operation that might block and decrement it after. Then you need a thread to monitor that counter and create an additional thread if it's above zero. (Remove threads if they're idle more than a second or so.)
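A rough sketch of that bookkeeping, with a hypothetical RAII guard around every potentially blocking call; the names are illustrative:

```cpp
#include <atomic>

std::atomic<int> threads_blocked{0};   // pool threads currently inside a blocking call

struct BlockingGuard {
    BlockingGuard()  { threads_blocked.fetch_add(1, std::memory_order_relaxed); }
    ~BlockingGuard() { threads_blocked.fetch_sub(1, std::memory_order_relaxed); }
};

// Usage inside a pool thread, around any call that might block on I/O:
// {
//     BlockingGuard guard;
//     ssize_t n = read(fd, buf, len);   // may block
// }
// A monitor thread would periodically read threads_blocked and start another
// worker when too few of the pool's threads remain runnable.
```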
Generally speaking, it's not worth the effort. This only works so well on Windows because it's the "Windows way" and Windows is built from the ground up for it. For Linux, you should be using epoll or boost::asio. Use something that does things the "Linux way" rather than trying to make the Windows way work on non-Windows operating systems.
You can write your own wrappers that use IOCP on Windows, epoll on Linux, and so on. But these already exist, so you need not bother.

Is there a way to take advantage of multi-core when dealing with network connections?

When doing network programming, no matter whether you use multi-process, multi-thread, or select/poll (epoll), there is only one process/thread accepting connections on a given port. And if you want to take advantage of multiple cores, you need to create worker processes/threads. But what if the bottleneck is dealing with the network connections themselves? Is there a way to take advantage of multiple cores when dealing with network connections?
I found some materials, and it seems this is hard to accomplish.
The three-way handshake is done implicitly by the kernel, and on SMP systems the operating system is divided into several critical zones; the same critical zone can't run on more than one core at the same time.
All modern operating systems that run on PC hardware already have their network stacks heavily optimized for multi-core CPUs. For example, the packet handling code that pushes data to and from the network card is going to be independent of the TCP/IP stack code so a hardware interrupt can run to completion without disturbing the TCP code.
For most real-world applications though, the bulk of the work is between the packets. Data that comes in has to be processed and data that goes out has to be generated. That's up to application code, and that code can take advantage of multiple cores either by using multiple threads or multiple processes. How you do that best is very application and operating system specific. Windows, for example, has I/O completion ports which combine job discovery with multi-threaded job dispatch. Linux has epoll.
With just the network traffic, that's almost solely done by the network card (i.e. not the computer's CPU). Communication with the network card is usually single-threaded (queued by the OS so you can send/receive on multiple threads) because a NIC can only push/pop stuff off its stack one at a time.
It's up to your process to do what it needs in response to received data. That can be done on one thread, and you can spawn other threads upon receiving data on that master thread and divide the work up that way, as in the sketch below. If you have a language that supports asynchronous communication, I would try to get it to do most of the work using multiple threads.
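Here is a minimal sketch of that idea: one thread accepts connections and hands the client descriptors to a small pool of workers. The port, queue, and pool size are illustrative, and error handling is omitted.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> clients;                    // accepted client fds waiting for a worker
std::mutex m;
std::condition_variable cv;

void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !clients.empty(); });
        int fd = clients.front();
        clients.pop();
        lock.unlock();
        // ... read the request, do the real work, write the reply ...
        close(fd);
    }
}

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);            // hypothetical port
    bind(listener, (sockaddr*)&addr, sizeof(addr));
    listen(listener, 128);

    // Roughly one worker per core.
    unsigned n = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i)
        pool.emplace_back(worker);

    for (;;) {                              // the single accepting thread
        int fd = accept(listener, nullptr, nullptr);
        if (fd < 0) continue;
        { std::lock_guard<std::mutex> lock(m); clients.push(fd); }
        cv.notify_one();
    }
}
```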

Linux and I/O completion ports?

Using Winsock, you can configure sockets or separate I/O operations to "overlap". This means that calls to perform I/O return immediately, while the actual operations are completed asynchronously by separate worker threads.
Winsock also provides "completion ports". From what I understand, a completion port acts as a multiplexer of handles (sockets). A handle can be demultiplexed if it isn't in the middle of an I/O operation, i.e. if all its I/O operations are completed.
So, on to my question... does linux support completion ports or even asynchronous I/O for sockets?
If you're looking for something exactly like IOCP, you won't find it, because it doesn't exist.
Windows uses a notify on completion model (hence I/O Completion Ports). You start some operation asynchronously, and receive a notification when that operation has completed.
Linux applications (and most other Unix-alikes) generally use a notify on ready model. You receive a notification that the socket can be read from or written to without blocking. Then, you do the I/O operation, which will not block.
With this model, you don't need asynchronous I/O. The data is immediately copied into / out of the socket buffer.
The programming model for this is kind of tricky, which is why there are abstraction libraries like libevent. It provides a simpler programming model, and abstracts away the implementation differences between the supported operating systems.
There is a notify on ready model in Windows as well (select or WSAWaitForMultipleEvents), which you may have looked at before. It can't scale to large numbers of sockets, so it's not suitable for high-performance network applications.
Don't let that put you off - Windows and Linux are completely different operating systems. Something that doesn't scale well on one system may work very well on the other. This approach actually works very well on Linux, with performance comparable to IOCP on Windows.
IOCP is pronounced "asynchronous I/O" on various UNIX platforms:
POSIX AIO is the standard
Kernel AIO, epoll and io_uring are Linux-specific implementations
kqueue is the *BSD and Mac OS X implementation
Message Passing Interface (MPI) is an option for high-performance computing
obligatory Boost reference - Boost.Asio
Use boost::asio. Hands down. It has a mild learning curve, but it's cross-platform, and automatically uses the best available method for the system you're compiling on. There's simply no reason not to.
I know that this isn't quite an answer to your question, but it's the best advice I could give.
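For illustration, a minimal sketch of an asynchronous accept loop with Boost.Asio, assuming a reasonably recent Boost (older releases spell io_context as io_service); the port is a placeholder:

```cpp
#include <boost/asio.hpp>
#include <iostream>

using boost::asio::ip::tcp;

void do_accept(tcp::acceptor& acceptor) {
    acceptor.async_accept(
        [&acceptor](boost::system::error_code ec, tcp::socket socket) {
            if (!ec)
                std::cout << "client connected: "
                          << socket.remote_endpoint() << "\n";
            // The socket is closed when it goes out of scope here; a real
            // server would start an async_read on it instead.
            do_accept(acceptor);   // keep accepting
        });
}

int main() {
    boost::asio::io_context io;
    tcp::acceptor acceptor(io, tcp::endpoint(tcp::v4(), 9000)); // hypothetical port
    do_accept(acceptor);
    io.run();   // epoll on Linux, IOCP on Windows, kqueue on macOS
}
```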
So, on to my question... does linux support completion ports or even asynchronous I/O for sockets?
With regard to sockets, in 5.3 and later kernels, Linux has something analogous to completion ports in the shape of io_uring (for files/block devices io_uring support appeared in the 5.1 kernel).
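As a rough illustration (assuming a 5.x kernel, liburing installed, and an already bound socket `sock`; the function name is made up), a single receive through io_uring looks roughly like this:

```cpp
#include <liburing.h>
#include <cstdio>

void recv_with_io_uring(int sock) {
    io_uring ring;
    io_uring_queue_init(8, &ring, 0);                     // small SQ/CQ rings

    char buf[1500];
    io_uring_sqe* sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, sock, buf, sizeof(buf), 0);   // queue the recv
    io_uring_submit(&ring);                               // hand it to the kernel

    io_uring_cqe* cqe;
    io_uring_wait_cqe(&ring, &cqe);                       // completion notification
    std::printf("recv completed, result = %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
}
```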
Read the blog entry from Google on libevent: you can implement IOCP semantics on Unix using asynchronous I/O, but you cannot directly implement asynchronous I/O semantics using IOCP.
http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x-only.html
For an example of cross-platform asynchronous I/O with a BSD socket API, look at ZeroMQ, as recently covered on LWN.net:
http://www.zeromq.org/
LWN article,
http://lwn.net/Articles/370307/
Boost.Asio implements Windows-style IOCP (the Proactor design pattern) on Linux using epoll (the Reactor pattern). See http://think-async.com/Asio/asio-1.5.3/doc/asio/overview/core/async.html
