Instant Messaging Server Design - multithreading

Let's suppose we have an instant messaging application that is client-server based, not p2p. The actual protocol doesn't matter; what matters is the server architecture. The server can be coded to operate in single-threaded, non-parallel mode using non-blocking sockets, which by definition return from operations like read and write effectively immediately. This lets us put some sort of select/poll function at the very core of the server and waste next to no time in the actual socket read/write operations, spending the time on processing the data instead. Properly coded, this can be very fast, as far as I understand. But there is a second approach, and that is to multithread aggressively: creating new threads (obviously via some sort of thread pool, because thread creation can be very slow on some platforms and under some circumstances) and letting those threads work in parallel, while the main thread handles accept() and the like. I've seen this approach explained in various places on the Net, so it obviously does exist.
Now the question is, if we have non-blocking sockets, and immediate read/write operations, and a simple, easily coded design, why does the second variant even exist? What problems are we trying to overcome with the second design, i.e. threads? AFAIK those are usually used to work around some slow and possibly blocking operations, but no such operations seem to be present there!

I'm assuming you're not talking about having a thread per client, as such a design is usually chosen for completely different reasons, but rather about a pool of threads in which each thread handles several concurrent clients.
The reason for that architecture vs. a single-threaded server is simply to take advantage of multiple processors. You're doing more work than simply I/O. You have to parse the messages, do various work, maybe even run some more heavyweight crypto algorithms. All this takes CPU. If you want to scale, taking advantage of multiple processors will allow you to scale even more, and/or keep the latency per client even lower.
Some of the gain in such a design can be offset a bit by the fact that you might need more locking in a multithreaded environment, but done right, and certainly depending on what you're doing, it can be a huge win, at the expense of more complexity.
Also, this might help overcome OS limitations. The I/O paths in the kernel might get distributed more evenly among the processors. Not every operating system can fully parallelize the I/O of a single-threaded application. Back in the old days there weren't all the great alternatives to the old *nix select(), which usually had a file descriptor limit of 1024, and similar APIs started degrading severely once you told them to monitor too many sockets. Spreading all those clients over multiple threads or processes helped overcome that limit.
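To make the single-threaded design from the question concrete, here is a minimal sketch of a readiness-based event loop using Python's standard `selectors` module, which picks epoll, kqueue, poll or select depending on the platform. The names `on_readable`, `make_demo_loop` and `run_once` are made up for illustration, and a `socket.socketpair()` stands in for an accepted client connection; a real server would also buffer writes instead of calling `sendall` directly.

```python
import selectors
import socket

def on_readable(sel, conn):
    """Echo whatever arrived; drop the connection when the peer closes it."""
    data = conn.recv(4096)          # will not block: the selector reported readiness
    if data:
        conn.sendall(data)          # fine for tiny payloads; real servers buffer writes
    else:
        sel.unregister(conn)
        conn.close()

def make_demo_loop():
    """Return (selector, client_socket); the socket pair stands in for accept()."""
    sel = selectors.DefaultSelector()
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    # The callback is stored as the registration's data and dispatched by run_once().
    sel.register(server_side, selectors.EVENT_READ, data=on_readable)
    return sel, client_side

def run_once(sel, timeout=1.0):
    """One iteration of the event loop; returns how many sockets were serviced."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(sel, key.fileobj)
    return len(events)
```

A full server would call `run_once` in a loop and register the listening socket as well, so that all clients are serviced from this one thread.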
As for a 1:1 mapping between threads and clients, there are several reasons to implement that architecture:
Easier programming model, which might lead to less hard to find bugs, and faster to implement.
Support blocking APIs. These are all over the place. Having a thread handle many/all of the clients and then go on to do a blocking call to a database is going to stall everyone. Even reading files can block your application, and you usually can't monitor regular file handles/descriptors for IO events - or when you can, the programming model is often exceptionally complicated.
The drawback here is that it won't scale, at least not with the most widely used languages/frameworks. Having thousands of native threads will hurt performance. Some languages do provide a much more lightweight approach here, though, such as Erlang and to some extent Go.
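As a rough illustration of why the 1:1 model is easier to program, the sketch below gives each connection its own thread running plain blocking code. The function names are hypothetical, and upper-casing the data merely stands in for real message handling.

```python
import socket
import threading

def handle_client(conn):
    """Blocking, straight-line code: one thread owns one connection."""
    with conn:
        while True:
            data = conn.recv(4096)      # blocks only this thread
            if not data:                # empty read: the peer closed the connection
                break
            conn.sendall(data.upper())  # placeholder for real message processing

def serve_clients(connections):
    """Start one handler thread per connection and return the threads."""
    threads = [threading.Thread(target=handle_client, args=(c,)) for c in connections]
    for t in threads:
        t.start()
    return threads
```

A blocking database call inside `handle_client` would stall only that one client, which is exactly the property described above; the price is one native thread per connection.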


Why not to use massively multi-threaded code?

Asynchronous and other event-based programming paradigms seem to be spreading like wildfire these days, with the popularity of node.js, Python 3.5's recent async improvements, and what not else.
Not that I particularly mind this or that I haven't already been doing it for a long time myself, but I've been trying to wrap my head around the real reasons why. Searching around for the evils of synchronous programming consistently seems to net the preconceived notion that "you can't have a thread for each request", without really qualifying that statement.
Why not, though? A thread might not be the cheapest resource one could think of, but it hardly seems "expensive". On 64-bit machines, we have more than enough virtual address space to handle all the threads we could ever want, and, unless your call chains are fairly deep, each thread shouldn't necessarily have to require more physical RAM than a single page* for stack plus whatever little overhead the kernel and libc need. As for performance, my own casual testing shows that Linux can handle well over 100,000 thread creations and tear-downs per second on a single CPU, which can hardly be a bottleneck.
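The casual test mentioned above is easy to repeat. The following is only a sketch of such a measurement in Python, with a made-up function name; interpreter overhead means the figure it reports will be far below what a C program calling the native thread API directly can reach, so it says more about relative cost than about the kernel's limits.

```python
import threading
import time

def thread_churn_rate(count=2000):
    """Create, start and join `count` no-op threads; return threads per second."""
    start = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()                      # tear the thread down before creating the next
    elapsed = time.perf_counter() - start
    return count / elapsed
```

Calling `thread_churn_rate()` a few times and comparing the results across machines gives a feel for how cheap or expensive thread creation is on a given platform.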
That being said, it's not like I think event-based programming is all just a ruse, seeing as how it seems to have been the primary driver allowing such HTTP servers as lighttpd/nginx/whatever to overtake Apache in highly concurrent performance**. However, I've been trying to find some kind of actual inquiry into the reason why massively-multithreaded programs are slower without being able to find any.
So then, why is this?
*My testing seems to show that each thread actually requires two pages. Perhaps there's some dirtying of the TLS going on or something, but nevertheless it doesn't seem to change a lot.
**Though it should also be said that Apache, at that time, was using process-based concurrency rather than thread-based, which obviously makes a lot of difference.
If you have a thread for each request, then you can't do a little bit of work for each of 100 requests without switching contexts 100 times. While many things computers have to do have gotten faster over time, context switching is still expensive because it blows out the caches and modern systems are more dependent on these caches than ever.
That is a loaded question. I've heard different responses over time because I've had that conversation so many times before with different developers. Mainly, my gut feeling is that most developers hate it because it is harder to write multi-threaded code and it is sometimes easy to shoot yourself in the foot unnecessarily. That said, each situation is different. Some programs lend themselves to multi-threading rather nicely, like a webserver. Each thread can take a request and essentially process it without needing many outside resources. It has a set of procedures to apply to a request to decide how to process it. It decides what to do with it and passes it off. So it is fairly independent and can operate in its own world fairly safely. So it is a nice thread.
Other situations might not lend themselves so nicely. Especially when you need shared resources. Things can get hairy fast. Even if you do what seems like perfect context switching, you might still get race conditions. Then the nightmares begin. This is seen quite often in huge monolithic applications where they opted to use threads and open the gates of hell upon their dev team.
In the end, I think we will probably not see more threading in day-to-day development; instead we will move to a more event-driven world. We are going down that route in web development with the emergence of micro-services. So there will probably be more threading used, but not in a way that is visible to the developer using the framework. It will just be a part of the framework. At least that is my opinion.
Once the number of ready or running threads (versus threads pending on events) and/or processes goes beyond the number of cores, then those threads and/or processes are competing for the same cores, same cache, and the same memory bus.
Unless there are a massive number of simultaneous events to pend on, I don't see the purpose of massively multi-threaded code, except for super computers with a large number of processors and cores, and that code is usually massively multi-processing, with multiple memory buses.

Would handling each TCP connection in a separate thread improve latency?

I have an FTP server, implemented on top of QTcpServer and QTcpSocket.
I take advantage of the signals and slots mechanism to support multiple TCP connections simultaneously, even though I have a single thread. My code returns as soon as possible to the event loop, it doesn't block (no wait functions), and it doesn't use nested event loops anywhere. That way I already have cooperative multitasking, like Win3.1 applications had.
But a lot of other FTP servers are multithreaded. Now I'm wondering if using a separate thread for handling each TCP connection would improve performance, and especially latency.
On one hand, threads add to latency because you need to start a new thread for each new connection, but on the other, with my cooperative multitasking, other TCP connections have to wait until I've returned to the main loop before their readyRead()/bytesWritten() signals can be handled.
In your current system and ignoring file I/O time one processor is always doing something useful if there's something useful to be done, and waiting ready-to-go if there's nothing useful to be done. If this were a single processor (single core) system you would have maximized throughput. This is often a very good design -- particularly for an FTP server where you don't usually have a human waiting on a packet-by-packet basis.
You have also minimized average latency (for a single processor system.) What you do not have is consistent latency. Measuring your system's performance is likely to show a lot of jitter -- a lot of variation in the time it takes to handle a packet. Again because this is FTP and not real-time process control or human interaction, jitter may not be a problem.
Now, however consider that there is probably more than one processor available on your system and that it may be possible to overlap I/O time and processing time.
To take full advantage of a multi-processor(core) system you need some concurrency.
This normally translates to using multiple threads, but it may be possible to achieve concurrency via asynchronous (non-blocking) file reads and writes.
However, adding multiple threads to a program opens up a huge can-of-worms.
If you do decide to go the MT route, I'd suggest that you consider depending on a thread-aware I/O library. QT may provide that for you (I'm not sure.) If not, take a look at boost::asio (or ACE for an older, but still solid solution). You'll discover that using the MT capabilities of such a library involves a considerable investment in learning time; however as it turns out the time to add on multithreading "by-hand" and get it right is even worse.
So I'd say stay with your existing solution unless you are worried about unused Processor cycles and/or jitter in which case start learning QT's multithreading support or boost::asio.
Do you need to start a new thread for each new connection? Could you not just have a pool of threads that acts on requests as and when they arrive? This should reduce some of the latency. I have to say that in general a multi-threaded FTP server should be more responsive than a single-threaded one. Is it possible to have an event-based FTP server?
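The pool idea can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is an illustration rather than FTP or Qt code: `handle_request` and `serve` are hypothetical names, and upper-casing a command string stands in for real request handling.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    """Stand-in for real request handling; also reports which worker ran it."""
    return request.upper(), threading.current_thread().name

def serve(requests, workers=4):
    """Process requests on a fixed pool instead of starting a thread per request."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="worker") as pool:
        # map() preserves the order of the inputs in its results.
        return list(pool.map(handle_request, requests))
```

The threads are created once and reused, so no connection pays the thread start-up cost, and the pool size caps how many requests run at the same time.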

Lightweight Threads in Operating Systems

It is said that one of the main benefits of Node (and presumably Twisted et al.) over more conventional threaded servers is the very high concurrency enabled by the event loop model. The biggest reason for this is that each thread has a high memory footprint and swapping contexts is comparatively expensive. When you have thousands of threads the server spends most of its time swapping from thread to thread.
My question is, why don't operating systems or the underlying hardware support much more lightweight threads? If they did, could you solve the 10k problem with plain threads? If they can't, why is that?
Modern operating systems can support the execution of a very large number of threads.
More generally, hardware keeps getting faster (and recently, it has been getting faster in a way that is much friendlier to multithreading and multiprocessing than to single-threaded event loops - ie, increased number of cores, rather than increased processing throughput capabilities in a single core). If you can't afford the overhead of a thread today, you can probably afford it tomorrow.
What the cooperative multitasking systems of Twisted (and presumably Node.js et al.) offer over pre-emptive multithreading (at least in the form of pthreads) is ease of programming.
Correctly using multithreading involves being much more careful than correctly using a single thread. An event loop is just the means of getting multiple things done without going beyond your single thread.
Considering the proliferation of parallel hardware, it would be ideal for multithreading or multiprocessing to get easier to do (and easier to do correctly). Actors, message passing, maybe even petri nets are some of the solutions people have attempted to solve this problem. They are still very marginal compared to the mainstream multithreading approach (pthreads). Another approach is SEDA, which uses multiple threads to run multiple event loops. This also hasn't caught on.
So, the people using event loops have probably decided that programmer time is worth more than CPU time, and the people using pthreads have probably decided the opposite, and the people exploring actors and such would like to value both kinds of time more highly (clearly insane, which is probably why no one listens to them).
The issue isn't really how heavyweight the threads are but the fact that to write correct multithreaded code you need locks on shared items and that prevents it from scaling with the number of threads because threads end up waiting for each other to gain locks and you rapidly reach the point where adding additional threads has no effect or even slows the system down as you get more lock contention.
In many cases you can avoid locking, but it's very difficult to get right, and sometimes you simply need a lock.
So if you are limited to a small number of threads, you might well find that removing the overhead of having to lock resources at all, or even think about it, makes a single threaded program faster than a multithreaded program no matter how many threads you add.
Basically locks can (depending on your program) be really expensive and can stop your program scaling beyond a few threads. And you almost always need to lock something.
It's not the overhead of a thread that's the problem, it's the synchronization between the threads. Even if you could switch between threads instantly and had infinite memory, none of that helps if each thread just ends up waiting in a queue for its turn at some shared resource.
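The contrast between locking a shared item and avoiding sharing altogether can be sketched as follows. Both hypothetical functions compute the same total; the first takes a lock on every increment, while the second lets each thread accumulate privately and publish one result at the end, so the threads never wait on each other.

```python
import threading

def count_with_shared_lock(n_threads, per_thread):
    """Every increment takes the same lock, so threads queue up behind each other."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(per_thread):
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def count_with_private_totals(n_threads, per_thread):
    """Each thread counts privately and writes once to its own slot: no lock needed."""
    partials = [0] * n_threads

    def work(i):
        local = 0
        for _ in range(per_thread):
            local += 1
        partials[i] = local           # single write, and no other thread touches slot i

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

Timing the two with a large `per_thread` shows how much of the run is spent acquiring the lock; the exact ratio depends on the interpreter and hardware, so none is claimed here.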

Concurrency: Processes vs Threads

What are the main advantages of using a model for concurrency based on processes over one based on threads and in what contexts is the latter appropriate?
Fault-tolerance and scalability are the main advantages of using Processes vs. Threads.
A system that relies on shared memory or some other kind of technology that is only available when using threads, will be useless when you want to run the system on multiple machines. Sooner or later you will need to communicate between different processes.
When using processes you are forced to deal with communication via messages, for example, this is the way Erlang handles communication. Data is not shared, so there is no risk of data corruption.
Another advantage of processes is that they can crash and you can feel relatively safe in the knowledge that you can just restart them (even across network hosts). However, if a thread crashes, it may crash the entire process, which may bring down your entire application. To illustrate: If an Erlang process crashes, you will only lose that phone call, or that webrequest, etc. Not the whole application.
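Both points, message passing and crash isolation, can be demonstrated with ordinary OS processes. The sketch below uses Python's `subprocess` module; `run_worker`, `ECHO_WORKER` and `CRASHING_WORKER` are names invented for this example, and the "message" is simply text sent over the child's stdin and read back from its stdout. It says nothing about how Erlang implements its own lightweight processes.

```python
import subprocess
import sys

# Child programs, given as source text so the example is self-contained.
ECHO_WORKER = "import sys; print(sys.stdin.read().upper(), end='')"
CRASHING_WORKER = "import os; os.abort()"      # simulates a hard crash in the child

def run_worker(source, message=""):
    """Run `source` in a separate OS process, passing `message` on its stdin."""
    return subprocess.run(
        [sys.executable, "-c", source],
        input=message,
        capture_output=True,
        text=True,
        timeout=30,
    )
```

After `run_worker(CRASHING_WORKER)` returns, the parent is still alive and sees only a non-zero `returncode`, so it can decide to restart the worker; a comparable crash inside a thread would take the whole process down with it.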
In saying all this, OS processes also have many drawbacks that can make them harder to use, like the fact that it takes forever to spawn a new process. However, Erlang has its own notion of processes, which are extremely lightweight.
With that said, this discussion is really a topic of research. If you want to get into more of the details, you can give Joe Armstrong's paper on fault-tolerant systems a read; it explains a lot about Erlang and the philosophy that drives it.
The disadvantage of using a process-based model is that it will be slower. You will have to copy data between the concurrent parts of your program.
The disadvantage of using a thread-based model is that you will probably get it wrong. It may sound mean, but it's true-- show me code based on threads and I'll show you a bug. I've found bugs in threaded code that has run "correctly" for 10 years.
The advantages of using a process-based model are numerous. The separation forces you to think in terms of protocols and formal communication patterns, which means it's far more likely that you will get it right. Processes communicating with each other are easier to scale out across multiple machines. Multiple concurrent processes allow one process to crash without necessarily crashing the others.
The advantage of using a thread-based model is that it is fast.
It may be obvious which of the two I prefer, but in case it isn't: processes, every day of the week and twice on Sunday. Threads are too hard: I haven't ever met anybody who could write correct multi-threaded code; those that claim to be able to usually don't know enough about the space yet.
In this case processes are more independent of each other, while threads share some resources, e.g. memory. But in the general case threads are more lightweight than processes.
Erlang processes are not the same thing as OS processes. Erlang processes are very lightweight, and Erlang can run many Erlang processes within the same OS thread. See "Technically, why are processes in Erlang more efficient than OS threads?"
First and foremost, processes differ from threads mostly in the way their memory is handled:
Process = n*Thread + memory region (n>=1)
Processes have their own isolated memory.
Processes can have multiple threads.
Processes are isolated from each other on the operating system level.
Threads share their memory with their peers in the process.
(This is often undesirable. There are libraries and methods out there to remedy this, but that is usually an artificial layer over operating system threads.)
The memory thing is the most important discerning factor, as it has certain implications:
Exchanging data between processes is slower than between threads. Breaking the process isolation always requires some involvement of kernel calls and memory remapping.
Threads are more lightweight than processes. The operating system has to allocate resources and do memory management for each process.
Using processes gives you memory isolation and synchronization. Common problems with access to memory shared between threads do not concern you. Since you have to make a special effort to share data between processes, you will most likely sync automatically with that.
Using processes gives you good (or ultimate) encapsulation. Since inter process communication needs special effort, you will be forced to define a clean interface. It is a good idea to break certain parts of your application out of the main executable. Maybe you can split dependencies like that.
e.g. Process_RobotAi <-> Process_RobotControl
The AI will have vastly different dependencies compared to the control component. The interface might be simple: Process_RobotAI --DriveXY--> Process_RobotControl.
Maybe you change the robot platform. You only have to implement a new RobotControl executable with that simple interface. You don't have to touch or even recompile anything in your AI component.
It will also, for the same reasons, speed up compilation in most cases.
Edit: Just for completeness I will shamelessly add what the others have reminded me of :
A crashing process does not (necessarily) crash your whole application.
In General:
1. Want to create something highly concurrent or synchronous, like an algorithm with n>>1 instances running in parallel and sharing data? Use threads.
2. Have a system with multiple components that do not need to share data or algorithms and do not exchange data too often? Use processes. If you use an RPC library for the inter-process communication, you get a network-distributable solution at no extra cost.
1 and 2 are the extreme and no-brainer scenarios, everything in between must be decided individually.
For a good (or awesome) example of a system that uses IPC/RPC heavily, have a look at ROS.

When is multi-threading not a good idea? [closed]

I was recently working on an application that sent and received messages over Ethernet and Serial. I was then tasked to add the monitoring of DIO discretes. I thought,
"No reason to interrupt the main thread which is involved in message processing, I'll just create another thread that monitors DIO."
This decision, however, proved to be poor. Sometimes the main thread would be interrupted between a Send and a Receive serial message. This interruption would disrupt the timing and alas, messages would be lost (forever).
I found another way to monitor the DIO without using another thread and Ethernet and Serial communication were restored to their correct functionality.
The whole fiasco, however, got me thinking. Are there any general guidelines about when not to use multiple threads, and/or does anyone have any more examples of situations where using multiple threads is not a good idea?
EDIT: Based on your comments and after scouring the internet for information, I have composed a blog post entitled "When is multi-threading not a good idea?"
On a single processor machine and a desktop application, you use multi threads so you don't freeze the app but for nothing else really.
On a single processor server and a web based app, no need for multi threading because the web server handles most of it.
On a multi-processor machine and desktop app, you are suggested to use multi threads and parallel programming. Make as many threads as there are processors.
On a multi-processor server and a web based app, no need again for multi threads because the web server handles it.
In short, if you use multiple threads for anything other than keeping a desktop app from freezing, you will make the app slower on a single-core machine, because the threads interrupt each other.
Why? Because of context switches: it takes time for the hardware to switch between threads. On a multi-core box, go ahead and use one thread per core and you will see a large speedup.
To paraphrase an old quote: A programmer had a problem. He thought, "I know, I'll use threads." Now the programmer has two problems. (Often attributed to JWZ, but it seems to predate his use of it talking about regexes.)
A good rule of thumb is "Don't use threads, unless there's a very compelling reason to use threads." Multiple threads are asking for trouble. Try to find a good way to solve the problem without using multiple threads, and only fall back to using threads if avoiding it is as much trouble as the extra effort to use threads. Also, consider switching to multiple threads if you're running on a multi-core/multi-CPU machine, and performance testing of the single threaded version shows that you need the performance of the extra cores.
Multi-threading is a bad idea if:
Several threads access and update the same resource (set a variable, write to a file), and you don't understand thread safety.
Several threads interact with each other and you don't understand mutexes and similar thread-management tools.
Your program uses static variables (threads typically share them by default).
You haven't debugged concurrency issues.
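The point about static variables can be illustrated with a short sketch: module-level (static-like) state is visible to every thread, whereas `threading.local()` gives each thread its own private attributes. The names below are made up for the example, and the lock around the shared list is there to show the discipline that shared state requires.

```python
import threading

shared_log = []                        # module-level state: one copy for all threads
shared_log_lock = threading.Lock()     # shared state needs explicit protection
per_thread = threading.local()         # each thread gets its own attribute namespace

def worker(n):
    per_thread.value = n               # visible only inside this thread
    with shared_log_lock:
        shared_log.append(n)           # visible to every thread

def run(n_threads=4):
    """Return the shared log and whether the calling thread can see `value`."""
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log), hasattr(per_thread, "value")
```

The calling thread ends up seeing every entry in `shared_log` but no `per_thread.value`, because that attribute was only ever set inside the worker threads.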
Actually, multi-threading is not scalable and is hard to debug, so it should not be used if you can avoid it. There are a few cases where it is mandatory: when performance on a multi-CPU machine matters, or when you deal with a server that has a lot of clients taking a long time to answer.
In any other cases, you can use alternatives such as queue + cron jobs or else.
You might want to take a look at Dan Kegel's "The C10K problem" web page about handling multiple data sources/sinks.
Basically it is best to use minimal threads, which in sockets can be done in most OS's w/ some event system (or asynchronously in Windows using IOCP).
When you run into the case where the OS and/or libraries do not offer a way to perform communication in a non-blocking manner, it is best to use a thread-pool to handle them while reporting back to the same event loop.
Example diagram of layout:
Per CPU: [*] EVENTLOOP ------ Handles nonblocking I/O using OS/library utilities
              |       \___ Threadpool for various blocking events
              Threadpool for handling the I/O messages that would take long
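A layout along these lines can be sketched with Python's `asyncio`: the event loop stays in one thread, and calls that would block are handed to a thread pool through `loop.run_in_executor`, with the result reported back to the same loop. Here `blocking_lookup` is a hypothetical stand-in for a blocking library or database call, and the 0.2 second sleep is an arbitrary choice for the demo.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(key):
    """Stand-in for a call that offers no non-blocking variant."""
    time.sleep(0.2)
    return key[::-1]

async def handle(key, pool):
    loop = asyncio.get_running_loop()
    # The event loop keeps running while a pool thread waits inside blocking_lookup.
    return await loop.run_in_executor(pool, blocking_lookup, key)

async def main(keys):
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        return await asyncio.gather(*(handle(k, pool) for k in keys))

def run_demo(keys):
    """Return the results and the wall-clock time the whole batch took."""
    start = time.perf_counter()
    results = asyncio.run(main(keys))
    return results, time.perf_counter() - start
```

Because the blocking calls overlap in the pool, a batch of four lookups should finish in roughly the time of one rather than four in sequence.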
Multithreading is bad except in the single case where it is good. This is the case when:
The work is CPU bound, or parts of it are CPU bound.
The work is parallelisable.
If either or both of these conditions are missing, multithreading is not going to be a winning strategy.
If the work is not CPU bound, then you are waiting not on threads to finish work, but rather on some external event, such as network activity, for the process to complete its work. Using threads, there is the additional cost of context switches between threads, the cost of synchronization (mutexes, etc.), and the irregularity of thread preemption. The alternative in most common use is asynchronous I/O, in which a single thread listens to several I/O ports and acts on whichever happens to be ready now, one at a time. If by some chance these slow channels all happen to become ready at the same time, it might seem like you will experience a slow-down, but in practice this is rarely true. The cost of handling each port individually is often comparable to or lower than the cost of synchronizing state across multiple threads as each channel is emptied.
Many tasks may be compute bound, but still not practical to use a multithreaded approach because the process must synchronise on the entire state. Such a program cannot benefit from multithreading because no work can be performed concurrently. Fortunately, most programs that require enormous amounts of CPU can be parallelized to some level.
Multi-threading is not a good idea if you need to guarantee precise physical timing (like in your example). Other cons include intensive data exchange between threads. I would say multi-threading is good for really parallel tasks if you don't care much about their relative speed/priority/timing.
A recent application I wrote that had to use multithreading (although not an unbounded number of threads) was one where I had to communicate in several directions over two protocols, plus monitor a third resource for changes. Both protocol libraries required a thread to run their respective event loops in, and once those were accounted for, it was easy to create a third loop for the resource monitoring. In addition to the event loop requirements, the messages going through the wires had strict timing requirements, and one loop could not be allowed to block another, a risk that was further reduced by using a multicore CPU (SPARC).
There were further discussions on whether each message processing should be considered a job that was given to a thread from a thread pool, but in the end that was an extension that wasn't worth the work.
All-in-all, threads should if possible only be considered when you can partition the work into well defined jobs (or series of jobs) such that the semantics are relatively easy to document and implement, and you can put an upper bound on the number of threads you use and that need to interact. Systems where this is best applied are almost message passing systems.
In principle, every time there is no overhead for the caller to wait in a queue.
A couple more possible reasons to use threads:
Your platform lacks asynchronous I/O operations, e.g. Windows ME (no completion ports or overlapped I/O, a pain when porting XP applications that use them) or Java 1.3 and earlier.
A third-party library function that can hang, e.g. if a remote server is down, and the library provides no way to cancel the operation and you can't modify it.
Keeping a GUI responsive during intensive processing doesn't always require additional threads. A single callback function is usually sufficient.
If none of the above apply and I still want parallelism for some reason, I prefer to launch an independent process if possible.
I would say multi-threading is generally used to:
Allow data processing in the background while a GUI remains responsive
Split very big data analysis onto multiple processing units so that you can get your results quicker.
When you're receiving data from some hardware and need something to continuously add it to a buffer while some other element decides what to do with it (write to disk, display on a GUI etc.).
So if you're not solving one of those issues, it's unlikely that adding threads will make your life easier. In fact it'll almost certainly make it harder because, as others have mentioned, debugging multithreaded applications is considerably more work than debugging a single-threaded solution.
Security might be a reason to avoid using multiple threads (in favor of multiple processes). See Google Chrome for an example of multi-process safety features.
Multi-threading is scalable, and will allow your UI to maintain its responsiveness while doing very complicated things in the background. I don't understand where other responses are acquiring their information on multi-threading.
When you shouldn't multi-thread is a misleading question for your problem. Your problem is this: Why did multi-threading my application cause serial / ethernet communications to fail?
The answer to that question will depend on the implementation, which should be discussed in another question. I know for a fact that you can have both ethernet and serial communications happening in a multi-threaded application at the same time as numerous other tasks without causing any data loss.
The one reason to not use multi-threading is:
There is one task, and no user interface with which the task will interfere.
The reasons to use multi-threading are:
Provides superior responsiveness to the user
Performs multiple tasks at the same time to decrease overall execution time
Uses more of the current multi-core CPUs, and multi-multi-cores of the future.
There are three basic methods of multi-threaded programming that make thread safety easy to implement; you only need to use one for success:
Thread Safe Data types passed between threads.
Thread Safe Methods in the threaded object to modify data passed between.
PostMessage capabilities to communicate between threads.
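The first of these methods, a thread-safe data type passed between threads, might look like the following in Python, where the standard `queue.Queue` does all of the locking internally. The producer and consumer functions and the `_STOP` sentinel are hypothetical names chosen for this sketch.

```python
import queue
import threading

_STOP = object()                       # sentinel telling the consumer to finish

def producer(q, items):
    for item in items:
        q.put(item)                    # blocks if the queue is full (back-pressure)
    q.put(_STOP)

def consumer(q, out):
    while True:
        item = q.get()                 # blocks until something is available
        if item is _STOP:
            break
        out.append(item * 2)           # only this thread ever touches `out`

def run_pipeline(items):
    """Push items through a bounded queue from one thread to another."""
    q = queue.Queue(maxsize=8)
    out = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Neither function contains an explicit lock: the queue is the only object both threads touch, which keeps the thread-safety reasoning in one place.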
Are the processes parallel? Is performance a real concern? Are there multiple 'threads' of execution like on a web server? I don't think there is a finite answer.
A common source of threading issues is the usual approaches employed to synchronize data. Having threads share state and then implement locking at all the appropriate places is a major source of complexity for both design and debugging. Getting the locking right to balance stability, performance, and scalability is always a hard problem to solve. Even the most experienced experts get it wrong frequently. Alternative techniques to deal with threading can alleviate much of this complexity. The Clojure programming language implements several interesting techniques for dealing with concurrency.

Resources