Real-life examples of async non-blocking loop vs. multithreading?

I have been developing in Node.js recently and have a good idea of how the event loop works. Given my experience with JavaScript, Node made sense for me to use, but I wonder: has anyone ever stopped using a multithreaded system and gone to async for PERFORMANCE? Or chosen async over multithreading for performance?
What are real-life examples of async non-blocking I/O triumphing over multithreading in the real world?

Whether or not you use multithreading, I/O operations on all modern operating systems are inherently asynchronous. So saying that "async is faster" is not actually true. However, async programming lets you conserve threads with almost the same performance. That's important on a server because the number of threads you can create is finite (limited by available memory).
So with async you don't necessarily get better performance, but you do get better scalability on I/O-heavy tasks. Async can even perform worse on CPU-intensive tasks, since you don't benefit from parallelizing work across server cores. I/O, on the other hand, works with interrupts and DMA, which largely bypass the CPU. That's how you get good-enough performance: you can keep executing until the hardware signals I/O completion with an interrupt.
Edit: I just figured out that I did not answer your actual question. But as long as you know how async benefits you, you may not need real world examples on that. Just know that:
communicating with database is I/O
reading from or writing to disk is I/O
sending/receiving packets over network is I/O
the rest is CPU
Depending on whether your workload is dominated by I/O or by CPU, async could give you great scalability or worse performance. Web applications are typically I/O intensive, so they benefit from async programming.
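To make the I/O-bound case concrete, here is a minimal sketch in Python's asyncio. It assumes `asyncio.sleep` is an acceptable stand-in for a database or network call (no real I/O happens), and the names `fake_io_request` and `serve_many` are made up for illustration. It shows many simulated waits overlapping on a single thread.

```python
import asyncio
import threading
import time


async def fake_io_request(request_id: int, delay: float) -> tuple[int, int]:
    """Pretend to wait on a database/network call (asyncio.sleep is a stand-in)."""
    await asyncio.sleep(delay)  # the thread is free to run other tasks here
    return request_id, threading.get_ident()


async def serve_many(n_requests: int, delay: float) -> tuple[float, set[int]]:
    """Run n_requests simulated I/O waits concurrently on one event loop."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_io_request(i, delay) for i in range(n_requests))
    )
    elapsed = time.perf_counter() - start
    thread_ids = {thread_id for _, thread_id in results}
    return elapsed, thread_ids


if __name__ == "__main__":
    elapsed, thread_ids = asyncio.run(serve_many(1000, 0.1))
    print(f"1000 waits of 0.1 s took {elapsed:.2f} s on {len(thread_ids)} thread(s)")
```

The waits overlap because nothing in the handler holds the thread while waiting. Replace the sleep with a CPU-heavy loop and the total time would grow with the number of requests instead.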

It is easier to code, because you do not need to manage consistency and data exchange between multiple processes/threads.
This model offers no real advantage for a simple web application, but it does for a server managing many connections at the same time, where the data being passed around is somehow tied together.
Performance-wise there are some advantages, because multiple threads/processes tend to use far more memory than a single thread.
Simple math: with one thread per connection and each thread needing 1 MB of RAM, it is easy to figure out how many connections one computer could handle. With Node.js you have one process handling multiple connections; it takes, for example, 20 MB of RAM plus 4 KB per new connection. I think you get the idea.
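Spelling the arithmetic out, as a rough sketch: the 1 MB, 20 MB and 4 KB figures are the example numbers above, while the 1 GiB of available RAM and the function names are assumptions made for illustration. Real servers also spend memory on the OS, kernel socket buffers and so on, which this ignores.

```python
MB = 1024 * 1024
KB = 1024


def max_connections_threaded(ram_bytes: int, per_thread_bytes: int = 1 * MB) -> int:
    """One thread per connection: RAM divided by the per-thread cost."""
    return ram_bytes // per_thread_bytes


def max_connections_event_loop(
    ram_bytes: int, base_bytes: int = 20 * MB, per_conn_bytes: int = 4 * KB
) -> int:
    """One process: a fixed base cost, then a small cost per connection."""
    return max(0, (ram_bytes - base_bytes) // per_conn_bytes)


if __name__ == "__main__":
    ram = 1024 * MB  # assumed: 1 GiB available to the server
    print(max_connections_threaded(ram))    # 1024
    print(max_connections_event_loop(ram))  # 257024
```

Under these assumed numbers the single-process model fits roughly 250 times as many connections into the same memory.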
Async I/O is not faster, but it helps you consume fewer resources while doing the same thing multiple times at the same time.


What is I/O Scaling Problem in nodeJS Performance?

Can you please explain what is the I/O Scaling Problem in Node.js Performance?
I am reading the book "Basarat Ali Syed - Beginning Node.js-Apress", but its explanation of I/O scaling is not enough for me.
A server typically has a mix of computational things to do and I/O things to do (getting data from somewhere like a database or a disk or another server). In today's modern servers with pretty fast multi-core processors, it is more common for a given server request to be limited by I/O than by CPU.
So, if you're going to scale a server to handle lots of requests with good performance, you have to find a way to handle lots of I/O requests as efficiently as possible, because that's probably what your server is limited by. This is the "I/O scaling problem": how to scale your server and code architecture to handle lots of I/O requests very efficiently.
It so happens that the node.js single-threaded architecture with asynchronous I/O is very efficient at doing lots of I/O and can be more efficient than other server architectures that use multiple threads and blocking I/O calls.
If you go to your Table of Contents in that book, you will see the following:
Understanding Node.js Performance
The I/O Scaling Problem
Traditional Web Servers Using a Process Per Request
Traditional Web Servers Using a Thread Pool
The Nginx way
Node.js Performance Secret
I don't have the book myself, but I would presume that "The I/O Scaling Problem" section of the book describes it for you. And, then you can read about the node.js performance secret for how it handles this. The servers that use a process or a thread per request take more system resources to have lots of requests in flight at the same time (which is one key to handle lots of requests). The node.js non-blocking I/O model, on the other hand, is very efficient at handling lots and lots of in-flight requests.

Why would I choose a threaded/process-based approach vs. asynchronous web server

As I've done some more research into web server software, I've begun to question whether Apache's thread/process-based method is the way to go vs. the asynchronous request handling provided by servers like Nginx and Lighttpd, which tend to scale better under heavier loads.
I understand there are many other differences between these latter two and Apache. My question is under what circumstances would I pick a thread/process based method over the asynchronous handling.
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.
This is old, but worth answering. Let's first start by saying how each model works.
In threaded, you have a request come in to a handler, the handler spawns a new OS thread to handle that request, and all work for that request happens in that thread until a response is sent and the thread is ended. This model supports as many concurrent requests as threads that your server can spawn (but threads can be somewhat heavyweight).
When doing async a request comes in to a handler but instead of creating a thread to deal with it, it adds the connection to what's known as an event loop. The event loop listens for data/state changes on the connection and fires callbacks each time "something" happens. Once the connection is added to the event loop, the handler immediately listens for new connections to add. This allows you to have many (sometimes 100K) concurrent connections at the same time.
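The event-loop model can be sketched with Python's standard `selectors` module. This is an illustrative toy, not how any particular server is implemented: `run_event_loop` and `demo` are hypothetical names, and `socket.socketpair()` stands in for accepted client connections so the example needs no listening port. Each connection is registered once with a callback, and a single thread then fires that callback whenever "something" happens on it.

```python
import selectors
import socket


def run_event_loop(connections: list[socket.socket]) -> None:
    """Serve every connection from one thread until all peers have closed."""
    selector = selectors.DefaultSelector()  # epoll/kqueue/select, per platform

    def on_readable(conn: socket.socket) -> None:
        data = conn.recv(4096)  # does not block: the selector reported readiness
        if data:
            conn.sendall(data.upper())  # tiny reply, fits in the send buffer
        else:  # empty read means the peer closed its side
            selector.unregister(conn)
            conn.close()

    for conn in connections:
        conn.setblocking(False)
        selector.register(conn, selectors.EVENT_READ, data=on_readable)

    while selector.get_map():  # loop while any connection is still registered
        for key, _mask in selector.select():
            key.data(key.fileobj)  # fire the callback stored at registration
    selector.close()


def demo(n_clients: int = 3) -> list[bytes]:
    """Simulate n_clients that each send one message, then read the reply."""
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"hello from {i}".encode())
        client.shutdown(socket.SHUT_WR)  # signal that no more data is coming
    run_event_loop([server for _client, server in pairs])
    replies = [client.recv(4096) for client, _server in pairs]
    for client, _server in pairs:
        client.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Note there is no thread per connection anywhere: the per-connection cost is one registration in the selector plus whatever state the callback keeps.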
Are there any features/technologies that I can't use with an asynchronous method (or would function poorly/not as well)?
Yes, when you're doing number crunching. The architecture of an async (or "evented") system is such that it is great at passing data around but not processing data. It can handle thousands of concurrent operations, but because it only runs on one OS thread, the callbacks it fires need to do as little as possible to get the most throughput. This is because if one of your callbacks does some number crunching that takes 5 seconds, your entire server is frozen for 5 seconds until that operation completes. The idea is to get data, send it to where it's going (database, API, etc) and send a response all with minimal processing.
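The "frozen server" effect is easy to reproduce. The sketch below uses Python's asyncio and scales the 5 seconds down to 0.2 seconds; the names `heartbeat`, `crunch_numbers` and `largest_gap` are hypothetical, and the busy loop is only a stand-in for real number crunching. A heartbeat task that should tick every 10 ms plays the role of all the other requests.

```python
import asyncio
import time


async def heartbeat(stamps: list[float], interval: float, count: int) -> None:
    """Record a timestamp every `interval` seconds, like a cheap request."""
    for _ in range(count):
        stamps.append(time.perf_counter())
        await asyncio.sleep(interval)


def crunch_numbers(seconds: float) -> None:
    """Stand-in for CPU-bound work: spin without ever yielding to the loop."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


async def largest_gap(block_for: float) -> float:
    """Return the longest pause between two heartbeats, in seconds."""
    stamps: list[float] = []
    beat = asyncio.create_task(heartbeat(stamps, interval=0.01, count=10))
    await asyncio.sleep(0.03)   # a few heartbeats run normally
    crunch_numbers(block_for)   # nothing else can run during this call
    await beat
    return max(later - earlier for earlier, later in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    gap = asyncio.run(largest_gap(0.2))
    print(f"longest heartbeat gap: {gap:.3f} s (expected interval: 0.010 s)")
```

The longest gap comes out at least as long as the crunching call, because the heartbeat shares the one thread with it.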
Async is good for network I/O: passing data between multiple sources/destinations (and also user interfaces, but that's beyond this post).
What situations would cause the performance of an asynchronous method to perform worse than a thread/process based approach? Are these common or rare cases, and how big is the difference?
See above, but any time you're doing more CPU work than network I/O, you should switch to a threaded model. However, there are architecture workarounds...for instance, you could have an async app, and anytime it needs to do real work, it sends a job to a worker queue. However, if every request requires CPU processing then that architecture is overkill and you might as well just use a threaded server.
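A minimal sketch of that workaround in Python's asyncio, with the caveat that everything here is illustrative: `crunch` and `handle_request` are made-up names, and an in-process thread pool stands in for the worker queue. In CPython the GIL keeps threads from running Python bytecode in parallel, so for real CPU work you would more likely hand the job to a process pool or an external job queue; the point shown is only that the event loop keeps running while the job is elsewhere.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def crunch(n: int) -> int:
    """Stand-in for 'real work': a deliberately slow pure-Python sum of squares."""
    return sum(i * i for i in range(n))


async def handle_request(n: int, pool: ThreadPoolExecutor) -> tuple[int, int]:
    """Offload crunch() to the pool; count how often the loop ran meanwhile."""
    loop = asyncio.get_running_loop()
    ticks = 0
    done = asyncio.Event()

    async def heartbeat() -> None:
        nonlocal ticks
        while not done.is_set():
            ticks += 1  # proof that the event loop is still being serviced
            await asyncio.sleep(0.005)

    beat = asyncio.create_task(heartbeat())
    result = await loop.run_in_executor(pool, crunch, n)  # loop stays free
    done.set()
    await beat
    return result, ticks


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        result, ticks = asyncio.run(handle_request(2_000_000, pool))
    print(f"result={result}, event loop ticked {ticks} time(s) during the job")
```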
Are there any other factors I should take into consideration when comparing the two? Keep in mind I'm focusing mainly on the thread/process based method vs. asynchronous, not any particular server software which happens to utilize one of these methods. These concerns might be difficulty of managing/debugging, security issues, etc.
Programming in async is generally more complicated than threaded. That said, if you're not doing the programming yourself (i.e. you're choosing between nginx and apache) then I usually recommend you go async (nginx) because you'll generally be able to squeeze more juice out of your server that way. I'm always in favor of using as much async in the stack as possible.
That said, if you're programming an app and trying to decide whether to use a threaded or async model, you will have to take developer time into account. Unless you're using a language that has green threads over an event loop (like Scheme), expect to tear your hair out quite a bit over rogue exceptions crashing your entire app, and in general over wrapping your head around continuation-passing style (CPS) and using callbacks for everything. Futures/promises are your friend, but they are only a band-aid to make async nicer.
Async, when used in a server, can squeeze (a lot) more concurrent operations than threading if you're doing network IO and nothing else.
If you're doing any kind of number crunching, either use a threaded app server or use an async app with a background queuing system.
Async is a lot harder to program in unless your language supports "fake" threading over it (ie green threads). Once you get past the initial hump you're fine, generally. If you don't have green threads, use promises.
If you have the choice between threaded and async as a component in your stack (apache vs nginx), and they provide the exact same features, slightly favor async. Don't just pick it because you think it will make everything 20x faster though.
Processes have several advantages compared to threads and async models related to security and reliability. Most websites don't need these particular advantages, but sometimes they're indispensable.
Security: you can run your worker processes in a sandbox, as a low privileged user, and handle only one request per worker process. This mitigates against some kinds of security vulnerabilities: even if an attacker takes over your entire worker process, as long as you sandboxed it tightly based on request metadata (i.e. it doesn't have write access to all your data), then it can't harm system stability or affect the responses made to requests.
Security #2: sometimes you need to sandbox untrusted code, or to enforce segregation between different code or different requests, and the only way to do this is with a separate one-shot process. (Think running user-provided code.)
Reliability: memory leaks and memory corruption are much less severe if you teardown and replace worker processes regularly (or for each request).
It's easy to enforce hard limits on CPU time, disk and network quota, etc. spent on handling a user request in a separate process. Even if the request-handling code goes into an infinite loop, the master process (or the OS) can enforce a timeout.
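As a small illustration of the timeout point, here is a sketch using Python's `subprocess` module. The helper name `run_with_time_limit` is hypothetical, and it enforces only a wall-clock limit; sandboxing, reduced privileges and CPU/disk/network quotas are separate OS-level mechanisms that are not shown.

```python
import subprocess
import sys


def run_with_time_limit(code: str, limit_seconds: float) -> str:
    """Run `code` in a separate Python process with a hard wall-clock limit."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=limit_seconds,  # on expiry the child is killed
        )
    except subprocess.TimeoutExpired:
        return "killed: time limit exceeded"
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_with_time_limit("print(6 * 7)", 10))
    print(run_with_time_limit("while True: pass", 0.5))
```

The infinite loop never gets a chance to stall the parent: the parent simply stops waiting and the child process is discarded.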

Redis is single-threaded, then how does it do concurrent I/O?

Trying to grasp some basics of Redis I came across an interesting blog post.
The author states:
Redis is single-threaded with epoll/kqueue and scale indefinitely in terms of I/O concurrency.
I surely misunderstand the whole threading thing, because I find this statement puzzling. If a program is single-threaded, how does it do anything concurrently? Why is it so great that Redis operations are atomic, if the server is single-threaded anyway?
Could anybody please shed some light on the issue?
Well it depends on how you define concurrency.
In server-side software, concurrency and parallelism are often considered as different concepts. In a server, supporting concurrent I/Os means the server is able to serve several clients by executing several flows corresponding to those clients with only one computation unit. In this context, parallelism would mean the server is able to perform several things at the same time (with multiple computation units), which is different.
For instance a bartender is able to look after several customers while he can only prepare one beverage at a time. So he can provide concurrency without parallelism.
This question has been debated here:
What is the difference between concurrency and parallelism?
See also this presentation from Rob Pike.
A single-threaded program can definitely provide concurrency at the I/O level by using an I/O (de)multiplexing mechanism and an event loop (which is what Redis does).
Parallelism has a cost: with the multiple sockets/multiple cores you can find on modern hardware, synchronization between threads is extremely expensive. On the other hand, the bottleneck of an efficient storage engine like Redis is very often the network, well before the CPU. Isolated event loops (which require no synchronization) are therefore seen as a good design to build efficient, scalable, servers.
The fact that Redis operations are atomic is simply a consequence of the single-threaded event loop. The interesting point is atomicity is provided at no extra cost (it does not require synchronization). It can be exploited by the user to implement optimistic locking and other patterns without paying for the synchronization overhead.
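The same effect can be shown with any single-threaded event loop. The sketch below is a Python asyncio analogy, not Redis code: `Counter`, `incr` and `many_clients` are made-up names. Many interleaved "clients" increment a shared counter with no lock, and no update is lost, because a handler is never interrupted between its read and its write.

```python
import asyncio


class Counter:
    """Shared state touched by every client, deliberately without a lock."""

    def __init__(self) -> None:
        self.value = 0

    async def incr(self) -> int:
        # There is no await between the read and the write, so no other task
        # can run in between: the read-modify-write is effectively atomic.
        current = self.value
        self.value = current + 1
        return self.value


async def many_clients(n_clients: int, incrs_each: int) -> int:
    """Run n_clients interleaved clients and return the final counter value."""
    counter = Counter()

    async def client() -> None:
        for _ in range(incrs_each):
            await counter.incr()
            await asyncio.sleep(0)  # yield so that the clients interleave

    await asyncio.gather(*(client() for _ in range(n_clients)))
    return counter.value


if __name__ == "__main__":
    print(asyncio.run(many_clients(50, 20)))  # 1000
```

With preemptive threads, the same read-then-write sequence would need a lock or an atomic instruction to be safe.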
OK, Redis is single-threaded at user-level, OTOH, all asynchronous I/O is supported by kernel thread pools and/or split-level drivers.
'Concurrent', to some, includes distributing network events to socket state-machines. It's single-threaded and runs on one core (at user level), so I would not refer to this as concurrent. Others differ.
'scale indefinitely in terms of I/O concurrency' is just being economical with the truth. They may get more belief if they said 'can scale better than one-thread-per-client, providing the clients don't ask for much', though they may then feel obliged to add 'blown away on heavy loading by other async solutions that use all cores at user level'.

Why are event-based network applications inherently faster than threaded ones?

We've all read the benchmarks and know the facts - event-based asynchronous network servers are faster than their threaded counterparts. Think lighttpd or Zeus vs. Apache or IIS. Why is that?
I think event-based vs. thread-based is not the real question; it is a non-blocking, multiplexed I/O solution (selectable sockets) vs. a thread-pool solution.
In the first case you handle all incoming input regardless of what is using it, so there is no blocking on reads: a single 'listener'. The single listener thread passes data to what can be worker threads of different types, rather than one for each connection. Again, there is no blocking on writing any of the data, so the data handler can run with it separately. Because this solution is mostly I/O reads/writes, it doesn't occupy much CPU time, so your application can use that time to do whatever it wants.
In a thread-pool solution you have individual threads handling each connection, so they have to share time and context switch in and out, each one 'listening'. In this solution the CPU and I/O operations are in the same thread, which gets a time slice, so you end up waiting (blocking) per thread on I/O operations that could traditionally be done without using CPU time.
Search for non-blocking I/O for more detail; you can probably find some comparisons vs. thread pools too.
(if anyone can clarify these points, feel free)
Event-driven applications are not inherently faster.
From Why Events Are a Bad Idea (for High-Concurrency Servers):
We examine the claimed strengths of events over threads and show that the
weaknesses of threads are artifacts of specific threading implementations
and not inherent to the threading paradigm. As evidence, we present a
user-level thread package that scales to 100,000 threads and achieves
excellent performance in a web server.
This was in 2003. Surely the state of threading on modern OSs has improved since then.
Writing the core of an event-based server means re-inventing cooperative multitasking (Windows 3.1 style) in your code, most likely on an OS that already supports proper pre-emptive multitasking, and without the benefit of transparent context switching. This means that you have to manage state on the heap that would normally be implied by the instruction pointer or stored in a stack variable. (If your language has them, closures ease this pain significantly. Trying to do this in C is a lot less fun.)
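To illustrate the point about state, here is a small sketch in Python (all names are hypothetical) of the same task written both ways: splitting an incoming byte stream into lines. In the event-style version the half-received line must live in an object between callbacks; in the thread-style version it is just a local variable, kept alive by the stack while the thread blocks.

```python
from typing import Callable


class LineAssembler:
    """Event style: per-connection state kept on the heap between callbacks."""

    def __init__(self) -> None:
        self.buffer = b""  # partial line received so far
        self.lines: list[bytes] = []

    def on_data(self, chunk: bytes) -> None:
        """Handler the event loop would call whenever new bytes arrive."""
        self.buffer += chunk
        while b"\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\n", 1)
            self.lines.append(line)


def read_lines_blocking(recv: Callable[[], bytes]) -> list[bytes]:
    """Thread style: the partial line is an ordinary local variable."""
    buffer = b""
    lines: list[bytes] = []
    while chunk := recv():  # a real recv() would block here; b"" ends the loop
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            lines.append(line)
    return lines


if __name__ == "__main__":
    chunks = [b"GET /a\nGE", b"T /b\n", b"partial"]

    assembler = LineAssembler()
    for chunk in chunks:
        assembler.on_data(chunk)
    print(assembler.lines, assembler.buffer)

    feed = iter(chunks + [b""])
    print(read_lines_blocking(lambda: next(feed)))
```

For two fields this is painless; the cost shows up when a protocol has many steps and every "where was I?" has to become an explicit field.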
This also means you gain all of the caveats cooperative multitasking implies. If one of your event handlers takes a while to run for any reason, it stalls that event thread. Totally unrelated requests lag. Even lengthy CPU-intensive operations have to be sent somewhere else to avoid this. When you're talking about the core of a high-concurrency server, 'lengthy operation' is a relative term, on the order of microseconds for a server expected to handle 100,000 requests per second. I hope the virtual memory system never has to pull pages from disk for you!
Getting good performance from an event-based architecture can be tricky, especially when you consider latency and not just throughput. (Of course, there are plenty of mistakes you can make with threads as well. Concurrency is still hard.)
A couple of important questions for the author of a new server application:
How do threads perform on the platforms you intend to support today? Are they going to be your bottleneck?
If you're still stuck with a bad thread implementation: why is nobody fixing this?
It really depends on what you're doing; event-based programming is certainly tricky for nontrivial applications. Being a web server is really a fairly trivial, well-understood problem, and both event-driven and threaded models work pretty well on modern OSs.
Correctly developing more complex server applications in an event model is generally pretty tricky - threaded applications are much easier to write. This may be the deciding factor rather than performance.
It isn't really about the threads. It is about the way the threads are used to service requests. Something like lighttpd has a single thread that services multiple connections via events. Older versions of Apache used a process per connection, and each process woke up on incoming data, so you ended up with a very large number of processes when there were lots of requests. Now, however, Apache can be event-based as well: see the Apache event MPM.

When to use asynchronous operations in asio

When should I use asynchronous operations in boost::asio instead of synchronous operations in separate threads?
Does the Rationale section help?
Most programs interact with the outside world in some way, whether it be via a file, a network, a serial cable, or the console. Sometimes, as is the case with networking, individual I/O operations can take a long time to complete. This poses particular challenges to application development.
Boost.Asio provides the tools to manage these long running operations, without requiring programs to use concurrency models based on threads and explicit locking.
I would strongly urge you to use an asynchronous approach whenever possible. An asynchronous call doesn't necessarily create a thread, so by sticking with asynchronous operations you may reduce the overhead associated with threads. In addition, threaded code is usually harder to develop and maintain.
Hope it helps.
