How does event-driven programming help a webserver that only does IO? - multithreading

I'm considering a few frameworks/programming methods for our new backend project. It regards a BackendForFrontend implementation, which aggregates downstream services. For simplicity, these are the steps it goes trough:
Request comes into the webserver
Webserver makes downstream request
Downstream request returns result
Webserver returns request
How is event-driven programming better than "regular" thread-per-request handling? Some websites try to explain, and it often comes down to something like this:
The second solution is a non-blocking call. Instead of waiting for the answer, the caller continues execution, but provides a callback that will be executed once data arrives.
What I don't understand: we need a thread/handler to await this data, right? Its nice that the event handler can continue, but we still need (in this example) a thread/handler per request that awaits each downstream request, right?
Consider this example: the downstream requests take n seconds to return. In this n seconds, r requests come in. In the thread-per-request we need r threads: one for every request. After n seconds pass, the first thread is done processing and available for a new request.
When implementing a event-driven design, we need r+1 threads: an event loop and r handlers. Each handler takes a request, performs it, and calls the callback once done.
So how does this improve things?

What I don't understand: we need a thread/handler to await this data,
right?
Not really. The idea behind NIO is that no threads ever get blocked.
It is interesting because the operating system already works in a non-blocking way. It is our programming languages that were modeled in a blocking manner.
As an example, imagine that you had a computer with a single CPU. Any I/O operation that you do will be orders of magnitude slower than the CPU, right?. Say you want to read a file. Do you think the CPU will stay there, idle, doing nothing while the disk head goes and fetches a few bytes and puts them in the disk buffer? Obviously not. The operating system will register an interruption (i.e. a callback) and will use the valuable CPU for something else in the mean time. When the disk head has managed to read a few bytes and made them available to be consumed, an interruption will be triggered and the OS will then give attention to it, restore the previous process block and allocate some CPU time to handle the available data.
So, in this case, the CPU is like a thread in your application. It is never blocked. It is always doing some CPU-bound stuff.
The idea behind NIO programming is the same. In the case you're exposing, imagine that your HTTP server has a single thread. When you receive a request from your client you need to make an upstream request (which represents I/O). So what a NIO framework would do here is to issue the request and register a callback for when the response is available.
Immediately after that your valuable single thread is released to attend yet another request, which is going to register another callback, and so on, and so on.
When the callback resolves, it will be automatically scheduled to be processed by your single thread.
As such, that thread works as an event loop, one in which you're supposed to schedule only CPU bound stuff. Every time you need to do I/O, that's done in a non-blocking way and when that I/O is complete, some CPU-bound callback is put into the event loop to deal with the response.
This is a powerful concept, because with a very small amount threads you can process thousands of requests and therefore you can scale more easily. Do more with less.
This feature is one of the major selling points of Node.js and the reason why even using a single thread it can be used to develop backend applications.
Likewise this is the reason for the proliferation of frameworks like Netty, RxJava, Reactive Streams Initiative and the Project Reactor. They all are seeking to promote this type of optimization and programming model.
There is also an interesting movement of new frameworks that leverage this powerful features and are trying to compete or complement one another. I'm talking of interesting projects like Vert.x and Ratpack. And I'm pretty sure there are many more out there for other languages.

The whole idea of non-blocking paradigm is achieved by this idea called
"Event Loop"
Interesting references:
http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch06lev1sec2.html
Understanding the Event Loop
https://www.youtube.com/watch?v=8aGhZQkoFbQ

Related

How Do Callbacks work in Non-blocking Design?

Looked at a few other questions but didn't quite find what I was looking for. Im using Scala but my questions is very high level and so is hopefully agnostic of any languages.
A regular scenario:
Thread A runs a function and there is some blocking work to be done (say a DB call).
The function has some non-blocking code (eg. Async block in Scala) to cause some sort of 'worker' Thread B (in a different pool) to pick up the I/O task.
The method in Thread A completes returning a Future which will eventually contain the result and Thread A is returned to its pool to quickly pick up another request to process.
Q1. Some thread somewhere usually has to wait?
My understanding of non-blocking architectures is that the common approach is to still have some Thread waiting/blocking on the I/O work somewhere - its just a case of having different pools which have access to different cores so that a small number of request processing threads can manage a large number of concurrent requests without ever waiting on a CPU core.
Is this a correct general understanding?
Q2. How the callback works ?
In the above scenario - Thread B that is doing the I/O work will run the callback function (provided by Thread A) if/when the I/O work has completed - which completes the Future with some Result.
Thread A is now off doing something else and has no association any more with the original request. How does the Result in the Future get sent back to the client socket? I understand that different languages have different implementations of such a mechanism but at a high level my current assumption is that (regardless of the language/framework) some framework/container objects must always be doing some sort of orchestration so that when a Future task is completed the Result gets sent back to the original socket handling the request.
I have spent hours trying to find articles which will explain this but every article seems to just deal with real low-level details. I know Im missing some details but i am having difficulty asking my question because Im not quite sure which parts Im missing :)
My understanding of non-blocking architectures is that the common approach is to still have some Thread waiting/blocking on the I/O work somewhere
If a thread is getting blocked somewhere, it is not really a non-blocking architecture. So no, that's not really a correct understanding of it. That doesn't mean that this is necessarily bad. Sometimes you just have to deal with blocking (using JDBC, for example). It would be better to push it off into a fixed thread pool designated for blocking, rather than allowing the entire application to suffer thread starvation.
Thread A is now off doing something else and has no association any more with the original request. How does the Result in the Future get sent back to the client socket?
Using Futures, it really depends on the ExecutionContext. When you create a Future, where the work is done depends on the ExecutionContext.
val f: Future[?] = ???
val g: Future[?] = ???
f and g are created immediately, and the work is submitted to a task queue in the ExecutionContext. We cannot guarantee which will actually execute or complete first in most cases. What you do with the values matters is well. Obviously if you use an Await to wait for the completion of the Futures, then we block the current thread. If we map them and do something with the values, then we again need another ExecutionContext to submit the task to. This gives us a chain of tasks that are asynchronously getting submitted and re-submitted to the executor for execution every time we manipulate the Future.
Eventually there needs to be some onComplete at the end of that chain to return the pass along that value to something, whether it's writing to stream, or something else. ie., it is probably out of the hands of the original thread.
Q1: No, at least not at the user code level. Hopefully your async I/O ultimately comes down to an async kernel API (e.g. select()). Which in turn will be using DMA to do the I/O and trigger an interrupt when it's done. So it's async at least down to the hardware level.
Q2: Thread B completes the Future. If you're using something like onComplete, then thread B will trigger that (probably by creating a new task and handing that task off to a thread pool to pick it up later) as part of the completing call. If a different thread has called Await to block on the Future, it will trigger that thread to resume. If nothing has accessed the Future yet, nothing in particular happens - the value sits there in the Future until something uses it. (See PromiseCompletingRunnable for the gritty details - it's surprisingly readable).

How Node.js event loop model scales well

I know this question has been discussed in the past in much details (How is Node.js inherently faster when it still relies on Threads internally?) but I still fail to properly understand node.js event loop model and being a single threaded model how it handles concurrent requests.
Uptil now my understanding is : We receive an IO request --> a thread is spawned internally by node.js and IO request is handed to it --> since this is an IO request so CPU hands it to DMA controller and frees this thread --> this thread again goes into the thread pool to serve a different request --> DMA is still doing the IO, once DMA get all the data a sort of event is fired --> this event is captured by the node.js system and it puts the supplied callback function on the event loop --> whenever event loop get the opportunity it executed the callback on the data fetched by the IO -- > thanks to closures, callback function executes on the data fetched by the callback only
So this process goes on repeatedly. Please someone elucidate on my understand and provide some information
There is only one thread (the main thread) for dealing with network I/O (file I/O is a slightly different story because not all platforms provide usable asynchronous, non-blocking file I/O APIs, so the synchronous file I/O APIs are used on those platforms in a threadpool).
So when network requests come in, they're all handled by the main thread which uses (indirectly via libuv) epoll/kqueue/IOCP/etc. for detecting (in a non-blocking way) when data is available (or when there is an incoming TCP connection for example). If there is data available, it calls out appropriately to javascript as needed, passing the socket data. If there is no data on the socket (and there's nothing else for the event loop to do, e.g. firing timers), then execution proceeds to the next iteration of the event loop where the process starts all over again.
As far as associating socket data with socket javascript objects goes, it's the combination of C++ wrapper objects (e.g. tcp_wrap, udp_wrap, etc.) and javascript objects that makes sure the data gets to the appropriate place.
Here's a slightly older diagram that explains what happens in a single cycle of node's event loop. Some of it may have changed slightly since node v0.9, but it gets you the general idea:
node.js has a single threaded model which eliminates the need for locks and semaphores (used in the traditional multithreaded model). Locks and semaphores can add some costs in terms of performance and, more importantly, can provide a lot of rope to hang yourself with (in other words, many pitfalls). IO operations happen in parallel and because work between IOs is typically very small, this single threaded model usually works quite nicely.
(side note: if you have an app that does a lot of work between IO operations, i.e. CPU intense apps, that is a case where node doesn't not scale well)
I like to think of the argument for why node's model scales well is the same as why people think NoSQL scales better than SQL databases. Obviously Java (multi-threaded) and SQL scale; big companies like Facebook and Twitter have proven that. However, like in SQL, there are a lot of things you could do incorrectly to slow down your performance. Node.js doesn't eliminate all potential problems, it just does a good job of restricting many of the common causes.

How does Asynchronous programming work in a single threaded programming model?

I was going through the details of node.jsand came to know that, It supports asynchronous programming though essentially it provides a single threaded model.
How is asynchronous programming handled in such cases? Is it like runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.
Say it with me now: async programming does not necessarily mean multi-threaded.
Javascript is a single-threaded runtime - you simply aren't able to create new threads in JS because the language/runtime doesn't support it.
Frank says it correctly (although obtusely) In English: there's a main event loop that handles when things come into your app. So, "handle this HTTP request" will get added to the event queue, then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.
Ryan said it best: sync/async is orthogonal to single/multi-threaded. For single and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor Pattern. For the single-threaded case the callbacks are invoked sequentially on main thread. For the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers) then the benefits of having multiple threaded may be diminished. It's problem dependent.
As far as implementation, if a framework is single threaded then it is likely using poll/select system call i.e. the OS is triggering the asynchronous event.
To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick-up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go wait another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!

measuring http request time with node.js

I use node.js to send an http request. I have a requirement to measure how much time it took.
start = getTime()
http.send(function(data) {end=getTime()})
If I call getTime inside the http response callback, there is the risk that my callback is not being called immediately when the response cames back due to other events in the queue. Such a risk also exists if I use regular java or c# synchronous code for this task, since maybe another thread got attention before me.
start = getTime()
http.send()
end=getTime()
How does node.js compares to other (synchronous) platform - does it make my chance for a good measure better or worse?
Great observations!
Theory:
If you are performing micro-benchmarking, there exists a number of considerations which can potentially skew the measurements:
Other events in the event loop which are ready to fire along with the http send in question, and get executed sequentially before the send get a chance - node specific.
Thread / Process switching which can happen any time within the span of send operation - generic.
Kernel’s I/O buffers being in limited volume causes arbitrary delays - OS / workload / system load specific.
Latency incurred in gathering the system time - language / runtime specific.
Chunking / Buffering of data: socket [ http implementation ] specific.
Practice:
Noe suffers from (1), while a dedicated thread of Java / C# do not have this issue. But as node implements an event driven non-blocking I/O model, other events will not cause blocking effects, rather will be placed into the event queue. Only the ones which are ready will get fired, and the latency incurred due to them will be a function of how much I/O work they have to carry out, and any CPU bound actions performed in their associated callbacks. These, in practice, would become negligible and evened out in the comparison, due to the more visible effects of items (2) to (5). In addition, writes are generally non-blocking, which means they will be carried out without waiting for the next loop iteration to run. And finally, when the write is carried out, the callback is issued in-line and sequentially, there is no yielding to another event in between.
In short, if you compare a dedicated Java thread doing blocking I/O with a Node code, you will see Java measurements good, but in large scale applications, the thread context switching effort will offset this gain, and the node performance will stand out.
Hope this helps.

Thread vs async execution. What's different?

I believed any kind of asynchronous execution makes a thread in invisible area. But if so,
Async codes does not offer any performance gain than threaded codes.
But I can't understand why so many developers are making many features async form.
Could you explain about difference and cost of them?
The purpose of an asynchronous execution is to prevent the code calling the asynchronous method (the foreground code) from being blocked. This allows your foreground code to go on doing useful work while the asynchronous thread is performing your requested work in the background. Without asynchronous execution, the foreground code must wait until the background task is completed before it can continue executing.
The cost of an asynchronous execution is the same as that of any other task running on a thread.
Typically, an async result object is registered with the foreground code. The async result object can either raise an event when the background task is completed, or the foreground code can periodically check the async result object to see if its completion flag has been set.
Concurrency does not necessarily require threads.
In Linux, for example, you can perform non-blocking syscalls. Using this type of calls, you can for instance start a number of network reads. Your code can keep track of the reads manually (using handles in a list or similar) and periodically ask the OS if new data is available on any of the connections. Internally, the OS also keeps a list of ongoing reads. Using this technique, you can thus achieve concurrency without any (extra) threads, neither in your program nor in the OS.
If you use threads and blocking IO, you would typically start one thread per read. In this scenario, the OS will instead have a list of ongoing threads, which it parks when the tread tries to read data when there is none available. Threads are resumed as data becomes available.
Having the OS switch between threads might involve slightly more overhead in the form of context switching - switching program counter and register content. But the real deal breaker is usually stack allocation per thread. This size is a couple of megabytes by default on Linux. If you have a lot of concurrency in your program, this might push you in the direction of using non-blocking calls to handle more concurrency per thread.
So it is possible to do async programming without threads. If you want to do async programming using only blocking OS-calls you need to dedicate a thread to do the blocking while you continue. But if you use non-blocking calls you can do a lot of concurrent things with just a single thread. Have a look at Node.js, which have great support for many concurrent connections while being single-threaded for most operations.
Also check out Golang, which achieve a similar effect using a sort of green threads called goroutines. Multiple goroutines run concurrently on the same OS thread and they are restrictive in stack memory, pushing the limit much further.
Async codes does not offer any performance gain than threaded codes.
Asynchornous execution is one of the traits of multi-threaded execution, which is becoming more relevant as processors are packing in more cores.
For servers, multi-core only mildly relevant, as they are already written with concurrency in mind and will scale natrually, but multi-core is particularly relevant for desktop apps, which traditionally do only a few things concurrently - often just one foreground task with a background thread. Now, they have to be coded to do many things concurrently if they are to take advantage of the power of the multi-core cpu.
As to the performance - on single-core - the asynchornous tasks slow down the system as much as they would if run sequentially (this a simplication, but true for the most part.) So, running task A, which takes 10s and task B which takes 5s on a single core, the total time needed will be 15s, if B is run asynchronously or not. The reason is, is that as B runs, it takes away cpu resources from A - A and B compete for the same cpu.
With a multi-core machine, additional tasks run on otherwise unused cores, and so the situation is different - the additional tasks don't really consume any time - or more correctly, they don't take away time from the core running task A. So, runing tasks A and B asynchronously on multi-core will conume just 10s - not 15s as with single core. B's execution runs at the same time as A, and on a separate core, so A's execution time is unaffected.
As the number of tasks and cores increase, then the potential improvements in performance also increase. In parallel computing, exploiting parallelism to produce an improvement in performance is known as speedup.
we are already seeing 64-core cpus, and it's esimated that we will have 1024 cores commonplace in a few years. That's a potential speedup of 1024 times, compared to the single-threaded synchronous case. So, to answer your question, there clearly is a performance gain to be had by using asynchronous execution.
I believed any kind of asynchronous execution makes a thread in invisible area.
This is your problem - this actually isn't true.
The thing is, your whole computer is actually massively asynchronous - requests to RAM, communication via a network card, accessing a HDD... those are all inherently asynchronous operations.
Modern OSes are actually built around asynchronous I/O. Even when you do a synchronous file request, for example (e.g. File.ReadAllText), the OS sends an asynchronous request. However, instead of giving control back to your code, it blocks while it waits for the response to the asynchronous request. And this is where proper asynchronous code comes in - instead of waiting for the response, you give the request a callback - a function to execute when the response comes back.
For the duration of the asynchronous request, there is no thread. The whole thing happens on a completely different level - say, the request is sent to the firmware on your NIC, and given a DMA address to fill the response. When the NIC finishes your request, it fills the memory, and signals an interrupt to the processor. The OS kernel handles the interrupt by signalling the owner application (usually an IOCP "channel") the request is done. This is still all done with no thread whatsoever - only for a short time right at the end, a thread is borrowed (in .NET this is from the IOCP thread pool) to execute the callback.
So, imagine a simple scenario. You need to send 100 simultaneous requests to a database engine. With multi-threading, you would spin up a new thread for each of those requests. That means a hundred threads, a hundread thread stacks, the cost of starting a new thread itself (starting a new thread is cheap - starting a hundred at the same time, not so much), quite a bit of resources. And those threads would just... block. Do nothing. When the response comes, the threads are awakened, one after another, and eventually disposed.
On the other hand, with asynchronous I/O, you can simply post all the requests from a single thread - and register a callback when each of those is finished. A hundred simultaneous requests will cost you just your original thread (which is free for other work as soon as the requests are posted), and a short time with threads from the thread pool when the requests are finished - in "worst" case scenario, about as many threads as you have CPU cores. Provided you don't use blocking code in the callback, of course :)
This doesn't necessarily mean that asynchronous code is automatically more efficient. If you only need a single request, and you can't do anything until you get a response, there's little point in making the request asynchronous. But most of the time, that's not your actual scenario - for example, you need to maintain a GUI in the meantime, or you need to make simultaneous requests, or your whole code is callback-based, rather than being written synchronously (a typical .NET Windows Forms application is mostly event-based).
The real benefit from asynchronous code comes from exactly that - simplified non-blocking UI code (no more "(Not Responding)" warnings from the window manager), and massively improved parallelism. If you have a web server that handles a thousand requests simultaneously, you don't want to waste 1 GiB of address space just for the completely unnecessary thread stacks (especially on a 32-bit system) - you only use threads when you have something to do.
So, in the end, asynchronous code makes UI and server code much simpler. In some cases, mostly with servers, it can also make it much more efficient. The efficiency improvements come precisely from the fact that there is no thread during the execution of the asynchronous request.
Your comment only applies to one specific kind of asynchronous code - multi-threaded parallelism. In that case, you really are wasting a thread while executing a request. However, that's not what people mean when saying "my library offers an asynchronous API" - after all, that's a 100% worthless API; you could have just called await Task.Run(TheirAPIMethod) and gotten the exact same thing.

Resources