Is "event based" the same as "asynchronous"?
No, being event based doesn't imply that the events are handled asynchronously.
In an event-driven, single-threaded system, you can fire events, but they are all processed serially. They may yield as part of their processing, but nothing happens concurrently: when a handler yields, it stops processing and has to wait until it is messaged again before it can continue.
Examples of this are Swing (Java), Twisted (Python), Node.js (JavaScript), and EventMachine (Ruby).
All of these are event-driven message loops, but they are single threaded: every event blocks all subsequent events on that same thread.
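A minimal Node.js sketch of that serial behaviour (the 'slow' event name is made up): a handler that hogs the single thread delays every other queued callback, because there is only one thread servicing the loop.

    const { EventEmitter } = require('events');
    const emitter = new EventEmitter();

    // A handler that occupies the single thread for ~2 seconds.
    emitter.on('slow', () => {
      const end = Date.now() + 2000;
      while (Date.now() < end) { /* busy wait */ }
      console.log('slow handler finished');
    });

    // A timer that is due after 100 ms.
    setTimeout(() => console.log('timer fired'), 100);

    // Fire the slow event: the timer callback cannot run until the slow
    // handler returns, even though it was due much earlier.
    emitter.emit('slow');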
In programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
So just because something is event driven doesn't make it asynchronous, and just because something is asynchronous doesn't make it event driven either; much less concurrent.
They are essentially orthogonal concepts.
"event driven" essentially means that the code associated to certain function calls is bind at runtime (and can change through the execution).
Who "fires" the event doesn't know what will handle it, and who handle the event is defined to respond to the event through an association defined while the program executes. Typically though function pointers, reference or pointers to object carrying virtual methods etc.)
"asynchronous" means that a program flow doesn't have to wait for a call to be executed before proceed over (mostly implemented with a call that returns immediately after delegating the execution to another thread or process)
Not all events are asynchronous (think to the windows SendMessage, respect to PostMessage), and not all asynchronous calls are necessary implemented by "events" (although the use of the event mechanism is quite common to implement asynchronous calls)
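A small Node.js sketch of that distinction, using EventEmitter as a loose stand-in for the Windows message APIs: emit() dispatches synchronously (roughly like SendMessage), while deferring the emit with setImmediate() posts it to a later event-loop iteration (roughly like PostMessage).

    const { EventEmitter } = require('events');
    const bus = new EventEmitter();

    bus.on('msg', (how) => console.log(`handled (${how})`));

    // Synchronous dispatch: the handler runs before emit() returns.
    console.log('before sync emit');
    bus.emit('msg', 'sync');
    console.log('after sync emit');

    // "Posted" dispatch: the handler runs on a later event-loop iteration.
    console.log('before posted emit');
    setImmediate(() => bus.emit('msg', 'posted'));
    console.log('after posted emit');

    // Output order: before sync emit, handled (sync), after sync emit,
    // before posted emit, after posted emit, handled (posted)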
One meaning of asynchronous is that at the point where you kick off a computation, you don't wait for an answer; you get the answer later. The answer comes in orthogonally to your normal control flow.
One way the answer comes in is via events: they happen spontaneously in this case, without your code triggering them, and in a handler you can process the result.
Whereas in synchronous mode the computation and its answer are connected by their position in the control flow, in asynchronous mode you have to make that connection yourself, for example by using a sequence number or something similar.
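A rough JavaScript sketch of that correlation; transport, sendRequest and onReply are hypothetical names, and only the sequence-number bookkeeping is the point.

    // Correlate asynchronous replies with their requests by sequence number.
    // `transport` is a hypothetical duplex channel with a send() method.
    const pending = new Map();
    let nextSeq = 0;

    function sendRequest(transport, payload) {
      const seq = nextSeq++;
      return new Promise((resolve) => {
        pending.set(seq, resolve);          // remember who is waiting
        transport.send({ seq, payload });   // fire and return immediately
      });
    }

    // Replies can arrive later and in any order; the sequence number
    // tells us which caller to wake up.
    function onReply(reply) {
      const resolve = pending.get(reply.seq);
      if (resolve) {
        pending.delete(reply.seq);
        resolve(reply.result);
      }
    }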
Related
I've been using various concurrency constructs for a while now without much consideration for how all the magic happens, which has recently made me increasingly uneasy.
In an attempt to remedy this ... feeling, I have been reading up on how async works under the hood. When I say async, in this case I'm referring to userland / greenthread / cooperative multitasking, although I assume some of the concepts will also apply to traditional OS managed threads insofar as a scheduler and workers are involved.
I see how a worker can suspend itself and let other workers execute, but at the lowest level in non-blocking library code, how does the scheduler know when a previously suspended worker's job is done and to wake up that worker?
For example, if you fire up a worker in some sort of async block and perform an operation that would normally block (e.g. an HTTP request, a SQL query, other I/O), then even though your calling code is async, that operation (library code) had better play nicely with your async framework. Otherwise you've effectively defeated the purpose of using it and blocked your scheduler from calling other waiting operations (the "What Color Is Your Function" problem) while it waits for your blocking call, executed inside your non-blocking calling code, to complete.
So now we've got async code calling other async library code, and now I'm asking myself the question all over again - how does the async library code know when to suspend and resume operation?
The idea of firing off a HTTP request, moving on, and returning later to check for results is weird to think about for me - not conceptually but from an implementation standpoint.
How do you perform a partial operation, e.g. sending TCP packets and then continuing with the rest of the program execution, only to come back later and check whether results have been delivered? Delivered to what? A socket?
Now we're another layer deep and you are using socket selects to avoid creating threads and blocking, but, again...
how do those sockets start an operation, move on before completion, and then how does select know when data is available?
Are you continually checking some buffer to see if bytes have been delivered in an infinite loop and moving on if not?
Anyhow - I think you see where I'm going here....
I focused mainly on HTTP as a motivating example, but the same question applies for any normally blocking operations - how does it all work at the bottom?
Here are some of the resources I found helpful while researching the topic and which informed this question:
David Beazley's excellent video Build Your Own Async where he walks you through a simple implementation of a scheduler which fires callbacks and suspends execution by sleeping on a waiting queue. I found this video tremendously instructive, but it stops a bit short: it shows you how using an async sleep frees up the scheduler to execute other workers, but doesn't really go into what happens when you call code in those workers that itself must be non-blocking so it plays nice with the scheduler (a toy JavaScript sketch of the yield-and-resume idea follows this list).
How does non-blocking IO work under the hood - This got me further along in my understanding, but still left with a few uncertainties.
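As a rough illustration of what such a scheduler does under the hood, here is a toy JavaScript version (not Beazley's code, which is Python): tasks are generator functions that yield promises, the scheduler suspends them at each yield, and the settlement of the yielded promise is what tells it that the suspended worker can be resumed.

    // A toy cooperative scheduler: tasks are generators that yield promises.
    function run(genFn) {
      const gen = genFn();
      function step(value) {
        const { value: awaited, done } = gen.next(value);
        if (done) return;
        // When the non-blocking operation completes, wake the task back up.
        Promise.resolve(awaited).then(step);
      }
      step();
    }

    // An "async sleep" that never blocks the thread.
    const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

    run(function* () {
      console.log('task A: start');
      yield sleep(1000);           // suspend; the scheduler is free meanwhile
      console.log('task A: resumed after 1s');
    });

    run(function* () {
      console.log('task B: runs while A is suspended');
    });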
Are there explicit considerations about the latency of any single request in the node.js event loop? AFAICT every IO call returns an eventEmitter which emits an event. The processing of all the events is multiplexed through the use of a pipe. So it is possible that the event that needs to be processed for an important request may be placed too far back into the pipe. Is there some sort of priority queue that can be used to schedule the order of execution of eventHandlers ?
Here's why I asked this question in the first place. I decided to give a gist.github link because the reason is long and related to the technical question
It's not clear exactly what you're asking here. Your Javascript does not directly add things to the event queue (that is only done with native code). Instead, you call some async operation and the native code behind that operation adds something to the event queue when the async operation completes.
This article The Node.js Event Loop, Timers, and process.nextTick() gives you a lot of details about how the event queue is serviced and how it handles different types of events (timers, I/O, etc...).
In general things are FIFO (first in, first out) within an event type with some exceptions.
process.nextTick() will run its callback BEFORE waiting I/O events.
setImmediate() will run its callback AFTER waiting I/O events.
More detail here: setImmediate vs. nextTick and nextTick vs setImmediate, visual explanation and setTimeout vs. setImmediate vs. nextTick
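A small script that demonstrates the ordering described above when everything is scheduled from inside an I/O callback:

    const fs = require('fs');

    // Inside an I/O callback, the ordering is deterministic:
    // process.nextTick() first, then setImmediate(), then timers.
    fs.readFile(__filename, () => {
      setTimeout(() => console.log('setTimeout(0)'), 0);
      setImmediate(() => console.log('setImmediate'));
      process.nextTick(() => console.log('process.nextTick'));
    });
    // Expected output: process.nextTick, setImmediate, setTimeout(0)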
So it is possible that the event that needs to be processed for an important request may be placed too far back into the pipe.
You'd have to show us the specific situation you're concerned about. If you yourself are scheduling a callback with setTimeout(), setImmediate() or process.nextTick(), then you have some control over when it happens by which of these three you pick. If you aren't scheduling it yourself (e.g. it's the completion callback of some async operation), then you don't control its scheduling in the event loop. It will go into the sub-queue that matches its type and be served FIFO from that phase of the event loop (as described in the above articles).
Is there some sort of priority queue that can be used to schedule the order of execution of eventHandlers ?
There is no exposed priority system. Within an event type, things are FIFO. Again, if you give us an actual coding example so we can see exactly what you're trying to do, we could offer some help on what your choices are. You may be able to use the setTimeout(), setImmediate() and process.nextTick() tools that are already available, or you may want to implement your own task-queuing and prioritization system on top of those three methods so that you can prioritize work you have already queued yourself.
About priorities of execution in the event loop:
setImmediate() runs before setTimeout(fn, 0) when both are scheduled from within an I/O callback (outside of an I/O cycle their relative order is not deterministic)
process.nextTick() fires its callback before the event loop continues; despite the name, it does not wait for the next loop iteration
Natively the event loop in node.js does not support priorities. You can always implement your own priority queue or use an existing one like here and assign your functions to the priority queue.
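A hypothetical sketch of such a user-level priority queue, layered on setImmediate(): Node itself stays FIFO per phase, and the prioritisation happens entirely in our own queue before work is handed to the event loop.

    const queue = [];
    let draining = false;

    function schedule(priority, fn) {
      queue.push({ priority, fn });
      queue.sort((a, b) => b.priority - a.priority);  // highest priority first
      if (!draining) {
        draining = true;
        setImmediate(drain);
      }
    }

    function drain() {
      const task = queue.shift();
      if (task) task.fn();
      if (queue.length > 0) {
        setImmediate(drain);   // yield back to the event loop between tasks
      } else {
        draining = false;
      }
    }

    // The high-priority task runs first even though it was queued last.
    schedule(1, () => console.log('low priority'));
    schedule(10, () => console.log('high priority'));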
I was going through the details of node.js and came to know that it supports asynchronous programming even though it essentially provides a single-threaded model.
How is asynchronous programming handled in such cases? Is it like runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.
Say it with me now: async programming does not necessarily mean multi-threaded.
Javascript is a single-threaded runtime - you simply aren't able to create new threads in JS because the language/runtime doesn't support it.
Frank says it correctly (although obtusely) In English: there's a main event loop that handles when things come into your app. So, "handle this HTTP request" will get added to the event queue, then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
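A minimal Node.js sketch of that contrast, using the built-in fs module instead of MySQL so it stays self-contained; the same pattern applies to a database driver.

    const fs = require('fs');

    // Synchronous ("the waiter sits and waits"): nothing else runs
    // until the read completes.
    const data = fs.readFileSync(__filename);
    console.log('sync read finished,', data.length, 'bytes');

    // Asynchronous ("the waiter serves other tables"): the read is handed
    // off, and the single JS thread immediately moves on.
    fs.readFile(__filename, (err, buf) => {
      if (err) throw err;
      console.log('async read finished,', buf.length, 'bytes');
    });
    console.log('this line runs before the async read finishes');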
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.
Ryan said it best: sync/async is orthogonal to single/multi-threaded. For single and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor Pattern. For the single-threaded case the callbacks are invoked sequentially on the main thread. For the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers) then the benefits of having multiple threads may be diminished. It's problem dependent.
As far as implementation, if a framework is single threaded then it is likely using poll/select system call i.e. the OS is triggering the asynchronous event.
To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick-up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go wait another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!
Event-driven and asynchronous are often used as synonyms. Are there any differences between the two?
Also, what is the difference between epoll and aio? How do they fit together?
Lastly, I've read many times that AIO in Linux is horribly broken. How exactly is it broken?
Thanks.
Events are one of the paradigms for achieving asynchronous execution.
But not all asynchronous systems use events. That is the semantic relationship between the two: one is a superset of the other (asynchrony being the broader concept).
epoll and aio use different metaphors:
epoll is a blocking operation (epoll_wait()) - you block the thread until some event happens and then you dispatch the event to different procedures/functions/branches in your code.
In AIO, you pass the address of your callback function (completion routine) to the system and the system calls your function when something happens.
The problem with AIO is that your callback code runs on a system thread, and thus on top of the system stack. That brings a few problems with it, as you can imagine.
They are completely different things.
The event-driven paradigm means that an object called an "event" is sent to the program whenever something happens, without that "something" having to be polled at regular intervals to discover whether it has happened. That "event" may be trapped by the program to perform some action (i.e. a "handler"), either synchronously or asynchronously.
Therefore, handling of events can either be synchronous or asynchronous. JavaScript, for example, uses a synchronous eventing system.
Asynchronous means that actions can happen independent of the current "main" execution stream. Mind you, it does NOT mean "parallel", or "different thread". An "asynchronous" action may actually run on the main thread, blocking the "main" execution stream in the meantime. So don't confuse "asynchronous" with "multi-threading".
You may say that, technically speaking, an asynchronous operation automatically assumes eventing -- at least "completed", "faulted" or "aborted/cancelled" events (one or more of these) are sent to the instigator of the operation (or the underlying O/S itself) to signal that the operation has ceased. Thus, async is always event-driven, but not the other way round.
Event driven means a single thread where events are registered for certain scenarios. When such a scenario occurs, the events are fired; however, even then, each of the events is fired sequentially. There is nothing asynchronous about it. Node.js (web server) uses events to deal with multiple requests.
Asynchronous is basically multitasking. It can spawn off multiple threads or processes to execute a certain function. It's totally different from event driven in the sense that each thread is independent and hardly interacts with the main thread in an easily responsive manner. Apache (web server) uses multiple threads to deal with incoming requests.
Lastly, I've read many times that AIO in Linux is horribly broken. How exactly is it broken?
AIO as done via KAIO/libaio/io_submit comes with a lot of caveats and is tricky to use well if you want it to behave rather than silently blocking (e.g. it only works on certain types of fd, and when using files/block devices it only actually works for direct I/O, but those are just the tip of the iceberg). It did eventually gain the ability to indicate file descriptor readiness with the 4.19 kernel, which is useful for programs using sockets.
POSIX AIO on Linux is actually a userspace threads implementation by glibc and comes with its own limitations (e.g. it's considered slow and doesn't scale well).
These days (2020) hope for doing arbitrary asynchronous I/O on Linux with less pain and tradeoffs is coming from io_uring...
Does an asynchronous call always create a new thread? What is the difference between the two?
Does an asynchronous call always create or use a new thread?
Wikipedia says:
In computer programming, asynchronous events are those occurring independently of the main program flow. Asynchronous actions are actions executed in a non-blocking scheme, allowing the main program flow to continue processing.
I know async calls can be done on a single thread. How is this possible?
Whenever the operation that needs to happen asynchronously does not require the CPU to do work, that operation can be done without spawning another thread. For example, if the async operation is I/O, the CPU does not have to wait for the I/O to complete. It just needs to start the operation, and can then move on to other work while the I/O hardware (disk controller, network interface, etc.) does the I/O work. The hardware lets the CPU know when it's finished by interrupting the CPU, and the OS then delivers the event to your application.
Frequently, higher-level abstractions and APIs don't expose the underlying asynchronous APIs available from the OS and the underlying hardware. In those cases it's usually easier to create threads to do asynchronous operations, even if the spawned thread is just waiting on an I/O operation.
If the asynchronous operation requires the CPU to do work, then generally that operation has to happen in another thread in order for it to be truly asynchronous. Even then, it will really only be asynchronous if there is more than one execution unit.
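If the work is CPU-bound, one way to keep it off the main thread in Node.js is the worker_threads module; a rough sketch (the counting loop is just a stand-in for real CPU-heavy work):

    const { Worker } = require('worker_threads');

    const workerCode = `
      const { parentPort, workerData } = require('worker_threads');
      let sum = 0;
      for (let i = 0; i < workerData; i++) sum += i;   // the CPU-heavy part
      parentPort.postMessage(sum);
    `;

    // Run the code string on a separate thread; results come back as messages.
    const worker = new Worker(workerCode, { eval: true, workerData: 1e8 });
    worker.on('message', (sum) => console.log('worker finished:', sum));

    // The main thread is not blocked while the worker grinds away.
    console.log('main thread keeps running');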
This question is darn near too general to answer.
In the general case, an asynchronous call does not necessarily create a new thread. That's one way to implement it, with a pre-existing thread pool or external process being other ways. It depends heavily on language, object model (if any), and run time environment.
Asynchronous just means the calling thread doesn't sit and wait for the response, nor does the asynchronous activity happen in the calling thread.
Beyond that, you're going to need to get more specific.
No, asynchronous calls do not always involve threads.
They typically do start some sort of operation which continues in parallel with the caller. But that operation might be handled by another process, by the OS, by other hardware (like a disk controller), by some other computer on the network, or by a human being. Threads aren't the only way to get things done in parallel.
JavaScript is single-threaded and asynchronous. When you use XmlHttpRequest, for example, you provide it with a callback function that will be executed asynchronously when the response returns.
John Resig has a good explanation of the related issue of how timers work in JavaScript.
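A minimal browser-side sketch of that callback style (the URL is hypothetical): the request is dispatched, send() returns, and the callback fires later on the same single thread.

    const xhr = new XMLHttpRequest();
    xhr.open('GET', '/some/resource');     // hypothetical URL
    xhr.onload = () => {
      console.log('response arrived, status', xhr.status);
    };
    xhr.send();
    console.log('send() returned; still waiting for the response');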
Multithreading refers to more than one operation happening in the same process, while async programming can spread across processes. For example, if my operation calls a web service, the thread need not wait until the web service returns. Here we use async programming, which allows the thread not to wait for a process on another machine to complete. When the response from the web service starts arriving, it can interrupt the main thread to say that the web service has finished processing the request, and the main thread can then process the result.
Windows has always had asynchronous processing, since the non-preemptive times (versions 2.13, 3.0, 3.1, etc.), using the message loop, way before it supported real threads. So to answer your question, no, it is not necessary to create a thread to perform asynchronous processing.
Asynchronous calls don't even need to occur on the same system/device as the one invoking the call. So if the question is, does an asynchronous call require a thread in the current process, the answer is no. However, there must be a thread of execution somewhere processing the asynchronous request.
"Thread of execution" is a vague term. In cooperative tasking systems such as the early Macintosh and Windows OSes, the thread of execution could simply be the same process that made the request running another stack, instruction pointer, etc. However, when people talk about asynchronous calls, they typically mean calls that are handled by another thread if it is intra-process (i.e. within the same process) or by another process if it is inter-process.
Note that inter-process (or interprocess) communication (IPC) is commonly generalized to include intra-process communication, since the techniques for locking, and synchronizing data are usually the same regardless of what process the separate threads of execution run in.
Some systems allow you to take advantage of the concurrency in the kernel for some facilities using callbacks. For a rather obscure instance, asynchronous I/O callbacks were used to implement non-blocking internet servers back in the non-preemptive multitasking days of Mac System 6-8.
This way you have concurrent execution streams "in" your program without threads as such.
Asynchronous just means that you don't block your program waiting for something (function call, device, etc.) to finish. It can be implemented in a separate thread, but it is also common to use a dedicated thread for synchronous tasks and communicate via some kind of event system and thus achieve asynchronous-like behavior.
There are examples of single-threaded asynchronous programs. Something like:

    do something
    send some async request
    while (not done)
        do something else
        do async check for results
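A working single-threaded version of that loop in Node.js, for illustration; startAsyncRequest is a made-up stand-in (simulated with a timer), and the "check again later" step is rescheduled with setTimeout because a literal busy-wait would never let the completion callback run on the single thread.

    let result = null;

    function startAsyncRequest() {
      // Simulate an async operation that completes after 500 ms.
      setTimeout(() => { result = 'the answer'; }, 500);
    }

    function poll() {
      if (result === null) {
        console.log('not done yet, doing something else...');
        setTimeout(poll, 100);   // check again later without blocking
      } else {
        console.log('got result:', result);
      }
    }

    startAsyncRequest();
    poll();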
The nature of asynchronous calls is such that, if you want the application to continue running while the call is in progress, you will either need to spawn a new thread, or at least utilise another thread that you have created solely for the purpose of handling asynchronous callbacks.
Sometimes, depending on the situation, you may want to invoke an asynchronous method but make it appear to the user to be synchronous (i.e. block until the asynchronous method has signalled that it is complete). This can be achieved through Win32 APIs such as WaitForSingleObject.