Asynchronous IO: How does the thread get contacted after IO is complete?

Asynchronous IO: How does the thread get contacted after IO is complete? - node.js

So on an asynchronous framework such as Node or Netty, a worker thread can be given an IO job, which it initiates along with a callback. Then it returns and picks up a different task while that IO job, be it disk read, DB query, etc, runs.
My question is, after the IO is done, how is that event/callback picked up for further processing? I'm assuming in a synchronous operation, there is a thread right there waiting. But in an asynchronous environment, what picks up the completion of the IO, along with the response data? Does the worker thread periodically check for completion? Or does something register the completion event somehow with Node or Netty?
Sorry for lumping Netty and Node together, I'm assuming they do this similarly.

In Netty, any IO operation is asynchronous,and the read or write method will return a Future Object which can add Listener, after future is done , the listener will call back. In listener, you can do anything you would like.

Related

Can two callbacks execute the code at the same time(Parrallely) or not?

I am doing an IO wait operation inside a for loop now the thing is when all of the operations terminates I want to send the response to the server. Now I was just wondering that suppose two IO operation terminates exactly at the same time now can they execute code at the same time(parallel) or will they execute serially?
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.

node.js runs Javascript with a single thread. That means that two pieces of Javascript can never be running at the exact same moment.
node.js processes I/O completion using an event queue. That means when an I/O operation completes, it places an event in the event queue and when that event gets to the front of the event queue and the JS interpreter has finished whatever else it was doing, then it will pull that event from the event queue and call the callback associated with it.
Because of this process, even if two I/O operations finish at basically the same moment, one of them will put its completion event into the event queue before the other (access to the event queue internally is likely controlled by a mutex so one will get the mutex before the other) and that one's completion callback will get into the event queue first and then called before the other. The two completion callbacks will not run at the exact same time.
Keep in mind that more than one piece of Javascript can be "in flight" or "in process" at the same time if it contains non-blocking I/O operations or other asynchronous operations. This is because when you "wait" for an asynchronous operation to complete in Javscript, you return control back to the system and you then resume processing only when your completion callback is called. While the JS interpreter is waiting for an asynchronous I/O operation to complete and the associated callback to be called, then other Javascript can run. But, there's still only one piece of Javascript actually ever running at a time.
As far as I know, as Node is Concurrent but not the Parallel language so I don't think they will execute at the same time.
Yes, that's correct. That's not exactly how I'd describe it since "concurrent" and "parallel" don't have strict technical definitions, but based on what I think you mean by them, that is correct.

you can use Promise.all :
let promises = [];
for(...)
{
promises.push(somePromise); // somePromise represents your IO operation
}
Promise.all(promises).then((results) => { // here you send the response }
You don't have to worry about the execution order.

Node.js is designed to be single thread. So basically there is no way that 'two IO operation terminates exactly at the same time' could happen. They will just finish one by one.

thread with a forever loop with one inherently asynch operation

I'm trying to understand the semantics of async/await in an infinitely looping worker thread started inside a windows service. I'm a newbie at this so give me some leeway here, I'm trying to understand the concept.
The worker thread will loop forever (until the service is stopped) and it processes an external queue resource (in this case a SQL Server Service Broker queue).
The worker thread uses config data which could be changed while the service is running by receiving commands on the main service thread via some kind of IPC. Ideally the worker thread should process those config changes while waiting for the external queue messages to be received. Reading from service broker is inherently asynchronous, you literally issue a "waitfor receive" TSQL statement with a receive timeout.
But I don't quite understand the flow of control I'd need to use to do that.
Let's say I used a concurrentQueue to pass config change messages from the main thread to the worker thread. Then, if I did something like...
void ProcessBrokerMessages() {
foreach (BrokerMessage m in ReadBrokerQueue()) {
ProcessMessage(m);
}
}
// ... inside the worker thread:
while (!serviceStopped) {
foreach (configChange in configChangeConcurrentQueue) {
processConfigChange(configChange);
}
ProcessBrokerMessages();
}
...then the foreach loop to process config changes and the broker processing function need to "take turns" to run. Specifically, the config-change-processing loop won't run while the potentially-long-running broker receive command is running.
My understanding is that simply turning the ProcessBrokerMessages() into an async method doesn't help me in this case (or I don't understand what will happen). To me, with my lack of understanding, the most intuitive interpretation seems to be that when I hit the async call it would go off and do its thing, and execution would continue with a restart of the outer while loop... but that would mean the loop would also execute the ProcessBrokerMessages() function over and over even though it's already running from the invocation in the previous loop, which I don't want.
As far as I know this is not what would happen, though I only "know" that because I've read something along those lines. I don't really understand it.
Arguably the existing flow of control (ie, without the async call) is OK... if config changes affect ProcessBrokerMessages() function (which they can) then the config can't be changed while the function is running anyway. But that seems like it's a point specific to this particular example. I can imagine a case where config changes are changing something else that the thread does, unrelated to the ProcessBrokerMessages() call.
Can someone improve my understanding here? What's the right way to have
a block of code which loops over multiple statements
where one (or some) but not all of those statements are asynchronous
and the async operation should only ever be executing once at a time
but execution should keep looping through the rest of the statements while the single instance of the async operation runs
and the async method should be called again in the loop if the previous invocation has completed
It seems like I could use a BackgroundWorker to run the receive statement, which flips a flag when its job is done, but it also seems weird to me to create a thread specifically for processing the external resource and then, within that thread, create a BackgroundWorker to actually do that job.

You could use a CancelationToken. Most async functions accept one as a parameter, and they cancel the call (the returned Task actually) if the token is signaled. SqlCommand.ExecuteReaderAsync (which you're likely using to issue the WAITFOR RECEIVE is no different. So:
Have a cancellation token passed to the 'execution' thread.
The settings monitor (the one responding to IPC) also has a reference to the token
When a config change occurs, the monitoring makes the config change and then signals the token
the execution thread aborts any pending WAITFOR (or any pending processing in the message processing loop actually, you should use the cancellation token everywhere). any transaction is aborted and rolled back
restart the execution thread, with new cancellation token. It will use the new config

So in this particular case I decided to go with a simpler shared state solution. This is of course a less sound solution in principle, but since there's not a lot of shared state involved, and since the overall application isn't very complicated, it seemed forgivable.
My implementation here is to use locking, but have writes to the config from the service main thread wrapped up in a Task.Run(). The reader doesn't bother with a Task since the reader is already in its own thread.

Single thread synchronous and asynchronous confusion

Assume makeBurger() will take 10 seconds
In synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for NodeJs lets say we make an async version of makeBurgerAsync() which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since it is a single thread. I have troubling imagine what is really going on behind the scene.
So for sure when the function run, both async functions will enter event loops and console.log("READY") will get executed straight away.
But while console.log("READY") is executing, no work is really done for both async function right? Since single thread is hogging console.log for 5 seconds.
After console.log is done. CPU will have time to switch between both async so that it can run a bit of each function each time.
So according to this, the function doesn't necessarily result in faster execution, async is probably slower due to switching between event loop? I imagine that, at the end of the day, everything will be spread on a single thread which will be the same thing as synchronous version?
I am probably missing some very big concept so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are like query DB etc. Basically nodejs will just say "Hey DB handle this for me while I'll do something else". However, the case I am not understanding is the self-defined callback function within nodejs itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}

In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
Let's examine a real-world async situation rather than your pseudo-code:
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
http.get(someUrl, function(err, response, body) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
http.get() is called. This will send the desired http request and then immediately return to further execution.
console.log("READY") will run.
The three asynchronous operations will complete in an indeterminate order (whichever one completes it's operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using it's own thread, it will insert an event in the node.js event queue.
Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readfile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).

in Node.js event loop - will callbacks to say i/o events say 'new HTTP request received' be executed in the 'I/O Callbacks' phase OR 'Poll' phase

I see that Event loop typically phases through the following cycle in each iteration: Timers -> I/O Callbacks -> idle -> Poll -> Check -> Close
as per the official Node.js docs https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/ .
Now it also says 'I/O Callbacks' phase executes callbacks for some system operations such as types of TCP errors' for example typical operations like TCP connection errors.
and in the 'Poll' phase - it says 'retrieve new I/O events' . like incoming connections , data, etc
I am confused. in which phase are I/O events (callback handlers) like 'new HTTP Request received', 'data received from database per previous query' are executed?

Generally
Generally, you shouldn't care about these phases. Even the distinction between a microtask (like nextTick) and a macrotask (like setImmediate) is not very important for every day NodeJS developers. That article is an in depth look at how Node handles thing internally.
All a user typically has to care about is that when they register a request - the callback they provide will eventually be called at some later time in the future which is typically fast enough as long as they don't "block the event loop" by doing a lot of synchronous CPU bound work on the Node process.
Your specific question
in which phase are I/O events (callback handlers) like 'new HTTP Request received', 'data received from database per previous query' are executed?
They are executed in the poll phase:
If the poll queue is not empty, the event loop will iterate through its queue of callbacks executing them synchronously until either the queue has been exhausted, or the system-dependent hard limit is reached.
Note that the poll phase is not the only place where I/O callbacks might be executed. The (somewhat poorly named) I/O callbacks phase is also in charge of some callbacks. This is due to how libuv works and should be transparent in your code. In addition - some libraries (like DB libraries) might do their own scheduling and run callback code inside a timer (in the timers phase) - and some asynchronous callbacks like close callbacks run in their own phase.

There is nothing to be confused about. All async callbacks are executed in the callback phase. It is one of the only two places where the interpreter execute javascript: when the script starts and in the callback phase.

When does nodeJS check for the callback queue for executing callbacks?

As I understand , a Node JS server continues to listen on a port for any incoming requests which means the thread is continuously busy ? When does it break from this continuous never ending loop and check if there are any events to be processed from the call back queue?
2) When Node JS starts executing code of callback functions, the server is essentially stopped? It is not listening for further requests? I mean since only single thread is going to do both the task only one can be done at a time?
Is this understanding correct or there's more to it?

Yes, you are right. Everything in nodejs runs on the main thread except the I/o calls and fs file calls which go into to OS kernel and thread pool respectively for execution. All the synchronous code runs on the main thread and while this code is running, nodejs cannot process any further requests. That is why it is not advisable to use a long for loop because it may block the main thread for much time. You need to make child threads to handle that.

Node thread keeps an event loop and whenever any task get completed, it fires the corresponding event which signals the event listener function to get executed. The event loop simply iterate over the event queue which is basically a list of events and callbacks of completed operations. there is generally a main loop that listens for events, and then triggers a callback function when one of those events is detected.
(source: abdelraoof.com)
Similar event loop questions are here:
Node.js Event loop
Understanding the Event Loop
Source:
http://abdelraoof.com/blog/2015/10/28/understanding-nodejs-event-loop/
http://www.tutorialspoint.com/nodejs/nodejs_event_loop.htm
http://chimera.labs.oreilly.com/books/1234000001808/ch03.html#chap3_id35941348

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string