I've read some articles about this, but I'm still stuck on one point. For example, I have two functions:
function getDataSync(){
var data = db.query("some query");
return JSON.stringify(data);
}
function getDataAsync(){
return db.query("some query",function(result){
return JSON.stringify(result);
});
}
People say that asynchronous programming is recommended for I/O-bound work. However, I can't see any difference in this case, and the async version looks uglier.
What's wrong with my reasoning?
Node.js is asynchronous by default, which means that an I/O call does not hold up the statements that follow it, unlike in many other languages. For example:
database.query("SELECT * FROM hugetable", function(rows) {
var result = rows;
});
console.log("Hello World");
In many other languages, execution would wait until the query statement finished.
In Node.js, the query is started in the background and execution continues straight on to log Hello World to the screen.
So when you write
function getDataSync(){
var data = db.query("some query");
return JSON.stringify(data);
}
the function returns before db.query has produced any data (assuming db.query is asynchronous), so data is still undefined when it is stringified.
function getDataAsync(){
return db.query("some query",function(result){
return JSON.stringify(result);
});
}
In the Node.js way, the function passed as a parameter is called a callback: db.query() calls it once the query has finished. Note that the value the callback returns goes back to db.query(), not to the caller of getDataAsync().
We use callbacks in Node.js because we don't know when db.query() will finish (asynchronous operations don't complete in the order they were started), but when it finishes, it calls the callback.
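To make this concrete, here is a minimal sketch. `fakeDb` is a hypothetical stand-in for a database client (it uses setTimeout to simulate latency) and is not the API of any particular library; `getDataBroken` and `getData` are illustrative names.

```javascript
// Hypothetical stand-in for a database client: query() "finishes" after 50 ms.
const fakeDb = {
  query(sql, callback) {
    setTimeout(() => callback([{ id: 1, name: "alice" }]), 50);
    // Nothing is returned here, so the caller of query() gets undefined.
  },
};

const events = [];
function record(message) {
  events.push(message);
  console.log(message);
}

// Broken: the `return` inside the callback does not reach our caller.
function getDataBroken() {
  return fakeDb.query("some query", (result) => JSON.stringify(result));
}

// Working: hand the result to a callback supplied by the caller.
function getData(callback) {
  fakeDb.query("some query", (result) => callback(JSON.stringify(result)));
}

record("broken returned: " + getDataBroken());
getData((json) => record("callback got: " + json));
record("Hello World"); // logged before the query result arrives
```

The callback-based version delivers the data later, after "Hello World" has already been logged, which is the ordering described above.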
In your first example, the thread is blocked at this point until the data has been retrieved from the db:
db.query("some query");
In the second example, the thread is not blocked; it stays available to serve more requests.
{
return JSON.stringify(result);
}
This function is called once the data is available from the db and the thread is free to run it.
That's why Node.js is described as non-blocking I/O: your thread is not blocked while it waits for I/O.
Asynchronous code is used when an operation would otherwise block execution. Blocking is less of a problem in a multi-threaded application or server, but Node.js runs your JavaScript on a single thread, which should not be blocked by a single operation. If it is blocked by an operation like db.query("some query");, Node.js can do nothing else until that operation finishes.
It is like standing idle in front of a rice cooker until the rice is cooked. Normally, you would do other things while the rice cooks, and when the whistle blows you come back to the cooked rice. Similarly, Node.js hands the asynchronous operation off, and the event loop notifies us when it is over. Meanwhile, Node.js can process other work, such as serving other connections.
You said it is ugly. Asynchronous does not have to mean callbacks: you can use Promises, coroutines, or the async library.
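As a rough illustration of the Promise option, the sketch below rewrites the async function with async/await. `fakeDb` is a hypothetical stand-in whose query() returns a Promise; a real client may expose a different interface.

```javascript
// Hypothetical stand-in for a database client whose query() returns a Promise.
const fakeDb = {
  query(sql) {
    return new Promise((resolve) => {
      setTimeout(() => resolve([{ id: 1, name: "alice" }]), 50);
    });
  },
};

// Reads top to bottom like the synchronous version, but does not block the
// thread: `await` suspends only this function until the Promise settles.
async function getDataAsync() {
  const result = await fakeDb.query("some query");
  return JSON.stringify(result);
}

const order = [];
const pending = getDataAsync().then((json) => {
  order.push("data");
  console.log(json);
  return json;
});
order.push("other work"); // runs while the query is still in flight
```

The function body looks almost identical to getDataSync(), yet "other work" is recorded before the data arrives.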
Node is single-threaded, but there are a lot of functions(modules like http, fs) that allow us to do a background task and the event loop takes care of executing the callbacks.
However, is this true for a database connection?
Let's say I have the following code.
const mysql = require('mysql');
function callDatabase(id) {
var result;
var connection = mysql.createConnection(
{
host : '192.168.1.14',
user : 'root',
password : '',
database : 'test'
}
);
connection.connect();
var queryString = 'SELECT name FROM test WHERE id = 1';
connection.query(queryString, function(err, rows, fields) {
if (err) throw err;
for (var i in rows) {
result = rows[i].name;
}
connection.end();
return result;
});
}
Do mysql.createConnection, connection.connect, connection.query, and connection.end spin up a new thread to execute in the background, leaving Node to run the remaining synchronous code?
If yes, in which queue will the callback be enqueued, and how should this sort of code be written so that a background task is initiated?
Anything that may block (file system operations, network connections, etc.) is generally asynchronous in Node, in order to avoid blocking the main thread. That these functions take a callback parameter is a strong hint that asynchronous operations (or "background tasks") are in progress.
You don't show it in your sample code, but connect() and end() do take callback functions so you know when a connection is actually made or ends. It looks like the mysql library, however, also maintains an internal queue to make sure you can't attempt a query until a connection has been made and that only one operation at a time can be executed.
Note that createConnection() does not have a callback function. All it does is create a new data structure (connection) that gets used. It doesn't do any I/O itself, so doesn't need to run asynchronously.
Also note that you don't generally "spin up" your own threads. Node takes care of this thread management for you (largely by running things on the main worker thread), for the most part, and hides how threads themselves work for most developers. You typically hear that Node is "single threaded", and you should treat it this way.
Modern Node code makes extensive use of async/await and Promises to do this sort of thing. Slightly older code uses callback functions. Even older code uses Node events. In reality - if you dig far enough down, they're all using events and possibly presenting the simplified (more modern) interfaces.
The mysql module appears to date from the "callback" era and hasn't yet been updated for Promises/async/await. Under the covers, as noted, it uses Node events to track network (or unix domain socket) connections and transfers.
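When a library only offers callback-style functions, Node's built-in util.promisify can wrap any function that follows the (err, result) callback convention. The sketch below uses a hypothetical `fakeConnection` object in place of a real database connection so that it runs without a server; its query(sql, values, callback) shape is an assumption for illustration, not the mysql module's documented API.

```javascript
const util = require("util");

// Hypothetical callback-style connection; stands in for a real driver.
const fakeConnection = {
  rows: { 1: { name: "alice" } },
  query(sql, values, callback) {
    // Simulate network latency, then call back Node-style: (err, result).
    setTimeout(() => callback(null, [this.rows[values[0]]]), 20);
  },
};

// bind() keeps `this` pointing at the connection inside query().
const queryAsync = util.promisify(fakeConnection.query.bind(fakeConnection));

async function getName(id) {
  const rows = await queryAsync("SELECT name FROM test WHERE id = ?", [id]);
  return rows[0].name;
}

const namePromise = getName(1);
namePromise.then((name) => console.log(name));
```

With this shape the caller receives the value through the returned Promise, instead of through a `return` inside a callback that nobody reads.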
I'm writing a personal project in Node and I'm trying to figure out when a task should be split up asynchronously. Let's say I have this "4-step task"; the steps are not very expensive (the most expensive one iterates over an array of objects, trying to match a URL against a RegExp, and the array probably won't have more than 20 or 30 objects).
part1().then(y => {
doTheSecondPart
}).then(z => {
doTheThirdPart
}).then(c => {
doTheFourthPart
});
The other way would be to just execute one part after another, but then nothing else would progress until the task is done. With the above approach, other tasks can progress at least a little between parts.
Are there any criteria for when this approach should be preferred over a classic synchronous one?
Sorry for my bad English; it's not my native language.
All you've described is synchronous code that isn't very long to run. First off, there's no reason to even use promises for that type of code. Secondly, there's no reason to break it up into chunks. All you would be doing with either of those choices is making the code more complicated to write, more complicated to test and more complicated to understand and it would also run slower. All of those are undesirable.
If you force even synchronous code into a promise, then a .then() handler will give some other code a chance to run between .then() handlers, but only certain types of events can be run there because processing a resolved promise is one of the highest priority things to do in the event queue system. It won't, for example, allow another incoming http request arriving on your server to start to run.
If you truly wanted to allow other requests to run and so on, you would be better off just putting the code (without promises) into a WorkerThread and letting it run there and then communicate back the result via messaging. If you wanted to keep it in the main thread, but let any other code run, you'd probably have to use a short setTimeout() delay to truly let all possible other types of tasks run in between.
So, if this code doesn't take much time to run, there's just really no reason to mess with complicating it. Just let it run in the fastest, quickest and simplest way.
If you want more concrete advice, then please show some actual code and provide some timing information about how long it takes to run. Iterating through an array of 20-30 objects is nothing in the general scheme of things and is not a reason to rewrite it into timesliced pieces.
As for code that iterates over an array/list of items doing matching against some string, this is exactly what the Express web server framework does on every incoming URL to find the matching routes. That is not a slow thing to do in Javascript.
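The point about .then() handlers taking priority over other event types can be observed with a small experiment. This is only a sketch: the "part" labels are placeholders for the four steps in the question.

```javascript
const order = [];

// A timer callback is a separate event-loop task.
setTimeout(() => order.push("timer"), 0);

// Each .then() handler is a microtask; the whole chain drains before the
// event loop moves on to timers or incoming I/O.
Promise.resolve()
  .then(() => order.push("part 2"))
  .then(() => order.push("part 3"))
  .then(() => order.push("part 4"));

order.push("part 1 (synchronous)");

// Print the observed order once everything has run.
setTimeout(() => console.log(order.join(" -> ")), 10);
```

The timer was scheduled first, yet it runs only after the entire promise chain has finished, so chaining .then() calls does not give timers or incoming requests a turn in between.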
Asynchronous programming is a better fit for code that must respond to events – for example, any kind of graphical UI. An example of a situation where programmers use async but shouldn't is any code that can focus entirely on data processing and can accept a “stop-the-world” block while waiting for data to download.
I use it extensively with a REST API server, as we have no idea how long the server can take to respond to a request. So that we don't "block the app" while waiting for the server response, async requests are most useful.
part1().then(y => {
doTheSecondPart
}).then(z => {
doTheThirdPart
}).then(c => {
doTheFourthPart
});
What you have described in your sample is really a synchronous, procedural process, and wrapping it in .then() calls would not necessarily keep your interface responsive while the algorithm is busy with a step.
A server call is different: while the request is pending, nothing is running on the main thread, so the app stays free to handle other user interface events until the response arrives and the next step runs. A .then() chain and async/await behave the same way in this respect.
Use async/await in cases where you are waiting for a user event or a server response and do not want your app to hang while waiting for the data:
async function wait() {
await new Promise(resolve => setTimeout(resolve,2000));
console.log("awaiting for server once !!")
return 10;
}
async function wait2() {
await new Promise(resolve => setTimeout(resolve,3000));
console.log("awaiting for server twice !!")
return 10;
}
async function f() {
// A promise that resolves after one second
let promise = new Promise((resolve, reject) => {
setTimeout(() => resolve("done!"), 1000)
});
let result = await promise; // wait until the promise resolves
console.log(result) // "done!"
let value1 = await wait(); // resumes about 2 seconds later
let value2 = await wait2(); // resumes about 3 seconds after that
}
f();
This sample should help you gain a basic understanding of how async/await works. Here are a few resources for further reading:
Promises and Async
Mozilla References
I'm trying to invoke the same function in parallel, but with different arguments. I used Promise.all, but that doesn't seem to run the tasks in parallel. I tried using bluebird, but the executions still seem to happen sequentially. Please find below the code snippet and the logs.
let records = await getRecords(query);
if (_.size(records) > 0) {
bluebird.map(records, function (record) {
return prepareFileContent(record.MESSAGE_PAYLOAD);
}, { concurrency: records.length }).then(function (data) {
finalData = data;
console.log("done");
});
}
export async function prepareFileContent(payload : string) : Promise<string>{
return new Promise<string>(function(resolve,reject){
try{
console.log("content generation starts");
//logic goes here
console.log("content generation ends");
resolve(details);
}
catch(err)
{
log.error("Error in parsing the payload:", err);
reject(err);
}
});
}
The logs look something like this, which shows that the calls are executed sequentially and not in parallel (going by the timestamps, each one takes a little over 4 seconds to execute):
2018-04-16T08:47:53.095Z content generation starts
2018-04-16T08:47:57.819Z content generation ends
2018-04-16T08:47:57.820Z content generation starts
2018-04-16T08:48:02.253Z content generation ends
2018-04-16T08:48:02.254Z content generation starts
2018-04-16T08:48:06.718Z content generation ends
2018-04-16T08:48:06.718Z content generation starts
2018-04-16T08:48:11.163Z content generation ends
2018-04-16T08:48:11.163Z content generation starts
2018-04-16T08:48:15.573Z content generation ends
2018-04-16T08:48:15.574Z content generation starts
Can someone help me achieve this in parallel, and tell me what I am missing here?
First off, node.js Javascript is single threaded. So, no two pieces of Javascript are ever truly run in parallel. When people speak of things running parallel, that only really applies to asynchronous operations that have a native code component such as networking operations, file operations, etc...
It appears that you're operating under an assumption that promises and functions like Bluebird's Promise.map() enable parallel operation. That is only true if the underlying operations you're monitoring with promises (your prepareFileContent() function in your example) are actually capable of running by themselves outside of the Javascript interpreter. But, the code from your function prepareFileContent() that you show us is just Javascript so it can't ever run in parallel with anything else. Remember, node.js runs your Javascript single threaded so it can't run two pieces of Javascript at the same time, ever.
So, your output is exactly as expected. bluebird.map() iterates through the array, calling your callback on each item in the array and collecting a promise from each function call. Then, it waits for all the promises to be done and collects all the resolved results into an array for you.
But, each of your callbacks is synchronous. They don't have any asynchronous part to them so all your code ends up running synchronously. Nothing runs in parallel.
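The difference can be sketched with two kinds of tasks. `cpuBoundTask` and `timerTask` are hypothetical names: the first does all of its work in JavaScript inside the Promise executor, the second hands the waiting to a timer.

```javascript
const log = [];

// The executor passed to `new Promise` runs synchronously, right away.
function cpuBoundTask(id) {
  return new Promise((resolve) => {
    log.push(`start ${id}`);
    let sum = 0;
    for (let i = 0; i < 1e6; i++) sum += i; // pure JavaScript work
    log.push(`end ${id}`);
    resolve(sum);
  });
}

// Delegates the waiting to a timer, so the tasks can overlap.
function timerTask(id) {
  return new Promise((resolve) => {
    log.push(`start ${id}`);
    setTimeout(() => {
      log.push(`end ${id}`);
      resolve(id);
    }, 20);
  });
}

const finished = Promise.all([cpuBoundTask("A"), cpuBoundTask("B")])
  .then(() => Promise.all([timerTask("C"), timerTask("D")]))
  .then(() => {
    console.log(log.join(", "));
    return log;
  });
```

Tasks A and B never overlap (A ends before B starts), which matches the logs in the question, while C and D both start before either one ends.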
What I Have:
I have a Node.js Express server with a GET endpoint that in turn calls other APIs that are time-consuming (say, about 2 seconds). I call this function with a callback, so that res.send is triggered as part of the callback. res.send sends an object that is built from the results of these time-consuming API calls, so the response can only be sent once I have all the information from the API calls.
Some representative code.
someFunctionCall(params, callback)
{
// do some asyncronous requests.
Promise.all([requestAsync1(params),requestAsync2(params)]).then
{
// do some operations
callback(response) // callback given with some data from the promise
}
}
app.get('/',function(req, res){
someFunctionCall(params, function(err, data){
res.send(JSON.stringify(data))
});
});
What I want
I want my server to be able to handle other incoming GET requests in parallel, without being blocked by the REST API calls in the other function. The problem is that the callback is only invoked when the promises are fulfilled; each of those operations is async, but my thread will wait until all of them have executed. And Node does not accept the next GET request until the res.send or res.end of the previous request has executed. This becomes an issue when multiple requests come in: each one is executed one after another.
Note: I do not want to go with the cluster method; I just want to know if it is possible to do this without it.
You are apparently misunderstanding how node.js, asynchronous operations and promises work. Assuming your long running asynchronous operations are all properly written with asynchronous I/O, then neither your requestAsync1(params) nor your requestAsync2(params) call is blocking. That means that while you are waiting for Promise.all() to call its .then() handler to signify that both of those asynchronous operations are complete, node.js is perfectly free to run any other events or incoming requests. Promises themselves do not block, so the node.js event system is free to process other events. So, you either don't have a blocking problem at all or, if you actually do, it is not caused by what you asked about here.
To see if your code is actually blocking or not, you can temporarily add a simple timer that outputs to the console like this:
let startTime;
setInterval(function() {
if (!startTime) {
startTime = Date.now();
}
console.log((Date.now() - startTime) / 1000);
}, 100)
This will output a simple relative timestamp every 100ms when the event loop is not blocked. You would obviously not leave this in your code for production code, but it can be useful to show you when/if your event loop is blocked.
I do see an odd syntax issue in the code you included in your question. This code:
someFunctionCall(params, callback)
{
// do some asyncronous requests.
Promise.all([requestAsync1(params),requestAsync2(params)]).then
{
// do some operations
callback(response) // callback given with some data from the promise
}
}
should be expressed like this:
function someFunctionCall(params, callback)
{
// do some asynchronous requests.
Promise.all([requestAsync1(params),requestAsync2(params)]).then(function(response)
{
// do some operations
callback(response) // callback given with some data from the promise
});
}
But, an even better design would be to just return the promise and not switch back to a plain callback. Besides denying the caller the more flexible promises scheme, the callback version is also "eating" errors that may occur in either of your async operations. I'd suggest this:
function someFunctionCall(params) {
// do some asyncronous requests.
return Promise.all([requestAsync1(params),requestAsync2(params)]).then(function(results) {
// further processing of results
// return final resolved value of the promise
return someValue;
});
}
Then, the caller would use it like this:
someFunctionCall(params).then(function(result) {
// process final result here
}).catch(function(err) {
// handle error here
});
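The same promise-returning design can also be consumed with async/await. In this sketch, `requestAsync1` and `requestAsync2` are hypothetical stand-ins that use timers in place of real API calls, and `handler` is an illustrative name for the place where an Express route would call res.send.

```javascript
// Hypothetical stand-ins for the two slow API calls.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));
const requestAsync1 = (params) => delay(30, { a: params.id });
const requestAsync2 = (params) => delay(20, { b: params.id * 2 });

// Returns a promise for the combined result; both requests run concurrently.
async function someFunctionCall(params) {
  const [r1, r2] = await Promise.all([
    requestAsync1(params),
    requestAsync2(params),
  ]);
  return { ...r1, ...r2 };
}

// try/catch plays the role of .catch(): errors are no longer swallowed.
async function handler(params) {
  try {
    return JSON.stringify(await someFunctionCall(params));
  } catch (err) {
    return JSON.stringify({ error: err.message });
  }
}

const reply = handler({ id: 21 });
reply.then(console.log);
```

While handler() is suspended at `await`, the event loop remains free to start handling other requests.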
Assume makeBurger() takes 10 seconds.
In a synchronous program,
function serveBurger() {
makeBurger();
makeBurger();
console.log("READY") // Assume takes 5 seconds to log.
}
This will take a total of 25 seconds to execute.
So for Node.js, let's say we make an async version, makeBurgerAsync(), which also takes 10 seconds.
function serveBurger() {
makeBurgerAsync(function(count) {
});
makeBurgerAsync(function(count) {
});
console.log("READY") // Assume takes 5 seconds to log.
}
Since Node is single-threaded, I have trouble imagining what is really going on behind the scenes.
So for sure, when the function runs, both async functions will enter the event loop and console.log("READY") will be executed straight away.
But while console.log("READY") is executing, no work is really being done for either async function, right? The single thread is busy with console.log for 5 seconds.
After console.log is done, the CPU will have time to switch between the two async functions, so that it can run a bit of each one at a time.
So according to this, making the functions async doesn't necessarily result in faster execution; async is probably slower because of the switching in the event loop? I imagine that, at the end of the day, everything is spread over a single thread, which would be the same thing as the synchronous version?
I am probably missing some very big concept, so please let me know. Thanks.
EDIT
It makes sense if the asynchronous operations are things like querying a DB. Basically, Node.js just says "Hey DB, handle this for me while I do something else". However, the case I don't understand is a self-defined callback function within Node.js itself.
EDIT2
function makeBurger() {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
return count;
}
function makeBurgerAsync(callback) {
var count = 0;
count++; // 1 time
...
count++; // 999999 times
callback(count);
}
In node.js, all asynchronous operations accomplish their tasks outside of the node.js Javascript single thread. They either use a native code thread (such as disk I/O in node.js) or they don't use a thread at all (such as event driven networking or timers).
You can't take a synchronous operation written entirely in node.js Javascript and magically make it asynchronous. An asynchronous operation is asynchronous because it calls some function that is implemented in native code and written in a way to actually be asynchronous. So, to make something asynchronous, it has to be specifically written to use lower level operations that are themselves asynchronous with an asynchronous native code implementation.
These out-of-band operations, then communicate with the main node.js Javascript thread via the event queue. When one of these asynchronous operations completes, it adds an event to the Javascript event queue and then when the single node.js thread finishes what it is currently doing, it grabs the next event from the event queue and calls the callback associated with that event.
Thus, you can have multiple asynchronous operations running in parallel. And running 3 operations in parallel will usually have a shorter end-to-end running time than running those same 3 operations in sequence.
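A rough way to see the difference in end-to-end time is to compare three timer-based waits run in sequence with the same three started together. The `wait` helper is a hypothetical stand-in for any truly asynchronous operation, and exact timings will vary from machine to machine.

```javascript
// Stand-in for an asynchronous operation that takes `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Each wait starts only after the previous one has finished.
async function sequential() {
  const start = Date.now();
  await wait(100);
  await wait(100);
  await wait(100);
  return Date.now() - start; // roughly 300 ms
}

// All three waits are started before any of them is awaited.
async function parallel() {
  const start = Date.now();
  await Promise.all([wait(100), wait(100), wait(100)]);
  return Date.now() - start; // roughly 100 ms
}

const timings = (async () => {
  const seq = await sequential();
  const par = await parallel();
  console.log({ seq, par });
  return { seq, par };
})();
```

This only helps because the waiting happens outside the JavaScript thread; three CPU-bound loops would take the same total time either way.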
Let's examine a real-world async situation rather than your pseudo-code:
const fs = require("fs");
const http = require("http");
function doSomething() {
fs.readFile(fname, function(err, data) {
console.log("file read");
});
setTimeout(function() {
console.log("timer fired");
}, 100);
// http.get() passes only the response object to its callback
http.get(someUrl, function(response) {
console.log("http get finished");
});
console.log("READY");
}
doSomething();
console.log("AFTER");
Here's what happens step-by-step:
1. fs.readFile() is initiated. Since node.js implements file I/O using a thread pool, this operation is passed off to a thread in node.js and it will run there in a separate thread.
2. Without waiting for fs.readFile() to finish, setTimeout() is called. This uses a timer sub-system in libuv (the cross platform library that node.js is built on). This is also non-blocking so the timer is registered and then execution continues.
3. http.get() is called. This will send the desired http request and then immediately return to further execution.
4. console.log("READY") will run.
5. The three asynchronous operations will complete in an indeterminate order (whichever one completes its operation first will be done first). For purposes of this discussion, let's say the setTimeout() finishes first. When it finishes, some internals in node.js will insert an event in the event queue with the timer event and the registered callback. When the node.js main JS thread is done executing any other JS, it will grab the next event from the event queue and call the callback associated with it.
6. For purposes of this description, let's say that while that timer callback is executing, the fs.readFile() operation finishes. Using its own thread, it will insert an event in the node.js event queue.
7. Now the setTimeout() callback finishes. At that point, the JS interpreter checks to see if there are any other events in the event queue. The fs.readFile() event is in the queue so it grabs that and calls the callback associated with that. That callback executes and finishes.
8. Some time later, the http.get() operation finishes. Internal to node.js, an event is added to the event queue. Since there is nothing else in the event queue and the JS interpreter is not currently executing, that event can immediately be serviced and the callback for the http.get() can get called.
Per the above sequence of events, you would see this in the console:
READY
AFTER
timer fired
file read
http get finished
Keep in mind that the order of the last three lines here is indeterminate (it's just based on unpredictable execution speed) so that precise order here is just an example. If you needed those to be executed in a specific order or needed to know when all three were done, then you would have to add additional code in order to track that.
Since it appears you are trying to make code run faster by making something asynchronous that isn't currently asynchronous, let me repeat. You can't take a synchronous operation written entirely in Javascript and "make it asynchronous". You'd have to rewrite it from scratch to use fundamentally different asynchronous lower level operations or you'd have to pass it off to some other process to execute and then get notified when it was done (using worker processes or external processes or native code plugins or something like that).
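One way to "pass it off" in current versions of Node is the built-in worker_threads module. The sketch below moves the counting loop from the question onto a worker thread; `makeBurgerInWorker` is a hypothetical name, and the worker code is passed as a string (with the `eval: true` option) only to keep the example in a single file.

```javascript
const { Worker } = require("worker_threads");

// Code that runs on the worker thread. Normally this would live in its own
// module file rather than in a string.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let count = 0;
  for (let i = 0; i < workerData.iterations; i++) count++;
  parentPort.postMessage(count);
`;

// Wraps the worker in a Promise that resolves with the posted result.
function makeBurgerInWorker(iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { iterations },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

const order = [];
const burger = makeBurgerInWorker(999999).then((count) => {
  order.push("burger done");
  console.log("count:", count);
  return count;
});
order.push("READY"); // the main thread is free immediately
```

Here the loop really does run in parallel with the main thread, because it executes in a separate JavaScript instance and reports back through a message event.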