Synchronous Blocks in Node.js

I want to know if it's possible to have synchronous blocks in a Node.js application. I'm a total newbie to Node, but I couldn't find any good answers for the behavior I'm looking for specifically.
My actual application will be a little more involved, but essentially what I want to do is support a GET request and a POST request. The POST request will append an item to an array stored on the server and then sort the array. The GET request will return the array. Obviously, with multiple GETs and POSTs happening simultaneously, we need a correctness guarantee. Here's a really simple example of what the code would theoretically look like:
var arr = [];

app.get(/*URL 1*/, function (req, res) {
  res.json(arr);
});

app.post(/*URL 2*/, function (req, res) {
  var itemToAdd = req.body.item;
  arr.push(itemToAdd);
  arr.sort();
  res.sendStatus(200); // res.status(200) alone only sets the code; it never sends the response
});
What I'm worried about is a GET request returning the array before it is sorted but after the item is appended or, even worse, returning the array as it's being sorted. In a language like Java I would probably just use a ReadWriteLock. From what I've read about Node's asynchrony, it doesn't appear that arr will be accessed in a way that preserves this behavior, but I'd love to be proven wrong.
If it's not possible to modify the code I currently have to support this behavior, are there any other alternatives or workarounds to get the application to do what I want it to do?

What I'm worried about is a GET request returning the array before it is sorted but after the item is appended or, even worse, returning the array as it's being sorted.
In the case of your code here, you don't have to worry about that (although read on because you may want to worry about other things!). Node.js is single-threaded so each function will run in its entirety before returning control to the Event Loop. Because all your array manipulation is synchronous, your app will wait until the array manipulation is done before answering a GET request.
One thing to watch out for then, of course, is if (for example) the .sort() takes a long time. If it does, your server will block while that is going on. And this is why people tend to favor asynchronous operations instead. But if your array is guaranteed to be small and/or this is an app with a limited number of users (say, it's an intranet app for a small company), then what you're doing may work just fine.
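If you want to see that blocking effect for yourself, here is a minimal sketch (not from the question; the route names and array size are made up) where a deliberately large synchronous sort in one route delays every other route:

var express = require('express');
var app = express();

app.get('/slow', function (req, res) {
  var big = [];
  for (var i = 0; i < 5e6; i++) big.push(Math.random());
  big.sort(); // synchronous: the event loop is stuck here until it finishes
  res.send('sorted');
});

app.get('/ping', function (req, res) {
  res.send('pong'); // delayed while any in-flight /slow sort is running
});

app.listen(3000);

Hit /slow in one tab and /ping in another; the /ping response only arrives after the sort completes.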
To get a good understanding of the whole single-threaded + event loop thing, I recommend Philip Roberts's talk on the event loop.

Related

How is Node.js asynchronous AND single-threaded at the same time?

The question title basically says it all, but to rephrase it: what handles the asynchronous function execution if the main (and only) thread is occupied with running down the main code block?
So far I have only found that the async code gets executed elsewhere or outside the main thread, but what does this mean specifically?
EDIT: The proposed Node.js Event loop question's answers may also address this topic, but I was looking for a less complex, more specific answer rather than an explanation of the Node.js concept. Also, it does not show up in searches for anything similar to "node asynchronous single-threaded".
EDIT, #Mr_Thorynque: Running a query to get data from a database and logging it to the console. Nothing gets logged, because Node, being async, does not wait for the query to finish and the data to populate. (This is just an example as requested, NOT part of my question.)
var data;
mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
  rows.forEach(function (row) {
    data += *gather the requested data*
  });
});
console.log(data); // runs before the callback above, so nothing useful is logged
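A corrected sketch of that example (keeping the placeholder query): move the logging into the callback, where the data actually exists:

mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
  if (err) throw err;
  var data = '';
  rows.forEach(function (row) {
    data += row.someColumn; // hypothetical column name, for illustration
  });
  console.log(data); // now runs only after the query has finished
});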
What it really comes down to is that the node process (which is single-threaded) hands off the work to "something else". This could be the OS's I/O subsystem, a networked resource, or whatever. By handing it off, it frees its thread to keep working on the next in-memory thing. It uses file handles to keep track of any pending work, and in the event loop it marries the two back together and fires the callback when the work is done.
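A minimal sketch of that hand-off in action: the synchronous line after the call always prints first; the callback fires later, when the event loop delivers the I/O result:

var fs = require('fs');

fs.readFile(__filename, 'utf8', function (err, contents) {
  console.log('2: callback fires once the I/O work is done');
});
console.log('1: main block finished, thread is free for other work');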
Note that this is also why you can block processing in your own code if poorly designed. If your code runs complex tasks and doesn't hand off the work, you'll block the single thread.
This is about as simple an answer as I can make, I think that the details are well explained in the links in your comments.

Is NodeJS suitable for websites (i.e. port blocking)?

I have gone through many painful months with this issue and I am now ready to let this go to the bin of "what-a-great-lie-for-websites-nodejs-is"!!!
All NodeJS tutorials discuss how to create a website. When done, it works. For one person at a time though. All requests sent to the port will be blocked by the first-come-first-served situation. Why? Because most requests sent to the nodejs server have to get parsed, data requested from the database, data calculated and parsed, and the response prepared and sent back to the ajax call. (This is a mere simple website example.)
The same applies for authentication - a request is made, data is parsed, authentication takes place, a session is created and sent back to the requester.
No matter how you sugarcoat this - all requests are done this way. Yes, you can employ async functionality which will shorten the time spent on some portions, yes, you can try promises, yes, you can employ clustering, yes, you can employ forking/spawning, etc... The result is always the same at all times: the port gets blocked.
I tried responding with a reference so that we could use sockets to pass the data back, matching it with the reference - that also blocked it.
The point of the matter is this: when you ask for help, everyone wants all sorts of code examples, but nobody ever gets around to helping with an actual answer that works. The whole wide world!!!!! Which leads me to the answer that nodeJS is not suitable for websites.
I have read many requests regarding this and all have been met with: "you must code properly"! Really? Is there no NodeJS-skilled and experienced person who can lend an answer on this one????!!!!!
Most of the NodeJS coders come from the PHP side - All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time. Thanks to the web server.
So how come the NodeJS community cannot come up with some sort of an answer to this one other than "code carefully"???!!!
They want examples - and each example is met with "well, that is blocking code, do it another way", or "code carefully"!!! Come on, people!!!
Think of the following scenario: one user, one page with 4 lists of records. Theoretically, all lists should be filled with records independently. In reality, because of how the data is requested, prepared and responded to, each list ends up waiting for the one before it to finish. And that is on one session alone.
What about 2 people, 2 sessions at the same time?
So my question is this: is NodeJS suitable for a website and if it is, can anyone show and prove this with a short code??? If you can't prove it, then the answer is: "NodeJS is not suitable for websites!"
Here is an example based on the simplest tutorial and it is still blocking:
var express = require('express'),
    fs = require("fs");

var app = express();

app.get('/heavyload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Heavy Load'); // respond right away...
  res.end();
  fs.readFile(file, 'utf8', function (err, fileContent) {
    // ...then read the whole large file into memory
  });
});

app.get('/lightload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Light Load');
  res.end();
});

app.listen(1337, function () {
  console.log('Listening!');
});
Now, if you go to "/heavyload" it will immediately respond, because that is the first thing sent to the browser, and then nodejs proceeds to read the heavy (large) file. If you then hit the second route, "/lightload", at the same time, you will see that it waits for the file read from the first call to finish before it produces its browser output. This is the simplest example of how nodejs fails at handling what would otherwise be simple in php and similar scripting languages.
Like I mentioned before, I have tried as many as 20 different ways to do this in my career as a nodejs programmer. I totally love nodejs, but I cannot get past this obstacle... This is not a complaint - it is a call for help, because I am at the end of my road with nodejs and I don't know what to do.
I thank you kindly.
So here is what I found out. I will answer it with an example of blocking code:
for (var k = 0; k < 15000; k++) {
  console.log('Something Output');
}
res.status(200).json({ test: 'Heavy Load' });
This will block, because it has to run the for loop for a long time, and only after it finishes will it send the output.
Now if you run the same code like this, it won't block:
function doStuff() {
  for (var k = 0; k < 15000; k++) {
    console.log('Something Output');
  }
}

doStuff();
res.status(200).json({ test: 'Heavy Load' });
Why? Because the functions are run asynchronously...! So how will I then send the resulting response to the requesting client? Currently I am doing it as follows:
1. Run the doStuff function.
2. Send a unique call reference, which is then received by the ajax call on the client side.
3. Put the callback function of the client side into a waiting object.
4. Listen on a socket.
5. When the doStuff function is completed, it should issue a socket message with the resulting response together with the unique reference.
6. When the socket on the client side gets the message with the unique reference and the resulting response, it will match it with the waiting callback function and run it.
Done! A bit of a workaround (as mentioned before), but it's working! It does require a socket to be listening. And that is my solution to this port-blocking situation in NodeJS - roughly sketched below.
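For what it's worth, here is a rough server-side sketch of those steps (assuming socket.io; the route and event names are made up for illustration):

var express = require('express');
var app = express();
var http = require('http').createServer(app);
var io = require('socket.io')(http);

var nextRef = 0;

function doStuff() {
  for (var k = 0; k < 15000; k++) {
    console.log('Something Output');
  }
  return { test: 'Heavy Load' };
}

app.get('/heavy', function (req, res) {
  var ref = ++nextRef;
  res.json({ ref: ref });    // steps 1-2: answer immediately with a unique reference
  setImmediate(function () { // defer the heavy work off the current tick
    var result = doStuff();
    io.emit('done', { ref: ref, result: result }); // step 5: push result + reference
  });
});

http.listen(1337);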
Is there some other way? I am hoping someone answers with another way, because I am still feeling like this is some workaround. Eish! ;)
is NodeJS suitable for a website
Definitely yes.
can anyone show and prove this with a short code
No.
All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time.
Node.js doesn't require any workarounds either; it simply works without blocking.
To more broadly respond to your question/complaint:
A single node.js machine can easily serve as a web server for a website and handle multiple sessions (millions, actually) without any need for workarounds.
If you're coming from a PHP web server, then instead of trying to migrate existing code to a new Node website, maybe first play with a simple online Node.js + express website example. If that works well, you can start adding code that requires long-running processes, like reading from DBs or reading/writing files, and verify that visitors aren't being blocked (they shouldn't be).
See this tutorial on how to get started.
UPDATE FOR EXAMPLE CODE
To fix the supplied example code, you should convert your fs.readFile call to fs.createReadStream. readFile is not recommended for large files; I don't think readFile literally blocked anything, but the need to allocate and move large amounts of bytes can choke the server. createReadStream works in chunks instead, which is much easier on the CPU and RAM:
var rstream = fs.createReadStream(file);
var length = 0;

rstream.on('data', function (chunk) {
  length += chunk.length;
  // Do something with the chunk
});

rstream.on('end', function () { // done
  console.log('file read! length = ' + length);
});
After switching your code to createReadStream, I'm able to serve continuous calls to heavyload / lightload in ~3ms each.
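And if the goal is to actually deliver the file to the client rather than just read it, piping the stream into the response keeps memory flat as well (a sketch, reusing the question's hypothetical path):

app.get('/heavyload', function (req, res) {
  fs.createReadStream('/media/sudoworx/www/sudo-sails/test.zip')
    .on('error', function () { res.status(500).end(); })
    .pipe(res);
});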
THE BETTER ANSWER I SHOULD HAVE WRITTEN
Node.js is a single-process architecture, but it has multi-process capabilities: using the cluster module, you can write master/worker code that distributes the load across multiple workers in multiple processes.
You can also use pm2 to do the clustering for you; it has a built-in load balancer to distribute the work, and it also allows for scaling up/down without downtime, see this.
In your case, while one process is reading/writing a large file, other processes can accept incoming requests and handle them.
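Here is a minimal cluster sketch (reusing the question's port; not production-hardened) so you can see the shape of the master/workers code:

var cluster = require('cluster');
var os = require('os');
var express = require('express');

if (cluster.isMaster) {
  // Fork one worker per CPU core; they all share port 1337.
  os.cpus().forEach(function () {
    cluster.fork();
  });
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, forking a replacement');
    cluster.fork();
  });
} else {
  var app = express();
  app.get('/', function (req, res) {
    res.send('served by worker ' + process.pid);
  });
  app.listen(1337);
}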

Reasonable handling scheme for Node.JS async errors in unusual places

In Java, I am used to try..catch, with finally to cleanup unused resources.
In Node.JS, I don't have that ability.
Odd errors can occur; for example, the database could shut down at any moment, any single table or file could be missing, etc.
With nested calls to db.query(..., function(err, results){..., it becomes tedious to call if(err) {send500(res); return} every time, especially if I have to clean up resources; for example, db.end() would definitely be appropriate.
How can one write code that makes async catch and finally blocks both be included?
I am already aware of the ability to restart the process, but I would like to use that as a last-resort only.
A full answer to this is pretty in depth, but it's a combination of:
consistently handling the error positional argument in callback functions. Doubling down here should be your first course of action.
You will see #izs refer to this as "boilerplate" because you need a lot of this whether you are doing callbacks or promises or flow control libraries. There is no great way to totally avoid this in node due to the async nature. However, you can minimize it by using things like helper functions, connect middleware, etc. For example, I have a helper callback function I use whenever I make a DB query and intend to send the results back as JSON for an API response. That function knows how to handle errors, not found, and how to send the response, so that reduces my boilerplate substantially.
use process.on('uncaughtException') as per #izs's blog post
use try/catch for the occasional synchronous API that throws exceptions. Rare but some libraries do this.
consider using domains. Domains will get you closer to the java paradigm but so far I don't see that much talk about them which leads me to expect they are not widely adopted yet in the node community.
consider using cluster. While not directly related, it generally goes hand in hand with this type of production robustness.
some libraries have top-level error events. For example, if you are using mongoose to talk to mongodb and the connection suddenly dies, the connection object will emit an error event.
Here's an example. The use case is a REST/JSON API backed by a database.
//shared error handling for all your REST GET requests
function normalREST(res, error, result) {
  if (error) {
    log.error("DB query failed", error);
    res.status(500).send(error);
    return;
  }
  if (!result) {
    res.status(404).send();
    return;
  }
  res.send(result); //handles arrays or objects OK
}

//Here's a route handler for /users/:id
function getUser(req, res) {
  db.User.findById(req.params.id, normalREST.bind(null, res));
}
And I think my takeaway is that overall, in JavaScript itself, error handling is basically woefully inadequate. In the browser, you refresh the page and get on with your life. In node, it's worse, because you're trying to write a robust and long-lived server process. There is a completely epic issue comment on github that goes into great detail about how things are just fundamentally broken. I wouldn't get your hopes up of ever having JavaScript code you can point at and say "Look, Ma, state-of-the-art error handling". That said, in practice, if you follow the points listed above, you can empirically write programs that are robust enough for production.
See also The 4 Keys to 100% Uptime with node.js.

node.js express custom format debug logging

A seemingly simple question, but I am unsure of the node.js equivalent to what I'm used to (say from Python, or LAMP), and I actually think there may not be one.
Problem statement: I want to use basic, simple logging in my express app. Maybe I want to output DEBUG messages, or INFO messages, or just some stats to the log for consumption by other back-end systems later.
1) I want all log messages, however, to contain some fields: remote IP and request URL, for example.
2) On the other hand, code that logs is everywhere in my app, including deep inside the call tree.
3) I don't want to pass (req,res) down into every node in the call tree (this just creates a lot of parameter passing where they are mostly not needed, and complicates my code, as I need to pass these into async callbacks and timeouts etc.)
In other systems, where there is a thread per request, I will store the (req,res) pair (where all the data I need is) in a thread-local-storage, and the logger will read this and format the message.
In node, there is only one thread. What is my alternative here? What is "the request context under which a specific piece of code is running"?
The only way I can think of achieving something like this is by looking at a stack trace and using reflection to look at local variables up the call tree. I hate that idea; plus, I would need to implement this for all callbacks, setTimeouts, setIntervals, new Function()'s, evals, ... and the list goes on.
What are other people doing?
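One pattern that maps closely to the thread-local-storage idea is AsyncLocalStorage from Node's built-in async_hooks module (added in Node 13.10, so well after many of these discussions): a per-request store follows the async call chain, and a logger can read request-derived fields from anywhere. A minimal sketch, assuming Express:

const express = require('express');
const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();
const app = express();

// Enter a per-request store; everything called from this request,
// including async callbacks, sees the same store.
app.use(function (req, res, next) {
  als.run({ ip: req.ip, url: req.originalUrl }, next);
});

// A logger callable anywhere in the call tree, with no (req, res) threading.
function logDebug(msg) {
  const ctx = als.getStore() || {};
  console.log('[DEBUG]', ctx.ip || '-', ctx.url || '-', msg);
}

app.get('/widgets', function (req, res) {
  setTimeout(function () {
    logDebug('deep in the call tree'); // still sees this request's context
    res.send('ok');
  }, 10);
});

app.listen(3000);

Deep helpers like logDebug never receive req or res, yet each log line still carries the right remote IP and URL for the request it ran under.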

In Node.JS, any way to make Array.prototype.sort() yield to other processes?

FYI: I am doing this already with Web Workers, and it works fine, but I was just exploring what can and can't be done with process.nextTick.
So I have an array of a million elements that I'm sorting in Node.JS. I want Node to be responsive to other requests while it's doing this.
Is there any way to make Array.prototype.sort() not block other processes? Since this is a core function, I can't insert any process.nextTick().
I could implement quicksort manually, but I can't see how you do that efficiently, in a continuation-passing-style, which seems to be required for process.nextTick(). I can modify a for loop to do this, but sort() seems impossible.
While it's not possible to make Array.prototype.sort itself behave asynchronously, asynchronous sorting is definitely possible, as shown by this sorting demo, which demonstrates the speed advantage of setImmediate (shim) over setTimeout.
The source code does not seem to come with any license, unfortunately. The Github repo for the demo at https://github.com/jphpsf/setImmediate-shim-demo names Jason Weber as the author. You may want to ask him if you want to use (parts of) the code.
I think that if you use setImmediate (available since Node 0.10) the individual sort operations will be effectively interleaved with I/O callbacks. For such a big amount of work, I would not recommend process.nextTick (if it works at all, because there's a 1000 maxTickDepth limit). See setImmediate vs. nextTick for some background.
Using setImmediate instead of plain "synchronous" processing will certainly be slower overall, so you could choose to handle a batch of individual sort operations per "tick" to speed things up, at the expense of Node not being responsive during that time. I think the right balance between speed and responsiveness with respect to I/O can only be found through experimentation.
A much simpler alternative would be to do it more like web workers: spawn a child process and do the sorting there. The biggest problem you face then is transferring the sorted data back to your main process (to generate some kind of output, presumably). AFAIK there's nothing like transferable objects for Node.js. After having buffered the sorted array, you could stream the results to the child process stdout and parse the data in the main process, or, perhaps simpler, use child process messaging.
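A minimal sketch of that child-process idea using the built-in messaging (both file names are made up; note that message passing serializes the array, so for a million elements the transfer itself is part of the cost):

// sort-worker.js - runs in its own process, so the main event loop stays free
process.on('message', function (arr) {
  arr.sort(function (a, b) { return a - b; });
  process.send(arr);
});

// main.js
var fork = require('child_process').fork;
var child = fork(__dirname + '/sort-worker.js');

child.on('message', function (sorted) {
  console.log('sorted; first element:', sorted[0]);
  child.kill();
});

child.send([3, 1, 2]); // in reality, the million-element array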
You may not have a spare CPU core lying around, so the child process would invade some other process's CPU time. To keep the sort process from hurting your other processes, you may need to assign it a low priority. It's seemingly not possible to do this with Node itself, but you could try using nice, as discussed here: https://groups.google.com/forum/#!topic/nodejs/9O-2gLJzmcQ . I have no experience in this matter.
Well, I initially thought you could use async.sortBy, but upon closer examination it seems that won't behave as you need. See Array.sort and Node.js for a similar question, although at the moment there's no accepted answer.
I know this is a rather old question, but I came across a similar situation, with still no simple solution that I could find.
I modified an existing quicksort and published a package that gives up execution to the event loop periodically, here:
https://www.npmjs.com/package/qsort-async
If you are familiar with a traditional quicksort, my only modification was to the initial function, which does the partitioning. Basically the function still modifies the array in place, but now returns a promise. It pauses, letting other things in the event loop run, if it tries to process too many elements in a single iteration. (I believe the default size I specified was 10000.)
Note: It's important to use setImmediate here and not process.nextTick or setTimeout. nextTick will actually place your execution before process IO and you will still have issues responding to other requests tied to IO. setTimeout is just too slow (which I believe one of the other answers linked a demo for).
Note 2: If something like a mergesort is more your style, you could apply the same sort of logic in the 'merge' function of the sort.
const immediate = require('util').promisify(setImmediate);

async function quickSort(items, compare, left, right, size) {
  let index;
  if (items.length > 1) {
    index = partition(items, compare, left, right); // partition() is defined in the full source linked below
    if (Math.abs(left - right) > size) {
      await immediate(); // yield to the event loop between large partitions
    }
    if (left < index - 1) {
      await quickSort(items, compare, left, index - 1, size);
    }
    if (index < right) {
      await quickSort(items, compare, index, right, size);
    }
  }
  return items;
}
The full code is here: https://github.com/Mudrekh/qsort-async/blob/master/lib/quicksort.js
