node.js: How to lock/synchronize a block of code?

Let's take the simple code snippet:
var express = require('express');
var app = express();

var counter = 0;

app.get('/', function (req, res) {
    // LOCK
    counter++;
    // UNLOCK
    res.send('hello world');
});
Let's say that app.get(...) is called a huge number of times, and, as you can understand, I don't want the line counter++ to be executed concurrently by two different threads.
Therefore, I want to lock this line so that only one thread can access it at a time. My question is: how do I do that in node.js?
I know there is a lock package: https://www.npmjs.com/package/locks, but I'm wondering whether there is a "native" way of doing it without an external library.

I don't want the line counter++ to be executed concurrently by the two different threads
That cannot happen in node.js with just regular Javascript coding.
node.js is single threaded and event-driven, so there's only ever one piece of Javascript code running at a time that can access that variable. You do not have to worry about the typical pre-emptive concurrency issues of multi-threaded systems.
That said, you can still have concurrency issues in node.js if you are using asynchronous code because the node.js asynchronous model returns control back to the system to process the next event and the asynchronous callback gets called on some future event. But, the concurrency issues are non-pre-emptive so you fully control when they can occur.
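For example, here is a minimal sketch (using a made-up increment() helper and an artificial setTimeout delay, not code from the question) of the kind of non-pre-emptive "race" that appears once an await point sits between the read and the write of a shared variable:

let counter = 0;

async function increment() {
    const current = counter;                     // read the shared value
    await new Promise(r => setTimeout(r, 10));   // control returns to the event loop here
    counter = current + 1;                       // writes back a stale value if the other call ran in between
}

Promise.all([increment(), increment()]).then(() => {
    console.log(counter); // logs 1, not 2 -- one increment was lost
});

Nothing pre-empted the code; the interleaving point is visible right in the source, which is why you fully control where it can occur.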
If you show us the actual code in your app.get() route handler, then we can advise more specifically about whether you have a concurrency issue there. And, if you do, we can advise on how best to deal with it.
Threads in the thread pool are all native code that runs behind the scenes. They only trigger actual Javascript to run by queuing events through the event queue. So, because all Javascript that runs is serialized through the event queue, you only get one piece of Javascript ever running at a time.

The basic scheme of the event queue is that the interpreter runs a piece of Javascript until it returns control back to the system. At that point, the interpreter looks in the event queue and, if there's an event waiting, it pulls that event out and calls the callback associated with that event. Meanwhile, if there is native code running in the background, when it completes, it adds an event to the event queue. That event is not processed until the current Javascript returns control back to the system and it can then grab the next event out of the event queue. So, it's this event queue that serializes running only one piece of Javascript at a time.
Edit: Nodejs does now have WorkerThreads, which enable separate threads of Javascript, but each thread has its own heap and its own variables, so a variable from one thread cannot be directly accessed from another thread. You can configure shared memory that both WorkerThreads can access, but that isn't ordinary variables; it's a block of memory, and if you want to use shared memory, then you do indeed need to code your own synchronization to make sure you are accessing the variable atomically. The code you show in your question is not using any of this, so access to the counter variable is already atomic and cannot be simultaneously accessed by any other Javascript, even if you are using WorkerThreads.
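To illustrate that last point, here is a minimal sketch (not part of the original answer) of sharing a counter between WorkerThreads through a SharedArrayBuffer and incrementing it with Atomics; it assumes Node.js 12+ where worker_threads is available without a flag:

const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
    const shared = new Int32Array(new SharedArrayBuffer(4)); // one 32-bit counter in shared memory
    const workers = [];
    for (let i = 0; i < 4; i++) {
        workers.push(new Worker(__filename, { workerData: shared }));
    }
    Promise.all(workers.map(w => new Promise(res => w.on('exit', res))))
        .then(() => console.log('final count:', Atomics.load(shared, 0))); // 4000
} else {
    const shared = workerData;
    for (let i = 0; i < 1000; i++) {
        Atomics.add(shared, 0, 1); // atomic increment; a plain shared[0]++ could lose updates
    }
}

With plain single-threaded code like the snippet in the question, none of this machinery is needed.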

If you block the thread, none of the requests will execute; they will all sit waiting in the queue.
It's not good practice to block the thread in Node.js.
var express = require('express');
var app = express();

var counter = 0;

const getPromise = () => {
    return new Promise((resolve) => {
        setTimeout(() => {
            resolve('Done');
        }, 100);
    });
};

app.get('/', async (req, res) => {
    const localCounter = counter++;
    // Use the local counter for the rest of the operation so the value won't vary
    // "LOCK": use a promise/callback
    await getPromise(); // Not locked, but waiting for getPromise to finish
    console.log(localCounter); // Same value as before the await
    res.send('hello world');
});

Node.js is single-threaded, which means that any single process running your app will not have the data races you anticipate. In fact, a quick inspection of the locks library shows that it uses a boolean flag and a system of Array objects to determine whether something is locked or not.
You should only really worry about this if you plan on sharing data across multiple processes. In that case, you could use Alan's lockfile approach from this stackoverflow thread here.
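If the processes are cluster workers of the same app, another option (a sketch, not the lockfile approach mentioned above) is to keep the counter in the primary process only and have the workers ask it to do the increment over IPC, so no two processes ever touch the variable directly:

const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
    let counter = 0;
    for (let i = 0; i < 2; i++) cluster.fork();
    cluster.on('message', (worker, msg) => {
        if (msg === 'increment') {
            counter++;                    // only the master ever mutates the counter
            worker.send({ counter });
        }
    });
} else {
    const pending = [];                   // responses waiting for a reply from the master
    process.on('message', ({ counter }) => pending.shift()(counter));
    http.createServer((req, res) => {
        pending.push(count => res.end('count: ' + count));
        process.send('increment');
    }).listen(8080);
}

This relies on Node's IPC channel delivering messages in order; for data shared with unrelated processes, a lockfile or an external store is the more general answer.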

Related

Why does NodeJS spawn parallel threads/processes to execute an expensive for loop?

I've been testing some code to see how the NodeJS event loop actually works. So I came across this piece of code:
console.time('Time spending');

let list = [];
for (let index = 0; index < 1000000; index++) {
    const data = JSON.stringify({
        id: Date.now(),
        index,
    });
    list.push(data);
}

console.log(list);
console.timeEnd('Time spending');
When this code is executed, NodeJS spawns eleven threads on the OS (Ubuntu running on WSL 2). But why does it do that?
This code is not declared as async code.
That's the worker pool. As mentioned in the great guide Don't block the event loop, Node.js has an "event loop" and a worker pool. Those threads you see are the worker pool, and the size is defined with the environment variable UV_THREADPOOL_SIZE, from libuv, which Node.js uses internally. The reason node.js spawns those threads has nothing to do with your expensive loop, it's just the default behavior at startup.
There's extensive documentation on how the event loop works on the official Node.js site, but essentially some operations, like filesystem I/O, are synchronous because the underlying operating system does not offer an asynchronous interface (or it's too new/experimental). Node.js works around that with the thread pool: the event loop submits a task, like reading a file (normally a synchronous job), and moves on to the next event while a worker thread does the dirty work of actually reading the file. That may block the worker thread, but it doesn't matter, because the event loop is not blocked. When the work is done, the result comes back to the event loop with the data. So, from the event loop's (and the programmer's) point of view, the synchronous read was done asynchronously.
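A small sketch to make the worker pool visible (crypto.pbkdf2 is one of the built-ins that is offloaded to it): the four hashing jobs run on pool threads roughly in parallel while the JavaScript thread stays free, and running the script with, say, UV_THREADPOOL_SIZE=2 makes them finish in pairs instead.

const crypto = require('crypto');

const start = Date.now();
for (let i = 0; i < 4; i++) {
    crypto.pbkdf2('secret', 'salt', 1000000, 64, 'sha512', () => {
        console.log('task', i, 'done after', Date.now() - start, 'ms'); // roughly the same time for all four
    });
}
console.log('main thread is not blocked'); // prints immediately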
There are no parallel threads being used to run your code. Nodejs runs all the code you show in just one thread. You could just do this:
setTimeout(() => {
    console.log("done with timeout");
}, 10 * 60 * 1000);
And, you would see the same number of threads. What you are seeing has nothing to do with your specific code.
The other threads you see are just threads that nodejs uses for its own internal purposes, such as the worker pool for disk I/O, asynchronous crypto, some other built-in operations and other internal housekeeping.
Also, Javascript code marked as async still runs in the one main Javascript thread, so your point that none of this code is declared async wouldn't change things either. It doesn't matter (from a thread point of view) whether code is async or not.
Your big for loop blocks the entire event loop, so no other Javascript code or events can run until your for loop finishes. There's not really much to learn about the event loop from this code except that your loop blocks the event loop until it completes.
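A small sketch (not from the original answer) that makes the blocking visible: the timer below is due after 0 ms, but its callback cannot run until the loop has finished, because the loop never returns control to the event loop.

const start = Date.now();
setTimeout(() => console.log('timer fired after', Date.now() - start, 'ms'), 0);

const list = [];
for (let index = 0; index < 1000000; index++) {
    list.push(JSON.stringify({ id: Date.now(), index }));
}
console.log('loop done after', Date.now() - start, 'ms'); // always prints first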

Correct way to run synchronous code in node.js without blocking

I have a websocket server in node.js which allows users to solve a given puzzle.
I also have code that generates a random puzzle, which takes about 20 seconds. In the meantime I still want to handle new connections/disconnects, but this synchronous code blocks the event loop.
Here's the simplified code:
io.on('connection', socket => {
    // ...
});

io.listen(port);

setInterval(function () {
    if (game.paused)
        game.loadRound();
}, 1000);
loadRound runs for about 20 seconds, which blocks all connections and the setInterval itself.
What would be the correct way to run this code without blocking the event loop?
You have three basic choices:
Redesign loadRound() so that it doesn't block the event loop. Since you've shared none of the code for it, we can't advise on the feasibility of that, but if it's doing any I/O, then it does not need to block the event loop. Even if it's all just CPU work, it could be designed to do its job in small chunks to allow the event loop some cycles, but often that's more work to redesign it that way than options 2 and 3 below.
Move loadRound() to a Worker thread (the relatively new worker_threads module) and communicate the result back via messaging; see the sketch after this list.
Move loadRound() to a separate node.js process using the child_process module and communicate the result back via any number of means (stdio, messaging, etc...).
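Here is a minimal sketch of option 2, assuming the round-generation code can be moved into its own file (round-worker.js is a hypothetical name, and, as in the question, game.paused is assumed to be cleared once a round is loaded). The main thread stays free to accept connections while the round is generated:

const { Worker } = require('worker_threads');

function loadRoundAsync() {
    return new Promise((resolve, reject) => {
        const worker = new Worker(__dirname + '/round-worker.js');
        worker.once('message', resolve);   // the generated round
        worker.once('error', reject);
    });
}

setInterval(async () => {
    if (game.paused) {
        game.round = await loadRoundAsync(); // ~20 s of CPU work, off the main thread
    }
}, 1000);

// round-worker.js (hypothetical):
// const { parentPort } = require('worker_threads');
// parentPort.postMessage(generateRound()); // the existing CPU-heavy code goes here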

node.js performance optimization and single threaded architecture

I'm running a Node.js app with express and want to start increasing its performance. Several routes are defined. Let's take a basic example:
app.get('/users', function (req, res) {
    User.find({}).exec(function (err, users) {
        res.json(users);
    });
});
Let's assume we have 3 clients A, B and C, who try to use this route. Their requests arrive on the server in the order A, B, C with 1 millisecond difference in between.
1. If I understand the node.js architecture correctly, every request will be handled immediately, because User.find() is asynchronous and the code is non-blocking?
Let's expand this example with a synchronous call:
app.get('/users', function (req, res) {
    var parameters = getUserParameters();
    User.find({parameters}).exec(function (err, users) {
        res.json(users);
    });
});
Same requests, same order. getUserParameters() takes 50 milliseconds to complete.
2. A will enter the route callback function and block the node.js thread for 50 milliseconds. B and C won't be able to enter the function and have to wait. When A finishes getUserParameters(), it will continue with the asynchronous User.find() function, and B will now enter the route callback function. C will still have to wait 50 more milliseconds. When B reaches the asynchronous function, C's request can finally be handled. Taken together: C has to wait 50 milliseconds for A to finish, 50 milliseconds for B to finish and 50 milliseconds for itself to finish (for simplicity, we ignore the waiting time for the asynchronous function)?
Assuming now, that we have one more route, which is only accessible by an admin and will be called every minute through crontab.
app.get('/users', function (req, res) {
    User.find({}).exec(function (err, users) {
        res.json(users);
    });
});

app.get('/admin-route', function (req, res) {
    blockingFunction(); // this function takes 2 seconds to complete
});
3. When a request X hits /admin-route and blockingFunction() is called, will A, B and C, who call /users right after X's request, have to wait 2 seconds before they even enter the route callback function?
4. Should we make every self-defined function, even if it takes only 4 milliseconds, an asynchronous function with a callback?
The answer is "Yes", on #3: blocking means blocking the event loop, meaning that any I/O (like handling an HTTP request) will be blocked. In this case, the app will seem unresponsive for those 2 seconds.
However, you have to do pretty wild things for synchronous code to take 2 seconds (either very heavy calculations, or using a lot of the *Sync() methods provided by modules like fs). If you really can't make that code asynchronous, you should consider running it in a separate process.
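A sketch of that separate-process idea using child_process.fork; heavy-job.js is a hypothetical file that contains the 2-second synchronous work, so the web process's event loop stays responsive:

const { fork } = require('child_process');

app.get('/admin-route', (req, res) => {
    const child = fork(__dirname + '/heavy-job.js');
    child.once('message', result => res.json(result)); // sent when the job is done
    child.once('error', () => res.sendStatus(500));
});

// heavy-job.js (hypothetical):
// const result = blockingFunction(); // the 2-second synchronous work
// process.send(result);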
Regarding #4: if you can easily make it asynchronous, you probably should. However, just having your synchronous function accept a callback doesn't miraculously make it asynchronous. Whether, and how, you can make it async depends on what the function does.
The ground principle is that anything locking up the CPU (long-running for loops, for instance) or anything using I/O or the network must be handled asynchronously. You could also consider moving CPU-intensive logic out of Node.js, perhaps into a Java/Python module that exposes a web service which Node.js can call.
As an aside, take a look at this module (might not be production-ready). It introduces the concept of multithreading in NodeJS: https://www.npmjs.com/package/webworker-threads
#3 Yes
#4 Node.js is built for async programming, and hence it's good to follow this approach to avoid surprises in performance.
Meanwhile, you can use the cluster module of Node.js to improve the performance and throughput of your app.
You may need to scale your application vertically first. Check out the Node.js cluster module. You may utilize all the cores of the machine by spawning a worker on each core. A cluster is a pool of similar workers running under a parent Node process. Workers are spawned using the fork() method of the child_process module. This means workers can share server handles and use inter-process communication to communicate with the parent Node process.
var cluster = require('cluster');
var http = require('http');
var os = require('os');

var numCPUs = os.cpus().length;

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    // Define express routes and listen here
}

Node.js vs Async/await in .net

Can someone explain (or point me to an explanation of) the difference between Node.js's async model (non-blocking thread) and, for example, C#'s asynchronous way of handling I/O? They look like the same model to me. Kindly suggest.
Both models are very similar. There are two primary differences, one of which is going away soon (for some definition of "soon").
One difference is that Node.js is asynchronously single-threaded, while ASP.NET is asynchronously multi-threaded. This means the Node.js code can make some simplifying assumptions, because all your code always runs on the same exact thread. So when your ASP.NET code awaits, it could possibly resume on a different thread, and it's up to you to avoid things like thread-local state.
However, this same difference is also a strength for ASP.NET, because it means async ASP.NET can scale out-of-the-box up to the full capabilities of your server. If you consider, say, an 8-core machine, then ASP.NET can process (the synchronous portions of) 8 requests simultaneously. If you put Node.js on a souped-up server, then it's common to actually run 8 separate instances of Node.js and add something like nginx or a simple custom load balancer that handles routing requests for that server. This also means that if you want other resources shared server-wide (e.g., cache), then you'll need to move them out-of-proc as well.
The other major difference is actually a difference in language, not platform. JavaScript's asynchronous support is limited to callbacks and promises, and even if you use the best libraries, you'll still end up with really awkward code when you do anything non-trivial. In contrast, the async/await support in C#/VB allows you to write very natural asynchronous code (and, more importantly, maintainable asynchronous code).
However, the language difference is going away. The next revision of JavaScript will introduce generators, which (along with a helper library) will make asynchronous code in Node.js just as natural as it is today using async/await. If you want to play with the "coming soon" stuff now, generators were added in V8 3.19, which was rolled into Node.js 0.11.2 (the Unstable branch). Pass --harmony or --harmony-generators to explicitly enable the generator support.
The difference between Node.js's async model and C#'s async/await model is huge. The async model that Node.js has is similar to the old async model in C# and .NET called the Event-based Asynchronous Pattern (EAP). C# and .NET have 3 async models; you can read about them at Asynchronous Programming Patterns. The most modern async model in C# is task-based, with C#'s async and await keywords; you can read about it at Task-based Asynchronous Pattern.
C#'s async/await keywords make asynchronous code linear and let you avoid "callback hell" far better than most other programming languages. You just need to try it; after that you will never want to do it any other way. You simply write code that consumes asynchronous operations without worrying about readability, because it reads like any other code.
Please watch these videos:
Async programming deep dive
Async in ASP.NET
Understanding async and Awaitable Tasks
And please, try to do something asynchronous in both C# and Node.js to compare. You will see the difference.
EDIT:
Since Node.js's V8 JavaScript engine supports generators, defined in the ECMAScript 6 draft, "callback hell" in JavaScript code can also be easily avoided. They bring some form of async/await to life in JavaScript.
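As a rough sketch of what that generator approach looks like (assuming a Node.js version with native Promises and generators; error handling omitted), a tiny runner can drive a generator so asynchronous steps read linearly:

function run(genFn) {
    var gen = genFn();
    function step(value) {
        var result = gen.next(value);
        if (!result.done) {
            Promise.resolve(result.value).then(step); // resume the generator when the promise settles
        }
    }
    step();
}

function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

run(function* () {
    console.log('start');
    yield delay(500);   // reads like synchronous code, but the event loop stays free
    console.log('half a second later');
    yield delay(500);
    console.log('one second later');
});

Modern Node.js versions ship async/await natively, which makes this pattern built in.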
With nodejs, all requests go into the event queue. Node's event loop uses a single thread to process items in the event queue, doing all non-IO work itself and sending all IO-bound work to a C++ thread pool (using JavaScript callbacks to manage asynchrony). The C++ threads then add their results to the event queue.
The differences with ASP.NET (the first two apply pretty much to all web servers that allow async IO) are that:
ASP.NET uses a different thread for each incoming request, so you get the overhead of context switching
.NET doesn't force you to use async to do IO-bound work, so it isn't as idiomatic as nodejs where IO-bound api calls are de facto async (with callbacks)
.NET' "await-async" add's a step at compile time to add "callbacks", so you can write linear code (no callback function passing), in contrast with nodejs
There are so much places on the web that describe node's architecture, but here's something : http://johanndutoit.net/presentations/2013/02/gdg-capetown-nodejs-workshop-23-feb-2013/index.html#1
The difference between async in Nodejs and .NET is in using preemptive multitasking for user code.
.NET uses preemptive multitasking for user code, and Nodejs does not.
Nodejs uses an internal thread pool for serving IO requests, and a single thread for executing your JS code, including IO callbacks.
One of the consequences of using preemptive multitasking (.NET) is that shared state can be altered by another stack of execution while a stack is executing. That is not the case in Nodejs: no callback from an async operation can run simultaneously with the currently executing stack. Other stacks of execution simply do not exist in Javascript. The result of an async operation becomes available to the callbacks only once the current stack of execution has exited completely. Because of that, a simple while(true); hangs Nodejs: the current stack never exits, and the next cycle is never initiated.
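That last point in two lines (a small illustrative sketch, not from the original answer):

Promise.resolve('done').then(function (msg) { console.log(msg); }); // never prints
while (true) {} // the current stack never exits, so the event loop never gets control back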
To understand the difference, consider the two examples below, one for JS and one for .NET.
var p = new Promise(function (resolve) { setTimeout(resolve, 500, "my content"); });
p.then(function (value) {
    // ... value === "my content"
});
In this code, you can safely attach a handler (then) after you have "started" an async operation, because you can be sure that no callback code initiated by an async operation will ever execute until the entire current call stack exits. The callbacks are handled in later cycles. Timer callbacks are treated the same way: an async timer event just puts the callback on the queue, to be processed in a following cycle.
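As a tiny aside (not one of the two examples), the ordering guarantee is easy to observe:

var p2 = new Promise(function (resolve) { setTimeout(resolve, 0, "my content"); });
p2.then(function (value) { console.log("callback:", value); });
console.log("current stack is still running");
// Output: "current stack is still running" first, then "callback: my content"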
In .NET it's different. There are no cycles. There is preemptive multitasking.
ThreadPool.QueueUserWorkItem((o) => { eventSource.Fire(); });

eventSource.Fired += () => {
    // The following line might never execute, because the parallel execution stack
    // in the thread pool could already have fired the event by the time this
    // handler is added.
    Console.WriteLine("1");
};
Here is a "Hello World" .NET example, a la Nodejs, that demonstrates async processing on a single thread while using a thread pool for async IO, just like node does.
(.NET includes TPL and IAsyncResult versions of async IO operations, but there's no difference for the purposes of this example. Anyway, everything ends up on different threads in a thread pool.)
void Main()
{
    // Initializing the test
    var filePath = Path.GetTempFileName();
    var filePath2 = Path.GetTempFileName();
    File.WriteAllText(filePath, "World");
    File.WriteAllText(filePath2, "Antipodes");

    // Simulate nodejs
    var loop = new Loop();

    // Initial method code, similar to server.js in Nodejs.
    var fs = new FileSystem();

    fs.ReadTextFile(loop, filePath, contents => {
        fs.WriteTextFile(loop, filePath, string.Format("Hello, {0}!", contents),
            () => fs.ReadTextFile(loop, filePath, Console.WriteLine));
    });

    fs.ReadTextFile(loop, filePath2, contents => {
        fs.WriteTextFile(loop, filePath2, string.Format("Hello, {0}!", contents),
            () => fs.ReadTextFile(loop, filePath2, Console.WriteLine));
    });

    // The first javascript-ish cycle has finished.
    // End of the a-la-nodejs code, but execution has only just started.
    // The first IO operations could have finished already, but their callbacks have not been processed yet.

    // Process callbacks
    loop.Process();

    // Cleanup test
    File.Delete(filePath);
    File.Delete(filePath2);
}

public class FileSystem
{
    public void ReadTextFile(Loop loop, string fileName, Action<string> callback)
    {
        loop.RegisterOperation();
        // Simulate an async operation with a blocking call on another thread, for demo purposes only.
        ThreadPool.QueueUserWorkItem(o => {
            Thread.Sleep(new Random().Next(1, 100)); // simulate long read time
            var contents = File.ReadAllText(fileName);
            loop.MakeCallback(() => { callback(contents); });
        });
    }

    public void WriteTextFile(Loop loop, string fileName, string contents, Action callback)
    {
        loop.RegisterOperation();
        // Simulate an async operation with a blocking call on another thread, for demo purposes only.
        ThreadPool.QueueUserWorkItem(o => {
            Thread.Sleep(new Random().Next(1, 100)); // simulate long write time
            File.WriteAllText(fileName, contents);
            loop.MakeCallback(() => { callback(); });
        });
    }
}

public class Loop
{
    public void RegisterOperation()
    {
        Interlocked.Increment(ref Count);
    }

    public void MakeCallback(Action clientAction)
    {
        lock (sync)
        {
            ActionQueue.Enqueue(() => { clientAction(); Interlocked.Decrement(ref Count); });
        }
    }

    public void Process()
    {
        while (Count > 0)
        {
            Action action = null;
            lock (sync)
            {
                if (ActionQueue.Count > 0)
                {
                    action = ActionQueue.Dequeue();
                }
            }
            if (action != null)
            {
                action();
            }
            else
            {
                Thread.Sleep(10); // simple way to relax a little bit
            }
        }
    }

    private object sync = new object();
    private Int32 Count;
    private Queue<Action> ActionQueue = new Queue<Action>();
}

NodeJs how to create a non-blocking computation

I am trying to get my head around creating a non-blocking piece of heavy computation in nodejs. Take this example (stripped out of other stuff):
http.createServer(function (req, res) {
    console.log(req.url);
    sleep(10000);
    res.end('Hello World');
}).listen(8080, function () { console.log("ready"); });
As you can imagine, if I open 2 browser windows at the same time, the first will wait 10 seconds and the other will wait 20, as expected. So, armed with the knowledge that a callback is somehow asynchronous I removed the sleep and put this instead:
doHeavyStuff(function () {
    res.end('Hello World');
});
with the function simply defined:
function doHeavyStuff(callback) {
    sleep(10000);
    callback();
}
That of course does not work... I have also tried defining an EventEmitter and registering on it, but the main function of the emitter has the sleep inside before emitting 'done', so again everything still blocks.
I am wondering how other people write non-blocking code... for example, the mongojs module or child_process.exec are non-blocking, which means that somewhere down in the code they either fork a process or run the work elsewhere and listen for its events. How can I replicate this in a method that, for example, has a long-running computation going?
Am I completely misunderstanding the nodejs paradigm? :/
Thanks!
Update: solution (sort of)
Thanks to Linus for the answer; indeed, the only way is to spawn a child process, for example another node script:
var exec = require('child_process').exec;

http.createServer(function (req, res) {
    console.log(req.url);
    var child = exec('node calculate.js', function (err, strout, strerr) {
        console.log("fatto");
        res.end(strout);
    });
}).listen(8080, function () { console.log("ready"); });
calculate.js can take its time to do what it needs and then return. In this way, multiple requests will run in parallel, so to speak.
You can't do that directly, without using some of the IO modules in node (such as fs or net). If you need to do a long-running computation, I suggest you do that in a child process (e.g. child_process.fork) or with a queue.
We (Microsoft) just released napajs that can work with Node.js to enable multithreading JavaScript scenarios in the same process.
your code will then look like:
var napa = require('napajs');

// One-time setup.
// You can change the number of workers per your requirements.
var zone = napa.zone.create('request-worker-pool', { workers: 4 });

http.createServer(function (req, res) {
    console.log(req.url);
    zone.execute((request) => {
        var result = null;
        // Do heavy computation to get the result from the request
        // ...
        return result;
    }, [req]).then((result) => {
        res.end(result.value);
    });
}).listen(8080, function () { console.log("ready"); });
You can read this post for more details.
This is a classic misunderstanding of how the event loop is working.
This isn't something that is unique to node: if you have a long-running computation in a browser, it will also block. The way to handle this is to break the computation up into small chunks that yield execution back to the event loop, allowing the JS environment to interleave other pending work; there is still only ever one thing happening at a time.
The setImmediate demo, which you can find here, may be instructive.
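A minimal sketch of that chunking idea (computeInChunks is a made-up helper name): do a slice of work, then hand control back to the event loop with setImmediate so pending requests can be served in between.

function computeInChunks(totalItems, chunkSize, onDone) {
    var processed = 0;
    function doChunk() {
        var end = Math.min(processed + chunkSize, totalItems);
        for (; processed < end; processed++) {
            // ... one unit of the heavy computation ...
        }
        if (processed < totalItems) {
            setImmediate(doChunk); // let the event loop breathe between chunks
        } else {
            onDone();
        }
    }
    doChunk();
}

computeInChunks(10000000, 10000, function () {
    console.log('done without ever blocking for long');
});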
If your computation can be split into chunks, you could schedule an executor to process a chunk every N seconds and pick the work up again after M seconds. Or spawn a dedicated child process for that task alone, so that the main thread won't block.
Although this is an old post (8 years old), let me try to add some updates to it.
For a Node.js application to get good performance, the first priority is to never block the event loop. The sleep(10000) call breaks this rule. This is also the reason why Node.js is not suitable for CPU-intensive applications: big CPU computations run on the event loop thread (which is also the main, single JavaScript thread of Node.js) and will block it.
Multithreaded programming with worker_threads was introduced into the Node.js ecosystem in version 12. Compared with multi-process programming, it's lightweight and has less overhead.
Although multithreading was introduced, Node.js is still based on the event-driven model and async non-blocking IO. That's Node.js's DNA.
