For some user requests, I need to do some heavy computations (taking about 100ms of time); and, of course, I do not want to perform these in node's main event loop so that it will prevent other requests from serving.
The most obvious, but definitely not the cleanest way to solve this problem would be to offload the computations to another program and wait for the results asynchronously.
Is there any way to do it without leaving the node process (and implementing inter-process communications)? E.g. something like this:
var compute = function (input) {
var i,
result = input;
for (i = 0; i < 1000000; i++) {
result = md5(result);
}
}
var controller = function (req, res) {
offload(compute, req.params.input, function(result) {
res.send(result);
});
}
I found a compute-cluster module, which seems to be designed exactly for my goals. It spawns children processes, and the communication is made with IPC (no sockets wasted).
The JavaScript language has no support for threads and node.js provides no interface to them either so you'll need to rely on a library to provide threading functionality. See the node-fibers and node-threads-a-go-go libraries for possible solutions.
If you're looking for a way to make IPC a little easier in node.js then consider using the zeromq library (node.js bindings). This somewhat-related-answer might be useful as well.
Related
While working on a project for my Express app, I wrote a recursive method that retrieves data from some nested JSON object. Roughly, the method looks like:
# The depth of the fields is up to 3-4 levels, so no stack overflow danger.
_recursiveFindFieldName: function(someJSONStruct, nestedFieldList) {
if (nestedFieldList.length === 0) {
return someJSONStruct;
}
fields = someJSONStruct['fields'];
for (var i=0; i < fields.length; i++) {
subField = fields[i];
if (subField['fieldName'] === nestedFieldList[0]) {
return this._recursiveFindFieldName(subField, nestedFieldList.splice(1));
}
}
return null;
Now, I call this method one my callbacks, by stating
data = _recursiveFindFieldName(someJSON, fieldPathList);. However, a friend who reviewed my code noted that this method, being recursive and iterative over potentially large JSON struct, may block the event loop and prevent Express from serving other requests.
While it does make sense, I am not sure if I should be ever concerned about CPU-synchronous tasks (as opposed to I/O). At least intuitively it does not look very simple.
I have tried to use this source to understand better how the event loop works, and was really surprised to see that the following code crashes my local node REPL.
for (var i = 0; i < 10000000; i++) {
console.log('hi:', i);
}
What I am not sure is why it happens, as opposed to Python (that runs single-thread as well, and easily handles the task of printing), and whether it's relevant to my case, which does not involve I/O operations.
First, measure the performance of your existing code: it probably isn't a bottleneck to begin with.
If the supposed bottleneck is actually valid, you can create an asynchronous Node C++ add-on, which can process the entire JSON blob via uv_queue_work() in a separate thread, outside of the JavaScript event loop, and then return the entire result via back to JavaScript, using a promise.
Is this supported performance bottleneck big enough of a concern to warrant this? Probably not.
As for your console.log() question: in Node, sometimes stdio is synchronous, and sometimes it's not: see this discussion. If you are on a POSIX system, it's synchronous, and you are writing enough data to fill up the pipe and block the event loop, which is all getting jammed in there before the next event tick. I'm not sure of the specifics on why that causes a crash, but hopefully this is a start to answering your question.
This code works because system-sleep blocks execution of the main thread but does not block callbacks. However, I am concerned that system-sleep is not 100% portable because it relies on the deasync npm module which relies on C++.
Are there any alternatives to system-sleep?
var sleep = require('system-sleep')
var done = false
setTimeout(function() {
done = true
}, 1000)
while (!done) {
sleep(100) // without this line the while loop causes problems because it is a spin wait
console.log('sleeping')
}
console.log('If this is displayed then it works!')
PS Ideally, I want a solution that works on Node 4+ but anything is better than nothing.
PPS I know that sleeping is not best practice but I don't care. I'm tired of arguments against sleeping.
Collecting my comments into an answer per your request:
Well, deasync (which sleep() depends on) uses quite a hack. It is a native code node.js add-on that manually runs the event loop from C++ code in order to do what it is doing. Only someone who really knows the internals of node.js (now and in the future) could imagine what the issues are with doing that. What you are asking for is not possible in regular Javascript code without hacking the node.js native code because it's simply counter to the way Javascript was designed to run in node.js.
Understood and thanks. I am trying to write a more reliable deasync (which fails on some platforms) module that doesn't use a hack. Obviously this approach I've given is not the answer. I want it to support Node 4. I'm thinking of using yield / async combined with babel now but I'm not sure that's what I'm after either. I need something that will wait until the callback is callback is resolved and then return the value from the async callback.
All Babel does with async/await is write regular promise.then() code for you. async/await are syntax conveniences. They don't really do anything that you can't write yourself using promises, .then(), .catch() and in some cases Promise.all(). So, yes, if you want to write async/await style code for node 4, then you can use Babel to transpile your code to something that will run on node 4. You can look at the transpiled Babel code when using async/await and you will just find regular promise.then() code.
There is no deasync solution that isn't a hack of the engine because the engine was not designed to do what deasync does.
Javascript in node.js was designed to run one Javascript event at a time and that code runs until it returns control back to the system where the system will then pull the next event from the event queue and run its callback. Your Javascript is single threaded with no pre-emptive interruptions by design.
Without some sort of hack of the JS engine, you can't suspend or sleep one piece of Javascript and then run other events. It simply wasn't designed to do that.
var one = 0;
function delay(){
return new Promise((resolve, reject) => {
setTimeout(function(){
resolve('resolved')
}, 2000);
})
}
while (one == 0) {
one = 1;
async function f1(){
var x = await delay();
if(x == 'resolved'){
x = '';
one = 0;
console.log('resolved');
//all other handlers go here...
//all of the program that you want to be affected by sleep()
f1();
}
}
f1();
}
I am trying out simple NodeJS app so that I could to understand the async nature.
But my problem is as soon as I hit "/home" from browser it waits for response and simultaneously when "/" is hit, it waits for the "/home" 's response first and then responds to "/" request.
My concern is that if one of the request needs heavy processing, in parallel we can't request another one? Is this correct?
app.get("/", function(request, response) {
console.log("/ invoked");
response.writeHead(200, {'Content-Type' : 'text/plain'});
response.write('Logged in! Welcome!');
response.end();
});
app.get("/home", function(request, response) {
console.log("/home invoked");
var obj = {
"fname" : "Dead",
"lname" : "Pool"
}
for (var i = 0; i < 999999999; i++) {
for (var i = 0; i < 2; i++) {
// BS
};
};
response.writeHead(200, {'Content-Type' : 'application/json'});
response.write(JSON.stringify(obj));
response.end();
});
Good question,
Now, although Node.js has it's asynchronous nature, this piece of code:
for (var i = 0; i < 999999999; i++) {
for (var i = 0; i < 2; i++) {
// BS
};
};
Is not asynchronous actually blocking the node main thread. And therefore, all other requests has to wait until this big for loop will end.
In order to do some heavy calculations in parallel I recommend using setTimeout or setInterval to achieve your goal:
var i=0;
var interval = setInterval(function() {
if(i++>=999999999){
clearInterval(interval);
}
//do stuff here
},5);
For more information I recommend searching for "Node.js event loop"
As Stasel, stated, code running like will block the event loop. Basically whenever javascript is running on the server, nothing else is running. Asynchronous I/O events such as disk I/O might be processing in the background, but their handler/callback won't be call unless your synchronous code has finished running. Basically as soon as it's finished, node will check for pending events to be handled and call their handlers respectively.
You actually have couple of choices to fix this problem.
Break the work in pieces and let the pending events be executed in between. This is almost same as Stasel's recommendation, except 5ms between a single iteration is huge. For something like 999999999 items, that takes forever. Firstly I suggest batch process the loop for about sometime, then schedule next batch process with setimmediate. setimmediate basically will schedule it after the pending I/O events are handled, so if there is not new I/O event to be handled(like no new http requests) then it will executed immediately. It's fast enough. Now the question comes that how much processing should we do for each batch/iteration. I suggest first measure how much does it on average manually, and for schedule about 50ms of work. For example if you have realized 1000 items take 100ms. Then let it process 500 items, so it will be 50ms. You can break it down further, but the more broken down, the more time it takes in total. So be careful. Also since you are processing huge amount of items, try not to make too much garbage, so the garbage collector won't block it much. In this not-so-similar question, I've explained how to insert 10000 documents into MongoDB without blocking the event loop.
Use threads. There are actually a couple nice thread implementations that you won't shoot yourself in foot with them. This is really a good idea for this case, if you are looking for performance for huge processings, since it would be tricky as I said above to implement CPU bound task playing nice with other stuff happening in the same process, asynchronous events are perfect for data-bound task not CPU bound tasks. There's nodejs-threads-a-gogo module you can use. You can also use node-webworker-threads which is built on threads-a-gogo, but with webworker API. There's also nPool, which is a bit more nice looking but less popular. They all support thread pools and should be straight forward to implement a work queue.
Make several processes instead of threads. This might be slower than threads, but for huge stuff still way better than iterating in the main process. There's are different ways. Using processes will bring you a design that you can extend it to using multiple machines instead of just using multiple CPUs. You can either use a job-queue(basically pull the next from the queue whenever finished a task to process), a multi process map-reduce or AWS elastic map reduce, or using nodejs cluster module. Using cluster module you can listen to unix domain socket on each worker and for each job just make a request to that socket. Whenever the worker finished processing the job, it will just write back to that particular request. You can search about this stuff, there are many implementations and modules existing already. You can use 0MQ, rabbitMQ, node built-in ipc, unix domain sockets or a redis queue for multi process communications.
Can someone explain/ redirect me, what is the difference between Node.js's async model(non blocking thread) vs any other language for example c#'s asynchronous way of handling the I/O. This looks to me that both are same model. Kindly suggest.
Both models are very similar. There are two primary differences, one of which is going away soon (for some definition of "soon").
One difference is that Node.js is asynchronously single-threaded, while ASP.NET is asynchronously multi-threaded. This means the Node.js code can make some simplifying assumptions, because all your code always runs on the same exact thread. So when your ASP.NET code awaits, it could possibly resume on a different thread, and it's up to you to avoid things like thread-local state.
However, this same difference is also a strength for ASP.NET, because it means async ASP.NET can scale out-of-the-box up to the full capabilities of your sever. If you consider, say, an 8-core machine, then ASP.NET can process (the synchronous portions of) 8 requests simultaneously. If you put Node.js on a souped-up server, then it's common to actually run 8 separate instances of Node.js and add something like nginx or a simple custom load balancer that handles routing requests for that server. This also means that if you want other resources shared server-wide (e.g., cache), then you'll need to move them out-of-proc as well.
The other major difference is actually a difference in language, not platform. JavaScript's asynchronous support is limited to callbacks and promises, and even if you use the best libraries, you'll still end up with really awkward code when you do anything non-trivial. In contrast, the async/await support in C#/VB allow you to write very natural asynchronous code (and more importantly, maintainable asynchronous code).
However, the language difference is going away. The next revision of JavaScript will introduce generators, which (along with a helper library) will make asynchronous code in Node.js just as natural as it is today using async/await. If you want to play with the "coming soon" stuff now, generators were added in V8 3.19, which was rolled into Node.js 0.11.2 (the Unstable branch). Pass --harmony or --harmony-generators to explicitly enable the generator support.
The difference between Node.js's async model and C#'s async/await model is huge. The async model that has Node.js is similar to the old async model in C# and .Net called Event-based Asynchronous Pattern (EAP). C# and .Net has 3 async models, you can read about them at Asynchronous Programming Patterns. The most modern async model in C# is Task-based with C#'s async and await keywords, you can read about it at Task-based Asynchronous Pattern.
The C#'s async/await keywords make asynchronous code linear and let you avoid "Callback Hell" much better than in any of other programming languages. You need just try it, and after that you will never do it in other way. You just write code consuming asynchronous operations and don't worry about readability because it looks like you write any other code.
Please, watch this videos:
Async programming deep dive
Async in ASP.NET
Understanding async and Awaitable Tasks
And please, try to do something asynchronous in both C# and then Node.js to compare. You will see the difference.
EDIT:
Since Node.js V8 JavaScript engine supports generators, defined in ECMAScript 6 Draft, "Callback Hell" in JavaScript code also can be easily avoided. It brings some form of async/await to life in JavaScript
With nodejs, all requests go in the event queue. Node's event loop uses a single thread to process items in the event queue, doing all non-IO work, and sending to C++ threadpool (using javascript callbacks to manage asynchrony) all IO-bound work. The C++ threads then add to the event queue its results.
The differences with ASP.NET (the two first apply pretty much to all web servers that allow async IO) is that :
ASP.NET uses a different thread for each incoming requests, so you get an overhead of context switching
.NET doesn't force you to use async to do IO-bound work, so it isn't as idiomatic as nodejs where IO-bound api calls are de facto async (with callbacks)
.NET' "await-async" add's a step at compile time to add "callbacks", so you can write linear code (no callback function passing), in contrast with nodejs
There are so much places on the web that describe node's architecture, but here's something : http://johanndutoit.net/presentations/2013/02/gdg-capetown-nodejs-workshop-23-feb-2013/index.html#1
The difference between async in Nodejs and .NET is in using preemptive multitasking for user code.
.NET uses preemptive multitasking for user code, and Nodejs does not.
Nodejs uses an internal thread pool for serving IO requests, and a single thread for executing your JS code, including IO callbacks.
One of the consequences of using preemptive multitasking (.NET) is that a shared state can be altered by another stack of execution while executing a stack. That is not the case in Nodejs - no callback from an async operation can run simultaneously with currently executing stack. Another stacks of execution just do not exist in Javascript. A result of an async operation would be available to the callbacks only when current stack of execution exits completely. Having that, simple while(true); hangs Nodejs, because in this case current stack does not exit and the next loop is never initiated.
To understand the difference consider the two examples, one for js an one for net.
var p = new Promise(function(resolve) { setTimeout(resolve, 500, "my content"); });
p.then(function(value) { // ... value === "my content"
In this code, you can safely put a handler (then) after you "started" an async operation, because you can be sure, that no callback code that is initiated by an async operation would ever execute until the entire current call stack exits. The callbacks are handled in next cycles. As for the timer callbacks, they are treated the same. Async timer event justs puts callback processing on queue to be processed in a following cycle.
In .NET it's different. There are no cycles. There is preemptive multitasking.
ThreadPool.QueueUserWorkItem((o)=>{eventSource.Fire();});
eventSource.Fired += ()=>{
// the following line might never execute, because a parallel execution stack in a thread pool could have already been finished by the time the callback added.
Console.WriteLine("1");
}
Here is a Hello World .NET a-la Nodejs code to demonstrate async processing on single thread and using a thread pool for async IO, just like node does.
(.NET includes TPL and IAsyncResult versions of async IO operations, but there's no difference for the purposes of this example. Anyway everything ends up with different threads on a thread pool.)
void Main()
{
// Initializing the test
var filePath = Path.GetTempFileName();
var filePath2 = Path.GetTempFileName();
File.WriteAllText(filePath, "World");
File.WriteAllText(filePath2, "Antipodes");
// Simulate nodejs
var loop = new Loop();
// Initial method code, similar to server.js in Nodejs.
var fs = new FileSystem();
fs.ReadTextFile(loop, filePath, contents=>{
fs.WriteTextFile(loop, filePath, string.Format("Hello, {0}!", contents),
()=>fs.ReadTextFile(loop,filePath,Console.WriteLine));
});
fs.ReadTextFile(loop, filePath2, contents=>{
fs.WriteTextFile(loop, filePath2, string.Format("Hello, {0}!", contents),
()=>fs.ReadTextFile(loop,filePath2,Console.WriteLine));
});
// The first javascript-ish cycle have finished.
// End of a-la nodejs code, but execution have just started.
// First IO operations could have finished already, but not processed by callbacks yet
// Process callbacks
loop.Process();
// Cleanup test
File.Delete(filePath);
File.Delete(filePath2);
}
public class FileSystem
{
public void ReadTextFile(Loop loop, string fileName, Action<string> callback)
{
loop.RegisterOperation();
// simulate async operation with a blocking call on another thread for demo purposes only.
ThreadPool.QueueUserWorkItem(o=>{
Thread.Sleep(new Random().Next(1,100)); // simulate long read time
var contents = File.ReadAllText(fileName);
loop.MakeCallback(()=>{callback(contents);});
});
}
public void WriteTextFile(Loop loop, string fileName, string contents, Action callback)
{
loop.RegisterOperation();
// simulate async operation with a blocking call on another thread for demo purposes only.
ThreadPool.QueueUserWorkItem(o=>{
Thread.Sleep(new Random().Next(1,100)); // simulate long write time
File.WriteAllText(fileName, contents);
loop.MakeCallback(()=>{callback();});
});
}
}
public class Loop
{
public void RegisterOperation()
{
Interlocked.Increment(ref Count);
}
public void MakeCallback(Action clientAction)
{
lock(sync)
{
ActionQueue.Enqueue(()=>{clientAction(); Interlocked.Decrement(ref Count);});
}
}
public void Process()
{
while(Count > 0)
{
Action action = null;
lock(sync)
{
if(ActionQueue.Count > 0)
{
action = ActionQueue.Dequeue();
}
}
if( action!= null )
{
action();
}
else
{
Thread.Sleep(10); // simple way to relax a little bit.
}
}
}
private object sync = new object();
private Int32 Count;
private Queue<Action> ActionQueue = new Queue<Action>();
}
I'm using Redis to generate IDs for my in memory stored models. The Redis client requires a callback to the INCR command, which means the code looks like
client.incr('foo', function(err, id) {
... continue on here
});
The problem is, that I already have written the other part of the app, that expects the incr call to be synchronous and just return the ID, so that I can use it like
var id = client.incr('foo');
The reason why I got to this problem is that up until now, I was generating the IDs just in memory with a simple closure counter function, like
var counter = (function() {
var count = 0;
return function() {
return ++count;
}
})();
to simplify the testing and just general setup.
Does this mean that my app is flawed by design and I need to rewrite it to expect callback on generating IDs? Or is there any simple way to just synchronize the call?
Node.js in its essence is an async I/O library (with plugins). So, by definition, there's no synchronous I/O there and you should rewrite your app.
It is a bit of a pain, but what you have to do is wrap the logic that you had after the counter was generated into a function, and call that from the Redis callback. If you had something like this:
var id = get_synchronous_id();
processIdSomehow(id);
you'll need to do something like this.
var runIdLogic = function(id){
processIdSomehow(id);
}
client.incr('foo', function(err, id) {
runIdLogic(id);
});
You'll need the appropriate error checking, but something like that should work for you.
There are a couple of sequential programming layers for Node (such as TameJS) that might help with what you want, but those generally do recompilation or things like that: you'll have to decide how comfortable you are with that if you want to use them.
#Sergio said this briefly in his answer, but I wanted to write a little more of an expanded answer. node.js is an asynchronous design. It runs in a single thread, which means that in order to remain fast and handle many concurrent operations, all blocking calls must have a callback for their return value to run them asynchronously.
That does not mean that synchronous calls are not possible. They are, and its a concern for how you trust 3rd party plugins. If someone decides to write a call in their plugin that does block, you are at the mercy of that call, where it might even be something that is internal and not exposed in their API. Thus, it can block your entire app. Consider what might happen if Redis took a significant amount of time to return, and then multiple that by the amount of clients that could potentially be accessing that same routine. The entire logic has been serialized and they all wait.
In answer to your last question, you should not work towards accommodating a blocking approach. It may seems like a simple solution now, but its counter-intuitive to the benefits of node.js in the first place. If you are only more comfortable in a synchronous design workflow, you may want to consider another framework that is designed that way (with threads). If you want to stick with node.js, rewrite your existing logic to conform to a callback style. From the code examples I have seen, it tends to look like a nested set of functions, as callback uses callback, etc, until it can return from that recursive stack.
The application state in node.js is normally passed around as an object. What I would do is closer to:
var state = {}
client.incr('foo', function(err, id) {
state.id = id;
doSomethingWithId(state.id);
});
function doSomethingWithId(id) {
// reuse state if necessary
}
It's just a different way of doing things.