Node.js event loop: Safe for-looping and Python difference

While working on a project for my Express app, I wrote a recursive method that retrieves data from some nested JSON object. Roughly, the method looks like:
// The depth of the fields is up to 3-4 levels, so no stack overflow danger.
_recursiveFindFieldName: function(someJSONStruct, nestedFieldList) {
    if (nestedFieldList.length === 0) {
        return someJSONStruct;
    }
    var fields = someJSONStruct['fields'];
    for (var i = 0; i < fields.length; i++) {
        var subField = fields[i];
        if (subField['fieldName'] === nestedFieldList[0]) {
            // slice (not splice) avoids mutating the caller's list
            return this._recursiveFindFieldName(subField, nestedFieldList.slice(1));
        }
    }
    return null;
}
Now, I call this method in one of my callbacks, by stating
data = _recursiveFindFieldName(someJSON, fieldPathList);. However, a friend who reviewed my code noted that this method, being recursive and iterating over a potentially large JSON struct, may block the event loop and prevent Express from serving other requests.
While that does make sense, I am not sure whether I should ever be concerned about CPU-bound synchronous tasks (as opposed to I/O). At least intuitively, the answer does not look simple.
I have tried to use this source to understand better how the event loop works, and was really surprised to see that the following code crashes my local node REPL.
for (var i = 0; i < 10000000; i++) {
console.log('hi:', i);
}
What I am not sure is why it happens, as opposed to Python (that runs single-thread as well, and easily handles the task of printing), and whether it's relevant to my case, which does not involve I/O operations.

First, measure the performance of your existing code: it probably isn't a bottleneck to begin with.
If the supposed bottleneck turns out to be real, you can create an asynchronous Node C++ add-on, which can process the entire JSON blob via uv_queue_work() in a separate thread, outside of the JavaScript event loop, and then return the entire result back to JavaScript using a promise.
Is this supposed performance bottleneck a big enough concern to warrant this? Probably not.
As for your console.log() question: in Node, stdio is sometimes synchronous and sometimes not: see this discussion. If you are on a POSIX system, it's synchronous, and you are writing enough data to fill up the pipe and block the event loop, since it all gets queued before the next event tick. I'm not sure of the specifics of why that causes a crash, but hopefully this is a start to answering your question.
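A lighter-weight alternative to a C++ add-on, assuming the work can be split into pieces, is to chunk the synchronous loop across event-loop ticks with setImmediate(). This is only a sketch with made-up names (processChunked, workFn, CHUNK), not code from the question:

```javascript
// Sketch: process a large array in fixed-size chunks, yielding to the
// event loop between chunks so pending I/O callbacks can run.
var CHUNK = 1000;

function processChunked(items, workFn, done) {
    var results = [];
    var i = 0;
    function next() {
        var end = Math.min(i + CHUNK, items.length);
        for (; i < end; i++) {
            results.push(workFn(items[i]));
        }
        if (i < items.length) {
            setImmediate(next); // yield before the next chunk
        } else {
            done(null, results);
        }
    }
    next();
}
```

Each chunk still runs synchronously, but other requests get a chance to run between chunks; Node's worker_threads module is another option when the work truly cannot be split.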

Related

NodeJS: Reading and writing to a shared object from multiple promises

I'm trying to find out if there could be an issue when accessing an object from multiple promises, e.g.:
let obj = {test: 0}
let promisesArr = []
for (let n = 0; n < 10; n++) {
    promisesArr.push(promiseFunc(obj))
}
Promise.all(promisesArr)
// Then the promise would be something like this
function promiseFunc(obj) {
    return new Promise(async (resolve, reject) => {
        // read from the shared object
        let read = obj.test
        // write/modify the shared object
        obj.test++
        // Do some async op with the read data
        await asyncFunc(read)
        // resolves and gets called again later
        resolve(read)
    })
}
From what I can see and have tested, there would not be an issue: even though processing is asynchronous, there seems to be no race condition. But maybe I am missing something.
The only issue I can see is writing to the object, then doing some I/O op, and then reading while expecting what was written before to still be there.
I'm not modifying the object after other async operations, only at the start, but there are several promises doing the same. Once they resolve they get called again and the cycle starts over.
Race conditions in Javascript with multiple asynchronous operations depend entirely upon the application logic of what exactly you're doing.
Since you don't show any real code in that regard, we can't really say whether you have a race condition liability here or not.
There is no generic answer to your question.
Yes, there can be race conditions among multiple asynchronous operations accessing the same data.
OR
The code can be written appropriately such that no race condition occurs.
It all depends upon the exact and real code and what it is doing. There is no generic answer. I can show you code with two promise-based asynchronous operations that absolutely causes a race condition and I can show you code with two promise-based asynchronous operations that does not cause a race condition. So, susceptibility to race conditions depends on the precise code and what it is doing and precisely how it is written.
Pure and simple access to a value in a shared object does not by itself cause a race condition because the main thread in Javascript is single threaded and non-interruptible so any single synchronous Javascript statement is thread-safe by itself. What matters is what you do with that data and how that code is written.
Here's an example of something that is susceptible to race conditions if there are other operations that can also change shareObj.someProperty:
let localValue = shareObj.someProperty;
let data = await doSomeAsyncOperation();
shareObj.someProperty = localValue + data.someProperty;
Whereas this does not cause a race condition:
let data = await doSomeAsyncOperation();
shareObj.someProperty += data.someProperty;
The second does not cause its own race condition because it updates the shared data atomically. The first gets the value, stores it locally, and then waits for an asynchronous operation to complete, which is an opportunity for other code to modify the shared variable without this local function knowing about it.
FYI, this is very similar to classic database issues. If you get a value from a database (which is always an asynchronous operation in nodejs), then increment it, then write the value back, that's subject to race conditions because others can be trying to read/modify the value at the same time and you can stomp on each other's changes. So, instead, you have to use an atomic operation built into the database to update the variable in one atomic operation.
For your own variables in Javascript, the problem is a little simpler than the generic database issue because reading and writing a Javascript variable is atomic. You just have to make sure you don't hold onto a value across an asynchronous operation and then modify and write it back afterwards.
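The difference can be made concrete with a small sketch (all names here are made up, not from the question): ten tasks increment a shared counter. racyIncrement reads before an await and writes after it, so updates are lost; atomicIncrement does the read-modify-write in one synchronous statement, so none are.

```javascript
// Demo: lost updates across an await vs. an atomic update.
const shared = { count: 0 };
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function racyIncrement() {
    const local = shared.count; // read...
    await delay(1);             // ...other tasks run here...
    shared.count = local + 1;   // ...stale write clobbers their updates
}

async function atomicIncrement() {
    await delay(1);
    shared.count++; // read-modify-write with no await in between
}

async function main() {
    await Promise.all(Array.from({ length: 10 }, racyIncrement));
    console.log('racy:', shared.count);   // 1, not 10: nine updates lost

    shared.count = 0;
    await Promise.all(Array.from({ length: 10 }, atomicIncrement));
    console.log('atomic:', shared.count); // 10
}

main();
```

All ten racy tasks read the counter synchronously before any timer fires, so they all see 0 and all write 1; the atomic version never holds a stale copy across an await.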

Patterns for asynchronous but sequential requests

I have been writing a lot of NodeJS recently and that has forced me to attack some problems from a different perspective. I was wondering what patterns had developed for the problem of processing chunks of data sequentially (rather than in parallel) in an asynchronous request-environment, but I haven't been able to find anything directly relevant.
So to summarize the problem:
I have a list of data stored in an array format that I need to process.
I have to send this data to a service asynchronously, but the service will only accept a few at a time.
The data must be processed sequentially to meet the restrictions on the service, meaning making a number of parallel asynchronous requests is not allowed
Working in this domain, the simplest pattern I've come up with is a recursive one. Something like
function processData(data, start, step, callback){
    if (start < data.length) {
        var chunk = data.slice(start, start + step); // slice, not split: arrays have no split()
        queryService(chunk, start, step, function(e, d){
            //Assume no errors
            //Could possibly do some matching between d and 'data' here to
            //update data with anything that the service may have returned
            processData(data, start + step, step, callback);
        });
    }
    else {
        callback(data);
    }
}
Conceptually, this does step through each chunk, but it's unintuitively complex. I feel like there should be a simpler way of doing this. Does anyone have a pattern they tend to follow when approaching this kind of problem?
My first thought would be to rely on object encapsulation: create an object that contains all of the information about what needs to be processed, plus the relevant data about what has been processed and what is being processed. The callback would just call the object's 'next' function, which in turn starts processing the next piece of data and updates the object. Essentially it works like an asynchronous for-loop.
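With promises, the recursive pattern collapses into a plain loop. Here's a sketch where queryService is assumed to return a promise (it stands in for the asker's actual service):

```javascript
// Sketch: process `data` in chunks of `step`, one service call at a
// time; `await` guarantees each request finishes before the next starts.
async function processData(data, step, queryService) {
    for (let start = 0; start < data.length; start += step) {
        const chunk = data.slice(start, start + step);
        await queryService(chunk); // sequential, never parallel
    }
    return data;
}
```

The loop body suspends at each await, so the requests are strictly sequential even though nothing blocks the event loop.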

In NodeJS: is it possible for two callbacks to be executed exactly at the same time?

Let's say I have this code:
function fn(n) {
    return function() {
        for (var k = 0; k <= 1000; ++k) {
            fs.writeSync(process.stdout.fd, n + "\n");
        }
    }
}

setTimeout(fn(1), 100);
setTimeout(fn(2), 100);
Is it possible that 1 and 2 will be printed to stdout interchangeably (e.g. 12121212121...)?
I've tested this and they did NOT appear interchangeably, i.e. 1111111...222222222..., but a few tests are far from proof and I'm worried that something like 111111211111...2222222... could happen.
In other words: when I register some callbacks and event handlers in Node can two callbacks be executed exactly at the same time?
(I know this could be possible with launching two processes, but then we would have two stdouts and the above code would be split into separate files, etc.)
Another question: Forgetting the Node and speaking generally: in any language on single process is it possible for two functions to be executed at exactly the same time (i.e. in the same manner as above)?
No: every callback will be executed in its own "execution frame". In other languages, parallel execution, and the potential conflicts and locks it causes, is possible when operations run in different threads.
As long as the callback code is purely synchronous, no two functions can execute in parallel.
Start doing something asynchronous inside, like fetching a network result or inserting into a database, and, tadam: you will have parallelism issues.
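A small sketch of that last point (task, tick, and order are made-up names): each synchronous segment runs to completion, but two async functions can interleave at their await points.

```javascript
// Two async tasks: each push is a synchronous, uninterruptible step,
// but every `await` yields, so the tasks take turns.
const order = [];
const tick = () => new Promise(resolve => setImmediate(resolve));

async function task(name) {
    for (let i = 0; i < 3; i++) {
        order.push(name + i); // never interrupted mid-statement
        await tick();         // the other task may run here
    }
}

Promise.all([task('a'), task('b')]).then(() => {
    console.log(order.join(' ')); // 'a' and 'b' entries interleave
});
```

With the awaits removed, each task would run to completion before the other started, which is the purely synchronous case described above.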

How can I handle a callback synchronously in Node.js?

I'm using Redis to generate IDs for my in memory stored models. The Redis client requires a callback to the INCR command, which means the code looks like
client.incr('foo', function(err, id) {
    // ... continue on here
});
The problem is, that I already have written the other part of the app, that expects the incr call to be synchronous and just return the ID, so that I can use it like
var id = client.incr('foo');
The reason why I got to this problem is that up until now, I was generating the IDs just in memory with a simple closure counter function, like
var counter = (function() {
    var count = 0;
    return function() {
        return ++count;
    }
})();
to simplify the testing and just general setup.
Does this mean that my app is flawed by design and I need to rewrite it to expect callback on generating IDs? Or is there any simple way to just synchronize the call?
Node.js at its core is an async I/O library (with plugins). So, by definition, there's no synchronous I/O there, and you should rewrite your app.
It is a bit of a pain, but what you have to do is wrap the logic that you had after the counter was generated into a function, and call that from the Redis callback. If you had something like this:
var id = get_synchronous_id();
processIdSomehow(id);
you'll need to do something like this.
var runIdLogic = function(id){
    processIdSomehow(id);
}

client.incr('foo', function(err, id) {
    runIdLogic(id);
});
You'll need the appropriate error checking, but something like that should work for you.
There are a couple of sequential programming layers for Node (such as TameJS) that might help with what you want, but those generally do recompilation or things like that: you'll have to decide how comfortable you are with that if you want to use them.
@Sergio said this briefly in his answer, but I wanted to write a little more of an expanded answer. node.js has an asynchronous design. It runs in a single thread, which means that in order to remain fast and handle many concurrent operations, all blocking calls must take a callback for their return value so they can run asynchronously.
That does not mean that synchronous calls are not possible. They are, and it's a concern for how much you trust 3rd-party plugins. If someone writes a call in their plugin that blocks, you are at the mercy of that call; it might even be something internal that is not exposed in their API. Thus, it can block your entire app. Consider what would happen if Redis took a significant amount of time to return, and then multiply that by the number of clients that could potentially be accessing that same routine: the entire logic is serialized and they all wait.
In answer to your last question, you should not work towards accommodating a blocking approach. It may seem like a simple solution now, but it runs counter to the benefits of node.js in the first place. If you are only comfortable in a synchronous design workflow, you may want to consider another framework that is designed that way (with threads). If you want to stick with node.js, rewrite your existing logic to conform to a callback style. From the code examples I have seen, it tends to look like a nested set of functions, as a callback uses a callback, and so on, until it can return from that recursive stack.
The application state in node.js is normally passed around as an object. What I would do is closer to:
var state = {}

client.incr('foo', function(err, id) {
    state.id = id;
    doSomethingWithId(state.id);
});

function doSomethingWithId(id) {
    // reuse state if necessary
}
It's just a different way of doing things.
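Another option nowadays is to wrap the callback API in a promise so the caller can await the id. This is a sketch; client is assumed to be the Redis client from the question, and incrAsync is a made-up name:

```javascript
// Sketch: promisify the callback-style incr so callers can `await` it.
function incrAsync(client, key) {
    return new Promise((resolve, reject) => {
        client.incr(key, (err, id) => err ? reject(err) : resolve(id));
    });
}

// usage (inside an async function):
//   const id = await incrAsync(client, 'foo');
```

The call is still asynchronous under the hood; await just lets the surrounding code read top-to-bottom the way the synchronous version did.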

Locking on an object?

I'm very new to Node.js and I'm sure there's an easy answer to this, I just can't find it :(
I'm using the filesystem to hold 'packages' (folders with a status extension, e.g. 'mypackage.idle'). Users can perform actions on these which would cause the status to change to something like 'qa', or 'deploying', etc. If the server is accepting lots of requests and multiple requests come in for the same package, how would I check the status and then perform an action (which would change the status), guaranteeing that another request didn't alter it before/during the action?
so in c# something like this
lock (someLock) { checkStatus(); performAction(); }
Thanks :)
If checkStatus() and performAction() are synchronous functions called one after another, then, as others mentioned earlier, their execution will run uninterrupted to completion.
However, I suspect that in reality both of these functions are asynchronous, and a realistic way of composing them is something like:
function checkStatus(callback){
    doSomeIOStuff(function(something){
        callback(something == ok);
    });
}

checkStatus(function(status){
    if (status == true){
        performAction();
    }
});
The above code is subject to race conditions: while doSomeIOStuff is being performed, instead of waiting for it, a new request can be served.
You may want to check the https://www.npmjs.com/package/rwlock library.
This is a bit misleading. There are many scripting languages that are supposed to be single threaded, but sharing data from the same source still creates a problem. NodeJS might be single threaded while you are running a single request, but when you have multiple requests trying to access the same data, it creates much the same problem as a multithreaded language.
There is already an answer about this here: Locking on an object?
WATCH sentinel_key
GET value_of_interest
if (value_of_interest == FULL)
    MULTI
    SET sentinel_key = foo
    EXEC
    if (EXEC returned 1, i.e. succeeded)
        do_something();
    else
        do_nothing();
else
    UNWATCH
One thing you can do is lock on an external object, for instance, a sequence in a database such as Oracle or Redis.
http://redis.io/commands
For example, I am using cluster with node.js (I have 4 cores), and I have a node.js function that increments a variable each time I run through it. I basically need to lock on that variable so that no two workers use the same value of it.
Check this out: How to create a distributed lock with Redis?
and this: https://engineering.gosquared.com/distributed-locks-using-redis
I think you can run with this idea if you know what you are doing.
If you are making asynchronous calls with callbacks, this means multiple clients could potentially make the same, or related requests, and receive responses in different orders. This is definitely a case where locking is useful. You won't be 'locking a thread' in the traditional sense, but merely ensuring asynchronous calls, and their callbacks are made in a predictable order. The async-lock package looks like it handles this scenario.
https://www.npmjs.com/package/async-lock
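A hand-rolled sketch of what such a lock does: serialize critical sections per key by chaining promises. withLock and chains are made-up names, and async-lock's real API differs; this only illustrates the idea.

```javascript
// Per-key mutex via promise chaining: each acquire waits for the
// previous holder of the same key to finish, so check-then-act
// sequences on one package never overlap.
const chains = new Map();

function withLock(key, fn) {
    const prev = chains.get(key) || Promise.resolve();
    const next = prev.then(fn, fn);        // run after the previous holder
    chains.set(key, next.catch(() => {})); // keep the chain alive on errors
    return next;
}
```

Usage would look like withLock('mypackage', async () => { /* check status, then perform action */ }), with the async callback free to await I/O in the middle without another request for the same package slipping in.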
Warning: node.js can change the semantics here if you add a log entry, because logging is I/O bound.
If you change from
qa_action_performed = false

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true
        perform_action()
    }
}
to
qa_action_performed = false

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        console.log("my log stuff");
        qa_action_performed = true
        perform_action()
    }
}
more than one request may end up executing perform_action().
You don't have to worry about synchronization with Node.js since it's single threaded with an event loop. This is one of the advantages of the architecture that Node.js uses.
Nothing will be executed between checkStatus() and performAction().
There are no locks in node.js -- because you shouldn't need them. There's only one thread (the event loop) and your code is never interrupted unless you perform an asynchronous action like I/O. Hence your code should never block. You can't do any parallel code execution.
That said, your code could look something like this:
qa_action_performed = false

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true
        perform_action()
    }
}
Between check_status() and perform_action() no other request can interrupt, because there is no I/O. As soon as you enter the if clause and set qa_action_performed = true, no other code will enter the if block, and hence perform_action() is never executed twice, even if perform_action() itself takes time performing I/O.
