NodeJS: Reading and writing to a shared object from multiple promises - node.js

I'm trying to find out if there could be an issue when accessing an object from multiple promises, eg:
let obj = { test: 0 };
let promisesArr = [];
for (let n = 0; n < 10; n++) {
    promisesArr.push(promiseFunc(obj));
}
Promise.all(promisesArr);
// Then the promise would be something like this
function promiseFunc(obj) {
    return new Promise(async (resolve, reject) => {
        // read from the shared object
        let read = obj.test;
        // write/modify the shared object
        obj.test++;
        // Do some async op with the read data
        await asyncFunc(read);
        // resolves and gets called again later
        resolve();
    });
}
From what I can see and have tested, there would not be an issue: even though processing is asynchronous, there seems to be no race condition. But maybe I'm missing something.
The only issue I can see is writing to the object, then doing some I/O operation, and then reading while expecting what was written before to still be there.
I'm not modifying the object after other async operations, only at the start, but there are several promises doing the same thing. Once they resolve they get called again and the cycle starts over.

Race conditions in Javascript with multiple asynchronous operations depend entirely upon the application logic of what exactly you're doing.
Since you don't show any real code in that regard, we can't really say whether you have a race condition liability here or not.
There is no generic answer to your question.
Yes, there can be race conditions among multiple asynchronous operations accessing the same data.
OR
The code can be written appropriately such that no race condition occurs.
It all depends upon the exact and real code and what it is doing. There is no generic answer. I can show you code with two promise-based asynchronous operations that absolutely causes a race condition and I can show you code with two promise-based asynchronous operations that does not cause a race condition. So, susceptibility to race conditions depends on the precise code and what it is doing and precisely how it is written.
Pure and simple access to a value in a shared object does not by itself cause a race condition because the main thread in Javascript is single threaded and non-interruptible so any single synchronous Javascript statement is thread-safe by itself. What matters is what you do with that data and how that code is written.
Here's an example of something that is susceptible to race conditions if there are other operations that can also change shareObj.someProperty:
let localValue = shareObj.someProperty;
let data = await doSomeAsyncOperation();
shareObj.someProperty = localValue + data.someProperty;
Whereas, this does not cause a race condition:
let data = await doSomeAsyncOperation();
shareObj.someProperty += data.someProperty;
The second is not causing its own race condition because it is atomically updating the shared data. Whereas the first was getting it, storing it locally, then waiting for an asynchronous operation to complete which is an opportunity for other code to modify the shared variable without this local function knowing about it.
FYI, this is very similar to classic database issues. If you get a value from a database (which is always an asynchronous operation in nodejs), then increment it, then write the value back, that's subject to race conditions because others can be trying to read/modify the value at the same time and you can stomp on each other's changes. So, instead, you have to use an atomic operation built into the database to update the variable in one atomic operation.
For your own variables in Javascript, the problem is a little bit simpler than the generic database issue because reading and writing to a Javascript variable is atomic. You just have to make sure you don't hold onto a value across an asynchronous operation and then modify it and write it back afterward.
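To make the distinction concrete, here is a minimal runnable sketch (the names racyIncrement and atomicIncrement are mine, not from the answer above). Two workers each try to add 1 to a shared counter: the racy version reads the value, awaits, then writes back the stale copy, so one increment is lost; the atomic version does the read-modify-write in one synchronous statement after the await, so both increments survive.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function racyIncrement(shared) {
  const local = shared.count;   // read before the async gap
  await delay(10);              // other code can run here
  shared.count = local + 1;     // writes back a possibly stale value
}

async function atomicIncrement(shared) {
  await delay(10);
  shared.count += 1;            // single synchronous read-modify-write
}

async function main() {
  const a = { count: 0 };
  await Promise.all([racyIncrement(a), racyIncrement(a)]);
  console.log('racy:', a.count);    // 1 — one update was lost

  const b = { count: 0 };
  await Promise.all([atomicIncrement(b), atomicIncrement(b)]);
  console.log('atomic:', b.count);  // 2 — both updates applied
}

main();
```

Both workers in the racy version read 0 before either writes, which is exactly the "hold a value across an await" hazard the answer describes.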

Related

If I use `.push();` to create a new empty Firebase realtime DB datum, do I have to resolve the promise before writing to it?

I inherited some NodeJs code which repeatedly uses .push(); to create a new empty array (node/leaf/whatever the FB terminology is - but, it's an array/list).
Immediately afterwards the code loops and starts writing entries to it.
On the one hand I am nervous because it looks like a code smell, and the quality of the code is noticeably amateurish even to me - and I am primarily an embedded C++ guy.
The code properly treats reads as async and waits for the promise to resolve:
admin
.database()
.ref(url)
.once("value", (snapshot) => { ...
but, the pushes do not wait.
My big fear is that the request to write an array member could arrive at FB before the array has been created, and timing bugs like that are the absolute worst to debug.
On the other hand, this is done so consistently that the original coder may know something that I do not.
Does the code have a problem?
Seeing the actual code that you're asking about would be useful, but my guess is that you have something like:
newRef = rootRef.push();
newRef.set('some value');
That first line does not actually write anything to the database yet. In fact, it is a purely client-side operation that just creates a new unique ID and returns a reference to that. So there's no need to await the call in that first line as it is synchronous.
The second line does perform a write operation to the database, is asynchronous, and thus you'll need to await it if you want the next operation to start after the write completes.
The two lines could be combined into:
rootRef.push('some value');
Now the push() operation does perform the write to the database and if you want to perform another operation after the write, you would have to await it.

Node js event loop: Safe for-looping and Python difference

While working on a project for my Express app, I wrote a recursive method that retrieves data from some nested JSON object. Roughly, the method looks like:
// The depth of the fields is up to 3-4 levels, so no stack overflow danger.
_recursiveFindFieldName: function(someJSONStruct, nestedFieldList) {
    if (nestedFieldList.length === 0) {
        return someJSONStruct;
    }
    var fields = someJSONStruct['fields'];
    for (var i = 0; i < fields.length; i++) {
        var subField = fields[i];
        if (subField['fieldName'] === nestedFieldList[0]) {
            return this._recursiveFindFieldName(subField, nestedFieldList.splice(1));
        }
    }
    return null;
}
Now, I call this method in one of my callbacks, by stating
data = _recursiveFindFieldName(someJSON, fieldPathList);. However, a friend who reviewed my code noted that this method, being recursive and iterative over potentially large JSON struct, may block the event loop and prevent Express from serving other requests.
While it does make sense, I am not sure if I should be ever concerned about CPU-synchronous tasks (as opposed to I/O). At least intuitively it does not look very simple.
I have tried to use this source to understand better how the event loop works, and was really surprised to see that the following code crashes my local node REPL.
for (var i = 0; i < 10000000; i++) {
console.log('hi:', i);
}
What I am not sure is why it happens, as opposed to Python (that runs single-thread as well, and easily handles the task of printing), and whether it's relevant to my case, which does not involve I/O operations.
First, measure the performance of your existing code: it probably isn't a bottleneck to begin with.
If the supposed bottleneck is actually valid, you can create an asynchronous Node C++ add-on, which can process the entire JSON blob via uv_queue_work() in a separate thread, outside of the JavaScript event loop, and then return the entire result back to JavaScript, using a promise.
Is this supposed performance bottleneck a big enough concern to warrant this? Probably not.
As for your console.log() question: in Node, sometimes stdio is synchronous, and sometimes it's not: see this discussion. If you are on a POSIX system, it's synchronous, and you are writing enough data to fill up the pipe and block the event loop, which is all getting jammed in there before the next event tick. I'm not sure of the specifics on why that causes a crash, but hopefully this is a start to answering your question.
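Short of a C++ add-on, a lighter-weight option (my own sketch, not from the answer; processInChunks is a hypothetical helper name) is to split the synchronous work into chunks and yield back to the event loop with setImmediate between them, so pending I/O events can run:

```javascript
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve) => {
    let i = 0;
    function runChunk() {
      const end = Math.min(i + chunkSize, items.length);
      for (; i < end; i++) {
        handleItem(items[i]);
      }
      if (i < items.length) {
        setImmediate(runChunk); // yield so pending I/O events can run
      } else {
        resolve();
      }
    }
    runChunk();
  });
}

// usage: sum a large array, yielding to the event loop between chunks
const data = Array.from({ length: 10000 }, (_, n) => n);
let sum = 0;
processInChunks(data, (n) => { sum += n; }).then(() => {
  console.log(sum); // 49995000
});
```

For a 3-4 level JSON lookup this is almost certainly overkill, which reinforces the "measure first" advice above.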

Can we have race conditions in a single-thread program?

You can find here a very good explanation of what a race condition is.
I have seen recently many people making confusing statements about race conditions and threads.
I have learned that race conditions could only occur between threads. But I saw code that looked like race conditions, in event and asynchronous based languages, even if the program was single thread, like in Node.js, in GTK+, etc.
Can we have a race condition in a single thread program?
All examples are in a fictional language very close to Javascript.
Short:
A race condition can only occur between two or more threads / external state (one of them can be the OS). We cannot have race conditions inside a single-threaded process that does no I/O.
But a single thread program can in many cases :
give situations which looks similar to race conditions, like in event based program with an event loop, but are not real race conditions
trigger a race condition between or with other thread(s), for example because the execution of some parts of the program depends on external state:
other programs, like clients
library threads or servers
the system clock
I) Race conditions can only occur with two or more threads
A race condition can only occur when two or more threads try to access a shared resource without knowing it is modified at the same time by unknown instructions from the other thread(s). This gives an undetermined result. (This is really important.)
A single thread process is nothing more than a sequence of known instructions which therefore results in a determined result, even if the execution order of instructions is not easy to read in the code.
II) But we are not safe
II.1) Situations similar to race conditions
Many programming languages implement asynchronous programming features through events or signals, handled by a main loop or event loop which checks the event queue and triggers the listeners. Examples of this are Javascript, libevent, ReactPHP, GNOME GLib... Sometimes we can find situations which seem to be race conditions, but they are not.
The way the event loop is called is always known, so the result is determined, even if the execution order of instructions is not easy to read (or even cannot be read if we do not know the library).
Example:
setTimeout(
    function() { console.log("EVENT LOOP CALLED"); },
    1
); // We want to print EVENT LOOP CALLED after 1 millisecond
var now = new Date();
while (new Date() - now < 10) // We do something during 10 milliseconds
    console.log("EVENT LOOP NOT CALLED");
in Javascript output is always (you can test in node.js) :
EVENT LOOP NOT CALLED
EVENT LOOP CALLED
because, the event loop is called when the stack is empty (all functions have returned).
Be aware that this is just an example and that in languages that implements events in a different way, the result might be different, but it would still be determined by the implementation.
II.2) Race condition between other threads, for example :
II.2.i) With other programs like clients
If other processes send requests to our process, our program does not treat requests in an atomic way, and our process shares some resources between the requests, there might be a race condition between clients.
Example:
var step;
on('requestOpen')(
    function() {
        step = 0;
    }
);
on('requestData')(
    function() {
        step = step + 1;
    }
);
on('requestEnd')(
    function() {
        step = step + 1; // step should be 2 after that
        sendResponse(step);
    }
);
Here, we have a classical race condition setup. If a request is opened just before another ends, step will be reset to 0. If two requestData events are triggered before the requestEnd because of two concurrent requests, step will reach 3. But this is because we take the sequence of events as undetermined. We expect that the result of a program is most of the time undetermined with an undetermined input.
In fact, if our program is single thread, given a sequence of events the result is still always determined. The race condition is between clients.
There are two ways to understand this:
We can consider clients as part of our program (why not?), and in this case our program is multithreaded. End of story.
More commonly, we can consider that clients are not part of our program. In that case they are just input. And when we consider whether a program has a determined result or not, we do that with the input given. Otherwise even the simplest program, return input;, would have an undetermined result.
Note that :
if our process treats requests in an atomic way, it is the same as if there were a mutex between clients, and there is no race condition.
if we can identify requests and attach the variable to a request object which is the same at every step of the request, there is no shared resource between clients and no race condition.
II.2.ii) With library thread(s)
In our programs, we often use libraries which spawn other processes or threads, or that just do I/O with other processes (and I/O is always undetermined).
Example :
databaseClient.sendRequest('add Me to the database');
databaseClient.sendRequest('remove Me from the database');
This can trigger a race condition in an asynchronous library. This is the case if sendRequest() returns after having sent the request to the database, but before the request is really executed. We immediately send another request, and we cannot know if the first will be executed before the second is evaluated, because the database works in another thread. There is a race condition between the program and the database process.
But if the database were on the same thread as the program (which in real life does not happen often), it would be impossible for sendRequest to return before the request is processed. (Unless the request is queued, but in that case the result is still determined, as we know exactly how and when the queue is read.)
II.2.iii) With the system clock
@mingwei-samuel's answer gives an example of a race condition in a single-threaded JS program, between two setTimeout callbacks. Actually, once both setTimeout calls have been made, the execution order is already determined. That order depends on the system clock state (so, an external thread) at the time of the setTimeout calls.
Conclusion
In short, single-threaded programs are not free from triggering race conditions. But those can only occur with, or between, other threads or external programs. The result of our program might be undetermined, because the input our program receives from those other programs is undetermined.
Race conditions can occur with any system that has concurrently executing processes that create state changes in external processes, examples of which include :
multithreading,
event loops,
multiprocessing,
instruction level parallelism where out-of-order execution of instructions has to take care to avoid race conditions,
circuit design,
dating (romance),
real races in e.g. the olympic games.
Yes.
A "race condition" is a situation when the result of a program can change depending on the order operations are run (threads, async tasks, individual instructions, etc).
For example, in Javascript:
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM812:1 Hello
// VM812:2 World
setTimeout(() => console.log("Hello"), 10);
setTimeout(() => setTimeout(() => console.log("World"), 4), 4);
// VM815:2 World
// VM815:1 Hello
So clearly this code depends on how the JS event loop works, how tasks are ordered/chosen, what other events occurred during execution, and even how your operating system chose to schedule the JS runtime process.
This is contrived, but a real program could have a situation where "Hello" needs to be run before "World", which could result in some nasty non-deterministic bugs. How people could consider this not a "real" race condition, I'm not sure.
Data Races
It is not possible to have data races in single threaded code.
A "data race" is multiple threads accessing a shared resource at the same time in an inconsistent way, or specifically for memory: multiple threads accessing the same memory, where one (or more) is writing. Of course, with a single thread this is not possible.
This seems to be what @jillro's answer is talking about.
Note: the exact definitions of "race condition" and "data race" are not agreed upon. But if it looks like a race condition, acts like a race condition, and causes nasty non-deterministic bugs like a race condition, then I think it should be called a race condition.

How can I handle a callback synchronously in Node.js?

I'm using Redis to generate IDs for my in memory stored models. The Redis client requires a callback to the INCR command, which means the code looks like
client.incr('foo', function(err, id) {
    // ... continue on here
});
The problem is that I have already written the other part of the app, which expects the incr call to be synchronous and just return the ID, so that I can use it like
var id = client.incr('foo');
The reason why I got to this problem is that up until now, I was generating the IDs just in memory with a simple closure counter function, like
var counter = (function() {
    var count = 0;
    return function() {
        return ++count;
    };
})();
to simplify the testing and just general setup.
Does this mean that my app is flawed by design and I need to rewrite it to expect callback on generating IDs? Or is there any simple way to just synchronize the call?
Node.js in its essence is an async I/O library (with plugins). So, by definition, there's no synchronous I/O there and you should rewrite your app.
It is a bit of a pain, but what you have to do is wrap the logic that you had after the counter was generated into a function, and call that from the Redis callback. If you had something like this:
var id = get_synchronous_id();
processIdSomehow(id);
you'll need to do something like this.
var runIdLogic = function(id) {
    processIdSomehow(id);
};

client.incr('foo', function(err, id) {
    runIdLogic(id);
});
You'll need the appropriate error checking, but something like that should work for you.
There are a couple of sequential programming layers for Node (such as TameJS) that might help with what you want, but those generally do recompilation or things like that: you'll have to decide how comfortable you are with that if you want to use them.
@Sergio said this briefly in his answer, but I wanted to write a little more of an expanded answer. node.js is an asynchronous design. It runs in a single thread, which means that in order to remain fast and handle many concurrent operations, all blocking calls must have a callback for their return value so they can run asynchronously.
That does not mean that synchronous calls are not possible. They are, and it's a concern for how much you trust 3rd-party plugins. If someone decides to write a call in their plugin that does block, you are at the mercy of that call, which might even be something internal and not exposed in their API. Thus, it can block your entire app. Consider what might happen if Redis took a significant amount of time to return, then multiply that by the number of clients that could potentially be accessing that same routine. The entire logic has been serialized and they all wait.
In answer to your last question, you should not work towards accommodating a blocking approach. It may seem like a simple solution now, but it's counter-intuitive to the benefits of node.js in the first place. If you are only comfortable in a synchronous design workflow, you may want to consider another framework that is designed that way (with threads). If you want to stick with node.js, rewrite your existing logic to conform to a callback style. From the code examples I have seen, it tends to look like a nested set of functions, as a callback uses a callback, etc., until it can return from that recursive stack.
The application state in node.js is normally passed around as an object. What I would do is closer to:
var state = {};

client.incr('foo', function(err, id) {
    state.id = id;
    doSomethingWithId(state.id);
});

function doSomethingWithId(id) {
    // reuse state if necessary
}
It's just a different way of doing things.

Locking on an object?

I'm very new to Node.js and I'm sure there's an easy answer to this, I just can't find it :(
I'm using the filesystem to hold 'packages' (folders with a status extension, e.g. 'mypackage.idle'). Users can perform actions on these which would cause the status to change to something like 'qa' or 'deploying', etc. If the server is accepting lots of requests and multiple requests come in for the same package, how would I check the status and then perform an action (which would change the status) while guaranteeing that another request doesn't alter it before/during the action?
so in c# something like this
lock (someLock) { checkStatus(); performAction(); }
Thanks :)
If checkStatus() and performAction() are synchronous functions called one after another, then as others mentioned earlier: their execution will run uninterrupted till completion.
However, I suspect that in reality both of these functions are asynchronous, and the realistic case of composing them is something like:
function checkStatus(callback) {
    doSomeIOStuff(function(something) {
        callback(something == ok);
    });
}

checkStatus(function(status) {
    if (status == true) {
        performAction();
    }
});
The above code is subject to race conditions, because while doSomeIOStuff is being performed, instead of waiting for it, a new request can be served.
You may want to check https://www.npmjs.com/package/rwlock library.
This is a bit misleading. There are many scripting languages that are supposed to be single-threaded, but sharing data from the same source still creates a problem. Node.js might be single-threaded while you are running a single request, but when you have multiple requests trying to access the same data, it behaves much as if you were running a multithreaded language.
There is already an answer about this here : Locking on an object?
WATCH sentinel_key
GET value_of_interest
if (value_of_interest == FULL)
    MULTI
    SET sentinel_key = foo
    EXEC
    if (EXEC returned 1, i.e. succeeded)
        do_something();
    else
        do_nothing();
else
    UNWATCH
One thing you can do is lock on an external object, for instance, a sequence in a database such as Oracle or Redis.
http://redis.io/commands
For example, I am using cluster with node.js (I have 4 cores): I have a node.js function, and each time I run through it I increment a variable. I basically need to lock on that variable so that no two workers use the same value of it.
check this out How to create a distributed lock with Redis?
and this https://engineering.gosquared.com/distributed-locks-using-redis
I think you can run with this idea if you know what you are doing.
If you are making asynchronous calls with callbacks, this means multiple clients could potentially make the same, or related requests, and receive responses in different orders. This is definitely a case where locking is useful. You won't be 'locking a thread' in the traditional sense, but merely ensuring asynchronous calls, and their callbacks are made in a predictable order. The async-lock package looks like it handles this scenario.
https://www.npmjs.com/package/async-lock
Warning: node.js can change the semantics if you add a log entry, because logging is I/O bound.
if you change from
qa_action_performed = false;

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true;
        perform_action();
    }
}
to
qa_action_performed = false;

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        console.log("my log stuff");
        qa_action_performed = true;
        perform_action();
    }
}
more than one request can end up executing perform_action().
You don't have to worry about synchronization with Node.js since it's single-threaded with an event loop. This is one of the advantages of the architecture that Node.js uses.
Nothing will be executed between checkStatus() and performAction().
There are no locks in node.js -- because you shouldn't need them. There's only one thread (the event loop) and your code is never interrupted unless you perform an asynchronous action like I/O. Hence your code should never block. You can't do any parallel code execution.
That said, your code could look something like this:
qa_action_performed = false;

function handle_request() {
    if (check_status() == STATUS_QA && !qa_action_performed) {
        qa_action_performed = true;
        perform_action();
    }
}
Between check_status() and perform_action() no other thread can interrupt because there is no I/O. As soon as you enter the if clause and set qa_action_performed = true, no other code will enter the if block and hence perform_action() is never executed twice, even if perform_action() takes time performing I/O.
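When check_status() or perform_action() are themselves asynchronous, the guarantee above disappears, and you need to serialize the whole check-then-act section. A minimal promise-chain mutex can do this without any library (this is my own sketch, not code from the answers above; the Mutex class and names are illustrative):

```javascript
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }
  // run fn exclusively, after all previously queued sections finish
  run(fn) {
    const result = this.tail.then(fn);
    // keep the chain alive even if fn rejects
    this.tail = result.then(() => {}, () => {});
    return result;
  }
}

const delay = (ms) => new Promise((r) => setTimeout(r, ms));

// demo: without the mutex, both handlers would read 'idle' and both act
const pkg = { status: 'idle' };
const lock = new Mutex();
let actions = 0;

async function handleRequest() {
  await lock.run(async () => {
    const status = pkg.status;      // check
    await delay(5);                 // simulated I/O inside the critical section
    if (status === 'idle') {        // act only if still idle
      pkg.status = 'qa';
      actions++;
    }
  });
}

Promise.all([handleRequest(), handleRequest()]).then(() => {
  console.log(actions); // 1 — only the first request performed the action
});
```

Packages like async-lock (mentioned earlier in this thread) implement the same idea with timeouts and per-key locks.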