I'm using Redis to generate IDs for my models, which are stored in memory. The Redis client requires a callback for the INCR command, which means the code looks like
client.incr('foo', function(err, id) {
  // ... continue on here
});
The problem is that I have already written the rest of the app, which expects the incr call to be synchronous and simply return the ID, so that I can use it like
var id = client.incr('foo');
The reason why I got to this problem is that up until now, I was generating the IDs just in memory with a simple closure counter function, like
var counter = (function() {
  var count = 0;
  return function() {
    return ++count;
  };
})();
to simplify the testing and just general setup.
Does this mean that my app is flawed by design and I need to rewrite it to expect a callback when generating IDs? Or is there any simple way to just synchronize the call?
Node.js is, in essence, an async I/O library (with plugins). Network I/O such as a Redis call has no synchronous form there, so you should rewrite your app.
It is a bit of a pain, but what you have to do is wrap the logic that you had after the counter was generated into a function, and call that from the Redis callback. If you had something like this:
var id = get_synchronous_id();
processIdSomehow(id);
You'll need to do something like this:
var runIdLogic = function(id) {
  processIdSomehow(id);
};
client.incr('foo', function(err, id) {
  runIdLogic(id);
});
You'll need the appropriate error checking, but something like that should work for you.
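As a minimal sketch of what that error checking could look like: `fakeClient` below is a hypothetical stand-in for the Redis client so that the snippet runs on its own, and `nextId` is an illustrative wrapper name, not part of any library.

```javascript
// Hypothetical stand-in for the Redis client: incr() reports its result
// through a Node-style callback on a later tick, like a real network call.
const counters = {};
const fakeClient = {
  incr(key, callback) {
    counters[key] = (counters[key] || 0) + 1;
    const value = counters[key];
    setImmediate(() => callback(null, value));
  },
};

// Wrap ID generation so callers depend on nextId(), not on Redis directly.
function nextId(client, callback) {
  client.incr('foo', (err, id) => {
    if (err) {
      return callback(err); // pass the failure on instead of continuing
    }
    callback(null, id);
  });
}

function processIdSomehow(id) {
  console.log('got id', id);
}

nextId(fakeClient, (err, id) => {
  if (err) {
    console.error('could not generate an id:', err);
    return;
  }
  processIdSomehow(id);
});
```

An in-memory counter can expose the same callback signature, which would let tests keep using it without Redis.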
There are a couple of sequential programming layers for Node (such as TameJS) that might help with what you want, but those generally do recompilation or things like that: you'll have to decide how comfortable you are with that if you want to use them.
@Sergio said this briefly in his answer, but I wanted to write a slightly expanded answer. Node.js has an asynchronous design. It runs your code in a single thread, which means that in order to remain fast and handle many concurrent operations, every call that would block must instead take a callback for its return value and run asynchronously.
That does not mean that synchronous calls are not possible. They are, and that is a concern when deciding how far to trust third-party plugins. If someone decides to write a call in their plugin that does block, you are at the mercy of that call, which might even be internal and not exposed in their API. Thus, it can block your entire app. Consider what might happen if Redis took a significant amount of time to return, and then multiply that by the number of clients that could potentially be accessing that same routine. The entire logic has been serialized and they all wait.
In answer to your last question, you should not work towards accommodating a blocking approach. It may seem like a simple solution now, but it runs counter to the benefits of Node.js in the first place. If you are simply more comfortable with a synchronous workflow, you may want to consider another framework that is designed that way (with threads). If you want to stick with Node.js, rewrite your existing logic to conform to a callback style. From the code examples I have seen, it tends to look like a nested set of functions, as one callback uses another callback, and so on, until the innermost one completes and the chain unwinds.
The application state in node.js is normally passed around as an object. What I would do is closer to:
var state = {};
client.incr('foo', function(err, id) {
  state.id = id;
  doSomethingWithId(state.id);
});
function doSomethingWithId(id) {
  // reuse state if necessary
}
It's just a different way of doing things.
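If the nesting gets hard to follow, one option on newer Node versions is to wrap the callback API in a promise and use async/await. This is only a sketch under assumptions: `fakeClient` is a hypothetical stand-in for the Redis client, and `incrAsync` and `createModel` are illustrative names.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in for the Redis client (callback-style incr).
const fakeClient = {
  count: 0,
  incr(key, callback) {
    this.count += 1;
    const value = this.count;
    setImmediate(() => callback(null, value));
  },
};

// promisify turns (args..., callback(err, value)) into a promise-returning
// function; bind keeps `this` pointing at the client.
const incrAsync = promisify(fakeClient.incr).bind(fakeClient);

async function createModel(name) {
  const id = await incrAsync('foo'); // reads sequentially, still non-blocking
  return { id, name };
}

createModel('example')
  .then((model) => console.log(model))
  .catch((err) => console.error(err));
```

The call is still asynchronous: `createModel` returns a promise, so its callers have to await it or attach `.then()` as well.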
We have a project where we need to process ~5,000 objects, and each object takes 200-500 milliseconds to process. A developer on my team suggested using promises to try to process each object concurrently. So basically something like this:
let result = await Promise.all(objects.map(o => process(o)));
The process() code might look like this:
async function process(theObject) {
  return new Promise(resolve => {
    const sum = 1 + 1; // stand-in for the real per-object work
    resolve(sum);
  });
}
While it looks like a fair pattern at first, it also feels like an anti-pattern or a code smell. There also seems to be something about how Node/V8 handles promises that might introduce major issues later. Does anyone have any thoughts on this pattern and whether it is useful or useless?
One caveat of using Promise.all() is how it handles errors. From the MDN:
It rejects with the reason of the first promise that rejects.
So if it is acceptable for a single processing error among the ~5,000 objects to fail the entire run, then it seems like a decent tool. I would also recommend setting up a queue, both to separate the processing from the orchestration of the messages and to provide scalability advantages.
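To make both points concrete, here is a minimal in-process sketch that caps how many objects are in flight and records each outcome in the same shape that `Promise.allSettled` uses, so one rejection does not discard the rest. `processObject`, `runWithLimit`, and the limit of 2 are illustrative assumptions, not the asker's code.

```javascript
// Hypothetical per-object work: resolves after a short delay, and rejects
// for one input so the error path is visible.
function processObject(obj) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (obj.id === 3) {
        reject(new Error('failed on ' + obj.id));
      } else {
        resolve(obj.id * 2);
      }
    }, 5);
  });
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Each outcome is recorded instead of letting one rejection abort the batch.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never overlap
      try {
        const value = await worker(items[index]);
        results[index] = { status: 'fulfilled', value };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}

const objects = [1, 2, 3, 4, 5].map((id) => ({ id }));
runWithLimit(objects, 2, processObject).then((results) => {
  const failed = results.filter((r) => r.status === 'rejected');
  console.log('processed', results.length, 'failed', failed.length);
});
```

Note that this only helps when the per-object work is I/O-bound; promises by themselves do not move CPU-bound work off the main thread.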
While working on a project for my Express app, I wrote a recursive method that retrieves data from some nested JSON object. Roughly, the method looks like:
// The depth of the fields is up to 3-4 levels, so no stack overflow danger.
_recursiveFindFieldName: function(someJSONStruct, nestedFieldList) {
  if (nestedFieldList.length === 0) {
    return someJSONStruct;
  }
  var fields = someJSONStruct['fields'];
  for (var i = 0; i < fields.length; i++) {
    var subField = fields[i];
    if (subField['fieldName'] === nestedFieldList[0]) {
      // slice(1) passes the rest of the path without modifying the caller's list
      return this._recursiveFindFieldName(subField, nestedFieldList.slice(1));
    }
  }
  return null;
}
Now, I call this method in one of my callbacks:
data = _recursiveFindFieldName(someJSON, fieldPathList);
However, a friend who reviewed my code noted that this method, being recursive and iterating over a potentially large JSON struct, may block the event loop and prevent Express from serving other requests.
While that does make sense, I am not sure whether I should ever be concerned about synchronous CPU-bound tasks (as opposed to I/O). Intuitively, at least, the answer does not look simple.
I have tried to use this source to understand better how the event loop works, and was really surprised to see that the following code crashes my local node REPL.
for (var i = 0; i < 10000000; i++) {
  console.log('hi:', i);
}
What I am not sure about is why this happens, as opposed to Python (which also runs single-threaded and easily handles the task of printing), and whether it's relevant to my case, which does not involve I/O operations.
First, measure the performance of your existing code: it probably isn't a bottleneck to begin with.
If the supposed bottleneck is actually valid, you can create an asynchronous Node C++ add-on, which can process the entire JSON blob via uv_queue_work() in a separate thread, outside of the JavaScript event loop, and then return the entire result back to JavaScript, using a promise.
Is this supposed performance bottleneck big enough of a concern to warrant this? Probably not.
As for your console.log() question: in Node, sometimes stdio is synchronous, and sometimes it's not: see this discussion. If you are on a POSIX system, it's synchronous, and you are writing enough data to fill up the pipe and block the event loop, which is all getting jammed in there before the next event tick. I'm not sure of the specifics on why that causes a crash, but hopefully this is a start to answering your question.
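For the first point (measuring before optimizing), a rough timing harness might look like the sketch below. The structure built by `buildStruct`, the sizes, and the `findField` helper are assumptions made up for illustration; substitute your real JSON and lookup path.

```javascript
// Hypothetical nested structure, shaped like the one in the question:
// each level has a `fields` array of objects with a `fieldName`.
function buildStruct(depth, width) {
  const node = { fieldName: 'root', fields: [] };
  if (depth === 0) return node;
  for (let i = 0; i < width; i++) {
    const child = buildStruct(depth - 1, width);
    child.fieldName = 'f' + i;
    node.fields.push(child);
  }
  return node;
}

// Same lookup logic as in the question, as a standalone function.
function findField(struct, path) {
  if (path.length === 0) return struct;
  for (const sub of struct.fields) {
    if (sub.fieldName === path[0]) {
      return findField(sub, path.slice(1));
    }
  }
  return null;
}

const struct = buildStruct(4, 20); // 4 levels, 20 fields per level
const path = ['f19', 'f19', 'f19', 'f19']; // worst case: last field each time

const runs = 10000;
const start = process.hrtime.bigint();
for (let i = 0; i < runs; i++) {
  findField(struct, path);
}
const elapsedNs = process.hrtime.bigint() - start;
console.log('average per lookup (microseconds):', Number(elapsedNs) / runs / 1000);
```

If the average per lookup is tiny compared with the time a request spends on I/O, the lookup is unlikely to be what blocks the event loop.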
I have been writing a lot of NodeJS recently and that has forced me to attack some problems from a different perspective. I was wondering what patterns had developed for the problem of processing chunks of data sequentially (rather than in parallel) in an asynchronous request-environment, but I haven't been able to find anything directly relevant.
So to summarize the problem:
I have a list of data stored in an array format that I need to process.
I have to send this data to a service asynchronously, but the service will only accept a few at a time.
The data must be processed sequentially to meet the restrictions on the service, meaning that making several parallel asynchronous requests is not allowed.
Working in this domain, the simplest pattern I've come up with is a recursive one. Something like
function processData(data, start, step, callback) {
  if (start < data.length) {
    // Take the next `step` items
    var chunk = data.slice(start, start + step);
    queryService(chunk, start, step, function(e, d) {
      // Assume no errors
      // Could possibly do some matching between d and 'data' here to
      // update data with anything that the service may have returned
      processData(data, start + step, step, callback);
    });
  } else {
    callback(data);
  }
}
Conceptually, this should step through each item, but it's intuitively complex. I feel like there should be a simpler way of doing this. Does anyone have a pattern they tend to follow when approaching this kind of problem?
My first thought would be to rely on object encapsulation. Create an object that contains all of the information about what needs to be processed, along with all of the relevant data about what has been processed and what is being processed. The callback function then just calls the object's 'next' function, which in turn starts processing the next piece of data and updates the object. Essentially, it works like an asynchronous for-loop.
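A minimal sketch of that idea follows. `ChunkProcessor` and its fields are illustrative names, and `queryService` is a hypothetical stand-in for the real service with a simplified signature.

```javascript
// Hypothetical service call: accepts one chunk at a time and answers
// asynchronously through a Node-style callback.
function queryService(chunk, callback) {
  setImmediate(() => callback(null, chunk.map((x) => x * 10)));
}

// Holds the data, the position, and the results; next() is the loop body.
function ChunkProcessor(data, step, done) {
  this.data = data;
  this.step = step;
  this.position = 0;
  this.results = [];
  this.done = done;
}

ChunkProcessor.prototype.next = function () {
  if (this.position >= this.data.length) {
    return this.done(null, this.results);
  }
  const chunk = this.data.slice(this.position, this.position + this.step);
  this.position += this.step;
  queryService(chunk, (err, reply) => {
    if (err) return this.done(err);
    this.results.push(...reply);
    this.next(); // only now is the following chunk sent
  });
};

new ChunkProcessor([1, 2, 3, 4, 5], 2, (err, results) => {
  if (err) throw err;
  console.log(results);
}).next();
```

Because `next()` is only called from inside the service callback, at most one request is ever in flight, which is the sequential guarantee the question asks for.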
I'm trying to render a page via something similar to this:
var content = '';
db.query(imageQuery, function(images) {
  content += images;
});
db.query(userQuery, function(users) {
  content += users;
});
response.end('<div id="page">' + content + '</div>');
Unfortunately, content is empty. I already know that these asynchronous queries cause the problem, but I can't find a way to fix it.
Could somebody please help me out with this?
The problem with your code is that you're saying "go do these two things for a while" and then immediately sending your response. In other words, you've sent Node into the other room to fetch the next pages of a book and told it what to do when it gets back, but while it was out of the room you carried on trying to read the book without the new pages.
What you need to do instead is send your response only when the two database queries are done.
There are several ways you can do that; which one you use is up to you.
You can chain the queries. This is inefficient since you're doing one query, waiting for it to return, doing the second, waiting for it to return and then sending your response, but it's the most basic way to do it.
var content = '';
db.query(imageQuery, function(images) {
  content += images;
  db.query(userQuery, function(users) {
    content += users;
    response.end('<div id="page">' + content + '</div>');
  });
});
See how the response.end is now inside the last db.query's callback, which is inside the first db.query's callback? This does guarantee the order of operations, however: your first query will ALWAYS complete first.
You could also write some sort of primitive latching system to run the queries in parallel. This is a little more efficient (they don't necessarily happen simultaneously, but it'll be faster than chaining them.) However, with this method you can't guarantee order of operations.
var _latch = 0;
var resp = '';
var complete = function(content) {
  resp += content;
  ++_latch;
  if (_latch === 2) {
    response.end('<div id="page">' + resp + '</div>');
  }
};
db.query(imageQuery, complete);
db.query(userQuery, complete);
So what you're doing there is saying: run these queries and then call the same function. That function aggregates the responses and counts the number of times it has been called. When it has been called once for each query you made, it returns the results to the user.
These are the two basic ways of handling multiple asynchronous methods. However, there are a lot of utilities to help you do this so you don't have to handle it manually.
async is a great library that will help you run async functions in series, parallel, waterfall, etc. Takes a TON of pain out of async management.
runnel is a similar library, but with a much smaller focus than async.
q and bluebird are promise libraries implementing Promises/A+. They provide a different concept of flow control (if you're familiar with jQuery's deferred object, this is the idea it was trying to implement).
You can read more about promises here, but a quick google will also help explain the concept.
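As a rough sketch of the promise approach, here is the same page render using the Promise built into current Node versions rather than q or bluebird. The `db` object is a hypothetical stand-in whose `query` calls back with just the result, as in the question, and `queryAsync` and `renderPage` are illustrative names.

```javascript
// Hypothetical stand-in for the question's db.query: calls back with a
// result after a delay (the question's callback takes only the result).
const db = {
  query(sql, callback) {
    const delay = sql === 'imageQuery' ? 20 : 5;
    setTimeout(() => callback('[' + sql + ' rows]'), delay);
  },
};

// Wrap the callback API in a promise.
function queryAsync(sql) {
  return new Promise((resolve) => db.query(sql, resolve));
}

function renderPage() {
  // Both queries start immediately; Promise.all keeps the results in the
  // order of the input array, regardless of which query finishes first.
  return Promise.all([queryAsync('imageQuery'), queryAsync('userQuery')])
    .then(([images, users]) => '<div id="page">' + images + users + '</div>');
}

// In the question's handler this would be response.end(html).
renderPage().then((html) => console.log(html));
```

Unlike the hand-written latch, this runs the queries in parallel and still produces the content in a fixed order.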
I'm very new to Node.js and I'm sure there's an easy answer to this, I just can't find it :(
I'm using the filesystem to hold 'packages' (folders with a status extension, e.g. 'mypackage.idle'). Users can perform actions on these which would cause the status to change to something like 'qa' or 'deploying'. If the server is accepting lots of requests and multiple requests come in for the same package, how would I check the status and then perform an action, which would change the status, while guaranteeing that another request didn't alter it before or during the action?
So in C#, something like this:
lock (someLock) { checkStatus(); performAction(); }
Thanks :)
If checkStatus() and performAction() are synchronous functions called one after another, then, as others mentioned earlier, their execution will run uninterrupted until completion.
However, I suspect that in reality both of these functions are asynchronous, and the realistic case of composing them is something like:
function checkStatus(callback) {
  doSomeIOStuff(function(something) {
    callback(something == ok);
  });
}
checkStatus(function(status) {
  if (status == true) {
    performAction();
  }
});
The above code is subject to race conditions: while doSomeIOStuff is being performed, Node does not wait for it, so a new request can be served in the meantime.
You may want to check the rwlock library: https://www.npmjs.com/package/rwlock
This is a bit misleading. Many scripting languages are supposed to be single-threaded, but sharing data from the same source still creates a problem. Node.js might be single-threaded while it is handling a single request, but when multiple requests try to access the same data, you get much the same problem as you would in a multithreaded language.
There is already an answer about this here: Locking on an object?
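To illustrate the idea without relying on any particular library's API, here is a hand-rolled per-key lock built on promise chaining. It is a sketch under assumptions: `withLock`, `requestTransition`, and the in-memory `status` object are hypothetical, and the lock only covers a single Node process.

```javascript
// Minimal per-key mutex: each key has a promise "tail"; new work is
// chained onto it, so tasks for the same key run one after another.
const tails = new Map();

function withLock(key, task) {
  const previous = tails.get(key) || Promise.resolve();
  const run = previous.then(() => task());
  // Store a tail that never rejects, so one failure does not poison the chain.
  tails.set(key, run.catch(() => {}));
  return run;
}

// Hypothetical package state; the real one lives on the filesystem.
const status = { mypackage: 'idle' };
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function checkStatus(name) {
  await pause(5); // simulated I/O
  return status[name];
}

async function performAction(name, newStatus) {
  await pause(5); // simulated I/O
  status[name] = newStatus;
}

// Check-then-act runs as one unit per package.
function requestTransition(name, newStatus) {
  return withLock(name, async () => {
    if ((await checkStatus(name)) !== 'idle') {
      return false; // another request already moved it
    }
    await performAction(name, newStatus);
    return true;
  });
}

Promise.all([
  requestTransition('mypackage', 'qa'),
  requestTransition('mypackage', 'deploying'),
]).then((outcomes) => console.log(outcomes, status.mypackage));
```

The second request only starts its check after the first has finished its action, so it sees the updated status. This sketch never removes entries from `tails`, which a long-running server would need to handle.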
WATCH sentinel_key
GET value_of_interest
if (value_of_interest = FULL)
    MULTI
    SET sentinel_key = foo
    EXEC
    if (EXEC returned 1, i.e. succeeded)
        do_something();
    else
        do_nothing();
else
    UNWATCH
One thing you can do is lock on an external object, for instance, a sequence in a database such as Oracle or Redis.
http://redis.io/commands
For example, I am using cluster with node.js (I have 4 cores) and I have a node.js function and each time I run through it, I increment a variable. I basically need to lock on that variable so no two threads use the same value of that variable.
Check out How to create a distributed lock with Redis?
and https://engineering.gosquared.com/distributed-locks-using-redis
I think you can run with this idea if you know what you are doing.
If you are making asynchronous calls with callbacks, multiple clients could potentially make the same or related requests and receive responses in different orders. This is definitely a case where locking is useful. You won't be 'locking a thread' in the traditional sense, but merely ensuring that asynchronous calls and their callbacks are made in a predictable order. The async-lock package looks like it handles this scenario.
https://www.npmjs.com/package/async-lock
Warning: Node.js changes semantics if you add a log entry, because logging is I/O bound.
If you change from
qa_action_performed = false
function handle_request() {
  if (check_status() == STATUS_QA && !qa_action_performed) {
    qa_action_performed = true
    perform_action()
  }
}
to
qa_action_performed = false
function handle_request() {
  if (check_status() == STATUS_QA && !qa_action_performed) {
    console.log("my log stuff");
    qa_action_performed = true
    perform_action()
  }
}
more than one thread can execute perform_action().
You don't have to worry about synchronization with Node.js since it's single-threaded with an event loop. This is one of the advantages of the architecture that Node.js uses.
Nothing will be executed between checkStatus() and performAction().
There are no locks in node.js -- because you shouldn't need them. There's only one thread (the event loop) and your code is never interrupted unless you perform an asynchronous action like I/O. Hence your code should never block. You can't do any parallel code execution.
That said, your code could look something like this:
qa_action_performed = false
function handle_request() {
  if (check_status() == STATUS_QA && !qa_action_performed) {
    qa_action_performed = true
    perform_action()
  }
}
Between check_status() and perform_action() no other thread can interrupt because there is no I/O. As soon as you enter the if clause and set qa_action_performed = true, no other code will enter the if block and hence perform_action() is never executed twice, even if perform_action() takes time performing I/O.
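A small simulation of that claim, with assumed names: `performAction` here fakes slow I/O with a timer, and `actionRuns` counts how often it actually starts when several requests arrive while the first action is still in flight.

```javascript
const STATUS_QA = 'qa';
let qaActionPerformed = false;
let actionRuns = 0;

function checkStatus() {
  return STATUS_QA; // synchronous check, no I/O
}

function performAction() {
  actionRuns += 1;
  // Simulated slow I/O; other requests are handled while this is pending.
  return new Promise((resolve) => setTimeout(resolve, 20));
}

function handleRequest() {
  if (checkStatus() === STATUS_QA && !qaActionPerformed) {
    qaActionPerformed = true; // set before any asynchronous step
    return performAction();
  }
  return Promise.resolve();
}

// Three "requests" arrive while the first action is still in flight.
Promise.all([handleRequest(), handleRequest(), handleRequest()]).then(() => {
  console.log('performAction ran', actionRuns, 'time(s)');
});
```

The guarantee depends on the check and the flag update happening in the same synchronous stretch of code; if `checkStatus()` itself did I/O, another request could slip in between the check and the update.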