How to manage a queue in nodejs? - node.js

I have written a script in Node.js that takes screenshots of websites (using SlimerJS). The script takes around 10-20 seconds to complete, and the problem is that the server is stalled until the script has finished.
app.get('/screenshot', function (req, res, next) {
var url = req.query.url;
assert(url, "query param 'url' needed");
// actual saving happens here
var fileName = URL.parse(url).hostname + '_' + Date.now() + '.png';
var command = 'xvfb-run -a -n 5 node slimerScript.js '+ url + ' '+ fileName;
exec(command, function (err, stdout, stderror) {
if(err){ return next(err); }
if(stderror && (stderror.indexOf('error')!= -1) ){ return next(new Error('Error occurred!')); }
return res.send({
status: true,
data: {
fileName: fileName,
url: "http://"+path.join(req.headers.host,'screenshots', fileName)
}
});
})
});
Since the script spawns a Firefox browser in memory and loads the website, RAM usage can spike up to 600-700 MB, so I cannot run several of these commands concurrently, as RAM is expensive on servers.
Is it possible to queue the incoming requests and execute them in FIFO order?
I tried checking packages like kue, bull and bee-queue, but I think these all assume the job list is already known before the queue is started, whereas my job list depends on users using the site. I also want to tell people that they are in the queue and need to wait for their turn. Is this possible with the above-mentioned packages?

If I were doing something similar, I would try these steps.
1. Keep an array (a queue) of requested jobs. When a request comes in, store its info in the array and send a message back to the user, telling them they are in the queue, or that the server is busy if there are already too many requests.
2. Run the screenshot jobs asynchronously, but not all at the same time. Start a job when a new request arrives and no job is currently running, and start the next one recursively when the previous one finishes.
function doScreenShot(){
// `a` is the queue array from step 1
if(a.length > 0){
execTheJob(a[0], ()=>{
// after finishing the job, remove it and start the next one
a.shift();
doScreenShot()
})
}
}
3. Notify the user that the job has finished, via polling or some other mechanism.
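The three steps above can be sketched as a small in-memory FIFO queue. This is only an illustration under assumptions: `createJobQueue`, `enqueue` and the fake worker are hypothetical names, and the worker stands in for the `exec` call from the question. The `position` value is what you could report back to a waiting user.

```javascript
// Minimal in-memory FIFO job queue: one job runs at a time.
// All names here are illustrative, not part of any library.
function createJobQueue(worker) {
  const pending = [];   // jobs waiting for their turn
  let running = false;  // true while a job is being processed

  function next() {
    if (running || pending.length === 0) return;
    running = true;
    const { job, resolve, reject } = pending.shift();
    worker(job)
      .then(resolve, reject)
      .finally(() => {
        running = false;
        next(); // start the following job, if any
      });
  }

  return {
    // Returns the position in line (0 = starts immediately) and a
    // promise that settles when the job is done.
    enqueue(job) {
      const position = pending.length + (running ? 1 : 0);
      const done = new Promise((resolve, reject) => {
        pending.push({ job, resolve, reject });
      });
      next();
      return { position, done };
    },
  };
}

// Usage with a fake worker standing in for the screenshot command.
const order = [];
const queue = createJobQueue((url) =>
  new Promise((resolve) => setTimeout(() => {
    order.push(url);
    resolve(url + '.png');
  }, 10))
);

const first = queue.enqueue('a.example');
const second = queue.enqueue('b.example');
const third = queue.enqueue('c.example');
```

Because the queue lives in the server process, jobs are lost on restart; a persistent queue package would be needed if that matters.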

Related

NodeJS unzipper stream never finishes

I am attempting to download a zip file, extract the contents, and push them into a database. Unfortunately, my stream never seems to complete, so I never get the opportunity to clean up and end the process.
I have stripped the code down to the minimum to reproduce the error.
let debugmode = false;
fs.createReadStream(zPath)
.pipe(unzip.Parse())
.pipe(Stream.Transform({
objectMode: true,
transform: async function(entry,e,done) {
console.log('Item: ' + debugmode++ + ' of 819080');
let buff = await entry.buffer();
await entry.autodrain().promise()
done();
}
}))
.on('finish',()=>{
console.log('DONE');
})
;
The log shows the last couple of items, but never prints the word DONE.
Item: 819075
Item: 819076
Item: 819077
Item: 819078
Item: 819079
Item: 819080
Is there something I have done incorrectly? Is there something I can do to monitor for the end of file and kill the stream?
Extra Info
In the actual code, there is also a transform that reports progress based on bytes processed. There are a few bytes processed after this item.
I am using unzipper to do the extract
The zip file is a publicly accessible SEC submissions.zip. I have no problem with companies.zip. (I'm trying to find their linkable page)
I download the zip in full before processing.
Out of frustration, I have implemented a Dead Man's Switch.
let deadman = null;
await new Promise((resolve)=>{
fs.createReadStream(zPath)
.pipe(unzip.Parse())
.pipe(Stream.Transform({
objectMode: true,
transform: async function(entry,e,done) {
clearTimeout(deadman);
deadman = setTimeout(resolve,60000);
/// still do all the other stuff
}
}))
.on('finish',()=>{
clearTimeout(deadman);
console.log('DONE');
resolve();
})
});
Now, every time it processes an entry, it has 60 seconds to complete processing. If it fails to complete processing in 60 seconds, it is assumed to have died and the promise is resolved. The timer is restarted every time an item is processed (the stream demonstrates it is still alive).
While I do not consider this a solution, only a workaround, the script is intended to run as a single process, so it can be terminated after the run (which also cleans up the memory).

async.queue concurrent tasks

I am using async.queue to ensure that certain file copies in a service happen at most n concurrently, but watching the files copy sometimes I see a lot more than what the queue allows. Does anyone see something I may have missed in the below implementation?
createQueue(limit: number) {
let self = this;
return async.queue(function(cmdObj, callback) {
console.log("Beginning copy");
let cmd = cmdObj.cmd;
let args = cmdObj.args;
let request = cmdObj.req;
request.state = State.IN_PROGRESS;
self.reportStatus(request.destination);
const proc = spawn(cmd, args); //uses an rsync command upstream
proc.on("close", code => {
if (code !== 0) {
request.state = State.ERRORED;
self.reportStatus(request.destination); // these just report to the caller
statusMap.delete(request.destination);
} else {
fs.rename(request.destination + ".part", request.destination);
request.state = State.COMPLETED;
self.reportStatus(request.destination); // same here
statusMap.delete(request.destination);
}
callback();
});
proc.on("error", err => {
console.error("COPY ERR: " + err);
});
}, limit); // limit here, for example, may be two, but I see four copies concurrently
}
EDIT:
I now believe this is a side effect of the rest of the system...queues being cleared and reinitialized AFTER copies have started...so when new items are added to the reinitialized queues, they kick off immediately, as the system has no idea if something has been handed off to userland and is currently running.
So, this was user error...PEBCAK! Posting the solution more as a cautionary tale:
The queues above were working as designed, but I had an endpoint for the calling server to clear the queues as necessary. The problem was that I was using kill() and re-initializing the queues, losing all track of any jobs in progress and their callbacks. As soon as a new item hit the fresh queue, it would think nothing was happening and spawn a new copy process. I resolved it by using remove to clear the queues instead of re-initializing them.
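The failure mode described above can be reproduced without the async library at all. The following is a minimal sketch, assuming a hypothetical `Limiter` class that is not the async.queue implementation; `copy` is a fake job that stays "running" until its finisher is called.

```javascript
// Hypothetical minimal limiter, NOT the async.queue implementation.
class Limiter {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;     // jobs handed off and still running
    this.waiting = [];   // jobs not started yet
  }
  push(job) {
    this.waiting.push(job);
    this._drain();
  }
  // Drop jobs that have not started; running jobs stay counted.
  clearWaiting() {
    this.waiting.length = 0;
  }
  _drain() {
    while (this.active < this.limit && this.waiting.length > 0) {
      const job = this.waiting.shift();
      this.active++;
      job(() => { this.active--; this._drain(); });
    }
  }
}

let runningCopies = 0;
let peak = 0;
const finishers = [];
// Fake copy job: counts itself as running until its finisher is called.
const copy = (done) => {
  runningCopies++;
  peak = Math.max(peak, runningCopies);
  finishers.push(() => { runningCopies--; done(); });
};

// Re-initializing: the new limiter knows nothing about the running jobs.
let q = new Limiter(2);
q.push(copy); q.push(copy);
q = new Limiter(2);
q.push(copy); q.push(copy);
const peakAfterReinit = peak; // more copies in flight than the limit

// Finish everything, then clear only the waiting jobs instead.
finishers.splice(0).forEach((finish) => finish());
peak = 0;
const q2 = new Limiter(2);
q2.push(copy); q2.push(copy); q2.push(copy);
q2.clearWaiting();
q2.push(copy);
const peakAfterClear = peak; // the limit still holds
```

The point of the sketch is that the concurrency count lives inside the queue object, so replacing the object resets the count while the old processes keep running.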

Is nesting async calls inside async calls desirable? (Node.js)

I am playing with Node.js and I have created a simple script that uploads files from a directory to a server:
var request = require('request');
var file = require('file');
var fs = require('fs');
var path = require('path');
VERSION = '0.1'
CONFIG_FILE = path.join(__dirname, 'etc', 'sender.conf.json');
var config = JSON.parse(
fs.readFileSync(CONFIG_FILE).toString()
);
var DATA_DIR = __dirname
config['data_dir'].forEach(function(dir) {
DATA_DIR = path.join(DATA_DIR, dir)
});
console.log('sending data from root directory: ' + DATA_DIR);
file.walk(
DATA_DIR,
function(err, dir_path, dirs, files) {
if(err) {
return console.error(err);
}
sendFiles(dir_path, files);
}
);
function sendFiles(dir_path, files)
{
files
.filter(function(file) {
return file.substr(-5) === '.meta';
})
.forEach(function(file) {
var name = path.basename(file.slice(0, -5));
sendFile(dir_path, name);
})
;
}
function sendFile(dir_path, name)
{
console.log("reading file start: " + dir_path + "/" + name);
fs.readFile(
path.join(dir_path, name + '.meta'),
function(err, raw_meta) {
if(err) {
return console.error(err);
}
console.log("reading file done: " + dir_path + "/" + name);
sendData(
name,
JSON.parse(raw_meta),
fs.createReadStream(path.join(dir_path, name + '.data'))
);
}
);
console.log("reading file async: " + dir_path + "/" + name);
}
function sendData(name, meta, data_stream)
{
meta['source'] = config['data_source'];
var req = request.post(
config['sink_url'],
function(err, res, body) {
if(err) {
console.log(err);
}
else {
console.log(name);
console.log(meta);
console.log(body);
}
}
);
var form = req.form();
form.append(
'meta',
JSON.stringify(meta),
{
contentType: 'application/x-www-form-urlencoded'
}
);
form.append(
'data',
data_stream
);
}
It works fine when run with only a few files. But when I run it on a directory with lots of files, it chokes. This is because it keeps creating huge numbers of tasks for reading from a file, but never gets to actually doing the reading (because there are too many files). This can be observed in the output:
sending data from root directory: .../data
reading file start: .../data/ac/ad/acigisu-adruire-sabeveab-ozaniaru-fugeef-wemathu-lubesoraf-lojoepe
reading file async: .../data/ac/ad/acigisu-adruire-sabeveab-ozaniaru-fugeef-wemathu-lubesoraf-lojoepe
reading file start: .../data/ac/ab/acodug-abueba-alizacod-ugvut-nucom
reading file async: .../data/ac/ab/acodug-abueba-alizacod-ugvut-nucom
reading file start: .../data/ac/as/acigisu-asetufvub-liwi-ru-mitdawej-vekof
reading file async: .../data/ac/as/acigisu-asetufvub-liwi-ru-mitdawej-vekof
reading file start: .../data/ac/av/ace-avhad-bop-rujan-pehwopa
reading file async: .../data/ac/av/ace-avhad-bop-rujan-pehwopa
...
For each file, there is console output "reading file start" produced immediately before call to fs.readFile, and "reading file async" that is produced immediately after the async reading has been scheduled. But there is no "reading file done" message even when I let it run for a long time, which means that reading of any file has probably never been even scheduled (those files are on order of 100s of bytes, so once scheduled, those reads would probably finish in single go).
This leads me to the following thought process. Async calls in Node.js are done because the event loop itself is single-threaded and we do not want to block it. However, once this requirement is satisfied, does it make any sense to nest further async calls inside async calls that are themselves nested in async calls, and so on? Would it serve any particular purpose? Moreover, would it not actually be a pessimisation of the code, due to scheduling overhead that is not really needed and could be avoided completely if the handling of a single file consisted of synchronous calls only?
Given the thought process above, my course of action would be to use the solution from this question:
1. asynchronously push the names of all files to async.queue
2. limit the number of parallel tasks by setting queue.concurrency
3. provide a file-upload handler that is completely synchronous, i.e. it synchronously reads the contents of the file and, after that is finished, synchronously sends the POST request to the server
This is my very first try to use Node.js and/or JavaScript, therefore it is quite possible I am completely wrong (note that e.g. sync-request package makes it very clear that synchronous calls are not desirable, which is in contradiction with my thought process above - the question is why). Any comments on validity of the above thought process as well as viability of the proposed solution and eventual alternatives to it would be very much appreciated.
== Update ==
There is very good article explaining all this in great detail directly in documentation of Node.js.
As for the particular problem at hand, it is indeed in the choice of file-system-walker-module. The solution is to use e.g. walk instead of file:
@@ -4,7 +4,7 @@
var request = require('request');
-var file = require('file');
+var walk = require('walk');
var fs = require('fs');
var path = require('path');
@@ -24,13 +24,19 @@ config['data_dir'].forEach(function(dir) {
console.log('sending data from root directory: ' + DATA_DIR);
-file.walk(
- DATA_DIR,
- function(err, dir_path, dirs, files) {
- if(err) {
- return console.error(err);
- }
- sendFiles(dir_path, files);
+var walker = walk.walk(DATA_DIR)
+walker.on(
+ 'files',
+ function(dir_path, files, next) {
+ sendFiles(dir_path, files.map(function(stats) { return stats.name; }));
+ next();
+ }
+);
+walker.on(
+ 'errors',
+ function(dir_path, node_stats, next) {
+ console.error('file walker:', node_stats);
+ next();
}
);
== Original Post ==
After a bit more study, I will attempt to answer my own question. This answer is still only a partial solution (more complete answer from someone who has actual experience with Node.js would be very much appreciated).
The short answer to the main question above is that it indeed is not only desirable, but also almost always necessary to schedule more asynchronous functions from already asynchronous functions. The long explanation follows.
It is because of how Node.js scheduling works: "Everything runs on a different thread except our code.". There are two very important comments in the discussion below the linked blog post:
"Javascript always finishes the currently executing function first. An event will never interrupt a function." [Twitchard]
"Also note it won't just finish the current function, it will run to completion of all synchronous functions and I believe anything queued with process.nextTick... before the request callback is handled." [Tim Oxley]
There is also a note mentioning this in the documentation of process.nextTick: "The next tick queue is completely drained on each pass of the event loop before additional I/O is processed. As a result, recursively setting nextTick callbacks will block any I/O from happening, just like a while(true); loop."
So, to summarize, all code of the script itself runs on a single thread and a single thread only. The asynchronous callbacks scheduled to be run are executed on that very same thread, and only after the whole current next tick queue has been drained. Asynchronous callbacks provide the only points at which some other function can be scheduled to run. If the file-upload handler did not schedule any additional asynchronous tasks, as described in the question, its execution would block everything else until the whole handler had finished. That is not desirable.
This also explains why the actual reading of the input file never occurs ("recursively setting nextTick callbacks will block any I/O from happening" - see above). It would eventually occur once all the tasks for the whole traversed directory hierarchy had been scheduled. However, without further study, I am not able to answer the question of how to limit the number of file-upload tasks scheduled (effectively the size of the task queue) and block the scheduling loop until some of those tasks have been processed (freeing some room on the task queue). Hence this answer is still incomplete.
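One possible approach to the open question of limiting how many file-upload tasks are in flight is a small promise-based limiter. This is a minimal sketch under assumptions, not the method used in the post: `runWithLimit` and `fakeUpload` are hypothetical names, and the timer stands in for reading and posting one file.

```javascript
// Run `task` over `items` with at most `limit` calls in flight at once.
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each worker pulls the next unclaimed index until none are left.
  async function worker() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await task(items[i]);
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Usage: a fake upload that records how many uploads overlap.
let inFlight = 0;
let maxInFlight = 0;
function fakeUpload(name) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise((resolve) => setTimeout(() => {
    inFlight--;
    resolve(name.toUpperCase());
  }, 5));
}

const uploads = runWithLimit(['a', 'b', 'c', 'd', 'e'], 2, fakeUpload);
```

Each worker stays asynchronous, so the event loop remains free for I/O, while the number of workers caps how much work is scheduled at once.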

Amazon SQS with aws-sdk receiveMessage Stall

I'm using the aws-sdk node module with the (as far as I can tell) approved way to poll for messages.
Which basically sums up to:
sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
WaitTimeSeconds: 20
}, function(err, data) {
if (err) {
logger.fatal('Error on Message Recieve');
logger.fatal(err);
} else {
// all good
if (undefined === data.Messages) {
logger.info('No Messages Object');
} else if (data.Messages.length > 0) {
logger.info('Messages Count: ' + data.Messages.length);
var delete_batch = new Array();
for (var x=0;x<data.Messages.length;x++) {
// process
receiveMessage(data.Messages[x]);
// flag to delete
var pck = new Array();
pck['Id'] = data.Messages[x].MessageId;
pck['ReceiptHandle'] = data.Messages[x].ReceiptHandle;
delete_batch.push(pck);
}
if (delete_batch.length > 0) {
logger.info('Calling Delete');
sqs.deleteMessageBatch({
Entries: delete_batch,
QueueUrl: queueUrl
}, function(err, data) {
if (err) {
logger.fatal('Failed to delete messages');
logger.fatal(err);
} else {
logger.debug('Deleted recieved ok');
}
});
}
} else {
logger.info('No Messages Count');
}
}
});
receiveMessage is my "do stuff with collected messages if I have enough collected messages" function
Occasionally, my script stalls because I don't get a response from Amazon at all. For example, when there are no messages in the queue to consume, instead of hitting the WaitTimeSeconds limit and returning a "no messages object" result, the callback is never called.
(I'm chalking this up to Amazon weirdness.)
What I'm asking is: what's the best way to detect and deal with this, given that I have some code in place to stop concurrent calls to receiveMessage?
The suggested answer here: Nodejs sqs queue processor also has code that prevents concurrent message request queries (granted it's only fetching one message a time)
I do have the whole thing wrapped in
var running = false;
runMonitorJob = setInterval(function() {
if (running) {
} else {
running = true;
// call SQS.receive
}
}, 500);
(With a running = false after the delete loop (not in its callback))
My solution would be
watchdogTimeout = setTimeout(function() {
running = false;
}, 30000);
But surely this would leave a pile of orphaned sqs.receiveMessage calls lurking about, consuming more and more memory over time?
(This job runs all the time, and I left it running on Friday, it stalled Saturday morning and hung till I manually restarted the job this morning)
Edit: I have seen cases where it hangs for ~5 minutes and then suddenly gets messages, but with a wait time of 20 seconds it should report "no messages" after 20 seconds. So a watchdog of ~10 minutes might be more practical (depending on the rest of one's business logic).
Edit: Yes Long Polling is already configured Queue Side.
Edit: This is under (latest) v2.3.9 of aws-sdk and NodeJS v4.4.4
I've been chasing this (or a similar) issue for a few days now and here's what I've noticed:
The receiveMessage call does eventually return, although only after 120 seconds.
Concurrent calls to receiveMessage are serialised by the AWS SDK library, so making multiple calls in parallel has no effect.
The receiveMessage callback does not error. In fact, after the 120 seconds have passed, it may contain messages.
What can be done about this? This sort of thing can happen for a number of reasons and some/many of these things can't necessarily be fixed. The answer is to run multiple services each calling receiveMessage and processing the messages as they come - SQS supports this. At any time, one of these services may hit this 120 second lag but the other services should be able to continue on as normal.
My particular problem is that I have some critical singleton services that can't afford 120 seconds of down time. For this I will look into either 1) use HTTP instead of SQS to push messages into my service or 2) spawn slave processes around each of the singletons to fetch the messages from SQS and push them into the service.
I also ran into this issue, but not when calling receiveMessage but sendMessage. I also saw hangups of exactly 120 seconds. I also saw it with a few other services, like Firehose.
That led me to this line in the AWS SDK:
SQS Constructor
httpOptions:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
To implement a fix, I override the timeout for the SQS client that performs sendMessage so that it times out after 10 seconds, and use another client with a 25 second timeout for receiving (where I long poll for 20 seconds):
var sendClient = new AWS.SQS({httpOptions:{timeout:10*1000}});
var receiveClient = new AWS.SQS({httpOptions:{timeout:25*1000}});
I've had this out in production for a week now and I've noticed that all of my SQS stalling issues have been eliminated.
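For calls where no client-side timeout option is available, a generic guard can stop a stalled call from blocking the polling loop forever. This is a hedged sketch: `withTimeout` is a hypothetical helper, the two promises are stand-ins for a poll that answers and one that stalls, and the guard does not cancel the underlying request.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying operation.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error('timed out after ' + ms + 'ms')),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with stand-ins for a poll that answers and one that stalls.
const answers = new Promise((resolve) => setTimeout(() => resolve('messages'), 5));
const stalls = new Promise(() => {}); // never settles

const outcomes = Promise.all([
  withTimeout(answers, 50),
  withTimeout(stalls, 20).catch((err) => err.message),
]);
```

Setting the socket timeout on the client, as shown above, is still preferable where it exists, because it also releases the underlying connection.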

Is making sequential HTTP requests a blocking operation in node?

Note that irrelevant information to my question will be 'quoted'
like so (feel free to skip these).
Problem
I am using node to make in-order HTTP requests on behalf of multiple clients. This way, what originally took the client(s) several different page loads to get the desired result, now only takes a single request via my server. I am currently using the ‘async’ module for flow control and ‘request’ module for making the HTTP requests. There are approximately 5 callbacks which, using console.time, takes about ~2 seconds from start to finish (sketch code included below).
Now I am rather inexperienced with node, but I am aware of the
single-threaded nature of node. While I have read many times that node
isn’t built for CPU-bound tasks, I didn’t really understand what that
meant until now. If I have a correct understanding of what’s going on,
this means that what I currently have (in development) is in no way
going to scale to even more than 10 clients.
Question
Since I am not an expert at node, I ask this question (in the title) to get a confirmation that making several sequential HTTP requests is indeed blocking.
Epilogue
If that is the case, I expect I will ask a different SO question (after doing the appropriate research) discussing various possible solutions, should I choose to continue approaching this problem in node (which itself may not be suitable for what I'm trying to do).
Other closing thoughts
I am truly sorry if this question was not detailed enough, too noobish, or had particularly flowery language (I try to be concise).
Thanks and all the upvotes to anyone who can help me with my problem!
The code I mentioned earlier:
var async = require('async');
var request = require('request');
...
async.waterfall([
function(cb) {
console.time('1');
request(someUrl1, function(err, res, body) {
// load and parse the given web page.
// make a callback with data parsed from the web page
});
},
function(someParameters, cb) {
console.timeEnd('1');
console.time('2');
request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
// more computation
// make a callback with a session cookie given by the visited url
});
},
function(jar, cb) {
console.timeEnd('2');
console.time('3');
request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
// do more parsing + computation
// make another callback with the results
});
},
function(moreParameters, cb) {
console.timeEnd('3');
console.time('4');
request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
// make final callback after some more computation.
//This part takes about ~1s to complete
});
}
], function (err, result) {
console.timeEnd('4'); //
res.status(200).send();
});
Normally, I/O in node.js is non-blocking. You can test this out by making several requests simultaneously to your server. For example, if each request takes 1 second to process, a blocking server would take 2 seconds to process 2 simultaneous requests but a non-blocking server would take just a bit more than 1 second to process both requests.
However, you can deliberately make requests blocking by using the sync-request module instead of request. Obviously, that's not recommended for servers.
Here's a bit of code to demonstrate the difference between blocking and non-blocking I/O:
var req = require('request');
var sync = require('sync-request');
// Load example.com N times (yes, it's a real website):
var N = 10;
console.log('BLOCKING test ==========');
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
var res = sync('GET','http://www.example.com')
console.log('Downloaded ' + res.getBody().length + ' bytes');
}
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');
console.log('NON-BLOCKING test ======');
var loaded = 0;
var start = new Date().valueOf();
for (var i=0;i<N;i++) {
req('http://www.example.com',function( err, response, body ) {
loaded++;
console.log('Downloaded ' + body.length + ' bytes');
if (loaded == N) {
var end = new Date().valueOf();
console.log('Total time: ' + (end-start) + 'ms');
}
})
}
Running the code above, you'll see the non-blocking test takes roughly the same amount of time to process all requests as it does for a single request (for example, if you set N = 10, the non-blocking code finishes roughly 10 times faster than the blocking code). This clearly illustrates that the requests are non-blocking.
Additional answer:
You also mentioned that you're worried about your process being CPU intensive. But in your code, you're not benchmarking CPU utilization. You're mixing both network request time (I/O, which we know is non-blocking) and CPU processing time. To measure how much time the request spends in blocking mode, change your code to this:
async.waterfall([
function(cb) {
request(someUrl1, function(err, res, body) {
console.time('1');
// load and parse the given web page.
console.timeEnd('1');
// make a callback with data parsed from the web page
});
},
function(someParameters, cb) {
request({url: someUrl2, method: 'POST', form: {/* data */}}, function(err, res, body) {
console.time('2');
// more computation
console.timeEnd('2');
// make a callback with a session cookie given by the visited url
});
},
function(jar, cb) {
request({url: someUrl3, method: 'GET', jar: jar /* cookie from the previous callback */}, function(err, res, body) {
console.time('3');
// do more parsing + computation
console.timeEnd('3');
// make another callback with the results
});
},
function(moreParameters, cb) {
request({url: someUrl4, method: 'POST', jar: jar, form : {/*data*/}}, function(err, res, body) {
console.time('4');
// some more computation.
console.timeEnd('4');
// make final callback
});
}
], function (err, result) {
res.status(200).send();
});
Your code only blocks in the "more computation" parts. So you can completely ignore any time spent waiting for the other parts to execute. In fact, that's exactly how node can serve multiple requests concurrently. While waiting for the other parts to call the respective callbacks (you mention that it may take up to 1 second) node can execute other javascript code and handle other requests.
Your code is non-blocking because it uses non-blocking I/O with the request() function. This means that node.js is free to service other requests while your series of http requests is being fetched.
What async.waterfall() does is order your requests to be sequential and pass the results of one on to the next. The requests themselves are non-blocking and async.waterfall() does not change or influence that. The series you have just means that you have multiple non-blocking requests in a row.
What you have is analogous to a series of nested setTimeout() calls. For example, this sequence of code takes 5 seconds to get to the inner callback (like your async.waterfall() takes n seconds to get to the last callback):
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
setTimeout(function() {
// it takes 5 seconds to get here
}, 1000);
}, 1000);
}, 1000);
}, 1000);
}, 1000);
But, this uses basically zero CPU because it's just 5 consecutive asynchronous operations. The actual node.js process is involved for probably no more than 1ms to schedule the next setTimeout() and then the node.js process literally could be doing lots of other things until the system posts an event to fire the next timer.
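The same idea can be shown with two "clients" at once. This is a minimal sketch with hypothetical names (`delay`, `handleClient`), where each client performs three sequential non-blocking steps, much like a short waterfall; the timers stand in for HTTP requests.

```javascript
// Promise-based timer used as a stand-in for a non-blocking request.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const log = [];

// One "client": three sequential non-blocking steps.
async function handleClient(name) {
  for (let step = 1; step <= 3; step++) {
    await delay(10);        // the process is idle while waiting
    log.push(name + step);  // record when each step completes
  }
}

// Both chains are started back to back and progress concurrently.
const bothClients = Promise.all([handleClient('A'), handleClient('B')]);

bothClients.then(() => console.log(log.join(' ')));
```

In the printed log, the steps of client B appear between those of client A rather than after them, which shows that a sequential chain for one client does not hold up the other.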
You can read more about how the node.js event queue works in these references:
Run Arbitrary Code While Waiting For Callback in Node?
blocking code in non-blocking http server
Hidden threads in Javascript/Node that never execute user code: is it possible, and if so could it lead to an arcane possibility for a race condition?
How does JavaScript handle AJAX responses in the background? (written about the browser, but concept is the same)
If I have a correct understanding of what’s going on, this means that
what I currently have (in development) is in no way going to scale to
even more than 10 clients.
This is not a correct understanding. A node.js process can easily have thousands of non-blocking requests in flight at the same time. Your sequentially measured time is only a start to finish time - it has nothing to do with CPU resources or other OS resources consumed (see comments below on non-blocking resource consumption).
I still have concerns about using node for this particular
application then. I'm worried about how it will scale considering that
the work it is doing is not simple I/O but computationally intensive.
I feel as though I should switch to a platform that enables
multi-threading. Does what I'm asking/the concern I'm expressing make
sense? I could just be spitting total BS and have no idea what I'm
talking about.
Non-blocking I/O consumes almost no CPU (only a little when the request is originally sent and then a little when the result arrives back), but while the computer is waiting for the remote result, no CPU is consumed at all and no OS thread is consumed. This is one of the reasons that node.js scales well for non-blocking I/O, as no resources are used when the computer is waiting for a response from a remote site.
If your processing of the request is computationally intensive (e.g. takes a measurable amount of pure blocking CPU time to process), then yes, you would want to explore getting multiple processes involved in running the computations. There are multiple ways to do this. You can use clustering (so you simply have multiple identical node.js processes each working on requests from different clients) with the nodejs clustering module. Or, you can create a work queue of computationally intensive work to do and have a set of child processes that do the computationally intensive work. Or, there are several other options too. This is not the type of problem that one needs to switch away from node.js to solve - it can be solved using node.js just fine.
You can use a queue to process concurrent HTTP calls in Node.js:
https://www.npmjs.com/package/concurrent-queue
var cq = require('concurrent-queue');
test_queue = cq();
// request action method
testQueue: function(req, res) {
// queuing each request to process sequentially
test_queue(req.user, function (err, user) {
console.log(user.id+' done');
res.json(200, user)
});
},
// Queue will be processed one by one.
test_queue.limit({ concurrency: 1 }).process(function (user, cb) {
console.log(user.id + ' started')
// async calls will go there
setTimeout(function () {
// on callback of async, call cb and return response.
cb(null, user)
}, 1000);
});
Please remember that this should only be implemented for sensitive business calls where a resource may be accessed or updated by only one user at a time.
It serializes those requests, so your users have to wait and response times will be slower.
Optimization:
You can make it faster by creating resource-dependent queues, so that there is a separate queue for each shared resource. Calls for the same resource are then executed one at a time, while calls for different resources are executed asynchronously.
Suppose you want to implement that on the basis of the current user, so that HTTP calls for the same user can only execute one at a time, while HTTP calls for different users are asynchronous.
testQueue: function(req, res) {
// if queue not exist for current user.
if(! (test_queue.hasOwnProperty(req.user.id)) ){
// initialize queue for current user
test_queue[req.user.id] = cq();
// initialize queue processing for current user
// Queue will be processed one by one.
test_queue[req.user.id].limit({ concurrency: 1 }).process(function (task, cb) {
console.log(task.id + ' started')
// async functionality will go there
setTimeout(function () {
cb(null, task)
}, 1000)
});
}
// queuing each request in user specific queue to process sequentially
test_queue[req.user.id](req.user, function (err, user) {
if(err){
return;
}
res.json(200, user)
console.log(user.id+' done');
});
},
This is faster, because requests are serialized only for the resource that needs it.
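The per-resource idea can also be sketched without any package, using one promise chain per key. This is an illustration under assumptions rather than the concurrent-queue API: `enqueueFor`, `job` and the keys are hypothetical names, and the timers stand in for the real async work.

```javascript
// One promise chain per key: tasks with the same key run one after
// another, tasks with different keys can overlap.
const chains = new Map();

function enqueueFor(key, task) {
  const previous = chains.get(key) || Promise.resolve();
  // Start this task only after the previous one for the key has settled.
  const result = previous.then(() => task());
  // The stored tail never rejects, so one failure does not break the chain.
  const tail = result.catch(() => {});
  chains.set(key, tail);
  // Drop the entry once the key is idle to avoid unbounded growth.
  tail.then(() => {
    if (chains.get(key) === tail) chains.delete(key);
  });
  return result;
}

// Usage: two jobs for user1 and one for user2.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const events = [];
function job(label, ms) {
  return async () => {
    events.push(label + ':start');
    await delay(ms);
    events.push(label + ':end');
    return label;
  };
}

const allDone = Promise.all([
  enqueueFor('user1', job('u1-first', 20)),
  enqueueFor('user1', job('u1-second', 5)),
  enqueueFor('user2', job('u2-first', 5)),
]);
```

The second job for user1 waits for the first, while the job for user2 starts without waiting for either.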

Resources