Node worker process / cron job advice - node.js

I have a database of items that I need to update (or rather, act on) every so often. I am using a message queue (Kue) to handle the concurrency of these jobs, but the process that adds jobs to the queue looks like this:
setInterval(function () {
  feed.find({}, function (error, foundModels) {
    jobs.create('update feeds', {
      feeds: foundModels
    }).save()
  })
}, 6000)
Is polling like this the best way to add the jobs to the queue, do you think? Or should each feed be on its own timer (for example, every job spawns another job 6 seconds after it finishes)?

I usually do it the way you've done it. In your case, it pushes jobs at fixed 6-second intervals, which is fine as long as your jobs don't take more than 6 seconds to complete. If they do, you'll start to build a backlog and will need to add resources to handle the larger load. That can be a problem if resource usage spikes while you're not around to react and you don't have automated processes in place (you should).
The alternative is to only call your function 6 seconds after the last call returns. You'd do that like so:
function update() {
  feed.find({}, function (error, foundModels) {
    jobs.create('update feeds', {
      feeds: foundModels
    }).save(function () {
      setTimeout(update, 6000);
    });
  });
}
setTimeout(update, 6000);
I made the assumption that your .save method takes a callback like all good asynchronous libraries do. :-)
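For completeness, the same schedule-the-next-run-only-after-the-current-one-finishes pattern reads naturally with async/await. Here doWork is a hypothetical stand-in for the find + create + save sequence, and the delays are shortened so the demo finishes quickly:

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let runs = 0;

// Hypothetical stand-in for: feed.find -> jobs.create -> save
async function doWork() {
  runs++;
  await delay(50); // simulated variable-length async job
}

async function run() {
  for (let i = 0; i < 3; i++) { // in production: while (true)
    await doWork();  // wait for the job to actually finish...
    await delay(50); // ...then wait the interval (6000 ms in the question)
  }
}

const finished = run();
```

Because the interval timer only starts after the job resolves, runs can never overlap no matter how slow the job gets.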

Related

How do I make a function run in the background?

I have this code that periodically calls the load function, which does very heavy work taking about 10 seconds. The problem is that while load is executing, it blocks the main flow. If I send a simple GET request (like a health check) while load is executing, the GET call is blocked until the load call finishes.
function setLoadInterval() {
  var self = this;
  this.interval = setInterval(function doHeavyWork() {
    // this takes 10 sec
    self.load();
    self.emit('reloaded');
  }, 20000);
}
I tried async.parallel, but the GET call was still blocked. I tried setTimeout and got the same result. How do I make load run in the background so that it doesn't block the main flow?
this.interval = setInterval(function doHeavyWork() {
  async.parallel([function (cb) {
    self.load();
    cb(null);
  }], function (err) {
    if (err) {
      // log error
    }
    self.emit('reloaded');
  });
}, 20000);
Node.js uses an event-driven, non-blocking I/O model.
Anything that is I/O is offloaded to the underlying engine (libuv's thread pool or the operating system's async facilities), and that is how parallelism is achieved for I/O.
If the task is CPU-intensive, there is no way to achieve parallelism this way, because JavaScript execution itself is single-threaded and blocking.
However, there are ways to achieve it by offloading the CPU-intensive task to a different process.
Option 1:
exec or spawn a child process and execute the load() function in that spawned node app. This is fine if the interval fires every 20000 ms, since the 10-second job will have finished by the time the next interval fires.
Otherwise it is dangerous, as it can spawn too many node processes and eat up your system's resources.
Option 2:
I don't know how much data self.load() accepts and returns. If it is trivial and the network overhead is acceptable, make that task a load-balanced web service (say, 4 web servers running in parallel) that accepts (or rather, points to) the 1M records and returns the filtered records.
NOTE
It looks like you are using the async module's parallel function. Keep in mind this caveat from its documentation:
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.

Async callback blocks NodeJS

I have a server-client based Node.js application.
server.js
...
socket.on('message', function (message) {
  if (message.code == 103) {
    process_some_data();
  } else {
    console.log("UNKNOWN MESSAGE");
  }
});
...
client.js
.. sending responses back to server.js
The process_some_data() function takes about 4 seconds to complete. When I have just one client it is not a problem, but if I have 10, they all choke and wait until the last one finishes.
I found out that the entire socket event waits until the current job finishes; for example, if I comment out process_some_data(), it does not freeze.
I have tried two tweaks, but they didn't work:
...
socket.on('message', function (message) {
  if (message.code == 103) {
    setTimeout(function () {
      process_some_data();
      console.log("FINISH");
    }, 1);
  } else {
    console.log("UNKNOWN MESSAGE");
  }
});
...
I even used http://caolan.github.io/async/, but to no avail:
...
socket.on('message', function (message) {
  if (message.code == 103) {
    // Array to hold async tasks
    var asyncTasks = [];
    async.series([
      setTimeout(function () {
        process_some_data();
        console.log("FINISH");
      }, 1)
    ], function (err, results) {
      console.log(results);
    });
  } else {
    console.log("UNKNOWN MESSAGE");
  }
});
...
How can I make this async? I really need this.
Thank you.
You need multiple processes to solve this with Javascript, because Javascript engines are single-threaded.
What?
When it comes to handling I/O events, such as reading a socket, writing to a file or waiting for a signal, Javascript engines give the appearance of doing multiple things at the same time.
They are actually not: it's just that, under most conditions, processing these events takes so little computation, and the events themselves occur with so much time in between (a microsecond is an eternity for a CPU), that the engine can just process them one after another with plenty of time to spare.
In human time-scale, it looks like the engine is doing a lot of stuff in parallel, but it's just working serially at great speed.
No matter how you schedule your code to run, using setTimeout or Promise, it will still block other events from being processed during the time it's actively computing. Long-running computations (in the scale of seconds, instead of milliseconds) expose the single-threaded nature of the engine: it cannot actually do multiple things at the same time.
Multiple processes
Your computer, however, probably has multiple CPU cores. Unlike the Javascript engine, your hardware is capable of tackling multiple tasks at the same time, at least 1 per core. Even with a single core, your operating system can solve the problem if you run multiple processes.
Since a single Javascript process is single-threaded, you need multiple Javascript processes for this. An easy and time-proven architecture to solve your problem is this:
One Javascript program, running in one process, reads from the socket. Instead of calling process_some_data(), however, it puts all incoming messages in a queue.
This program then sends items from the queue to another Javascript program, running in a different process, that performs the computation using another CPU core. There may be multiple copies of this second process. In a modern computer, it makes sense to have twice as many active processes as you have CPU cores.
A simple approach for Node is to write an HTTP server, using express, that runs the computationally-intensive task. The main program can then use HTTP to delegate tasks to the workers, while still being able to read from the socket.
This is a good article on the topic of multi-processing with Node, using the cluster API.

Calling Express get commands concurrently

So currently I have this function:
app.get('/requestItems', function (req, res) {
  // Check if bots are online
  if (botQueue.length == 0) {
    // If there are no bots in the queue to take the order, then we can't process it.
    console.log("Sorry no bots available");
    res.send("No Bots available ATM");
    return;
  } else {
    var currentBot = botQueue.shift();
  }
  requestItems(currentBot.offerInstance, steamIDtoTrade, itemID, userAccessToken);
  eventEmitter.on('requestOfferExpired', function () {
    console.log("Request offer has timed out/been cancelled");
    res.send("Request offer has timed out/been cancelled");
    botQueue.push(currentBot);
  });
  eventEmitter.on('requestOfferAccepted', function () {
    console.log("Request offer has completed");
    res.send("Request offer has completed");
    botQueue.push(currentBot);
  });
});
When I call it, it takes about 5 minutes to run. While it's running, I can't seem to make requests to the URL. I know Node is single-threaded, but is there a way to run it in parallel/concurrently? Or do I simply need to change my design strategy?
EDIT: requestItems function : http://pastebin.com/Eif5CeEv
If the 5 minutes per request is 100% CPU of node.js running with no IO, then you will not get any other requests processed during that time and you will need to run multiple node processes (probably in a cluster) in order to process multiple requests during that time.
If, on the other hand, much of that 5 minutes is doing IO (disk, database or socket), then a proper async IO design will allow your node server to process many requests during that time.
Since it seems unlikely that you're number crunching the CPU for 5 minutes in a row with no IO, my guess is that you need to switch to a proper async IO design and then you can use the advantages of node.js and can have many different requests processing at the same time.
You will need to disclose a lot more about what is going on in that 5 minutes for us to help more specifically. Some questions about those 5 minutes:
What types of operations are taking the 5 minutes of time?
Are you using node.js async disk IO and async database IO in a way that allows other node.js events to flow while a given request is waiting for IO?
Is your request handling strategy designed to have multiple requests being processed at the same time?
Can you show us pieces of the code that are taking 5 minutes?

What is the fastest way to complete a list of tasks with node.js async?

I have an array of fs.writeFile PNG jobs, with the PNG headers already removed like so:
canvas.toDataURL().replace(/^data:image\/\w+;base64,/, "")
The jobs array looks like this (each entry is a [path, base64Data] pair; the data elements are omitted below for brevity):
jobs=[['location0/file0'],['location1/file1'],['location2/file2'],['location3/file3']];
I have just started to use async and was looking at the docs; there are lots of methods.
queue looks interesting, and so does parallel.
Right now I handle my jobs (in an async.waterfall) like so:
function (callback) { // part of waterfall
  (function fswritefile() {
    if (jobs.length !== 0) {
      var job = jobs.shift();
      fs.writeFile(job[0], new Buffer(job[1], 'base64'), function (e) {
        if (e) {
          console.log(e);
        } else {
          fswritefile();
        }
      });
    } else {
      callback();
    }
  })();
}, // end of waterfall part
Could this be done more efficiently/faster using this module?
async.waterfall will process jobs sequentially. I think you could do everything in parallel with async.each:
async.each(jobs, function (job, done) {
  var data = Buffer.from(job[1], 'base64'); // new Buffer(...) is deprecated
  fs.writeFile(job[0], data, done);
}, function (err) {
  // …
});
This will start all jobs in parallel. However, node.js limits the number of concurrent file-system operations: fs work runs on libuv's thread pool, which defaults to 4 threads.
EDIT: No matter what you do, node.js will limit the number of concurrent operations on the fs. The main reason is that you only have 1 disk, and it would be inefficient to attempt more.
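If you want to enforce a cap yourself rather than rely on the thread pool, the async module ships eachLimit for exactly this. The same idea with no dependencies looks roughly like this minimal sketch:

```javascript
// Minimal concurrency limiter (same idea as async.eachLimit): run at most
// `limit` tasks at once, starting a new one whenever one finishes.
function eachLimit(items, limit, worker, done) {
  let inFlight = 0, index = 0, finished = 0, failed = false;
  function next() {
    if (failed) return;
    if (finished === items.length) return done(null);
    while (inFlight < limit && index < items.length) {
      inFlight++;
      worker(items[index++], (err) => {
        inFlight--;
        if (err && !failed) { failed = true; return done(err); }
        finished++;
        next(); // a slot freed up: launch the next task, or finish
      });
    }
  }
  next();
}
```

With this in place the original loop becomes eachLimit(jobs, 4, (job, cb) => fs.writeFile(job[0], Buffer.from(job[1], 'base64'), cb), callback).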

How to have heavy processing operations done in node.js

I have a heavy data-processing operation that I need to perform for 10-12 simultaneous requests. I have read that Node.js is a good platform for higher levels of concurrency, which it achieves with a non-blocking event loop.
What I know is that for things like querying a database, I can hand the work off to a separate process (like mongod or mysqld) and then have a callback which handles the result from that process. Fair enough.
But what if I want heavy computation to be done within a callback? Won't it block other requests until the code in that callback has executed completely? For example, I want to process a high-resolution image, and the code I have is in JavaScript itself (no separate process doing the image processing).
The way I think of implementing is like
get_image_from_db(image_id, callback(imageBitMap) {
heavy_operation(imageBitMap); // Can take 5 seconds.
});
Will that heavy_operation stop node from taking in any requests for those 5 seconds? Or am I thinking about such a task the wrong way? Please guide me, I am a JS newbie.
UPDATE
Or could I process part of the image, let the event loop go back to take in other callbacks, and then return to processing that partial image (something like prioritising events)?
Yes, it will block, as callback functions are executed in the main loop. Only asynchronously called functions do not block the loop. It is my understanding that if you want the image processing to execute asynchronously, you will have to use a separate process to do it.
Note that you can write your own asynchronous process to handle it. To start you could read the answers to How to write asynchronous functions for Node.js.
UPDATE
how do i create a non-blocking asynchronous function in node.js? may also be worth reading. This question is actually referenced in the other one I linked, but I thought I'd include it here for simplicity.
Unfortunately, I don't yet have enough reputation points to comment on Nick's answer, but have you looked into Node's cluster API? It's currently still experimental, but it allows you to spawn multiple worker processes.
When a heavy piece of computation is done in the callback, the event loop would be blocked until the computation is done. That means the callback will block the event loop for the 5 seconds.
My solution
It's possible to use a generator function to yield control back to the event loop. I will use a while loop that runs for 3 seconds to act as a long-running callback.
Without a Generator function
let start = Date.now();
setInterval(() => console.log('resumed'), 500);

function loop() {
  while ((Date.now() - start) < 3000) { // while less than 3 seconds have elapsed
    console.log('blocked');
  }
}
loop();
The output would be:
// blocked
// blocked
// ... does not return to the event loop while the loop is running
// blocked
// ... when the loop is over, the setInterval kicks in
// resumed
// resumed
With a Generator function
let gen;
let start = Date.now();
setInterval(() => console.log('resumed'), 500);

function* loop() {
  while ((Date.now() - start) < 3000) { // while less than 3 seconds have elapsed
    console.log(yield output());
  }
}

function output() {
  setTimeout(() => gen.next('blocked'), 500);
}

gen = loop();
gen.next();
The output is:
// resumed
// blocked
// ... control returns to the event loop even though the loop is still running
// resumed
// blocked
// ... end of the loop
// resumed
// resumed
// resumed
Using JavaScript generators can help run heavy computational functions by yielding control back to the event loop while the computation is still in progress.
To learn more about generators and the event loop, visit
https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*
https://davidwalsh.name/es6-generators
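The same give-control-back idea works without generators by slicing the work and deferring each slice with setImmediate. A sketch (the chunk size is arbitrary here and would be tuned to the workload):

```javascript
// Process a large array in slices, returning to the event loop between
// slices so timers and I/O can run (same goal as the generator version).
function processInChunks(items, chunkSize, handle, done) {
  let i = 0;
  (function step() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) handle(items[i]);    // do a bounded amount of work
    if (i < items.length) setImmediate(step); // yield, then continue
    else done();
  })();
}

// e.g. processInChunks(pixels, 10000, processPixel, () => console.log('done'));
```

setImmediate schedules the next slice after pending I/O callbacks, so a health-check request arriving mid-computation gets served between slices instead of waiting for the whole job.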
