Run parallel processes in node js to handle SQS messages - node.js

SQS allows MaxNumberOfMessages = 10
("The maximum number of messages to return. Amazon SQS never returns more messages than this value but may return fewer.")
when fetching messages in a single call. So is there any way to run multiple parallel processes
in Node.js, each handling a batch of SQS messages?
Is any npm package available for that?

The async library might not be the right option, as its "parallel" operations are not really run in parallel:
https://github.com/caolan/async#paralleltasks-callback
parallel(tasks, [callback])
Run the tasks collection of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback, the main callback is immediately called with the value of the error. Once the tasks have completed, the results are passed to the final callback as an array.
Note: parallel is about kicking-off I/O tasks in parallel, not about parallel execution of code. If your tasks do not use any timers or perform any I/O, they will actually be executed in series. Any synchronous setup sections for each task will happen one after the other. JavaScript remains single-threaded.
It is also possible to use an object instead of an array. Each property will be run as a function and the results will be passed to the final callback as an object instead of an array. This can be a more readable way of handling results from parallel.
You can, however, use worker threads to run these tasks in parallel, though I'm not sure that would fully solve your issue.

You can spin up multiple processes, but Node is designed to get the most out of an available core with a single process. Creating more processes will not necessarily make overall throughput much higher.
If you have a multicore machine, generally it is advisable to have one process per core.
The AWS JavaScript SDK for SQS works asynchronously, i.e. the process will continue to fetch more messages while I/O for the first fetch is in progress.
Unless you make the process synchronous by waiting, it will retrieve messages from SQS continuously.
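To make that concrete, here is a minimal sketch of several polling loops running concurrently on one thread. `receiveBatch` is a hypothetical stand-in for `sqs.receiveMessage(...).promise()`, simulated with a timer so the example is self-contained; the real SDK call would slot in the same way.

```javascript
// Stand-in for an SQS long poll returning up to 10 messages.
// In production this would be sqs.receiveMessage(params).promise().
function receiveBatch(pollerId) {
  return new Promise((resolve) =>
    setTimeout(() => resolve([`msg-from-poller-${pollerId}`]), 10)
  );
}

// One polling loop: while its fetch is pending (I/O), the other
// loops make progress on the same thread.
async function runPoller(id, rounds, sink) {
  for (let i = 0; i < rounds; i++) {
    const messages = await receiveBatch(id);
    messages.forEach((m) => sink.push(m));
  }
}

async function main() {
  const received = [];
  // Three independent polling loops interleave on a single thread.
  await Promise.all([1, 2, 3].map((id) => runPoller(id, 2, received)));
  return received;
}

main().then((r) => console.log(r.length)); // 3 pollers x 2 rounds = 6
```

Because each `await` releases the thread while the fetch is in flight, the loops overlap without any extra processes.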

Related

Most optimal way to execute timed functions in parallel using node?

I'm trying to create a timed scheduler that can execute tasks in parallel. For example:
Let's say I'm trying to create a function that will do something after 10 seconds of being called. After calling Process_1(), it will be expected to run its intended functionality after 10 seconds.
But at the 5-second mark, while Process_1() is halfway through its wait, I call Process_2(). So at the 10-second mark Process_1() will execute its function, and at the 15-second mark Process_2() will execute its function.
I've tried using node-cron for this but it doesn't seem like it can schedule things in parallel. Thanks in advance!
Nodejs runs your Javascript in a single thread unless you explicitly create a WorkerThread and run some code in that. True parallel execution where both jobs are running code that uses the CPU will only be accomplished if you either run each task in a WorkerThread or child process to get it out of the main thread.
Let me repeat, true parallel execution requires more than one thread or process in nodejs and nodejs does not do that by default so you will have to create a WorkerThread or child_process.
So, if you have code that takes more than a few ms to do its work and you want it to run at a fairly precise time, then you can't count on the main Javascript thread to do that because it might be busy at that precise time. Timers in Javascript will run your code no earlier than the scheduled time, and when that scheduled time comes around, the event loop is ready to run them, but they won't actually run until whatever was running before finishes and returns control back to the event loop so the event loop can run the code attached to your timer.
So, if all you're mostly doing is I/O kind of work (reading/writing files or network), then your actual Javascript execution time is probably only milliseconds and nodejs can be very, very responsive to run your timers pretty close to "on time". But, if you have computationally expensive things that keep the CPU busy for much longer, then you can't count on your timers to run "on time" if you run that CPU-heavy stuff in the main thread.
What you can do, is start up a WorkerThread, set the timer in the WorkerThread and run your code in the worker thread. As long as you don't ask that WorkerThread to run anything else, it should be ready to run that timer pretty much "on time".
Now WorkerThreads do share some resources with the main thread so they aren't 100% independent (though they are close to independent). If you want 100% independence, then you can start a nodejs child process that runs a node script, sets its own timers and runs its own work in that other process.
All that said, the single threaded model works very, very well at reasonably high scale for code that is predominantly I/O code because nodejs uses non-blocking I/O so while it's waiting to read or write from file or network, the main thread is free and available to run other things. So, it will often give the appearance of running things in parallel because progress is being made on multiple fronts. The I/O itself inside the nodejs library is either natively non-blocking (network I/O) or is happening in an OS-native thread (file I/O) and the programming interface to Javascript is callback or promise based so it is also non-blocking.
I mention all this because you don't say what your two operations that you want to run in parallel are (including your actual code allows us to write more complete answers). If they are I/O or even some crypto, then they may already be non-blocking and you may achieve desired parallelism without having to use additional threads or processes.

If blocking operations are asynchronously handled and results in a queue of blocking processes, how does asynchronous programming speed things up?

Like many before, I came across this diagram describing the architecture of NodeJS and since I'm new to the concept of Asynchronous programming, let me first share with you my (possibly flawed) understanding of Node's architecture.
To my knowledge, the main application script is first compiled into binary code using Chrome's V8 Engine. After this it moves through Node.JS bindings, which is a low-level API that allows the binary code to be handled by the event mechanism. Then, a single thread is allocated to the event loop, which loops infinitely, continually picks up the first (i.e. the oldest) event in the event queue and assigns a worker thread to process the event. After that, the callback is stored in the event queue, moved to a worker thread by the event-loop thread, and - depending on whether or not the callback function had another nested callback function - is either done or executes any of the callback functions that have not yet been processed.
Now here's what I don't get. The event-loop is able to continually assign events to worker threads, but the code that the worker threads have to process is still CPU blocking and the amount of worker threads is still limited. In a synchronous process, wouldn't it be able to assign different pieces of code to different worker threads on the server's CPU?
Let's use an example:
var fs = require('fs');

fs.readFile('text.txt', function (err, data) {
    if (err) {
        console.log(err);
    } else {
        console.log(data.toString());
    }
});

console.log('This will probably be finished first.');
This example will log 'This will probably be finished first' and then output the data of the text.txt file later, since it's the callback function of the fs.readFile() function. Now I understand that NodeJS has a non-blocking architecture since the second block of code is finished earlier than the first even though it was called in a later stage. However, the total amount of time it takes for the program to be finished would still be the addition of the time it takes for each function to finish, right?
The only answer I can think of is that asynchronous programming allows for multithreading whereas synchronous programming does not. Otherwise, asynchronous event handling wouldn't actually be faster than synchronous programming, right?
Thanks in advance for your replies.
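One way to probe the question above is to time two simulated I/O waits. `fakeIo` is a hypothetical stand-in for a file or network operation; started together, the two waits overlap, so the total is roughly the duration of the longest one, not the sum.

```javascript
// Stand-in for a non-blocking I/O operation taking `ms` milliseconds.
function fakeIo(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function overlapped() {
  const t0 = Date.now();
  // Both "I/O" operations are pending at the same time; the thread
  // is free while each one waits.
  await Promise.all([fakeIo(100), fakeIo(100)]);
  return Date.now() - t0;
}

overlapped().then((elapsed) => console.log(elapsed)); // ~100 ms, not ~200
```

So no, total time is not simply additive: while one operation's I/O is pending, the thread is free to start or finish others, and no extra threads are needed in your Javascript for that.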

how does async.parallel work in nodejs with a single thread

The async library in nodejs provides many methods like parallel and times to execute multiple calls in parallel. How can this be really parallel since nodejs is single threaded and the event loop can pick only one job at a time.
Does this mean async.parallel is not really parallel and only asynchronous ? I understand that both asynchronous and parallel are totally different terms.
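To show the mechanics without pulling in the library itself, here is a minimal hypothetical stand-in for async.parallel (the real implementation handles more cases): each task is kicked off immediately rather than waiting for the previous one, and the final callback fires once all have reported back. Any overlap comes from the tasks' own async I/O, not from extra threads.

```javascript
// Minimal async.parallel-style helper: start every task at once,
// collect results in order, call `done` when the last one finishes
// (or immediately on the first error).
function parallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length;
  let failed = false;
  tasks.forEach((task, i) => {
    task((err, res) => {
      if (failed) return;
      if (err) { failed = true; return done(err); }
      results[i] = res;
      if (--pending === 0) done(null, results);
    });
  });
}

parallel(
  [
    (cb) => setTimeout(() => cb(null, 'a'), 30), // finishes second
    (cb) => setTimeout(() => cb(null, 'b'), 10), // finishes first
  ],
  (err, results) => console.log(results) // [ 'a', 'b' ] - order preserved
);
```

So yes: it is concurrent (asynchronous), not parallel. The waits overlap, but any actual Javascript still runs one piece at a time.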

Node Event Loop Confusion

I am a bit confused and don't know if I fully understand the nodeJS event loop/non-blocking I/O concepts.
Let's say in my server I have:
app.get('/someURL', AuthHandler.restrict, MainHandler.findAllStocksFromUser);
And findAllStocksFromUser() is defined like so:
findAllStocksFromUser(req, res) {
    /* Do some terribly inefficient, heavy computation */
    res.send(/* return something */);
}
So now let's say 5 requests come in. As I understand, with each request that comes in, a callback, in this case findAllStocksFromUser(), is added to the eventloop queue, and with every tick, the callbacks are called.
Questions:
The "terribly inefficient, heavy computation" won't effect the server's ability to efficiently receive requests as they come in and immediately add their callbacks to the queue, correct?
But the "terribly inefficient, heavy computation" is going to block the other callbacks until it's done and cause the server to be inefficient in that way, right?
In node.js, your Javascript is single threaded. That means that only one piece of Javascript is run at a time. So, once a request handler starts running, it keeps running until it either finishes entirely and returns back to the system that called it or until it starts an async operation (DB, file, network, etc...) and then returns back to the system that called it. Only then, can other requests start processing.
So, if your "heavy computation" is truly lots of synchronous Javascript running, then no other requests will process while that is running. If that "heavy computation" actually has lots of async operations in it, then other requests will get to run while the handlers waits for responses from the async operations.
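A small self-contained demonstration of that blocking effect: a 50 ms timer fires late because a synchronous busy loop holds the only thread, just as a heavy request handler would hold up other requests.

```javascript
// A timer set for 50 ms fires late because synchronous CPU work
// occupies the thread; nothing else runs until the busy loop ends.
const t0 = Date.now();
setTimeout(() => {
  console.log('timer fired after', Date.now() - t0, 'ms'); // ~200 ms, not 50
}, 50);

// ~200 ms of pure synchronous work: the event loop cannot run the timer.
const busyUntil = Date.now() + 200;
while (Date.now() < busyUntil) { /* spin */ }
```

Replace the busy loop with an awaited async operation and the timer fires on time, which is the practical difference between synchronous and asynchronous "heavy computation" here.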
Now, to your specific questions:
So now let's say 5 requests come in. As I understand, with each
request that comes in, a callback, in this case
findAllStocksFromUser(), is added to the eventloop queue, and with
every tick, the callbacks are called.
This isn't quite correct. The incoming request is queued, but it is queued at a level much lower than just queuing your callback. It's queued before the Javascript part of your server even sees the request (in native code somewhere).
The "terribly inefficient, heavy computation" won't effect the
server's ability to efficiently receive requests as they come in and
immediately add their callbacks to the queue, correct?
The incoming requests will be queued by the underlying TCP infrastructure or by the native code in node.js which implements your server (which is not running in single-threaded JS). So, a long running piece of Javascript won't keep incoming requests from getting queued (unless some internal queue fills up).
But the "terribly inefficient, heavy computation" is going to block
the other callbacks until it's done and cause the server to be
inefficient in that way, right?
Correct. If this inefficient, heavy computation is synchronous code, then it runs until it is done and no other requests get to run while it is running.
The usual solution to heavy computation code in node.js is to either redesign it to run faster or to use async operations where possible or to move it out of the main process and fire up a child process or a cluster of child workers to handle the heavy computation. This then allows your main request handler to treat this heavy computation as an asynchronous operation and allow other things to run while that heavy-duty work is being done outside the main node.js thread.
Though this is sometimes more coding work, it is also possible to break a long running computation into chunks so that a chunk of work can be executed and then use setImmediate() to schedule the next chunk of work, allowing other queued items to be processed between your chunks of work. Since it's fairly quick these days to just set up a pool of workers that you pass off the work to, I'd probably favor that approach as it also gives you a better shot at utilizing multiple CPUs and it saves the complication of "chunking" the work and writing code to efficiently process that way.
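The chunking idea above can be sketched like this (`sumChunked` is a made-up example function): each call does a bounded slice of CPU work, then yields with setImmediate() so other queued events can run in between the chunks.

```javascript
// Sum 0..n-1 in chunks, yielding to the event loop between chunks
// so timers and I/O callbacks are not starved.
function sumChunked(n, chunkSize, done) {
  let total = 0;
  let i = 0;
  function step() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) total += i;  // one bounded chunk of CPU work
    if (i < n) setImmediate(step);    // yield, then continue later
    else done(total);
  }
  step();
}

sumChunked(1000000, 10000, (total) => console.log(total)); // 499999500000
```

The trade-off is as described: more coding work and some scheduling overhead per chunk, versus keeping the main thread responsive without a worker pool.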
Yes, it will affect it. Node.js is single-threaded, which means the "terribly inefficient, heavy calculation" will block everything else while it is being processed.
This is easy to test: send several requests and compare their response times, or just send a really big JSON file (it will have to be parsed, which can be slow) and again measure the response times.
You could break the computation into smaller chunks to improve responsiveness.
Yes, it would cause inefficiencies on the server. The first request to the server would block all other requests from being processed since the javascript event loop runs on a single thread.
All other requests would have to wait because the event loop is blocked by the first findAllStocksFromUser task to reach the server.

does user defined callback function uses thread pool in node.js?

This question is to understand how event loop calls thread pool to process task.
Say,
Say I want to create a function that processes some small task (not an I/O operation), and I want it processed via a callback function so that it can use the thread pool, run concurrently with my main thread, and return its result in the callback on completion. I understand this can be done by creating child processes (forking, etc.),
but I am a little confused and want to understand how exactly work executes concurrently in single-threaded Node for I/O operations but not for user-defined operations. What exactly happens in the event loop? Is every event passed to the thread pool, and how does Node identify whether it is an I/O operation?
I am new at node.js and totally confused.
Help would be appreciated :)
“Node.js manages its own threads for I/O” by using libuv for operations involving the network, file system, etc. libuv essentially creates a thread pool for I/O that varies in size based on platform. The V8 event loop is a separate thread that processes events in the queue. Those events map to a JavaScript function to execute with the event data. This is how asynchronous I/O is handled by Node.js.
Source: http://www.wintellect.com/blogs/dbanister/stop-fighting-node.js-in-the-enterprise
So each I/O operation executes outside the V8 event-loop thread; that's why it runs concurrently.
I/O operations run efficiently because, as you mentioned, a thread pool is used - a group of threads that "wait" for incoming tasks from V8 event loop, execute them, and return data to JavaScript callback functions.
As you already stated, the Node runtime is single-threaded.
Node is well suited for I/O-bound operations. It's less recommended for CPU-bound work, because that will block Node's event loop.
If you really want to do CPU-bound work asynchronously with Node, you can achieve that using a Node cluster, but you'll have to manage the communication between the processes (a simple example here - http://davidherron.com/blog/2014-07-03/easily-offload-your-cpu-intensive-nodejs-code-simple-express-based-rest-server), or by using a child process - http://nodejs.org/api/child_process.html
