Node.js async.parallel - what are the consequences?

I have this code:
async.series(tasks, function (err) {
    return callback({ message: 'tasks execution error', error: err });
});
where tasks is an array of functions, each of which performs an HTTP request (using the request module) and then calls the MongoDB API to store the data (to a MongoHQ instance).
With my current input (~200 tasks to execute), it takes
[normal mode] collection cycle: 1356.843 sec. (22.61405 mins.)
But simply changing series to parallel gives a huge benefit: almost the same set of tasks runs in ~30 seconds instead of ~23 minutes.
But, knowing that nothing is free, I'm trying to understand the consequences of that change. Can I expect the number of open sockets to be much higher, more memory consumption, and a heavier hit on the DB servers?
The machine I run the code on is an Ubuntu box with only 1 GB of RAM, and I saw the app hang there once - could that be caused by a lack of resources?
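For reference, the change itself was just the drop-in swap below (same tasks array and the same final callback):

async.parallel(tasks, function (err) {
    return callback({ message: 'tasks execution error', error: err });
});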

Your intuition is correct that the parallelism doesn't come for free, but you certainly may be able to pay for it.
Using a load testing module (or collection of modules) like nodeload, you can quantify how this parallel operation is affecting your server to determine if it is acceptable.
Async.parallelLimit can be a good way of limiting server load if you need to, but first it is important to discover if limiting is necessary. Testing explicitly is the best way to discover the limits of your system (eachLimit has a different signature, but could be used as well).
Beyond this, common pitfalls with async.parallel include wanting more complicated control flow than that function offers (which, from your description, doesn't seem to apply) and naively using parallel on too large a collection (which may, say, cause you to bump into your system's file descriptor limit if you are writing many files). With your ~200 request-and-save operations on 1 GB of RAM, I would imagine you would be fine as long as you aren't doing much massaging in the event handlers, but if you are experiencing hangs, parallelLimit could be a good way out.
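For example, a minimal sketch of that swap, assuming your existing tasks array and an arbitrary concurrency cap of 10 (a number you would want to tune by testing):

async.parallelLimit(tasks, 10, function (err) {
    // same final callback as before; at most 10 tasks are in flight at a time
    return callback({ message: 'tasks execution error', error: err });
});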
Again, testing is the best way to figure these things out.

I would point out that async.parallel executes multiple functions concurrently, not (completely) in parallel. It is more like virtual parallelism.
Executing concurrently is like running different programs on a single CPU core via multitasking/scheduling. True parallel execution would be running a different program on each core of a multi-core CPU. This is important because Node.js has a single-threaded architecture.
The best thing about node is that you don't have to worry about I/O. It handles I/O very efficiently.
In your case you are making HTTP requests and storing data in MongoDB, which is mostly I/O. So running the tasks in parallel will use up your network bandwidth, and if you are reading/writing from disk, disk bandwidth too. Your server will not hang because of CPU overload.
The consequence is that if you overburden your server, requests may fail. You may get an EMFILE error (too many open files); each socket counts as a file. Usually connections are pooled, meaning that to establish a connection a socket is picked from the pool and returned to it when finished. You can increase the file descriptor limit with ulimit -n xxxx.
You may also get socket errors when overburdened, like ECONNRESET (Error: socket hang up), ECONNREFUSED or ETIMEDOUT, so handle them properly. Also check the maximum number of simultaneous connections your MongoDB server allows.
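As an illustration, a rough sketch of what handling those errors inside each task might look like (the makeTask helper, url and collection here are invented for the example):

function makeTask(url, collection) {
    return function (done) {
        request(url, function (err, res, body) {
            if (err) {
                // ECONNRESET / ECONNREFUSED / ETIMEDOUT land here;
                // decide whether to retry, skip, or fail the whole run
                return done(err);
            }
            // store the parsed response (assumes the body is JSON)
            collection.insert(JSON.parse(body), done);
        });
    };
}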
Finally, the app can hang because of garbage collection. Garbage collection kicks in once memory usage rises to a certain point, and then runs periodically. The maximum heap V8 can use is around 1.5 GB, so expect GC to run frequently if memory usage is close to that. Node will crash with a "process out of memory" error if it needs more than that limit (the limit can be raised with the V8 flag --max-old-space-size, but that is a workaround, not a fix). So fix the memory leaks in your program; heap-profiling tools can help with that.
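A quick way to keep an eye on this from inside the app is process.memoryUsage(), a standard Node API (the 60-second interval below is arbitrary):

setInterval(function () {
    var mem = process.memoryUsage();
    console.log('heapUsed:', Math.round(mem.heapUsed / 1048576) + ' MB',
                'rss:', Math.round(mem.rss / 1048576) + ' MB');
}, 60000);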

The main downside you'll see here is a spike in database server load. That may or may not be okay depending on your setup.
If your database server is a shared resource then you will probably want to limit the parallel requests by using async.eachLimit instead.
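For example, a rough sketch, assuming your work is driven by an array of items rather than prebuilt task functions (items, processItem and the limit of 10 are illustrative):

async.eachLimit(items, 10, function (item, done) {
    // fetch and save one item, then call done(err)
    processItem(item, done);
}, function (err) {
    if (err) return callback({ message: 'tasks execution error', error: err });
    callback();
});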

You'll notice the difference when multiple users connect:
in this case the processor can handle multiple operations,
and async tries to interleave the operations of multiple users roughly equally.
T = task
U = user
(T1.U1 = task 1 of user 1)
T1.U1 => T1.U2 => T2.U1 => T8.U3 => T2.U2 => etc.
This is the opposite of atomicity (so maybe watch out for atomicity on particular DB operations - but that's another topic).
So it may be faster to run:
T2.U1 before T1.U1
- this is no problem unless
T2.U1 depends on T1.U1
- which is preventable by using callbacks / that is what callbacks are for (see the sketch below)
... hope this is what you wanted to know ... it's a bit late here
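As an illustration of enforcing that ordering with callbacks (the task functions here are invented placeholders):

async.series([
    function (next) { doTask1ForUser1(next); }, // T1.U1 must finish first
    function (next) { doTask2ForUser1(next); }  // T2.U1 depends on T1.U1
], function (err) {
    // both tasks are done here (or err is set); tasks of other users
    // can still run in parallel with this series
});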

Related

Comparison of Nodejs EventLoop (with cluster module) and Golang Scheduler

In Node.js, the main criticism is of its single-threaded event loop model.
The biggest disadvantage of Node.js is that one cannot perform CPU-intensive tasks in the application. For demonstration purposes, let's take the example of a while loop (which is perhaps analogous to a DB function returning hundreds of thousands of records and then processing those records in Node.js):
var x = 0;
while (1) {
    x++;
}
This sort of code will block the main call stack, and consequently all other tasks waiting in the event queue will never get the chance to execute (and in a web application, new users will not be able to connect to the app).
However, one could use a module like cluster to leverage a multi-core system and partially solve the above issue. The cluster module allows one to create a small network of separate processes which can share server ports, which gives the Node.js application access to the full power of the server (however, one of the biggest disadvantages of using cluster is that state cannot be maintained in the application code).
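A minimal sketch of what that looks like (the port and the worker's request handler are just placeholders):

var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
    // fork one worker per CPU core; the workers share the same server port
    os.cpus().forEach(function () {
        cluster.fork();
    });
} else {
    http.createServer(function (req, res) {
        res.end('handled by worker ' + process.pid);
    }).listen(8000);
}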
But again, there is a high possibility that we end up in the same situation as described above if there is too much server load.
When I started learning the Go language and had a look at its architecture and goroutines, I thought it would possibly solve the problem that arises from the single-threaded event loop model of Node.js, and that it would probably avoid the above scenario of CPU-intensive tasks - until I came across this interesting code, which blocks the entire Go application and nothing happens, much like the while loop in Node.js:
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    var x int
    threads := runtime.GOMAXPROCS(0)
    for i := 0; i < threads; i++ {
        go func() {
            for {
                x++ // tight loop with no function calls, so no preemption points
            }
        }()
    }
    time.Sleep(time.Second)
    fmt.Println("x =", x)
}
// or perhaps even if we use some number that is just greater than the number of threads.
So, the question is: if I have an application which is load-intensive and has a lot of CPU-intensive tasks as well (where the DB returns a huge number of rows and the application then needs to process and modify something in those rows), couldn't I get stuck in the above sort of scenario? Wouldn't incoming users be blocked, and all other tasks as well?
So, how could the above problem be solved?
P.S.
Or perhaps the use cases I have mentioned do not make much sense? :)
Currently (Go 1.11 and earlier versions) your so-called tight loop will indeed clog the code. This happens simply because the Go compiler currently inserts the code which does "preemption checks" («should I yield to the scheduler so it runs another goroutine?») only in the prologues of the functions it compiles (almost, but let's not digress). If your loop does not call any function, no preemption checks will be made.
The Go developers are well aware of this and are working on eventually alleviating the issue.
Still, note that your alleged problem is a non-issue in most real-world scenarios: code which performs long runs of CPU-intensive work without calling any function is few and far between. In the cases where you really have such code and you have detected that it really makes other goroutines starve (let me underline: you have detected that through profiling, as opposed to just conjuring up "it must be slow"), you may apply several techniques to deal with it:
Insert calls to runtime.Gosched() at certain key points of your long-running CPU-intensive code. This forcibly relinquishes control to another goroutine without actually suspending the caller goroutine (it will run again as soon as it gets scheduled).
Dedicate OS threads to the goroutines running those CPU hogs:
confine such CPU hogs to, say, N "worker goroutines";
put a dispatcher in front of them (this is called "fan-out");
make sure N is sensibly smaller than runtime.GOMAXPROCS, or raise the latter so that you have those N extra threads;
shovel units of work to those dedicated goroutines via the dispatcher.

Perl threads to execute a sybase stored proc parallel

I have written a Sybase stored procedure to move data from certain tables (~50) on the primary DB, for a given id, to an archive DB. Since it's taking a very long time to archive, I am thinking of executing the same stored procedure in parallel, with a unique input id for each call.
I manually ran the stored proc twice at the same time with different inputs, and it seems to work. Now I want to use Perl threads (maximum 4 threads), with each thread executing the same procedure with a different input.
Please advise whether this is the recommended way, or whether there is another, more efficient way to achieve this. If the experts' choice is threads, any pointers or examples would be helpful.
What you do in Perl does not really matter here: what matters is what happens on the side of the Sybase server. Assuming each client task creates its own connection to the database, it's all fine, and how the client achieves this makes no difference to the Sybase server. But do not use a model where the different client tasks try to share the same client-server connection, as that will never run in parallel.
No 'answer' per se, but some questions/comments:
Can you quantify "taking a very long time to archive"? Assuming your archive process consists of a mix of insert/select and delete operations, do query plans and MDA data show fast, efficient operations? If you're seeing table scans, sort merges, deferred inserts/deletes, etc., then it may be worth the effort to address those performance issues first.
Can you expand on the comment that running two stored proc invocations at the same time seems to work? Again, any sign of performance issues for the individual proc calls? Any sign of contention (eg, blocking) between the two proc calls? If the archival proc isn't designed properly for parallel/concurrent operations (eg, eliminate blocking), then you may not be gaining much by running multiple procs in parallel.
How many engines does your dataserver have, and are you planning on running your archive process during a period of moderate-to-heavy user activity? If the current archive process runs at/near 100% CPU utilization on a single dataserver engine, then spawning 4 copies of the same process could see your archive process tying up 4 dataserver engines with heavy CPU utilization ... and if your dataserver doesn't have many engines ... combined with moderate-to-heavy user activity at the same time ... you could end up invoking the wrath of your DBA(s) and users. Net result is that you may need to make sure your archive process doesn't hog the dataserver.
One other item to consider, and this may require input from the DBAs ... if you're replicating out of either database (source or archive), increasing the volume of transactions per a given time period could have a negative effect on replication throughput (ie, an increase in replication latency); if replication latency needs to be kept at a minimum, then you may want to rethink your entire archive process from the point of view of spreading out transactional activity enough so as to not have an effect on replication latency (eg, single-threaded archive process that does a few insert/select/delete operations, sleeps a bit, then does another batch, then sleeps, ...).
It's been my experience that archive processes are not considered high-priority operations (assuming they're run on a regular basis, and before the source db fills up); this in turn means the archive process is usually designed so that it's efficient while at the same time putting a (relatively) light load on the dataserver (think: running as a trickle in the background) ... ymmv ...

ArangoDB Java Batch mode insert performance

I'm using ArangoDB 3.0.5 with arangodb-java-driver 3.0.1. ArangoDB is running on a 3.5 GHz i7 with 24 GB RAM and an SSD.
Loading some simple Vertex data from Apache Flink seems to be going very slowly, at roughly 1000 vertices/sec. Task Manager shows it is CPU bound on the ArangoDB process.
My connector is calling startBatchMode, iterating through 500 calls to graphCreateVertex (with wait for sync set to false), and then calling executeBatch.
System resources in the management interface show roughly 15000 (per sec?) while the load is running, and used CPU time is fixed at 1 for user time. I'm new to ArangoDB and am not sure how to profile what is going on. Any help much appreciated!
Rob
Your performance result is the expected behavior. The point of batchMode is that all of your 500 calls are sent in one request and executed on the server in only one thread.
To gain better performance, you can use more than one thread in your client for creating your vertices. More requests in parallel will allow the server to use more than one thread.
You can also use createDocument instead of graphCreateVertex. This skips the consistency checks on the graph and is a lot faster.
If you don't need these checks, you can also use importDocuments instead of batchMode + createDocument which is even faster.

How to tell how many requests are waiting to be served?

At times my app needs to do some blocking operations (simple math calculations). As a result, other requests will be blocked until the calculation is completed. Is there a way to count the number of requests that are waiting?
EDIT: Adding more details - the blocking operations are very small (perhaps just simple additions of numbers, or loops of additions/multiplications, e.g. var foo = 3 + y;), but the point is that anything that is not I/O is probably a blocking operation. When there aren't many users accessing the site, this may not be noticeable, because the calculation is very quick. But as more users access the site, the experience gets much worse.
EDIT: If this feature isn't there, I think it would be useful, because then we would know when to start up a new server (e.g. on AWS or another cloud provider). Running at full capacity is OK if we can meet all the requests, but not OK if requests start to pile up. XD
You cannot count the number of waiting requests, because they are really just queued instructions, and by the time your code to "count" them ran, the queue would be empty again. JavaScript is single-threaded, so unless the work is asynchronous, your counting code would never even get a chance to run in time.
If you want to manage this, implement your own queueing mechanism, and manage the work yourself. The async package can provide this for you.
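For instance, a rough sketch using async.queue (the concurrency value of 2 and the doHeavyCalculation worker are placeholders):

var async = require('async');

// a queue that runs at most 2 calculations at a time
var workQueue = async.queue(function (job, done) {
    var result = doHeavyCalculation(job); // placeholder for your blocking math
    done(null, result);
}, 2);

// the number of jobs still waiting to be processed:
console.log('waiting:', workQueue.length());

workQueue.push({ x: 3 });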

Returning LOTS of items from a MongoDB via Node.js

I'm returning A LOT (500k+) of documents from a MongoDB collection in Node.js. It's not for display on a website, but rather for some number crunching. If I grab ALL of those documents, the system freezes. Is there a better way to grab them all?
I'm thinking pagination might work?
Edit: This is already outside the main node.js server event loop, so "the system freezes" does not mean "incoming requests are not being processed"
After learning more about your situation, I have some ideas:
Do as much as you can in a Map/Reduce function in Mongo - perhaps if you throw less data at Node that might be the solution.
Perhaps this much data is eating all the memory on your system. Your "freeze" could be V8 stopping everything to do a garbage collection (see this SO question). You could use the V8 flag --trace-gc to log GCs and prove this hypothesis (thanks to another SO answer about V8 and garbage collection).
Pagination, like you suggested, may help. Perhaps even split your data further into worker queues (create one worker task with references to records 1-10, another with references to records 11-20, etc.), depending on how your calculation can be divided.
Perhaps pre-process your data - i.e. somehow return much less data for each record, or skip the ORM for this particular calculation if you're using one now. Making sure each record contains only the data you need means less data to transfer and less memory your app needs.
I would put your big fetch+process task on a worker queue, background process, or forking mechanism (there are a lot of different options here).
That way you do your calculations outside of your main event loop and keep that free to process other requests. While you should be doing your Mongo lookup in a callback, the calculations themselves may take up time, thus "freezing" node - you're not giving it a break to process other requests.
Since you don't need them all at the same time (that's what I've deduced from you asking about pagination), perhaps it's better to split those 500k documents into smaller chunks to be processed on the next tick (see the sketch below)?
You could also use something like Kue to queue the chunks and process them later (thus not everything in the same time).
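As an illustration, a rough sketch of chunked processing with the MongoDB driver's cursor (the 1000-document chunk size and the crunchBatch function are placeholders, and the exact cursor API depends on your driver version):

function processAll(collection, done) {
    var cursor = collection.find({});
    var batch = [];

    function step() {
        cursor.next(function (err, doc) {
            if (err) return done(err);
            if (doc === null) {            // cursor exhausted
                if (batch.length) crunchBatch(batch);
                return done();
            }
            batch.push(doc);
            if (batch.length === 1000) {   // crunch one chunk, then yield
                crunchBatch(batch);
                batch = [];
                return setImmediate(step); // give the event loop a breather
            }
            step();
        });
    }
    step();
}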
