Clustering Loop Confusion - node.js

The basic setup I have using the cluster module is (I have 6 cores):
var cluster = require('cluster');

if (cluster.isMaster) {
    var numCPUs = require('os').cpus().length;
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    // Code here
    console.time("Time: ");
    var obj = {'abcdef': 1, 'qqq': 13, '19': [1, 2, 3, 4]};
    for (var i = 0; i < 500000; i++) {
        JSON.parse(JSON.stringify(obj));
    }
    console.timeEnd("Time: ");
}
If I run that test, it outputs six separate timing results, one per worker. But if I run that same exact test inside the cluster.isMaster block, it outputs just one.
1) Why is my code being executed multiple times instead of once?
2) Since I have 6 CPU cores helping me run that test, shouldn't it run that code only once but perform the operation faster?

You're forking os.cpus().length separate processes. So if os.cpus().length === 6, then you should see 6 separate outputs (which is the case from the output you've posted).
No, that's not how it works. Each worker process runs the full worker code on its own, and the operating system schedules those processes across your cores. Clustering doesn't make a single task faster; it lets you do more processing in parallel. If you want the cores to share one task, you have to split the work up yourself.
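For example, here is a minimal sketch (not from the original answer) of how you could split the 500,000 iterations across the workers, so each worker does only its share:

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    var obj = {'abcdef': 1, 'qqq': 13, '19': [1, 2, 3, 4]};
    // Each worker handles an equal share of the total iterations.
    var share = Math.ceil(500000 / numCPUs);
    console.time('Worker ' + cluster.worker.id);
    for (var i = 0; i < share; i++) {
        JSON.parse(JSON.stringify(obj));
    }
    console.timeEnd('Worker ' + cluster.worker.id);
}

Each worker should now finish in roughly one sixth of the single-process time, although a timing line still prints once per worker.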

Related

Time first call function in node

I have the following code:
let startTime;
let stopTime;
const arr = [1, 2, 3, 8, 5, 0, 110, 4, 4, 16, 3, 8, 7, 56, 1, 2, 3, 8, 5, 0, 110, 16, 3, 8, 7, 56];
const sum = 63;

durationTime = (start, stop, desc) => {
    let duration = (stop - start);
    console.info('Duration' + ((desc !== undefined) ? '(' + desc + ')' : '') + ': ' + duration + 'ms');
};

findPair = (arr, sum) => {
    let result = [];
    const filterArr = arr.filter((number) => {
        return number <= sum;
    });
    filterArr.forEach((valueFirst, index) => {
        for (let i = index + 1; i < filterArr.length; i++) {
            const valueSecond = filterArr[i];
            const checkSum = valueFirst + valueSecond;
            if (sum === checkSum) {
                result.push([valueFirst, valueSecond]);
            }
        }
    });
    //console.info(result);
};

for (let i = 0; i < 5; i++) {
    startTime = new Date();
    findPair(arr, sum);
    stopTime = new Date();
    durationTime(startTime, stopTime, i); // pass i so the output reads Duration(0), Duration(1), ...
}
When I run it locally on node.js (v8.9.3), the result in the console is:
Duration(0): 4ms
Duration(1): 0ms
Duration(2): 0ms
Duration(3): 0ms
Duration(4): 0ms
My Question: Why does the first call of 'findPair' take 4ms and other calls only 0ms?
When the loop runs the first time, the JavaScript engine (Google's V8) has to interpret the code, compile it, and execute it. However, code that runs often gets its compiled code optimized and cached, so subsequent runs of that code can be much faster. Code inside loops is a good example of code that runs often.
Unless you fiddle with prototypes or do other things that invalidate that cached code, V8 will keep running the cached version, which is a lot faster than interpreting the code every time it runs.
There are a lot of smart things V8 does to make your code run faster. If you are interested in this stuff, I'd highly recommend reading the sources for my answer:
Dynamic Memory and V8 with JavaScript
How JavaScript works: inside the V8 engine + 5 tips on how to write optimized code
Before beginning, a better way to measure the time is:
for (let i = 0; i < 5; i++) {
    console.time('timer ' + i);
    findPair(arr, sum);
    console.timeEnd('timer ' + i);
}
...
The first function call is slower, probably because V8 dynamically creates hidden classes behind the scenes and (initially) puts the function into one.
More information at https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
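If you want to measure the steady-state speed rather than the one-time compilation cost, you can warm the function up before timing it. A minimal sketch, reusing findPair, arr, and sum from the question:

// Call the function a few times first so V8 has a chance to
// compile and optimize it.
for (let i = 0; i < 10; i++) {
    findPair(arr, sum);
}

// Now time only the optimized, cached version.
console.time('warmed');
findPair(arr, sum);
console.timeEnd('warmed');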

node.js: multithread, using more than one core

I am trying to divide a task in node.js across several cores (using an i5, I have 4 cores available). So far every explanation I have found was too cryptic for me (especially the ones talking about servers, which I know nothing about). Can someone show me, using the simple example below, how I can split the task across several cores?
Example:
I just want to split the task, so that each core runs one of the loops. How do I do that?
var fs = require('fs');

var greater = fs.createWriteStream('greater.txt');
var smaller = fs.createWriteStream('smaller.txt');

for (var i = 0; i < 10000; i++) {
    var input = Math.random() * 100;
    if (input > 50) {
        greater.write(input + '\r\n');
    }
}

for (var i = 0; i < 10000; i++) {
    var input = Math.random() * 100;
    if (input < 50) {
        smaller.write(input + '\r\n');
    }
}

greater.end();
smaller.end();
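A minimal sketch of one way to do this with the cluster module (an illustration of the intended setup, not code from the original thread): fork one worker per loop and let each worker own its output file, since write streams can't be shared across processes.

var cluster = require('cluster');
var fs = require('fs');

if (cluster.isMaster) {
    // One worker per loop; worker ids are 1 and 2.
    cluster.fork();
    cluster.fork();
} else {
    var id = cluster.worker.id;
    var out = fs.createWriteStream(id === 1 ? 'greater.txt' : 'smaller.txt');
    for (var i = 0; i < 10000; i++) {
        var input = Math.random() * 100;
        if ((id === 1 && input > 50) || (id === 2 && input < 50)) {
            out.write(input + '\r\n');
        }
    }
    out.end();
}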

Node.js Performance using Cluster

I have been trying to figure this out for a while. I wrote a very simple http server in node to benchmark the effect of using cluster. Here is my code:
var cluster = require('cluster');
var http = require('http');

var numCPUs = 0; //require('os').cpus().length;
if (process.argv.length >= 3) {
    numCPUs = process.argv[2];
}

if (cluster.isMaster && numCPUs > 0) {
    console.log("launching " + numCPUs + " procs");
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        console.log("launching proc #" + i);
        cluster.fork();
    }
    cluster.on('death', function(worker) {
        console.log('worker ' + worker.pid + ' died');
    });
} else {
    // Worker processes each run an http server.
    http.Server(function(req, res) {
        res.writeHead(200);
        res.end("hello world\n");
    }).listen(3000);
}
The problem is that I am not seeing any performance gain at all; one process has better performance most of the time. And if I add more work, like retrieving data from redis or mongo, then increasing the number of processes helps, but only modestly (around 15%). I've tried this on an i7 MBPr (quad-core with HT) and an i5 (quad-core) Win7 system, both with the same results.
Can someone please explain what's wrong with this code? Or why am I not seeing an advantage/benefit in using cluster?
Your test appears to be almost purely I/O-oriented and in that situation using cluster provides little benefit (as you've seen) because I/O is concurrent regardless.
To see a significant benefit you'd need to have portions of your code that are CPU-bound because it's only then that you can provide additional parallelism between your cluster workers.
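For example, here is a sketch (my own illustration, not from the original answer) of a handler with artificial CPU-bound work; benchmarked against this, adding workers should show a clear throughput gain:

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    http.Server(function(req, res) {
        // Artificial CPU-bound work: this blocks the event loop,
        // so spreading requests across worker processes helps.
        var total = 0;
        for (var i = 0; i < 1e7; i++) {
            total += Math.sqrt(i);
        }
        res.writeHead(200);
        res.end("done: " + total + "\n");
    }).listen(3000);
}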

Node.js chronological issue

I have a problem with node.js. The commands of the program don't execute chronologically and I don't know how to fix that.
I'm trying to fetch some images and text from a database and send them in packs of 8, but node.js runs the for loop and the command after the loop at the same time.
Here's my code:
socket.on('background_dinamically', function(data) {
    connection.query("SELECT * FROM products WHERE id='" + data.cathegory + "'", function(err, rows, fields) {
        var count = 0;
        var array_elements = [];
        if (err) {
            socket.emit('errorserver');
        } else {
            for (var i = rows.length - 1; i >= 0; i--) {
                array_elements.push(rows[i]);
                count++;
                // Send a pack as soon as it holds 8 elements.
                if (count == 8) {
                    socket.emit('image_loading_background', [array_elements, data]);
                    count = 0;
                    array_elements = [];
                }
            }
            // Send the remaining partial pack, if any.
            if (count > 0) {
                socket.emit('image_loading_background', [array_elements, data]);
            }
        }
    });
});
Marc, first I would check if synchronisation can be done on the client side. If you force your nodejs app to synchronize before sending data to the client, scalability suffers.
If you cannot do without synchronizing on the server side, you can choose between spaghetti code or a sync lib.
Welcome to the world of asynchronous (not chronological) programming. By default, node will work on I/O operations in parallel as you are seeing. To get other behaviors including chronological (in serial), parallel batches, as well as error handling helpers, have a look at one of the many flow control libraries available. Specifically, I recommend caolan/async.
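As an illustration, here is a minimal sketch using caolan/async with hypothetical pack data (the packs, and the console.log standing in for socket.emit, are my assumptions, not code from the thread):

var async = require('async');

// Hypothetical packs of up to 8 rows each.
var packs = [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11]];

// eachSeries processes one pack at a time, in order, and only moves
// on once the callback for the current pack has been invoked.
async.eachSeries(packs, function(pack, done) {
    // In the question this would be socket.emit('image_loading_background', ...).
    console.log('sending pack of ' + pack.length + ' elements');
    setImmediate(done);
}, function(err) {
    if (err) throw err;
    console.log('all packs sent in order');
});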

Node JS with CouchDB for lots o' parsing

My team and I are playing around with NodeJS (with jsdom/jQuery), parsing a lot of HTML documents stored in CouchDB. NodeJS is single-threaded, so having 8 cores in a server does not help us at all initially, which is why I am wondering how best to create child processes (workers, perhaps?) to process each individual file as it's pulled out of CouchDB.
Here is my thought process:
Main NodeJS script loops through CouchDB view getting the HTML files from documents every X minutes
Spawn a process to parse (jsdom/jQuery) and store the results from each HTML file
We aren't running a webserver at all to handle any of this (it's all command line), so I am unsure how to handle this outside of a generic "set up cron to run each parsing job separately". It seems that workers are generally used to process requests coming in from a webserver.
Thoughts?
Use the cluster module:
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;
var htmlDocs = [ /* ... */ ];

if (cluster.isMaster) {
    // Fork workers.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function(worker) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // Worker ids are 1-based, so worker 1 starts at index 0.
    for (var i = cluster.worker.id - 1; i < htmlDocs.length; i += numCPUs) {
        couch.doWork(htmlDocs[i]);
    }
}
This is a classic case of doing work on members in an array and then splitting that work out over multiple processes by having each process do a subset of the array.
Note how we increment i by the number of processes. This means worker 1 does the 1st, 5th, 9th, etc., and worker 2 does the 2nd, 6th, 10th, etc.
