I'm trying to build a Node.js application that takes advantage of multicore machines (a.k.a. clustering), and I have a question about sessions. My code looks like this:
var cluster = exports.cluster = require('cluster');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
for (var i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit', function(worker, code, signal) {
console.log('worker ' + worker.process.pid + ' died. Trying to respawn...');
cluster.fork();
});
} else {
//spawn express etc
}
My question is: does each request from a single user hit a random Node instance, or does a user who first lands on, say, node N4 keep hitting N4 on every request until his session expires? In case that is unclear, here is what I'm worried about:
A user opens my page and logs in on node N3, where I set req.session.userdata to some data. He refreshes the page and hits node N4. Will I be able to access req.session.userdata from that different node? Does this mean a user could get randomly logged out, or am I misunderstanding how clustering with Express works?
You're correct that the in-memory session store in Connect/Express is unsuitable for supporting more than one instance. The solution is to use a session store with a backing database. My recommendation is connect-redis; example code is at "Session Undefined - Using Connect-Redis / ExpressJS / Node".
But there are dozens of options.
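To see why a shared store matters, here is a minimal sketch that models each worker as an object with its own private Map. The names makeWorker, login and getSession are made up for illustration, and the shared Map only stands in for an external store such as Redis; this is not the connect-redis API.

```javascript
// Sketch only: each "worker" stands in for a separate Node.js process.
// makeWorker, login and getSession are hypothetical names for illustration.
function makeWorker(name, sharedStore) {
  const localStore = new Map(); // per-process memory, invisible to other workers
  const store = sharedStore || localStore;
  return {
    name,
    login(sessionId, userdata) {
      store.set(sessionId, { userdata });
    },
    getSession(sessionId) {
      return store.get(sessionId);
    },
  };
}

// Case 1: every worker keeps sessions in its own memory.
const n3 = makeWorker('N3');
const n4 = makeWorker('N4');
n3.login('sid-1', 'alice');
const lostSession = n4.getSession('sid-1'); // N4 never saw the login

// Case 2: both workers talk to one shared store (a stand-in for Redis).
const shared = new Map();
const s3 = makeWorker('N3', shared);
const s4 = makeWorker('N4', shared);
s3.login('sid-1', 'alice');
const foundSession = s4.getSession('sid-1');

console.log('in-memory store, read on N4:', lostSession);
console.log('shared store, read on N4:', foundSession);
```

With the in-memory store the second worker has no record of the login, which is exactly the "randomly logged out" symptom from the question. With the shared store, any worker can read the session.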
I am trying to run a Node.js cluster within my Express app, but only for one specific function.
My app is a standard Express app generated with the express app generator.
My app initially scrapes an eCommerce website to get a list of categories in an array. I want to be able to then scrape each category's products, concurrently, using child processes.
I do not want to have the whole Express app inside the child processes. When the app starts up I want only one process to scrape for the initial categories. Once that is done I only want the function that scrapes the products to be run concurrently in the cluster.
I have tried the following:
delegation-controller.js
var {em} = require('./entry-controller');
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;
class DelegationController {
links = [];
constructor() {
em.on('PageLinks', links => {
this.links = links;
this.startCategoryCrawl();
});
}
startCategoryCrawl() {
if (cluster.isMaster) {
console.log(`Master ${process.pid} is running`);
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit', (worker, code, signal) => {
console.log(`worker ${worker.process.pid} died`);
});
} else {
console.log(`Worker ${process.pid} started`);
process.exit();
}
}
}
module.exports = DelegationController;
But then I got an error:
/ecommerce-scraper/bin/www:58
throw error;
^
Error: bind EADDRINUSE null:3000
I am guessing this is because it is trying to start the Express server again while the port is already in use.
Am I able to do what I am trying to do, or am I misunderstanding how Node.js clusters work?
I don't think this is a case for the cluster module. Instead, you need the child_process module, which lets you create a separate process. Here is the documentation.
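A minimal sketch of that approach, assuming one child process per category. The inline child script is a hypothetical stand-in for the real product scraper (it only echoes the category back), and scrapeCategory and scrapeAll are illustrative names, not part of the question's code.

```javascript
const { execFile } = require('child_process');

// Hypothetical stand-in for the real product scraper. It runs in its own
// process and never loads the Express app; here it only echoes the category.
const childScript =
  'const category = process.env.CATEGORY;' +
  'console.log(JSON.stringify({ category: category, pid: process.pid }));';

// Run the child script for one category and resolve with its parsed output.
function scrapeCategory(category) {
  return new Promise((resolve, reject) => {
    execFile(
      process.execPath, // the same node binary that runs the parent
      ['-e', childScript],
      { env: { ...process.env, CATEGORY: category } },
      (error, stdout) => {
        if (error) return reject(error);
        resolve(JSON.parse(stdout));
      }
    );
  });
}

// Start one child per category; the children run concurrently.
function scrapeAll(categories) {
  return Promise.all(categories.map(scrapeCategory));
}

scrapeAll(['shoes', 'hats']).then((results) => {
  results.forEach((r) => console.log(r.category, 'scraped by pid', r.pid));
});
```

In a real app the scraper would more likely live in its own file and be started with child_process.fork, which also gives you a message channel between parent and child. Either way, only the scraping code runs in the children, so the Express server is never started a second time.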
I typically create my own Worker bootstrap that sits on top of my application. For things that need to run once, I have a convenient runonce function that is given a name and callback. The function checks the primary process for an open (non-busy process) and sends back the PID. If the PID matches (because all processes will claim ownership) the callback executes. If not, the function returns.
Example:
https://gist.github.com/jonshipman/abe627c687a46e7f5ea4b36bb919666c
Node.js clustering creates identical copies of your application (through cluster.fork()). It's up to your application to ensure that actions aren't run more than once when that isn't expected.
I believe that when using Express or https.createServer, it's set up so that the workers don't each bind the same port separately. Instead, the primary process distributes the load internally.
I'm running a clustered node app, with 8 worker processes. I'm giving output when serving requests, and the output includes the ID of the process which handled the request:
app.get('/some-url', function(req, res) {
console.log('Request being handled by process #' + process.pid);
res.status(200).send('yayyy');
});
When I furiously refresh /some-url, I see in the output that the same process is handling the request every time.
I used node load-test to query my app. Again, even with 8 workers available, only one of them handles every single request. This is obviously undesirable as I wish to load-test the clustered app to see the overall performance of all processes working together.
Here's how I'm initializing the app:
var cluster = require('cluster');
if (cluster.isMaster) {
for (var i = 0; i < 8; i++) cluster.fork();
} else {
var app = require('express')();
// ... do all setup on `app`...
var server = require('http').createServer(app);
server.listen(8000);
}
How do I get all my workers working?
Your request does not use any resources. I suspect that the same worker is always called because it finishes handling the request before the next one comes in.
What happens if you do some calculation inside the handler that takes longer than the time between requests? As it stands, the worker is never busy between accepting a request and answering it.
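One way to try this is to make the handler deliberately slow. The sketch below assumes an Express-style res object; busyWork and slowHandler are made-up names, and the 50 ms figure is arbitrary.

```javascript
// Block the event loop for roughly `ms` milliseconds to simulate CPU-bound work.
function busyWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  return iterations;
}

// Express-style handler; register it with app.get('/some-url', slowHandler).
function slowHandler(req, res) {
  busyWork(50); // keep this worker busy so that requests overlap
  console.log('Request being handled by process #' + process.pid);
  res.status(200).send('yayyy');
}
```

If the suspicion above is right, concurrent requests against this handler should start showing different process IDs in the output.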
We're diving deeper into the Node.js architecture to fully understand how to scale our application.
The clear solution is to use cluster: https://nodejs.org/api/cluster.html. Everything seems fine, apart from the description of worker management:
Node.js does not automatically manage the number of workers for you, however. It is your responsibility to manage the worker pool for your application's needs.
I searched for how to actually manage the workers, but most solutions say:
Start so many workers as you've got cores.
But I would like to scale my worker count up or down dynamically, depending on the current load on the server. So if the server is under load and the queue is getting longer, I would like to start another worker. Conversely, when there isn't much load, I would like to shut workers down (and keep, for example, a minimum of 2).
For me, the ideal place would be the master process queue, with an event fired when a new request arrives at the master process. There we could decide whether we need another worker.
Do you have any solution or experience with managing workers from the master process in cluster, starting and killing them dynamically?
Regards,
Radek
The following code shows how to add workers based on the number of requests.
The program forks a new worker for every 10 additional requests, up to the number of CPU cores.
Note: open http://localhost:8000/ and refresh the page to increase the request count.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

var numReqs = 0;          // total requests reported by all workers
var initialRequest = 10;  // request count that triggers the next fork
var totalcluster = 2;     // number of workers started so far

// Fork a worker and count every request it reports to the master.
// Workers forked later need the same listener, so all forks go through here.
function forkWorker() {
  var worker = cluster.fork();
  worker.on('message', function(msg) {
    if (msg.cmd && msg.cmd == 'notifyRequest') {
      numReqs++;
    }
  });
  return worker;
}

// Returns true once another 10 requests have arrived and cores are still free.
function isNeedWorker(numReqs) {
  if (numReqs >= initialRequest && totalcluster < numCPUs) {
    initialRequest = initialRequest + 10;
    totalcluster = totalcluster + 1;
    return true;
  }
  return false;
}

if (cluster.isMaster) {
  console.log('cluster master');
  // Start with two workers.
  for (var i = 0; i < totalcluster; i++) {
    forkWorker();
  }
  // Check once per second whether another worker is needed.
  setInterval(function() {
    console.log("numReqs =", numReqs);
    if (isNeedWorker(numReqs)) forkWorker();
  }, 1000);
} else {
  console.log('cluster worker initialized');
  // Worker processes have an HTTP server.
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
    // Send message to master process
    process.send({ cmd: 'notifyRequest' });
  }).listen(8000);
}
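The code above only ever adds workers. The question also asks about shutting workers down when load drops, so here is a sketch of a scaling decision that works in both directions. The policy and every number in it (perWorkerCapacity, minWorkers, maxWorkers) are assumptions for illustration, not measured values, and desiredWorkerCount and planScaling are made-up names.

```javascript
// Hypothetical policy: size the pool from the current request rate,
// clamped between a minimum and a maximum number of workers.
function desiredWorkerCount(requestsPerSecond, options) {
  const { perWorkerCapacity, minWorkers, maxWorkers } = options;
  const needed = Math.ceil(requestsPerSecond / perWorkerCapacity);
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}

// Compare the target with the current pool size and report
// how many workers to fork or to stop.
function planScaling(currentWorkers, target) {
  if (target > currentWorkers) return { fork: target - currentWorkers, stop: 0 };
  if (target < currentWorkers) return { fork: 0, stop: currentWorkers - target };
  return { fork: 0, stop: 0 };
}

// Assumed numbers: each worker handles about 100 requests per second.
const options = { perWorkerCapacity: 100, minWorkers: 2, maxWorkers: 8 };

console.log(planScaling(2, desiredWorkerCount(450, options))); // { fork: 3, stop: 0 }
console.log(planScaling(5, desiredWorkerCount(30, options)));  // { fork: 0, stop: 3 }
```

In the master process you would call cluster.fork() once per worker to add, and call disconnect() on surplus workers taken from cluster.workers so they finish their open connections before exiting.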
To manage your workers manually, you need a messaging layer for inter-process communication (IPC). With IPC, the master and the workers can communicate effectively; from an architectural standpoint, this behavior is already implemented natively through the process module. However, I find the native implementation not flexible or robust enough to handle horizontal scaling, where communication has to go over the network.
One obvious solution is Redis as a message broker for this master-worker communication. However, this solution also has its faults, namely latency that is directly linked to each command and reply.
Further research led me to RabbitMQ, a great fit for distributing time-consuming tasks among multiple workers. The main idea behind work queues (a.k.a. task queues) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background will pop the tasks and eventually execute the job. When you run many workers, the tasks will be shared between them.
To implement a robust server, read this link; it may give some insights. Link
I am new to cluster and MongoDB, and I came across this piece of code.
#!/usr/bin/env node
var cluster = require('cluster');
var os = require('os');
var app = require('../main')
var models = require("../models");
if(cluster.isMaster){
var mCoreCount = os.cpus().length;
console.log("Cores : ", mCoreCount);
for (var i = 0; i < mCoreCount; i++) {
cluster.fork();
}
cluster.on('exit', function(){
cluster.fork();
});
}else{
models.sequelize.sync().then(function(){
app.listen(app.get('port'), function(){
console.log('api is live.' + app.get('port'));
});
});
}
When I log to the console, I get 4 cores. I tried reading about it but could not understand it. If someone could point out what is going on here, it would be a great help.
I understand that the more cores there are, the more Node instances are started, but I guess right now it is picking up the count from my system. What happens in production?
This script tries to launch the Node.js app more efficiently by creating a fork for each core available on the server.
It gets the number of cores from os.cpus().length.
In production, the same thing will happen, and the number of forks will depend on the number of cores available on the production server.
Are you really sure the database is MongoDB in both environments? We can't really tell without seeing the whole app code.
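To check what a given machine reports, you can run a few lines like the following on it. The workerCount helper and its optional cap are hypothetical additions for illustration; the script in the question simply forks once per reported core.

```javascript
const os = require('os');

// Number of workers to fork: one per logical core reported by the OS,
// optionally capped by a hypothetical maxWorkers setting.
function workerCount(maxWorkers) {
  const cores = Math.max(1, os.cpus().length); // guard against an empty CPU list
  if (Number.isInteger(maxWorkers) && maxWorkers > 0) {
    return Math.min(cores, maxWorkers);
  }
  return cores;
}

console.log('Cores reported:', os.cpus().length);
console.log('Workers to fork:', workerCount());
```

Running this on the development machine and on the production server shows the two counts the script would use in each place.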
My team and I are playing around with NodeJS (with jsdom/jQuery) and parsing a lot of HTML documents stored in CouchDB. NodeJS is single threaded, so having 8 cores in a server does not help us at all initially. This is where I was wondering how best to create child processes (workers, perhaps?) to process each individual file as it's pulled out of CouchDB.
Here is my thought process:
Main NodeJS script loops through CouchDB view getting the HTML files from documents every X minutes
Spawn a process to parse (jsdom/jQuery) and store the results from each HTML file
We aren't running a webserver at all to handle any of this (all command line), so I am unsure how to handle this other than a generic "set up CRON to just run each parsing job separately". It seems that workers are generally used to process requests coming in from a webserver.
Thoughts?
Use the cluster module:
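var cluster = require("cluster");
var numCPUs = require('os').cpus().length;
var htmlDocs = [...]; // placeholder: the HTML documents pulled from CouchDB
if (cluster.isMaster) {
  // Fork one worker per core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // The 'death' event of early Node.js versions is now called 'exit'.
  cluster.on('exit', function(worker) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // cluster.worker.id starts at 1, so subtract 1 for a zero-based start index.
  // (Early Node.js versions exposed the id as process.env.NODE_WORKER_ID, a string.)
  for (var i = cluster.worker.id - 1; i < htmlDocs.length; i += numCPUs) {
    couch.doWork(htmlDocs[i]); // couch.doWork stands for your own parsing job
  }
}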
This is a classic case of doing work on members in an array and then splitting that work out over multiple processes by having each process do a subset of the array.
Note how we increment i by number of processes. This means worker 1 does 1st, 5th, 9th, etc, worker 2 does 2nd, 6th, 10th, etc.
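The striding described above can be checked in isolation, without forking anything. In this sketch, itemsForWorker is a made-up helper and the ten document names are sample data.

```javascript
// Return the items a given worker should process: start at the worker's
// zero-based index and step by the total number of workers.
function itemsForWorker(items, workerIndex, workerCount) {
  const mine = [];
  for (let i = workerIndex; i < items.length; i += workerCount) {
    mine.push(items[i]);
  }
  return mine;
}

// Sample data: ten documents split across four workers.
const docs = ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10'];
for (let w = 0; w < 4; w++) {
  console.log('worker', w + 1, 'gets', itemsForWorker(docs, w, 4).join(', '));
}
```

Every document is handled by exactly one worker, and the subsets differ in size by at most one item, so the load stays roughly even as long as the documents take similar time to parse.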