I'm a node noob and trying to understand how one would implement auto discovery in a Node.js application. I'm going to use the cluster module and want each worker process to be kept up to date on (and persistently connected to) the ElastiCache nodes.
Since there is no concept of shared memory (like PHP's APC), would you have to have code that runs in each worker, wakes up every X seconds, and somehow updates the list of IPs and reconnects the memcached client?
How do people solve this today? Example code would be much appreciated.
Note that at this time, Auto Discovery is only available for cache clusters running the memcached engine.
For Cache Engine Version 1.4.14 or Higher you need to create a TCP/IP socket to the Cache Cluster Configuration Endpoint (or any Cache Node Endpoint) and send this command:
config get cluster
With Node.js you can use the net.Socket class to do that.
The reply consists of two lines:
The version number of the configuration information. Each time a node is added or removed from the cache cluster, the version number increases by one.
A list of cache nodes. Each node in the list is represented by a hostname|ip-address|port group, and each node is delimited by a space.
A carriage return and a linefeed character (CR + LF) appears at the end of each line.
Here you can find a more thorough description of how to add Auto Discovery to your client library.
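For illustration, here is a minimal sketch of that query with net.Socket. The endpoint is a placeholder, and the parsing assumes the reply framing described above (a CONFIG header line, the two data lines, then a terminating END):
var net = require('net');

// Placeholder: use your cluster's Configuration Endpoint (or any cache node endpoint)
var CONFIG_HOST = 'mycluster.cfg.euw1.cache.amazonaws.com';
var CONFIG_PORT = 11211;

function getClusterConfig(callback) {
  var reply = '';
  var socket = net.connect(CONFIG_PORT, CONFIG_HOST, function() {
    socket.write('config get cluster\r\n');
  });
  socket.on('data', function(chunk) {
    reply += chunk.toString();
    if (reply.indexOf('END\r\n') > -1) {                  // memcached terminates the reply with END
      socket.end();
      var lines = reply.split('\r\n');
      var version = parseInt(lines[1], 10);               // configuration version number
      var nodes = lines[2].split(' ').map(function(entry) {
        var parts = entry.split('|');                     // hostname|ip-address|port
        return parts[0] + ':' + parts[2];
      });
      callback(null, version, nodes);
    }
  });
  socket.on('error', callback);
}

getClusterConfig(function(err, version, nodes) {
  if (!err) console.log('config version', version, 'nodes', nodes);
});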
Using the cluster module, you need to store the same information in each process (i.e. each child), and I would use setInterval in each child to periodically check (e.g. every 60 seconds) the list of nodes and reconnect only if the list has changed (this should not happen very often).
You can optionally update the list on the master only and use worker.send to update the workers (see the sketch below). This keeps all the processes running on a single server more in sync, but it does not help in a multi-server architecture, so it is very important to use consistent hashing in order to lose the "minimum" number of keys stored in the memcached cluster when the node list changes.
I would use a global variable to store this kind of configuration.
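As a rough sketch of that master-broadcast variant (updateEndpoints here is just a placeholder for whatever lookup you use, e.g. "config get cluster" or the AWS SDK call shown below):
var cluster = require('cluster');
var Memcached = require('memcached');

if (cluster.isMaster) {
  var endpoints = []; // the current list of cache node endpoints

  function broadcastEndpoints() {
    Object.keys(cluster.workers).forEach(function(id) {
      cluster.workers[id].send({ cmd: 'endpoints', endpoints: endpoints });
    });
  }

  function updateEndpoints() {
    // Placeholder: refresh `endpoints` here (via "config get cluster" or the
    // AWS SDK call shown below) and broadcast only when the list has changed.
    broadcastEndpoints();
  }

  for (var i = 0; i < require('os').cpus().length; i++) {
    cluster.fork();
  }
  setInterval(updateEndpoints, 60000); // re-check every 60 seconds
} else {
  var memcached = null;
  process.on('message', function(msg) {
    if (msg.cmd === 'endpoints') {
      if (memcached) memcached.end();
      memcached = new Memcached(msg.endpoints); // reconnect against the new node list
    }
  });
}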
On second thought, you can use the AWS SDK for Node.js to get the list of ElastiCache nodes (and that works for the Redis engine as well).
In that case the code would be something like:
var util = require('util'),
    AWS = require('aws-sdk'),
    Memcached = require('memcached');

global.AWS_REGION = 'eu-west-1'; // Just as a sample I'm using the EU West region
global.CACHE_CLUSTER_ID = 'test';
global.CACHE_ENDPOINTS = [];
global.MEMCACHED = null;

function init() {
    AWS.config.update({
        region: global.AWS_REGION
    });
    var elasticache = new AWS.ElastiCache();

    function getElastiCacheEndpoints() {

        function sameEndpoints(list1, list2) {
            if (list1.length != list2.length)
                return false;
            return list1.every(
                function(e) {
                    return list2.indexOf(e) > -1;
                });
        }

        function logElastiCacheEndpoints() {
            global.CACHE_ENDPOINTS.forEach(
                function(e) {
                    util.log('Memcached Endpoint: ' + e);
                });
        }

        elasticache.describeCacheClusters({
                CacheClusterId: global.CACHE_CLUSTER_ID,
                ShowCacheNodeInfo: true
            },
            function(err, data) {
                if (!err) {
                    util.log('Describe Cache Cluster Id:' + global.CACHE_CLUSTER_ID);
                    if (data.CacheClusters[0].CacheClusterStatus == 'available') {
                        var endpoints = [];
                        data.CacheClusters[0].CacheNodes.forEach(
                            function(n) {
                                var e = n.Endpoint.Address + ':' + n.Endpoint.Port;
                                endpoints.push(e);
                            });
                        if (!sameEndpoints(endpoints, global.CACHE_ENDPOINTS)) {
                            util.log('Memcached Endpoints changed');
                            global.CACHE_ENDPOINTS = endpoints;
                            if (global.MEMCACHED)
                                global.MEMCACHED.end();
                            global.MEMCACHED = new Memcached(global.CACHE_ENDPOINTS);
                            process.nextTick(logElastiCacheEndpoints);
                        }
                        setTimeout(getElastiCacheEndpoints, 60000); // Check again in 60 seconds
                    } else {
                        setTimeout(getElastiCacheEndpoints, 10000); // Try again after 10 seconds until 'available'
                    }
                } else {
                    util.log('Error describing Cache Cluster:' + err);
                }
            });
    }

    getElastiCacheEndpoints();
}

init();
I'm using the Node.js cluster module to have multiple workers running.
I created a basic architecture where there is a single MASTER process, which is basically an Express server handling multiple requests; the main task of the MASTER is writing incoming data from requests into a Redis instance. The other workers (numOfCPUs - 1) are non-master, i.e. they don't handle any requests as they are just consumers. I have two features, namely ABC and DEF, and I distributed the non-master workers evenly across the features by assigning them a type.
For example, on an 8-core machine:
1 will be the MASTER instance handling requests via the Express server
The remaining (8 - 1 = 7) will be distributed evenly: 4 to feature ABC and 3 to feature DEF.
The non-master workers are basically consumers, i.e. they read from Redis, into which only the MASTER worker can write data.
Here's the code for the same:
if (cluster.isMaster) {
// Fork workers.
for (let i = 0; i < numCPUs - 1; i++) {
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
}
cluster.on('exit', function(worker) {
console.log(`Worker ${worker.process.pid}::type(${worker.type}) died`);
ClusteringUtil.removeWorkerFromList(worker.type);
ClusteringUtil.forkNewClusterWithAutoTypeBalancing();
});
// Start consuming on server-start
ABCConsumer.start();
DEFConsumer.start();
console.log(`Master running with process-id: ${process.pid}`);
} else {
console.log('CLUSTER type', cluster.worker.process.env.type, 'running on', process.pid);
if (
cluster.worker.process.env &&
cluster.worker.process.env.type &&
cluster.worker.process.env.type === ServerTypeEnum.EXPRESS
) {
// worker for handling requests
app.use(express.json());
...
}
}
Everything works fine except the consumers reading from Redis.
Since there are multiple consumers for a particular feature, each one reads the same message and starts processing it individually, which is what I don't want. Say there are 4 consumers: 1 is marked as busy and cannot consume until free, and 3 are available. Once the message for that particular feature is written into Redis by the MASTER, the problem is that all 3 available consumers of that feature start consuming it. This means that for a single message, the job is done as many times as there are available consumers.
const stringifedData = JSON.stringify(req.body);
const key = uuidv1();
const asyncHsetRes = await asyncHset(type, key, stringifedData);
if (asyncHsetRes) {
await asyncRpush(FeatureKeyEnum.REDIS.ABC_MESSAGE_QUEUE, key);
res.send({ status: 'success', message: 'Added to processing queue' });
} else {
res.send({ error: 'failure', message: 'Something went wrong in adding to queue' });
}
The consumer simply accepts messages and stops when it is busy:
module.exports.startHeartbeat = startHeartbeat = async function(config = {}) {
if (!config || !config.type || !config.listKey) {
return;
}
heartbeatIntervalObj[config.type] = setInterval(async () => {
await asyncLindex(config.listKey, -1).then(async res => {
if (res) {
await getFreeWorkerAndDoJob(res, config);
stopHeartbeat(config);
}
});
}, HEARTBEAT_INTERVAL);
};
Ideally, a message should be read by only one consumer of that particular feature. After consuming, it is marked as busy so it won't consume further until free (I have handled this). The next message should only be processed by one of the other available consumers.
Please help me in tackling this problem. Again, I want one message to be read by only one free consumer, and the rest of the free consumers should wait for a new message.
Thanks
I'm not sure I fully get your Redis consumer architecture, but I feel like it contradicts the use case of Redis itself. What you're trying to achieve is essentially queue-based messaging with the ability to acknowledge a message once it's done.
Redis has its own pub/sub feature, but it is built on a fire-and-forget principle. It doesn't distinguish between consumers; it just sends the data to all of them, assuming that it's their logic to handle the incoming data.
I recommend you use a queue server like RabbitMQ. You can achieve your goal with features that AMQP 0-9-1 supports: message acknowledgment, a consumer prefetch count, and so on. You can set up your cluster with very flexible configurations, e.g. "I want X consumers, each can handle 1 unique (!) message at a time, and each will receive a new one only after it has let the server (RabbitMQ) know that it successfully finished processing the previous message." This is highly configurable and robust.
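As a rough illustration (not your exact setup), here is a consumer sketch using the amqplib package, with a placeholder broker URL and queue name; prefetch(1) plus manual acks gives the one-message-per-free-consumer behaviour you describe:
const amqp = require('amqplib');

// placeholder broker URL and queue name
const BROKER_URL = 'amqp://localhost';
const QUEUE = 'abc_message_queue';

async function startConsumer() {
  const conn = await amqp.connect(BROKER_URL);
  const ch = await conn.createChannel();

  await ch.assertQueue(QUEUE, { durable: true });
  ch.prefetch(1); // give each consumer at most one unacknowledged message at a time

  ch.consume(QUEUE, async (msg) => {
    if (msg === null) return;
    const payload = JSON.parse(msg.content.toString());
    await doJob(payload); // placeholder for the feature-specific work (ABC / DEF)
    ch.ack(msg);          // only after the ack will the broker deliver the next message
  }, { noAck: false });
}

async function doJob(payload) {
  // placeholder processing
  console.log('processing', payload);
}

startConsumer().catch(console.error);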
However, if you want to go serverless with a fully managed service, so that you don't provision virtual machines or anything else to run a message queue server of your choice, you can use AWS SQS. It has a pretty similar API and feature list.
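For completeness, a similar consumer sketch against SQS using the AWS SDK for Node.js (v2 callback API); the queue URL is a placeholder, and the visibility timeout is what keeps a message hidden from the other consumers while one of them works on it:
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({ region: 'eu-west-1' });
var QUEUE_URL = 'https://sqs.eu-west-1.amazonaws.com/123456789012/abc-message-queue'; // placeholder

function poll() {
  sqs.receiveMessage({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 1,   // one message per consumer at a time
    WaitTimeSeconds: 20       // long polling
  }, function(err, data) {
    if (err) return setTimeout(poll, 5000);
    (data.Messages || []).forEach(function(msg) {
      // While a message is in flight it is hidden from other consumers
      // (visibility timeout), so only one consumer processes it.
      handleMessage(JSON.parse(msg.Body));
      sqs.deleteMessage({ QueueUrl: QUEUE_URL, ReceiptHandle: msg.ReceiptHandle }, function() {});
    });
    poll();
  });
}

function handleMessage(payload) {
  // placeholder for the feature-specific work
  console.log('processing', payload);
}

poll();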
Hope it helps!
We're diving deeper into Node.js architecture to fully understand how to scale our application.
The obvious solution is to use the cluster module, https://nodejs.org/api/cluster.html. Everything seems fine, apart from the description of worker management:
Node.js does not automatically manage the number of workers for you, however. It is your responsibility to manage the worker pool for your application's needs.
I was searching for how to actually manage the workers, but most solutions say:
Start as many workers as you have cores.
But I would like to dynamically scale the number of workers up or down depending on the current load on the server. So if there is load on the server and the queue is getting longer, I would like to start the next worker. When there isn't much load, I would like to shut workers down (and leave, for example, a minimum of 2 of them).
The ideal place for me would be the master process queue, and the event when a new request comes to the master process. At that point we can decide if we need the next worker.
Do you have any solution or experience with managing workers from the master process in a cluster? Starting and killing them dynamically?
Regards,
Radek
The following code will help you understand how to fork workers based on the number of incoming requests.
This program will fork a new worker for every 10 requests.
Note: you need to open http://localhost:8000/ and refresh the page to increase the request count.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
var numReqs = 0;
var initialRequest = 10;
var maxcluster = 10;
var totalcluster = 2;
if (cluster.isMaster) {
  // Fork a worker and listen for its request notifications
  function forkWorker() {
    var worker = cluster.fork();
    worker.on('message', function(msg) {
      if (msg.cmd && msg.cmd == 'notifyRequest') {
        numReqs++;
      }
    });
  }

  // Fork the initial workers.
  for (var i = 0; i < totalcluster; i++) {
    console.log('cluster master');
    forkWorker();
  }

  setInterval(function() {
    console.log("numReqs =", numReqs);
    // fork an additional worker when the request threshold is reached
    if (isNeedWorker(numReqs)) {
      forkWorker();
    }
  }, 1000);
} else {
  console.log('cluster worker initialized');
// Worker processes have a http server.
http.Server(function(req, res) {
res.writeHead(200);
res.end("hello world\n");
// Send message to master process
process.send({ cmd: 'notifyRequest' });
}).listen(8000);
}
function isNeedWorker(numReqs) {
if( numReqs >= initialRequest && totalcluster < numCPUs ) {
initialRequest = initialRequest + 10;
totalcluster = totalcluster + 1;
return true;
} else {
return false;
}
}
To manually manage your workers, you need a messaging layer to facilitate inter-process communication. With IPC, the master and workers can communicate effectively; by default, and from an architecture standpoint, this behavior is already implemented natively in the process module. However, I find the native implementation not flexible or robust enough to handle horizontal scaling involving network requests.
One obvious solution is Redis as a message broker to facilitate this kind of master and slave communication. However, this solution also has its faults, namely context latency, directly linked to command and reply.
Further research led me to RabbitMQ, a great fit for distributing time-consuming tasks among multiple workers. The main idea behind work queues (aka task queues) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background will pop the tasks and eventually execute the job. When you run many workers, the tasks will be shared between them.
To implement a robust server, read this link; it may give some insights. Link
I am running a multi-threaded expressjs server.
I have implemented the Firebase database, and one of the ways I use it is to handle a real-time player ranking (top 10 players):
When the server starts, I set up a dbref.on('value', callback) where the callback saves the current ranking to memory
When a user submits a score, I check if that score belongs in the top 10, and if it does I update the ranking in memory (Array.push > Array.sort > Array.slice) and push the new ranking to firebase. If the score doesn't belong in the top 10, do nothing.
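For illustration only, a rough sketch of that update step, assuming the localRanking array and db handle from the setup code further down (submitScore is a made-up name):
// Rough sketch of the in-memory top-10 update described above; `localRanking`
// and `db` are assumed to be the variables from the Firebase setup code below.
function submitScore(name, score) {
  // only touch the ranking if the score can make the top 10
  if (localRanking.length < 10 || score > localRanking[localRanking.length - 1].score) {
    localRanking.push({ name: name, score: score });
    localRanking.sort(function(a, b) { return b.score - a.score; });
    localRanking = localRanking.slice(0, 10);
    db.ref('path/to/my/ranking').set(localRanking); // push the new ranking to Firebase
  }
}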
In theory, each thread would receive the new ranking when one of the other threads updates it. That's what it does client-side without issue.
However, only one of the threads actually gets its dbref.on('value', callback) executed, breaking the ranking due to inconsistent data.
Why is it doing this?
How do I fix that?
Here is how I set up the multithreading, using cluster:
var cluster = require('cluster');
cluster.setupMaster({
exec: path.normalize(__dirname+'/main.js') // Implements expressjs
});
// For each CPU core...
for (var i = 0; i < core_count; i++) {
var worker = cluster.fork();
...
}
Here is how I set up and use Firebase:
var firebase = require('firebase');
firebase.initializeApp({
serviceAccount: path.normalize('firebase.json'),
databaseURL: 'https://'+firebase_subdomain+'.firebaseio.com'
});
var localRanking = []; // The local in-memory copy of the ranking
var db = firebase.database();
db.ref("path/to/my/ranking").on("value", function(data) {
// This code block only gets executed on one of the thread, not the others.
// If I have 8 threads, 7 threads won't get the update
var value = data.val();
if (value) {
localRanking = value;
}
});
I'm trying to create a new node for Node-RED. Basically, it is a UDP listening socket that shall be established via a config node and shall pass all incoming messages to dedicated nodes for processing.
This is the basic what I have:
function udpServer(n) {
RED.nodes.createNode(this, n);
this.addr = n.host;
this.port = n.port;
var node = this;
var socket = dgram.createSocket('udp4');
socket.on('listening', function () {
var address = socket.address();
logInfo('UDP Server listening on ' + address.address + ":" + address.port);
});
socket.on('message', function (message, remote) {
var bb = new ByteBuffer.fromBinary(message,1,0);
var CoEdata = decodeCoE(bb);
if (CoEdata.type == 'digital') { //handle digital output
// pass to digital handling node
}
else if (CoEdata.type == 'analogue'){ //handle analogue output
// pass to analogue handling node
}
});
socket.on("error", function (err) {
logError("Socket error: " + err);
socket.close();
});
socket.bind({
address: node.addr,
port: node.port,
exclusive: true
});
node.on("close", function(done) {
    socket.close(done); // signal Node-RED once the socket is actually closed
});
}
RED.nodes.registerType("myServernode", udpServer);
For the processing node:
function ProcessAnalog(n) {
RED.nodes.createNode(this, n);
var node = this;
this.serverConfig = RED.nodes.getNode(this.server);
this.channel = n.channel;
// how do I get the server's message here?
}
RED.nodes.registerType("process-analogue-in", ProcessAnalog);
I can't figure out how to pass the messages that the socket receives to a variable number of processing nodes, i.e. multiple processing nodes shall share one server instance.
==== EDIT for more clarity =====
I want to develop a new set of nodes:
One Server Node:
Uses a config-node to create a UDP listening socket
Managing the socket connection (close events, error etc)
Receives data packages with one to many channels of different data
One to many processing nodes
The processing nodes shall share the same connection that the Server Node has established
The processing nodes shall handle the messages that the server is emitting
Possibly the Node-Red flow would use as many processing Nodes as there are channels in the server's data package
To quote the Node-Red documentation on config-nodes:
A common use of config nodes is to represent a shared connection to a remote system. In that instance, the config node may also be responsible for creating the connection and making it available to the nodes that use the config node. In such cases, the config node should also handle the close event to disconnect when the node is stopped.
As far as I understand, I make the connection available via this.serverConfig = RED.nodes.getNode(this.server); but I cannot figure out how to pass data received on this connection to the nodes that use it.
A node has no knowledge of what nodes it is connected to downstream.
The best you can do from the first node is to have 2 outputs and to send digital to one and analogue to the other.
You would do this by passing an array to the node.send() function.
E.g.
//this sends output to just the first output
node.send([msg, null]);
//this sends output to just the second output
node.send([null,msg]);
Nodes that receive messages need to add a listener for the input event,
e.g.
node.on('input', function(msg) {
...
});
All of this is well documented on the Node-RED page
The other option: if the udpServer node is a config node, then you need to implement your own listeners. Your best bet is to look at something like the MQTT nodes in core for examples of pooling connections.
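To sketch that pooling idea (the node type names and the 'udp-message' event here are made up for illustration, and RED is assumed to be in scope as in your code above), the config node can keep a registry of consumer nodes and re-emit every datagram to them:
var dgram = require('dgram');

// Config node: owns the UDP socket and keeps track of the nodes using it
function UdpServerConfig(n) {
    RED.nodes.createNode(this, n);
    var node = this;
    node.users = {};                                   // consumer nodes registered with this connection

    var socket = dgram.createSocket('udp4');
    socket.on('message', function(message, remote) {
        // hand every incoming datagram to each registered consumer node
        Object.keys(node.users).forEach(function(id) {
            node.users[id].emit('udp-message', message, remote);
        });
    });
    socket.bind({ address: n.host, port: n.port, exclusive: true });

    node.register = function(consumer) { node.users[consumer.id] = consumer; };
    node.deregister = function(consumer) { delete node.users[consumer.id]; };

    node.on('close', function(done) {
        socket.close(done);                            // tell Node-RED when the socket is gone
    });
}
RED.nodes.registerType('udp-server-config', UdpServerConfig);

// Processing node: registers itself with the config node and reacts to its messages
function ProcessAnalog(n) {
    RED.nodes.createNode(this, n);
    var node = this;
    node.channel = n.channel;
    node.serverConfig = RED.nodes.getNode(n.server);
    if (node.serverConfig) {
        node.serverConfig.register(node);
        node.on('udp-message', function(message, remote) {
            node.send({ payload: message });           // forward the raw datagram downstream
        });
        node.on('close', function() {
            node.serverConfig.deregister(node);
        });
    }
}
RED.nodes.registerType('process-analogue-in', ProcessAnalog);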
I was trying to think of a way to minimize the damage to my node.js application if I ever get a DDoS attack. I want to limit requests per IP address: every IP address may only make so many requests per second. For example: no IP address can exceed 10 requests every 3 seconds.
So far I have come up with this:
http.createServer(function(req, res) {
if(req.connection.remoteAddress ?????? ) {
block ip for 15 mins
}
}
If you want to build this yourself at the app server level, you will have to build a data structure that records each recent access from a particular IP address so that when a new request arrives, you can look back through the history and see if it has been doing too many requests. If so, deny it any further data. And, to keep this data from piling up in your server, you'd also need some sort of cleanup code that gets rid of old access data.
Here's an idea for a way to do that (untested code to illustrate the idea):
function AccessLogger(n, t, blockTime) {
this.qty = n;
this.time = t;
this.blockTime = blockTime;
this.requests = {};
// schedule cleanup on a regular interval (every 30 minutes)
this.interval = setInterval(this.age.bind(this), 30 * 60 * 1000);
}
AccessLogger.prototype = {
check: function(ip) {
var info, accessTimes, now, limit, cnt;
// add this access
this.add(ip);
// should always be an info here because we just added it
info = this.requests[ip];
accessTimes = info.accessTimes;
// calc time limits
now = Date.now();
limit = now - this.time;
// short circuit if already blocking this ip
if (info.blockUntil >= now) {
return false;
}
// short circuit an access that has not even had max qty accesses yet
if (accessTimes.length < this.qty) {
return true;
}
cnt = 0;
for (var i = accessTimes.length - 1; i >= 0; i--) {
if (accessTimes[i] > limit) {
++cnt;
} else {
// assumes cnts are in time order so no need to look any more
break;
}
}
if (cnt > this.qty) {
// block from now until now + this.blockTime
info.blockUntil = now + this.blockTime;
return false;
} else {
return true;
}
},
add: function(ip) {
var info = this.requests[ip];
if (!info) {
info = {accessTimes: [], blockUntil: 0};
this.requests[ip] = info;
}
// push this access time into the access array for this IP
info.accessTimes.push(Date.now());
},
age: function() {
// clean up any accesses that have not been here within this.time and are not currently blocked
var ip, info, accessTimes, now = Date.now(), limit = now - this.time, index;
for (ip in this.requests) {
if (this.requests.hasOwnProperty(ip)) {
info = this.requests[ip];
accessTimes = info.accessTimes;
// if not currently blocking this one
if (info.blockUntil < now) {
// if newest access is older than time limit, then nuke the whole item
if (!accessTimes.length || accessTimes[accessTimes.length - 1] < limit) {
delete this.requests[ip];
} else {
// in case an ip is regularly visiting so its recent access is never old
// we must age out older access times to keep them from
// accumulating forever
if (accessTimes.length > (this.qty * 2) && accessTimes[0] < limit) {
index = 0;
for (var i = 1; i < accessTimes.length; i++) {
if (accessTimes[i] < limit) {
index = i;
} else {
break;
}
}
// remove index + 1 old access times from the front of the array
accessTimes.splice(0, index + 1);
}
}
}
}
}
}
};
var accesses = new AccessLogger(10, 3000, 15000);
// put this as one of the first middleware so it acts
// before other middleware spends time processing the request
app.use(function(req, res, next) {
if (!accesses.check(req.connection.remoteAddress)) {
// cancel the request here
res.end("No data for you!");
} else {
next();
}
});
This method also has the usual limitations around IP address monitoring. If multiple users are sharing an IP address behind NAT, this will treat them all as one single user and they may get blocked due to their combined activity, not because of the activity of one single user.
But, as others have said, by the time the request gets this far into your server, some of the DOS damage has already been done (it's already taking cycles from your server). It might help to cut off the request before doing more expensive operations such as database operations, but it is even better to detect and block this at a higher level (such as Nginx or a firewall or load balancer).
I don't think that is something that should be done at the HTTP server level. Basically, it doesn't prevent users from reaching your server, even if they won't see anything for 15 minutes.
In my opinion, you should handle that within your system, using a firewall. Although it's more a discussion for ServerFault or SuperUser, let me give you a few pointers.
Use iptables to set up a firewall on your entry point (your server or whatever else you have access to up the line). iptables allows you to set a limit on the maximum number of connections per IP. The learning curve is pretty steep, though, if you don't have a background in networking. That is the traditional way.
Here's a good resource geared towards beginners: Iptables for beginners
And something similar to what you need here: Unix StackExchange
I recently came across a really nice package called Uncomplicated Firewall (ufw); it happens to have an option to limit the connection rate per IP and can be set up in minutes. For more complicated stuff, you'll still need iptables though.
In conclusion, like Brad said,
let your application servers do what they do best... run your application.
And let firewalls do what they do best, kick out the unwanted IPs from your servers.
It is not good to use Node.js to filter connections or apply a connection policy like that.
It is better to put Nginx in front of Node.js:
Client --> Nginx --> Node.js or application.
It is not difficult, and it is cheap, because Nginx is open source too.
Good luck.
We can use the npm package limiting-middleware:
npm i limiting-middleware
Code:
const LimitingMiddleware = require('limiting-middleware');
app.use(new LimitingMiddleware({ limit: 100, resetInterval: 1200000 }).limitByIp());
// 100 request limit. 1200000ms reset interval (20m).
For more information: Click here