How to limit the number of requests per IP in Node.js?

I was trying to think of a way to help minimize the damage to my Node.js application if I ever get hit by a DDoS attack. I want to limit requests per IP, so that no single IP address can exceed a set number of requests per time window. For example: no IP address can exceed 10 requests every 3 seconds.
So far I have come up with this:
http.createServer(function(req, res) {
    if (req.connection.remoteAddress ??????) {
        // block this IP for 15 mins
    }
});

If you want to build this yourself at the app-server level, you will have to build a data structure that records each recent access from a particular IP address so that when a new request arrives, you can look back through the history and see whether that IP has been making too many requests. If so, deny it any further data. And, to keep this data from piling up in your server, you also need some sort of cleanup code that gets rid of old access data.
Here's an idea for a way to do that (untested code to illustrate the idea):
function AccessLogger(n, t, blockTime) {
    this.qty = n;               // max number of requests...
    this.time = t;              // ...allowed within this window (ms)
    this.blockTime = blockTime; // how long to block an offender (ms)
    this.requests = {};
    // schedule cleanup on a regular interval (every 30 minutes)
    this.interval = setInterval(this.age.bind(this), 30 * 60 * 1000);
}

AccessLogger.prototype = {
    check: function(ip) {
        var info, accessTimes, now, limit, cnt;
        // add this access
        this.add(ip);
        // there is always an info here because we just added it
        info = this.requests[ip];
        accessTimes = info.accessTimes;
        // calc time limits
        now = Date.now();
        limit = now - this.time;
        // short circuit if already blocking this ip
        if (info.blockUntil >= now) {
            return false;
        }
        // short circuit an ip that has not even had max qty accesses yet
        if (accessTimes.length < this.qty) {
            return true;
        }
        cnt = 0;
        for (var i = accessTimes.length - 1; i >= 0; i--) {
            if (accessTimes[i] > limit) {
                ++cnt;
            } else {
                // times are in order, so no need to look any further
                break;
            }
        }
        if (cnt > this.qty) {
            // block from now until now + this.blockTime
            info.blockUntil = now + this.blockTime;
            return false;
        } else {
            return true;
        }
    },
    add: function(ip) {
        var info = this.requests[ip];
        if (!info) {
            info = {accessTimes: [], blockUntil: 0};
            this.requests[ip] = info;
        }
        // push this access time into the access array for this IP
        info.accessTimes.push(Date.now());
    },
    age: function() {
        // clean up any accesses that have not occurred within this.time
        // and are not currently blocked
        var ip, info, accessTimes, now = Date.now(), limit = now - this.time, index;
        for (ip in this.requests) {
            if (this.requests.hasOwnProperty(ip)) {
                info = this.requests[ip];
                accessTimes = info.accessTimes;
                // if not currently blocking this one
                if (info.blockUntil < now) {
                    // if the newest access is older than the time limit, nuke the whole item
                    if (!accessTimes.length || accessTimes[accessTimes.length - 1] < limit) {
                        delete this.requests[ip];
                    } else {
                        // in case an ip is visiting regularly, so its most recent
                        // access is never old, we must age out older access times
                        // to keep them from accumulating forever
                        if (accessTimes.length > (this.qty * 2) && accessTimes[0] < limit) {
                            index = 0;
                            for (var i = 1; i < accessTimes.length; i++) {
                                if (accessTimes[i] < limit) {
                                    index = i;
                                } else {
                                    break;
                                }
                            }
                            // remove index + 1 old access times from the front of the array
                            accessTimes.splice(0, index + 1);
                        }
                    }
                }
            }
        }
    }
};
var accesses = new AccessLogger(10, 3000, 15 * 60 * 1000); // 10 requests per 3s, else block for 15 min

// put this as one of the first middleware so it acts
// before other middleware spends time processing the request
app.use(function(req, res, next) {
    if (!accesses.check(req.connection.remoteAddress)) {
        // cancel the request here
        res.end("No data for you!");
    } else {
        next();
    }
});
This method also has the usual limitations of IP-address-based monitoring. If multiple users share an IP address behind NAT, they are all treated as a single user and may get blocked due to their combined activity, not because of the activity of any one of them.
But, as others have said, by the time the request gets this far into your server, some of the DoS damage has already been done (it is already taking cycles from your server). It might help to cut off the request before doing more expensive operations such as database queries, but it is even better to detect and block this at a higher level (such as Nginx, a firewall, or a load balancer).

I don't think that is something that should be done at the HTTP server level. Basically, it doesn't prevent users from reaching your server, even if they won't see anything for 15 minutes.
In my opinion, you should handle that within your system, using a firewall. Although it's more of a discussion for ServerFault or SuperUser, let me give you a few pointers.
Use iptables to set up a firewall on your entry point (your server or whatever else you have access to up the line). iptables allows you to set a limit on the maximum number of connections per IP. The learning curve is pretty steep, though, if you don't have a background in networking. That is the traditional way.
Here's a good resource geared towards beginners: Iptables for beginners
And something similar to what you need here: Unix StackExchange
I recently came across a really nice tool called Uncomplicated Firewall (ufw); it happens to have an option to limit the connection rate per IP and can be set up in minutes. For more complicated stuff, you'll still need iptables, though.
In conclusion, like Brad said,
let your application servers do what they do best... run your application.
And let firewalls do what they do best: keep the unwanted IPs out of your servers.

It is not good to filter connections or apply that kind of connection policy in Node.js itself.
It is better to put Nginx in front of Node.js:
Client --> Nginx --> Node.js application.
It is not difficult, and it is cheap, because Nginx is open source too.
Good luck.

We can use the npm package limiting-middleware:
npm i limiting-middleware
Code:
const LimitingMiddleware = require('limiting-middleware');
app.use(new LimitingMiddleware({ limit: 100, resetInterval: 1200000 }).limitByIp());
// 100 request limit. 1200000 ms reset interval (20 min).
For more information, see the limiting-middleware package on npm.

Related

Does a Node.js app have to be purely stateless in order for it to be replicated?

In my Node application, there are Values that the user can define. These Values, once created, can change, either from a user-triggered action or from something else, for example an MQTT message received on the server. A Value can change very sporadically or a few times per second.
class Value {
    constructor(valueStore, id, name) {
        this.valueStore = valueStore;
        this.id = id;
        this.name = name;
    }
    // ...
}
Because some Values can change many times per second, I don't save the Values to MongoDB Atlas every time they change (data transfer costs are quite expensive). Instead, I have a "ValueStore", which is basically a global object where all my Values are stored with their current value. In case my app goes down, I save the contents of the ValueStore to Atlas every 5 seconds, which is much less expensive.
// This is a global object
class ValueStore {
    constructor() {
        this.values = [];
        this.bufferUpdateInterval = setInterval(() => {
            // Every 5s, save values in ValueStore collection
        }, VALUES_UPDATE_DB_FREQUENCY);
    }
    setValue(value) {
        this.value = value;
    }
    // ...
}
I haven't yet implemented zero-downtime deployment. When I update my application, I have to bring the app down. When I put it back up, I want all the Values to be initialized with the Values they had when I put the app down. So, before anything else, I have to query the collection in which I save my Values every 5 seconds in order to reinstantiate my ValueStore.
// When my app starts:
const initializeValueStore = async (streams, variables) => {
    const values = await Values.getLastValues(); // call to get the last known Values
    for (let i = 0; i < values.length; i++) {
        const lastKnownValue = // ...
        valueStore.addValue(valueStore, values[i].id, values[i].name, lastKnownValue);
    }
}
Now, I want to scale my application and implement patterns such as zero-downtime deployment, which implies replicating my app across several nodes. The more I think about it, the more I am under the impression that I won't be able to do that until I make my app stateless (i.e., until I get rid of the ValueStore).
Am I right to think that my app should be stateless in order for it to be replicated and if so, how could I do differently what I'm currently doing with my ValueStore? Could a Redis cache come into play?

Inconsistent request behavior in Node when requesting a large number of links?

I am currently using this piece of code to connect to a massive list of links (a total of 2458 links, dumped at https://pastebin.com/2wC8hwad) to get feeds from numerous sources, and to deliver them to users of my program.
It's basically splitting up one massive array into multiple batches (arrays), then forking a process to handle a batch by requesting each stored link for a 200 status code. Only when a batch is complete is the next batch sent for processing, and when it's all done the forked process is disconnected. However, I'm facing issues with apparent inconsistency in how this performs, particularly in the part where it requests the status code.
const req = require('./request.js')
const process = require('child_process')
const linkList = require('./links.json')
let processor
console.log(`Total length: ${linkList.length}`) // 2458 links

const batchLength = 400
const batchList = [] // Contains batches (arrays) of links
let currentBatch = []

for (var i in linkList) {
    if (currentBatch.length < batchLength) currentBatch.push(linkList[i])
    else {
        batchList.push(currentBatch)
        currentBatch = []
        currentBatch.push(linkList[i])
    }
}
if (currentBatch.length > 0) batchList.push(currentBatch)

console.log(`Batch list length by default is ${batchList.length}`)
// cutDownBatchList(1)
console.log(`New batch list length is ${batchList.length}`)

const startTime = new Date()
getBatchIsolated(0, batchList)
let failCount = 0

function getBatchIsolated (batchNumber) {
    console.log('Starting batch #' + batchNumber)
    let completedLinks = 0
    const currentBatch = batchList[batchNumber]
    if (!processor) processor = process.fork('./request.js')
    for (var u in currentBatch) { processor.send(currentBatch[u]) }
    processor.on('message', function (linkCompletion) {
        if (linkCompletion === 'failed') failCount++
        if (++completedLinks === currentBatch.length) {
            if (batchNumber !== batchList.length - 1) setTimeout(getBatchIsolated, 500, batchNumber + 1)
            else finish()
        }
    })
}

function finish() {
    console.log(`Completed, time taken: ${((new Date() - startTime) / 1000).toFixed(2)}s. (${failCount}/${linkList.length} failed)`)
    processor.disconnect()
}

function cutDownBatchList(maxBatches) {
    for (var r = batchList.length - 1; batchList.length > maxBatches && r >= 0; r--) {
        batchList.splice(r, 1)
    }
    return batchList
}
Below is request.js, using needle. (However, for some strange reason it may completely hang up on a particular site indefinitely - in that case, I just use this workaround)
const needle = require('needle')

function connect (link, callback) {
    const options = {
        timeout: 10000,
        read_timeout: 8000,
        follow_max: 5,
        rejectUnauthorized: true
    }
    const request = needle.get(link, options)
        .on('header', (statusCode, headers) => {
            if (statusCode === 200) callback(null, link)
            else request.emit('err', new Error(`Bad status code (${statusCode})`))
        })
        .on('err', err => callback(err, link))
}

process.on('message', function(linkRequest) {
    connect(linkRequest, function(err, link) {
        if (err) {
            console.log(`Couldn't connect to ${link} (${err})`)
            process.send('failed')
        } else process.send('success')
    })
})
In theory, I think this should perform perfectly fine - it spawns a separate process to handle the dirty work in sequential batches so it's not overloaded, and it's super scalable. However, when using the full list of 2458 links with a total of 7 batches, I often get massive "socket hang up" errors on random batches on almost every trial, similar to what would happen if I requested all the links at once.
If I cut the number of batches down to 1 using the function cutDownBatchList, it performs perfectly fine on almost every trial. This is all happening on a Debian Linux VPS with two 3.1GHz vCores and 4 GB RAM from OVH, on Node v6.11.2.
One thing I also noticed is that if I increase the timeout to 30000 (30 sec) in request.js for 7 batches, it works as intended - however, it works perfectly fine with a much lower timeout when I cut it down to 1 batch. If I try to do all 2458 links at once with a higher timeout, I also face no issues (which basically makes this mini algorithm useless if I can't cut down the timeout via batch handling). This all goes back to the inconsistent behavior issue.
The best TL;DR I can do: trying to request a bunch of links in sequential batches in a forked child process - it succeeds almost every time with a lower number of batches, yet fails consistently with the full number of batches, even though the behavior should be the same since it handles them in isolated batches.
Any help would be greatly appreciated in solving this issue, as I just cannot for the life of me figure it out!

How do you implement AWS ElastiCache Auto Discovery in Node.js?

I'm a Node noob and trying to understand how one would implement Auto Discovery in a Node.js application. I'm going to use the cluster module and want each worker process to be kept up to date with (and persistently connected to) the ElastiCache nodes.
Since there is no concept of shared memory (like PHP APC), would you have to have code that runs in each worker, wakes up every X seconds, and somehow updates the list of IPs and reconnects the memcached client?
How do people solve this today? Example code would be much appreciated.
Note that at this time, Auto Discovery is only available for cache clusters running the memcached engine.
For cache engine version 1.4.14 or higher, you need to create a TCP/IP socket to the Cache Cluster Configuration Endpoint (or any Cache Node Endpoint) and send this command:
config get cluster
With Node.js you can use the net.Socket class to do that (a sketch follows after the reply format below).
The reply consists of two lines:
The version number of the configuration information. Each time a node is added or removed from the cache cluster, the version number increases by one.
A list of cache nodes. Each node in the list is represented by a hostname|ip-address|port group, and each node is delimited by a space.
A carriage return and a linefeed character (CR + LF) appears at the end of each line.
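For illustration, here is a minimal, untested sketch of that exchange with net.Socket; the endpoint hostname below is a placeholder, and the check for the terminating END line follows the memcached text protocol:
var net = require('net');

function getClusterConfig(host, port, callback) {
    var socket = net.connect(port, host);
    var response = '';
    socket.setEncoding('utf8');
    socket.on('connect', function() {
        socket.write('config get cluster\r\n');
    });
    socket.on('data', function(chunk) {
        response += chunk;
        // the reply is terminated by an END line
        if (response.indexOf('\r\nEND\r\n') !== -1) {
            socket.end();
            callback(null, response);
        }
    });
    socket.on('error', callback);
}

// placeholder endpoint; use your cluster's Configuration Endpoint
getClusterConfig('mycluster.cfg.use1.cache.amazonaws.com', 11211, function(err, reply) {
    if (err) throw err;
    console.log(reply);
});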
Here you can find a more thorough description of how to add Auto Discovery to your client library.
Using the cluster module, you need to store the same information in each process (i.e. each child), and I would use setInterval in each child to periodically check (e.g. every 60 seconds) the list of nodes and reconnect only if the list has changed (this should not happen very often).
You can optionally update the list on the master only and use worker.send to update the workers (a sketch follows below). This could keep all the processes running on a single server more in sync, but it would not help in a multi-server architecture, so it is very important to use consistent hashing in order to be able to change the list of nodes and lose the minimum number of keys stored in the memcached cluster.
I would use a global variable to store this kind of configuration.
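Here is a minimal sketch of that master-driven variant, assuming a hypothetical fetchNodeList helper that retrieves the current endpoint list (for example via the config get cluster exchange sketched above):
var cluster = require('cluster');

if (cluster.isMaster) {
    for (var i = 0; i < 4; i++) cluster.fork(); // worker count is illustrative
    setInterval(function() {
        // fetchNodeList is hypothetical: it returns the current cache node endpoints
        fetchNodeList(function(err, nodes) {
            if (err) return;
            for (var id in cluster.workers) {
                cluster.workers[id].send({ type: 'cache-nodes', nodes: nodes });
            }
        });
    }, 60000);
} else {
    process.on('message', function(msg) {
        if (msg.type === 'cache-nodes') {
            // compare with the stored list and reconnect the memcached
            // client only if the list has actually changed
        }
    });
}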
Thinking twice, you can use the AWS SDK for Node.js to get the list of ElastiCache nodes (and that works for the Redis engine as well).
In that case the code would be something like:
var util = require('util'),
    AWS = require('aws-sdk'),
    Memcached = require('memcached');

global.AWS_REGION = 'eu-west-1'; // Just as a sample I'm using the EU West region
global.CACHE_CLUSTER_ID = 'test';
global.CACHE_ENDPOINTS = [];
global.MEMCACHED = null;

function init() {
    AWS.config.update({
        region: global.AWS_REGION
    });
    var elasticache = new AWS.ElastiCache();

    function getElastiCacheEndpoints() {

        function sameEndpoints(list1, list2) {
            if (list1.length != list2.length)
                return false;
            return list1.every(
                function(e) {
                    return list2.indexOf(e) > -1;
                });
        }

        function logElastiCacheEndpoints() {
            global.CACHE_ENDPOINTS.forEach(
                function(e) {
                    util.log('Memcached Endpoint: ' + e);
                });
        }

        elasticache.describeCacheClusters({
                CacheClusterId: global.CACHE_CLUSTER_ID,
                ShowCacheNodeInfo: true
            },
            function(err, data) {
                if (!err) {
                    util.log('Describe Cache Cluster Id:' + global.CACHE_CLUSTER_ID);
                    if (data.CacheClusters[0].CacheClusterStatus == 'available') {
                        var endpoints = [];
                        data.CacheClusters[0].CacheNodes.forEach(
                            function(n) {
                                var e = n.Endpoint.Address + ':' + n.Endpoint.Port;
                                endpoints.push(e);
                            });
                        if (!sameEndpoints(endpoints, global.CACHE_ENDPOINTS)) {
                            util.log('Memcached Endpoints changed');
                            global.CACHE_ENDPOINTS = endpoints;
                            if (global.MEMCACHED)
                                global.MEMCACHED.end();
                            global.MEMCACHED = new Memcached(global.CACHE_ENDPOINTS);
                            process.nextTick(logElastiCacheEndpoints);
                        }
                        // schedule the next check once per pass (the original
                        // setInterval here would pile up a new interval on
                        // every change of the endpoint list)
                        setTimeout(getElastiCacheEndpoints, 60000);
                    } else {
                        setTimeout(getElastiCacheEndpoints, 10000); // Try again after 10 seconds until 'available'
                    }
                } else {
                    util.log('Error describing Cache Cluster:' + err);
                }
            });
    }

    getElastiCacheEndpoints();
}

init();

How do I execute a piece of code no more than every X minutes?

Say I have a link aggregation app where users vote on links. I sort the links using hotness scores generated by an algorithm that runs whenever a link is voted on. However, running it on every vote seems excessive. How do I limit it so that it runs no more than, say, once every 5 minutes?
a) Use a cron job.
b) Keep track of the timestamp when the procedure was last run; when the current timestamp minus the stored timestamp is greater than 5 minutes, run the procedure and update the timestamp (a minimal sketch of this option follows below).
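A minimal sketch of option (b), assuming a hypothetical recomputeHotness() that holds your scoring logic:
var lastRun = 0;
var FIVE_MINUTES = 5 * 60 * 1000;

// call this from your vote handler; the procedure runs at most
// once every five minutes, however often votes arrive
function maybeRecomputeHotness() {
    var now = Date.now();
    if (now - lastRun > FIVE_MINUTES) {
        lastRun = now;
        recomputeHotness(); // hypothetical: your hotness-score procedure
    }
}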
var yourVoteStuff = function() {
    ...
    setTimeout(yourVoteStuff, 5 * 60 * 1000);
};
yourVoteStuff();
Before asking why not use setInterval, well, read the comment below.
Why ask "why setInterval?" and not "why a cron job?" - am I that wrong?
First you build a receiver that receives all your link submissions.
Secondly, the receiver push()es each received link to a queue (I strongly recommend Redis).
Moreover, you have an aggregator which loops at a time interval of your choice. Within this loop, each queued link is poll()ed and passed on to your business logic.
I have used this solution in production and I can tell you that it scales well and performs well.
Example of use:
var MIN = 5;         // don't run aggregation for a short queue, saves resources
var THROTTLE = 10;   // aggregations/sec
var queue = [];
var bucket = [];
var interval = 1000; // 1 sec

// flow is your submission event emitter
flow.on("submission", function(link) {
    queue.push(link);
});

___aggregationLoop(interval);

function ___aggregationLoop(interval) {
    setTimeout(function() {
        bucket = [];
        if (queue.length <= MIN) {
            ___aggregationLoop(100); // poll more intensively while the queue is short
            return;
        }
        for (var i = 0; i < THROTTLE && queue.length; ++i) {
            // take the next queued link (LIFO here; use shift() for FIFO)
            bucket.push(queue.pop());
        }
        // ... pass the bucket on to your business logic here ...
        ___aggregationLoop(interval);
    }, interval);
}
Cheers!

Socket.IO server throttling a fast client

I have a server that uses socket.io and I need a way of throttling a client that is sending the server data too quickly. The server exposes both a TCP interface and a socket.io interface - with the TCP server (from the net module) I can use socket.pause() and socket.resume(), and this effectively throttles the client. But with socket.io's socket class there are no pause() and resume() methods.
What would be the easiest way of getting feedback to a client that it is overwhelming the server and needs to slow down? I liked socket.pause() and socket.resume() because they didn't require any additional code on the client side - back up the TCP socket and things naturally slow down. Any equivalent for socket.io?
Update: I provide an API to interact with the server (there is currently a python version which runs over TCP and a JavaScript version which uses socket.io). So I don't have any real control over what the client does. Which is why using socket.pause() and socket.resume() is so great - backing up the TCP stream slows the python client down no matter what it tries to do. I'm looking for an equivalent for a JavaScript client.
With enough digging I found this:
this.manager.transports[this.id].socket.pause();
and
this.manager.transports[this.id].socket.resume();
Granted, this probably won't work if the socket.io connection isn't a WebSocket connection, and it may break in a future update, but for now I'm going to go with it. When I get some time in the future I'll probably change it to the QUOTA_EXCEEDED solution that Pascal proposed.
Here is a dirty way to achieve throttling. Although this is an old post, some people may benefit from it:
First register a middleware:
io.on("connection", function (socket) {
socket.use(function (packet, next) {
if (throttler.canBeServed(socket, packet)) {
next();
}
});
//You other code ..
});
canBeServed is a simple throttler as seen below:
function canBeServed(socket, packet) {
    if (socket.markedForDisconnect) {
        return false;
    }
    var previous = socket.lastAccess;
    var now = Date.now();
    if (previous) {
        var diff = now - previous;
        // Check diff and disconnect if needed.
        if (diff < 50) {
            socket.markedForDisconnect = true;
            setTimeout(function () {
                socket.disconnect(true);
            }, 1000);
            return false;
        }
    }
    socket.lastAccess = now;
    return true;
}
You can use process.hrtime() instead of Date.now().
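For example, here is a small sketch of the same check using process.hrtime(), which is monotonic and therefore immune to system clock adjustments (the socket.lastAccessHr property name is illustrative):
var previousHr = socket.lastAccessHr; // [seconds, nanoseconds] tuple
if (previousHr) {
    var diff = process.hrtime(previousHr); // elapsed time since previousHr
    var diffMs = diff[0] * 1000 + diff[1] / 1e6;
    // compare diffMs against the 50 ms threshold as above
}
socket.lastAccessHr = process.hrtime();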
If you have a callback on your server somewhere which normally sends back the response to your client, you could try and change it like this:
Before:
var respond = function (res, callback) {
    res.send(data);
};
After:
var respond = function (res, callback) {
    setTimeout(function() {
        res.send(data);
    }, 500); // or whatever delay you want.
};
It looks like you should slow down your clients. If one client can send too fast for your server to keep up, this is not going to go very well with hundreds of clients.
One way to do this would be to have the client wait for the reply to each emit before emitting anything else. This way the server can control how fast the client can send, by only answering when ready, for example, or only answering after a set time.
If this is not enough, when a client exceeds x requests per second, start replying with something like a QUOTA_EXCEEDED error, and ignore the data they send in. This will force external developers to make their apps behave as you want them to (see the sketch below).
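Here is a rough sketch of both ideas using socket.io acknowledgements; the event name and the getNextPayload, overQuota, and handle functions are illustrative, not part of any real API:
// client: wait for the server's acknowledgement before the next emit
function sendNext() {
    var payload = getNextPayload(); // hypothetical source of outgoing data
    socket.emit('data', payload, function(reply) {
        if (reply && reply.error === 'QUOTA_EXCEEDED') {
            setTimeout(sendNext, 1000); // back off before trying again
        } else {
            sendNext();
        }
    });
}
sendNext();

// server: answer only when ready, or report the quota violation
io.on('connection', function(socket) {
    socket.on('data', function(payload, ack) {
        if (overQuota(socket)) {       // overQuota is hypothetical
            return ack({ error: 'QUOTA_EXCEEDED' });
        }
        handle(payload);               // handle is hypothetical
        ack({ ok: true });
    });
});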
As another suggestion, I would propose a solution like this:
It is common for MySQL to receive a larger volume of requests than it can apply at the rate they come in.
The server can record the requests in a database table, assuming this insert is fast enough for the rate the requests come in, and then process the queue at a rate the server can sustain. This buffer system allows the server to run slowly but still process all the requests (a sketch of the idea follows below).
But if you want something sequential, then the request callback should be verified before the client can send another request. In this case, there should be a server-ready flag. If the client sends a request while the flag is still red, it can get a message telling it to slow down.
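A minimal in-memory sketch of that buffering idea (processRequest is hypothetical, and in the text above the buffer would be a database table rather than an array):
var pending = [];

io.on('connection', function(socket) {
    socket.on('request', function(payload) {
        pending.push(payload); // cheap insert, like the db table above
    });
});

// drain the buffer at a rate the server can sustain
setInterval(function() {
    var batch = pending.splice(0, 100); // up to 100 requests per second
    batch.forEach(processRequest);      // processRequest is hypothetical
}, 1000);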
Simply wrap your client emitter in a function like the one below:
let emit_live_users = throttle(function () {
    socket.emit("event", "some_data");
}, 2000);
using a throttle function like the one below:
function throttle(fn, threshold) {
    threshold = threshold || 250;
    var last, deferTimer;
    return function() {
        var context = this; // preserve the caller's `this` for the deferred call
        var now = +new Date, args = arguments;
        if (last && now < last + threshold) {
            // queue a trailing call so the last invocation is not lost
            clearTimeout(deferTimer);
            deferTimer = setTimeout(function() {
                last = now;
                fn.apply(context, args);
            }, threshold);
        } else {
            last = now;
            fn.apply(context, args);
        }
    };
}
