I'm writing an API using Node.js and Express, and my app is hosted on OpenShift's free plan.
I want to protect my routes from brute force. For example, if an IP sends more than 5 requests/sec, then block it for 5 minutes. :)
There's nothing stopping you from implementing this in Node.js/Express directly, but this sort of thing is typically (and almost certainly more easily) handled by putting something like nginx or Apache httpd in front of your app to handle traffic.
This has the added benefit of letting your app run entirely as an unprivileged user, because nginx (or whatever) is what binds to ports 80 and 443 (which requires administrative/superuser privileges) rather than your app. Plus you easily get a bunch of other desirable features, like caching for static content.
nginx has a module specifically for this:
The ngx_http_limit_req_module module (0.7.21) is used to limit the request processing rate per a defined key, in particular, the processing rate of requests coming from a single IP address.
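For illustration, a minimal sketch of that module in a reverse-proxy setup (the zone name, upstream port, and location are my assumptions; limit_req_zone belongs in the http context):

# allow each client IP 5 requests per second, tracked in a 10 MB zone
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=5r/s;

server {
    listen 80;

    location / {
        # requests above the rate are rejected (503 by default,
        # configurable with limit_req_status)
        limit_req zone=per_ip;
        proxy_pass http://127.0.0.1:8080;
    }
}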
There are several packages on NPM that are dedicated to this, if you are using the Express framework:
express-rate-limiter
express-limiter
express-brute
These can be used for limiting by IP, but also by other information (e.g. by username for failed login attempts).
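For instance, a minimal express-brute setup guarding a login route might look like this (a sketch; the route name and handler are illustrative, and the MemoryStore only works within a single process):

const express = require('express');
const ExpressBrute = require('express-brute');

const app = express();

// in-memory store; use a shared store (e.g. Redis) when running multiple processes
const store = new ExpressBrute.MemoryStore();
const bruteforce = new ExpressBrute(store);

// bruteforce.prevent delays and rejects repeated attempts from the same source
app.post('/auth', bruteforce.prevent, (req, res) => {
  res.send('Success!');
});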
It is better to limit rates at a reverse proxy, load balancer, or any other entry point to your Node.js app.
However, that doesn't always fit the requirements.
The rate-limiter-flexible package has the block option you need:
const express = require('express');
const { RateLimiterMemory } = require('rate-limiter-flexible');

const app = express();

const opts = {
  points: 5, // 5 points
  duration: 1, // per second
  blockDuration: 300, // block for 5 minutes if more than `points` consumed
};

const rateLimiter = new RateLimiterMemory(opts);

const rateLimiterMiddleware = (req, res, next) => {
  // consume 1 point for each request, keyed by client IP
  rateLimiter.consume(req.connection.remoteAddress)
    .then(() => {
      next();
    })
    .catch((rejRes) => {
      res.status(429).send('Too Many Requests');
    });
};

app.use(rateLimiterMiddleware);
You can configure rate-limiter-flexible for any exact route; see the official Express docs about using middleware.
There are also options for cluster or distributed apps, and many other useful features.
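For example, applying the limiter to just a login route instead of the whole app (loginHandler is a placeholder for your own handler):

// protect only this route; other routes are not rate limited
app.post('/login', rateLimiterMiddleware, loginHandler);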
Related
I am required to save a log of each request and response made to the backend into a MySQL database. The issue is that we are migrating to a microservices architecture. The backend was made with Node.js and Express, and it has a middleware that does this task. Currently, this middleware is attached to each microservice.
I would like to isolate this middleware as its own microservice. The issue is that I don't know how to redirect the traffic this way. This is how I would like to manage it:
I would like to do it this way because we could make changes or add features to the middleware without having to re-implement it in each microservice. This is the middleware's code:
const connection = require("../database/db");

const viewLog = (req, res, next) => {
  const oldWrite = res.write,
    oldEnd = res.end,
    chunks = [],
    now = new Date();

  // wrap res.write to capture each chunk of the response body
  res.write = function (chunk) {
    chunks.push(Buffer.from(chunk));
    oldWrite.apply(res, arguments);
  };

  // wrap res.end to capture the final chunk and persist the log entry
  res.end = function (chunk, error) {
    if (chunk) chunks.push(Buffer.from(chunk));
    const bodyRes = Buffer.concat(chunks).toString("utf8");
    connection.query("CALL HospitalGatifu.insertLog(?,?,?,?,?)", [
      `[${req.method}] - ${req.url}`,
      `${JSON.stringify(req.body) || "{}"}`,
      bodyRes,
      res.statusCode === 400 ? 1 : 0,
      now,
    ]);
    oldEnd.apply(res, arguments);
  };

  next();
};

module.exports = viewLog;
I think there might be a way to manage this with Nginx, which is the reverse proxy we are using. I would like to get an approach for how to restructure the logging middleware.
Perhaps you might want to take a look at the sidecar pattern, which is used in microservice architectures for common tasks (like logging).
In short, a sidecar runs in a container beside your microservice container. One task of the sidecar could be intercepting network traffic and logging requests and responses (among many other possible tasks). The major advantage of this pattern is that you don't need to change any code in your microservices, and you don't have to manage traffic redirection yourself; the latter is handled by the sidecar itself.
The disadvantage is that you are required to run your microservices containerized and to use some kind of container orchestration solution. I assume this is the case, since you are moving towards a microservices-based application.
One question about the log service sitting between the web app and the NGINX server: what if the logging service goes down for some reason, is it acceptable for the entire application to go down?
Let me give you not exactly what you requested but something to think about.
I can think of 3 solutions for the issue of logging in microservices, each with its own advantages and disadvantages:
Create a shared library that handles the logs. I think it's the best choice in most cases (see the sketch after this list). An article I wrote about shared libraries
You can create an API gateway; it is a great solution for logic shared by all the requests. It will probably be more work, but it can then be reused for other shared logic. Further read (not written by me :) )
A third option (which I personally don't like) is to create a log microservice that listens to a LogEvent or something like that, and have your microservices publish this event whenever needed.
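As a rough sketch of the shared-library option, the existing logging middleware could be published as an internal package (the name @myorg/request-logger is hypothetical) and mounted by every microservice:

const express = require("express");
// hypothetical internal package exporting the viewLog middleware shown earlier
const { viewLog } = require("@myorg/request-logger");

const app = express();
app.use(express.json()); // needed so req.body is populated for the log
app.use(viewLog); // every request/response is logged to MySQL
app.listen(3000);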
I'm using the express-rate-limit npm package, and I deployed my backend on AWS (a t2.micro EC2 instance). While the limiter is on, requests are blocked for ALL users who try to interact with my API; it works for a couple of minutes and then stops for about 10 minutes.
When I comment out the limiter part, everything works fine. I would expect "too many requests" to block only the one user who hammers the server with requests, but what happens is that ALL users get blocked; all users are treated as a single user. That's my conclusion.
If that's the case, what should I do? I need my rate limiter on. And if there is another explanation, what would it be?
By default, express-rate-limit has a keyGenerator of req.ip. When I log this on my server it is '::ffff:127.0.0.1' (the loopback address, since the app sits behind a proxy), which is obviously going to be the same for every request, thus limiting all IP addresses once it's limited for one.
My solution was to use request-ip to get the correct IP address like so:
const express = require('express');
const rateLimit = require('express-rate-limit');
const requestIp = require('request-ip');

const app = express();
app.use(requestIp.mw());

app.use(rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 30, // limit each IP to 30 requests per windowMs
  keyGenerator: (req, res) => {
    return req.clientIp; // IP address from requestIp.mw(), as opposed to req.ip
  }
}));
Alternatively, you can build the key from the X-Forwarded-For header that the proxy sets:

keyGenerator: function (req) {
  return req.headers["x-forwarded-for"] || req.connection.remoteAddress;
}
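Since the app evidently sits behind a reverse proxy (hence the loopback address), another option is Express's standard trust proxy setting, so that req.ip itself is taken from X-Forwarded-For. A sketch, assuming a single proxy hop:

// tell Express it is behind one trusted proxy hop, so req.ip
// reflects the client address from X-Forwarded-For
app.set('trust proxy', 1);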
It blocks based on IP.
The express-rate-limit package blocks requests based on IP address because it provides a very basic configuration for rate limiting that suits most applications. If you block based on user, someone can easily configure a bot to hit your APIs until the limit is reached on one user account, then automatically create a new account and start hitting your server again. Blocking based on IP avoids this risk, as one IP means one device, no matter how many users make requests from that IP. In most cases one device is used by one person, so this solution works pretty well.
I am trying to develop a Node.js application that acts as an API gateway / facade layer for services (developed in Spring Boot).
Is it a good practice or not?
If yes, which Node.js framework should I use (Async / co / Promise / async-await, etc.)? I mean, what is currently used most in production environments?
"Is it a good practice or not?"
What is your question related to? Using an API gateway/facade? Using Spring Boot? Using async/await...? What exactly is your problem?
I guess you want to develop a Spring Boot based microservice architecture with a Node.js based API orchestrator as a front controller and single entry point?
Don't confuse the technical side of naive routing (load balancing with nginx, round robin, reverse proxying, etc.) to increase capacity, speed, availability, etc. with the semantic business integration of services through URL path mapping.
An API orchestrator addresses the semantic abstraction and integration of an underlying service landscape. API gateway vs. API orchestrator!
To my personal view, using an API orchestrator is an acceptable solution in conjunction with microservices. It is the easiest and most modest way to integrate and compose an underlying service layer.
Just to state a few positive and negative aspects:
Positive:
- Single entry point for standard business concerns such as authentication, security, session management, logging, etc.
- Can also be started and managed as a microservice. Feel free to use a 3-tier layered architecture for the API orchestrator microservice.
- Abstracts the complexity of the underlying microservice layer.
Negative:
- Might become a "god object".
- In the context of microservices, the API orchestrator handles too many of the business cases.
- High coupling, complexity...
A design sketch of a Node.js based API orchestrator with HTTP communication:
Evaluate a (web) server (express.js, hapi.js, your-own-node-server)
Evaluate an HTTP request API (axios, node-fetch, r2, your-own-http-api). The HTTP API should resolve to a promise object!
Example of an express.js based API orchestrator:
const express = require('express');
const http = require('http');
const path = require('path');

const app = express();
const port = 3000;

// define middleware plugins in express.js for your API gateway, like session management ...
app.use(express.static(path.join(__dirname, 'public')));

// define business/use-case relevant semantic routes, e.g. /getAllUsers, a REST URL, or /whatever
app.get('/whatever', (request, response) => {
  // consumes the whatever service
  const getWhatEverToGet = () => {
    return new Promise((resolve, reject) => {
      // connection data should be read from a service registry or from configuration
      // management (process level, file level, environment level)
      http.get({
        hostname: 'localhost',
        port: 3001,
        path: '/whatever_service_url'
      }, (res) => {
        // the built-in http.get() uses streams, so buffer the 'data' events
        // and resolve once the whole response has arrived
        let body = '';
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve(body));
        res.on('error', reject);
      });
    });
  };

  // consume more services with the same pattern; when they depend on each
  // other, use async/await to share data between them
  // consumes whatever2 service, returns a promise
  // consumes whatever3 service, returns a promise
  const respondWhatEverData = async () => {
    const whatEver = await getWhatEverToGet();
    response.send(whatEver);
  };

  // trigger service composition
  respondWhatEverData();
});

app.listen(port, () => {
  console.log(`server listens on ${port}`);
});
TL;DR If your Node.js application is only expected to forward requests to the Spring Boot application, then the Node.js setup is probably not worth it. You should look at an Nginx reverse proxy, which can do all that efficiently.
Async / co / Promise / async-await are not frameworks. Promise and async-await are programming constructs in Node.js; Async and co are convenience libraries that made asynchronous code manageable before Promises and async-await were introduced. That said, there are multiple REST frameworks you could use to receive and pipe requests to your Spring Boot servers: take a look at Express.js, Restify, or Sails.js, all of which can add REST capabilities to Node.js. You will also need an HTTP client library (like axios or request, both of which support Promises) to forward your requests to the target server.
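For a feel of the forwarding code, here is a minimal sketch with Express and axios (the route and the Spring Boot URL are assumptions for illustration):

const express = require('express');
const axios = require('axios');

const app = express();
app.use(express.json());

// forward a request to the Spring Boot service and relay its response
app.get('/api/users', async (req, res) => {
  try {
    const upstream = await axios.get('http://localhost:8080/users', {
      params: req.query,
    });
    res.status(upstream.status).send(upstream.data);
  } catch (err) {
    res.status(err.response ? err.response.status : 502).send('Upstream error');
  }
});

app.listen(3000);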
I currently have a frontend-only app that fetches 5-6 different JSON feeds, grabs some necessary data from each of them, and then renders a page based on said data. I'd like to move the data fetching / processing part of the app to a server-side node application which outputs one simple JSON file which the frontend app can fetch and easily render.
There are two noteworthy complications for this project:
1) The new backend app will have to live on a different server than its frontend counterpart
2) Some of the feeds change fairly often, so I'll need the backend processing to constantly check for changes (every 5-10 seconds). Currently with the frontend-only app, the browser fetches the latest versions of the feeds on load. I'd like to replicate this behavior as closely as possible
My thought process for solving this took me in two directions:
The first is to set up an Express application that uses setTimeout to repeatedly check for new data to process. This data is then sent as the response to a simple GET request:
const express = require('express');

const app = express();
const port = 3000;

let processedData = {};

const getData = () => {...} // returns a promise that fetches and processes data

/* use an immediately invoked function with setTimeout to fetch the data
 * when the program starts and then once every 5 seconds after that */
(function refreshData() {
  getData().then((data) => {
    processedData = data;
  });
  setTimeout(refreshData, 5000);
})();

app.get('/', (req, res) => {
  res.send(processedData);
});

app.listen(port, () => {
  console.log(`Started on port ${port}`);
});
I would then make a simple GET request from the client (after properly adjusting CORS headers) to get the JSON object.
My questions about this approach are pretty generic: Is this even a good solution to this problem? Will this drive up hosting costs based on processing / client GET requests? Is setTimeout a good way to have a task run repeatedly on the server?
The other solution I'm considering would deal with setting up an AWS Lambda that writes the resulting JSON to an s3 bucket. It looks like the minimum interval for scheduling an AWS Lambda function is 1 minute, however. I imagine I could set up 3 or 4 identical Lambda functions and offset them by 10-15 seconds, however that seems so hacky that it makes me physically uncomfortable.
Any suggestions / pointers / solutions would be greatly appreciated. I am not yet a super experienced backend developer, so please ELI5 wherever you deem fit.
A few pointers.
Use cron tasks for periodic processing of data (see the sketch after this list). This is far preferable, especially if you are formatting a lot of data.
Don't set up multiple Lambda functions for the same task. It's going to be messy to maintain all those functions.
After processing/fetching the feed, you can store the JSON file on your own server or in S3. Note that if it's S3, you are paying and waiting for a network operation on each read. You can read the file from your Express app and just send the response back to your clients.
Depending on the file size and the load on your server, you might want to add a caching layer so that you can cache the response until new JSON data is available.
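A minimal sketch of the cron-task idea using the node-cron package (an assumption; plain system cron invoking a script works just as well, and getData and the output path are placeholders):

const cron = require('node-cron');
const fs = require('fs');

const getData = () => { /* fetches and processes the feeds, returns a promise */ };

// run at the start of every minute; node-cron also accepts a seconds
// field, e.g. '*/10 * * * * *' for every 10 seconds
cron.schedule('* * * * *', async () => {
  const data = await getData();
  // write the result where the Express app (or an S3 sync) can pick it up
  fs.writeFileSync('./processed.json', JSON.stringify(data));
});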
I have a web app that accepts API requests from an iOS app. My web app is hosted on Heroku using their free dyno, which comes with 512 MB of RAM. Because Node is single-threaded, this will be a problem once we start getting higher levels of traffic from the iOS end to the web server. I'm also not the richest person in the world, so I'm wondering if it would be smart to create another free Heroku app and use a round-robin approach to balance the load received from the iOS app?
I just need to be pointed in the right direction. Vertical scaling is not really an option financially.
I'm the Node.js platform owner at Heroku.
You may be doing some premature optimization. Node.js, on our smallest 1X size (512MB RAM), can handle hundreds of simultaneous connections and thousands of requests per minute.
If your iOS app is consistently maxing that out, it may be time to consider monetization!
As mentioned by Daniel, it's against Heroku's rules. Having said that, there are probably other services that would allow you to do that.
One way to approach this problem is to use the cluster module with ZeroMQ (you need to have ZeroMQ installed before using the module; see the module description).
var cluster = require('cluster');
var zmq = require('zmq');

var ROUTER_SOCKET = 'tcp://127.0.0.1:5555';
var DEALER_SOCKET = 'tcp://127.0.0.1:7777';

if (cluster.isMaster) {
  // this is the main process - create Router and Dealer sockets
  var router = zmq.socket('router');
  router.bindSync(ROUTER_SOCKET);
  var dealer = zmq.socket('dealer');
  dealer.bindSync(DEALER_SOCKET);

  // forward messages between router and dealer
  router.on('message', function () {
    var frames = Array.prototype.slice.call(arguments);
    dealer.send(frames);
  });

  dealer.on('message', function () {
    var frames = Array.prototype.slice.call(arguments);
    router.send(frames);
  });

  // listen for worker processes to come online
  cluster.on('online', function () {
    // do something with a new worker, maybe keep an array of workers
  });

  // fork worker processes
  for (var i = 0; i < 100; i++) {
    cluster.fork();
  }
} else {
  // worker process - connect to the Dealer
  var responder = zmq.socket('rep');
  responder.connect(DEALER_SOCKET);

  responder.on('message', function (data) {
    // do something with the incoming data
  });
}
This is just to point you in the right direction. If you think about it, you can create a script with a parameter that tells it whether it's a master or a worker process. Then run it as-is on the main server, and run it with a worker flag on the additional servers, which forces it to connect to the main dealer.
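A sketch of that parameter idea (the flag name is hypothetical):

// decide master vs. worker from a CLI flag instead of cluster.isMaster,
// so additional servers can join as workers: node app.js --worker
var isWorker = process.argv.includes('--worker');
// on the additional servers, DEALER_SOCKET must point at the main
// server's address rather than 127.0.0.1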
Now your main app needs to send requests to the router, which will forward them to the worker processes:
var zmq = require('zmq');

var ROUTER_SOCKET = 'tcp://127.0.0.1:5555';
var requester = zmq.socket('req');

// handle replies - for example completion status from the worker processes
requester.on('message', function (data) {
  // do something with the reply
});

requester.connect(ROUTER_SOCKET);

// send requests to the router (serialize the task object first)
requester.send(JSON.stringify({
  // some object describing the task
}));
So first off, as the other replies have pointed out, running two copies of your app to avoid Heroku's limits violates their ToS, which may not be a great idea.
There is, however, some good news. For starters (from Heroku's docs):
The dyno manager will restart your dyno and log an R15 error if the memory usage of a:
free, hobby or standard-1x dyno reaches 2.5GB, five times its quota.
As I understand it, despite the fact that your dyno has 512mb of actual RAM, it'll swap out to 5x that before it actually restarts. So you can go beyond 512mb (as long as you're willing to pay the performance penalty for swapping to disk, which can be severe).
Further to that, Heroku bills by the second and allows you to scale your dyno formation up and down as needed. This is fairly easy to do within your own app by hitting the Heroku API – I see that you've tagged this with NodeJS so you might want to check out:
Heroku's node client
the very-barebones-but-still-functional toots/node-heroku module
Both of these modules allow you to scale up and down your formation of dynos — with a simple heuristic (say, always have a spare 1X dyno running), you could add capacity while you're processing a request, and get rid of the spare capacity when api requests aren't running. Given that you're billed by the second, this can end up being very inexpensive; 1X dynos work out to something like 5¢ an hour to run. If you end up running extra dynos for even a few hours a day, it's a very, very small cost to you.
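For instance, a rough sketch with the heroku-client package (the app name is a placeholder and the token is read from an environment variable):

const Heroku = require('heroku-client');
const heroku = new Heroku({ token: process.env.HEROKU_API_TOKEN });

// PATCH /apps/{app}/formation/{type} is the Platform API endpoint for
// changing how many dynos of a process type are running
function scaleWebDynos(quantity) {
  return heroku.patch('/apps/my-app/formation/web', {
    body: { quantity: quantity }
  });
}

// e.g. add a spare dyno before heavy processing, remove it afterwards
scaleWebDynos(2).then(() => console.log('scaled up to 2 web dynos'));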
Finally: there are also 3rd party services such as Adept and Hirefire (two random examples from Google, I'm sure there are more) that allow you to automate this to some degree, but I don't have any experience with them.
You certainly could, I mean, programmatically - but that would bypass Heroku's ToS:
4.4 You may not develop multiple Applications to simulate or act as a single Application or otherwise access the Heroku Services in a manner intended to avoid incurring fees.
Now, I'm not sure about this:
Because node is a single threaded application this will be a problem once we start getting higher levels of traffic from the ios end to the web server.
There are some threads discussing that, with some interesting answers:
Clustering Node JS in Heavy Traffic Production Environment
How to decide when to use Node.js?
Also, they link to this video, introducing Node.js, which talks a bit about benchmarks:
Introduction of Node JS by Ryan Dahl