Dealing with / preventing potentially malicious requests (AWS, Node.js) - node.js

I have a server that is running on aws - it's load balanced to some ec2 instances that run node.js servers. The security groups are set up so that only the LB can hit them on the HTTP port.
I was tailing some log files, and saw a bunch (50 or so at a time, seemingly somewhat periodically) of requests to /manager/html - AFAIK this looks like an attempt to exploit a vulnerability in my app or gain access to a database manager of some sort.
My questions are:
Am I being targeted or are these random crawlers? This is on a service that is not even launched yet, so it's definitely obscure. There's been a bit of press about the service, so it's feasible that a person would be aware of our domain, but this subdomain has not been made public.
Are there common conventions for not allowing these types of requests to hit my instances? Preferably, I'd be able to configure some sort of frequency or blacklist in my LB, and never have these types of requests hit an instance. Not sure how to detect malicious vs normal traffic though.
Should I be running a local proxy on my ec2 instances to avoid this type of thing? Are there any existing node.js solutions that can just refuse the requests at the app level? Is that a bad idea?
Bonus: If I were to log the origin of these requests, would that information be useful? Should I try to go rogue and hunt down the origin and send some hurt their way? Should I beeswithmachineguns the originating IP if it's a single origin? (I realize this is silly, but may inspire some fun answers).
Right now these requests are not affecting me: they get 401s or 404s, and they have virtually no impact on other clients. But if this were to go up in scale, what are my options?

There are a lot of random automated requests out there; even though I host a Node.js server, they probe for CGI scripts and phpMyAdmin/WordPress configs. You can use basic rate limiting, for example [redis-throttle](https://npmjs.org/package/redis-throttle) for your Node.js server and fail2ban for SSH, to protect yourself from simple DoS attacks.
Automated requests cannot do harm unless Node.js or the libraries you use have well-known flaws, so you should always validate input and apply security checks throughout your server. You should not be worried if you coded well. (Don't dump errors to users, sanitize input, etc.)
You can log your 401s and 404s for a week, and filter the most common ones via your LB. Hunting down the IPs and sources will not help you unless you are a Hollywood producer or fighting terrorists, as your problem is not that important and, most importantly, these requests mostly come from botnets.
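As a rough illustration of the "basic rate limiting" idea, here is a minimal in-memory fixed-window limiter. It is only a sketch and does not use the redis-throttle API; the names `createRateLimiter` and `isAllowed` are made up for this example, and with several instances behind a load balancer you would need a shared store (which is what a Redis-backed module provides).

```javascript
// Minimal in-memory fixed-window rate limiter (illustrative sketch only).
// Each process counts independently, so this is NOT shared across instances.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const hits = new Map(); // key -> { count, windowStart }

  return function isAllowed(key) {
    const t = now();
    const entry = hits.get(key);
    // Start a fresh window if this key is new or its window has expired.
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Hypothetical usage inside an http request handler, assuming the load
// balancer appends the client address to the X-Forwarded-For header:
//   const isAllowed = createRateLimiter({ limit: 100, windowMs: 60000 });
//   const ip = (req.headers['x-forwarded-for'] || '').split(',').pop().trim();
//   if (!isAllowed(ip)) { res.writeHead(429); return res.end(); }
```

A long-running process would also want to evict old entries from the map; that is left out here to keep the sketch short.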

We faced similar issues in the past and took some preventive measures. They can't guarantee to stop such attacks completely, but they reduced them significantly.
http://uksysadmin.wordpress.com/2011/03/21/protecting-ssh-against-brute-force-attacks/
http://www.prolexic.com/knowledge-center-white-paper-ddos-mitigation-incident-response-plan-playbook.html
https://serverfault.com/questions/340307/how-can-i-prevent-a-ddos-attack-on-amazon-ec2
Hope this helps.

Consider running a proxy cache like Varnish in front of your app servers. Use its VCL to allow access only to the URIs you define and reject everything else, allow GET but block PUT and POST, etc. It can also be used to filter the HTTP response headers you return, which would let you mask your node.js server as Apache, for example. There are many tutorials on the net for implementing this.
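If you would rather refuse such requests at the app level, the same allowlist idea can be sketched in Node.js. This is not VCL and not a replacement for a proxy; the paths and the `isRequestAllowed` helper are hypothetical.

```javascript
// Allowlist of paths and the HTTP methods permitted on each (hypothetical).
const ALLOWED = new Map([
  ['/api/items', new Set(['GET'])],
  ['/health', new Set(['GET', 'HEAD'])],
]);

// Returns true only if the path is known and the method is permitted on it.
function isRequestAllowed(method, url) {
  const path = url.split('?')[0]; // ignore the query string
  const methods = ALLOWED.get(path);
  return methods !== undefined && methods.has(method);
}

// Hypothetical usage at the top of a request handler:
//   if (!isRequestAllowed(req.method, req.url)) {
//     res.writeHead(404);
//     return res.end();
//   }
```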

Related

Websockets - Avoid unknown connections

I have website X which runs a websocket server (socket.io, all based on node.js) and feeds clients with live data.
Now, competitor Y started connecting to our websocket server and straight out stealing data from it, displaying it also live on their website.
I've made numerous attempts at blocking their IPs, but they'll just keep changing it within a matter of minutes. (All kinds of AWS / cloud hosting providers). I check the referrer header, user-agent, accept-language, pretty much anything but all of that is spoofable and they do this already.
The websocket connections are proxied through nginx, if that helps.
What would you do?
Client X (the website) or client Y (the competitor) is meaningless. It's just a client. There is no reliable (i.e. impossible to hack) way to distinguish them unless you restrict IPs (which you already know fails). That's because client Y can easily construct an HTTP request/Websocket connection from scratch so that it looks like client X. And there's more: going down that road might be a waste of time and other resources. Eventually you will be hacked. The question is: which company has more resources to withstand this fight? :)
Authentication doesn't change much. Because client Y can authenticate as well. It's just instead of fighting with IPs you fight with user credentials. It might be easier though. You should try it.
So IMHO all in all what you end up with is constant monitoring and reactions. If they break the law or an agreement then you should sue them. If they don't then you can try this guerrilla warfare. You might win eventually, who knows.
You could verify that the Origin header in the websocket request matches the origin of your clients. Browsers do not let page scripts change this header, but a non-browser client can send any Origin value it likes, so this only stops other websites from connecting directly from their visitors' browsers.
An Authentication subsystem is a well-known solution to this problem to a large degree.
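As a rough sketch of what such an authentication subsystem could look like, the server can hand out short-lived signed tokens to its own pages and require one when a websocket connection is opened. As noted above, this does not stop a competitor who obtains tokens the same way a normal client does; it only ties each connection to something you can expire or revoke. The names `issueToken`, `verifyToken` and `SECRET` are assumptions for this example, not part of socket.io.

```javascript
const crypto = require('crypto');

// Hypothetical secret; in practice load it from configuration, not source.
const SECRET = 'replace-with-a-real-secret';

function sign(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Token format: "<userId>.<expiryMs>.<hex HMAC of the first two parts>"
function issueToken(userId, ttlMs, now = Date.now()) {
  const payload = `${userId}.${now + ttlMs}`;
  return `${payload}.${sign(payload)}`;
}

// Returns the userId if the token is authentic and not expired, else null.
function verifyToken(token, now = Date.now()) {
  const idx = token.lastIndexOf('.');
  if (idx < 0) return null;
  const payload = token.slice(0, idx);
  const given = Buffer.from(token.slice(idx + 1));
  const expected = Buffer.from(sign(payload));
  // timingSafeEqual requires equal lengths, so check that first.
  if (given.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(given, expected)) return null;
  const sep = payload.lastIndexOf('.');
  const expires = Number(payload.slice(sep + 1));
  return now < expires ? payload.slice(0, sep) : null;
}
```

The page would receive a token from an authenticated HTTP endpoint and pass it along when connecting; the server would call `verifyToken` before accepting the connection.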

What are some strategies to prevent flooding/abuse of api requests

I have an API on my server(node) that writes new data into my database.
To use the API the user is required to provide a token which acts as an identifier. So if someone floods my database or abuses the api, I can tell who it is.
But what are some techniques I can use to prevent someone from flooding or hanging my server altogether? Note that most requests to the API are made by the server itself, so in theory I might get dozens of requests a second from my own server's address.
I'd love to get some references to reading materials.
Thanks!
You could use this module: https://www.npmjs.com/package/ddos to put limits depending on the user.
However you will still be exposed to larger scale ddos attacks. These attacks cannot be stopped at the node.js level since they often target infrastructure. This is another can of worms however.
Try to configure limits on the proxy and/or load balancer.
Alternatively, you can use the rate-limiter-flexible package to limit the number of requests per user per N seconds.
It also supports black and white lists, so you're able to whitelist your server's IP.
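To make the "limit per user, but whitelist your own server" idea concrete, here is a small token-bucket sketch. It is not the rate-limiter-flexible API; `createTokenBucketLimiter` and its options are hypothetical names, and the state lives in process memory only.

```javascript
// Per-key token bucket with a whitelist (illustrative sketch only).
// capacity: maximum burst size; refillPerSec: sustained requests per second.
function createTokenBucketLimiter({ capacity, refillPerSec, whitelist = [], now = Date.now }) {
  const trusted = new Set(whitelist);
  const buckets = new Map(); // key -> { tokens, last }

  return function tryConsume(key) {
    if (trusted.has(key)) return true; // e.g. the server's own address
    const t = now();
    const b = buckets.get(key) || { tokens: capacity, last: t };
    // Refill in proportion to the time elapsed, capped at capacity.
    const elapsedSec = (t - b.last) / 1000;
    b.tokens = Math.min(capacity, b.tokens + elapsedSec * refillPerSec);
    b.last = t;
    buckets.set(key, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  };
}

// Hypothetical usage: key by API token, exempt the local server.
//   const tryConsume = createTokenBucketLimiter({
//     capacity: 20, refillPerSec: 5, whitelist: ['127.0.0.1'],
//   });
//   if (!tryConsume(apiToken)) { /* respond with 429 */ }
```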

Setting up a secure back-end NodeJS server for multiple front-end domains

I've been doing a lot of research recently on creating a backend for all the websites that I run and a few days ago I leased a VPS running Debian.
Long-term, I'd like to use it as the back-end for some web applications. However, these client-side javascript apps are running on completely different domains than the VPS domain. I was thinking about running the various back-end applications on the VPS as daemons. For example, daemon 1 is a python app, daemons 2 and 3 are node js, etc. I have no idea how many of these I might eventually create.
Currently, I only have a single NodeJS app running on the VPS. I want to implement two methods on it listening over some arbitrary port, port 4000 for example:
/GetSomeData (GET request) - takes some params and serves back some JSON
/AddSomeData (POST request) - takes some params and adds to a back-end MySQL db
These methods should only be useable from one specific domain (called DomainA) which is different than the VPS domain.
Now one issue that I feel I'm going to hit my head against is CORS policy. It sounds like I need to include a response header for Access-Control-Allow-Origin: DomainA. The problem is that in the future, I may want to add another acceptable requester domain, for example DomainB. What would I do then? Would I need to validate the incoming request.connection.remoteAddress, and if it matched DomainA/DomainB, write the corresponding Access-Control-Allow-Origin?
As of about 5 minutes ago before posting this question, I came across this from the W3C site:
Resources that wish to enable themselves to be shared with multiple Origins but do not respond uniformly with "*" must in practice generate the Access-Control-Allow-Origin header dynamically in response to every request they wish to allow. As a consequence, authors of such resources should send a Vary: Origin HTTP header or provide other appropriate control directives to prevent caching of such responses, which may be inaccurate if re-used across-origins.
Even if I do this, I'm a little worried about security. By design anyone on my DomainA website can use the web app, you don't have to be a registered user. I'm concerned about attackers spoofing their IP address to be equal to DomainA. It seems like it wouldn't matter for the GetSomeData request since my NodeJS would then send the data back to DaemonA rather than the attacker. However, what would happen if the attackers ran a script to POST to AddSomeData a thousand times? I don't want my sql table being filled up by malicious requests.
On another note, I've been reading about nginx and virtual hosts and how you can use them to establish different routes depending on the incoming domain but I don't BELIEVE that I need these things; however perhaps I'm mistaken.
Once again, I don't want to use the VPS as a web-site server, the Node JS listener is going to be returning some collection of JSON hence why I'm not making use of port 80. In fact the primary use of the VPS is to do some heavy manipulation of data (perhaps involving the local MySQL db) and then return a collection of JSON that any number of front-end client browser apps can use.
I've also read some recommendations about making use of NodeJS Restify or ExpressJS. Do I need these for what I'm trying to do?
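The dynamic header generation described in the W3C passage can be sketched as below. The check is against the request's Origin header, not request.connection.remoteAddress, which is the address of the visitor's machine rather than of DomainA. The origins and the `corsHeaders` helper are hypothetical.

```javascript
// Hypothetical allowed front-end origins (scheme + host, no trailing slash).
const ALLOWED_ORIGINS = new Set([
  'https://domain-a.example',
  'https://domain-b.example',
]);

// Builds the CORS-related response headers for one request.
// Vary: Origin is always sent so caches do not reuse a response across origins.
function corsHeaders(requestOrigin) {
  const headers = { Vary: 'Origin' };
  if (requestOrigin && ALLOWED_ORIGINS.has(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
  }
  return headers;
}

// Hypothetical usage in a request handler:
//   res.writeHead(200, {
//     'Content-Type': 'application/json',
//     ...corsHeaders(req.headers.origin),
//   });
```

CORS is enforced by browsers only, so it does not stop a script from POSTing to /AddSomeData directly; that concern needs server-side measures such as rate limiting.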

Optimizing Node.js for a large number of outbound HTTP requests?

My node.js server is experiencing times when it becomes slow or unresponsive, even occasionally resulting in 503 gateway timeouts when attempting to connect to the server.
I am 99% sure (based upon tests that I have run) that this lag is coming specifically from the large number of outbound requests I am making with the node-oauth module to contact external APIs (Facebook, Twitter, and many others). Admittedly, the number of outbound requests being made is relatively large (in the order of 30 or so per minute). Even worse, this frequently means that the corresponding inbound requests to my server can take ~5-10 seconds to complete. However, I had a previous version of my API which I had written in PHP which was able to handle this amount of outbound requests without any problem at all. Actually, the CPU usage for the same number (or even fewer) requests with my Node.js API is about 5x that of my PHP API.
So, I'm trying to isolate where I can improve upon this, and most importantly to make sure that 503 timeouts do not occur. Here's some stuff I've read about or experimented with:
This article (by LinkedIn) recommends turning off socket pooling. However, when I contacted the author of the popular nodejs-request module, his response was that this was a very poor idea.
I have heard it said that setting "http.globalAgent.maxSockets" to a large number can help, and indeed it did seem to reduce bottlenecking for me
I could go on, but in short, I have been able to find very little definitive information about how to optimize performance so these outbound connections do not lag my inbound requests from clients.
Thanks in advance for any thoughts or contributions.
FWIW, I'm using express and mongoose as well, and my servers are hosted on the Amazon Cloud (2x M1.Large for the node servers, 2x load balancers, and 3x M1.Small MongoDB instances).
It sounds to me like the Agent is capping your requests at the default level of 5 sockets per host (the default in older Node.js versions; since 0.12 the default is unlimited). Your tests show that cranking up the agent's maxSockets helped... you should do that.
You can prove this is the issue by firing up a packet sniffer, or adding more debugging code to your application, to show that this is the limiting factor.
http://engineering.linkedin.com/nodejs/blazing-fast-nodejs-10-performance-tips-linkedin-mobile
Disable the agent altogether.
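Both suggestions can be expressed through the `agent` option of Node's http module. The sketch below is only an illustration; the limit of 50 and the `requestOptions` helper are arbitrary choices for this example, and the right value depends on your workload.

```javascript
const http = require('http');

// Option 1: a dedicated agent with a higher per-host socket cap.
// (For HTTPS endpoints the equivalent is https.Agent.)
const apiAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Option 2: pass agent: false, which makes the request use a new
// one-off agent with default values instead of the shared pool.
function requestOptions(hostname, path, { pooled = true } = {}) {
  return {
    hostname,
    path,
    method: 'GET',
    agent: pooled ? apiAgent : false,
  };
}

// Hypothetical usage:
//   http.request(requestOptions('api.example.com', '/v1/me'), (res) => { ... }).end();
```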

How do I maintain state across multiple web servers?

Can I have multiple web servers hooked up to a SQL Server cluster and still maintain a user's session?
I've thought of various approaches. The one suggested by the Microsoft site is to use response.redirect to the "correct" server. While I can understand the reasoning for this, it seems kind of short sighted.
If the load balancer is sending you to the server currently under the least strain, surely as a developer you should honor that?
Are there any best practices to follow in this instance? If so, I would appreciate knowing what they are and any insights into the pros/cons of using them.
Some options:
The load balancer can be configured to have sticky sessions. Make sure your app session timeout is less than the load balancer's, or you'll get bounced around with unpredictable results.
You can use a designated state server to handle session. Then it won't matter where they get bounced by the LB.
You can use SQL server to manage session.
Check this on serverfault.
https://serverfault.com/questions/19717/load-balanced-iis-servers-with-asp-net-inproc-session
I'm talking here from my experience of Java App Servers, some with very sophisticated balancing algorithms.
A reasonable general assumption is that "Session Affinity" is preferable to balancing every request. If we allocate the initial request for each user with some level of work-load knowledge (or even on a random basis) and the population comes and goes, then we do end up with reasonable behaviour. Remember that the objective is to give each user a good experience, not to end up with evenly used servers!
In the event of a server failing we can then see our requests move elsewhere and we expect to see our session transferred. There are lots of ways to achieve that (session in DB, session state propagated via high speed messaging ...).
This probably isn't the answer you're looking for, but can you eliminate the NEED for session state? We've gone to great lengths to encode whatever we might need between requests in the page itself. That way I have no concern for state across a farm or scalability issues with having to hang onto something owned by someone who might never come back.
While you could use "sticky" sessions in your load balancer, a more optimal path is to have your Session use a State Server instead of InProc. At that point, all of your webservers can point to the same state server and share session.
http://msdn.microsoft.com/en-us/library/ms972429.aspx MSDN has plenty to say on the subject :D
UPDATE:
The State Server is a service on your windows server boxes, but yeah it produces a single point of failure.
Additionally, you could specify serialization of the session to a SQL Server, which wouldn't be a single point of failure if you had it farmed.
I'm not sure of how "heavy" the workload is for a state server, does anyone else have any metrics?
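The common thread in the state-server and SQL options is that session data lives outside any single web server. A toy sketch of that idea follows, written in JavaScript rather than ASP.NET and using an in-memory object as a stand-in for the state server or database; all names here are hypothetical.

```javascript
// Stand-in for an external session store (state server, SQL, etc.).
// Values are serialized to JSON to mimic crossing a process boundary.
class SessionStore {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.data = new Map(); // sessionId -> { json, expires }
  }
  get(id) {
    const entry = this.data.get(id);
    if (!entry || entry.expires <= this.now()) return null;
    return JSON.parse(entry.json);
  }
  set(id, session) {
    this.data.set(id, { json: JSON.stringify(session), expires: this.now() + this.ttlMs });
  }
}

// Each "web server" keeps no session state of its own.
function makeServer(name, store) {
  return function handle(sessionId) {
    const session = store.get(sessionId) || { visits: 0 };
    session.visits += 1;
    store.set(sessionId, session);
    return { servedBy: name, visits: session.visits };
  };
}
```

Because both handlers read and write the same store, it does not matter which one the load balancer picks for a given request.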
