Websockets - Avoid unknown connections

I have website X which runs a websocket server (socket.io, all based on node.js) and feeds clients with live data.
Now competitor Y has started connecting to our websocket server and outright stealing data from it, displaying it live on their own website.
I've made numerous attempts at blocking their IPs, but they just switch to new ones within minutes (all kinds of AWS/cloud hosting providers). I check the Referer header, User-Agent, Accept-Language, pretty much everything, but all of that is spoofable, and they already spoof it.
The websocket connections are proxied through nginx, if that helps.
What would you do?

The distinction between client X (your website) and client Y (the competitor) is meaningless to the server: each is just a client. There is no reliable (i.e. impossible to circumvent) way to tell them apart other than restricting IPs, which you already know fails. That's because client Y can easily construct an HTTP request or WebSocket connection from scratch that looks exactly like client X's. Moreover, going down that road may be a waste of time and resources: eventually your checks will be bypassed. The real question is which company has more resources to sustain this fight. :)
Authentication doesn't change much, because client Y can authenticate as well. Instead of fighting over IPs, you fight over user credentials. That might be easier, though, so it's worth trying.
So, IMHO, what you end up with is constant monitoring and reacting. If they break the law or your terms of service, you should sue them. If they don't, you can keep up this guerrilla warfare. You might win eventually, who knows.

You could verify that the Origin header in the websocket request matches the origin of your own site. However, I'm not sure whether this header can be faked at all.
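A minimal sketch of that check, assuming your pages are served from `https://www.example.com` (a placeholder). For what it's worth: browsers set Origin themselves and page scripts cannot override it, but a non-browser client such as a scraping script can send any value it likes, so this only filters out clients that don't bother to fake it.

```typescript
// Placeholder allowlist: the origin(s) your own pages are served from.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(["https://www.example.com"]);

// Returns true if a WebSocket handshake's Origin header is on the allowlist.
// A missing header is rejected, because browsers always send Origin on WebSocket handshakes.
function isAllowedOrigin(originHeader: string | undefined): boolean {
  if (originHeader === undefined) return false;
  // Scheme and host are case-insensitive, so compare in lower case.
  return ALLOWED_ORIGINS.has(originHeader.toLowerCase());
}
```

With socket.io, a function like this could be called from the server's `allowRequest` option; the `ws` package exposes `verifyClient` for the same purpose.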

An authentication subsystem is a well-known solution that addresses this problem to a large degree.
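A sketch of one common shape for such a subsystem, under assumptions that are not from the answer above: the regular (logged-in) HTTP page hands the browser a short-lived token signed with a server-side secret, and the websocket server checks it during the handshake. The token format, `SECRET`, `issueToken` and `verifyToken` are all hypothetical. As the first answer points out, this doesn't stop the competitor from registering an account and obtaining tokens too; it lets you revoke or throttle per account instead of per IP.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical secret; load it from configuration in practice, never hard-code it.
const SECRET = "replace-me";

function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Token format (hypothetical): "<userId>.<expiresAtMs>.<hexSignature>".
// userId must not contain "." for this simple format to parse.
function issueToken(userId: string, ttlMs: number, now: number = Date.now()): string {
  const payload = `${userId}.${now + ttlMs}`;
  return `${payload}.${sign(payload)}`;
}

// Returns the userId for a well-formed, correctly signed, unexpired token; otherwise null.
function verifyToken(token: string, now: number = Date.now()): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [userId, expiresAt, signature] = parts;
  const expected = Buffer.from(sign(`${userId}.${expiresAt}`), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws if the lengths differ, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const exp = Number(expiresAt);
  if (!Number.isFinite(exp) || exp < now) return null;
  return userId;
}
```

The server would reject the connection whenever `verifyToken` returns null, and otherwise attach the returned user id to the socket.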

Related

How to do authentication over web sockets?

Why am I using websockets?
I'm working on routing all my HTTPS requests via a WebSocket, because my app has a chat feature and I need to keep the WebSocket open while the app is running, so why not route all the requests through it?
My problem.
This turned out to be easier said than done. Should I use the same access token and refresh token to verify client authentication, or should I just verify it when the connection opens and then trust it for as long as it's open? So here are my questions:
Is wss (WebSocket Secure) enough to stop man-in-the-middle attacks?
Should I implement a ticket-like mechanism for every WebSocket connection that lasts 2-10 minutes, then disconnect and ask the client to reconnect?
Or should I send an access token with every request from the client?
How do I make sure that when the server sends data, it goes to the right client?
Should I just end-to-end encrypt all the payloads to avoid a lot of problems?

Or should I just verify it when the connection opens and then trust it for as long as it's open.
That is fine as long as the connection runs over a trusted channel, e.g. SSL/TLS.
Is wss (WebSocket Secure) enough to stop man-in-the-middle attacks?
Yes. wss is simply ws over SSL/TLS, provided the client validates the server's certificate (browsers do).
Should I implement a ticket-like mechanism for every WebSocket connection that lasts 2-10 minutes, then disconnect and ask the client to reconnect?
I'm not sure why you would do that. On the contrary, with a chat-like app you want to keep the connection open as long as possible. I do advise implementing ping calls on the client side and timeouts on the server side, though. With that approach you can require the client to show signs of life every, say, 30 s.
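The ping/timeout idea could look roughly like this on the server. It is a sketch under assumptions: connections are identified by a hypothetical string id, the caller closes whatever `sweep()` returns, and the clock is passed in so the logic is easy to test.

```typescript
// Tracks when each connection last showed signs of life (e.g. a client "ping" message).
class HeartbeatMonitor {
  private readonly timeoutMs: number;
  private readonly lastSeen = new Map<string, number>();

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Call on every message received from the connection, including pings.
  touch(connectionId: string, now: number = Date.now()): void {
    this.lastSeen.set(connectionId, now);
  }

  // Call when a connection closes normally.
  forget(connectionId: string): void {
    this.lastSeen.delete(connectionId);
  }

  // Removes and returns the ids silent for longer than timeoutMs;
  // the caller is expected to close those sockets.
  sweep(now: number = Date.now()): string[] {
    const stale: string[] = [];
    this.lastSeen.forEach((seen, id) => {
      if (now - seen > this.timeoutMs) stale.push(id);
    });
    stale.forEach((id) => this.lastSeen.delete(id));
    return stale;
  }
}
```

socket.io already has a built-in heartbeat (its `pingInterval`/`pingTimeout` options), so something like this mainly matters for a bare WebSocket server.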
Or should I send an access token with every request from the client?
Not necessary. With SSL/TLS you can authenticate the entire connection once and simply remember on the server side that it is authenticated. Tokens are used with classic HTTP because they make such an app easier to scale horizontally: it doesn't matter which server a request goes to, and you can even switch servers between calls without affecting auth. But with a chat-like app (or any app that requires bidirectional communication) the connection has to be persistent to begin with, so per-request tokens introduce unnecessary overhead.
How do I make sure that when the server sends data, it goes to the right client?
I'm not sure what you mean by that. That's pretty much what TCP + SSL/TLS guarantees anyway, the same as for any other protocol over secure TCP. Or do you mean at the application level? Then you have to associate each user with their corresponding connection(s) once authenticated; the server has to track this.
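At the application level, "tracking this" usually means a map from each authenticated user to that user's open connections (several tabs or devices). A minimal in-memory sketch; `Sendable` is a stand-in for whatever socket type you use, and the user id is assumed to come from the one-time authentication on connect.

```typescript
// Minimal interface for anything that can send a message; a real WebSocket object fits it.
interface Sendable {
  send(data: string): void;
}

// Maps each authenticated user to all of their open connections.
class ConnectionRegistry<C extends Sendable> {
  private readonly byUser = new Map<string, Set<C>>();

  add(userId: string, conn: C): void {
    let conns = this.byUser.get(userId);
    if (conns === undefined) {
      conns = new Set<C>();
      this.byUser.set(userId, conns);
    }
    conns.add(conn);
  }

  // Call when a connection closes; drops the user entry once no connections remain.
  remove(userId: string, conn: C): void {
    const conns = this.byUser.get(userId);
    if (conns === undefined) return;
    conns.delete(conn);
    if (conns.size === 0) this.byUser.delete(userId);
  }

  // Sends only to the given user's connections; returns how many connections got the message.
  sendToUser(userId: string, data: string): number {
    const conns = this.byUser.get(userId);
    if (conns === undefined) return 0;
    conns.forEach((conn) => conn.send(data));
    return conns.size;
  }
}
```

In socket.io, rooms do this bookkeeping for you: join each socket to a room named after its user id, then emit to that room.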
Should I just end-to-end encrypt all the payloads to avoid a lot of problems?
What problems? E2E encryption serves a very different purpose: it guarantees that you, i.e. the server, cannot read messages; only the peers can. That gives users a high level of privacy, so this is a business decision, not a technical or security one. Do you want full control over conversations? Then obviously you can't go with E2E. If, on the other hand, you want to give your users the highest level of privacy, it is a good (if not mandatory) approach. Note that full-featured E2E is inherently more difficult to implement than non-E2E.
I need to keep the WebSocket open while the app is running, so why not route all the requests through it?
That is an interesting approach. I'm thinking about doing that myself (and will most likely try it out). One advantage is that all communication goes through a single protocol, which is easier to debug. Another is that with a well-designed protocol you can achieve higher performance. The disadvantage is that classic HTTP is well understood, with lots of tools and conventions (e.g. REST) built around it, and security, binary streaming (e.g. file serving), etc. are often handled out of the box. So it feels a bit like reinventing the wheel. Either way, good luck with it; hopefully you can come back and tell us how it went.
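For the route-everything-over-one-socket idea, the first piece of such a protocol is usually request/response correlation: tag each request with an id and match replies to it. A minimal client-side sketch; the `{ id, method, body }` envelope is a made-up format, and `sendRaw` stands in for the socket's `send`.

```typescript
// Hypothetical envelope format for multiplexing request/response pairs over one WebSocket.
interface RequestMsg {
  id: number;
  method: string;
  body: unknown;
}

interface ResponseMsg {
  id: number;
  ok: boolean;
  body: unknown;
}

type ResponseCallback = (res: ResponseMsg) => void;

// Client-side correlator: tags each outgoing request with an id and hands the
// matching response to the caller's callback.
class RequestCorrelator {
  private nextId = 1;
  private readonly pending = new Map<number, ResponseCallback>();
  private readonly sendRaw: (text: string) => void;

  // sendRaw stands in for something like (text) => socket.send(text).
  constructor(sendRaw: (text: string) => void) {
    this.sendRaw = sendRaw;
  }

  request(method: string, body: unknown, cb: ResponseCallback): number {
    const id = this.nextId++;
    this.pending.set(id, cb);
    const msg: RequestMsg = { id, method, body };
    this.sendRaw(JSON.stringify(msg));
    return id;
  }

  // Feed every incoming text frame here. Returns false for ids that are unknown
  // or already answered (e.g. server push messages handled elsewhere).
  handleIncoming(text: string): boolean {
    const msg = JSON.parse(text) as ResponseMsg;
    const cb = this.pending.get(msg.id);
    if (cb === undefined) return false;
    this.pending.delete(msg.id);
    cb(msg);
    return true;
  }
}
```

A real version would also need per-request timeouts and a policy for requests still pending when the socket drops, which is part of the wheel-reinventing mentioned above.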

Node.js design approach: clients periodically polling the server

I'm trying to learn Node.js and appropriate design approaches.
I've implemented a small API server (using express) that fetches a set of data from several remote sites according to client requests made through the API.
This process can take some time (several fetch/await calls), so I want users to know how their request is progressing. I've read about socket.io/WebSockets, but maybe that's overkill for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I update and display the current request status every X seconds, i.e. polling.
The solution works fine, even with several clients connecting concurrently, but is there a better approach? Are there any caveats I'm not considering?
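The requestID/status approach described in the question boils down to a small status store on the server. A sketch with hypothetical names (`JobStore`, `advance`); it is in-memory, so statuses are lost on restart and are not shared between servers.

```typescript
type JobState = "running" | "done" | "failed";

interface JobStatus {
  state: JobState;
  completed: number; // remote fetches finished so far
  total: number; // remote fetches this request needs
}

class JobStore {
  private readonly jobs = new Map<string, JobStatus>();
  private counter = 0;

  // Registers a new request and returns its requestID. A counter keeps the sketch
  // deterministic; real IDs should be unguessable (e.g. random UUIDs).
  create(total: number): string {
    const id = `req-${++this.counter}`;
    this.jobs.set(id, { state: total === 0 ? "done" : "running", completed: 0, total });
    return id;
  }

  // Records one finished fetch; marks the job done once all fetches have completed.
  advance(id: string): void {
    const job = this.jobs.get(id);
    if (job === undefined || job.state !== "running") return;
    job.completed += 1;
    if (job.completed >= job.total) job.state = "done";
  }

  fail(id: string): void {
    const job = this.jobs.get(id);
    if (job !== undefined) job.state = "failed";
  }

  // What the status endpoint returns; undefined means an unknown requestID.
  get(id: string): JobStatus | undefined {
    const job = this.jobs.get(id);
    return job === undefined ? undefined : { ...job };
  }
}
```

An express status route could respond with `store.get(req.params.id)` as JSON, or 404 when it is undefined. One caveat worth adding: finished entries should be evicted after a while, otherwise the map grows without bound.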

TL;DR: The approach you're using is just fine, although it may not scale very well. WebSockets are a different approach to the same problem, but they may not scale very well either.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
My understanding is that the overhead of WebSockets is lower, but it can still be a concern with large numbers of clients: a lot of clients means the server is managing a lot of open connections.
Large services handle this by designing their applications so they can be distributed over many identical servers, with a load balancer deciding which server each client connects to. This is true for either polling or WebSockets.

What are some strategies to prevent flooding/abuse of API requests?

I have an API on my server (Node) that writes new data into my database.
To use the API, the user must provide a token that acts as an identifier. So if someone floods my database or abuses the API, I can tell who it is.
But what techniques can I use to prevent someone from flooding or hanging my server altogether? Note that most requests to the API are made by the server itself, so in theory I might get dozens of requests a second from my own server's address.
I'd love some references to reading material.
Thanks!

You could use this module, https://www.npmjs.com/package/ddos, to set limits depending on the user.
However, you will still be exposed to larger-scale DDoS attacks. These cannot be stopped at the Node.js level, since they often target the infrastructure itself. That is another can of worms, though.

Try configuring limits on your proxy and/or load balancer.
Alternatively, you can use the rate-limiter-flexible package to limit the number of requests per user per N seconds.
It also supports a blacklist and whitelist, so you can whitelist your server's IP.
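The per-user limit idea, sketched without any library: a fixed-window counter keyed by API token (or IP), with an allowlist for your own server's address. This is not the rate-limiter-flexible API, just a self-contained illustration; the class name and parameters are hypothetical, and state lives in a single process's memory.

```typescript
// In-memory fixed-window limiter: at most `limit` requests per key per `windowMs`.
// Keys could be API tokens (per user) or client IPs; allowlisted keys are never limited.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly allowlist: Set<string>;

  constructor(limit: number, windowMs: number, allowlist: readonly string[] = []) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.allowlist = new Set(allowlist);
  }

  // Returns true if the request may proceed, false if it should be rejected (e.g. HTTP 429).
  consume(key: string, now: number = Date.now()): boolean {
    if (this.allowlist.has(key)) return true;
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      // First request of a new window for this key.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

With several Node processes or servers, the counters have to live in a shared store such as Redis, which rate-limiter-flexible supports.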

How to handle sticky-sessions with Socket.io 1.0 when behind a firewall?

I am trying to set up a POC for myself using Nginx, Node.js and Socket.io 1.0 with clustering on Rackspace. I'm assuming I need clustering because I want this to scale across multiple servers if needed. I want each node to have its own instance, and as of now I can't see any reason for the instances to talk to each other. Again, as of now, I believe I need clustering simply because I may have many clients connecting to this server and I want it to grow and shrink accordingly. My end goal is to build a little POC similar to the one shown here: https://cloud.google.com/developers/articles/real-time-gaming-with-node-js-websocket-on-gcp
I just established what I believe is a valid setup of the new Socket.io 1.0, but when I connect from different devices behind my router, they all show the same PID in my logs, and I assume this is due to the sticky sessions Socket.io requires. I'm not sure whether this is the same as the worker process we used to get with clustering, but again, I'm still trying to wrap my head around all this.
First, I want to know whether clustering and sticky sessions are required. Since only one PID is assigned per external IP, is there any way to have each computer treated as its own instance? I don't want to send back a response that updates everyone behind that IP.
My second question may be a stupid one, but I'm asking anyway :) While reading about how to get sticky sessions working, I kept seeing people say to "use sticky sessions, like by IP address". The word "like" is what got me. I found people referring to sticky sessions by IP and by cookies. Can you do it by anything else, such as a username, an issued token or something similar? My concern is that if someone is playing on a mobile device and switches cell towers, they will get a new IP, which in turn means a new PID, and essentially that player's game is lost. Am I understanding this right?
Please forgive me, as I am new to Node.js, but I thought this would be a cool way to learn Node.js and clustering in the cloud. Any info or direction anyone can provide would be a great help. Most tutorials seem to broadcast events to everyone, but I am looking for a scalable solution where, most of the time, each connection can be sent events individually. I also need several people behind the same firewall to be treated as separate connections when the server communicates with them. Again, if there is any reading or tutorial that you feel may help me with Socket.io 1.0 and what I am trying to do, please reply. Thanks!

In general, since you are using WebSockets, you don't need to worry about stickiness as long as the connection stays open: the communication is bidirectional and the underlying HTTP connection is kept alive. If the connection drops, the client essentially reconnects and starts over. So yes, if someone's IP changes, they will get a new server socket.
Refer to the Socket.io article "Using multiple nodes", which states the sticky-session requirement for XHR/JSONP long-polling clients.
I don't believe nginx can load-balance on things like MAC addresses, judging by the nginx load-balancing techniques documentation.
I think you may need a solid load balancer that can route on MAC addresses, virtual port IDs or certain headers.
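On routing by something other than IP: a client's MAC address is not visible beyond the client's local network, so a load balancer on the internet never sees it. A stable identifier such as a session token in a cookie or header does survive a change of IP. A sketch of picking a backend from such a key; `pickBackend` and the FNV-1a hash are illustrative choices, not part of Socket.io or nginx.

```typescript
// 32-bit FNV-1a hash over UTF-16 code units: short, dependency-free, not cryptographic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Deterministically maps a stable client key (e.g. a session token) to one backend,
// so a reconnecting client lands on the same server even if its IP changed.
function pickBackend(clientKey: string, backends: readonly string[]): string {
  if (backends.length === 0) throw new Error("no backends configured");
  return backends[fnv1a(clientKey) % backends.length];
}
```

Plain modulo hashing remaps most clients whenever a backend is added or removed; consistent hashing avoids that. nginx's upstream `hash` directive (optionally with `consistent`) accepts an arbitrary key such as a cookie variable, which may be worth checking before reaching for a separate load balancer.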

Dealing with / preventing potentially malicious requests (AWS, Node.js)

I have a server running on AWS; it's load-balanced across some EC2 instances that run Node.js servers. The security groups are set up so that only the LB can reach them on the HTTP port.
I was tailing some log files and saw bursts of requests (50 or so at a time, seemingly periodic) to /manager/html. AFAIK this looks like an attempt to exploit a vulnerability in my app or to gain access to a database manager of some sort.
My questions are:
Am I being targeted or are these random crawlers? This is on a service that is not even launched yet, so it's definitely obscure. There's been a bit of press about the service, so it's feasible that a person would be aware of our domain, but this subdomain has not been made public.
Are there common conventions for not allowing these types of requests to hit my instances? Preferably, I'd be able to configure some sort of frequency or blacklist in my LB, and never have these types of requests hit an instance. Not sure how to detect malicious vs normal traffic though.
Should I be running a local proxy on my ec2 instances to avoid this type of thing? Are there any existing node.js solutions that can just refuse the requests at the app level? Is that a bad idea?
Bonus: If I were to log the origin of these requests, would that information be useful? Should I try to go rogue and hunt down the origin and send some hurt their way? Should I beeswithmachineguns the originating IP if it's a single origin? (I realize this is silly, but may inspire some fun answers).
Right now these requests are not affecting me: they get 401s or 404s, and they have virtually no impact on other clients. But if this were to scale up, what are my options?

There are lots of random automated requests out there; even though I host a Node.js server, bots try cgi and phpMyAdmin/WordPress paths on it. You can use basic rate-limiting techniques, such as redis-throttle (https://npmjs.org/package/redis-throttle) for your Node.js server and fail2ban for SSH, to protect yourself from simple DoS attacks.
Automated requests cannot do harm unless Node.js or the libraries you use have well-known flaws, so you should always validate input and check security throughout your server. If you coded it well, you shouldn't worry (don't dump errors to users, sanitize input, etc.).
You can log your 401s and 404s for a week and filter the most common ones at your LB. Hunting down the IPs and sources won't help unless you're a Hollywood producer or fighting terrorists: your problem isn't that important, and, most importantly, these requests mostly come from botnets.
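The "log your 401s and 404s for a week" step can be as simple as counting rejected paths. A sketch assuming log lines have already been parsed into a hypothetical `LogEntry` shape.

```typescript
interface LogEntry {
  path: string;
  status: number;
}

// Counts 401/404 responses per path and returns the `topN` most frequent
// as [path, count] pairs (ties broken alphabetically), as blocklist candidates.
function topRejectedPaths(entries: readonly LogEntry[], topN: number): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const e of entries) {
    if (e.status === 401 || e.status === 404) {
      counts.set(e.path, (counts.get(e.path) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
}
```

The top entries (things like /manager/html) are the candidates for filtering rules at the LB.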

We faced similar issues in the past and took some preventive measures. They can't guarantee to stop such attacks completely, but they reduced them significantly.
http://uksysadmin.wordpress.com/2011/03/21/protecting-ssh-against-brute-force-attacks/
http://www.prolexic.com/knowledge-center-white-paper-ddos-mitigation-incident-response-plan-playbook.html
https://serverfault.com/questions/340307/how-can-i-prevent-a-ddos-attack-on-amazon-ec2
Hope this helps.

Consider running a caching proxy such as Varnish in front of your app servers. Use its VCL to allow access only to the URIs you define and reject everything else, allow GET but block PUT and POST, etc. It can also filter the HTTP response headers you return, which would let you disguise your Node.js server as Apache, for example. There are many tutorials online on how to set this up.
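VCL is Varnish's own configuration language, so the following is not VCL; it's the same allowlist logic as a plain TypeScript function, which could also run as the first middleware in the Node app. The route table is a placeholder.

```typescript
// Placeholder allowlist: the only route prefixes the app serves, and the methods each accepts.
const ALLOWED_ROUTES: ReadonlyArray<{ prefix: string; methods: readonly string[] }> = [
  { prefix: "/api/", methods: ["GET"] },
  { prefix: "/static/", methods: ["GET", "HEAD"] },
];

// Returns true only when the path starts with an allowed prefix and the method is
// permitted for it; everything else (e.g. /manager/html) is rejected.
// Assumes `path` is already normalized (query string stripped, no ".." segments).
function isAllowedRequest(method: string, path: string): boolean {
  const m = method.toUpperCase();
  return ALLOWED_ROUTES.some((r) => path.startsWith(r.prefix) && r.methods.indexOf(m) !== -1);
}
```

Normalizing the path before the check matters: without it, a path such as /static/../manager/html would pass a naive prefix test.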
