Load balancing sockets on a horizontally scaling WebSocket server? - node.js

Every few months, when thinking through a personal project that involves sockets, I find myself asking: "How would you properly load balance sockets on a dynamic, horizontally scaling WebSocket server?"
I understand the theory behind horizontally scaling WebSockets and using pub/sub models to get data to the server that holds the socket connection for a specific user. I think I understand ways to identify the server with the fewest current socket connections, which is where I would want to route a new socket connection to. What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count.
I don't imagine this answer would be tied to a specific server implementation, but rather could be applied to most servers. I could easily see myself implementing this with Vert.x, Node.js, or even Perfect.

First off, you need to define the bounds of the problem you're asking about. If you're truly talking about dynamic horizontal scaling where you spin up and down servers based on total load, then that's an even more involved problem than just figuring out where to route the latest incoming new socket connection.
To solve that problem, you have to have a way of "moving" a socket from one host to another so you can clear connections from a host that you want to spin down (I'm assuming here that true dynamic scaling goes both up and down). The usual way I've seen that done is by engaging a cooperating client where you tell the client to reconnect and when it reconnects it is load balanced onto a different server so you can clear off the one you wanted to spin down. If your client has auto-reconnect logic already (like socket.io does), you can just have the server close the connection and the client will automatically re-connect.
As for load balancing the incoming client connections, you have to decide what load metric you want to use. Ultimately, you need a score for each server process that tells you how "busy" you think it is so you can put new connections on the least busy server. A rudimentary score would just be the number of current connections. If you have large numbers of connections per server process (tens of thousands) and there's no particular reason in your app that some might be much busier than others, then the law of large numbers probably averages out the load, so you could get away with just counting how many connections each server has. If the use of connections is not that fair or even, then you may also have to factor in some sort of moving average of the CPU load over time along with the total number of connections.
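A load score of this kind could be sketched roughly as follows. This is only an illustration: the ServerLoad class, the smoothing factor, the 50,000-connection ceiling, and the 50/50 weighting are all assumptions made for the example, not values from any particular server.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    """Hypothetical per-process load tracker; all constants are assumptions."""
    connections: int = 0
    cpu_ema: float = 0.0  # smoothed CPU utilisation in the range [0, 1]
    alpha: float = 0.2    # smoothing factor for the moving average

    def record_cpu(self, sample: float) -> None:
        # Exponential moving average: recent samples weigh more than old ones.
        self.cpu_ema = self.alpha * sample + (1 - self.alpha) * self.cpu_ema

    def score(self, max_connections: int = 50_000, cpu_weight: float = 0.5) -> float:
        # Blend the connection fill ratio with the smoothed CPU load.
        # Lower scores mean a less busy server.
        conn_ratio = self.connections / max_connections
        return (1 - cpu_weight) * conn_ratio + cpu_weight * self.cpu_ema
```

With cpu_weight set to 0 this degenerates to the plain connection count, which may be all you need when load per connection is roughly even.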
If you're going to load balance across multiple physical servers, then you will need a load balancer or proxy service that everyone connects to initially. That proxy can look at the metrics for all currently running servers in the pool and assign the connection to the one with the lowest current score. That can be done either with a proxy scheme or (more scalably) via a redirect, so the proxy gets out of the way after the initial assignment.
You could then also have a process that regularly examines your load score (however you decided to calculate it) on all the servers in the cluster and decides when to spin a new server up or when to spin one down or when things are too far out of balance on a given server and that server needs to be told to kick several connections off, forcing them to rebalance.
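The periodic check described here might look something like the sketch below. The plan function, its action strings, and the thresholds (high, low, skew) are hypothetical and would need tuning for a real cluster; scores are assumed to be normalised so that 1.0 means fully loaded.

```python
from statistics import mean


def plan(scores: dict[str, float], high: float = 0.8,
         low: float = 0.3, skew: float = 0.25) -> list[str]:
    """Turn per-host load scores into a list of suggested actions."""
    if not scores:
        return ["scale_up"]
    avg = mean(scores.values())
    actions = []
    if avg > high:
        # The whole cluster is busy: add a server.
        actions.append("scale_up")
    elif avg < low and len(scores) > 1:
        # The cluster is mostly idle: drain the least busy host so it can be spun down.
        actions.append("drain:" + min(scores, key=scores.get))
    for host, score in scores.items():
        if score - avg > skew:
            # This host is far above the average: kick some connections off it
            # so they reconnect and get rebalanced.
            actions.append("shed:" + host)
    return actions
```

For example, plan({"a": 0.9, "b": 0.3, "c": 0.3}) suggests shedding connections from host a, while leaving the cluster size alone.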
You asked: "What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count."
As described above, you use either a proxy scheme or a redirect scheme. Despite a slightly higher cost at connection time, I favor the redirect scheme because it's more scalable when running and creates fewer points of failure for an existing connection. All clients first connect to your incoming connection gateway server, which is responsible for knowing the current load score of each server in the farm. Based on those scores, it assigns each incoming connection to the host with the lowest score, and the client is then redirected to reconnect to that specific server.
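A minimal sketch of the gateway's decision, under some assumptions: the gateway already has a score per host, and it hands the chosen address back in an ordinary HTTP response body that the client then uses to open its WebSocket (browsers generally do not follow HTTP redirects during the WebSocket handshake itself, so an explicit "connect here" answer is a common way to get the same effect). The function names, the connect_to field, and the /socket path are all hypothetical.

```python
def pick_server(scores: dict[str, float]) -> str:
    """Return the host with the lowest load score."""
    if not scores:
        raise LookupError("no servers are currently registered")
    return min(scores, key=scores.get)


def assign_connection(scores: dict[str, float]) -> dict[str, str]:
    """Build the payload the gateway would send back to a new client."""
    host = pick_server(scores)
    # The client is expected to open its WebSocket against this URL,
    # after which the gateway is no longer involved in the connection.
    return {"connect_to": f"ws://{host}/socket"}
```

Because the gateway only answers one short request per client, it stays out of the data path, which is the scalability advantage of the redirect scheme over proxying.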
I have also seen load balancing done purely by a custom DNS implementation. Client requests IP address for farm.somedomain.com and that custom DNS server gives them the IP address of the host it wants them assigned to. Each client that looks up the IP address for farm.somedomain.com may get a different IP address. You spin hosts up or down by adding or removing them from the custom DNS server and it is that custom DNS server that has to contain the logic for knowing the load balancing logic and the current load scores of all the running hosts.

Route the websocket requests to a load balancer that makes the decision about where to send the connections.
As an example, HAProxy has a leastconn balancing method for long-lived connections that picks the server with the lowest connection count, choosing the least recently used one when several are tied.
The HAProxy backend server weightings can also be modified by external inputs. @jfriend00 detailed the technicalities of weighting in their answer.
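To make the least-connections idea concrete, here is a small sketch of the selection rule. It is not HAProxy's implementation, just an illustration of "fewest active connections, ties go to the server picked longest ago"; the LeastConnBalancer class and its method names are made up for the example.

```python
import itertools


class LeastConnBalancer:
    """Illustrative least-connections picker with least-recently-used tie-breaking."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {s: 0 for s in servers}     # open connections per server
        self.last_used = {s: 0 for s in servers}  # logical time of the last pick
        self._clock = itertools.count(1)

    def acquire(self) -> str:
        # Lowest active count wins; among equals, the one picked longest ago.
        server = min(self.active, key=lambda s: (self.active[s], self.last_used[s]))
        self.active[server] += 1
        self.last_used[server] = next(self._clock)
        return server

    def release(self, server: str) -> None:
        # Call this when a connection to the given server closes.
        self.active[server] -= 1
```

Tracking releases matters for WebSockets in particular: connections live for a long time, so the count of currently open connections is a far better signal than the number of requests served.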

I found this project that might be useful:
https://github.com/apundir/wsbalancer
A snippet from the description:
Websocket balancer is a stateful reverse proxy for websockets. It distributes incoming websockets across multiple available backends. In addition to load balancing, the balancer also takes care of transparently switching from one backend to another in case of mid session abnormal failure.
During this failover, the remote client connection is retained as-is thus remote client do not even see this failover. Every attempt is made to ensure none of the message is dropped during this failover.
Regarding your question: the new connection will be routed by the load balancer if it is configured to do so.
As @Matt mentioned, HAProxy with the leastconn option is one example.

Related

Horizontal scaling with a node.js app & socket io

My team and I are working on a digital signage platform.
We have ~ 2000 Raspberry Pi around the world connected to a Nodejs server using Socket IO. The Raspberries are initiating the connection.
We would like to be able to scale horizontally our application on multiple servers but we have a problem that we can’t figure out.
Basically, the application stores the sockets of the connected Raspberry in an array.
We have an external program that calls the API within the server. As a result, the server searches for the sockets that will be "impacted" by the API call and sends them the information.
After a lot of searching, we assume that we have to store the sockets (or their IDs) elsewhere (Redis?) to make the application stateless. Then any server can respond to an API call and look up the sockets in a central place.
Unfortunately, we can't find any detailed example of how to do that.
Can you please help us?
Thanks
(You can't store sockets from multiple server instances in a shared datastore like redis: they only make sense in the context of the server where they were initiated).
You will need a cluster of node.js servers to handle this. There are various ways to make a cluster. They all involve directing incoming connections from your RPis to a "generic" hostname, for example server.example.com. Behind that server.example.com hostname will be multiple node.js servers.
Each incoming connection from each RPi connects to just one of those multiple servers. (You know this, I believe.) This means one node.js server in your cluster "owns" each individual RPi.
(Telling you how to rig up a cluster of node.js servers is beyond the scope of this answer. Hints: round-robin DNS or a reverse-proxy nginx front end.)
Then, you want to route -- to fan out -- the incoming data from each API call to each server in the cluster, so the server can route it to the RPis it owns.
Here's a good way to handle that:
Set up a redis cache or other shared data store. It can be very small.
When each node.js server starts, have it register itself as active. That is, have it place its own specific address for handling API calls into the shared data store. The specific address is probably of the form 12.34.56.78:3000: that is, an IP address and port.
Have each server update that address every so often, once a minute or so, to show it is still alive.
When an API call arrives at server.example.com, it will come to a more-or-less randomly chosen node.js server instance.
Get that server to read the list of server addresses from the redis cache.
Get that server to repeat the API call to all servers except itself. Add a parameter like repeated=yes to the repeated API calls.
Then, each server looks at its list of connected sockets and does what your application requires.
On server shutdown, have the server unregister itself -- remove its address from redis -- if possible.
In other words, build a way of fanning out the API calls to all active node.js servers in your cluster.
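The registration, heartbeat, and fan-out steps above can be sketched as follows. To keep the example self-contained, a small in-memory Registry class stands in for the shared redis store; the class, the fan_out_targets helper, and the 120-second expiry are assumptions for illustration. In a real deployment each server would write its address to the shared store instead, and the caller would actually issue the repeated API calls to the returned addresses.

```python
import time
from typing import Optional


class Registry:
    """In-memory stand-in for the shared store that lists active servers."""

    def __init__(self, ttl_seconds: float = 120.0) -> None:
        self.ttl = ttl_seconds
        self._seen: dict[str, float] = {}  # address -> time of last heartbeat

    def heartbeat(self, address: str, now: Optional[float] = None) -> None:
        # Called at startup and then roughly once a minute by each server.
        self._seen[address] = time.time() if now is None else now

    def unregister(self, address: str) -> None:
        # Called on clean shutdown; a crashed server simply expires instead.
        self._seen.pop(address, None)

    def live(self, now: Optional[float] = None) -> list[str]:
        now = time.time() if now is None else now
        return sorted(a for a, t in self._seen.items() if now - t <= self.ttl)


def fan_out_targets(registry: Registry, self_address: str, repeated: bool,
                    now: Optional[float] = None) -> list[str]:
    """Addresses this server should repeat an incoming API call to."""
    if repeated:
        # The call was already repeated by a peer: handle it locally only,
        # otherwise the servers would re-broadcast to each other forever.
        return []
    return [a for a in registry.live(now) if a != self_address]
```

The expiry window is what makes the periodic update useful: a server that dies without unregistering drops out of the list once its last heartbeat is older than the window.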
If this must scale up to a very large number (more than a hundred or so) node.js servers, or to many hundreds of API calls a minute, you probably should investigate using message queuing software.
SECURE YOUR REDIS server from random cybercreeps on the internet.

does load balancer have all its child servers' sockets open

Let's say I have the following setup:
client (browser)-> load balancer (nginx) -> 10 nodejs servers
I'm using websocket (socket.io) for bidirectional communication between client and server.
My question is: when a client sends a message via socket, does the load balancer simply redirect the socket request to another server like a normal http request, or is something more complicated going on, like the socket remaining open on the load balancer machine for the whole duration of the websocket connection between client and server?
If the latter is true, it means that the load balancer machine has the SUM of ALL its child servers' TCP connections open, which means the load balancer has to be quite a huge machine on its own, and it's not "entirely" distributing all the load. Would I then eventually have to worry about "running out of sockets" on the load balancer machine?
Can anyone clarify this whole concept for me? Thanks!
For websockets, yes, the load balancer will almost certainly maintain a TCP connection in from the client and out to the server for every connection. The load balancer, however, has much less work to do than the server machines actually handling the clients -- it doesn't have to "think" about the protocol or generate or interpret any payload, it just has to copy data from one pipe to another and tear down connections when they close.
With an event-driven model and potentially using the Linux kernel's splice(2) system call, the "copied" data can be shuffled between the connections all in kernel space, for very efficient operation.
A well-designed load balancer is as likely to run up against the limit of ~64K source ports per IP address when connecting out to a given backend address and port (TCP port numbers are 16 bits) as against any other resource, like CPU or memory.
Even with normal web traffic, where the balancer is understanding and making routing decisions based on request and response bodies, the app servers still typically have far more work to do than the balancer.
Anecdotally, some of the smallest machines in my infrastructure are the load balancers... and they still tend to have the smallest workloads as evidenced by memory, CPU, and disk access.

Azure load balancer session affinity not sticking. Why?

My client makes two http requests to my cloud service which has two replicas.
According to documentation (1) and since connection is kept alive, I'd expect the two requests to go to the same replica.
However, I see each request goes to a different replica. For performance reasons, this is undesirable.
What is causing the distribution?
How do I debug load balancer?
(1) https://azure.microsoft.com/en-us/documentation/articles/load-balancer-distribution-mode/
The default distribution will be 5-tuple (source IP, destination IP, source port, destination port, protocol). This means that each new connection initiated by a client may land on a different server.
If you use sourceIP, then the stickiness will be based on the client IP address.
If you need application-based stickiness (such as cookie-based affinity), then you may look at https://azure.microsoft.com/en-us/documentation/services/application-gateway/
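The difference between the two modes can be illustrated with a toy hash-based picker. This is not Azure's actual hashing algorithm; the function names and the use of SHA-256 are assumptions made purely to show why the choice of key matters.

```python
import hashlib


def pick_replica(key: tuple, replica_count: int) -> int:
    """Map a flow key to a replica index with a stable hash."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % replica_count


def five_tuple_key(src_ip: str, src_port: int, dst_ip: str,
                   dst_port: int, protocol: str) -> tuple:
    # Every field takes part, so a new source port can change the result.
    return (src_ip, src_port, dst_ip, dst_port, protocol)


def source_ip_key(src_ip: str) -> tuple:
    # Only the client address takes part, so all of its connections agree.
    return (src_ip,)
```

A client that opens a second TCP connection usually gets a new ephemeral source port, so under the 5-tuple key the second connection may hash to a different replica, while requests sent over one kept-alive connection share the same tuple and stay together.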
Yves

How to keep user requests on the same server when using IIS NLB?

I have two IIS servers running using NLB. Unfortunately, I cannot use a shared session server, so every server is using its own session. How can I ensure that all requests from the same user are forwarded to the same IIS server?
Found this and decided to share with others:
Use the client affinity feature. When client affinity is enabled, Network Load Balancing directs all TCP connections to the same cluster host. This allows session state to be maintained in host memory. You can enable client affinity in the Add/Edit Port Rules dialog box in Network Load Balancing Manager. Choose either Single or Class C affinity to ensure that only one cluster host will handle all connections that are part of the same client session. This is important if the server application running on the cluster host maintains session state (such as server cookies) between connections. For more information about Network Load Balancing affinity, see Help in the Network Load Balancing snap-in.
I think what you're looking for is sticky sessions. Sticky sessions are implemented by your load balancer, though. You probably need to set up an outside load balancer (BIG-IP, HAProxy, etc.) that can do sticky sessions.
You can do that easily as long as none of your customers use a distributed proxy system:
In the properties of the NLB cluster, on the "port rules" tab, you can choose the "filtering mode" and the affinity:
You cannot choose "none" because you don't have central sessions.
But "single" would direct every user to the same server as long as the IP stays the same.
If you anticipate, for example, AOL proxy servers, then "class C" might be a safe choice (albeit maybe reducing the load balancing a little bit), because every address in the same class C net goes to the same server.
I guess that is easily implemented by MS in a way that both hosts know which IP is even or odd, or which triplet of the class C net is even or odd, and always distribute the load in the same way depending on the IP address.
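The effect of the two affinity modes can be sketched like this. It only illustrates the idea of keying on the full address versus the /24 ("class C") network; the function names and the CRC32-based mapping are assumptions, and the source does not describe how NLB actually computes its distribution.

```python
import ipaddress
import zlib


def affinity_key(client_ip: str, mode: str) -> str:
    """Reduce a client address to the part that decides the target host."""
    if mode == "single":
        return client_ip
    if mode == "class_c":
        # Keep only the first three octets: the whole /24 shares one key.
        network = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        return str(network.network_address)
    raise ValueError(f"unknown affinity mode: {mode}")


def pick_host(client_ip: str, mode: str, host_count: int) -> int:
    # A stable hash of the key means every host can compute the same answer
    # independently, without sharing any per-client state.
    return zlib.crc32(affinity_key(client_ip, mode).encode()) % host_count
```

With class_c, a user whose requests leave through different proxy addresses in the same /24 still lands on one host, at the cost of coarser load distribution.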
Why would you want to do this? If it's because of session state then you should have a database or out-of-process server set up in a common place and have all nodes reference that.
I would consider a reverse proxy that sits in front of either server and remembers which external users are using which servers.
I know (from using it this way) Cherokee supports IPHash proxying but I'm sure there are lots more.
Just to add to Lloyd's answer, you should avoid using session in a load balanced environment anyway. The whole purpose behind using session is to avoid database calls; if you end up storing the session data back into the database you usually gain nothing.
The reason being that 1. you now have to make 2 database calls for each page load (retrieve and store) and 2. that data now has to go through serialization / deserialization boundaries. Most of the time this ends up being a more expensive operation than just retrieving the data you wanted to begin with.
Now, to your actual question. You do have the option to store the session data in the view state. Optionally, you could forgo session and instead use cookies. If you go this route, be sure to encrypt them on the way out and decrypt when receiving them.

Does a software load balancer manage a two-way SSL connection? If so, how?

I don't have the faintest clue how a software or hardware load balancer works. I guess the hardware load balancer is basically a switch that, based on some algorithm, decides which node to switch to for an incoming request. On the software load balancer front, I guess the software picks a node and uses a reverse proxy connection to it. In such a scenario, 2-way SSL won't work, as the load balancer cannot have the client's private key.
Again, I don't know how a software load balancer works, but as my application would need a load balancer and uses a 2-way SSL connection, I wanted to know how a software load balancer takes care of a 2-way SSL connection.
No, SSL works with a load balancer. They typically work at the TCP level, so the clients connect to the LB IP address, but it NATs the connections on to the real servers. The connection persists to the same real server for its lifetime, but if the same client makes another one, it can (and typically would) go to a different server.
For HTTPS this works fine, except that if you have a web server which supports SSL session caching, then the SSL session cache will be lost if the client comes back to a different server. In practice this is not a big problem. Of course, HTTP keep-alive sessions aren't affected, because each one is a single TCP connection and so stays on the same real server.
Generally speaking, a software load balancer will note that there is a new incoming connection request, assess the workload on the machines available, and allocate the new request to the most appropriate machine. When there is a session-based service, that connection will last for the duration of the session; rebalancing would only occur if a server went down, and would probably establish new connections in a newly balanced configuration.
So, as Jon implied, the SSL session would be established with a server, and would continue with that server until the session terminates.
If you want to route connections more dynamically, then it may be that the SSL session has to be terminated (decrypted) in front of the software that dynamically sends requests to different servers.
All these are possible - they are not necessarily efficient or implemented.
A software load balancer will distribute sessions evenly across multiple servers.
So, if a user hits your load balancer, it will send him to a specific server and that server will negotiate the SSL. The user will continually talk to this server until his session expires. At that point, he will hit the load balancer again.
