How to handle socket connections in a docker-swarm environment - node.js

I am building a webapplication using nodejs as the server and Docker Swarm to handle replication and load balancing.
Right now, I need to handle real-time data updates between clients and the replicated servers, so i thought of using Socket.IO to handle the connections. All the requests pass via an NGINX server that redirect it to the manager node of the swarm, and its him that handles the balancing.
Since the topology of the network can change rapidly based on the load of the network, i am reticent of letting NGINX handle the balancing and applying sticky sessions... (maybe am wrong)
For my understanding with this setup, if a client connects to my server, the load balancer of docker will send the request to one of my N replicated servers, and this an only this server will know that the client connected.
So, its possible that if some traditional HTTP-Request updates my data on another replica, the information will not be sent because of the lack of existence of this connection in the given server.
Is there a way of handling situations like this? I thought of including a Message queue between servers to send the data to all of them and then the one containing the connection will send the data, but is that the recommended way of doing it?
Thank you very much

I investigated a bit further since the time of the question. I'll post what I found in case it helps somebody with a similar issue.
One option i found is to use a MessageQueue or something similar to broadcast the messages to all the replicas, then each one filters only the messages that can send because the replica itself has knowledge of the TCP connections available in that replica.
But i think that would put excessive stress over the replicas because all of them are receiving all of the messages, so one solution would be to create a queue or a service that links the id of the given connection to the replica, and forward the messages only to those replicas interested.
I think it can be easily done with topics or making a queue for each tcp connection with some id as a identifier, and then pushing to the corresponding queue.
If anyone sees any problem or wants to add something, it will be very much appreciated!

Related

Working with WebSockets and NodeJs clusters

I currently have a Node server running that works with MongoDB. It handles some HTTP requests, but it largely used WebSockets. Basically, the server connects multiple users to rooms with WebSockets.
My server currently has around 12k WebSockets open and it's almost crippling my single threaded server, and now I'm not sure how to convert it over.
The server holds HashMap variables for the connected users and rooms. When a user does an action, the server often references those HashMap variables. So, I'm not sure how to use clusters in this. I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Does anyone have any ideas on what to do?
Thank you.
You can look at the socket.io-redis adapter for architectural ideas or you can just decide to use socket.io and the Redis adapter.
They move the equivalent of your hashmap to a separate process redis in-memory database so all clustered processes can get access to it.
The socket.io-redis adapter also supports higher-level functions so that you can emit to every socket in a room with one call and the adapter finds where everyone in the room is connected, contacts that specific cluster server, and has it send the message to them.
I thought maybe creating a thread for every WebSocket message, but I'm not sure if this is the right approach, and it would not be able to access the HashMaps for the other users
Threads in node.js are not lightweight things (each has its own V8 instance) so you will not want a nodejs thread for every WebSocket connection. You could group a certain number of WebSocket connections on a web worker, but at that point, it is likely easier to use clustering because nodejs will handle the distribution across the clusters for you automatically whereas you'll have to do that yourself for your own web worker pool.

redis in Node.js app environment

I am building an app with several Node.js instances as a Backend (http server, socket server and several a pool of domain servers). Now I am trying to cover several communication and configuration aspects and am wondering if redis makes an appropriate solution.
So, I would use it for the following purposes:
Implementation of a shared run-time lookup table. It's a table of several hundreds of relativelly simple records, accessed and manipulated by 2 node-instances.
Implementation of message queues. Each domain server receives commands from the http server and should execute them sequentially. Domain server should be able to listen on a redis-event, and execute each new command upon its arival
socket sever also has a regis message queue and listen to its event, in order to push notification to connected clients
Is redis "too heavy" for such a purpose?
Does it offer all needed functionality?
I can definitelly implement a look-up in a file and/or memory and a queue using sockets. However, it might make a code cleaner and a solution more robust with redis.
Redis is definitely not a heavy solution, on the contrary.
It's small, insanely fast (when using pipelining), easy to deploy. I consider it as a light solution, a kind of swiss knife that may solves many problems.
Redis based message queues are OK if you don't expect any guarantee on the message delivery. That is to say Redis based queues can't assure you the client has received the message. If it's a problem for your application you should consider using an heavier solution, like 0mq or Rabbitmq.

node.js server with socket.io handling 50000 simultaneous clients

We are developing a Javascript control which should be constantly connected to a server for receiving animation updates.
We are planning to host this stuff on an Amazon cloud.
The scenario is like this: server connects to activemq queue waiting for updates, for each update it broadcasts it to all connected clients.
Is it even possible to handle such load with node.js + socket.io?
Will a single node.js server be able to handle such load?
How to organize fast transport between different nodes if we will have to use more than one node?
Will single node.js server be able to handle such load?.. How to organize fast transport between different nodes if we will have to use more than one node
You say that you are planning to host on Amazon. So first off, nothing should be scoped for a single server. Amazon machines will simply "disappear", you have to assume that you are going to use multiple computers.
...handling 50k simultaneous clients
So to start with, 50k connections for a single box is a very big number. Here's a very detailed blog post discussing "getting to 10k" with node.js+socket.io.
Here's a very telling quote:
it seemed as though 10,000 clients simply required more serialization
than my server was able to handle.
So a key component to "getting to 50k" is going to be the amount of work required just pushing data over the wire.
How to organize fast transport between different nodes if we will have to use more than one node.
That blog post is the first of 3. When you're done the first, read the other two. That should point you in the right direction.

With a Node.js cluster, how do I share connections?

I have an Azure hosted application (iisnode) that accepts direct connections from multiple client services. This application streams data between the various connections. If running on a system with multiple instances of node.js, the actual TCP connections will be connecting to different instances.
Is there a way to somehow "move" or "share" the in-memory connection from one instance to another?
Sure, I could build some inter-instance communication to route data, but I don't think the application will scale since it's entire purpose is to move data around quickly. For example, I would have 4 instances, 100 connections to each, and I would spend as many resources moving the data between instances as I would spend moving the data between the client connections.
When you configure iisnode to create more than one node.exe process (using the nodeProcessCountPerApplication setting), it will dispatch incoming HTTP requests between them using a round robin logic; the application has no control over that behavior. Given your scenario there is no way to deterministically ensure that the requests ("connections") from two distinct clients will be colocated in the same node.exe process.
There is no mechanism to "move" an existing TCP connection or HTTP request between node.exe processes.
In general a better way to address such a notification scenario may be to use a subscription-based messaging infrastructure as your backend. ServiceBus in Azure provides such mechanisms. In this design, each instance of node.exe would subscribe to a particular topic when it receives a connection from the client, and be notified by ServiceBus when a matching notification arrives, possibly via a different instance of node.exe.

Node.js high-level servers' communication API

folks. I wander whether there is any high-level API for servers' communication in Node.js framework? For example, I have several servers, where my application runs and I want to control loading of this servers. Sometimes, if some server is overloaded I want to redirect some connection requests to another(more free one). Are there any functions which could help me? Or I have to implement my own functionality?
Try looking at cluster. This allows you to control multiple node proccess and scale nicely.
Alternatively just set up TCP sockets and pass messages around over TCP or pass messages around over a database like redis.
You should be able to pipe HTTP connection down streams. You have one HTTP server as a load balancer and this server just passes messages on to your other servers and passes them back.
You're looking for what's called a load balancer. There are many off-the-shelf solutions, nginx being one of the standards today (and VERY quick/easy to set up).
I don't know of a node-native solution, but it's not that hard to write one. In general, however, load balancers don't actually monitor server load, they monitor whether a server is live or not and distribute traffic relatively equally.
As for your communications question, no -- there's no standardized API to communicate to/from node.js servers. Again, however, not hard to set up -- Assuming you're already hosting HTTP (using express, or native), just listen for specific requests, perhaps to /comm/ or whatever you deem appropriate and pass JSON back-and-forth.
Not too sure for Nodejs but I've heard ppl using Capistrano with Nodejs

Resources