I am working on a Node.js application and the requirement is to send around 10k requests per second per connection. The client application has to open one WebSocket connection to send these requests, and on the server side it just has to receive the data and push it to a queue. The number of socket connections on the server side isn't that high, maybe around 1k. I have a few questions regarding this and any help is greatly appreciated.
First, is it possible to achieve this setup with a single master process? Since I cannot share the WebSocket connections with the child processes, I need to get the bandwidth from the master process.
When I tried benchmarking the Node.js ws library, I was only able to send approximately 1k requests per second of 9 KB each. How can I increase the throughput?
Are there any examples on how to achieve max throughput since I can only find posts with how to achieve max connections?
Thanks.
You will eventually need to scale
First, is it possible to achieve this setup with a single master process?
I don't really think it's possible to achieve this with a single thread.
(You should plan for scaling and never design in a way that rules out that option.)
Since I cannot share the WebSocket connections with the child processes, I need to get the bandwidth from the master process.
I'm sure you will be happy to know about the existence of socket.io-redis.
With it you will be able to send/receive events (share clients) between multiple instances of your code (processes or servers). Read more: socket.io-redis (GitHub)
I know you are speaking about ws, but maybe it's worth the change to socket.io.
Especially knowing you can scale both vertically (increase the number of worker processes per machine) and horizontally (deploy more "masters" across more machines) with relative ease (and I repeat myself: sharing your socket clients & communications across all instances).
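To make that concrete, here is a minimal sketch of the socket.io-redis setup, assuming the classic socket.io 2.x / socket.io-redis API and a local Redis on the default port (the event names and the queue call are placeholders):

```js
// server.js - one instance of the socket.io server; run several of these
// (different ports / machines) and Redis relays events between all of them.
const http = require('http');
const socketIO = require('socket.io');
const redisAdapter = require('socket.io-redis');

const server = http.createServer();
const io = socketIO(server);

// Every instance pointing at the same Redis can reach every connected client.
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

io.on('connection', (socket) => {
  socket.on('job', (payload) => {
    // Push the payload to your queue here, then acknowledge. io.emit() goes
    // through the Redis adapter, so it reaches clients on other instances too.
    io.emit('job:received', { id: payload.id });
  });
});

server.listen(process.env.PORT || 3000);
```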
When I tried benchmarking nodejs ws library, I was only able to send
approximately 1k requests per second of 9kb each. How can I increase
the throughput?
I would suggest trying socket.io + socket.io-redis:
- spawn a master with a number of workers equal to the number of CPU cores (vertical scaling; see the sketch below),
- deploy your master across 2 or more machines (horizontal scaling),
- learn about load balancing & perform benchmarks.
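A rough sketch of the first point, assuming each worker runs a server instance like the socket.io-redis one above (the ./server path is hypothetical):

```js
// cluster.js - vertical scaling: one worker per CPU core.
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // Replace workers that die so all cores stay in use.
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a new one`);
    cluster.fork();
  });
} else {
  // Each worker runs its own socket.io server; socket.io-redis keeps them in sync.
  require('./server');
}
```

Note that if your clients can fall back to HTTP long-polling you will also need sticky sessions in front of the workers; with pure WebSocket transport this is less of an issue.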
Are there any examples on how to achieve max throughput since I can
only find posts with how to achieve max connections?
You will increase the total throughput if you increase the number of instances communicating with clients. (Horizontal + Vertical Scaling)
socket.io-redis Application Example (github)
Using Redis with Node.js and Socket.IO (tutorial)
This might also be an interesting read:
SocketCluster 100k messages / sec (ycombinator)
SocketCluster (github)
Hope it helped.
Related
I am making a live app with the use of websockets (express-ws npm package) in node.js.
The users send a message via ws every 10 seconds. Each such request takes about 1-1.5 milliseconds to handle (I have made some .time benchmarks). Everything works perfectly while there are fewer than ~9000 connections. However, if it grows above that, those 9000 requests every 10 seconds take 9000*1.5=13500ms > 10s and some users do not get their requests handled (as node.js is single-threaded). This is my first live app that gets so many online users at the same time, so I do not know what to do. How do I handle that many connections correctly?
I have read some articles about that and I have found some solutions which do not seem to work for me (at least I do not understand how to make them work).
Use the cluster module. The problem is that the requests have to share variables. I have an array of data which is updated or read during every request, and clusters, as I have read and tested, are basically separate processes which cannot share memory.
The same applies to worker_threads. They can kind of share memory, but I have to set up the communication between all threads, and it still comes down to handling 9000 connections in 10 seconds, which is not significantly faster than before (those 9000 requests are simply a database search and an update, with a few validations of whether a user is registered and has provided valid data). Probably, if I move the validation to a worker thread, the connection limit will grow to about 13000, but it is still insufficient.
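(For reference, the "kind of share memory" part is SharedArrayBuffer; a minimal sketch of what I mean, with a single shared counter standing in for my actual data array:)

```js
// shared.js - main thread and a worker reading/writing the same memory.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // One shared 32-bit slot; both threads see writes to it without copying.
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);

  const worker = new Worker(__filename, { workerData: shared });
  worker.on('message', () => {
    console.log('counter as seen by the main thread:', Atomics.load(counter, 0));
  });
} else {
  const counter = new Int32Array(workerData);
  Atomics.add(counter, 0, 1); // atomic update, no postMessage copy of the data
  parentPort.postMessage('done');
}
```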
I thought of creating a separate server on another port (probably even in C++) and sending all the requests that have passed validation there (a WebSocket between the servers). That seems like the best solution for now, but it still comes down to handling 9000 requests in one thread, which will not make it much better.
So, how do I handle that many requests that need to share a variable efficiently? How do game servers which need to update the states of thousands of players multiple times per second do that?
In the Node.js API doc, it says:
The cluster module supports two methods of distributing incoming
connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
The second approach should, in theory, give the best performance. In
practice however, distribution tends to be very unbalanced due to
operating system scheduler vagaries. Loads have been observed where
over 70% of all connections ended up in just two processes, out of a
total of eight.
I know PM2 is using the first one, but why doesn't it use the second? Just because of the unbalanced distribution? Thanks.
The second may add CPU load when every child process is trying to 'grab' the socket the master sent.
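For what it's worth, you can select the approach yourself and benchmark both; a small sketch (the four workers and the trivial HTTP handler are only for illustration):

```js
const cluster = require('cluster');

// Must be set before the first fork(). SCHED_RR is the round-robin default
// (except on Windows); SCHED_NONE hands the listen socket to the workers and
// lets the OS decide who accepts, i.e. the second approach from the docs.
cluster.schedulingPolicy = cluster.SCHED_NONE;
// Equivalently: NODE_CLUSTER_SCHED_POLICY=none node app.js

if (cluster.isMaster) {
  for (let i = 0; i < 4; i++) cluster.fork();
} else {
  require('http').createServer((req, res) => res.end('ok')).listen(8000);
}
```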
Trying to build a TCP server using Spring Integration which keeps connections open; these may run into the thousands at any point in time. Key concerns are:
1. The maximum number of concurrent client connections that can be managed, as sessions would be live for a long period of time.
2. What is advised in case connections exceed the limit specified in (1).
Something along the lines of a cluster of servers would be helpful.
There's no mechanism to limit the number of connections allowed. You can, however, limit the workload by using fixed thread pools. You could also use an ApplicationListener to get TcpConnectionOpenEvents and immediately close the socket if your limit is exceeded (perhaps sending some error to the client first).
Of course you can have a cluster, together with some kind of load balancer.
I'm implementing a WebSocket Secure (wss://) service for an online game where all users will be connected to the service as long as they are playing the game. This will use a high number of simultaneous connections, although the traffic won't be a big problem, as the service is used for chat, storage and notifications... not for real-time data synchronization.
I wanted to use Alchemy-Websockets, but it doesn't support TLS (wss://), so I have to look for another library like Fleck (or something else).
Alchemy has been tested with a high number of simultaneous connections, but I didn't find similar tests for Fleck, so I need to get some real info from users of Fleck.
I know that Fleck is non-blocking and uses async calls, but I need some real info, because it might be abusing threads, the garbage collector, or any other aspect that won't be visible at a lower number of connections.
I will use C# for the client as well, so I need neither hybi-XX compatibility nor fallback; I just need scalability and TLS support.
I finally added Mono support to WebSocketListener.
Check here how to run WebSocketListener in Mono.
10K connections is no small thing. WebSocketListener is asynchronous and it scales well. I have done tests with 10K connections and it should be fine.
My tests show that WebSocketListener is almost as fast and scalable as the Microsoft one, and performs better than Fleck, Alchemy and others.
I made a test on a Windows machine with a Core 2 Duo E8400 processor and 4 GB of RAM.
The results were not encouraging, as Fleck started delaying handshakes after it reached ~1000 connections, i.e. it would take about one minute to accept a new connection.
These results improved when I used XSockets, as it reached 8000 simultaneous connections before the same thing happened.
I tried to test on a Linux VPS with Mono, but I don't have enough experience with Linux administration, and a few system settings related to TCP etc. needed to be changed in order to allow a high number of concurrent connections, so I could only reach ~1000 on the default settings; after that the app crashed (in both the Fleck test and the XSockets test).
On the other hand, I tested Node.js, and it seemed simpler to manage a very high number of connections, as Node didn't crash when it reached the TCP limits.
All the tests were echo tests: the server sends the same message back to the client who sent it and to one other randomly chosen connected client, and each connected client sends a random ~30-character text message to the server at a random interval between 0 and 30 seconds.
I know my tests are not generic enough, and I encourage anyone to run their own tests instead, but I just wanted to share my experience.
When we decided to try Fleck, we implemented a wrapper for the Fleck server and a JavaScript client API so that we could send acknowledgment messages back to the server. We wanted to test the performance of the server - message delivery time, percentage of lost messages, etc. The results were pretty impressive for us and we are currently using Fleck in our production environment.
We have 4000 - 5000 concurrent connections during peak hours. On average 40 messages are sent per second. The acknowledged message ratio (acknowledged messages / total sent messages) never drops below 0.994. The average round-trip for messages is around 150 milliseconds (the duration between the server sending the message and receiving its ack). Finally, we did not have any memory-related problems with the Fleck server despite heavy usage.
I'm trying to scale a chat app using socket.io + cluster. Is it possible for a child process to handle only the incoming requests that belong to its process id (assigned when forked)?
For example:
http://mydomain/calculate?process=1
The above request should only be handled by process 1; other processes should ignore it. In this way, I want to make sure requests for the same room are handled by the same process, so I may not have to use RedisStore as the socket.io backend.
I also wonder how RedisStore works, because when using it, I found that the io.sockets.manager.rooms data is not accurate across all processes.
Edit:
Put another way: can the cluster master process dispatch requests to different child processes based on the query string?
The answer is no. The OS takes care of load balancing in this situation, and in order to process the query string you already have to be connected to a web server (in your case, a child process).
From my experience I find cluster a bit useless. It is a lot easier to spawn multiple Node.js processes (on multiple ports) and put a proxy (nginx?) in front of them. It is easy and scalable.
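If you really want query-string-based dispatch, that proxy layer is where it belongs. Purely to illustrate the idea with core modules (nginx would normally do this job; the ?process= parameter and the port 3000 + N mapping are assumptions taken from the question, and WebSocket upgrades would need extra handling):

```js
// proxy.js - naive front process that forwards ?process=N to port 3000 + N.
const http = require('http');
const { URL } = require('url');

http.createServer((clientReq, clientRes) => {
  const url = new URL(clientReq.url, 'http://localhost');
  const n = parseInt(url.searchParams.get('process'), 10) || 0;

  // Forward the request unchanged to the chosen backend process.
  const upstream = http.request(
    { host: '127.0.0.1', port: 3000 + n, path: clientReq.url, method: clientReq.method, headers: clientReq.headers },
    (upstreamRes) => {
      clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    }
  );
  upstream.on('error', () => {
    clientRes.writeHead(502);
    clientRes.end();
  });
  clientReq.pipe(upstream);
}).listen(8080);
```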
As for socket.io: I don't think it works correctly with cluster (because of shared global variables, which cause issues). Again, spawning separate Node.js processes should fix the problem. It will also be useful once you reach the point where you have to scale to multiple machines; any tricks with cluster won't help you at that point.
One last note: socket.io does not scale well. I suggest writing your own WebSocket server (based on ws, for example) and implementing your own scaling mechanism, for example based on all-to-all UDP pinging, which should scale well when dealing with a small number of servers (50? 100?).
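As a starting point for the "own server on top of ws" route, the core really is small; a bare sketch (the cross-server fan-out, e.g. the UDP pinging, is left out, and the broadcast-to-everyone behaviour is just an example):

```js
// ws-server.js - bare-bones WebSocket server using the ws package.
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
  socket.on('message', (message) => {
    // Broadcast to every connected client on this instance; the cross-instance
    // mechanism you implement yourself would hook in here.
    wss.clients.forEach((client) => {
      if (client.readyState === WebSocket.OPEN) client.send(message);
    });
  });
});
```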