Is there a hard limit on socket.io connections? - node.js

Background
We have a server running socket.io 2.0.4. This server receives requests from a stress script that simulates clients using socket.io-client 2.0.4.
The script simulates the creation of clients (each client with its own socket) that each send one request and immediately die afterwards via socket.disconnect().
Problem
During the first few seconds all goes well. But every test reaches a point at which the script starts spitting out the following error:
connect_error: Error: websocket error
In other words, the clients my script creates fail to establish a connection to the server.
The script creates 7 clients per second (spaced evenly throughout the second); each client makes one request and then dies.
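A stripped-down sketch of what the script does (the URL and event name are placeholders, not the real ones):

const io = require('socket.io-client');

// Spawn 7 clients per second, spaced evenly (~143 ms apart).
setInterval(() => {
  // forceNew gives every client its own connection instead of reusing one manager
  const socket = io('http://localhost:3000', { forceNew: true });

  socket.on('connect', () => {
    socket.emit('request', { ts: Date.now() }); // one request...
    socket.disconnect();                        // ...and the client dies
  });

  socket.on('connect_error', (err) => {
    console.error('connect_error:', err);
  });
}, 1000 / 7);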
Research
At first I thought there was an issue with file descriptors and the limits imposed by UNIX, since the server runs on a Debian machine:
https://github.com/socketio/socket.io/issues/1393
After following these suggestions, however, the issue remained.
Then I thought maybe my test script was not connecting correctly, so I changed the connection options as suggested in this discussion:
https://github.com/socketio/socket.io-client/issues/1097
Still, to no avail.
What could be wrong?
I can see the machine's CPUs constantly at 100%, so I guess I am pounding the server with requests.
But if I am not mistaken, the server should simply accept the extra requests and process them when it can.
Questions
Is there a limit to the number of connections a socket.io server can handle?

When running stress tests like this, you need to be aware of the protections and gatekeepers along the way.
In our case, our stack was deployed in AWS. So first, the AWS load balancers started blocking us because they thought the system was being DDoSed.
Then the Debian system itself was getting flooded, and its SYN flood protection kicked in and started refusing connections.
Even after fixing that, we were still getting the error. It turned out we also had to increase the TCP connection backlog and tune how TCP connections were handled in the kernel.
Now it accepts all connections, but I wish no one the suffering we went through to figure this out...
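As an illustration only, the kind of kernel knobs involved looks like this (the keys are real sysctl settings; the values are placeholders you have to tune for your own workload):

# /etc/sysctl.d/99-tcp-tuning.conf (illustrative values, not a recommendation)
net.core.somaxconn = 4096              # accept-queue backlog per listening socket
net.ipv4.tcp_max_syn_backlog = 4096    # queue for half-open (SYN_RECV) connections
net.ipv4.tcp_syncookies = 1            # survive SYN floods without refusing clients
net.core.netdev_max_backlog = 4096     # packets buffered before kernel processing

Apply them with "sysctl --system" (or "sysctl -p" for a single file) and re-run the test.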

Related

Which is the better way to implement heartbeat on the client side for websockets?

On the server side for websockets there is already a ping/pong implementation, where the server sends a ping and the client replies with a pong, letting the server know whether a client is still connected. But there isn't anything implemented in reverse to let the client know whether the server is still connected to it.
There are two ways to go about this that I have read:

1. Every client sends a message to the server every x seconds; whenever sending throws an error, the server is down, so reconnect.
2. The server sends a message to every client every x seconds; the client receives it and updates a variable, and a client-side timer checks every x seconds whether that variable has changed recently. If it hasn't changed in a while, the client assumes the server is down and reestablishes the connection.
Either method lets the client figure out whether the server is still online. With the first, the clients send traffic to the server; with the second, the server sends traffic out to the clients. Both seem easy enough to implement, but I'm not sure which is the better way in terms of efficiency and cost.
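For concreteness, the first approach might look roughly like this in the browser (the URL, message format, and intervals are placeholders):

function connect(url) {
  const ws = new WebSocket(url);
  let pingTimer;
  let pongTimer;

  ws.onopen = () => {
    pingTimer = setInterval(() => {
      ws.send('ping');
      // if no pong arrives within 5 s, assume the server is down
      pongTimer = setTimeout(() => ws.close(), 5000);
    }, 30000);
  };

  ws.onmessage = (event) => {
    if (event.data === 'pong') clearTimeout(pongTimer);
  };

  ws.onclose = () => {
    clearInterval(pingTimer);
    clearTimeout(pongTimer);
    setTimeout(() => connect(url), 1000); // reconnect after a short delay
  };
}

connect('wss://example.com/socket');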
Server upload speeds are higher than client upload speeds, but server CPUs are an expensive resource while client CPUs are relatively cheap. Offloading logic onto the client is the more cost-effective approach...
Having said that, servers must implement this specific logic anyway (in fact, all ping/timeout logic), otherwise they might be left with "half-open" sockets that drain resources but aren't connected to any client.
Remember that sockets (file descriptors) are a limited resource. Not only do they use memory even when no traffic is present, but they prevent new clients from connecting when the resource is maxed out.
Hence, servers must clear out dead sockets, either using timeouts or by implementing ping.
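For example, with the popular ws module (not the only option, but its documented heartbeat pattern is a good illustration), clearing out dead sockets looks roughly like this:

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

function heartbeat() {
  this.isAlive = true; // a pong arrived, so the client is still there
}

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('pong', heartbeat);
});

// every 30 s, terminate sockets that never answered the previous ping
setInterval(() => {
  wss.clients.forEach((ws) => {
    if (ws.isAlive === false) return ws.terminate();
    ws.isAlive = false;
    ws.ping();
  });
}, 30000);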
P.S.
I'm not a node.js expert, but this type of logic is best implemented using the WebSocket protocol's own ping rather than by your application. You should probably look into your node.js server / websocket framework and check how to enable pinging.
You should set pings to accommodate your specific environment, i.e., if you host on Heroku, Heroku enforces a timeout of ~55 seconds, and your pings should be sent before that timeout occurs.
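With socket.io, for instance, the built-in heartbeat is configurable on the server; pingInterval and pingTimeout are real socket.io options, and the values below are illustrative, chosen to stay under a ~55 second idle timeout:

const server = require('http').createServer();
const io = require('socket.io')(server, {
  pingInterval: 25000, // send a ping every 25 s (well under Heroku's ~55 s timeout)
  pingTimeout: 10000   // drop the client if no pong arrives within 10 s
});
server.listen(3000);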

My application stops writing to files and opening new TCP connections after some time

I have no idea what could be causing this.
I have a Node application which connects to an external server over TCP and communicates with it. Part of its functionality also includes making relatively frequent HTTP requests.
Each instance of the application establishes up to 30 TCP connections to the external server, and makes HTTP requests as needed. Previously, I've been hosting the application on relatively cheap VPSes, with one instance of the application per server.
Now I'm setting it up on a proper dedicated server. I could run one instance on the dedicated server and raise the connection limit I've set, so that one instance covers what several of the smaller VPS instances did, but I'd rather run several instances of the application on the dedicated server, each limited to 30 connections.
The application also writes logs to disk (just a plain flat file), and sends logs via UDP to an external logging server. This is done using winston.
After some uptime, however, I'm experiencing an issue where HTTP requests time out (ETIMEDOUT) and the logs stop being written to disk. The application itself is still running, and the TCP connection to the server is still active and working. I can communicate with the application through that connection and it responds as expected. The logging server is still receiving the UDP packets as well. I've noticed that the log files stop being written to, but after a few minutes they appear to be flushed to disk finally, and the missed logs then appear.
My first suspicion was an open-files limit being hit, but the OS (Ubuntu) doesn't have a limit that I'm hitting. I tried disabling any Node HTTP Agent behavior (I'm using the request module, so I just passed false for the agent option).
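Roughly like this (the URL is a placeholder):

const request = require('request');

// agent: false disables Agent pooling; each request gets a fresh socket
request({ url: 'http://example.com/data', agent: false }, (err, res, body) => {
  if (err) return console.error(err);
  console.log(res.statusCode);
});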
It's not the webserver on the other end rejecting my connections. While the issue was occurring I was able to successfully wget a file from the webserver using the same external IP as the Node app is using.
I'm tailing the log file and noticing that the time between when a line is generated and when it's flushed to the disk is gradually increasing.
CPU and memory usage are low so there's no way that's the issue. iowait in top is 0.0. I have no idea where to go from here. Any help at all would be greatly appreciated.
I'm running Node 5.10.1.

socket.io disconnects clients when idle

I have a production app that uses socket.io (node.js back-end) to distribute messages to all the logged-in clients. Many of my users are experiencing disconnections from the socket.io server. The normal use case for a client is to keep the web app open for the entire working day. Most of that time the app sits idle, but it is still open, until the socket.io connection is lost and the app kicks them out.
Is there any way I can make the connection more reliable so my users are not constantly losing their connection to the socket.io server?
It appears that all we can do here is give you some debugging advice so that you might learn more about what is causing the problem. So, here's a list of things to look into.
Make sure that socket.io is configured for automatic reconnect. In the latest versions of socket.io, auto-reconnect defaults to on, but you may need to verify that no piece of code is turning it off.
Make sure the client machine is not going to sleep, which would cause all network connections to become inactive and get disconnected.
In a working client (before it has disconnected), use the Chrome debugger's Network tab, webSockets sub-tab, to verify that you can see regular ping messages going between client and server. You will have to open the debug window, go to the Network tab and then refresh your web page with that debug window open to start seeing the network activity. You should see a funky-looking URL that has ?EIO=3&transport=websocket&sid=xxxxxxxxxxxx in it. Click on that, then click on the "Frames" sub-tab. At that point, you can watch individual webSocket packets being sent. You should see tiny packets of length 1 every once in a while (these are the ping and pong keep-alive packets). There's a sample screenshot below that shows what you're looking for. If you aren't seeing these keep-alive packets, then you need to work out why they aren't there (likely a socket.io configuration or version issue).
Since you mentioned that you can reproduce the situation, one thing you want to find out is how the socket is getting closed (client-end initiated or server-end initiated). One way to gather info on this is to install a network analyzer on your client so you can literally watch every packet that goes over the network to/from your client. There are many different analyzers and many are free; I personally have used Fiddler, but I regularly hear people talk about Wireshark. What you want to see is exactly what happens on the network when the client loses its connection. Does the client decide to send a close-socket packet? Does the client receive a close-socket packet from someone? What happens on the network at the moment the connection is lost?
(Screenshot: webSocket network view in the Chrome debugger)
The most likely cause is one end closing the WebSocket due to inactivity. This is commonly done by load balancers, but there may be other culprits. The fix is simply to send a message to every client every so often (I use 30 seconds; depending on the issue you may be able to go higher). This will prevent the connection from appearing inactive and thus getting closed.
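A minimal sketch of that fix on the server side (the event name and interval are arbitrary):

const server = require('http').createServer();
const io = require('socket.io')(server);
server.listen(3000);

// broadcast a tiny event every 30 s so intermediaries never see the
// connection as idle
setInterval(() => {
  io.emit('keepalive', Date.now());
}, 30000);

Alternatively, tightening the server's pingInterval/pingTimeout options keeps socket.io's own heartbeat traffic frequent enough to have the same effect.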

Throttling express server

I'm using a very simple Express server, with PUT and GET routes, on an Ubuntu machine, but if I use several clients (around 8) making requests at the same time, it very easily gets flooded and starts returning connect EADDRNOTAVAIL errors. I have found no way to avoid this other than reducing the number of requests per client, but is there a way to throttle responses on the server so that, instead of returning errors, it queues requests and serves them in due time?
Maybe it's better to check on the client whether there are answers to outstanding requests, and not issue new ones if they have not been served yet? The client is here
Queuing seems to be the wrong approach; you should first check your current ulimit (every connection needs a file handle).
To solve your problem, just raise the ulimit.
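For example (the value is illustrative, and the change lasts only for the current shell session):

# ulimit -n          (check the current limit)
# ulimit -n 65535    (raise it for this session)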

Node.js SSL server frozen, high CPU, not crashed but no connections

I hope someone can help me with this issue.
In our company we are setting up a node.js server, connected to a Java push server.
I'm using the https module instead of http, with SSL certificates.
The connection between node and the clients is made with socket.io, on both server and client.
At the same time, the node.js server is a client of the Java server; that connection is made with regular sockets (net.connect).
The idea is that users connect to the server, join some channels, and when data arrives from the Java server, it is dispatched to the corresponding users.
Everything seems to work fine, but after a while, seemingly at random, with between 450 and 700 users connected, the server's CPU reaches 100% and all the connections break, yet the server has not crashed. The odd thing is that if you browse to the https://... address, you don't get a 404 or anything like that, but an SSL connection error, and it fails really fast.
I tried adding logs everywhere, but there is no pattern; it seems random.
If anybody has had the same problem or can give me a clue, or a tip on how to debug this better, I'd appreciate anything.
Thanks a lot.
Okay, the problem is solved. It is a problem that can occur on any Linux server, so if you are working with one of these, you need to read this.
The reason was the default limit on the number of files each process may have open.
It seems that every Linux server comes with a limit of 1024 open files per process; you can check your limit with:
# ulimit -n
To increase this number:
# ulimit -n 5000 (for example)
Each socket creates a new virtual file (it consumes a file descriptor).
For some reason my server was not displaying any error; it just froze, the log stopped, and there was no signal or evidence of anything. It was only when I set up a copy of the server on another machine that it started printing:
warn: error raised: Error: accept EMFILE
warn: error raised: Error: accept EMFILE
warn: error raised: Error: accept EMFILE
...
Be careful: ulimit only changes the limit for the current session, not permanently, and if you are not root you cannot raise it beyond the hard limit.
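To make the change permanent, the usual place is /etc/security/limits.conf (the values mirror the example above and are illustrative):

*    soft    nofile    5000
*    hard    nofile    5000

You will need to log in again for the new limits to take effect.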
Trick: if you want to count the number of files opened by your node process, take note of its process id and run this command:
# ls -l /proc/XXXXX/fd | wc -l
Where XXXXX is the process id. This will help you determine whether this is your problem: once you launch your node server, use this command to check whether the count climbs to a ceiling and stops growing right when the server freezes (1024 by default, or whatever "ulimit -n" reports).
If you only want to check which files are open by the process:
# ls -l /proc/XXXXX/fd
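If lsof is installed, this gives an equivalent, more readable view:

# lsof -p XXXXX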
Hope this can help you. In any case, if you are setting up a node.js server, I'm pretty sure you want to do this to make sure it won't melt down.
Finally, if you hit future errors that leave no log, you can try stracing (Linux) or dtrussing (macOS) the process:
# strace -p <process-id>
should do the job.
