I have set up a Node.js server with socket.io on a VPS, and every 10 seconds I broadcast the number of connected clients to everyone. This usually works fine, but often the connection can't be established and I get this error (I changed the IP a bit):
GET http://166.0.0.55:8177/socket.io/1/?t=1385120872574
After reloading the site the connection can usually be established, but I have no idea why the failed connection happens in the first place, and I don't know how to debug the socket.io code. Sometimes I can't connect to the server at all anymore and I have to restart it.
Additional information:
My main site runs on a different server (using a LAMP environment with CakePHP) than the Node.js server.
I use forever to run the server
I have a lot of connected clients (around 1000)
My VPS has 512 MB of RAM and CPU usage never goes above 25%
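For reference, the broadcast part of the server looks roughly like this (a simplified sketch; the port, event name, and counting method are placeholders, written against the socket.io 0.9-era API that the error URL suggests):

var io = require('socket.io').listen(8177);

setInterval(function () {
    // Count the currently connected sockets and broadcast the number to everyone.
    var count = Object.keys(io.sockets.sockets).length;
    io.sockets.emit('clientCount', count);
}, 10000);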
To start debugging, try attaching an error handler to the socket:
socket.on('error', function (err) {
    console.log("Socket.IO Error");
    console.log(err.stack); // log the stack trace to see where the error comes from
});
Also, you could try a slower transport. Socket.io uses WebSocket by default, but if your server cannot allocate enough resources, you can switch to another transport that is slower but uses fewer resources:
var socketIo = require('socket.io');
var io = socketIo.listen(80);
io.set('transports', ['xhr-polling']);
Related
I am working on a Node.js app with Socket.io. I tested it in a single process using PM2 and there were no errors. Then I moved to our production environment (we use Google Cloud Compute Engine instances).
I run 3 app processes, and an iOS client connects to the server.
By the way, the iOS client doesn't keep the socket connection. It doesn't send a disconnect to the server, but it gets disconnected and reconnects to the server, and this happens continuously.
I am not sure why the server disconnects the client.
If you have any hint or answer for this, I would appreciate it.
That's probably because requests end up on a different machine than the one they originated from.
Straight from Socket.io Docs: Using Multiple Nodes:
If you plan to distribute the load of connections among different processes or machines, you have to make sure that requests associated with a particular session id connect to the process that originated them.
What you need to do:
Enable session affinity, a.k.a. sticky sessions.
If you want to work with rooms/namespaces, you also need a centralised store to keep track of namespace information, such as the Redis adapter (see the sketch below).
But I'd advise you to read the documentation piece I posted; things might have changed a bit since the last time I implemented something like this.
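For illustration, here is a minimal sketch of wiring up the Redis adapter so several processes share room and namespace state. It assumes the socket.io-redis package and a Redis instance on localhost, so adjust the names and versions to your setup:

var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');

// Every process points at the same Redis instance, so rooms, namespaces and
// broadcasts are shared across all of them.
io.adapter(redisAdapter({ host: '127.0.0.1', port: 6379 }));

io.on('connection', function (socket) {
    socket.join('some-room');
    // This now reaches sockets connected to any of the processes.
    io.to('some-room').emit('hello', 'from any process');
});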
By default, the socket.io client "tests out" the connection to its server with a couple of HTTP requests. If you have multiple server processes and those initial HTTP requests don't go to the exact same process each time, then the socket.io connection will never get established properly, will not switch over to webSocket, and will keep attempting to use HTTP polling.
There are two ways to fix this.
You can configure your clients to just assume the webSocket protocol will work. This initiates the connection with one and only one HTTP request, which is then immediately upgraded to the webSocket protocol (with socket.io running on top of that). In socket.io, this is a transport option specified with the initial connection, as in the sketch below.
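A minimal client-side sketch of that first option (assuming a socket.io 1.x+ client and a placeholder URL):

// Skip the HTTP long-polling probe and connect over WebSocket right away.
var socket = io('http://your-server.example.com', { transports: ['websocket'] });

socket.on('connect_error', function (err) {
    console.log('connection failed:', err);
});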
You can configure your server infrastructure to be sticky so that a request from a given client always goes back to the exact same server. There are lots of ways to do this depending upon your server architecture and how the load balancing is done between your servers.
If your servers keep any client state locally (rather than in a shared database that all servers access), then even a dropped connection and reconnect needs to go back to the same server, and sticky connections are your only solution. You can read more about sticky sessions on the socket.io website here.
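If the Node processes all run on one machine behind the cluster module, one possible sticky approach (an assumption here, not the only one; a load-balancer setting such as nginx ip_hash or cookie-based affinity works just as well) is the sticky-session package:

var http = require('http');
var sticky = require('sticky-session');

var server = http.createServer();

// sticky.listen() forks workers and hashes the client IP, so every request
// (and therefore every socket.io handshake) from one client hits the same worker.
if (!sticky.listen(server, 3000)) {
    // Master process: just wait for the workers to come up.
    server.once('listening', function () {
        console.log('listening on port 3000');
    });
} else {
    // Worker process: attach socket.io here as usual.
    var io = require('socket.io')(server);
    io.on('connection', function (socket) {
        console.log('client connected to worker', process.pid);
    });
}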
Thanks for your replies.
I finally figured out the issue. It was caused by the TTL of the backend service in the Google Cloud Load Balancer. The default TTL was 30 seconds, which made each socket connection drop and reconnect.
So I updated the value to 3600 seconds, and after that the connection stayed up.
I need to handle users disconnecting from my sockjs application running in xhr-polling mode. When I connect to localhost, everything works as expected. When I put Apache between Node.js and the browser, I get a ~20 second delay between the browser being closed and the disconnect event firing inside Node.js. My Apache proxy config is the following:
<Location />
ProxyPass http://127.0.0.1:8080/
ProxyPassReverse http://127.0.0.1:8080/
</Location>
The rest of the file is the default; you can see it here. I tried playing with the ttl=2 and timeout=2 options, but either nothing changes or I get reconnected every 2 seconds without closing the browser. How can I reduce the additional disconnect timeout that Apache introduces somewhere in its defaults?
It's possible that your Apache server is configured to use HTTP Keep-Alive, which keeps a persistent connection open. In that case I would try disabling KeepAlive, or lowering the KeepAliveTimeout setting in your Apache configuration, to see if this solves the problem.
If that doesn't work, I would also take a look at netstat, check the status of each socket, and start a root-cause analysis from there. This chart shows the TCP state machine and can tell you where each connection is. Wireshark can also give you some information about what is going on.
In long polling, the connection chain looks like this:
<client> ---> apache ---> <node.js>
When the client breaks the connection:
<client> -X-> apache ---> <node.js>
Apache still keeps its connection to Node.js open. There are two workarounds for this.
ProxyTimeout
You can add the line below to your Apache config:
ProxyTimeout 10
This will break the connection after 10 seconds, but it also breaks every long-polling connection after 10 seconds, which you don't want.
Ping
The next option is to ping the client:
pingTimeout (Number): how many ms without a pong packet to consider the connection closed (60000)
pingInterval (Number): how many ms before sending a new ping packet (25000).
var io = require('socket.io')(server, {
    transports: ['polling'],
    pingTimeout: 5000,
    pingInterval: 2500
});
The above will make sure the client is considered disconnected within about 5 seconds of going offline. You can lower the values further, but that may impact normal loading scenarios.
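To see the effect on the server, a small sketch of logging when the timeout fires (using the io instance created above):

// With pingInterval 2500 and pingTimeout 5000, a client that silently disappears
// should trigger 'disconnect' within a few seconds.
io.on('connection', function (socket) {
    socket.on('disconnect', function (reason) {
        console.log('client disconnected:', reason); // e.g. 'ping timeout'
    });
});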
But reading through all the posts, threads and sites, I don't think you can replicate the behaviour you get when connecting to socket.io directly, because in that case the broken connection can be detected easily by socket.io.
The 20-second delay is not in the Apache proxy. I had the same issue: the delay did not happen with the local URL, but it did happen with the global URL.
The issue was solved in the Node.js code itself. You need to send data from the server to the client once, to make sure the connection is initialized. This problem is not mentioned in the documentation or the issue tracker of the WebSocket plugin.
Send a dummy message after the request is accepted on the server, like below.
let connection = request.accept(null, request.origin);
connection.on('message', function (evt) {
    console.log(evt);
});
connection.on('close', function (evt) {
    console.log(evt);
});
connection.send("Success"); // dummy message to the client from the server
I have a big problem with a socket.io connection limit. If the number of connections is more than 400-450 clients connected through browsers, users start getting disconnected. I increased the soft and hard limits for TCP, but it didn't help.
The problem only occurs with browsers. When I connected using the socket.io-client module from another Node.js server, I reached 5000 connected clients.
It's a very big problem for me and has completely blocked me. Please help.
Update
I have tried the standard WebSocket library (the ws module for Node.js) and the problem was similar: I could only reach 456 connected clients.
Update 2
I divided the connected clients between a few server instances, with each group of clients connecting on a different port. Unfortunately this change didn't help; the total number of connected users was the same as before.
Solved (2018)
There were not enough open ports for the Linux user that runs the pm2 manager (the "pm2" or "pm" username).
You may be hitting a limit in your operating system. There are security limits on the number of concurrently open files; take a look at this thread:
https://github.com/socketio/socket.io/issues/1393
Update:
I wanted to expand this answer because I was answering from mobile before. Each new connection that gets established opens a new file descriptor under your Node process, and each connection also uses some amount of RAM. You will most likely hit the FD limit before running out of RAM (but that depends on your server).
Check your FD limits: https://rtcamp.com/tutorials/linux/increase-open-files-limit/
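As a quick server-side sanity check, you can also log how many TCP connections are currently open; a rough sketch using Node's built-in server.getConnections (the port and interval here are arbitrary):

var http = require('http');
var server = http.createServer();
var io = require('socket.io')(server);

server.listen(3000);

// Print the live connection count every 5 seconds to see how close you are
// getting to the file-descriptor limit.
setInterval(function () {
    server.getConnections(function (err, count) {
        if (!err) console.log('open connections:', count);
    });
}, 5000);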
And lastly, I suspect your single-client concurrency test was not using the correct flags to force new connections. If you want to test concurrent connections from one client, you need to set a flag when creating each connection:
var socket = io.connect('http://localhost:3000', {'force new connection': true});
So, I've been trying to test my server code, but the client sockets catch 'error' once 1012 connections have been established. The client simulator keeps trying until it has attempted to connect as many times as I've told it to (obviously). Though, as stated, the server refuses to serve more than 1012 connections.
I'm running both the client simulator and the server on the same computer (might be dumb, but shouldn't it work anyway?).
(Running on socket.io)
To increase the limit of open connections/files on Linux:
ulimit -n 2048
Here is more info regarding ulimit
Consider this small server for Node.js:
var net = require('net');

var server = net.createServer(function (socket) {
    console.log("Connection detected");
    socket.on('end', function () {
        console.log('server disconnected');
    });
    socket.write("Hello World");
    socket.end();
});

server.listen(8888);
When I test the server with Chrome on my MacBook Pro, I get the "Connection detected" message three times in the console.
I know one is for the actual connection and another is for the favicon, but what is the third one all about?
I tested it with Firefox and wget (a Linux command-line program), and also used telnet for a deeper investigation. Surprisingly, none of these make any extra connection (and obviously they don't even try to download the favicon). So I fired up Wireshark, captured a session, and quickly discovered that Chrome systematically makes a useless connection: it just connects (SYN, SYN-ACK, ACK) and then closes the connection (RST, ACK) without sending anything.
A quick googling turned up this bug report (excerpt):
I suspect the "empty" TCP connections are
backup TCP connections,
IPv4/IPv6 parallel connections, or
TCP pre-connections,
A backup TCP connection is made only if the original TCP connection is
not set up within 250 milliseconds. IPv4/IPv6 parallel connections
are made only if the server has both IPv4 and IPv6 addresses and the
IPv6 connection is not set up within 300 milliseconds. Since you're
testing a local server at localhost:8080, you should be able to
connect to it quickly, so I suspect you are seeing TCP
pre-connections.
To verify if the "empty" TCP connections are TCP pre-connections, open
the "wrench" menu > Settings > Under the Hood > Privacy, and clear the
"Predict network actions to improve page load performance" check box.
Shut down and restart Chrome. Are the "empty" TCP connections gone?
For further reference, see the linked thread, which explains more in-depth what backup, parallel and pre-connections are and if/why this is a good optimization.