socket.io, node cluster, express, session store with redis - node.js

I'm not exactly sure how to describe this, but I'm running a node app with express.js as a cluster on 4 cores, listening on port 80, with socket.io attached to the same server and RedisStore as the socket.io store.
The interesting behavior is that for around 40% of clients that connect to socket.io using Chrome (and Firefox; we stopped testing different browsers because it seems to happen across the board), the connection works fine for the first 25-30 seconds. Then there are about 60 seconds of dead time where requests are sent from the client but none are received or acknowledged by the server, and at 1.5 - 1.6 min the client starts a new websocket connection. Sometimes that new websocket connection shows the same behavior; at other times the connection "catches", persists for the next few hours, and is absolutely fine.
What's funny is that this behavior doesn't happen on our test server that runs on a different port (which also uses a cluster); and moreover it doesn't happen on any of our local development servers (some of which implement clusters, others of which don't).
Any ideas?

Related

Why does socket.io disconnect every 5 seconds and then reconnect on the client side?

This might be a simple question. Any help is really appreciated.
I am using socket.io with the express framework on AWS.
I am running a timer using socket.io. I need to emit the server time every second. My sockets get disconnected every 5 seconds and then reconnect automatically, which causes a break in my timer. I really believe that this is not a problem with the code, as everything worked well on my previous server.
Is there any configuration that we need to handle in AWS to avoid such disconnects?

Socket.io disconnects every 5 minutes

In Chrome, Socket IO seems to stop transmitting data. Is there an internal reason for this?
I've tried a very simple client and a simple server side, but consistently the server stops receiving any emits after 5 minutes, then reconnects, and it's fine for another 5 minutes.
On top of the internal ping mechanism I have a polling mechanism which sends back session data every 20 seconds.
I don't use WebSocket with NodeJS or Socket.io but experienced the same behaviour with Jetty. It turns out that Jetty has an idle timeout that defaults to 5 minutes (or 300 seconds) for all WebSocket sessions. You could change the default idle timeout to an appropriate value or ping/pong those connections before they time out.
In my situation, I decided to use ping/pong as it also helps determine when the connection is no longer there. I observed that in some cases the connection was not closed even when the network was down.
According to the engine.io (which is used by socket.io) docs, the server seems to have a default pingInterval of 25 seconds. So unless you inadvertently disabled or changed the default options, the ping/pong mechanism should be in place.
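The interaction between a heartbeat and an idle timeout can be sketched in plain JavaScript. This is only an illustrative model, not engine.io or Jetty code: `IdleTracker`, `simulate`, and the one-second simulation step are hypothetical, and only the 300-second timeout and 25-second interval come from the answer above.

```javascript
// Illustrative model only: a server-side idle timer that any inbound
// frame (including a ping) resets. All names here are hypothetical.
class IdleTracker {
  constructor(idleTimeoutMs, startMs = 0) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastActivityMs = startMs;
  }

  // Record traffic (data or ping/pong) seen at time nowMs.
  touch(nowMs) {
    this.lastActivityMs = nowMs;
  }

  // True once the connection has been silent for the full timeout.
  isExpired(nowMs) {
    return nowMs - this.lastActivityMs >= this.idleTimeoutMs;
  }
}

// Simulate a connection that carries no application data, only pings
// every pingIntervalMs (0 disables pings). Returns the time in ms at
// which the idle timeout would close it, or null if it survives.
function simulate(idleTimeoutMs, pingIntervalMs, durationMs) {
  const tracker = new IdleTracker(idleTimeoutMs);
  const stepMs = 1000;
  for (let now = stepMs; now <= durationMs; now += stepMs) {
    if (pingIntervalMs > 0 && now % pingIntervalMs === 0) {
      tracker.touch(now);
    }
    if (tracker.isExpired(now)) {
      return now;
    }
  }
  return null;
}

const FIVE_MINUTES_MS = 300 * 1000;
console.log(simulate(FIVE_MINUTES_MS, 0, 600 * 1000));         // 300000
console.log(simulate(FIVE_MINUTES_MS, 25 * 1000, 600 * 1000)); // null
```

In this model a silent connection is dropped at exactly the 5-minute mark, while a 25-second ping keeps it open, which matches the reasoning that a working heartbeat should prevent an idle-timeout disconnect.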

Socket.io huge server response time when using xhr-polling

I am trying to scale a messaging app. I'm using Node.js with Socket.io and Redis-Store on the backend. The client can be the iPhone native browser, Android browsers, etc.
I am using SSL for the node connection and Nginx to load balance the socket connections. I am not clustering my socket.io app; instead I am load balancing over 10 node servers (we have a massive number of users). Everything looks fine when the transport is WebSockets, but when it falls back to xhr-polling (for old Android phones) I see a HUGE response time of up to 3000 rpm in New Relic. I also have to restart my node servers every hour or so, otherwise the server crashes.
I was wondering if I am doing anything wrong, and if there are any measures I can take to scale socket.io when using the xhr-polling transport, like increasing or decreasing the poll duration?
You are not doing anything wrong. xhr-polling is also called long polling. The name comes from the fact that the connection is kept open longer, usually until some answer can be sent down the wire. After the connection closes, a new connection is opened to wait for the next piece of information.
You can read more on this here http://en.wikipedia.org/wiki/Push_technology#Long_polling
New Relic shows you the response time of the polling request. Socket.IO has a default "polling duration" of 20 seconds.
You will get a higher RPM with a shorter polling duration and a lower RPM with a longer polling duration. I would consider increasing the polling duration or just keeping the 20 sec default.
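The relationship between polling duration and RPM can be checked with some rough arithmetic. The sketch below assumes every client is idle, so each poll stays open for the full duration; real traffic ends polls early and raises the number. The function name and the client count of 1000 are hypothetical.

```javascript
// Rough estimate only: an idle long-polling client completes one
// request per poll cycle, so requests per minute scale inversely
// with the poll duration.
function idlePollingRpm(clients, pollDurationSeconds) {
  if (pollDurationSeconds <= 0) {
    throw new RangeError('pollDurationSeconds must be positive');
  }
  return clients * (60 / pollDurationSeconds);
}

console.log(idlePollingRpm(1000, 20)); // 3000 (the 20 s default)
console.log(idlePollingRpm(1000, 10)); // 6000 (shorter polls, more requests)
console.log(idlePollingRpm(1000, 60)); // 1000 (longer polls, fewer requests)
```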
Also, to keep New Relic from displaying irrelevant data for the long polling, you can add ignore rules in the newrelic.js that you require in your app. This is also detailed in the newrelic npm module documentation here https://www.npmjs.org/package/newrelic#rules-for-naming-and-ignoring-requests
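A minimal sketch of what such an ignore rule might look like, assuming the `rules.ignore` list of URL patterns described in the linked documentation. The pattern itself is an assumption about how socket.io long-polling URLs look in your app, and `wouldIgnore` is a hypothetical helper for checking the pattern locally, not part of the newrelic module.

```javascript
// Sketch of the relevant part of newrelic.js. The pattern below is an
// assumption about the long-polling URL layout; adjust it to the paths
// your app actually serves.
const config = {
  rules: {
    // Requests whose URL matches one of these patterns are ignored.
    ignore: ['^/socket.io/.*/xhr-polling']
  }
};

// In a real newrelic.js this object would be exported as `exports.config`.

// Local sanity check only (hypothetical helper, not New Relic's own
// matching code): does any ignore pattern match the given URL?
function wouldIgnore(url) {
  return config.rules.ignore.some((pattern) => new RegExp(pattern).test(url));
}

console.log(wouldIgnore('/socket.io/1/xhr-polling/abc123')); // true
console.log(wouldIgnore('/socket.io/1/websocket/abc123'));   // false
```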

Non-Websocket Socket.io clients for benchmarking

There are a bunch of Socket.io client implementations out there in e.g. Java (see Java socket.io client), that seem to exclusively support the Websocket protocol.
I want to benchmark the server's performance with other transports. I'm particularly interested in htmlfile, since IE browsers < 10 will use it unless I enable Flash, which I'm not sure I'll do because the socket.io 'flashsocket' transport takes 5 seconds to start on IE 8. Is there any Socket.io client available that would allow this kind of benchmarking?
I don't care too much what OS or programming language it is.
There's https://github.com/Gottox/socket.io-java-client
In addition to WebSocket, it only does XHR, and that feature is currently considered to be in beta:
"Status: Connecting with Websocket is production ready. XHR is in beta."
Light testing xhr polling seems to support the claim that it is not production ready yet. I had a bunch of disconnects without subsequent reconnects. This was when testing a few hundred client instances simultaneously in one JVM. As there were errors in the server log, I guess it's the client.
One more guess: since the connection drops are so frequent and the load on the server is so much higher than with WebSockets, I wonder whether this client's XHR polling skips HTTP keep-alive, which would explain a lot. Will check as soon as time permits.
Using WebSocket, I could do 1000 instances per JVM (probably more) and 5000 instances (5 JVMs x 1000 instances each) per machine (probably more, too) without issues.
And apparently "it's easy to write your own transport". Will check this out.
It should also be easy to create your own by
- sniffing on a browser <-> socket.io session (on Windows, that's Fiddler)
- copying the code from one of the official socket.io tests

client not handshaken client should reconnect, socket.io in cluster

My node.js app with express, redis, mongo and socket.io works perfectly.
Today when I introduced cluster, I found that it works, but the app logs a lot of messages like
'client not handshaken' 'client should reconnect'
Often the response time from socket.io is very bad, up to several seconds.
Then I put http-proxy in front of the app to handle requests from the browsers. Chrome works intermittently without producing such messages. Sometimes if I open the same URL again, it starts producing these messages and the response is delayed.
Firefox behaves the same way: at random, it starts producing these messages continuously.
Looks like some problem with websocket in clustered environment.
My node.js version is 0.6.10, socket.io 0.9.0, express 2.5.9, http-proxy 0.8.0
This is most probably because Socket.IO keeps your connections in memory, so each server has its own set of clients. To share Socket.IO state across multiple server instances, look into using its RedisStore. The same applies to Express sessions, where you have connect-redis as an option.
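Why per-process memory produces 'client not handshaken' can be shown with a toy model. This is not Socket.IO code: `WorkerProcess`, `run`, and the client id are hypothetical, and a plain `Map` stands in for both the in-memory store and the shared store whose role RedisStore would play.

```javascript
// Illustrative model only; this is not Socket.IO code. Each worker
// records handshakes in a store and rejects requests for unknown ids.
class WorkerProcess {
  constructor(store) {
    this.store = store; // a Map standing in for memory or Redis
  }

  handshake(clientId) {
    this.store.set(clientId, { handshaken: true });
  }

  handle(clientId) {
    return this.store.has(clientId) ? 'ok' : 'client not handshaken';
  }
}

// Send the handshake to the first worker and the follow-up request to
// the second, as can happen when a cluster spreads connections across
// processes.
function run(workers, clientId) {
  workers[0].handshake(clientId);
  return workers[1].handle(clientId);
}

// Separate in-memory stores: the second worker never saw the client.
const isolated = [new WorkerProcess(new Map()), new WorkerProcess(new Map())];
console.log(run(isolated, 'abc')); // client not handshaken

// One shared store: both workers see the same handshake state.
const sharedStore = new Map();
const clustered = [new WorkerProcess(sharedStore), new WorkerProcess(sharedStore)];
console.log(run(clustered, 'abc')); // ok
```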
