Socket.io - invalid HTTP status code on SOME browsers - node.js

After a few weeks of working fine, our Socket.io setup started spewing errors on some browsers. I've tried updating to the latest Socket.io version and running our setup on all sorts of different machines. It seems to work on most browsers, with no clear pattern as to which ones fail.
These errors appear every second:
OPTIONS https://website.com/socket.io/?EIO=2&transport=polling&t=1409760272713-52&sid=Dkp1cq0lpKV75IO8AdA3 socket.io-1.0.6.js:2
XMLHttpRequest cannot load https://website.com/socket.io/?EIO=2&transport=polling&t=1409760272713-52&sid=Dkp1cq0lpKV75IO8AdA3. Invalid HTTP status code 400
We're behind Amazon's ELB, with Socket.io using polling because the ELB router doesn't support WebSockets.

I found the problem that has been causing this, and it's really unexpected...
This problem comes from using load-balanced services like AWS ELB (an independent EC2 instance should be fine though) and Heroku. Their infrastructure doesn't fully support Socket.io's features. AWS ELB flat out won't support WebSockets, and Heroku's router is trash for Socket.io, even in conjunction with socket.io-redis.
The problem is hidden when you use a single server, but as soon as you start clustering, you will get issues. A single Heroku dyno worked fine for my application in development, and the problems only started appearing in production, when we were using more than one server. We tried ELB with sticky load balancing and even then, we still had the same issues.
When socket.io returned 400 errors in this case, it was saying "This session doesn't exist and you never completed the handshake", because the handshake was completed on a different server in the cluster.
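To illustrate the failure mode, here is a minimal sketch in plain Node.js. It does not use Socket.io at all: PollingServer, roundRobin and simulate are hypothetical names for a toy model in which each server only remembers the session ids it created itself, and a balancer without session affinity alternates between servers.

```javascript
// Toy model (an assumption, not Socket.io's real code): each "server"
// keeps its own in-memory set of session ids and knows nothing about
// sessions created on other servers.
class PollingServer {
  constructor(name) {
    this.name = name;
    this.sessions = new Set();
    this.counter = 0;
  }

  // Handshake: create a session id and remember it locally only.
  handshake() {
    this.counter += 1;
    const sid = `${this.name}-${this.counter}`;
    this.sessions.add(sid);
    return sid;
  }

  // Follow-up polling request: 200 if this server created the session,
  // 400 if the session id is unknown here.
  poll(sid) {
    return this.sessions.has(sid) ? 200 : 400;
  }
}

// Round-robin balancer with no session affinity.
function roundRobin(servers) {
  let i = 0;
  return () => servers[i++ % servers.length];
}

// Perform one handshake, then `polls` follow-up requests through the
// balancer, and return the status code of each follow-up request.
function simulate(serverCount, polls) {
  const servers = Array.from(
    { length: serverCount },
    (_, n) => new PollingServer(`s${n}`)
  );
  const next = roundRobin(servers);
  const sid = next().handshake();
  const statuses = [];
  for (let k = 0; k < polls; k++) {
    statuses.push(next().poll(sid));
  }
  return statuses;
}

console.log(simulate(1, 4)); // [ 200, 200, 200, 200 ]
console.log(simulate(2, 4)); // [ 400, 200, 400, 200 ]
```

With one server every poll finds its session. With two servers, every request that lands on the server that did not perform the handshake gets a 400, which matches the intermittent errors described above.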
The solution for me was just dedicating an EC2 instance for my web app to handle Socket.io.

Related

Socket.io connecting/disconnecting unexpectedly in production - MEAN stack hosted on Elastic Beanstalk

I have a MEAN stack application hosted on AWS Elastic Beanstalk that uses socket.io
The socket.io connection is connecting and disconnecting unexpectedly.
The socket's requests get 4xx responses from the server. After a few 4xx responses are reported to AWS, the environment degrades automatically and the socket disconnects. So it becomes a loop: AWS first sees the 4xx responses, then the environment becomes unhealthy, then the socket behaves even more strangely because the server is about to go down, and so on.
What matters is that the starting point is the 4xx caused by the socket.
The log I have says:
"GET /socket.io/?EIO=3&transport=polling&t=Nx_u0FB HTTP/1.1" 400 62
I tried to add the CORS option to the socket with the app domain as the origin, but it didn't help.
Please note that this happens in production only and not on localhost.
Also, please note that this doesn't happen unless we have around 10-20 users/sockets connecting from different parts of the world.
With only a few sockets it rarely happens, and sometimes it doesn't happen even when many users connect from the same country. The behavior is very random.
Can anyone help with this?

Recently having trouble with socket.io connections via a Digitalocean load balancer (400 Error)

I have a DO Load Balancer with 4 servers behind it. I've been using socket.io with Sticky Sessions enabled in the Load Balancer settings, and it had been working just fine for a while.
Recently, clients have not been able to connect at all; they get a 400 error immediately on connection. I haven't changed anything in the way I connect to the sockets. If I require the transport to be 'websocket' only on the client, it does connect successfully, but then I lose the polling fallback (one of the main benefits of socket.io).
Also, connecting directly to one of the droplets works as expected, so the issue definitely lies with the Load Balancer.
Does anyone have any idea what kind of setup should be in place for this to work with the DO Load Balancers? Could anything have changed recently?
I'm running socket.io on a NodeJS server with Express if that helps at all.
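For reference, the websocket-only workaround mentioned above looks roughly like this on the client. This is a sketch, not my exact code: SOCKET_URL is a placeholder, and the commented-out lines assume the socket.io-client package is installed.

```javascript
// Placeholder endpoint (an assumption); replace with the real
// load balancer address.
const SOCKET_URL = 'https://example.com';

// `transports` is a Socket.IO client option. Listing only 'websocket'
// skips HTTP long-polling entirely, so there is no polling fallback.
const clientOptions = { transports: ['websocket'] };

// With socket.io-client installed, the connection would be opened
// along these lines:
// const io = require('socket.io-client');
// const socket = io(SOCKET_URL, clientOptions);

console.log(clientOptions.transports); // [ 'websocket' ]
```

A WebSocket connection is a single long-lived connection, so every message reaches the same droplet. Long-polling sends many separate HTTP requests, and each one has to be routed to the server that holds the session.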
Edit #1: Added a screenshot of the LB Settings

Node server stop responding to requests after a while

I recently created an e-commerce site using express, and the node server worked fine on my local machine. When I uploaded it to my VPS and tried running it with pm2 and with nodemon, the server stopped responding to requests after a few minutes, even when the number of requests was low.
All the internal functionality other than request handling kept working well, though. I used a lot of console.log() calls in my code. Is this problem due to excessive use of console.log()?

Socket.IO keeps reconnecting Websocket on Cloudflare

I have a Node/Express app on a server dedicated to sockets, and the client is Angular 1.5. Running the code locally over http with the same architecture (i.e. a separate socket server), it all works perfectly fine.
When I run the code locally, it creates one connection and does very little polling via XHR. On Cloudflare with https, it does a lot of polling, reconnects continually, and not all the messages seem to reach the web client.
Messages hit Cloudflare, which then redirects them to a load balancer running HAProxy, which then routes the requests to an app running in a Docker instance on another machine.
Your issue is most likely occurring because Cloudflare only allows traffic on a limited set of ports. Try one of the ports listed in the link below for your server and try connecting to it.
https://support.cloudflare.com/hc/en-us/articles/200169156-Which-ports-will-Cloudflare-work-with-
After a lot of investigation, I found the issue to be down to the config in HAProxy. I needed to alter the timeouts around the socket routing.
This had nothing to do with ports not being open on Cloudflare.
The following link helped me:
http://blog.haproxy.com/2012/11/07/websockets-load-balancing-with-haproxy/

NodeJS make problems under ELB

We are using ELB to load balance between two Node.js servers.
Suddenly, yesterday, the service started to receive errors while I had two servers under the ELB.
When I remove one of the servers and keep only one, the service works fine.
I don't have any logs showing how traffic is directed between the servers. It seems that the system works fine with one server (no matter which one) and doesn't work with more than one server.
Any suggestions on what we should check?
Thanks!
