I have created a Swoole WebSocket server using the example given on the official OpenSwoole website:
https://openswoole.com/docs/modules/swoole-websocket-server#quick-start-example
The response headers returned by the server to the client contain a Server: OpenSwoole 4.11.1 entry:
Connection: Upgrade
Sec-WebSocket-Accept: mABCD0joUS/Z/yPYqrqfa3+I2sT=
Sec-WebSocket-Version: 13
Server: OpenSwoole 4.11.1
Upgrade: websocket
Can this Server header be removed completely, or replaced with a fake name like Server: XYZ?
I have a GraphQL server with subscriptions enabled; when accessed directly from my client it works great. Now I need to reroute the connection through my backend (an Express server).
I have the following configuration on my server:
app.use(
  "/graphql",
  createProxyMiddleware({
    target: `http://${GQL_URL}`,
    changeOrigin: true,
    ws: true,
  })
);
This eventually works. However, the first connection after the server starts up, which I can see in my console as the message [HPM] Upgrading to WebSocket, takes over five minutes to complete.
By the time it manages to upgrade the connection, the client has already disconnected and retried multiple times, and I then get all the stacked pending retries "instantly":
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Client disconnected
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Client disconnected
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Client disconnected
[server] [HPM] Client disconnected
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Client disconnected
[server] [HPM] Upgrading to WebSocket
[server] [HPM] Client disconnected
[server] [HPM] Upgrading to WebSocket
After that, the connections to the server work properly.
In principle this is not a major problem: when I deploy, the application will not work for a few minutes and then starts working. Not ideal, but not all bad.
However, the major problem is when I am developing. Whenever I make any change to the server's code and it reloads, I have to wait those few minutes before I can properly check whether my application is working.
How can I fix this issue? Am I missing something?
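One plausible explanation, assuming http-proxy-middleware v2: with ws: true, the middleware only hooks the HTTP server's upgrade event after the first regular request has passed through it, so an upgrade arriving before that can stall. The library's README documents binding the upgrade handler to the server yourself; a minimal sketch of that workaround (the port and the GQL_URL value are placeholders for your setup):

const express = require("express");
const { createProxyMiddleware } = require("http-proxy-middleware");

const GQL_URL = "localhost:4000"; // placeholder: wherever your GraphQL server lives
const app = express();

// Keep a reference to the proxy so its upgrade handler can be bound manually.
const wsProxy = createProxyMiddleware({
  target: `http://${GQL_URL}`,
  changeOrigin: true,
  ws: true,
});

app.use("/graphql", wsProxy);

const server = app.listen(3000); // placeholder port

// Subscribe to the upgrade event explicitly so even the very first
// WebSocket connection is proxied without waiting for a prior HTTP request.
server.on("upgrade", wsProxy.upgrade);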
I am unable to scale my simple Socket.IO app past around 980 concurrent connections using Docker. However, if I run it locally on macOS Sierra 10.12.6 I can get over 3000 connections. Here is the repo of the simple Socket.IO application I am testing with:
https://github.com/gsccheng/simple-socketIO-app
My Docker for Mac is configured with 4 CPUs and 5 GB of memory. The version is:
Version 17.09.0-ce-mac35 (19611)
Channel: stable
a98b7c1b7c
I am using Artillery version 1.6.0-9 to load test it with:
$ artillery run load-test.yaml
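For reference, a minimal load-test.yaml shape for Artillery v1's Socket.IO engine; the actual file in the repo may differ, and the target, rate, and duration here are made up:

config:
  target: "http://localhost:8000"
  phases:
    - duration: 300     # seconds of traffic generation
      arrivalRate: 10   # new virtual users per second
scenarios:
  - engine: "socketio"
    flow:
      - emit:
          channel: "news"
          data: "hello"
      - think: 600      # hold the connection open to accumulate concurrency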
Some of the configuration settings below are redundant (included to show that they have been considered). Here are my steps to reproduce:
$ docker build . -t socket-test
$ docker run -p 8000:8000 -c 1024 -m 4096M --privileged --ulimit nofile=9000:9000 -it socket-test:latest /bin/sh
#> DEBUG=* npm start
Up to around 980 connections I will get logs like this:
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room 0ohCcHMWYASnfRgJAAPS +0ms
engine:ws received "2" +5ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine upgrading existing transport +2ms
engine:socket might upgrade socket transport from "polling" to "websocket" +0ms
engine intercepting request for path "/socket.io/" +2ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqL&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "28:42["news",{"hello":"world"}]" +0ms
engine:socket executing batch send callback +1ms
engine:ws received "2probe" +4ms
engine:ws writing "3probe" +0ms
engine intercepting request for path "/socket.io/" +4ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfqV&b64=1&sid=0ohCcHMWYASnfRgJAAPS" +0ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
engine:socket writing a noop packet to polling for fast upgrade +10ms
engine:polling writing "1:6" +0ms
engine:ws received "5" +2ms
engine:socket got upgrade packet - upgrading +0ms
engine:polling closing +0ms
engine:polling transport discarded - closing right away +1ms
engine:ws received "2" +20ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +1ms
engine:ws writing "3" +0ms
engine intercepting request for path "/socket.io/" +1ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pfr1&b64=1" +0ms
engine handshaking client "6ccAiZwbvrchxZEiAAPT" +0ms
engine:socket sending packet "open" ({"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"6ccAiZwbvrchxZEiAAPT","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:client connecting to namespace / +1ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +0ms
socket.io:socket joining room 6ccAiZwbvrchxZEiAAPT +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
At about 980 connections I will begin seeing these disconnected events:
disconnected to Socket!
transport close
engine intercepting request for path "/socket.io/" +27ms
engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1T&b64=1" +0ms
engine handshaking client "C-pdSXFCbwQaTeYLAAPh" +0ms
engine:socket sending packet "open" ({"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:socket sending packet "message" (0) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing "97:0{"sid":"C-pdSXFCbwQaTeYLAAPh","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40" +0ms
engine:socket executing batch send callback +0ms
socket.io:server incoming connection with id C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:client connecting to namespace / +0ms
socket.io:namespace adding socket to nsp / +0ms
socket.io:socket socket connected - writing packet +1ms
socket.io:socket joining room C-pdSXFCbwQaTeYLAAPh +0ms
socket.io:socket packet already sent in initial handshake +0ms
Connected to Socket!
socket.io:client writing packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoding packet {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} +0ms
socket.io-parser encoded {"type":2,"data":["news",{"hello":"world"}],"nsp":"/"} as 2["news",{"hello":"world"}] +0ms
engine:socket sending packet "message" (2["news",{"hello":"world"}]) +0ms
socket.io:socket joined room C-pdSXFCbwQaTeYLAAPh +0ms
engine intercepting request for path "/socket.io/" +13ms
engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=Ly8pg1g&b64=1&sid=C-pdSXFCbwQaTeYLAAPh" +0ms
engine setting new request for existing client +1ms
engine:polling received "1:1" +0ms
engine:polling got xhr close packet +0ms
socket.io:client client close with reason transport close +0ms
socket.io:socket closing socket - reason transport close +1ms
disconnected to Socket!
Then it'll be this repeated over and over again:
engine:ws writing "3" +0ms
engine:ws received "2" +42ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +1ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +4ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +45ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
engine:ws received "2" +7ms
engine:socket packet +0ms
engine:socket got ping +0ms
engine:socket sending packet "pong" (undefined) +0ms
engine:socket flushing buffer to transport +0ms
engine:ws writing "3" +0ms
As you can see in my Dockerfile, I have set a few configurations that I've gathered from googling my problem:
COPY limits.conf /etc/security/
COPY sysctl.conf /etc/
COPY rc.local /etc/
COPY common-session /etc/pam.d/
COPY common-session-noninteractive /etc/pam.d/
COPY supervisord.conf /etc/supervisor/
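For reference, the sort of content those files carry; e.g. a limits.conf raising the per-user open-file limit (the exact values in the repo may differ):

*     soft    nofile    9000
*     hard    nofile    9000
root  soft    nofile    9000
root  hard    nofile    9000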
On my local system I've also applied a few configurations, following this example. Here is the state of my host machine:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
What can I do to get more than ~980 concurrent socket connections? Why do I fail to make any more connections at that point? How can my repo be tweaked (if needed) to get this to work?
Edit
When I lower the nofile limit to, say, 500 for the container, I see that my application's connections fail in the same way. When I halve or double the memory and CPU, I don't see any difference in behavior, so that doesn't seem to be the issue.
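As a sanity check, one can confirm the limit the Node process actually sees inside the container (same image tag as above):

$ docker run --ulimit nofile=9000:9000 -it socket-test:latest /bin/sh -c 'ulimit -n'   # should print 9000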
There's a significant difference between the network path to the app locally and the app running in Docker for Mac.
The path to your app on the mac is straight in via the loopback interface:
mac
client -> lo -> nodejs
When using Docker for Mac, the inbound path includes more hops, among them two userland proxy processes, vpnkit on your Mac and docker-proxy in the VM, which accept TCP connections on the forwarded port and forward the data in:
mac | vm | container
client -> lo -> vpnkit -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs
Try with a VM whose network is directly accessible to the Mac to see if vpnkit is making an appreciable difference (a docker-machine sketch follows the diagram below).
mac | vm | container
client -> if -> if -> docker-proxy -> NAT -> bridge -> if -> nodejs
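One way to get such a VM with this era's tooling is docker-machine with the VirtualBox driver; its host-only interface is directly reachable from the Mac (the machine name here is arbitrary):

$ docker-machine create -d virtualbox sockets
$ eval $(docker-machine env sockets)
$ docker run -p 8000:8000 -it socket-test:latest /bin/sh
$ docker-machine ip sockets   # point the load test at this address instead of localhost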
You can also remove docker-proxy by attaching the container's interface directly to the VM network so the container doesn't require the port mapping (-p). This can be done by mapping a macvlan interface into the container or placing the container on a bridge attached to the VM network (a rough macvlan sketch follows the diagrams below). This is a Vagrant setup I use for the bridged network.
mac | container <- there is a little vm here, but minimal.
client -> if -> if -> nodejs
mac | vm | container
client -> if -> if -> bridge -> if -> nodejs
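For the macvlan variant, a rough sketch of the docker commands; the subnet, gateway, and parent interface are assumptions and must match the VM's network:

$ docker network create -d macvlan \
      --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
      -o parent=eth0 vmnet
$ docker run --network vmnet -it socket-test:latest /bin/sh   # no -p mapping needed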
Once you've gotten rid of the network differences, I'd look at tuning the VM and container in a bit more detail. I'd guess you should see a 10-20% decrease in the VM, not 66%.
I faced the same engine:polling got xhr close packet issue and tried searching all over Stack Overflow, but only this question has this info.
I have briefly investigated it: when the client sends both GET and POST HTTP requests, somehow the load balancer rejects the GET while the POST may still work fine, so this also happens on our sites.
The problem should be escalated to the stability of the load balancer (especially the stability of its sticky sessions).
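If the balancer in question happens to be nginx, a minimal sketch of sticky routing via ip_hash, with hypothetical backend addresses, so every request from one client (GET and POST alike) lands on the same node:

upstream socketio_nodes {
    ip_hash;                # pin each client IP to a single backend
    server 10.0.0.1:4000;   # hypothetical backends
    server 10.0.0.2:4000;
}

server {
    listen 80;
    location /socket.io/ {
        proxy_pass http://socketio_nodes;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}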
I'm using nginx 1.4.6 as a proxy to both a Django app and a related Node.js app that uses socket.io 1.2.1 along with socketio-jwt 2.3.5.
Locally, I'm coding in a Vagrant instance (Ubuntu 14.04.1), while remotely I'm using a dedicated AWS EC2 instance -- both are configured using the same Vagrantfile and setup shell script, so they're very close to identical.
When running locally, my client connects to the nginx server which upgrades the connection and passes it to the socket.io server implementation under Node. Everything works really well.
Remotely, I log a first connection with transport=polling which also instructs the client to upgrade to the websocket transport, then a second request with transport=websocket, and then a third with transport=polling again.
While the application still works, polling is inferior to websockets for many reasons, and I hope I can figure out a way to sort this out.
The second request is answered with a 400 Bad Request. Running my Node app with DEBUG=* node index.js gives the following output:
engine intercepting request for path "/socket.io/" +24s
engine handling "GET" http request "/socket.io/?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6Im1hcmNlbCIsIm9yaWdfaWF0IjoxNDE4MDU5ODg1LCJ1c2VyX2lkIjozLCJlbWFpbCI6IiIsImV4cCI6MTQxODE0NjI4NX0.2PE0TNRol9G4hby4OzQ-af2e0yjfgFAb-gQJF5tKWRwxWnFLv1NGp3Yo87UaqNaQceMW6KqzMIx2gLcRFnk09A&EIO=3&transport=polling&t=1418062322853-0" +0ms
engine handshaking client "6YDFOjBfvZ9UyutVAAAB" +1ms
engine:socket sending packet "open" ({"sid":"6YDFOjBfvZ9UyutVAAAB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}) +0ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +0ms
engine:polling writing " �0{"sid":"6YDFOjBfvZ9UyutVAAAB","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}" +1ms
engine:socket executing batch send callback +1ms
socket.io:server incoming connection with id 6YDFOjBfvZ9UyutVAAAB +1.7m
socket.io:client connecting to namespace / +1.7m
socket.io:namespace adding socket to nsp / +1.7m
socket.io:socket socket connected - writing packet +1.7m
socket.io:socket joining room 6YDFOjBfvZ9UyutVAAAB +0ms
socket.io:client writing packet {"type":0,"nsp":"/"} +79ms
socket.io-parser encoding packet {"type":0,"nsp":"/"} +1.7m
socket.io-parser encoded {"type":0,"nsp":"/"} as 0 +0ms
engine:socket sending packet "message" (0) +79ms
socket id: "6YDFOjBfvZ9UyutVAAAB" connected to user: "marcel"
socket.io:socket joined room 6YDFOjBfvZ9UyutVAAAB +1ms
engine intercepting request for path "/socket.io/" +121ms
engine handling "GET" http request "/socket.io/?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6Im1hcmNlbCIsIm9yaWdfaWF0IjoxNDE4MDU5ODg1LCJ1c2VyX2lkIjozLCJlbWFpbCI6IiIsImV4cCI6MTQxODE0NjI4NX0.2PE0TNRol9G4hby4OzQ-af2e0yjfgFAb-gQJF5tKWRwxWnFLv1NGp3Yo87UaqNaQceMW6KqzMIx2gLcRFnk09A&EIO=3&transport=polling&t=1418062323048-1&sid=6YDFOjBfvZ9UyutVAAAB" +0ms
engine setting new request for existing client +1ms
engine:polling setting request +0ms
engine:socket flushing buffer to transport +1ms
engine:polling writing "�40" +38ms
engine:socket executing batch send callback +1ms
engine intercepting request for path "/socket.io/" +0ms
engine handling "GET" http request "/socket.io/?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6Im1hcmNlbCIsIm9yaWdfaWF0IjoxNDE4MDU5ODg1LCJ1c2VyX2lkIjozLCJlbWFpbCI6IiIsImV4cCI6MTQxODE0NjI4NX0.2PE0TNRol9G4hby4OzQ-af2e0yjfgFAb-gQJF5tKWRwxWnFLv1NGp3Yo87UaqNaQceMW6KqzMIx2gLcRFnk09A&EIO=3&transport=websocket&sid=6YDFOjBfvZ9UyutVAAAB" +1ms
engine bad request: unexpected transport without upgrade +0ms
engine intercepting request for path "/socket.io/" +120ms
engine handling "GET" http request "/socket.io/?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6Im1hcmNlbCIsIm9yaWdfaWF0IjoxNDE4MDU5ODg1LCJ1c2VyX2lkIjozLCJlbWFpbCI6IiIsImV4cCI6MTQxODE0NjI4NX0.2PE0TNRol9G4hby4OzQ-af2e0yjfgFAb-gQJF5tKWRwxWnFLv1NGp3Yo87UaqNaQceMW6KqzMIx2gLcRFnk09A&EIO=3&transport=polling&t=1418062323180-2&sid=6YDFOjBfvZ9UyutVAAAB" +1ms
engine setting new request for existing client +0ms
engine:polling setting request +0ms
The issue appears to be related to the line engine bad request: unexpected transport without upgrade +0ms, but I don't understand it: the nginx config definitely sets the Upgrade header, and it works on my Vagrant machine.
The relevant part of the nginx config is as follows:
server {
    ...
    location / {
        ...
    }
    location /socket.io/ {
        proxy_pass http://127.0.0.1:4000/socket.io/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
    }
}
The only point of difference that I can think of between the local and remote setups is that locally, the client connects to localhost on two different ports (I proxy the Node app behind nginx, but the web app runs through Gulp), while remotely the Node app is on a different domain, behind port 80.
Any clues much appreciated :)
I have had exactly the same problem. After looking at some TCP captures, it looked like nginx did not even receive the incoming Upgrade header, so it didn't pass it on to Node.js. (I suspect Cloudflare of doing this, but I am not sure.)
The way I fixed this is really hacky, but it works very well. Basically, I modified the nginx configuration to look for the Sec-WebSocket-Key header; when it finds it, it sets the Upgrade header to websocket and the Connection header to upgrade.
Example configuration:
map $http_sec_websocket_key $upgr {
    ""      "";               # if the Sec-WebSocket-Key header is empty, send no Upgrade header
    default "websocket";      # if the header is present, set Upgrade to "websocket"
}

map $http_sec_websocket_key $conn {
    ""      $http_connection; # if no Sec-WebSocket-Key header exists, pass through the incoming Connection header
    default "upgrade";        # otherwise, set Connection to "upgrade"
}
You have to define these 2 map blocks before your server {} block.
Now in your socket.io location add the following 2 lines:
proxy_set_header Upgrade $upgr;
proxy_set_header Connection $conn;
This sets the Upgrade and Connection headers to the mapped values.
I hope this is the solution to your problem as well.
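One way to verify the mapping from outside is a manual handshake with curl; the hostname is a placeholder and the key is just an arbitrary base64 value:

$ curl -i -N \
      -H "Connection: Upgrade" \
      -H "Upgrade: websocket" \
      -H "Sec-WebSocket-Version: 13" \
      -H "Sec-WebSocket-Key: SGVsbG8sIHdvcmxkIQ==" \
      "http://your-server/socket.io/?EIO=3&transport=websocket"

A successful upgrade should answer with HTTP/1.1 101 Switching Protocols; since the maps key on Sec-WebSocket-Key, this should hold even when an intermediary strips the Upgrade and Connection request headers.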
I am having a problem with socket.io (websocket transport) when running in EC2. I don't have any HTTP proxy or load balancer installed in front of the Node instance. The same code works fine when running in my local environment but keeps reconnecting in EC2. Even in the EC2 instance, xhr-polling works fine.
Following is the socket.io debug log output when the transport is set to websocket and xhr-polling:
debug - discarding transport
debug - authorized
info - handshake authorized aoiP_6qFnqiqEC3r2-0N
debug - setting request GET /socket.io/1/websocket/aoiP_6qFnqiqEC3r2-0N
debug - set heartbeat interval for client aoiP_6qFnqiqEC3r2-0N
debug - client authorized for
debug - websocket writing 1::
warn - websocket parser error: reserved fields must be empty
info - transport end (undefined)
debug - set close timeout for client NxkBCtJqwOscfHzE0xba
debug - cleared close timeout for client NxkBCtJqwOscfHzE0xba
debug - cleared heartbeat interval for client NxkBCtJqwOscfHzE0xba