WebSocket closing after 60 seconds of being idle when connecting to a Node server behind AWS ELB - node.js

I have a Node server running on an EC2 instance, and the client runs on the same instance. The client opens a WebSocket connection to communicate with the Node server. This works in the QA and Dev AWS environments, but in the prod environment the same WebSocket connection is closed after 60 seconds of being idle. I am running the client and the Node server behind an ELB in the AWS environment.
Client Code:
ws = new WebSocket('ws://localhost:8443');

ws.onclose = function () {
    console.log("Websocket connection has been closed.");
    clientObj.emit('LogoffSuccess', 'LogoffSuccessfully');
};

ws.onerror = function (event) {
    console.log(event.data);
};

ws.addEventListener('open', function (event) {
    console.log('Websocket connection has been opened');
    ws.send(JSON.stringify(loginCreds));
});
Node server Code below:
const wss = new WebSocket.Server({ server: app });
const clients = {};
const idMap = {};

wss.on(`connection`, ws => {
    const headers = ws.upgradeReq.headers;
    const host = headers.host;
    const key = ws.upgradeReq.headers[`sec-websocket-key`];

    ctiServer.on(`responseMessage`, message => {
        clients[message.AgentId].send(JSON.stringify(message));
    });

    ws.on(`message`, message => {
        log.info(`Message received. Host: ${host}, Msg: ${message}`);
        if (JSON.parse(message).EventName === `Login`) {
            clients[JSON.parse(message).AgentId] = ws;
            idMap[key] = JSON.parse(message).AgentId;
        }
        ctiServer.processIncomingRequest(message);
    });

    ws.on(`close`, () => {
        log.info(`Connection closed. Host: ${host}`);
        const message = {
            EventName: `Logoff`,
            AgentId: idMap[key],
            EventData: {}
        };
    });
});

By default, Elastic Load Balancing sets the idle timeout value to 60 seconds. Therefore, if the target doesn't send some data at least every 60 seconds while the request is in flight, the load balancer can close the front-end connection. To ensure that lengthy operations such as file uploads have time to complete, send at least 1 byte of data before each idle timeout period elapses, and increase the length of the idle timeout period as needed.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#connection-idle-timeout
Note that your interests are best served by periodically sending traffic to keep the connection alive. You can set the idle timeout to up to 4000 seconds in an Application Load Balancer, but you will find that stateful intermediate network infrastructure (firewalls, NAT devices) tends to reset connections before they are actually idle for so long.

PING!
Write a ping implementation (or a nil message implementation)...
...otherwise the AWS proxy (probably nginx) will shut down the connection after a period of inactivity (60 seconds in your case, but it's a bit different on different systems).
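For example, a minimal server-side keep-alive sketch, assuming the `ws` package and the `wss` server from the question (the 30-second interval is illustrative; it only needs to be shorter than the load balancer's idle timeout):

// Keep-alive sketch (not from the question): ping every client well inside the
// ELB's 60-second idle timeout so the connection never counts as idle.
const KEEP_ALIVE_MS = 30000;

wss.on('connection', ws => {
    ws.isAlive = true;
    ws.on('pong', () => { ws.isAlive = true; });
});

const keepAliveTimer = setInterval(() => {
    wss.clients.forEach(ws => {
        if (!ws.isAlive) return ws.terminate(); // no pong since the last ping: drop it
        ws.isAlive = false;
        ws.ping();                              // counts as traffic for the ELB
    });
}, KEEP_ALIVE_MS);

wss.on('close', () => clearInterval(keepAliveTimer));

Browsers answer WebSocket ping frames with a pong automatically, so no client change should be needed; alternatively, the client can send a small application-level heartbeat message on a timer.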

Do you use NGINX? Its proxied requests time out after 60 seconds by default.
You can extend the timeout in the NGINX configuration file for your WebSocket-specific location.
In your case it could look something like this when extending the timeout to an hour:
...
location / {
    ...
    proxy_pass http://127.0.0.1:8443;
    ...
    proxy_read_timeout 3600;
    proxy_send_timeout 3600;
    ...
}
Also see these pages for more information:
https://ubiq.co/tech-blog/increase-request-timeout-nginx/
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_send_timeout

Related

HAProxy locks up simple express server in 5 minutes?

I have a really strange one I just cannot work out.
I have been building node/express apps for years now and usually run a dev server just at home for quick debugging/testing. I frontend it with a haproxy instance to make it "production like" and to perform the ssl part.
In any case, just recently ALL servers (different projects) started misbehaving and stopped responding to requests almost exactly 5 minutes after being started. That is ALL of the 3 or 4 I sometimes run on this machine, yet the exact same haproxy instance is front-ending the production version of the code, and that has no issues; it's still rock solid. And, infuriatingly, I wrote a really basic express server example, and if it's front-ended by the same haproxy it also locks up, but if I switch ports it runs fine forever, as expected!
So in summary:
1x haproxy instance frontending a bunch of prod/dev instances with the same rule sets, all with ssl.
2x production instances working fine
4x dev instances(and a simple test program) ALL locking up after around 5 min when behind haproxy
and if I run the simple test program on a different port so it's local network only, it works perfectly.
I do also have uptime robot liveness checks hitting the haproxy as well to monitor the instances.
So this example:
const express = require('express')
const request = require('request');
const app = express()
const port = 1234

var counter = 0;
var received = 0;

process.on('warning', e => console.warn(e.stack));

const started = new Date();

if (process.pid) {
    console.log('Starting as pid ' + process.pid);
}

app.get('/', (req, res) => {
    res.send('Hello World!').end();
})

app.get('/livenessCheck', (req, res) => {
    res.send('ok').end();
})

app.use((req, res, next) => {
    console.log('unknown', { host: req.headers.host, url: req.url });
    res.send('ok').end();
})

const server = app.listen(port, () => {
    console.log(`Example app listening on port ${port}`)
})

app.keepAliveTimeout = (5 * 1000) + 1000;
app.headersTimeout = (6 * 1000) + 2000;

setInterval(() => {
    server.getConnections(function(error, count) {
        console.log('connections', count);
    });
    //console.log('tick', new Date())
}, 500);

setInterval(() => {
    console.log('request', new Date())
    request('http://localhost:' + port, function (error, response, body) {
        if (error) {
            const ended = new Date();
            console.error('request error:', ended, error); // Print the error if one occurred
            counter = counter - 1;
            if (counter < 0) {
                console.error('started ', started);
                const diff = Math.floor((ended - started) / 1000)
                const min = Math.floor(diff / 60);
                console.error('elapsed ', min, 'min ', diff - min*60, 'sec');
                exit;
            }
            return;
        }
        received = received + 1;
        console.log('request ', received, 'statusCode:', new Date(), response && response.statusCode); // Print the response status code if a response was received
        //console.log('body:', body);
    });
}, 1000);
works perfectly and runs forever on a non-haproxy port, but only runs for approximately 5 minutes on a port behind haproxy; it usually gets to 277 request responses each time before hanging up and timing out.
The "exit()" function is just a forced crash for testing.
I've tried adjusting some timeouts on haproxy, but to no avail. And each one has no impact on the production instances that just keep working fine.
I'm running these dev versions on a Mac Pro 2013 with the latest OS, and I have tried various versions of node.
Any thoughts what it could be or how to debug further?
Oh, and they all serve web sockets as well as http requests.
Here is one example of a haproxy config that I am trying (relevant sections):
global
    log 127.0.0.1 local2
    ...
    nbproc 1
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 4s
    timeout server 5s
    timeout http-keep-alive 4s
    timeout check 4s
    timeout tunnel 1h
    maxconn 3000

frontend wwws
    bind *:443 ssl crt /etc/haproxy/certs/ no-sslv3
    option http-server-close
    option forwardfor
    reqadd X-Forwarded-Proto:\ https
    reqadd X-Forwarded-Port:\ 443
    http-request set-header X-Client-IP %[src]
    # set HTTP Strict Transport Security (HTST) header
    rspadd Strict-Transport-Security:\ max-age=15768000
    acl host_working hdr_beg(host) -i working.
    use_backend Working if host_working
    default_backend BrokenOnMac

backend Working
    balance roundrobin
    server working_1 1.2.3.4:8456 check

backend BrokenOnMac
    balance roundrobin
    server broken_1 2.3.4.5:8456 check
So if you go to https://working.blahblah.blah it works forever, but the backend for https://broken.blahblah.blah locks up and stops responding after 5 minutes (including direct curl requests bypassing haproxy).
BUT if I run the EXACT same code on a different port, it responds forever to any direct curl request.
The "production" servers that are working are on various OSes like Centos. On my Mac Pro, I run the tests. The test code works on the Mac on a port NOT front-ended by haproxy. The same test code hangs up after 5 minutes on the Mac when it has haproxy in front.
So the precise configuration that fails is:
Mac Pro + any node express app + frontended by haproxy.
If I change anything, like run the code on Centos or make sure there is no haproxy, then the code works perfectly.
So given that it only stopped working recently, is it maybe the latest patch for macOS Monterey (12.6) somehow interfering with the app socket when it gets a certain condition from haproxy? It seems highly unlikely, but it's the most logical explanation I can come up with.

NodeJS WebSocket Instance Warmup

While scaling out the Azure instance under high load, many WebSocket requests are not handled properly because these instances are not warmed up. To pre-warm the instance, we added a piece of code to the listening function of the WebSocket server. In this code, we try to create multiple WebSocket connections to the same instance to warm up the machine and prepare it to handle high load. But we are not able to establish these WebSocket connections to the same machine in the Azure App Service, while we can make WebSocket connections to the same machine locally.
Really appreciate it if anyone could guide us on this.
function onListening() {
    // pseudo code for warming up the instances quickly //
    for (let warmupIndex = 0; warmupIndex < 20; warmupIndex++) {
        let ws = new WebSocket('ws://' + os.hostname() + ':' + port + '/');
        setTimeout(function () {
            logger.infoLog("Inside timeout");
            ws.close();
        }, 10000);
    }
    callback(null);
}
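To see why those connections fail on App Service, it may help to log each socket's open and error events; a rough sketch, assuming the `ws` package client and the same `os`, `port` and `logger` as above:

// Hypothetical instrumentation for the warm-up loop above: close each socket
// only once it has actually opened, and log errors so failed handshakes show up.
function warmupConnection(warmupIndex) {
    const ws = new WebSocket('ws://' + os.hostname() + ':' + port + '/');
    ws.on('open', function () {
        logger.infoLog('warmup ' + warmupIndex + ' connected');
        ws.close();
    });
    ws.on('error', function (err) {
        logger.infoLog('warmup ' + warmupIndex + ' failed: ' + err.message);
    });
}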

Socket.IO, SSL Problems With cloudflare

I'm having a socket.io app that basically receives signals from a frontend in order to kill and start a new ffmpeg process (based on .spawn()).
Everything works like expected, but often I get a 525 error from cloudflare. The error message is: Cloudflare is unable to establish an SSL connection to the origin server.
It works about 9 out of 10 times. I noticed that more of these errors pop up whenever a kill + spawn is done. Could it be that something blocks the event loop, and because of this blocks all incoming requests, so Cloudflare logs these as handshake-failed errors?
Contacting cloudflare support gives me back this info (this is the request they do to my server):
Time id host message upstream
2017-08-16T09:14:24.000Z 38f34880faf04433 xxxxxx.com:2096 peer closed connection in SSL handshake while SSL handshaking to upstream https://xxx.xxx.xxx.xxx:2096/socket.io/?EIO=3&transport=polling&t=LtgKens
I'm debugging for some time now, but can't seem to find a solutions myself.
This is how I initialize my socketIO server.
/**
 * Start the socket server
 */
var startSocketIO = function() {
    var ssl_options = {
        key: fs.readFileSync(sslConfig.keyFile, 'utf8'),
        cert: fs.readFileSync(sslConfig.certificateFile, 'utf8')
    };
    self.app = require('https').createServer(ssl_options, express);
    self.io = require('socket.io')(self.app);
    self.io.set('transports', ['websocket', 'polling']);
    self.app.listen(2096, function() {
        console.log('Socket.IO Started on port 2096');
    });
};
This is the listener code on the server side
this.io.on('connection', function (socket) {
    console.log('new connection');
    /**
     * Connection to the room
     */
    socket.on('changeVideo', function (data) {
        // Send to start.js and start.js will kill the ffmpeg process and
        // start a new one
        socket.emit('changeVideo');
    });
});
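For context, the kill-and-respawn step that start.js performs would look roughly like the following sketch (the `currentProcess` variable, the `changeVideo` helper and the ffmpeg arguments are illustrative, not taken from the original code); `spawn` itself is asynchronous and does not block the event loop:

// Rough sketch of the kill + spawn flow described above (names are hypothetical).
const { spawn } = require('child_process');

let currentProcess = null; // handle to the currently running ffmpeg process

function changeVideo(inputUrl) {
    if (currentProcess) {
        currentProcess.kill('SIGTERM'); // ask the old ffmpeg process to stop
    }
    // spawn() starts the new process asynchronously and returns immediately
    currentProcess = spawn('ffmpeg', ['-i', inputUrl /* , ...output args */]);
    currentProcess.on('exit', function (code) {
        console.log('ffmpeg exited with code', code);
    });
}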
Another thing that I observed while debugging (I only got this a few times): the text "new connection" is displayed on the server and the connected client emits the changeVideo event, but nothing happens on the server side; instead the client just keeps reconnecting.
This is a simplified version of the nodejs code. If you have more questions, just let me know.
Thanks!

NodeJs application behind Amazon ELB throws 502

We have a node application running behind Amazon Elastic Load Balancer (ELB), which randomly throws 502 errors when there are multiple concurrent requests and when each request takes time to process. Initially, we tried to increase the idle timeout of ELB to 5 minutes, but still we were getting 502 responses.
When we contacted amazon team, they said this was happening because the back-end is closing the connection with ELB after 5s.
ELB will send HTTP-502 back to your clients for the following reasons:
The load balancer received a TCP RST from the target when attempting to establish a connection.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target.
The target response is malformed or contains HTTP headers that are not valid.
A new target group was used but no targets have passed an initial health check yet. A target must pass one health check to be considered healthy.
We tried to set our application's keep-alive/timeouts greater than ELB idle timeout (5 min), so the ELB can be responsible for opening and closing the connections. But still, we are facing 502 errors.
js:
var http = require( 'http' );
var express = require( 'express' );
var url = require('url');
var timeout = require('connect-timeout')

const app = express();

app.get( '/health', (req, res, next) => {
    res.send( "healthy" );
});

app.get( '/api/test', (req, res, next) => {
    var query = url.parse( req.url, true ).query;
    var wait = query.wait ? parseInt(query.wait) : 1;
    setTimeout(function() {
        res.send( "Hello!" );
    }, wait );
});

var server = http.createServer(app);
server.setTimeout(10*60*1000); // 10 * 60 seconds * 1000 msecs
server.listen(80, function () {
    console.log('**** STARTING SERVER ****');
});
Try setting server.keepAliveTimeout to something other than the default 5s. See: https://nodejs.org/api/http.html#http_server_keepalivetimeout. Per AWS docs, you'd want this to be greater than the load balancer's idle timeout.
Note: this was added in Node v8.0.0
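A minimal sketch of that suggestion applied to the server above (the 5-minute figure is only an example; use whatever idle timeout your ELB is actually configured with):

// Keep-alive sockets should outlive the ELB idle timeout, otherwise the backend
// may close a connection the load balancer still considers reusable (-> 502).
const ELB_IDLE_TIMEOUT_MS = 5 * 60 * 1000; // assumed ELB idle timeout of 5 minutes

server.keepAliveTimeout = ELB_IDLE_TIMEOUT_MS + 5000;
// headersTimeout should exceed keepAliveTimeout (available since Node 10.14):
server.headersTimeout = server.keepAliveTimeout + 1000;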
Also, if you're still on the Classic ELB, consider moving to the new Application Load Balancer as based on current experience this seems to have improved things for us a lot. You'll also save a few bucks if you have a lot of separate ELBs for each service. The downside could be that there's 1 point of failure for all your services. But in AWS we trust :)
In my case, I needed to upgrade the Node.js version:
https://github.com/nodejs/node/issues/27363
After that the problem was fixed.
Change your server.listen() to this:
const port = process.env.PORT || 3000;
const server = app.listen(port, function() {
    console.log("Server running at http://127.0.0.1:" + port + "/");
});
You can read more about this here.

How to do graceful stop for koajs server?

There are a lot of examples of graceful stop for expressjs; how can I achieve the same for koajs?
I would like to disconnect database connections as well.
I have a mongoose database connection, and 2 oracle db connections (https://github.com/oracle/node-oracledb).
I created an npm package, http-graceful-shutdown (https://github.com/sebhildebrandt/http-graceful-shutdown), some time ago. This works perfectly with http, express and koa. As you also want to add your own cleanup, I modified the package so that you can now pass your own cleanup function, which will be called on shutdown. So basically this package handles all the http shutdown things plus calling your cleanup function (if provided in the options):
const koa = require('koa');
const gracefulShutdown = require('http-graceful-shutdown');
const app = new koa();
...
server = app.listen(...); // app can be an express OR koa app
...

// your personal cleanup function - this one takes one second to complete
function cleanup() {
    return new Promise((resolve) => {
        console.log('... in cleanup');
        setTimeout(function() {
            console.log('... cleanup finished');
            resolve();
        }, 1000);
    });
}

// this enables the graceful shutdown with advanced options
gracefulShutdown(server, {
    signals: 'SIGINT SIGTERM',
    timeout: 30000,
    development: false,
    onShutdown: cleanup,
    finally: function() {
        console.log('Server gracefully shut down.....');
    }
});
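Since the question also mentions a mongoose connection and two Oracle connections, the cleanup hook could close those too. A rough sketch, assuming the Oracle connections come from an oracledb pool (see the pool discussion further below):

// Hypothetical cleanup for the question's databases: a mongoose connection and
// an oracledb connection pool. Passed to gracefulShutdown via onShutdown.
const mongoose = require('mongoose');
const oracledb = require('oracledb');

async function cleanup() {
    await mongoose.disconnect();        // close the mongoose connection(s)
    await oracledb.getPool().close(10); // drain the Oracle pool, 10 s grace period
}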
I have answered a variation of "how to terminate a HTTP server" many times on different node.js support channels. Unfortunately, I couldn't recommend any of the existing libraries because they are lacking in one or another way. I have since put together a package that (I believe) is handling all the cases expected of graceful HTTP server termination.
https://github.com/gajus/http-terminator
The main benefit of http-terminator is that:
it does not monkey-patch Node.js API
it immediately destroys all sockets without an attached HTTP request
it allows graceful timeout to sockets with ongoing HTTP requests
it properly handles HTTPS connections
it informs connections using keep-alive that the server is shutting down by setting a connection: close header
it does not terminate the Node.js process
Usage with Koa:
import Koa from 'koa';
import {
    createHttpTerminator,
} from 'http-terminator';

const app = new Koa();
const server = app.listen();

const httpTerminator = createHttpTerminator({
    server,
});

await httpTerminator.terminate();
To make sure the Oracle DB connections are closed nicely, you can use a connection pool and call pool.close() with a drainTime of 0 or greater. This will let the app relatively cleanly interrupt any operation that is currently using a connection. It allows freeing the DB end of the connections without the DB waiting for whatever timeout period to expire before it cleans itself up. Even with two connections this is a solution I'd look at, since it doesn't matter that the pool is small. You may need to set the Oracle Net out-of-band break detection as well, see Connections and High Availability.
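A short sketch of that approach with node-oracledb (the credentials and connect string are placeholders):

const oracledb = require('oracledb');

async function initOracle() {
    // A small pool is fine even if the app only ever needs two connections.
    await oracledb.createPool({
        user: 'scott',                     // placeholder credentials
        password: 'tiger',
        connectString: 'localhost/XEPDB1', // placeholder connect string
        poolMin: 0,
        poolMax: 2,
    });
}

async function closeOracle() {
    // drainTime in seconds: wait up to 10 s for busy connections, then force
    // the rest closed so the database side is freed without waiting on timeouts.
    await oracledb.getPool().close(10);
}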
Modern versions of Node have support for AbortController, so there is no need for external libraries. A simple example:
const Koa = require('koa');
const http = require('http');

const app = new Koa();
const server = http.createServer(app.callback());
const controller = new AbortController();

server.listen({
    host: 'localhost',
    port: 80,
    signal: controller.signal
});

// middleware... etc.
app.use(async (ctx) => {
    ctx.body = 'Hello World';
});

// Later, when you want to close the server.
controller.abort();

Resources