NodeJs application behind Amazon ELB throws 502 - node.js

We have a node application running behind Amazon Elastic Load Balancer (ELB), which randomly throws 502 errors when there are multiple concurrent requests and when each request takes time to process. Initially, we tried to increase the idle timeout of ELB to 5 minutes, but still we were getting 502 responses.
When we contacted the Amazon team, they said this was happening because the back end is closing the connection with the ELB after 5 s.
ELB will send HTTP-502 back to your clients for the following reasons:
The load balancer received a TCP RST from the target when attempting to establish a connection.
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target.
The target response is malformed or contains HTTP headers that are not valid.
A new target group was used but no targets have passed an initial health check yet. A target must pass one health check to be considered healthy.
We tried to set our application's keep-alive/timeouts greater than ELB idle timeout (5 min), so the ELB can be responsible for opening and closing the connections. But still, we are facing 502 errors.
js:
var http = require( 'http' );
var express = require( 'express' );
var url = require('url');
var timeout = require('connect-timeout')
const app = express();
app.get( '/health', (req, res, next) => {
res.send( "healthy" );
});
app.get( '/api/test', (req, res, next) => {
var query = url.parse( req.url, true ).query;
var wait = query.wait ? parseInt(query.wait) : 1;
setTimeout(function() {
res.send( "Hello!" );
}, wait );
});
var server = http.createServer(app);
server.setTimeout(10*60*1000); // 10 * 60 seconds * 1000 msecs
server.listen(80, function () {
console.log('**** STARTING SERVER ****');
});

Try setting server.keepAliveTimeout to something other than the default 5s. See: https://nodejs.org/api/http.html#http_server_keepalivetimeout. Per AWS docs, you'd want this to be greater than the load balancer's idle timeout.
Note: this was added in Node v8.0.0
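A minimal sketch of that suggestion, assuming the 5 minute ELB idle timeout from the question. The constant name and the 5 second margin are illustrative choices, not values from the AWS docs, and raising headersTimeout as well is an extra precaution that is often recommended alongside, not something the answer above requires:

```javascript
const http = require('http');

// Assumed: the load balancer idle timeout from the question (5 minutes).
const LB_IDLE_TIMEOUT_MS = 5 * 60 * 1000;

const server = http.createServer((req, res) => {
  res.end('healthy');
});

// Keep idle keep-alive sockets open longer than the load balancer does,
// so the load balancer, not Node, is the side that closes them.
server.keepAliveTimeout = LB_IDLE_TIMEOUT_MS + 5000;

// Assumption: keep headersTimeout above keepAliveTimeout as well.
server.headersTimeout = server.keepAliveTimeout + 1000;

// server.listen(80); // left commented out so the sketch has no side effects
```

With Express, the same properties go on the object returned by http.createServer(app) or app.listen(...), not on app itself.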
Also, if you're still on the Classic ELB, consider moving to the new Application Load Balancer; in our experience this has improved things a lot. You'll also save a few bucks if you have a lot of separate ELBs for each service. The downside could be that there's a single point of failure for all your services. But in AWS we trust :)

In my case, I needed to upgrade my Node.js version:
https://github.com/nodejs/node/issues/27363
After that the problem was fixed.

Change your server.listen() to this:
const port = process.env.PORT || 3000;
const server = app.listen(port, function() {
console.log("Server running at http://127.0.0.1:" + port + "/");
});
You can read more about this here.

Related

HAProxy locks up simple express server in 5 minutes?

I have a really strange one I just cannot work out.
I have been building node/express apps for years now and usually run a dev server just at home for quick debugging/testing. I frontend it with a haproxy instance to make it "production like" and to perform the ssl part.
In any case, just recently ALL servers (different projects) started mis-behaving and stopped responding to requests around exactly 5 minutes after being started. That is ALL the 3 or 4 I run sometimes on this machine, yet the exact same instance of haproxy is front-ending the production version of the code and that has no issues, it's still rock solid. And, infuriatingly, I wrote a really basic express server example and if it's front ended by the same haproxy it also locks up, but if I switch ports, it runs fine forever as expected!
So in summary:
1x haproxy instance frontending a bunch of prod/dev instances with the same rule sets, all with ssl.
2x production instances working fine
4x dev instances(and a simple test program) ALL locking up after around 5 min when behind haproxy
and if I run the simple test program on a different port so it's local network only, it works perfectly.
I do also have uptime robot liveness checks hitting the haproxy as well to monitor the instances.
So this example:
const express = require('express')
const request = require('request');
const app = express()
const port = 1234
var counter = 0;
var received = 0;
process.on('warning', e => console.warn(e.stack));
const started = new Date();
if (process.pid) {
console.log('Starting as pid ' + process.pid);
}
app.get('/', (req, res) => {
res.send('Hello World!').end();
})
app.get('/livenessCheck', (req, res) => {
res.send('ok').end();
})
app.use((req, res, next) => {
console.log('unknown', { host: req.headers.host, url: req.url });
res.send('ok').end();
})
const server = app.listen(port, () => {
console.log(`Example app listening on port ${port}`)
})
app.keepAliveTimeout = (5 * 1000) + 1000;
app.headersTimeout = (6 * 1000) + 2000;
setInterval(() => {
server.getConnections(function(error, count) {
console.log('connections', count);
});
//console.log('tick', new Date())
}, 500);
setInterval(() => {
console.log('request', new Date())
request('http://localhost:' + port, function (error, response, body) {
if (error) {
const ended = new Date();
console.error('request error:', ended, error); // Print the error if one occurred
counter = counter - 1;
if (counter < 0) {
console.error('started ', started); // Print the error if one occurred
const diff = Math.floor((ended - started) / 1000)
const min = Math.floor(diff / 60);
console.error('elapsed ', min, 'min ', diff - min*60, 'sec');
exit;
}
return;
}
received = received + 1;
console.log('request ', received, 'statusCode:', new Date(), response && response.statusCode); // Print the response status code if a response was received
//console.log('body:', body); // Print the HTML for the Google homepage.
});
}, 1000);
works perfectly and runs forever on a non-haproxy port, but only runs for approx 5 min on a port behind haproxy, it usually gets to 277 request responses each time before hanging up and timing out.
The "exit()" function is just a forced crash for testing.
I've tried adjusting some timeouts on haproxy, but to no avail. And each one has no impact on the production instances that just keep working fine.
I'm running these dev versions on a Mac Pro 2013 with the latest OS, and I have tried various versions of node.
Any thoughts what it could be or how to debug further?
Oh, and they all serve web sockets as well as http requests.
Here is one example of a haproxy config that I am trying (relevant sections):
global
log 127.0.0.1 local2
...
nbproc 1
daemon
defaults
mode http
log global
option httplog
option dontlognull
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 4s
timeout server 5s
timeout http-keep-alive 4s
timeout check 4s
timeout tunnel 1h
maxconn 3000
frontend wwws
bind *:443 ssl crt /etc/haproxy/certs/ no-sslv3
option http-server-close
option forwardfor
reqadd X-Forwarded-Proto:\ https
reqadd X-Forwarded-Port:\ 443
http-request set-header X-Client-IP %[src]
# set HTTP Strict Transport Security (HSTS) header
rspadd Strict-Transport-Security:\ max-age=15768000
acl host_working hdr_beg(host) -i working.
use_backend Working if host_working
default_backend BrokenOnMac
backend Working
balance roundrobin
server working_1 1.2.3.4:8456 check
backend BrokenOnMac
balance roundrobin
server broken_1 2.3.4.5:8456 check
So if you go to https://working.blahblah.blah it works forever, but the backend for https://broken.blahblah.blah locks up and stops responding after 5 minutes (including direct curl requests bypassing haproxy).
BUT if I run the EXACT same code on a different port, it responds forever to any direct curl request.
The "production" servers that are working are on various OSes like Centos. On my Mac Pro, I run the tests. The test code works on the Mac on a port NOT front-ended by haproxy. The same test code hangs up after 5 minutes on the Mac when it has haproxy in front.
So the precise configuration that fails is:
Mac Pro + any node express app + frontended by haproxy.
If I change anything, like run the code on Centos or make sure there is no haproxy, then the code works perfectly.
So given it only stopped working recently, then is it the latest patch for OSX Monterey (12.6) maybe somehow interfering with the app socket when it gets a certain condition from haproxy? Seems highly unlikely, but the most logical explanation I can come up with.
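One detail in the test program above is worth checking, although it may well not be the cause of the hang: keepAliveTimeout and headersTimeout are properties of the http.Server returned by app.listen(), so assigning them on the Express app has no effect. The sketch below uses a plain handler function as a stand-in for the Express app (an assumption made to keep it dependency-free; Express apps are themselves request-handler functions):

```javascript
const http = require('http');

// Stand-in for an Express app, which is itself a (req, res) handler function.
const app = (req, res) => res.end('ok');

const server = http.createServer(app);

// This only creates an unused property on the handler function.
app.keepAliveTimeout = (5 * 1000) + 1000;

// The server still has Node's default keep-alive timeout at this point.
const before = server.keepAliveTimeout;

// These are the properties Node actually reads.
server.keepAliveTimeout = (5 * 1000) + 1000;
server.headersTimeout = (6 * 1000) + 2000;

console.log('keepAliveTimeout before:', before, 'after:', server.keepAliveTimeout);
```

In the original program this would mean changing the two app.* assignments to server.* after the app.listen() call.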

(PERL-> NODEJS) 500: Server closed connection without sending any data back

My Perl script talks to a Node.js server and sends the commands that the Node.js server needs to execute. While some commands take less time, others take a lot longer. While the command is executing on the server, there is silence on the connection. After a while, I receive the error: 500: Server closed connection without sending any data back
During this error, the command is still executing on the server and the desired results are obtained (if you check the server logs). My problem is that I don't want the connection to reset, as there are other follow-on commands that need to run after these long-running commands. Some commands might take 20 minutes.
Perl Side Code:
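use LWP::UserAgent;
use HTTP::Request;
use URI;
use JSON;
my $ua = LWP::UserAgent->new;
$ua->timeout(12000); # LWP timeouts are in seconds
my $uri = URI->new('http://server');
my $json = JSON->new;
my $data_to_json = {DATA};
my $json_content = $json->encode($data_to_json);
# set custom HTTP request header fields
my $req = HTTP::Request->new(POST => $uri);
$req->header('Content-Type' => 'application/json');
$req->content($json_content);
my $resp = $ua->request($req);
my $message = $resp->decoded_content;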
NodeJS Code
var express = require('express');
var http = require('http');
var app = express();
app.use(express.json());
var port = process.env.PORT || 8080;
app.get('<API URL>', function (req, res) {
<get all the passed arguments>
<send output to the console>
});
app.post('<API URL>', function(req, res) {
.
.
.
req.connection.setTimeout(0);
var exec = require('child_process').exec;
var child = exec(command);
});
// start the server
const server = app.listen(port);
server.timeout = 12000; // milliseconds, i.e. 12 seconds
console.log('Server started! Listening on port ' + port);
I have tried to add the timeout for the server using server.timeout and req.connection.setTimeout(0);.
How do I make sure that the connection is not broken?
It's generally a bad idea to make a web worker carry out long running tasks. It ties up the worker and you end up having problems like this.
Instead, have the web worker add a job to some sort of job queue (Perl's Minion is really nice). The queue operates independently of the web server. Clients can poll the server to check on the status of a job and get the output or artifacts when it's complete.
Another advantage of a proper job queue is that you can restart jobs if they fail. The queue knows the job was there. As you've seen, a broken web connection means that it fails and you've probably lost track of those inputs.
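To illustrate the shape of that approach on the Node side, here is a minimal in-memory sketch. It is not Minion and not a real queue: submitJob, getJob and the jobs map are hypothetical names, and a production setup would use a persistent queue so jobs survive restarts.

```javascript
const { randomUUID } = require('crypto');

// In-memory job store (hypothetical; use a persistent queue in practice).
const jobs = new Map();

// Register a job and start it in the background.
// `task` is any function that returns a value or a promise.
function submitJob(task) {
  const id = randomUUID();
  const job = { state: 'running', result: null, error: null };
  jobs.set(id, job);
  Promise.resolve()
    .then(task)
    .then(
      (result) => { job.state = 'done'; job.result = result; },
      (err) => { job.state = 'failed'; job.error = String(err); }
    );
  return id; // returned to the client immediately
}

// Look up a job so a client can poll for its status and output.
function getJob(id) {
  return jobs.get(id);
}
```

The POST handler would call submitJob (for example with a task that wraps child_process.exec) and reply right away with the id, and a separate GET handler would return getJob(id), so no HTTP request has to stay open for 20 minutes.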

Websocket closing after 60 seconds of being idle while connection node server using AWS ELB

I have a Node server running on an EC2 instance, and the client runs on the same instance. The client opens a WebSocket connection to communicate with the Node server. This works in the QA and Dev AWS environments, but in the prod environment the same WebSocket connection is closed after 60 seconds of being idle. Both the client and the Node server run behind an ELB in AWS.
Client Code:
ws = new WebSocket('ws://localhost:8443');
ws.onclose = function () {
console.log("Websocket connection has been closed.");
clientObj.emit('LogoffSuccess', 'LogoffSuccessfully');
};
ws.onerror=function(event)
{
console.log(event.data);
};
ws.addEventListener('open', function (event) {
console.log('Websocket connection has been opened');
ws.send(JSON.stringify(loginCreds));
});
Node server Code below:
const wss = new WebSocket.Server({ server: app });
const clients = {};
const idMap = {};
wss.on(`connection`, ws => {
const headers = ws.upgradeReq.headers;
const host = headers.host;
const key = ws.upgradeReq.headers[`sec-websocket-key`];
ctiServer.on(`responseMessage`, message => {
clients[message.AgentId].send(JSON.stringify(message));
});
ws.on(`message`, message => {
log.info(`Message received. Host: ${host}, Msg: ${message}`);
if (JSON.parse(message).EventName === `Login`) {
clients[JSON.parse(message).AgentId] = ws;
idMap[key] = JSON.parse(message).AgentId;
}
ctiServer.processIncomingRequest(message);
});
ws.on(`close`, () => {
log.info(`Connection closed. Host: ${host}`);
const message = {
EventName: `Logoff`,
AgentId: idMap[key],
EventData: {}
};
});
});
By default, Elastic Load Balancing sets the idle timeout value to 60 seconds. Therefore, if the target doesn't send some data at least every 60 seconds while the request is in flight, the load balancer can close the front-end connection. To ensure that lengthy operations such as file uploads have time to complete, send at least 1 byte of data before each idle timeout period elapses, and increase the length of the idle timeout period as needed.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#connection-idle-timeout
Note that your interests are best served by periodically sending traffic to keep the connection alive. You can set the idle timeout to up to 4000 seconds in an Application Load Balancer, but you will find that stateful intermediate network infrastructure (firewalls, NAT devices) tends to reset connections before they are actually idle for so long.
PING!
Write a ping implementation (or a nil message implementation)...
...otherwise the AWS proxy (probably nginx) will shut down the connection after a period of inactivity (60 seconds in your case, but it's a bit different on different systems).
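A rough sketch of such a ping, assuming sockets that expose ping(), terminate() and a 'pong' event the way the ws package's server-side sockets do. The function names and the 30 second interval are illustrative assumptions; the interval just needs to stay below the 60 second idle timeout.

```javascript
// Assumed interval, shorter than the 60 s idle timeout mentioned above.
const HEARTBEAT_MS = 30 * 1000;

// Mark a socket alive on connect and whenever it answers a ping.
function trackLiveness(socket) {
  socket.isAlive = true;
  socket.on('pong', () => { socket.isAlive = true; });
}

// One heartbeat pass: drop sockets that missed the previous ping, ping the rest.
function heartbeat(clients) {
  for (const socket of clients) {
    if (socket.isAlive === false) {
      socket.terminate();
      continue;
    }
    socket.isAlive = false;
    socket.ping();
  }
}
```

With the server from the question, trackLiveness(ws) would be called inside the connection handler and setInterval(() => heartbeat(wss.clients), HEARTBEAT_MS) would drive the pings; browsers answer ping frames with pong automatically.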
Do you use NGINX? By default, its proxied requests time out after 60 seconds.
You can extend the timeout in the NGINX configuration file for your websocket-specific location.
In your case it could look something like this when extending the timeout to an hour:
...
location / {
...
proxy_pass http://127.0.0.1:8443;
...
proxy_read_timeout 3600;
proxy_send_timeout 3600;
...
}
Also see these pages for more information:
https://ubiq.co/tech-blog/increase-request-timeout-nginx/
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_send_timeout

Connecting to Websocket in OpenShift Online Next Gen Starter

I'm in the process of trying to get an application which I'd built on the old OpenShift Online 2 free service up and running on the new OpenShift Online 3 Starter, and I'm having a bit of trouble.
The application uses websocket, and in the old system all that was required was for the client to connect to my server on port 8443 (which was automatically routed to my server). That doesn't seem to work in the new setup however - the connection just times out - and I haven't been able to find any documentation about using websocket in the new system.
My first thought was that I needed an additional route, but 8080 is the only port option available for routing as far as I can see.
The app lives here, and the connection is made on line 21 of this script with the line:
this.socket = new WebSocket( 'wss://' + this.server + ':' + this.port, 'tabletop-protocol' );
Which becomes, in practice:
this.socket = new WebSocket( 'wss://production-instanttabletop.7e14.starter-us-west-2.openshiftapps.com:8443/', 'tabletop-protocol' );
On the back end, the server setup is unchanged from what I had on OpenShift 2, aside from updating the IP and port lookup from env as needed, and adding logging to help diagnose the issues I've been having.
For reference, here's the node.js server code (with the logic trimmed out):
var http = require( "http" );
var ws = require( "websocket" ).server;
// Trimmed some others used by the logic...
var ip = process.env.IP || process.env.OPENSHIFT_NODEJS_IP || '0.0.0.0';
var port = process.env.PORT || process.env.OPENSHIFT_NODEJS_PORT || 8080;
/* FILE SERVER */
// Create a static file server for the client page
var pageHost = http.createServer( function( request, response ){
// Simple file server that seems to be working, if a bit slowly
// ...
} ).listen( port, ip );
/* WEBSOCKET */
// Create a websocket server for ongoing communications
var wsConnections = [];
wsServer = new ws( { httpServer: pageHost } );
// Start listening for events on the server
wsServer.on( 'request', function( request ){
// Server logic for the app, but nothing in here ever gets hit
// ...
} );
In another question it was suggested that nearly anything, including this, could be related to the ongoing general issues with US West 2, but other related problems I was experiencing seem to have cleared, and that issue has been posted for a week with no update, so I figured I'd dig deeper into this on the assumption that it's something I'm doing wrong instead of them.
Anyone know more about this and what I need to do to make it work?

How to do graceful stop for koajs server?

There are a lot of examples of graceful stop for expressjs, how can I achieve the same for koajs?
I would like to disconnect database connections as well
I have a mongoose database connection, and 2 oracle db connection (https://github.com/oracle/node-oracledb)
I created an npm package, http-graceful-shutdown (https://github.com/sebhildebrandt/http-graceful-shutdown), some time ago. It works perfectly with http, express and koa. Since you also want to add your own cleanup steps, I modified the package so that you can now supply your own cleanup function, which will be called on shutdown. So basically this package handles all the http shutdown work and calls your cleanup function (if provided in the options):
const koa = require('koa');
const gracefulShutdown = require('http-graceful-shutdown');
const app = new koa();
...
server = app.listen(...); // app can be an express OR koa app
...
// your personal cleanup function - this one takes one second to complete
function cleanup() {
return new Promise((resolve) => {
console.log('... in cleanup')
setTimeout(function() {
console.log('... cleanup finished');
resolve();
}, 1000)
});
}
// this enables the graceful shutdown with advanced options
gracefulShutdown(server,
{
signals: 'SIGINT SIGTERM',
timeout: 30000,
development: false,
onShutdown: cleanup,
finally: function() {
console.log('Server gracefully shut down.....')
}
}
);
I have answered a variation of "how to terminate a HTTP server" many times on different node.js support channels. Unfortunately, I couldn't recommend any of the existing libraries because they are lacking in one or another way. I have since put together a package that (I believe) is handling all the cases expected of graceful HTTP server termination.
https://github.com/gajus/http-terminator
The main benefits of http-terminator are that:
it does not monkey-patch Node.js API
it immediately destroys all sockets without an attached HTTP request
it allows graceful timeout to sockets with ongoing HTTP requests
it properly handles HTTPS connections
it informs connections using keep-alive that server is shutting down by setting a connection: close header
it does not terminate the Node.js process
Usage with Koa:
import Koa from 'koa';
import {
createHttpTerminator,
} from 'http-terminator';
const app = new Koa();
const server = app.listen();
const httpTerminator = createHttpTerminator({
server,
});
await httpTerminator.terminate();
To make sure the Oracle DB connections are closed nicely, you can use a connection pool and call pool.close() with a drainTime of 0 or greater. This will let the app relatively cleanly interrupt any operation that is currently using a connection. It allows freeing the DB end of the connections without the DB waiting for whatever timeout period to expire before it cleans itself up. Even with two connections this is a solution I'd look at, since it doesn't matter that the pool is small. You may need to set the Oracle Net out-of-band break detection as well, see Connections and High Availability.
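As a sketch of how that could fit into a shutdown sequence: the function below stops the HTTP server first and then releases the database resources. The shutdown name and its options object are hypothetical; pool is assumed to be a node-oracledb pool (which accepts a drainTime in seconds in close()), and mongooseConn is assumed to be mongoose.connection.

```javascript
// Stop accepting HTTP traffic first, then release database resources.
// `server` is an http.Server (e.g. the value returned by app.listen()).
// `pool` is assumed to expose close(drainTime), as node-oracledb pools do.
// `mongooseConn` is assumed to expose close(), as mongoose.connection does.
async function shutdown({ server, pool, mongooseConn, drainTime = 0 }) {
  // server.close() stops new connections and calls back once existing ones end.
  await new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
  // Close the Oracle pool, giving in-use connections drainTime seconds.
  await pool.close(drainTime);
  // Close the MongoDB connection last.
  await mongooseConn.close();
}
```

It could be wired to a signal with something like process.once('SIGTERM', () => shutdown({ server, pool, mongooseConn: mongoose.connection })), or passed as the cleanup function to one of the packages above.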
Modern versions of Node support AbortController, so there is no need for external libraries. A simple example:
const http = require('http');
const Koa = require('koa');
const app = new Koa();
const server = http.createServer(app.callback());
const controller = new AbortController();
server.listen({
host: 'localhost',
port: 80,
signal: controller.signal
});
// middleware... etc.
app.use(async (ctx) => {
ctx.body = 'Hello World';
});
// Later, when you want to close the server.
controller.abort();

Resources