Abandoned http requests after server.close()? - node.js

I have a vanilla nodejs server like this:
let someVar // to be set to a Promise

const getData = url => {
  return new Promise((resolve, reject) => {
    https.get(
      url,
      { headers: { ...COMMON_REQUEST_HEADERS, 'X-Request-Time': '' + Date.now() } },
      res => {
        if (res.statusCode === 401) return reject(new RequestError(INVALID_KEY, res))
        if (res.statusCode !== 200) return reject(new RequestError(BAD_REQUEST, res))
        let json = ''
        res.on('data', chunk => json += chunk)
        res.on('end', () => {
          try {
            resolve(JSON.parse(json).data)
          } catch (error) {
            return reject(new RequestError(INVALID_RESPONSE, res, json))
          }
        })
      }
    ).on('error', error => reject(new RequestError(FAILED, error)))
  })
}

const aCallback = () => {
  console.log('making api call')
  someVar = getData('someApiEndpoint')
    .then(data => { ... })
}
const main = () => {
  const server = http.createServer(handleRequest)
  anInterval = setInterval(aCallback, SOME_LENGTH_OF_TIME)

  const exit = () => {
    server.close(() => process.exit())
    log('Server is closed')
  }

  process.on('SIGINT', exit)
  process.on('SIGTERM', exit)
  process.on('uncaughtException', (err, origin) => {
    log(`Process caught unhandled exception ${err} ${origin}`, 'ERROR')
  })
}
main()
I was running into a situation where I would press Ctrl-C and see the Server is closed log, followed by my command prompt, but then I would see more logs printed indicating that more api calls were being made.
Calling clearInterval(anInterval) inside exit() (before server.close()) seems to have solved the issue of the interval continuing even when the server is closed, so that's good. BUT:
From these node docs:
Closes all connections connected to this server which are not sending a request or waiting for a response.
I.e., I assume server.close() will not automatically kill the http request.
What happens to the http response information when my computer / node are no longer keeping track of the variable someVar?
What are the consequences of not specifically killing the thread that made the http request (and is waiting for the response)?
Is there a best practice for cancelling the request?
What does that consist of (i.e. would I ultimately tell the API's servers 'never mind please don't send anything', or would I just instruct node to not receive any new information)?

There are a couple things you should be aware of. First off, handling SIGINT is a complicated thing in software. Next, you should never need to call process.exit() as node will always exit when it's ready. If your process doesn't exit correctly, that means there is "work being done" that you need to stop. As soon as there is no more work to be done, node will safely exit on its own. This is best explained by example. Let's start with this simple program:
const interval = setInterval(() => console.log('Hello'), 5000);
If you run this program and then press Ctrl + C (which sends the SIGINT signal), node will automatically clear the interval for you and exit (well... it's more of a "fatal" exit, but that's beyond the scope of this answer). This auto-exit behavior changes as soon as you listen for the SIGINT event:
const interval = setInterval(() => console.log('Hello'), 5000);

process.on('SIGINT', () => {
  console.log('SIGINT received');
});
Now if you run this program and press Ctrl + C, you will see the "SIGINT received" message, but the process will never exit. When you listen for SIGINT, you are telling node "hey, I have some things I need to clean up before you exit". Node will then wait for any "ongoing work" to finish before it exits. If node doesn't eventually exit on its own, it's telling you "hey, I can see that there are some things still running - you need to stop them before I'll exit".
Let's see what happens if we clear the interval:
const interval = setInterval(() => console.log('Hello'), 5000);

process.on('SIGINT', () => {
  console.log('SIGINT received');
  clearInterval(interval);
});
Now if you run this program and press Ctrl + C, you will see the "SIGINT received" message and the process will exit nicely. As soon as we clear the interval, node is smart enough to see that nothing is happening, and it exits. The important lesson here is that if you listen for SIGINT, it's on you to wait for any tasks to finish, and you should never need to call process.exit().
As far as how this relates to your code, you have 3 things going on:
http server listening for requests
an interval
outgoing https.get request
When your program exits, it's on you to clean up the above items. In the simplest of circumstances, you should do the following:
close the server: server.close();
clear the interval: clearInterval(anInterval);
destroy any outgoing request: request.destroy()
You may decide to wait for any incoming requests to finish before closing your server, or you may want to listen for the 'close' event on your outgoing request in order to detect any lost connection. That's on you. You should read about the methods and events which are available in the node http docs. Hopefully by now you are starting to see how SIGINT is a complicated matter in software. Good luck.

Related

Repeated http.get requests after receiving response

I'd like to repeat an http get request sequentially (one at a time after each one is complete, not in parallel).
async function run_http() {
  ...
  http.get(url, (res) => {
    let body = "";
    res.on("data", (chunk) => {
      body += chunk;
    });
    res.on("end", async () => {
      await processResponse(body)
      run_http()
    });
  });
  ...
}

run_http()
This is my code, and it seems to work.
However, by running run_http from inside itself, recursively, am I in danger of an unintentional memory leak eventually leading to an OOM crash? processResponse is very large, so it might not be as clean as I'd like.
OR, does the first run just exit out and free everything, since there is no await in front of run_http()?
Normally (in another language), I'd just put everything in an infinite loop.
There is no memory leak or stack build-up here.
The end event is asynchronous, so the stack from the call to run_http() has already been cleaned up.
The previous http.get() will be able to garbage collect anything related to it as soon as the res.on('end', ...) handler returns.
does the first run just exit out and free everything, since there is no await in front of run_http()?
Yes, once you call run_http() inside the res.on('end', ...) handler, that function starts the http request and then immediately returns, allowing everything in the previous call to run_http() to be garbage collected as nothing in its scope is still active.
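The no-growth claim is easy to check with a small, self-contained sketch. Here setTimeout stands in for the http round-trip, and the counter and stack-depth probe are illustrative additions, not part of the original code:

```javascript
// Each "request" completes asynchronously, so every new call to run()
// starts from a fresh (empty) stack instead of nesting inside the old one.
let calls = 0;

function run(done) {
  setTimeout(() => {            // stands in for res.on('end', ...)
    const depth = new Error().stack.split('\n').length;
    calls++;
    if (calls < 1000) run(done);
    else done(depth);           // report stack depth after 1000 "requests"
  }, 0);
}

run(depth => console.log('stack frames after 1000 iterations:', depth));
```

If the calls were truly recursive on the stack, the depth would grow with each iteration; because each continuation runs from the event loop, it stays constant.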
It might be simpler to just use a promise-version of an http request (or promisify one yourself) and then use a regular loop with await:
import got from 'got';

const delay = (t) => new Promise(resolve => setTimeout(resolve, t));

async function run_loop() {
  let more = true;
  while (more) {
    const result = await got(url);
    await processResponse(result);
    // perhaps do something that either does a break;
    // or sets more to false to stop the loop
    await delay(someTimeBeforeNextRequest);
  }
}
Note, you rarely want to hammer some http server as fast as possible with no delay between requests, so I've inserted a delay here between requests.

kill a looped task nicely from a jest test

I have a worker method 'doSomeWork' that is called in a loop, based on a flag that will be changed if a signal to terminate is received.
let RUNNING = true;
let pid;

export function getPid() {
  return pid;
}

export async function doSomeWork() {
  console.log("Doing some work!");
}

export const run = async () => {
  console.log("starting run process with PID %s", process.pid);
  pid = process.pid;
  while (RUNNING) {
    await doSomeWork();
  }
  console.log("done");
};

run()
  .then(() => {
    console.log("finished");
  })
  .catch((e) => console.error("failed", e));

process.on("SIGTERM", () => {
  RUNNING = false;
});
I am happy with this and now need to write a test: I want to
trigger the loop
inject a 'SIGTERM' to the src process
give the loop a chance to finish nicely
see 'finished' in the logs to know that the run method has been killed.
Here is my attempt (not working). The test code all executes, but the src loop isn't killed.
import * as main from "../src/program";

describe("main", () => {
  it("a test", () => {
    main.run();
    setTimeout(function () {
      console.log("5 seconds have passed - killing now!");
      const mainProcessPid = main.getPid();
      process.kill(mainProcessPid, "SIGTERM");
    }, 5000);
    setTimeout(function () {
      console.log("5 secs of tidy up time has passed");
    }, 5000);
  });
});
I think the setTimeout isn't blocking the test thread, but I am not sure how to achieve this in node/TS.
sandbox at https://codesandbox.io/s/exciting-voice-goncm
update sandbox with correct environment: https://codesandbox.io/s/keen-bartik-ltjtx
any help appreciated :-)
--update---
I now see that process.kill isn't doing what I thought it was, even when I pass in the PID. I will try creating a process as a child of the test process so I can send a signal to it. https://medium.com/@NorbertdeLangen/communicating-between-nodejs-processes-4e68be42b917
You are getting this issue because the environment in your CodeSandbox is create-react-app, i.e. it's a client-side script and not a server-side instance of node.
Recreate your project but select the Node HTTP Server environment. This will give you a node environment where the node process functions will work, e.g. process.kill. This is because the node environment is run in a server-side Docker container. See here for more info on CodeSandbox's environments.

Concurrent in Nodejs

I have a node server like below, and I push 2 requests almost simultaneously (with the same url = "localhost:8080/").
My question is: "Why does the server wait until the 1st request is handled before handling the 2nd?"
Output in console of my test:
Home..
Home..
(Note: the 2nd line is displayed after 12 seconds)
server.js:
var express = require('express')
var app = express()

app.use(express.json())

app.get('/', async (request, response) => {
  try {
    console.log('Home ...')
    await sleep(12000)
    response.send('End')
  } catch (error) {
    response.end('Error')
  }
})

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

var server = app.listen(8080, '127.0.0.1', async () => {
  var host = server.address().address
  var port = server.address().port
  console.log('==================== START SERVER ===================')
  console.log('* Running on http://%s:%s (Press CTRL+C to quit)', host, port)
  console.log('* Create time of ' + new Date() + '\n')
})
Agreed with @unclexo - understand what the blocking/non-blocking calls are and optimize your code around that. If you really want to add capacity for parallel requests, you could consider leveraging the cluster module.
https://nodejs.org/api/cluster.html
This will kick off children processes and proxy the HTTP requests to those children. Each child can block as long as it wants and not affect the other processes (unless there is some race condition between them).
"Why the server wait for 1st request handle done, then 2st request
will be handle"?
This is not right for Node.Js. You are making a synchronous and asynchronous call in your code. The asynchronous call to the sleep() method waits for 12 seconds to be run.
Now this does not mean the asynchronous call waits until the synchronous call finishes. This is synchronous behavior. Note the following examples
asyncFunction1()
asyncFunction2()
asyncFunction3()
Here, whenever the 1st function starts running, node.js releases the thread, then the 2nd function starts running, node releases the thread again, and the call to the 3rd function starts. Each function returns its response whenever it finishes and does not wait for the others to return.
Node.js is single-threaded and has a non-blocking I/O architecture; its asynchronous nature is its beauty.
You are blocking the other request by calling await inside app.get('/', ...).
If you want the sleep method to run asynchronously between requests, wrap it in a new async function. The code will look like this:
app.get('/', (request, response) => {
  try {
    print()
    response.send('End')
  } catch (error) {
    response.end('Error')
  }
})

async function print() {
  await sleep(12000)
  console.log('Home ...')
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
This way, if you send a request and then send another request, the second request's print() function will be executed without waiting for 12 seconds from the first request. Therefore, there will be 2 Home ... lines printed around 12 seconds after the requests are sent. For a deeper understanding of this, you should learn about the Node.js event loop.

How to deal with acks for asynchronous types of tasks

Right now I have a RabbitMQ queue setup, and I have several workers that are listening for events pushed onto this queue.
The events are essentially string urls (e.g., "https://www.youtube.com"), and it'll be processed through puppeteer.
What I'm wondering is: given that puppeteer is asynchronous, is there a way for me to return an ack once I've finished all the asynchronous stuff?
Right now, I think my workers listening to the queue are hanging because the ack isn't being fired.
Edit - the code below is pretty much what I call within the consume part of RabbitMQ. Because this is async, execution just goes past this operation and immediately acks.
(async () => {
  const args = {
    evaluatePage: (() => ({
      title: $('title').text(),
    })),
    persistCache: true,
    cache,
    onSuccess: (result => {
      console.log('value for result -- ', result.result.title);
    }),
  };
  // we need to first setup the crawler
  // then we can start sending information to it
  const crawler = await HCCrawler.launch(args);
  crawler.queue(urls);
  await crawler.onIdle();
  await crawler.close();
})();
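One way to make the ack wait for the async work is to build the consume callback around an awaited promise, assuming amqplib's channel API (channel.ack / channel.nack); makeHandler and processUrl are hypothetical names introduced for the sketch:

```javascript
// Build the consume callback so the ack fires only after the async work
// has settled; reject (without requeue) if the work throws.
function makeHandler(channel, work) {
  return async msg => {
    if (msg === null) return;             // consumer was cancelled by the broker
    try {
      await work(msg.content.toString()); // e.g. the puppeteer crawl above
      channel.ack(msg);                   // ack only after completion
    } catch (err) {
      channel.nack(msg, false, false);    // drop / dead-letter on failure
    }
  };
}

// Wiring, assuming amqplib (not executed here):
// const conn = await require('amqplib').connect('amqp://localhost');
// const ch = await conn.createChannel();
// await ch.assertQueue('urls');
// ch.consume('urls', makeHandler(ch, processUrl));
```

Because the broker only redelivers unacked messages, deferring the ack this way also means a worker that crashes mid-crawl doesn't silently lose the url.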

How to work around amqplib's Channel#consume odd signature?

I am writing a worker that uses amqplib's Channel#consume method. I want this worker to wait for jobs and process them as soon as they appear in the queue.
I wrote my own module to abstract away ampqlib, here are the relevant functions for getting a connection, setting up the queue and consuming a message:
const getConnection = function(host) {
  return amqp.connect(host);
};

const createChannel = function(conn) {
  connection = conn;
  return conn.createConfirmChannel();
};

const assertQueue = function(channel, queue) {
  return channel.assertQueue(queue);
};

const consume = Promise.method(function(channel, queue, processor) {
  processor = processor || function(msg) { if (msg) Promise.resolve(msg); };
  return channel.consume(queue, processor)
});

const setupQueue = Promise.method(function setupQueue(queue) {
  const amqp_host = 'amqp://' + ((host || process.env.AMQP_HOST) || 'localhost');
  return getConnection(amqp_host)
    .then(conn => createChannel(conn)) // -> returns a `Channel` object
    .tap(channel => assertQueue(channel, queue));
});

consumeJob: Promise.method(function consumeJob(queue) {
  return setupQueue(queue)
    .then(channel => consume(channel, queue))
});
My problem is with Channel#consume's odd signature. From http://www.squaremobius.net/amqp.node/channel_api.html#channel_consume:
#consume(queue, function(msg) {...}, [options, [function(err, ok) {...}]])
The callback is not where the magic happens, the message's processing should actually go in the second argument and that breaks the flow of promises.
This is how I planned on using it:
return queueManager.consumeJob(queue)
  .then(msg => {
    // do some processing
  });
But it doesn't work. If there are no messages in the queue, the promise is rejected and then if a message is dropped in the queue nothing happens. If there is a message, only one message is processed and then the worker stalls because it exited the "processor" function from the Channel#consume call.
How should I go about it? I want to keep the queueManager abstraction so my code is easier to reason about but I don't know how to do it... Any pointers?
As @idbehold said, Promises can only be resolved once. If you want to process messages as they come in, there is no other way than to use this function. Channel#get will only check the queue once and then return; it wouldn't work for a scenario where you need a worker.
Just as an option: you can present your application as a stream of messages (or events). There is a library for this: http://highlandjs.org/#examples
Your code would look like this (it isn't a finished sample, but I hope it illustrates the idea):
let messageStream = _((push, next) => {
  consume(queue, (msg) => {
    push(null, msg)
  })
})

// now you can operate with your stream in functional style
messageStream.map((msg) => msg + 'some value').each((msg) => { /* do something with msg */ })
This approach provides you with a lot of primitives for synchronization and transformation; see the examples at http://highlandjs.org/#examples
