Repeated http.get requests after receiving response - node.js

I'd like to repeat an http get request sequentially (one at a time, each one starting only after the previous one completes, not in parallel).
async function run_http() {
  ...
  http.get(url, (res) => {
    let body = "";
    res.on("data", (chunk) => {
      body += chunk;
    });
    res.on("end", async () => {
      await processResponse(body);
      run_http();
    });
  });
  ...
}
run_http()
This is my code; it seems to work.
However, by running run_http from inside itself, recursively, am I in danger of an unintentional memory leak eventually leading to an OOM crash? processResponse is quite large, so it might not be as clean as I'd like.
Or does the first run just exit and free everything, since there is no await in front of run_http()?
Normally (in another language), I'd just put everything in an infinite loop.

There is no memory leak or stack build-up here.
The end event is asynchronous, so the stack from the call to run_http() has already been unwound.
The previous http.get() will be able to garbage-collect anything related to it as soon as the res.on('end', ...) handler returns.
does the first run, just exit out and free everything since there is no await in front of run_http()
Yes: once you call run_http() inside the res.on('end', ...) handler, that function starts the http request and then immediately returns, allowing everything in the previous call to run_http() to be garbage-collected, as nothing in its scope is still active.
It might be simpler to just use a promise-version of an http request (or promisify one yourself) and then use a regular loop with await:
import got from 'got';

const delay = (t) => new Promise(resolve => setTimeout(resolve, t));

async function run_loop() {
  let more = true;
  while (more) {
    const result = await got(url);
    await processResponse(result);
    // perhaps do something that either does a break;
    // or sets more to false to stop the loop
    await delay(someTimeBeforeNextRequest);
  }
}
Note: you rarely want to hammer an http server as fast as possible with no delay between requests, so I've inserted a delay between them.
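If you'd rather avoid a dependency, promisifying the built-in module yourself is only a few lines. A minimal sketch (the getBody helper name is mine, not from the original code):

const http = require('http');

// Minimal promisified GET: resolves with the full response body as a string
function getBody(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      let body = "";
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => resolve(body));
    }).on("error", reject);
  });
}

// Then the same while/await loop works with no third-party module:
// const body = await getBody(url);
// await processResponse(body);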


Node Express. Continue function execution after res.render

I need to build the page and then continue executing a function that takes a lot of time. But res.render waits for that function to finish, no matter where it is called.
I want the page to start building without waiting for the data to be processed.
Here is my code:
let promise = new Promise((resolve, reject) => {
  let wb = new Workbook();
  let ws = sheet_from_array_of_arrays(public_data); // May take up to 5 seconds
  wb.SheetNames.push(ws_name);
  wb.Sheets[ws_name] = ws;
  XLSX.writeFile(wb, '/tmp/' + name); // May take up to 10 seconds
  resolve('End of promise!!!');
});

res.render('ihelp/lists/person_selection_view_data', {
  data: public_data,
  name
});

promise.then(answer => { console.log(answer) });
How can I do this?
The code that you want to run looks to be synchronous (I don't see any promises or callbacks related to it). If it takes multiple seconds to run, it will block your entire app for that amount of time.
This means that asynchronous functions, like res.render(), will not complete until the processing is done. Even if you change the order:
res.render(...);
long_running_code();
Not only will this not make res.render() send back a response before long_running_code is started, it will also stop your app from responding to new incoming requests (and/or block any current requests) until it's done.
If you have CPU-intensive code that will block the event loop, take a look at worker_threads, which can be used to offload CPU-intensive code to separate threads, keeping your main JS thread free to handle the HTTP part of your app.
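For example, a minimal sketch of offloading the workbook build to a worker (the worker file name and message shape here are illustrative, not from the original code):

// main file: respond first, then hand the heavy work to a worker thread
const { Worker } = require('worker_threads');

res.render('ihelp/lists/person_selection_view_data', { data: public_data, name });

const worker = new Worker('./build-xlsx-worker.js', {
  workerData: { rows: public_data, name }
});
worker.on('message', (msg) => console.log('worker finished:', msg));
worker.on('error', (err) => console.error(err));

// build-xlsx-worker.js: runs on its own thread, so it can block freely
// const { workerData, parentPort } = require('worker_threads');
// ... build the workbook from workerData.rows and write the file here ...
// parentPort.postMessage('done');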
res.render takes a third argument, a callback. If supplied, Express will not send the rendered HTML string automatically; you can send it yourself and then execute any extended processing afterwards.
res.render('ihelp/lists/person_selection_view_data', { data: public_data, name }, function (err, html) {
  res.send(html);
  let wb = new Workbook();
  let ws = sheet_from_array_of_arrays(public_data); // May take up to 5 seconds
  wb.SheetNames.push(ws_name);
  wb.Sheets[ws_name] = ws;
  XLSX.writeFile(wb, '/tmp/' + name); // May take up to 10 seconds
});
The executor function (inside the Promise constructor) is executed synchronously, and if it takes too long, you can defer it from another promise:
res.render('ihelp/lists/person_selection_view_data', {
  data: public_data,
  name
});

Promise.resolve().then(function () {
  new Promise((resolve, reject) => {
    // your executor code
  }).then(answer => { console.log(answer) });
});

Possible to make an event handler wait until async / Promise-based code is done?

I am using the excellent Papa Parse library in Node.js mode, to stream a large (500 MB) CSV file of over 1 million rows into a slow persistence API that can only take one request at a time. The persistence API is based on Promises, but from Papa Parse, I receive each parsed CSV row in a synchronous event, like so: parseStream.on("data", row => { ... })
The challenge I am facing is that Papa Parse dumps its CSV rows from the stream so fast that my slow persistence API can't keep up. Because Papa is synchronous and my API is Promise-based, I can't just call await doDirtyWork(row) in the on event handler, because sync and async code don't mix.
Or can they mix and I just don't know how?
My question is, can I make Papa's event handler wait for my API call to finish? Kind of doing the persistence API request directly in the on("data") event, making the on() function linger around somehow until the dirty API work is done?
The solution I have so far is not much better than using Papa's non-streaming mode, in terms of memory footprint. I actually need to queue up the torrent of on("data") events, in the form of generator function iterations. I could also have queued up promise factories in an array and worked them off in a loop. Either way, I end up saving almost the entire CSV file as a huge collection of future Promises (promise factories) in memory, until my slow API calls have worked all the way through.
async importCSV(filePath) {
  let parsedNum = 0, processedNum = 0;

  async function* gen() {
    let pf = yield;
    do {
      pf = yield await pf();
    } while (typeof pf === "function");
  }

  var g = gen();
  g.next();

  await new Promise((resolve, reject) => {
    try {
      const dataStream = fs.createReadStream(filePath);
      const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, { delimiter: ",", header: false });
      dataStream.pipe(parseStream);

      parseStream.on("data", row => {
        // Received a CSV row from Papa.parse()
        try {
          console.log("PA#", parsedNum, ": parsed", row.filter((e, i) => i <= 2 ? e : undefined));
          parsedNum++;

          // Simulate some really slow async/await dirty work here, for example
          // send requests to a one-at-a-time persistence API
          g.next(() => { // don't execute now, call in sequence via the generator above
            return new Promise((res, rej) => {
              console.log("DW#", processedNum, ": dirty work START", row.filter((e, i) => i <= 2 ? e : undefined));
              setTimeout(() => {
                console.log("DW#", processedNum, ": dirty work STOP ", row.filter((e, i) => i <= 2 ? e : undefined));
                processedNum++;
                res();
              }, 1000);
            });
          });
        } catch (err) {
          console.log(err.stack);
          reject(err);
        }
      });

      parseStream.on("finish", () => {
        console.log(`Parsed ${parsedNum} rows`);
        resolve();
      });
    } catch (err) {
      console.log(err.stack);
      reject(err);
    }
  });

  while (!(await g.next()).done);
}
So why the rush, Papa? Why not allow me to work down the file a bit slower -- the data in the original CSV file isn't gonna run away, we have hours to finish the streaming, so why hammer me with on("data") events that I can't seem to slow down?
So what I really need is for Papa to become more of a grandpa, and minimize or eliminate any queuing or buffering of CSV rows. Ideally I would be able to completely sync Papa's parsing events with the speed (or lack thereof) of my API. So if it weren't for the dogma that async code can't make sync code "sleep", I would ideally send each CSV row to the API inside the Papa event, and only then return control to Papa.
Suggestions? Some kind of "loose coupling" of the event handler with the slowness of my async API is fine too. I don't mind if a few hundred rows get queued up. But when tens of thousands pile up, I will run out of heap fast.
Why hammer me with on("data") events that I can't seem to slow down?
You can; you just weren't asking Papa to stop. You can do this by calling stream.pause(), then later stream.resume(), to make use of Node streams' built-in back-pressure.
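For example, a minimal sketch of that approach, assuming the slow work is wrapped in a promise-returning function like the doDirtyWork mentioned in the question:

parseStream.on("data", row => {
  parseStream.pause();                    // no more 'data' events until we resume
  doDirtyWork(row)
    .catch(err => console.log(err.stack))
    .then(() => parseStream.resume());    // ask the stream for the next row
});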
However, there's a much nicer API than handling this on your own in callback-based code: use the stream as an async iterator! When you await in the body of a for await loop, the iterator has to pause as well. So you can write:
async function importCSV(filePath) {
  let parsedNum = 0;
  const dataStream = fs.createReadStream(filePath);
  const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, { delimiter: ",", header: false });
  dataStream.pipe(parseStream);

  for await (const row of parseStream) {
    // Received a CSV row from Papa.parse()
    const data = row.filter((e, i) => i <= 2 ? e : undefined);
    console.log("PA#", parsedNum, ": parsed", data);
    parsedNum++;
    await dirtyWork(data);
  }
  console.log(`Parsed ${parsedNum} rows`);
}

importCSV('sample.csv').catch(console.error);

let processedNum = 0;
function dirtyWork(data) {
  // Simulate some really slow async/await dirty work here,
  // for example send requests to a one-at-a-time persistence API
  return new Promise((res, rej) => {
    console.log("DW#", processedNum, ": dirty work START", data);
    setTimeout(() => {
      console.log("DW#", processedNum, ": dirty work STOP ", data);
      processedNum++;
      res();
    }, 1000);
  });
}
Async code in JavaScript can sometimes be a little hard to grok. It's important to remember how Node handles concurrency.
The node process is single-threaded, but it uses a concept called an event loop. The consequence of this is that async code and callbacks are essentially equivalent representations of the same thing.
Of course, you need an async function to use await, but your callback from Papa Parse can be an async function:
parse.on("data", async row => {
await sync(row)
})
Once the await operation completes, the arrow function ends, and all references to row will be eliminated, so the garbage collector can successfully collect row, releasing that memory.
The effect this has is to run sync concurrently every time a row is parsed, so if you can only sync one record at a time, I would recommend wrapping the sync function in a debouncer.

How to stop class/functions from continuing to execute code in Node.js

I have asked a few questions about this already, but maybe this question will result in better answers (I'm bad at questions).
I have one class, called FOO, with an async Start function that starts the process the FOO class was made for. This FOO class does a lot of different calculations, as well as posting/getting the calculations using the Node.js "request" module.
(I'm using an Electron UI, where pressing buttons executes functions etc., to create and start the FOO class.)
class FOO {
  async Start() {
    console.log("Start");
    await this.GetCalculations();
    await this.PostResults();
  }

  async PostResults() {
    // REQUEST STUFF
    const response = { statusCode: 200 }; // Request including this.Cal
    console.log(response);
    // Send with IPC
    // ipc.send("status", response.statusCode)
  }

  async GetCalculations() {
    for (var i = 0; i < 10; i++) {
      await this.GetCalculation();
    }
    console.log(this.Cal);
  }

  async GetCalculation() {
    // REQUEST STUFF
    const response = { body: "This is a calculation" }; // Since the request module can't be used here.
    if (!this.Cal) this.Cal = [];
    this.Cal.push(response);
  }
}
var F1 = new FOO();
F1.Start();
Now imagine this code but with A LOT more steps and more requests, etc., where it might take seconds or minutes to finish all the tasks in the FOO class.
(The Electron app has a stop button that the user can hit when they want the calculations to stop.)
How would I go about stopping the entire class from continuing?
In some cases, the user might stop and then start again right after, so I have been trying to figure out a way to stop the code from running entirely, while still letting the user create a new instance and start that, without the old one running in the background.
I have been thinking about the "tiny-worker" module, but creating the worker takes 1-2 seconds, which defeats the purpose of a fast calculation program.
Hopefully, this question is better than the other ones.
Update:
Applying the logic behind the different answers, I came up with this:
await Promise.race([this.cancelDeferred, new Promise(async (res, rej) => {
  var options = {
    uri: "http://httpstat.us/200?sleep=5000"
  };
  const response = await request(options);
  console.log(response.statusCode);
})]);
But even when
this.cancelDeferred.reject(new Error("User Stop"));
is called, the statusCode of the response still gets printed out when the request finishes.
The answers I got show some good logic that I didn't know about, but the problem is that they all only stop the request; the code handling the request's response will still execute, and in some cases trigger a new request. This means that I have to spam the Stop function until everything fully stops.
Framing the problem generally: you have a whole bunch of function calls that make serialized asynchronous operations, and you want the user to be able to hit a Cancel/Stop button and cause the chain of asynchronous operations to abort (i.e. stop doing any more and bail on getting whatever eventual result it was trying to get).
There are several schemes I can think of.
1. Each operation checks some state property. You make these operations all part of some object that has an aborted state property. The code for every single asynchronous operation must check that state property after it completes (a sketch of this follows the list below). The Cancel/Stop button can be hooked up to set this state variable. When the current asynchronous operation finishes, it will abort the rest of the operations. If you are using promises for sequencing your operations (which it appears you are), then you can reject the current promise, causing the whole chain to abort.
2. Create some async wrapper function that incorporates the cancel state for you automatically. If all your actual asynchronous operations come from some small group of operations (such as all using the request module), then you can create a wrapper function around whichever request operations you use. When any operation completes, the wrapper checks the state variable for you (or merges it into the returned promise), and if things have been stopped, it rejects the returned promise, which causes the whole promise chain to abort. This has the advantage that you only have to do the checks in one place, and the rest of your code just switches to using your wrapped version of the request function instead of the regular one.
3. Put all the async steps/logic into another process that you can kill. This seems (to me) like using a sledgehammer for a small problem, but you could launch a child_process (which can also be a node.js program) to do your multi-step async operations, and when the user presses stop/cancel, you just kill the child process. Your code that is monitoring the child_process and waiting for a result will either get a final result or an indication that it was stopped. You probably want an actual process here rather than a worker thread so you get a full and complete abort, and so all memory and other resources used by that process get properly reclaimed.
Please note that none of these solutions use any sort of infinite loop or polling loop.
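Here's a minimal sketch of option #1, with an aborted flag checked between awaits. The class and method names are just for illustration, and the Promise.resolve stands in for a real request:

class Foo {
  constructor() {
    this.aborted = false;
    this.cal = [];
  }
  stop() {
    this.aborted = true;
  }
  checkAbort() {
    // throwing here rejects the surrounding async chain
    if (this.aborted) throw new Error("User Stop");
  }
  async start() {
    await this.getCalculations();
    this.checkAbort();                 // bail before the next step
    await this.postResults();
  }
  async getCalculations() {
    for (let i = 0; i < 10; i++) {
      this.checkAbort();               // bail between requests
      this.cal.push(await Promise.resolve("calculation")); // stand-in for a real request
    }
  }
  async postResults() {
    console.log("posted", this.cal.length, "results");
  }
}

Calling stop() makes the pending start() promise reject at the next checkpoint, and a fresh instance can be created and started immediately.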
For example, suppose your actual asynchronous operation was using the request() module.
You could define a higher-scoped promise that gets rejected if the user clicks the Cancel/Stop button:
function Deferred() {
  let p = this.promise = new Promise((resolve, reject) => {
    this.resolve = resolve;
    this.reject = reject;
  });
  this.then = this.promise.then.bind(p);
  this.catch = this.promise.catch.bind(p);
  this.finally = this.promise.finally.bind(p);
}
// higher-scoped variable that persists
let cancelDeferred = new Deferred();

// function that gets called when the stop button is hit
function stop() {
  // reject the current deferred, which will cause
  // existing operations to cancel
  cancelDeferred.reject(new Error("User Stop"));
  // put a new deferred in place for future operations
  cancelDeferred = new Deferred();
}

const rp = require('request-promise');

// wrapper around request-promise
function rpWrap(options) {
  return Promise.race([cancelDeferred, rp(options)]);
}
Then, you just call rpWrap() everywhere instead of calling rp(), and it will automatically reject if the stop button is hit. You then need to code your asynchronous logic so that if any promise rejects, the whole operation aborts (which is generally the default and automatic behavior for promises anyway).
Asynchronous functions do not run code in a separate thread; they just encapsulate an asynchronous control flow in syntactic sugar and return an object that represents its completion state (i.e. pending / resolved / rejected).
The reason for making this distinction is that once you start the control flow by calling the async function, it must continue until completion, or until the first uncaught error.
If you want to be able to cancel it, you must declare a status flag and check it at all (or some) sequence points, i.e. before an await expression, and return early (or throw) if the flag is set. There are three ways to do this:
You can provide a cancel() function to the caller, which will be able to set the status.
You can accept an isCancelled() function from the caller which will return the status, or conditionally throw based on the status.
You can accept a function that returns a Promise which will throw when cancellation is requested, then at each of your sequence points, change await yourAsyncFunction(); to await Promise.race([cancellationPromise, yourAsyncFunction()]);
Below is an example of the last approach.
async function delay(ms, cancellationPromise) {
  return Promise.race([
    cancellationPromise,
    new Promise(resolve => {
      setTimeout(resolve, ms);
    })
  ]);
}

function cancellation() {
  const token = {};
  token.promise = new Promise((_, reject) => {
    token.cancel = () => reject(new Error('cancelled'));
  });
  return token;
}

const myCancellation = cancellation();

delay(500, myCancellation.promise).then(() => {
  console.log('finished');
}).catch(error => {
  console.log(error.message);
});

setTimeout(myCancellation.cancel, Math.random() * 1000);

Getting a JSON from a website Node JS

So I'm fairly new to Node.js, and am having trouble wrapping my head around asynchronous programming. I'm trying to get a JSON from a website and pass it to a variable for use later. To test, I have been using this code:
var https = require("https");
var a;

function getter(url) {
  var request = https.get(url, function (response) {
    var body = "";
    response.on("data", function (chunk) {
      body += chunk;
    });
    response.on("end", function () {
      if (response.statusCode === 200) {
        try {
          a = JSON.parse(body);
        } catch (err) {
          console.log(err);
        }
      }
    });
  });
}
getter('https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY');
console.log(a);
When I run this I get a as undefined, which seems to make sense from what I've read. But I'm unclear as to what to do from here. How would I go about passing this JSON into a variable?
http.get is asynchronous and executes the event handlers when the events occur. When you call getter(), the function immediately returns, i.e. it does not wait for the events, and the next statement, console.log(a), is executed.
Furthermore, JS is single-threaded, and the current execution stack is never interrupted for any other event or callback. Event handlers can only run once the current execution has come to an end, i.e. contains no more statements. Thus, your console.log() will always be executed before any event handler of the request, so a is still undefined.
If you want to continue after the request has finished, you have to do it from the event handler.
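For example, here is a sketch of the same getter() reworked to return a Promise, so the logging only runs once the response has actually arrived:

var https = require("https");

function getter(url) {
  return new Promise(function (resolve, reject) {
    https.get(url, function (response) {
      var body = "";
      response.on("data", function (chunk) {
        body += chunk;
      });
      response.on("end", function () {
        if (response.statusCode === 200) {
          try {
            resolve(JSON.parse(body)); // hand the parsed JSON to the caller
          } catch (err) {
            reject(err);
          }
        } else {
          reject(new Error("HTTP " + response.statusCode));
        }
      });
    }).on("error", reject);
  });
}

getter('https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY')
  .then(function (a) { console.log(a); }) // runs after the response arrives
  .catch(console.log);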
See this excellent presentation for some more details: https://youtu.be/8aGhZQkoFbQ

nodejs concurrency synchronous execution

I have two WebSockets receiving data asynchronously; every time I get a message from either socket, I execute some code in CompareData.
The problem is that CompareData should be executed synchronously, or (better) only if it is not already running.
This is my code:
function CompareData(data) {
  console.log('data ', data);
  AsyncFunction();
}

ws1 = new WebSocket(WS1_URL);
ws2 = new WebSocket(WS2_URL);

ws1.on('message', (data) => {
  CompareData(data);
});
ws2.on('message', (data) => {
  CompareData(data);
});
Can you help me, please? I'm very new to Node.js.
Node.js is single-threaded, so you don't really get true concurrency issues in Node programs as you might in other languages. In your example, at most one WebSocket callback invoking CompareData can be executing at any given time.
You should not make synchronous calls in Node.js, but you can make the calls sequential. The example below might be helpful.
var messages = [];
var inProgress = false;

function CompareData(data) {
  return new Promise((resolve, reject) => {
    // do some stuff and resolve
    setTimeout(() => {
      resolve(data);
    }, 1000);
  });
}

const start = async () => {
  if (!inProgress) {
    if (messages.length !== 0) {
      inProgress = true;
      try {
        const data = await CompareData(messages.shift());
        console.log(data);
      } catch (error) {
        console.log(error);
      }
      inProgress = false;
      await start();
    } else {
      console.log('Process Done');
    }
  }
};

const handler = (data) => {
  messages.push(data);
  start();
};

handler(1);
handler(2);
handler(3);
handler(4);

// ws1 = new WebSocket(WS1_URL);
// ws2 = new WebSocket(WS2_URL);
// ws1.on('message', handler);
// ws2.on('message', handler);
You could use a mutex to avoid two async CompareData operations executing at the same time, e.g. node-mutex or mutexify.
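If you'd rather not add a dependency, here is a hand-rolled sketch of the same idea: serialize the critical section by chaining every call onto a tail promise. This assumes CompareData returns a promise that settles when its work is done:

// Each caller waits for the previous critical section to finish
let tail = Promise.resolve();

function withLock(fn) {
  const run = tail.then(() => fn());
  tail = run.catch(() => {}); // keep the chain alive even after a failure
  return run;
}

// usage: calls run strictly one after another, never overlapping
ws1.on('message', (data) => withLock(() => CompareData(data)));
ws2.on('message', (data) => withLock(() => CompareData(data)));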
My suggestions are:
First of all, you need to know when CompareData is finished. Reorganize your code to use promises or callbacks. If you're using third-party async functions, I'm almost sure they provide some feedback on completion - this is a must-have in the async world.
Add an inProgress = false flag somewhere to serve as a simple lock. As someone posted, JS is single-threaded, and you're guaranteed that your code won't get interrupted in the middle of an operation. Thanks to that, you can use really simple locks instead of the complicated OS-based mutexes known from multithreaded languages.
In ws.on(...), check whether inProgress is set. If not, lock it and run CompareData.
In CompareData's completion callback, or on promise resolution, set inProgress back to false, so you're no longer ignoring incoming data.
If you can simply discard the data, there is no need to complicate this scenario with extra queues, mutexes, etc.
If you need to serve it all, then queue incoming data and serve the next piece after the completion callback is fired.
This is basically what Rahul suggests, but he uses features that are not yet established in the current version of the standard, so don't use his code if you're not transpiling yours.
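For reference, here is a sketch of that flag-and-queue logic using plain promises only, with no async/await (again assuming CompareData returns a promise, as in the example above):

var queue = [];
var busy = false;

function handler(data) {
  queue.push(data); // never drop data; just enqueue it
  drain();
}

function drain() {
  if (busy || queue.length === 0) return; // simple lock
  busy = true;
  CompareData(queue.shift()).then(
    function () { busy = false; drain(); },                    // serve the next message
    function (err) { console.log(err); busy = false; drain(); } // release the lock on error too
  );
}

// ws1.on('message', handler);
// ws2.on('message', handler);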
