Node.js functional programming with generators and promises

Summary
Is functional programming in Node.js general enough? Can it be used to solve a real-world problem: handling small bulks of DB records without loading all records into memory via toArray (and thus running out of memory)? You can read this criticism for background. We want to demonstrate the Mux/DeMux and fork/tee/join capabilities of such Node.js libraries with async generators.
Context
I'm questioning the validity and generality of functional programming in Node.js using any functional programming tool (like ramda, lodash, and imlazy) or even custom code.
Given
Millions of records from a MongoDB cursor that can be iterated using await cursor.next()
You might want to read more about async generators and for-await-of.
For fake data one can use (on Node 10):
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
async function* getDocs(n) {
  for (let i = 0; i < n; ++i) {
    await sleep(1);
    yield { i: i, t: Date.now() };
  }
}
let docs = getDocs(1000000);
Wanted
We need:
the first document
the last document
the number of documents
to split into batches/bulks of n documents and emit a socket.io event for each bulk
Make sure that the first and last documents are included in the batches and not consumed away from them.
Constraints
The millions of records should not be loaded into RAM; one should iterate over them and hold at most a single batch of them.
The requirement can be met using ordinary Node.js code, but can it be done using something like applySpec, as here?
R.applySpec({
  first: R.head(),
  last: R.last(),
  _: R.pipe(
    R.splitEvery(n),
    R.map((i) => { return "emit " + JSON.stringify(i); })
  )
})(input)

To show how this could be modeled with vanilla JS, we can introduce the idea of folding over an async generator that produces things that can be combined together.
const foldAsyncGen = (of, concat, empty) => (step, fin) => async asyncGen => {
  let acc = empty
  for await (const x of asyncGen) {
    acc = await step(concat(acc, of(x)))
  }
  return await fin(acc)
}
Here the arguments are broken up into three parts:
(of, concat, empty) expects a function that produces a "combinable" thing, a function that combines two "combinable" things, and an empty/initial instance of a "combinable" thing
(step, fin) expects a function that takes the "combinable" thing at each step and produces a Promise of the "combinable" thing to use for the next step, and a function that takes the final "combinable" thing after the generator is exhausted and produces a Promise of the final result
async asyncGen is the async generator to process
In FP, the idea of a "combinable" thing is known as a Monoid, which defines some laws that detail the expected behaviour of combining two of them together.
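As a quick illustration (not from the original answer), here is foldAsyncGen applied to the getDocs generator from above with a throwaway "sum of indices" monoid, just to show how the three argument groups fit together:
// a trivial monoid: plain numbers under addition
const sumDocs = foldAsyncGen(
  doc => doc.i,                  // of: turn a doc into a combinable number
  (a, b) => a + b,               // concat: combine two numbers
  0                              // empty: the initial number
)(
  async acc => acc,              // step: nothing extra to do per step
  async acc => `total: ${acc}`   // fin: format the final result
)

sumDocs(getDocs(5)).then(console.log) // -> "total: 10"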
We can then create a Monoid that will be used to carry through the first, last and batch of values when stepping through the generator.
const Accum = (first, last, batch) => ({
  first,
  last,
  batch,
})

Accum.empty = Accum(null, null, []) // an initial instance of `Accum`
Accum.of = x => Accum(x, x, [x])    // an `Accum` instance of a single value
Accum.concat = (a, b) =>            // how to combine two `Accum` instances together
  Accum(a.first == null ? b.first : a.first, b.last, a.batch.concat(b.batch))
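As a sanity check (again, not part of the original answer), we can spot-check the Monoid behaviour the fold actually relies on, namely left identity and associativity, since empty only ever appears on the left in foldAsyncGen:
const eq = (x, y) => JSON.stringify(x) === JSON.stringify(y)
const a = Accum.of(1), b = Accum.of(2), c = Accum.of(3)

console.log(eq(Accum.concat(Accum.empty, a), a)) // true: left identity
console.log(eq(
  Accum.concat(Accum.concat(a, b), c),
  Accum.concat(a, Accum.concat(b, c))
)) // true: associativity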
To capture the idea of flushing the accumulating batches, we can create another function that takes an onFlush function (which performs some action, in a returned Promise, with the values being flushed) and a batch size n at which to flush.
Accum.flush = onFlush => n => acc =>
  // n = 0 means "flush whatever is left"; nothing is emitted while the batch is
  // empty or still below the threshold
  acc.batch.length === 0 || acc.batch.length < n
    ? Promise.resolve(acc)
    : onFlush(acc.batch.slice(0, n || acc.batch.length))
        .then(_ => Accum(acc.first, acc.last, acc.batch.slice(n || acc.batch.length)))
We can also now define how we can fold over the Accum instances.
Accum.foldAsyncGen = foldAsyncGen(Accum.of, Accum.concat, Accum.empty)
With the above utilities defined, we can now use them to model your specific problem.
const emit = batch => // this is an analog of where you would emit your batches
  new Promise((resolve) => resolve(console.log(batch)))

const flushEmit = Accum.flush(emit)

// flush and emit every 10 items, and also the remaining batch when finished
const fold = Accum.foldAsyncGen(flushEmit(10), flushEmit(0))
And finally run with your example.
fold(getDocs(100))
  .then(({ first, last }) => console.log('done', first, last))

I'm not sure it's fair to imply that functional programming was going to offer any advantages over imperative programming in terms of performance when dealing with huge amounts of data.
I think you need to add another tool in your toolkit and that may be RxJS.
RxJS is a library for composing asynchronous and event-based programs by using observable sequences.
If you're not familiar with RxJS or reactive programming in general, my examples will definitely look weird, but I think it would be a good investment to get familiar with these concepts.
In your case, the observable sequence is your MongoDB instance that emits records over time.
I'm gonna fake your db:
var db = range(1, 5);
The range function is an RxJS creation function that will emit each value in the provided range.
db.subscribe(n => {
  console.log(`record ${n}`);
});
//=> record 1
//=> record 2
//=> record 3
//=> record 4
//=> record 5
Now I'm only interested in the first and last record.
I can create an observable that will only emit the first record, and create another one that will emit only the last one:
var db = range(1, 5);
var firstRecord = db.pipe(first());
var lastRecord = db.pipe(last());
merge(firstRecord, lastRecord).subscribe(n => {
  console.log(`record ${n}`);
});
//=> record 1
//=> record 5
However, I also need to process all records in batches (in this example I'm gonna create batches of 10 records each):
var db = range(1, 100);
var batches = db.pipe(bufferCount(10))
var firstRecord = db.pipe(first());
var lastRecord = db.pipe(last());
merge(firstRecord, batches, lastRecord).subscribe(n => {
  console.log(`record ${n}`);
});
//=> record 1
//=> record 1,2,3,4,5,6,7,8,9,10
//=> record 11,12,13,14,15,16,17,18,19,20
//=> record 21,22,23,24,25,26,27,28,29,30
//=> record 31,32,33,34,35,36,37,38,39,40
//=> record 41,42,43,44,45,46,47,48,49,50
//=> record 51,52,53,54,55,56,57,58,59,60
//=> record 61,62,63,64,65,66,67,68,69,70
//=> record 71,72,73,74,75,76,77,78,79,80
//=> record 81,82,83,84,85,86,87,88,89,90
//=> record 91,92,93,94,95,96,97,98,99,100
//=> record 100
As you can see in the output, it has emitted:
The first record
Ten batches of 10 records each
The last record
I'm not gonna try to solve your exercise for you, and I'm not familiar enough with RxJS to expand much more on this.
I just wanted to show you another way and let you know that it is possible to combine this with functional programming.
Hope it helps

I think I may have developed an answer for you some time ago and it's called scramjet. It's lightweight (without thousands of dependencies in node_modules), it's easy to use, and it does make your code very easy to understand and read.
Let's start with your case:
const { DataStream } = require('scramjet');

DataStream
  .from(getDocs(10000))
  .use(stream => {
    let counter = 0;
    const items = new DataStream();
    const out = new DataStream();
    stream
      .peek(1, async ([first]) => out.whenWrote(first))
      .batch(100)
      .reduce(async (acc, result) => {
        await items.whenWrote(result);
        return result[result.length - 1];
      }, null)
      .then((last) => out.whenWrote(last))
      .then(() => items.end());
    items
      .setOptions({ maxParallel: 1 })
      .do(arr => counter += arr.length)
      .each(batch => writeDataToSocketIo(batch))
      .run()
      .then(() => (out.end(counter)))
    ;
    return out;
  })
  .toArray()
  .then(([first, last, count]) => ({ first, count, last }))
  .then(console.log)
;
So I don't really agree that JavaScript FRP is an antipattern, and I don't think mine is the only answer to that, but while developing the first commits I found that ES6 arrow syntax and async/await written in a chained fashion make the code easy to understand.
Here's another example of scramjet code from OpenAQ, specifically this line in their fetch process:
return DataStream.fromArray(Object.values(sources))
  // flatten the sources
  .flatten()
  // set parallel limits
  .setOptions({maxParallel: maxParallelAdapters})
  // filter sources - if env is set then choose only matching source,
  // otherwise filter out inactive sources.
  // * inactive sources will be run if called by name in env.
  .use(chooseSourcesBasedOnEnv, env, runningSources)
  // mark sources as started
  .do(markSourceAs('started', runningSources))
  // get measurements object from given source
  // all error handling should happen inside this call
  .use(fetchCorrectedMeasurementsFromSourceStream, env)
  // perform streamed save to DB and S3 on each source.
  .use(streamMeasurementsToDBAndStorage, env)
  // mark sources as finished
  .do(markSourceAs('finished', runningSources))
  // convert to measurement report format for storage
  .use(prepareCompleteResultsMessage, fetchReport, env)
  // aggregate to Array
  .toArray()
  // save fetch log to DB and send a webhook if necessary.
  .then(
    reportAndRecordFetch(fetchReport, sources, env, apiURL, webhookKey)
  );
It describes everything that happens with every source of data. So here's my proposal up for questioning. :)

Here are two solutions, one using RxJS and one using scramjet.
Here is the RxJS solution.
The trick was to use share() so that first() and last() don't consume from the iterator; forkJoin was used to combine them to emit the done event with those values.
const Rx = require('rxjs');
const { share, first, last, bufferCount, count } = require('rxjs/operators');

// simple curried logging helper
const log = prefix => x => console.log(prefix + JSON.stringify(x));

function ObservableFromAsyncGen(asyncGen) {
  return Rx.Observable.create(async function (observer) {
    for await (let i of asyncGen) {
      observer.next(i);
    }
    observer.complete();
  });
}

async function main() {
  let o = ObservableFromAsyncGen(getDocs(100));
  let s = o.pipe(share());
  let f = s.pipe(first());
  let e = s.pipe(last());
  let b = s.pipe(bufferCount(13));
  let c = s.pipe(count());
  b.subscribe(log("batch: "));
  Rx.forkJoin(c, f, e, b).subscribe(function (a) {
    console.log("emit done with count", a[0], "first", a[1], "last", a[2]);
  });
}
Here is a scramjet version, but it is not pure (the functions have side effects).
const Sj = require('scramjet');

async function main() {
  let docs = getDocs(100);
  let first, last, counter;
  let s0 = Sj.DataStream
    .from(docs)
    .setOptions({ maxParallel: 1 })
    .peek(1, (item) => first = item[0])
    .tee((s) => {
      s.reduce((acc, item) => acc + 1, 0)
        .then((item) => counter = item);
    })
    .tee((s) => {
      s.reduce((acc, item) => item)
        .then((item) => last = item);
    })
    .batch(13)
    .map((batch) => console.log("emit batch " + JSON.stringify(batch)));
  await s0.run();
  console.log("emit done " + JSON.stringify({ first: first, last: last, counter: counter }));
}
I'll work with #michał-kapracki to develop a pure version of it.

For exactly this kind of problem I made this library: ramda-generators.
Hopefully it's what you are looking for: lazy evaluation of streams in functional JavaScript.
The only problem is that I have no idea how to take the last element and the number of elements from a stream without re-running the generators.
A possible implementation that computes the result without loading the whole DB into memory could be this:
Try it on repl.it
const RG = require("ramda-generators");
const R = require("ramda");

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
const getDocs = amount => RG.generateAsync(async (i) => {
  await sleep(1);
  return { i, t: Date.now() };
}, amount);

const amount = 1000000000;

(async (chunkSize) => {
  const first = await RG.headAsync(getDocs(amount).start());
  const last = await RG.lastAsync(getDocs(amount).start()); // Without this line the print of the results would start immediately

  const DbIterator = R.pipe(
    getDocs(amount).start,
    RG.splitEveryAsync(chunkSize),
    RG.mapAsync(i => "emit " + JSON.stringify(i)),
    RG.mapAsync(res => ({ first, last, res })),
  );

  for await (const el of DbIterator())
    console.log(el);
})(100);

Related

Possible to make an event handler wait until async / Promise-based code is done?

I am using the excellent Papa Parse library in Node.js mode to stream a large (500 MB) CSV file of over 1 million rows into a slow persistence API that can only take one request at a time. The persistence API is based on Promises, but from Papa Parse I receive each parsed CSV row in a synchronous event, like so: parseStream.on("data", row => { ... })
The challenge I am facing is that Papa Parse dumps its CSV rows from the stream so fast that my slow persistence API can't keep up. Because Papa is synchronous and my API is Promise-based, I can't just call await doDirtyWork(row) in the on event handler, because sync and async code don't mix.
Or can they mix and I just don't know how?
My question is, can I make Papa's event handler wait for my API call to finish? Kind of doing the persistence API request directly in the on("data") event, making the on() function linger around somehow until the dirty API work is done?
The solution I have so far is not much better than using Papa's non-streaming mode, in terms of memory footprint. I actually need to queue up the torrent of on("data") events, in the form of generator function iterations. I could also have queued up promise factories in an array and worked them off in a loop. Either way, I end up saving almost the entire CSV file as a huge collection of future Promises (promise factories) in memory, until my slow API calls have worked all the way through.
async importCSV(filePath) {
  let parsedNum = 0, processedNum = 0;

  async function* gen() {
    let pf = yield;
    do {
      pf = yield await pf();
    } while (typeof pf === "function");
  };

  var g = gen();
  g.next();

  await new Promise((resolve, reject) => {
    try {
      const dataStream = fs.createReadStream(filePath);
      const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
      dataStream.pipe(parseStream);

      parseStream.on("data", row => {
        // Received a CSV row from Papa.parse()
        try {
          console.log("PA#", parsedNum, ": parsed", row.filter((e, i) => i <= 2 ? e : undefined));
          parsedNum++;
          // Simulate some really slow async/await dirty work here, for example
          // send requests to a one-at-a-time persistence API
          g.next(() => { // don't execute now, call in sequence via the generator above
            return new Promise((res, rej) => {
              console.log(
                "DW#", processedNum, ": dirty work START",
                row.filter((e, i) => i <= 2 ? e : undefined)
              );
              setTimeout(() => {
                console.log(
                  "DW#", processedNum, ": dirty work STOP ",
                  row.filter((e, i) => i <= 2 ? e : undefined)
                );
                processedNum++;
                res();
              }, 1000)
            })
          });
        } catch (err) {
          console.log(err.stack);
          reject(err);
        }
      });

      parseStream.on("finish", () => {
        console.log(`Parsed ${parsedNum} rows`);
        resolve();
      });
    } catch (err) {
      console.log(err.stack);
      reject(err);
    }
  });

  while (!(await g.next()).done);
}
So why the rush Papa? Why not allow me to work down the file a bit slower -- the data in the original CSV file isn't gonna run away, we have hours to finish the streaming, why hammer me with on("data") events that I can't seem to slow down?
So what I really need is for Papa to become more of a grandpa, and minimize or eliminate any queuing or buffering of CSV rows. Ideally I would be able to completely sync Papa's parsing events with the speed (or lack thereof) of my API. So if it weren't for the dogma that async code can't make sync code "sleep", I would ideally send each CSV row to the API inside the Papa event, and only then return control to Papa.
Suggestions? Some kind of "loose coupling" of the event handler with the slowness of my async API is fine too. I don't mind if a few hundred rows get queued up. But when tens of thousands pile up, I will run out of heap fast.
Why hammer me with on("data") events that I can't seem to slow down?
You can; you just were not asking Papa to stop. You can do this by calling stream.pause(), then later stream.resume(), to make use of Node streams' built-in back-pressure.
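For example, a minimal sketch of that manual back-pressure, assuming the parseStream from your code and a promise-returning doDirtyWork as described in the question:
parseStream.on("data", row => {
  parseStream.pause();                      // stop Papa from pushing more rows
  doDirtyWork(row)                          // slow, one-at-a-time persistence call
    .then(() => parseStream.resume())       // ask for the next row only when done
    .catch(err => parseStream.destroy(err));
});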
However, there's a much nicer API to use than dealing with this on your own in callback-based code: use the stream as an async iterator! When you await in the body of a for await loop, the generator has to pause as well. So you can write
async importCSV(filePath) {
  let parsedNum = 0;

  const dataStream = fs.createReadStream(filePath);
  const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
  dataStream.pipe(parseStream);

  for await (const row of parseStream) {
    // Received a CSV row from Papa.parse()
    const data = row.filter((e, i) => i <= 2 ? e : undefined);
    console.log("PA#", parsedNum, ": parsed", data);
    parsedNum++;
    await dirtyWork(data);
  }
  console.log(`Parsed ${parsedNum} rows`);
}

importCSV('sample.csv').catch(console.error);

let processedNum = 0;
function dirtyWork(data) {
  // Simulate some really slow async/await dirty work here,
  // for example send requests to a one-at-a-time persistence API
  return new Promise((res, rej) => {
    console.log("DW#", processedNum, ": dirty work START", data);
    setTimeout(() => {
      console.log("DW#", processedNum, ": dirty work STOP ", data);
      processedNum++;
      res();
    }, 1000);
  });
}
Async code in JavaScript can sometimes be a little hard to grok. It's important to remember how Node handles concurrency.
The Node process is single-threaded, but it uses a concept called the event loop. The consequence of this is that async code and callbacks are essentially equivalent representations of the same thing.
Of course, you need an async function to use await, but your callback from Papa Parse can be an async function:
parse.on("data", async row => {
await sync(row)
})
Once the await operation completes, the arrow function ends, and all references to row will be eliminated, so the garbage collector can successfully collect row, releasing that memory.
The effect this has is that sync executes concurrently every time a row is parsed, so if you can only sync one record at a time, I would recommend wrapping the sync function in a debouncer.
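As a rough sketch of that idea, here is a simple serializing queue (rather than a strict debounce), assuming the promise-returning sync function from above; note that it still lets Papa emit rows at full speed, it only serializes the API calls:
let queue = Promise.resolve();
const syncOneAtATime = row => {
  // chain each call after the previous one so only one sync runs at a time
  queue = queue.then(() => sync(row));
  return queue;
};

parse.on("data", row => {
  syncOneAtATime(row).catch(console.error);
});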

Async write file million times cause out of memory

Below is the code:
var fs = require('fs')
for (let i = 0; i < 6551200; i++) {
  fs.appendFile('file', i, function (err) {
  })
}
When I run this code, after a few seconds it shows:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
and yet there is nothing in the file!
My questions are:
Why are there no bytes in the file?
What causes the out-of-memory error?
How can I write to a file asynchronously in a for loop, no matter how many times it has to write?
Thanks in advance.
Bottom line here is that fs.appendFile() is an asynchronous call and you simply are not "awaiting" that call to complete on each loop iteration. This has a number of consequences, including but not limited to:
The callbacks keep getting allocated before they are resolved, which results in the "heap out of memory" eventually being reached.
You are contending for a file handle, since the function you are employing actually opens/writes/closes the given file, and if you don't wait for each turn to do so, then you're simply going to clash.
So the simple solution here is to "wait", and some modern syntax sugar makes that easy:
const fs = require('mz/fs');
const x = 6551200;

(async function() {
  try {
    const fd = await fs.open('file', 'w');
    for (let i = 0; i < x; i++) {
      await fs.write(fd, `${i}\n`);
    }
    await fs.close(fd);
  } catch (e) {
    console.error(e)
  } finally {
    process.exit();
  }
})()
That will of course take a while, but it's not going to "blow up" your system whilst it does its work.
The very first simplification is to just get hold of the mz library, which already wraps common Node.js libraries with modernized versions of each function supporting promises. This will help clean up the syntax a lot as opposed to using callbacks.
The next thing to realize is what was mentioned about fs.appendFile(): it is "opening/writing/closing" all in one call. That's not great, so what you would typically do is simply open, then write the bytes in a loop, and when that is complete, close the file handle.
That "sugar" comes in modern versions, and though it is "possible" with plain promise chaining, it's still not really that manageable. So if you don't actually have a Node.js environment that supports the async/await sugar or the tools to "transpile" such code, then you might alternatively consider using the async library with plain callbacks:
const Async = require('async');
const fs = require('fs');
const x = 6551200;
let i = 0;

fs.open('file', 'w', (err, fd) => {
  if (err) throw err;
  Async.whilst(
    () => i < x,
    callback => fs.write(fd, `${i}\n`, err => {
      i++;
      callback(err)
    }),
    err => {
      if (err) throw err;
      fs.closeSync(fd);
      process.exit();
    }
  );
});
The same basic principle applies, as we are "waiting" for each callback to complete before continuing. The whilst() helper here allows iteration until the test condition is met, and of course does not start the next iteration until data is passed to the callback of the iteratee itself.
There are other ways to approach this, but those are probably the two most sane for a "large loop" of iterations. Common approaches such as "chaining" via .reduce() are really more suited to a "reasonably" sized array of data you already have, and building arrays of such sizes here has inherent problems of its own.
For instance, the following "works" (on my machine at least), but it really consumes a lot of resources to do it:
const fs = require('mz/fs');
const x = 6551200;

fs.open('file', 'w')
  .then(fd =>
    [...Array(x)].reduce(
      (p, e, i) => p.then(() => fs.write(fd, `${i}\n`)),
      Promise.resolve()
    )
    .then(() => fs.close(fd))
  )
  .catch(e => console.error(e))
  .then(() => process.exit());
So it's really not that practical to essentially build such a large chain in memory and then allow it to resolve. You could put some "governance" on this, but the two main approaches as shown are a lot more straightforward.
For that case you either have the async/await sugar available, as it is within current LTS versions of Node (LTS 8.x), or I would stick with the other tried and true "async helpers" for callbacks where you are restricted to a version without that support.
You can of course "promisify" any function with the last few releases of Node.js right "out of the box", as it were, since Promise has been a global for some time:
const fs = require('fs');
await new Promise((resolve, reject) => fs.open('file', 'w', (err, fd) => {
  if (err) return reject(err);
  resolve(fd);
}));
So there really is no need to import libraries just to do that, but the mz library given as an example here does all of that for you. So it's really up to personal preference whether to bring in additional dependencies.
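For what it's worth, Node 8+ also ships util.promisify, which does that wrapping for you; a small sketch:
const util = require('util');
const fs = require('fs');

// promise-returning versions of the callback-based fs APIs
const open = util.promisify(fs.open);
const write = util.promisify(fs.write);
const close = util.promisify(fs.close);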
JavaScript is a single-threaded language, which means your code can execute only one function at a time. So when you execute an async function, its callback is queued to be executed later.
So in your code you are queuing 6551200 calls, which will of course crash your app before appendFile starts working on any of them.
You can achieve what you want by splitting your loop into smaller loops, using async/await, or using iterators.
If what you are trying to achieve is as simple as your code, you can use the following:
const fs = require("fs");
function SomeTask(i=0){
fs.appendFile('file',i,function(err){
//err in the write function
if(err) console.log("Error", err);
//check if you want to continue (loop)
if(i<6551200) return SomeTask(i);
//on finish
console.log("done");
});
}
SomeTask();
In the above code, you write a single line, and when that is done, you call the next one.
This function is just for basic usage; it needs a refactor. For advanced usage with JavaScript iterators, check out Iterators and generators on the MDN web docs.
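As a rough sketch of that direction (assuming Node 10+ with fs.promises and async generators available), the same write-one-at-a-time idea can be driven by an async iterator:
const fsp = require('fs').promises;

async function* numbers(n) {
  for (let i = 0; i < n; i++) yield i;  // lazily produce the values to write
}

(async () => {
  const fh = await fsp.open('file', 'w');
  for await (const i of numbers(6551200)) {
    await fh.appendFile(`${i}\n`);      // wait for each write before the next one
  }
  await fh.close();
})();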
1 - The file is empty because none of the fs.appendFile calls ever finished; the Node.js process crashed before they could.
2 - The Node.js heap memory is limited and stores each callback until it returns, not only the "i" variable.
3 - You could try to use promises to do that.
"use strict";
const Bluebird = require('bluebird');
const fs = Bluebird.promisifyAll(require('fs'));
let promisses = [];
for (let i = 0; i < 6551200; i++){
promisses.push(fs.appendFileAsync('file', i + '\n'));
}
Bluebird.all(promisses)
.then(data => {
console.log(data, 'End.');
})
.catch(e => console.error(e));
But no logic can avoid a heap memory error for a loop this big. You could increase the Node.js heap memory or, more reasonably, write chunks of data on an interval:
'use strict';
const fs = require('fs');
let total = 6551200;
let interval = setInterval(() => {
  fs.appendFile('file', total + '\n', () => {});
  total--;
  if (total < 1) {
    clearInterval(interval);
  }
}, 1);

what is the right way to fork a loop in node.js

So I have created a server which collects data and writes it into a DB in a never-ending loop.
server.listen(3001, () => {
  doFullScan();
});

async function doFullScan() {
  while (true) {
    await collectAllData();
  }
}
collectAllData() is a method which checks for available projects, loops through each project, collects some data, and writes it into the DB.
async function collectAllData() {
  // doing something
  const projectNames = ['array with projects name'];
  // this loop takes too much time
  for (const project of projectNames) {
    await collectProjectData(project);
  }
  // doing something
}
The problem is that the whole loop takes too much time. So I would like to speed it up by multithreading the loop and using all of my computer's cores for it.
How should I do it?
There is the cluster library, with examples at https://nodejs.org/docs/latest/api/cluster.html, but I don't want to create new servers. I want to spawn children which will do a task and exit after they have done their job.
So there is const { fork } = require('child_process');, but I'm not exactly sure how to make each fork run only the collectProjectData() method.
You can do it natively without any third party libraries.
Right now, your for...loop is running each one after the other.
Option 1
Use Promise.all and .map
await Promise.all(projectNames.map(async (projectName) => {
  await collectProjectData(projectName);
}));
Note: if you use .map, it will kick off all of them at the same time, which might be too much if projectNames continues to grow.
This is the complete opposite of what yours is doing currently.
Option 2
There is a middle way...running batches in sequence, but items inside each batch asynchronously.
const chunk = (a, l) => a.length === 0 ? [] : [a.slice(0, l)].concat(chunk(a.slice(l), l));
const batchSize = 10;
const projectNames = ['array with projects name'];
let projectNamesInChunks = chunk(projectNames, batchSize);

for (let chunk of projectNamesInChunks) {
  await Promise.all(chunk.map(async (projectName) => {
    await collectProjectData(projectName);
  }));
}
I recommend using Bluebird's Promise.map:
http://bluebirdjs.com/docs/api/promise.map.html
That way you can control the level of concurrency as you wish, like this:
const Promise = require('bluebird');
await Promise.map(projectNames, collectProjectData, { concurrency: 3 })

Firestore cloud function asynchronous execution with promise

I have an orders collection and a products collection in my application. The user can have multiple products in a single order. What I want to do is calculate the amount for each product by reading through the products collection and then perform further actions. Below is what I have as of now.
exports.myfunc = functions.firestore.document('collection/{collid}')
  .onCreate(event => {
    let data = event.data.data();
    const products = data.products;
    const prices = [];
    _.each(products, (data1, index) => {
      const weight = data1.weight;
      const isLess = data1.isLess;
      firebaseAdmin.firestore().collection('collection').doc(data1.productId).onSnapshot(data2 => {
        let amount = weight === '1/2' ? data2.data().price1 : data2.data().price1 * weight;
        amount += isLess ? 50 : 0;
        prices.push(amount);
      });
    });
    // Do some task after _.each with new total
  });
But I am not able to make this run to completion before moving on, so that I can store the actual amount for each product against its order and calculate the total to store in the document.
Could anyone please tell me how to achieve the above scenario? How can I work with promises and callbacks here?
You can map the products array to promises, like this:
var productPromises = products.map(product => {
return new Promise((resolve, reject) => {
firebaseOperation()...onSnapshot(resolve)
})
})
Promise.all(productPromises).then(results => {
// process all results at once
})
First, don't use onSnapshot() with Cloud Functions. That attaches a listener that stays listening indefinitely, until you remove it. That's not what you want at all, because functions can't execute indefinitely.
Instead, use get(), which returns a promise when the fetch is complete.
Also, you could consider accumulating all the documents you want to access into an array and use getAll() (with the spread operator on the array) to fetch them all.
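A rough sketch of that approach, keeping the same event/product shape as the question and assuming the Admin SDK's getAll() (which returns snapshots in the same order as the refs passed in):
exports.myfunc = functions.firestore.document('collection/{collid}')
  .onCreate(event => {
    const products = event.data.data().products;
    const refs = _.map(products, p =>
      firebaseAdmin.firestore().collection('collection').doc(p.productId));

    // one round trip for all referenced product documents
    return firebaseAdmin.firestore().getAll(...refs).then(snapshots => {
      const prices = snapshots.map((snap, i) => {
        const { weight, isLess } = products[i];
        const base = weight === '1/2' ? snap.data().price1 : snap.data().price1 * weight;
        return base + (isLess ? 50 : 0);
      });
      const total = prices.reduce((sum, p) => sum + p, 0);
      // ...store the per-product prices and the total on the order document here
      return total;
    });
  });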

Should loops be avoided in Node.JS or is there a special way to handle them?

Loops are blocking. They seem contrary to the idea of Node.js. How do you handle a flow where a for loop or a while loop seems to be the best option?
For example, if I want to print a table of a random number up to number * 1000, I would want to use a for loop. Is there a special way to handle this in Node.js?
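For concreteness, a rough sketch of the kind of plain (blocking) loop the question seems to describe:
const n = Math.ceil(Math.random() * 10);  // some random number
for (let i = 1; i <= 1000; i++) {
  console.log(`${n} x ${i} = ${n * i}`);  // print its table up to n * 1000
}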
Loops are not per se bad, but it depends on the situation. In most cases you will need to do some async stuff inside loops though.
So my personal preference is to not use loops at all but instead go with the functional counterparts (forEach/map/reduce/filter). This way my code base stays consistent (and a sync loop is easily changed to an async one if needed).
const myArr = [1, 2, 3];

// sync loops
myArr.forEach(syncLogFunction);
console.log('after sync loop');

function syncLogFunction(entry) {
  console.log('sync loop', entry);
}

// now we want to change that into an async operation:
Promise.all(myArr.map(asyncLogFunction))
  .then(() => console.log('after async loop'));

function asyncLogFunction(entry) {
  console.log('async loop', entry);
  return new Promise(resolve => setTimeout(resolve, 100));
}
Notice how easily you can change between sync and async versions, the structure stays almost the same.
Hope this helps a bit.
If you are doing loops on data in memory (for example, you want to go through an array and add a prop to all objects), loops will work normally, but if you need to do something inside the loop like save values to a DB, you will run into some issues.
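For example (a hypothetical sketch, with saveToDb standing in for whatever async call you make), awaiting inside a for...of loop keeps the DB work sequential instead of firing everything at once:
async function saveAll(items) {
  for (const item of items) {
    await saveToDb(item);  // hypothetical async save; the next iteration waits for this one
  }
}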
I realize this isn't exactly the answer, but it's a suggestion that may help someone. I found one of the easiest ways to deal with this issue is using a rate limiter with a forEach (I don't really like promises). This also gives the added benefit of having the option to process things in parallel, but only moving on when everything is done:
https://github.com/jhurliman/node-rate-limiter
var RateLimiter = require('limiter').RateLimiter;
var limiter = new RateLimiter(1, 5);

exports.saveFile = function (myArray, next) {
  var completed = 0;
  var totalFiles = myArray.length;

  myArray.forEach(function (item) {
    limiter.removeTokens(1, function () {
      // call some async function
      saveAndLog(item, function (err, result) {
        // check for errors
        completed++;
        if (completed == totalFiles) {
          // call next function
          exports.process();
        }
      });
    });
  });
};
