Limit number of tasks processed simultaneously in loop - node.js

I define a function named fn which I then run in the background as part of a bigger service.
The function fn retrieves messages from a message queue one by one, and processes each message. Processing of each message can take between 1 and 10 minutes (longProcess()).
Using the following code, the for await loop waits for new messages and processes each one, and then fetches a new message once the processing is complete.
const fn = async (subscription: AsyncIterable) => {
subscription.pullOne();
for await (const msg of subscription) {
await longProcess(msg);
subscription.pullOne();
}
subscription.close();
};
fn(subscription).then(() => {});
If I remove the await from before longProcess(msg), messages are sent to be processed as soon as they are received, which is what I want, but I only want a maximum of 5 messages to be processed simultaneously.
I don't want any more messages to be pulled before the current messages are done processing (so that other subscribers may pull and process them).
A related question deals with a very similar case, but I can't find a solution there that both works and is elegant.
I tried using the bottleneck library by defining a concurrency limit, but I can't figure out how to stop the loop from fetching more messages before the active processing is finished.
const limiter = new Bottleneck({
maxConcurrent: 5,
});
const fn = async (subscription: AsyncIterable) => {
subscription.pullOne();
for await (const msg of subscription) {
limiter.schedule(() => longProcess(msg));
subscription.pullOne();
}
subscription.close();
};

You can try processing them in batches/chunks of <= 5 items at a time
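// helper function: read up to `count` values from an async iterator
async function take(aIterable, count) {
  const res = [];
  let done = false;
  for (let i = 0; i < count; i++) {
    const next = await aIterable.next();
    if (next.done) {
      // stop as soon as the source is exhausted instead of calling next() again
      done = true;
      break;
    }
    res.push(next.value);
  }
  return [res, done];
}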
const fn = async (subscription) => {
  // take() calls next(), so get the iterator that for await would obtain implicitly
  const iterator = subscription[Symbol.asyncIterator]();
  subscription.pullOne();
  let done = false;
  while (!done) {
    let [batch, _done] = await take(iterator, 5);
    done = _done;
    await Promise.allSettled(batch.map(msg => longProcess(msg)));
    // subscription.pullOne();
  }
  subscription.close();
};
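Batching waits for the slowest message in each batch before anything new is pulled. A sliding-window variant keeps up to 5 messages in flight and only asks the iterable for the next one once a slot frees up. The following is a minimal sketch, assuming the subscription is a plain async iterable; `processWithLimit` and its parameters are hypothetical names, and the `pullOne()` calls from the question are left out because their exact semantics are not shown here.

```javascript
// Hypothetical helper: process items from an async iterable with at most
// `limit` calls to `worker` running at once.
async function processWithLimit(iterable, limit, worker) {
  const inFlight = new Set();
  for await (const item of iterable) {
    // Start the work and remove it from the set once it settles.
    const task = Promise.resolve()
      .then(() => worker(item))
      .catch((err) => console.error('processing failed:', err))
      .finally(() => inFlight.delete(task));
    inFlight.add(task);
    // When all slots are busy, wait for one to free up. The loop does not
    // request the next item from the iterable until this await returns.
    if (inFlight.size >= limit) {
      await Promise.race(inFlight);
    }
  }
  // Wait for the remaining tasks before returning.
  await Promise.all(inFlight);
}
```

With the names from the question, this could be called as `await processWithLimit(subscription, 5, longProcess)`.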

Related

How to loop many http requests with axios in node.js

I have an array of users where each user has an IP address.
I have an API to which I send an IP address, and it returns the country code that belongs to that IP.
In order to get a country code for each user, I need to send a separate request per user.
My code uses async/await, but it takes about 10 seconds until I get all the responses. If I don't use async/await, I don't get the country codes at all.
My code:
async function getAllusers() {
let allUsersData = await usersDao.getAllusers();
for (let i = 0; i < allUsersData.length; i++) {
let data = { ip: allUsersData[i].ip };
let body = new URLSearchParams(data);
await axios
.post("http://myAPI", body)
.then((res) => {
allUsersData[i].countryCode = res.data.countryCode;
});
}
return allUsersData;
}
You can use Promise.all to make all your requests at once instead of making them one by one.
let requests = [];
for (let i = 0; i < allUsersData.length; i++) {
let data = { ip: allUsersData[i].ip };
let body = new URLSearchParams(data);
requests.push(axios.post("http://myAPI", body)); // axios.post returns a Promise
}
try {
const results = await Promise.all(requests);
// results now contains each request result in the same order
// Your logic here...
}
catch (e) {
// Handles errors
}
If you're just trying to get all the results faster, you can request them in parallel and know when they are all done with Promise.all():
async function getAllusers() {
let allUsersData = await usersDao.getAllusers();
await Promise.all(allUsersData.map((userData, index) => {
let body = new URLSearchParams({ip: userData.ip});
return axios.post("http://myAPI", body).then((res) => {
allUsersData[index].countryCode = res.data.countryCode;
});
}));
return allUsersData;
}
Note, I would not recommend doing it this way if the allUsersData array is large (say, more than 20 items), because you'll be raining a lot of requests on the target server: you may impede its performance, get rate limited, or even be refused service. In that case, you'd need to send N requests at a time (perhaps 5) using a helper such as pMap() or mapConcurrent().
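The linked helpers are not reproduced in this thread, so here is a minimal sketch of the same idea. `mapWithConcurrency` is a hypothetical name and is not the pMap() or mapConcurrent() implementation referred to above; it simply starts a fixed number of runners that each pull the next unprocessed item.

```javascript
// Hypothetical helper (not the linked pMap/mapConcurrent implementations):
// run `fn` over `items` with at most `limit` calls in flight, and resolve
// with the results in input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}
```

In the question's code this might be used as `mapWithConcurrency(allUsersData, 5, (user) => axios.post("http://myAPI", new URLSearchParams({ ip: user.ip })))`. As with Promise.all, the returned promise rejects as soon as any single call rejects.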

Using batch to recursively update documents only works on small collection

I have a collection of teams containing around 80 000 documents. Every Monday I would like to reset the scores of every team using firebase cloud functions. This is my function:
exports.resetOrgScore = functions.runWith(runtimeOpts).pubsub.schedule("every monday 00:00").timeZone("Europe/Oslo").onRun(async (context) => {
let batch = admin.firestore().batch();
let count = 0;
let overallCount = 0;
const orgDocs = await admin.firestore().collection("teams").get();
orgDocs.forEach(async(doc) => {
batch.update(doc.ref, {score:0.0});
if (++count >= 500 || ++overallCount >= orgDocs.docs.length) {
await batch.commit();
batch = admin.firestore().batch();
count = 0;
}
});
});
I tried running the function on a smaller collection of 10 documents and it works fine, but when I run it on the "teams" collection it returns "Cannot modify a WriteBatch that has been committed". I tried returning the promise like this (code below), but that doesn't fix the problem. Thanks in advance :)
return await batch.commit().then(function () {
batch = admin.firestore().batch();
count = 0;
return null;
});
There are three problems in your code:
You use async/await with forEach(), which is not recommended: the callback passed to forEach() is not awaited.
As the error says, you "Cannot modify a WriteBatch that has been committed". That is exactly what happens here: batch.commit() is called on the 500th document, but because forEach() does not wait for it, the following callbacks call update() on that same batch before it is replaced.
Just as important, you don't return the promise returned by the asynchronous methods.
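The first point can be seen in isolation with a small sketch. The names `withForEach`, `withForOf` and `delay` are made up for the illustration; the only assumption is that `log` starts out empty.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// forEach ignores the promise returned by an async callback, so this
// function returns before any of the awaited work has finished.
async function withForEach(items, log) {
  items.forEach(async (item) => {
    await delay(1);
    log.push(item);
  });
  return log.length; // 0 when `log` starts empty: no callback has passed its await yet
}

// for...of runs inside the async function, so each await is respected.
async function withForOf(items, log) {
  for (const item of items) {
    await delay(1);
    log.push(item);
  }
  return log.length; // every item has been logged by now
}
```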
You'll find in the doc (see the Node.js tab) code that deletes all the docs of a collection by recursively using a batch. It's easy to adapt it to update the docs, as follows. Note that we use a dateUpdated flag to select the docs for each new batch: in the original code the docs were deleted, so no flag was needed.
const runtimeOpts = {
timeoutSeconds: 540,
memory: '1GB',
};
exports.resetOrgScore = functions
.runWith(runtimeOpts)
.pubsub
.schedule("every monday 00:00")
.timeZone("Europe/Oslo")
.onRun((context) => {
return new Promise((resolve, reject) => {
deleteQueryBatch(resolve).catch(reject);
});
});
async function deleteQueryBatch(resolve) {
const db = admin.firestore();
const snapshot = await db
.collection('teams')
.where('dateUpdated', '==', "20210302")
.orderBy('__name__')
.limit(499)
.get();
const batchSize = snapshot.size;
if (batchSize === 0) {
// When there are no documents left, we are done
resolve();
return;
}
// Update documents in a batch
const batch = db.batch();
snapshot.docs.forEach((doc) => {
batch.update(doc.ref, { score:0.0, dateUpdated: "20210303" });
});
await batch.commit();
// Recurse on the next process tick, to avoid
// exploding the stack.
process.nextTick(() => {
deleteQueryBatch(resolve);
});
}
Note that the above Cloud Function is configured with the maximum value for the time out, i.e. 9 minutes.
If it turns out that not all your docs can be updated within 9 minutes, you will need to find another approach, for example using the Admin SDK from one of your servers, or cutting the work into pieces and running the Cloud Function several times.
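If you prefer to keep the shape of the question's code (one get(), then commits in groups), the groups can be committed one after the other with a plain loop. This is only a sketch under assumptions: `resetScoresInChunks` and `chunk` are hypothetical helpers, `db` is expected to behave like admin.firestore(), and the chunk size of 500 is taken from the question's code. It still loads every document into memory first, so the timeout caveat above applies.

```javascript
// Split an array into consecutive chunks of at most `size` elements.
function chunk(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: `db` is expected to behave like admin.firestore()
// and `docs` like the `docs` array of a query snapshot.
async function resetScoresInChunks(db, docs, chunkSize = 500) {
  let updated = 0;
  for (const group of chunk(docs, chunkSize)) {
    const batch = db.batch(); // a fresh batch per chunk, never reused
    group.forEach((doc) => batch.update(doc.ref, { score: 0.0 }));
    await batch.commit(); // finish this commit before starting the next
    updated += group.length;
  }
  return updated;
}
```

Inside the scheduled function this would be awaited (or returned), for example `return resetScoresInChunks(admin.firestore(), orgDocs.docs);`.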

node js non blocking for loop

Please check if my understanding about the following for loop is correct.
for(let i=0; i<1000; i++){
sample_function(i, function(result){});
}
The moment the for loop is invoked, 1000 events of sample_function will be queued in the event loop. After about 5 seconds a user sends an HTTP request, which is queued after those "1000 events".
Usually this would not be a problem because the loop is asynchronous.
But let's say that this sample_function is a CPU-intensive function. Therefore the "1000 events" are completed consecutively and each takes about 1 second.
As a result, the for loop will block for about 1000 seconds.
Would there be a way to solve this problem? For example, would it be possible to let the thread take a "break" every 10 loops and allow other new events to be queued in between? If so, how would I do it?
Try this:
for(let i=0; i<1000; i++)
{
setTimeout(sample_function, 0, i, function(result){});
}
or
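function sample_function(elem, index){..}
// Array(1000) only has empty slots, which forEach skips, so fill it first
var arr = Array(1000).fill(0);
// note: forEach itself runs synchronously
arr.forEach(sample_function);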
There is a technique called partitioning, which you can read about in the Node.js documentation. But as the documentation states:
If you need to do something more complex, partitioning is not a good option. This is because partitioning uses only the Event Loop, and you won't benefit from multiple cores almost certainly available on your machine.
So you can also use another technique called offloading, e.g. using worker threads or child processes. These also have downsides, such as having to serialize and deserialize any objects that you wish to share between the event loop (current thread) and a worker thread or child process.
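As a rough illustration of offloading, the sketch below moves a CPU-bound loop onto a worker thread with Node's worker_threads module. `sumInWorker` is a hypothetical helper and the summation only stands in for real CPU-heavy work; the worker source is passed inline purely to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: sum the integers below `n` on a worker thread so the
// main event loop stays free while the loop runs.
function sumInWorker(n) {
  // The worker source is passed inline (eval: true) to keep the sketch in
  // one file; a real project would normally use a separate worker file.
  const source = `
    const { parentPort, workerData } = require('worker_threads');
    let sum = 0;
    for (let i = 0; i < workerData; i++) sum += i;
    parentPort.postMessage(sum);
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Both `workerData` and the posted message are copied between threads, which is the serialization cost mentioned above.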
Following is an example of partitioning that I came up with which is in the context of an express application.
const express = require('express');
const crypto = require('crypto');
const randomstring = require('randomstring');
const app = express();
const port = 80;
app.get('/', async (req, res) => {
res.send('ok');
})
app.get('/block', async (req, res) => {
let result = [];
for (let i = 0; i < 10; ++i) {
result.push(await block());
}
res.send({result});
})
app.listen(port, () => {
console.log(`Listening on port ${port}`);
console.log(`http://localhost:${port}`);
})
/* takes around 5 seconds to run(varies depending on your processor) */
const block = () => {
//promisifying just to get the result back to the caller in an async way, this is not part of the partitioning technique
return new Promise((resolve, reject) => {
/**
* https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#partitioning
* using the partitioning technique (setImmediate/setTimeout) to prevent a long-running operation
* from blocking the event loop completely
* there will be a breathing period between each time block is called
*/
setImmediate(() => {
let hash = crypto.createHash("sha256");
const numberOfHasUpdates = 10e5;
for (let iter = 0; iter < numberOfHasUpdates; iter++) {
hash.update(randomstring.generate());
}
resolve(hash);
})
});
}
There are two endpoints, / and /block. If you hit /block and then hit /, the / endpoint will take around 5 seconds to respond: it is served during the breathing space (the thing that you call a "break").
If setImmediate was not used, the / endpoint would respond after approximately 10 * 5 seconds (10 being the number of times the block function is called in the for loop).
You can also do partitioning using a recursive approach like this:
/**
 * @param items array we need to process
 * @param chunk number of items to be processed on each iteration of the event loop before the breathing space
 */
function processItems(items, chunk) {
let i = 0;
const process = (done) => {
let currentChunk = chunk;
while (currentChunk > 0 && i < items?.length) {
--currentChunk;
syncBlock();
++i;
}
if (i < items?.length) {
setImmediate(process);//the key is to schedule the next recursive call (by passing the function to setImmediate) instead of doing a recursive call (by simply invoking the process function)
}
}
process();
}
And if you need to get back the data processed you can promisify it like this:
function processItems(items, chunk) {
let i = 0;
let result = [];
const process = (done) => {
let currentChunk = chunk;
while (currentChunk > 0 && i < items?.length) {
--currentChunk;
const returnedValue = syncBlock();
result.push(returnedValue);
++i;
}
if (i < items?.length) {
setImmediate(() => process(done));
} else {
done && done(result);
}
}
const promisified = () => new Promise((resolve) => process(resolve));
return promisified();
}
And you can test it by adding this route handler to the other route handlers provided above:
app.get('/block2', async (req, res) => {
let result = [];
let arr = [];
for (let i = 0; i < 10; ++i) {
arr.push(i);
}
result = await processItems(arr, 1);
res.send({ result });
})
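The same partitioning idea can also be written with async/await instead of a recursive callback, which comes close to the "take a break every 10 loops" wording of the question. This is a sketch only: `processInChunks` and `yieldToEventLoop` are hypothetical names, and `fn` is assumed to be a synchronous function like syncBlock above.

```javascript
// Resolve on a later turn of the event loop so other queued callbacks can run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical helper: call the synchronous `fn` on every item, pausing
// after each `chunkSize` items so other queued callbacks can run.
async function processInChunks(items, chunkSize, fn) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(fn(items[i], i));
    if ((i + 1) % chunkSize === 0) {
      await yieldToEventLoop();
    }
  }
  return results;
}
```

For the loop in the question, `await processInChunks(items, 10, sample_function)` would pause after every tenth call. A single call that itself takes a second still blocks for that second, so this only helps between calls.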

How can we load all messages from a single discord channel?

I'm currently working on a self-bot that fetches all images from a channel and then downloads them: when I use my self-bot, the bot doesn't fetch messages that aren't loaded by the client and we can't load all of the messages simultaneously. Is there a way to do that? Something like a command to load all messages from a channel and then do multiple .fetchMessages() to get them all?
Self-Bots might be against the ToS, but iterating through messages in a channel is not, as far as I know. So...
Here's a snippet that fetches all messages using JavaScript async generators, so that pages are only requested as they are consumed.
The snippet:
async function * messagesIterator (channel) {
let before = null
let done = false
while (!done) {
const messages = await channel.messages.fetch({ limit: 100, before })
if (messages.size > 0) {
before = messages.lastKey()
yield messages
} else done = true
}
}
async function * loadAllMessages (channel) {
for await (const messages of messagesIterator(channel)) {
for (const message of messages.values()) yield message
}
}
How it's used:
client.on('ready', async () => {
const targetChannel = client.guilds.cache.first().channels.cache.find(x => x.name === 'test')
// Iterate through all the messages as they're pulled
for await (const message of loadAllMessages(targetChannel)) {
console.log(message.content)
}
})
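The efficiency gain comes from laziness: a page is only requested when the consumer asks for more items. The sketch below shows this with a fake paginated source instead of Discord, so every name in it (`makeFakeFetch`, `allItems`, `firstN`) is hypothetical and nothing here is part of the discord.js API.

```javascript
// Hypothetical stand-in for a paginated API: returns up to `limit` ids that
// are lower than `before`, newest first, from a fake store of `total` items.
function makeFakeFetch(total) {
  let calls = 0;
  const fetchPage = async ({ limit, before }) => {
    calls++;
    const start = before === null ? total : before - 1;
    const page = [];
    for (let id = start; id >= 1 && page.length < limit; id--) page.push(id);
    return page;
  };
  return { fetchPage, getCalls: () => calls };
}

// Same shape as messagesIterator above, flattened to single items.
async function* allItems(fetchPage, limit) {
  let before = null;
  while (true) {
    const page = await fetchPage({ limit, before });
    if (page.length === 0) return;
    before = page[page.length - 1];
    yield* page;
  }
}

// Stop early: pages beyond the ones needed are never requested.
async function firstN(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for await (const item of iterable) {
    out.push(item);
    if (out.length >= n) break;
  }
  return out;
}
```

Breaking out of the for await loop ends the generator, so asking for 4 items with a page size of 3 triggers two fetches rather than a walk through the whole history.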
We can't since it's against ToS. :/ (even if it's a bot I think)

Twilio Node JS - filter sms per phone number

I would like to filter sms per phone number and date the SMS was sent using REST API, however the output of the following code is not available outside of client.messages.each() block.
Please advise how I can use the latest sms code sent to the filtered number:
const filterOpts = {
to: '+13075550185',
dateSent: moment().utc().format('YYYY-MM-DD')
};
let pattern = /([0-9]{1,})$/;
let codeCollection = [];
client.messages.each(filterOpts, (record) => {
codeCollection.push(record.body.match(pattern)[0]);
console.log(record.body.match(pattern)[0], record.dateSent);
});
console.log(codeCollection,'I get an empty array here'); // how to get the latest sms and use it
doSomethingWithSMS(codeCollection[0]);
Twilio developer evangelist here.
The each function doesn't actually return a Promise. You can run a callback function after each has completed streaming results by passing it into the options as done like this:
const codeCollection = [];
const pattern = /([0-9]{1,})$/;
const filterOpts = {
to: '+13075550185',
dateSent: moment().utc().format('YYYY-MM-DD'),
done: (err) => {
if (err) { console.error(err); return; }
console.log(codeCollection);
doSomethingWithSMS(codeCollection[0]);
}
};
client.messages.each(filterOpts, (record) => {
codeCollection.push(record.body.match(pattern)[0]);
console.log(record.body.match(pattern)[0], record.dateSent);
});
Let me know if that helps at all.
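If you would rather use await than nest your logic inside done, the callback can be wrapped in a promise. This is a minimal sketch built only on the each() and done behaviour described above; `collectMessages` is a hypothetical helper and `client` is assumed to be an initialized Twilio REST client.

```javascript
// Hypothetical wrapper: turn the callback-style each() into a promise that
// resolves with every record once streaming has finished.
// `client` is assumed to be an initialized Twilio REST client.
function collectMessages(client, filterOpts) {
  return new Promise((resolve, reject) => {
    const records = [];
    const opts = {
      ...filterOpts,
      // `done` is called once after the last record, or with an error.
      done: (err) => (err ? reject(err) : resolve(records)),
    };
    client.messages.each(opts, (record) => records.push(record));
  });
}
```

Inside an async function you could then write `const records = await collectMessages(client, filterOpts);` and extract the codes from `records` afterwards.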
Do you have access to the length of the array of messages? If so, you can do something like this
const filterOpts = {
to: '+13075550185',
dateSent: moment().utc().format('YYYY-MM-DD')
};
let pattern = /([0-9]{1,})$/;
let codeCollection = [];
var i = 0
client.messages.each(filterOpts, (record) => {
if (i < messages.length){
codeCollection.push(record.body.match(pattern)[0]);
console.log(record.body.match(pattern)[0], record.dateSent);
i++;
} else {
nextFunction(codeCollection);
}
});
function nextFunction(codeCollection){
console.log(codeCollection,'I get an empty array here');
doSomethingWithSMS(codeCollection[0]);
}
messages.each() runs asynchronously, so your code moves on to the next statement while the records are still being fetched. Nothing has been pushed to codeCollection by the time you try to access it, so you need to wait for each() to finish before moving on. Assuming the call returns a promise, you can add another .then() link to the chain, like below. You could also use async/await, which lets you write asynchronous code in a more linear-looking fashion.
const filterOpts = {
to: '+13075550185',
dateSent: moment().utc().format('YYYY-MM-DD')
};
let pattern = /([0-9]{1,})$/;
let codeCollection = [];
client.messages.each(filterOpts, (record) => {
codeCollection.push(record.body.match(pattern)[0]);
console.log(record.body.match(pattern)[0], record.dateSent);
}).then(
function() {
console.log(codeCollection,'I get an empty array here');
if( codeCollection.length > 0 ) doSomethingWithSMS(codeCollection[0]);
}
);
