We are using Node.js experimental workers to accomplish some CPU-intensive tasks. These tasks are kicked off through parentPort message passing. During the operation of the threads, they need to persist data to a database, which is an asynchronous operation backed by promises.
What we are seeing is that the parentPort messages keep being delivered to the handler function while we are still doing the asynchronous operations.
An example of the code we are running:
const { parentPort, Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  const i = [1, 2, 3, 4, 5, 6, 7, 8, 9];
  for (const x of i) {
    worker.postMessage({ idx: x });
  }
} else {
  parentPort.on('message', async (value) => {
    await testAsync(value);
  });
}

async function testAsync(value) {
  return new Promise((resolve) => {
    console.log(`Starting wait for ${value.idx}`);
    setTimeout(() => {
      console.log(`Complete resolve for ${value.idx}`);
      resolve();
      if (value.idx == 9) {
        setTimeout(() => process.exit(0), 2000);
      }
    }, 500);
  });
}
In the above example we see all of the Starting wait for ... lines print before any of the Complete resolve ... messages appear. With async/await we were expecting the event handler to wait for the resolved promise before processing the next event. In the real code, the DB connection may fail, which throws an exception, so we want to ensure that the current message has been fully processed before accepting a new one.
Are we doing anything wrong here?
If not, is there any way of accomplishing the desired goal of processing the events in order?
It seems that you want to enqueue the messages, and only process one thing at a time.
parentPort.on('message', () => {}) is an event listener; when the event is triggered, it won't wait until the previous asynchronous operation inside the callback is done.
So, if you trigger 'message' a thousand times, testAsync will be executed a thousand times, without waiting.
You need to implement a queue in the worker, and limit the concurrency. There are multiple promise queue packages in NPM.
I will use p-queue in this example.
const PQueue = require('p-queue'); // note: this CommonJS require matches older p-queue versions; recent majors are ESM-only (import PQueue from 'p-queue')
const { parentPort, Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  const i = [1, 2, 3, 4, 5, 6, 7, 8, 9];
  for (const x of i) {
    worker.postMessage({ idx: x });
  }
} else {
  const queue = new PQueue({ concurrency: 1 }); // set concurrency
  parentPort.on('message', value => {
    queue.add(() => testAsync(value));
  });
}

async function testAsync(value) {
  return new Promise((resolve) => {
    console.log(`Starting wait for ${value.idx}`);
    setTimeout(() => {
      console.log(`Complete resolve for ${value.idx}`);
      resolve();
      if (value.idx == 9) {
        setTimeout(() => process.exit(0), 2000);
      }
    }, 500);
  });
}
Now the output will be:
Starting wait for 1
Complete resolve for 1
Starting wait for 2
Complete resolve for 2
Starting wait for N
Complete resolve for N
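If you'd rather avoid a dependency, a common alternative (my sketch, not part of the answer above) is to chain each incoming message onto a running "tail" promise, so messages are processed strictly in arrival order:

// Worker side: serialize message handling without a queue library.
let tail = Promise.resolve();

parentPort.on('message', (value) => {
  tail = tail
    .then(() => testAsync(value))
    .catch((err) => {
      // Keep the chain alive even if one message fails.
      console.error(`Failed to process message ${value.idx}:`, err);
    });
});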
I'm writing an HTTP API with expressjs in Node.js and here is what I'm trying to achieve:
I have a task that I would like to run regularly, approximately every minute. This task is implemented with an async function named task.
In reaction to a call to my API, I would like that task to be called immediately as well.
Two executions of the task function must not be concurrent. Each execution should run to completion before another execution is started.
The code looks like this:
// only a single execution of this function is allowed at a time
// which is not the case with the current code
async function task(reason: string) {
  console.log("do thing because %s...", reason);
  await sleep(1000);
  console.log("done");
}

// call task regularly
setIntervalAsync(async () => {
  await task("ticker");
}, 5000) // normally 1min

// call task immediately
app.get("/task", async (req, res) => {
  await task("trigger");
  res.send("ok");
});
I've put a full working sample project at https://github.com/piec/question.js
If I were in Go I would do it like this and it would be easy, but I don't know how to do that with Node.js.
Ideas I have considered or tried:
I could apparently put task in a critical section using a mutex from the async-mutex library (see the sketch after this list). But I'm not too fond of adding mutexes in JS code.
Many people seem to be using message-queue libraries with worker processes (bee-queue, bullmq, ...), but this adds a dependency on an external service, usually Redis. Also, if I'm correct, the code would be a bit more complex because I need a main entrypoint and an entrypoint for worker processes. And you can't share objects with the workers as easily as in a "normal" single-process situation.
I have tried an RxJS Subject in order to make a producer/consumer channel. But I was not able to limit the execution of task to one at a time (task is async).
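For reference, here is roughly what the mutex variant would look like, a minimal sketch assuming async-mutex's documented Mutex.runExclusive API:

import { Mutex } from "async-mutex";

const taskMutex = new Mutex();

async function task(reason: string) {
  // runExclusive queues callers, so only one body runs at a time
  return taskMutex.runExclusive(async () => {
    console.log("do thing because %s...", reason);
    await sleep(1000);
    console.log("done");
  });
}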
Thank you!
You can make your own serialized asynchronous queue and run the tasks through that.
This queue uses a flag to keep track of whether it's in the middle of running an asynchronous operation already. If so, it just adds the task to the queue and will run it when the current operation is done. If not, it runs it now. Adding it to the queue returns a promise so the caller can know when the task finally got to run.
If the tasks are asynchronous, they are required to return a promise that is linked to the asynchronous activity. You can mix in non-asynchronous tasks too and they will also be serialized.
class SerializedAsyncQueue {
  constructor() {
    this.tasks = [];
    this.inProcess = false;
  }
  // adds a promise-returning function and its args to the queue
  // returns a promise that resolves when the function finally gets to run
  add(fn, ...args) {
    let d = new Deferred();
    this.tasks.push({ fn, args, deferred: d });
    this.check();
    return d.promise;
  }
  check() {
    if (!this.inProcess && this.tasks.length) {
      // run next task
      this.inProcess = true;
      const nextTask = this.tasks.shift();
      Promise.resolve(nextTask.fn(...nextTask.args)).then(val => {
        this.inProcess = false;
        nextTask.deferred.resolve(val);
        this.check();
      }).catch(err => {
        console.log(err);
        this.inProcess = false;
        nextTask.deferred.reject(err);
        this.check();
      });
    }
  }
}
const Deferred = function() {
  if (!(this instanceof Deferred)) {
    return new Deferred();
  }
  const p = this.promise = new Promise((resolve, reject) => {
    this.resolve = resolve;
    this.reject = reject;
  });
  this.then = p.then.bind(p);
  this.catch = p.catch.bind(p);
  if (p.finally) {
    this.finally = p.finally.bind(p);
  }
}

let queue = new SerializedAsyncQueue();

// utility function
const sleep = function(t) {
  return new Promise(resolve => {
    setTimeout(resolve, t);
  });
}
// only a single execution of this function is allowed at a time
// so it is run only via the queue that makes sure it is serialized
async function task(reason: string) {
  async function runIt() {
    console.log("do thing because %s...", reason);
    await sleep(1000);
    console.log("done");
  }
  return queue.add(runIt);
}

// call task regularly
setIntervalAsync(async () => {
  await task("ticker");
}, 5000) // normally 1min

// call task immediately
app.get("/task", async (req, res) => {
  await task("trigger");
  res.send("ok");
});
Here's a version using RxJS#Subject that is almost working. How to finish it depends on your use-case.
async function task(reason: string) {
  console.log("do thing because %s...", reason);
  await sleep(1000);
  console.log("done");
}

const run = new Subject<string>();
const effect$ = run.pipe(
  // Limit one task at a time
  concatMap(task),
  share()
);

const effectSub = effect$.subscribe();

interval(5000).subscribe(_ =>
  run.next("ticker")
);

// call task immediately
app.get("/task", async (req, res) => {
  effect$.pipe(
    take(1)
  ).subscribe(_ =>
    res.send("ok")
  );
  run.next("trigger");
});
The issue here is that res.send("ok") is tied to the effect$ stream's next emission, which may not be the one generated by the run.next you're about to call.
There are many ways to fix this. For example, you can tag each emission with an ID and then wait for the corresponding emission before using res.send("ok").
There are better ways too if calls distinguish themselves naturally.
A Clunky ID Version
Generating an ID randomly is a bad idea, but it gets the general thrust across. You can generate unique IDs however you like. They can be integrated directly into the task somehow or can be kept 100% separate the way they are here (task itself has no knowledge that it's been assigned an ID before being run).
interface IdTask {
  taskId: number,
  reason: string
}
interface IdResponse {
  taskId: number,
  response: any
}

async function task(reason: string) {
  console.log("do thing because %s...", reason);
  await sleep(1000);
  console.log("done");
}

const run = new Subject<IdTask>();
const effect$: Observable<IdResponse> = run.pipe(
  // concatMap only allows one observable at a time to run
  concatMap((eTask: IdTask) => from(task(eTask.reason)).pipe(
    map((response: any) => ({
      taskId: eTask.taskId,
      response
    }) as IdResponse)
  )),
  share()
);

const effectSub = effect$.subscribe({
  next: v => console.log("This is a shared task emission: ", v)
});

interval(5000).subscribe(num =>
  run.next({
    taskId: num,
    reason: "ticker"
  })
);

// call task immediately
app.get("/task", async (req, res) => {
  const randomId = Math.random();
  effect$.pipe(
    filter(({ taskId }) => taskId == randomId),
    take(1)
  ).subscribe(_ =>
    res.send("ok")
  );
  run.next({
    taskId: randomId,
    reason: "trigger"
  });
});
I am using events to listen to a data change stream (https://docs.mongodb.com/manual/changeStreams).
Each event that comes in is handled via an async handleChange function. In the first part of the function I do the normal processing of the data, and in the second part I want to store a resumeToken.
This resumeToken is important because it allows me to restart my listener in case there is a crash.
However I have a problem: because each event is processed with an async handler, the resumeToken that ends up stored belongs to the last event that finished processing, not the last event that was emitted. This means the resumeToken may lag behind, and when I restart the process it may resume at a point it has already processed.
The code below is the smallest isolated example of what is happening.
const EventEmitter = require('events');
const myEmitter = new EventEmitter();

let resumeToken; // Stored in external db;

// simulate an update to resumeToken where time taken could be different
const updateResumeToken = async (token) => {
  const randomTime = Math.floor(Math.random() * 10) * 1000;
  await new Promise(resolve => setTimeout(() => {
    resumeToken = token;
    resolve();
  }, randomTime));
}

const handleChange = async (data) => {
  console.log("current", data, resumeToken)
  await updateResumeToken(data.value);
  console.log("new", data, resumeToken);
}

myEmitter.on('change', handleChange);

myEmitter.emit('change', { "token": "first", value: 1 });
myEmitter.emit('change', { "token": "second", value: 2 });
myEmitter.emit('change', { "token": "third", value: 3 });
https://replit.com/join/btniqwwkzl-kaykhan
Late answer, but you can transform the event emitter into an async iterable and process events sequentially. events.on is available from Node.js v12.16.
import { EventEmitter, on } from "node:events";

const myEmitter = new EventEmitter();

let resumeToken;

const updateResumeToken = async (token) => {
  const randomTime = Math.floor(Math.random() * 10) * 1000;
  await new Promise(resolve => setTimeout(() => {
    resumeToken = token;
    resolve();
  }, randomTime));
}

setTimeout(() => {
  myEmitter.emit("change", { token: "first", value: 1 });
  myEmitter.emit("change", { token: "second", value: 2 });
  myEmitter.emit("change", { token: "third", value: 3 });
}, 1000);

// Process events sequentially. `updateResumeToken` is awaited before processing next event
for await (const [data] of on(myEmitter, "change")) {
  console.log("current", data, resumeToken);
  await updateResumeToken(data.value);
  console.log("new", data, resumeToken);
}
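One caveat worth adding (my note, not part of the original answer): if the body of the for await loop throws, the loop exits and the listener stops. Wrapping the body in a try/catch keeps it alive:

for await (const [data] of on(myEmitter, "change")) {
  try {
    console.log("current", data, resumeToken);
    await updateResumeToken(data.value);
    console.log("new", data, resumeToken);
  } catch (err) {
    // One bad event shouldn't kill the whole listener.
    console.error("Failed to handle change event:", err);
  }
}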
Hope this helps others who stumble upon this.
I used an EventEmitter as follows:
const EventEmitter = require('events');
const myEmitter = new EventEmitter();

function c2(num) {
  return new Promise((resolve) => {
    resolve(`c2: ${num}`);
  });
}

// eslint-disable-next-line no-underscore-dangle
// eslint-disable-next-line no-console
const doSomeStuff = async (number) => {
  try {
    console.log(`doSomeStuff: ${number}`);
    const r2 = await c2(number);
    console.log(r2);
  } catch (err) {
    throw err;
  }
};

myEmitter.on('eventOne', async (n) => {
  await doSomeStuff(n);
});

myEmitter.emit('eventOne', 1);
myEmitter.emit('eventOne', 2);
myEmitter.emit('eventOne', 3);
myEmitter.emit('eventOne', 4);
I expect a result of
doSomeStuff: 1
c2: 1
doSomeStuff: 2
c2: 2
doSomeStuff: 3
c2: 3
doSomeStuff: 4
c2: 4
However the output shows me:
doSomeStuff: 1
doSomeStuff: 2
doSomeStuff: 3
doSomeStuff: 4
c2: 1
c2: 2
c2: 3
c2: 4
As per my understanding, EventEmitter invokes the event callback synchronously; however, for some reason the callback has not finished executing before the next callback is invoked. I think I'm missing something very fundamental here.
The event handler doesn't care about the async nature of the function. In fact, it doesn't care about the return value at all. It will just call it as soon as it can whenever it hears an event and will keep firing every time it hears an event. It won't care if it is already running a function or not.
myEmitter.on('eventOne', async (n) => {
  await doSomeStuff(n);
});
Is effectively the exact same as if you didn't have the async/await:
myEmitter.on('eventOne', (n) => {
  doSomeStuff(n);
});
Hypothetically, you could adjust your code a little so that you do get the output you are expecting. However, you need to introduce a single consumer so that every emit feeds one shared context instead of every event triggering its own instance of doSomeStuff. Here is an example using generator functions:
// EventEmitter Polyfill
class EventEmitter {
  constructor() {
    this._listeners = new Map();
  }
  on(e, cb) {
    this._listeners.set(e, [...(this._listeners.get(e) || []), cb]);
  }
  emit(e, payload) {
    for (const listener of (this._listeners.get(e) || [])) listener(payload);
  }
}

const myEmitter = new EventEmitter();

function c2(num) {
  return new Promise(resolve => {
    resolve(`c2: ${num}`);
  });
}

async function* doSomeStuff() {
  while (true) {
    try {
      const number = yield;
      console.log(`doSomeStuff: ${number}`);
      const r2 = await c2(number);
      console.log(r2);
    } catch (err) {
      throw err;
    }
  }
}

const someStuff = doSomeStuff();
someStuff.next(); // Start it

myEmitter.on("eventOne", n => {
  someStuff.next(n);
});

myEmitter.emit("eventOne", 1);
myEmitter.emit("eventOne", 2);
myEmitter.emit("eventOne", 3);
myEmitter.emit("eventOne", 4);
Following the logic above, here is my take:

// The emits run back to back; each one synchronously invokes the listener
myEmitter.emit('eventOne', 1); // all
myEmitter.emit('eventOne', 2); // of
myEmitter.emit('eventOne', 3); // us
myEmitter.emit('eventOne', 4); // start at almost the same time.

// In the listener:
await doSomeStuff(n); // I run synchronously up to my first await inside [event 1] scope, not [A] scope
await doSomeStuff(n); // I run synchronously up to my first await inside [event 2] scope, not [A] scope
await doSomeStuff(n); // I run synchronously up to my first await inside [event 3] scope, not [A] scope
await doSomeStuff(n); // I run synchronously up to my first await inside [event 4] scope, not [A] scope

and the rest will resolve asynchronously.
In brief: each emit starts a fresh listener invocation, and everything after the listener's first await resolves asynchronously, so there is no reason for the listeners to complete one after another.
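A small demonstration of that interleaving (my own sketch): the synchronous part of each handler runs during emit, and everything after the first await is deferred to a microtask:

const { EventEmitter } = require('events');
const demo = new EventEmitter();

demo.on('evt', async (n) => {
  console.log(`sync part ${n}`);  // runs during emit
  await Promise.resolve();
  console.log(`async part ${n}`); // deferred to a microtask
});

demo.emit('evt', 1);
demo.emit('evt', 2);
// Output: sync part 1, sync part 2, async part 1, async part 2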
function first() {
  console.log('first')
}
function second() {
  console.log('second')
}
let interval = async () => {
  await setInterval(first, 2000)
  await setInterval(second, 2000)
}
interval();
Imagine that I have the code above.
When I run it, first() and second() will be called at the same time; how do I call second() only after first() returns some data, i.e. only once first() is done?
first() in my code will be working with a big amount of data, and if these two functions run at the same time it will be hard on the server.
How do I call second() each time first() returns some data?
As mentioned above, setInterval does not play well with promises if you do not stop it. If you clear the interval, you can use it like this:
async function waitUntil(condition) {
  return await new Promise(resolve => {
    const interval = setInterval(() => {
      if (condition()) { // condition should be a function returning a boolean
        resolve('foo');
        clearInterval(interval);
      }
    }, 1000);
  });
}
Later you can use it like:
const bar = await waitUntil(() => someConditionHere)
You have a few problems:
Promises may only ever resolve once, while setInterval() calls its callback multiple times; Promises do not support this case well.
Neither setInterval() nor the more appropriate setTimeout() return Promises; therefore, awaiting them is pointless in this context.
You're looking for a function that returns a Promise which resolves after some time (using setTimeout(), probably, not setInterval()).
Luckily, creating such a function is rather trivial:
async function delay(ms) {
  // return await for better async stack trace support in case of errors.
  return await new Promise(resolve => setTimeout(resolve, ms));
}
With this new delay function, you can implement your desired flow:
function first() {
  console.log('first')
}
function second() {
  console.log('second')
}
let run = async () => {
  await delay(2000);
  first();
  await delay(2000)
  second();
}
run();
setInterval doesn't play well with promises because it triggers a callback multiple times, while a promise resolves once.
It seems that it's setTimeout that fits the case. It should be promisified in order to be used with async/await:
async () => {
  await new Promise(resolve => setTimeout(() => resolve(first()), 2000));
  await new Promise(resolve => setTimeout(() => resolve(second()), 2000));
}
An await expression causes the async function to pause until a Promise is settled, so you get the promise's resolved value directly.
For my part, I wanted to initiate an HTTP request every 1s.
let intervalid

async function testFunction() {
  intervalid = setInterval(() => {
    // I use axios like: axios.get('/user?ID=12345').then
    new Promise(function(resolve, reject) {
      resolve('something')
    }).then(res => {
      if (condition) { // condition: whatever check decides to keep polling
        // do something
      } else {
        clearInterval(intervalid)
      }
    })
  }, 1000)
}

// you can use this function like
testFunction()

// or stop the setInterval in any place by
clearInterval(intervalid)
You could use an IIFE. This way you escape the issue of myInterval not accepting a Promise as a return type.
There are cases where you need setInterval, because you want to call some function an unknown number of times with some interval in between.
When I faced this problem, this turned out to be the most straightforward solution for me. I hope it helps someone :)
For me, the use case was that I wanted to send logs to CloudWatch but try not to hit the throttle exception for sending more than 5 logs per second. So I needed to hold my logs and send them as a batch in an interval of 1 second. The solution I'm posting here is what I ended up using.
async function myAsyncFunc(): Promise<string> {
  return new Promise<string>((resolve) => {
    resolve("hello world");
  });
}

function myInterval(): void {
  setInterval(() => {
    void (async () => {
      await myAsyncFunc();
    })();
  }, 5_000);
}

// then call like so
myInterval();
I looked through all the answers but still didn't find one that works exactly as the OP asked. This is what I used for the same purpose:
async function waitInterval(callback, ms) {
  return new Promise(resolve => {
    let iteration = 0;
    const interval = setInterval(async () => {
      if (await callback(iteration, interval)) {
        resolve();
        clearInterval(interval);
      }
      iteration++;
    }, ms);
  });
}

function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

(async () => {
  console.log('start');
  await waitInterval(first, 1000);
  await waitInterval(second, 1000);
  console.log('finish');
})()
In my example, I also pass the iteration count and the timer itself to the callback, in case the caller needs to do something with them; this is optional.
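For instance, a callback could use those extra arguments to bail out early; a hypothetical variation on first (somethingWentWrong is a made-up flag):

function firstWithBailout(i, timer) {
  console.log(`first: ${i}`);
  if (somethingWentWrong) { // hypothetical abort condition
    clearInterval(timer); // stop ticking immediately
    return true;          // resolves the wrapping promise
  }
  return i === 5;
}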
In my case, I needed to iterate through a list of images, pausing in between each, and then a longer pause at the end before re-looping through.
I accomplished this by combining several techniques from above, calling my function recursively and awaiting a timeout.
If at any point another trigger changes my animationPaused:boolean, my recursive function will exit.
// timeout is assumed to be a promisified setTimeout, e.g.:
// const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));
const loopThroughImages = async () => {
  for (let i = 0; i < numberOfImages; i++) {
    if (animationPaused) {
      return;
    }
    this.updateImage(i);
    await timeout(700);
  }
  await timeout(1000);
  loopThroughImages();
}
loopThroughImages();
Async/await does not make promises synchronous.
To my knowledge, it's just different syntax for returning a Promise and calling .then().
Here I rewrote the async function and left both versions, so you can see what it really does and compare.
It's in fact a cascade of Promises.
// by the way, no need for async here: the callback does not return a promise, so there is nothing to await.
function waitInterval(callback, ms) {
  return new Promise(resolve => {
    let iteration = 0;
    const interval = setInterval(() => {
      if (callback(iteration, interval)) {
        resolve();
        clearInterval(interval);
      }
      iteration++;
    }, ms);
  });
}
function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

// async function with async/await, this code ...
(async () => {
  console.log('start');
  await waitInterval(first, 1000);
  await waitInterval(second, 1000);
  console.log('finish');
})() //... returns a pending Promise and ...

console.log('i do not wait');

// ... is kinda identical to this code.
// still asynchronous but return Promise statements with then cascade.
(() => {
  console.log('start again');
  return waitInterval(first, 1000).then(() => {
    return waitInterval(second, 1000).then(() => {
      console.log('finish again');
    });
  });
})(); // returns a pending Promise...

console.log('i do not wait either');
You can see the two async functions both execute at the same time.
So using promises around intervals here is not very useful: it's still just intervals, promises change nothing, and it makes things confusing...
Since the code calls callbacks repeatedly on an interval, this is, I think, a cleaner way:
function first(i) {
  console.log(`first: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function second(i) {
  console.log(`second: ${i}`);
  // If the condition below is true the timer finishes
  return i === 5;
}

function executeThroughTime(...callbacks) {
  console.log('start');
  let callbackIndex = 0; // to track current callback.
  let timerIndex = 0; // index given to callbacks
  let interval = setInterval(() => {
    if (callbacks[callbackIndex](timerIndex++)) { // callback returns true when it finishes.
      timerIndex = 0; // resets for next callback
      if (++callbackIndex >= callbacks.length) { // if no next callback, finish.
        clearInterval(interval);
        console.log('finish');
      }
    }
  }, 1000)
}

executeThroughTime(first, second);

console.log('and i still do not wait ;)');
Also, this solution executes a callback every second.
If the callbacks are async requests that take more than one second to resolve, and I can't afford for them to overlap, then, instead of making iterative calls on a repeating interval, I would let each request's resolution schedule the next request (through a timer if I don't want to hammer the server).
Here the "recursive" task is called lTask, and it does pretty much the same as before, except that, since I no longer have an interval, I need a new timer for each iteration.
// slow internet request simulation. with a Promise, could be a callback.
function simulateAsync1(i) {
  console.log(`first pending: ${i}`);
  return new Promise((resolve) => {
    setTimeout(() => resolve('got that first big data'), Math.floor(Math.random() * 1000) + 1000); // simulate request that lasts between 1 and 2 sec.
  }).then((result) => {
    console.log(`first solved: ${i} ->`, result);
    return i == 2;
  });
}

// slow internet request simulation. with a Promise, could be a callback.
function simulateAsync2(i) {
  console.log(`second pending: ${i}`);
  return new Promise((resolve) => {
    setTimeout(() => resolve('got that second big data'), Math.floor(Math.random() * 1000) + 1000); // simulate request that lasts between 1 and 2 sec.
  }).then((result) => { // promise is resolved
    console.log(`second solved: ${i} ->`, result);
    return i == 4; // return a promise
  });
}

function executeThroughTime(...asyncCallbacks) {
  console.log('start');
  let callbackIndex = 0;
  let timerIndex = 0;
  let lPreviousTime = Date.now();
  let lTask = () => { // timeout callback.
    asyncCallbacks[callbackIndex](timerIndex++).then((result) => { // the setTimeout for the next task is set when the promise is solved.
      console.log('result', result)
      if (result) { // current callback is done.
        timerIndex = 0;
        if (++callbackIndex >= asyncCallbacks.length) { // are all callbacks done?
          console.log('finish');
          return; // it's over
        }
      }
      console.log('time elapsed since previous call', Date.now() - lPreviousTime);
      lPreviousTime = Date.now();
      //console.log('"wait" 1 sec (but not really)');
      setTimeout(lTask, 1000); // redo task after 1 sec.
      //console.log('i do not wait');
    });
  }
  lTask(); // no need to set a timer for first call.
}

executeThroughTime(simulateAsync1, simulateAsync2);

console.log('i do not wait');
The next step would be to drain a FIFO on the interval and fill it with web-request promises; a rough sketch follows.
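Here is a minimal sketch of that idea (my own illustration, reusing simulateAsync1/simulateAsync2 from above): producers push promise-returning jobs into a FIFO array, and an interval pops and runs one job per tick:

// FIFO of promise-returning jobs; anything can push work into it.
const fifo = [];

function enqueue(job) { // job: () => Promise
  fifo.push(job);
}

// Drain one job per second; skip the tick if the queue is empty.
// Note: a job taking longer than 1 sec can still overlap the next one;
// the chained-timer version above is the way to prevent that.
setInterval(() => {
  const job = fifo.shift();
  if (!job) return;
  job()
    .then(result => console.log('job done:', result))
    .catch(err => console.error('job failed:', err));
}, 1000);

enqueue(() => simulateAsync1(0));
enqueue(() => simulateAsync2(0));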
I'm writing a Node AWS Lambda function that queries around 5,000 items from my DB and sends them via messages into an AWS SQS queue.
My local environment involves me running my lambda with AWS SAM local, and emulating AWS SQS with GoAWS.
An example skeleton of my Lambda is:
async run() {
  try {
    const accounts = await this.getAccountsFromDB();
    const results = await this.writeAccountsIntoQueue(accounts);
    return 'I\'ve written: ' + results + ' messages into SQS';
  } catch (e) {
    console.log('Caught error running job: ');
    console.log(e);
    return e;
  }
}
There are no performance issues with my getAccountsFromDB() function and it runs almost instantly, returning me an array of 5,000 accounts.
My writeAccountsIntoQueue function looks like:
async writeAccountsIntoQueue(accounts) {
  // Extract the sqsClient and queueUrl from the class
  const { sqsClient, queueUrl } = this;
  try {
    // Create array of functions to concurrently call later
    let promises = accounts.map(acc => async () => await sqsClient.sendMessage({
      QueueUrl: queueUrl,
      MessageBody: JSON.stringify(acc),
      DelaySeconds: 10,
    }));
    // Invoke the functions concurrently, using helper function `eachLimit`
    let writtenMessages = await eachLimit(promises, 3);
    return writtenMessages;
  } catch (e) {
    console.log('Error writing accounts into queue');
    console.log(e);
    return e;
  }
}
My helper, eachLimit looks like:
async function eachLimit(funcs, limit) {
  let rest = funcs.slice(limit);
  await Promise.all(
    funcs.slice(0, limit).map(
      async (func) => {
        await func();
        while (rest.length) {
          await rest.shift()();
        }
      }
    )
  );
}
To the best of my understanding, it should limit concurrent executions to `limit`.
Additionally, I've wrapped the AWS SDK SQS client to return an object with a sendMessage function that looks like:
sendMessage(params) {
  const { client } = this;
  return new Promise((resolve, reject) => {
    client.sendMessage(params, (err, data) => {
      if (err) {
        console.log('Error sending message');
        console.log(err);
        return reject(err);
      }
      return resolve(data);
    });
  });
}
So nothing fancy there, just promisifying a callback.
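For what it's worth, Node's built-in util.promisify produces an equivalent wrapper for this kind of (err, data) callback API (a sketch, not part of the original question):

const util = require('util');

// Equivalent to the hand-rolled wrapper above; bind keeps `this` intact.
const sendMessageAsync = util.promisify(client.sendMessage.bind(client));

// usage: const data = await sendMessageAsync(params);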
I've got my lambda set up to time out after 300 seconds, and it always times out; when it doesn't, it ends abruptly and misses some final logging that should go on, which makes me think it may even be erroring somewhere, silently. When I check the SQS queue, I'm missing around 1,000 entries.
I can see a couple of issues in your code,
First:
let promises = accounts.map(acc => async () => await sqsClient.sendMessage({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify(acc),
  DelaySeconds: 10,
}));
You're overusing async/await here. sendMessage already returns a promise, so wrapping each call in an extra async () => await ... adds nothing. Since you're only interested in getting back functions that return those promises, you could simply do this instead:
const promises = accounts.map(acc => () => sqsClient.sendMessage({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify(acc),
  DelaySeconds: 10,
}));
Now, for the second part: your eachLimit implementation looks wrong and very verbose. I've refactored it with the help of es6-promise-pool to handle the concurrency limit for you:
const PromisePool = require('es6-promise-pool')

function eachLimit(promiseFuncs, limit) {
  const promiseProducer = function () {
    if (promiseFuncs.length) {
      const promiseFunc = promiseFuncs.shift();
      return promiseFunc();
    }
    return null;
  }
  const pool = new PromisePool(promiseProducer, limit)
  const poolPromise = pool.start();
  return poolPromise;
}
Lastly, but very importantly, have a look at the SQS limits: SQS FIFO supports up to 300 sends/sec. Since you are processing 5k items, you could probably raise your concurrency limit to around 5k / (300 + 50), approx 15. The 50 could be any positive number, just to stay a bit below the limit.
Also, consider using SendMessageBatch, with which you could get much more throughput and reach 3k sends/sec.
EDIT
As I suggested above, using sendMessageBatch gives much better throughput, so I've refactored the code that maps your promises to support sendMessageBatch:
function chunkArray(myArray, chunk_size) {
  var index = 0;
  var arrayLength = myArray.length;
  var tempArray = [];
  for (index = 0; index < arrayLength; index += chunk_size) {
    var myChunk = myArray.slice(index, index + chunk_size);
    tempArray.push(myChunk);
  }
  return tempArray;
}

const groupedAccounts = chunkArray(accounts, 10);

const promiseFuncs = groupedAccounts.map(accountsGroup => {
  const messages = accountsGroup.map((acc, i) => {
    return {
      Id: `pos_${i}`,
      MessageBody: JSON.stringify(acc),
      DelaySeconds: 10
    }
  });
  return () => sqsClient.sendMessageBatch({
    Entries: messages,
    QueueUrl: queueUrl
  })
});
Then you can call eachLimit as usual:
const result = await eachLimit(promiseFuncs, 3);
The difference now is that every promise processed sends a batch of messages of size n (10 in the example above, which is also the maximum number of entries SendMessageBatch accepts per call).