I am building an express.js web application, and for one of the API requests I need to make multiple requests for multiple user accounts in parallel and return one object.
I tried using generators and Promise.all but I have 2 problems:
I don't run in parallel for all user accounts.
My code ends after the response has already returned.
Here is the code I wrote:
function getAccountsDetails(req, res) {
let accounts = [ '1234567890', '7856239487'];
let response = { accounts: [] };
_.forEach(accounts, Promise.coroutine(function *(accountId) {
let [ firstResponse, secondResponse, thirdResponse ] = yield Promise.all([
firstRequest(accountId),
secondRequest(accountId),
thirdRequest(accountId)
]);
let userObject = Object.assign(
{},
firstResponse,
secondResponse,
thirdResponse
);
response.accounts.push(userObject);
}));
res.json(response);
}
_.forEach is not aware of Promise.coroutine and it is not using the return values.
Since you're already using bluebird, you can use its promises aware helpers instead:
function getAccountsDetails(req, res) {
let accounts = [ '1234567890', '7856239487'];
let response = { accounts: [] };
return Promise.map(accounts, (account) => Promise.props({ // wait for object
firstResponse: firstRequest(accountId),
secondResponse: secondRequest(accountId),
thirdResponse: thirdRespones(accountId)
})).tap(r => res.json(r); // it's useful to still return the promise
}
And that should be the code in its entirety.
Coroutines are great, but they're useful for synchronizing asynchronous stuff - in your case you actually do want the concurrency features.
Related
Building a node.js CLI application. Users should choose some tasks to run and based on that, tasks should work and then spinners (using ora package) should show success and stop spin.
The issue here is spinner succeed while tasks are still going on. Which means it doesn't wait.
Tried using typical Async/Await as to have an async function and await each function under condition. Didn't work.
Tried using promise.all. Didn't work.
Tried using waterfall. Same.
Here's the code of the task runner, I create an array of functions and pass it to waterfall (Async-waterfall package) or promise.all() method.
const runner = async () => {
let tasks = [];
spinner.start('Running tasks');
if (syncOptions.includes('taskOne')) {
tasks.push(taskOne);
}
if (syncOptions.includes('taskTwo')) {
tasks.push(taskTwo);
}
if (syncOptions.includes('taskThree')) {
tasks.push(taskThree);
}
if (syncOptions.includes('taskFour')) {
tasks.push(taskFour);
}
// Option One
waterfall(tasks, () => {
spinner.succeed('Done');
});
// Option Two
Promise.all(tasks).then(() => {
spinner.succeed('Done');
});
};
Here's an example of one of the functions:
const os = require('os');
const fs = require('fs');
const homedir = os.homedir();
const outputDir = `${homedir}/output`;
const file = `${homedir}/.file`;
const targetFile = `${outputDir}/.file`;
module.exports = async () => {
await fs.writeFileSync(targetFile, fs.readFileSync(file));
};
I tried searching concepts. Talked to the best 5 people I know who can write JS properly. No clue.. What am I doing wrong ?
You don't show us all your code, but the first warning sign is that it doesn't appear you are actually running taskOne(), taskTwo(), etc...
You are pushing what look like functions into an array with code like:
tasks.push(taskFour);
And, then attempting to do:
Promise.all(tasks).then(...)
That won't do anything useful because the tasks themselves are never executed. To use Promise.all(), you need to pass it an array of promises, not an array of functions.
So, you would use:
tasks.push(taskFour());
and then:
Promise.all(tasks).then(...);
And, all this assumes that taskOne(), taskTwo(), etc... are function that return a promise that resolves/rejects when their asynchronous operation is complete.
In addition, you also need to either await Promise.all(...) or return Promise.all() so that the caller will be able to know when they are all done. Since this is the last line of your function, I'd generally just use return Promise.all(...) and this will let the caller get the resolved results from all the tasks (if those are relevant).
Also, this doesn't make much sense:
module.exports = async () => {
await fs.writeFileSync(targetFile, fs.readFileSync(file));
};
You're using two synchronous file operations. They are not asynchronous and do not use promises so there's no reason to put them in an async function or to use await with them. You're mixing two models incorrectly. If you want them to be synchronous, then you can just do this:
module.exports = () => {
fs.writeFileSync(targetFile, fs.readFileSync(file));
};
If you want them to be asynchronous and return a promise, then you can do this:
module.exports = async () => {
return fs.promises.writeFile(targetFile, await fs.promises.readFile(file));
};
Your implementation was attempting to be half and half. Pick one architecture or the other (synchronous or asynchronous) and be consistent in the implementation.
FYI, the fs module now has multiple versions of fs.copyFile() so you could also use that and let it do the copying for you. If this file was large, copyFile() would likely use less memory in doing so.
As for your use of waterfall(), it is probably not necessary here and waterfall uses a very different calling model than Promise.all() so you certainly can't use the same model with Promise.all() as you do with waterfall(). Also, waterfall() runs your functions in sequence (one after the other) and you pass it an array of functions that have their own calling convention.
So, assuming that taskOne, taskTwo, etc... are functions that return a promise that resolve/reject when their asynchronous operations are done, then you would do this:
const runner = () => {
let tasks = [];
spinner.start('Running tasks');
if (syncOptions.includes('taskOne')) {
tasks.push(taskOne());
}
if (syncOptions.includes('taskTwo')) {
tasks.push(taskTwo());
}
if (syncOptions.includes('taskThree')) {
tasks.push(taskThree());
}
if (syncOptions.includes('taskFour')) {
tasks.push(taskFour());
}
return Promise.all(tasks).then(() => {
spinner.succeed('Done');
});
};
This would run the tasks in parallel.
If you want to run the tasks in sequence (one after the other), then you would do this:
const runner = async () => {
spinner.start('Running tasks');
if (syncOptions.includes('taskOne')) {
await taskOne();
}
if (syncOptions.includes('taskTwo')) {
await taskTwo();
}
if (syncOptions.includes('taskThree')) {
await taskThree();
}
if (syncOptions.includes('taskFour')) {
await taskFour();
}
spinner.succeed('Done');
};
I'm working on a proxy that caches files and I'm trying to add some logic that prevents multiple clients from downloading the same files before the proxy has a chance to cache them.
Basically, the logic I'm trying to implement is the following:
Client 1 requests a file. The proxy checks if the file is cached. If it's not, it requests it from the server, caches it, then sends it to the client.
Client 2 requests the same file after client 1 requested it, but before the proxy has a chance to cache it. So the proxy will tell client 2 to wait a few seconds because there is already a download in progress.
A better approach would probably be to give client 2 a "try again later" message, but let's just say that's currently not an option.
I'm using Nodejs with the anyproxy library. According to the documentation, delayed responses are possible by using promises.
However, I don't really see a way to achieve what I want using Promises. From what I can tell, I could do something like this:
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return new Promise((resolve, reject) => {
setTimeout(() => { // delay
resolve({ response: responseDetail.response });
}, 10000);
});
}
}
};
But that would mean simply waiting for a maximum amount of time and hoping the download finishes by then.
And I don't want that.
I would prefer to be able to do something like this (but with Promises, somehow):
module.exports = {
*beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
var i = 0;
for(i = 0 ; i < 10 ; i++) {
JustSleep(1000);
if(!thereIsADownloadInProgressFor(requestDetail.url))
return { response: responseDetail.response };
}
}
}
};
Is there any way I can achieve this with Promises in Nodejs?
Thanks!
You can use a Map to cache your file downloads.
The mapping in Map would be url -> Promise { file }
// Map { url => Promise { file } }
const cache = new Map()
const thereIsADownloadInProgressFor = url => cache.has(url)
const getCachedFilePromise = url => cache.get(url)
const downloadFile = async url => {/* download file code here */}
const setAndReturnCachedFilePromise = url => {
const filePromise = downloadFile(url)
cache.set(url, filePromise)
return filePromise
}
module.exports = {
beforeSendRequest(requestDetail) {
if(thereIsADownloadInProgressFor(requestDetail.url)) {
return getCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
} else {
return setAndReturnCachedFilePromise(requestDetail.url).then(file => ({ response: file }))
}
}
};
You don't need to send a try again response, simply serve the same data to both requests. All you need to do is store the requests somewhere in the caching system and trigger all of them when the fetching is done.
Here's a cache implementation that does only a single fetch for multiple requests. No delays and no try-laters:
export class class Cache {
constructor() {
this.resultCache = {}; // this object is the cache storage
}
async get(key, cachedFunction) {
let cached = this.resultCache[key];
if (cached === undefined) { // No cache so fetch data
this.resultCache[key] = {
pending: [] // This is the magic, store further
// requests in this pending array.
// This way pending requests are directly
// linked to this cache data
}
try {
let result = await cachedFunction(); // Wait for result
// Once we get result we need to resolve all pending
// promises. Loop through the pending array and
// resolve them. See code below for how we store pending
// requests.. it will make sense:
this.resultCache[key].pending
.forEach(waiter => waiter.resolve(result));
// Store the result of the cache so later we don't
// have to fetch it again:
this.resultCache[key] = {
data: result
}
// Return result to original promise:
return result;
// Note: yes, this means pending promises will get triggered
// before the original promise is resolved but normally
// this does not matter. You will need to modify the
// logic if you want promises to resolve in original order
}
catch (err) { // Error when fetching result
// We still need to trigger all pending promises to tell
// them about the error. Only we reject them instead of
// resolving them:
if (this.resultCache[key]) {
this.resultCache[key].pending
.forEach((waiter: any) => waiter.reject(err));
}
throw err;
}
}
else if (cached.data === undefined && cached.pending !== undefined) {
// Here's the condition where there was a previous request for
// the same data. Instead of fetching the data again we store
// this request in the existing pending array.
let wait = new Promise((resolve, reject) => {
// This is the "waiter" object above. It is basically
// It is basically the resolve and reject functions
// of this promise:
cached.pending.push({
resolve: resolve,
reject: reject
});
});
return await wait; // await response form original request.
// The code above will cause this to return.
}
else {
// Return cached data as normal
return cached.data;
}
}
}
The code may look a bit complicated but it is actually quite simple. First we need a way to store the cached data. Normally I'd just use a regular object for this:
{ key : result }
Where the cached data is stored in the result. But we also need to store additional metadata such as pending requests for the same result. So we need to modify our cache storage:
{ key : {
data: result,
pending: [ array of requests ]
}
}
All this is invisible and transparent to code using this Cache class.
Usage:
const cache = new Cache();
// Illustrated with w3c fetch API but you may use anything:
cache.get( URL , () => fetch(URL) )
Note that wrapping the fetch in an anonymous function is important because we want the Cache.get() function to conditionally call the fetch to avoid multiple fetch being called. It also gives the Cache class flexibility to handle any kind of asynchronous operation.
Here's another example for caching a setTimeout. It's not very useful but it illustrates the flexibility of the API:
cache.get( 'example' , () => {
return new Promise((resolve, reject) => {
setTimeout(resolve, 1000);
});
});
Note that the Cache class above does not have any invalidations or expiry logic for the sake of clarity but it's fairly easy to add them. For example if you want the cache to expire after some time you can just store the timestamp along with the other cache data:
{ key : {
data: result,
timestamp: timestamp,
pending: [ array of requests ]
}
}
Then in the "no-cache" logic simply detect the expiry time:
if (cached === undefined || (cached.timestamp + timeout) < now) ...
I was trying to create an automation script for work, it is supposed to use multiple puppeteer instances to process input strings simultaneously.
the task queue and number of puppeteer instances are controlled by the package generic-pool,
strangely, when i run the script on ubuntu or debian, it seems that it fells into an infinite loop. tries to run infinite number of puppeteer instances. while when run on windows, the output was normal.
const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');
const faker = require('faker');
let options = require('./options');
let i = 0;
let proxies = [...options.proxy];
const pool = genericPool.createPool({
create: async () => {
i++;
console.log(`create instance ${i}`);
if (!proxies.length) {
proxies = [...options.proxy];
}
let {control = null, proxy} = proxies.pop();
let instance = await puppeteer.launch({
headless: true,
args: [
`--proxy-server=${proxy}`,
]
});
instance._own = {
proxy,
tor: control,
numInstance: i,
};
return instance;
},
destroy: async instance => {
console.log('destroy instance', instance._own.numInstance);
await instance.close()
},
}, {
max: 3,
min: 1,
});
async function run(emails = []) {
console.log('Processing', emails.length);
const promises = emails.map(email => {
console.log('Processing', email)
pool.acquire()
.then(browser => {
console.log(`${email} handled`)
pool.destroy(browser);})
})
await Promise.all(promises)
await pool.drain();
await pool.clear();
}
let emails = [a,b,c,d,e,];
run(emails)
Output
create instance 1
Processing 10
Processing Stacey_Haley52
Processing Polly.Block
create instance 2
Processing Shanny_Hudson59
Processing Vivianne36
Processing Jayda_Ullrich
Processing Cheyenne_Quitzon
Processing Katheryn20
Processing Jamarcus74
Processing Lenore.Osinski
Processing Hobart75
create instance 3
create instance 4
create instance 5
create instance 6
create instance 7
create instance 8
create instance 9
is it because of my async functions? How can I fix it?
Appreciate your help!
Edit 1. modified according to #James suggested
The main problem you are trying to solve,
It is supposed to use multiple puppeteer instances to process input strings simultaneously.
Promise Queue
You can use a rather simple solution that involves a simple promise queue. We can use p-queue package to limit the concurrency as we wish. I used this on multiple scraping projects to always test things out.
Here is how you can use it.
// emails to handle
let emails = [a, b, c, d, e, ];
// create a promise queue
const PQueue = require('p-queue');
// create queue with concurrency, ie: how many instances we want to run at once
const queue = new PQueue({
concurrency: 1
});
// single task processor
const createInstance = async (email) => {
let instance = await puppeteer.launch({
headless: true,
args: [
`--proxy-server=${proxy}`,
]
});
instance._own = {
proxy,
tor: control,
numInstance: i,
};
console.log('email:', email)
return instance;
}
// add tasks to queue
for (let email of emails) {
queue.add(async () => createInstance(email))
}
Generic Pool Infinite Loop Problem
I removed all kind of puppeteer related code from your sample code and saw how it was still producing the infinite output to console.
create instance 70326
create instance 70327
create instance 70328
create instance 70329
create instance 70330
create instance 70331
...
Now, if you test few times, you will see it will throw the loop only if you something on your code is crashing. The culprit is this pool.acquire() promise, which is just re queuing on error.
To find what is causing the crash, use the following events,
pool.on("factoryCreateError", function(err) {
console.log('factoryCreateError',err);
});
pool.on("factoryDestroyError", function(err) {
console.log('factoryDestroyError',err);
});
There are some issues related to this:
acquire() never resolves/rejects if factory always rejects, here.
About the acquire function in pool.js, here.
.acquire() doesn't reject when resource creation fails, here.
Good luck!
You want to return from your map rather than await, also don't await inside the destroy call, return the result and you can chain these e.g.
const promises = emails.map(e => pool.acquire().then(pool.destroy));
Or alternatively, you could just get rid of destroy completely e.g.
pool.acquire().then(b => b.close())
I am adding user validation an data modification page on a node.js application.
In a synchronous universe, in a single function I would:
Lookup the original record in the database
Lookup the user in LDAP to see if they are the owner or admin
Do the logic and write the record.
In an asynchronous universe that won't work. To solve it I've built a series of hand-off functions:
router.post('/writeRecord', jsonParser, function(req, res) {
post = req.post;
var smdb = new AWS.DynamoDB.DocumentClient();
var params = { ... }
smdb.query(params, function(err,data){
if( err == null ) writeRecordStep2(post,data);
}
});
function writeRecord2( ru, post, data ){
var conn = new LDAP();
conn.search(
'ou=groups,o=amazon.com',
{ ... },
function(err,resp){
if( err == null ){
writeRecordStep3( ru, post, data, ldap1 )
}
}
}
function writeRecord3( ru, post, data ){
var conn = new LDAP();
conn.search(
'ou=groups,o=amazon.com',
{ ... },
function(err,resp){
if( err == null ){
writeRecordStep4( ru, post, data, ldap1, ldap2 )
}
}
}
function writeRecordStep4( ru, post, data, ldap1, ldap2 ){
// Do stuff with collected data
}
Additionally, because the LDAP and Dynamo logic are in their own source documents, these functions are scattered tragically around the code.
This strikes me as inefficient, as well as inelegant. I'm eager to find a more natural asynchronous pattern to achieve the same result.
Any promise library should sort your issue out. My preferred choice is bluebird. In summary they help you in performing blocking operations.
If you haven't heard about bluebird then just use it. It converts all function of a module and return promise which is then-able. Simply put, it promisifies all functions.
Here is the mechanism:
Module1.someFunction() \\do your job and finally pass the return object to next call
.then() \\Use that object which is return from the first call, do your job and return the updated value
.then() \\same goes on
.catch() \\do your job when any error occurs.
Hope you understand. Here is an example:
var readFile = Promise.promisify(require("fs").readFile);
readFile("myfile.js",
"utf8").then(function(contents) {
return eval(contents);
}).then(function(result) {
console.log("The result of evaluating
myfile.js", result);
}).catch(SyntaxError, function(e) {
console.log("File had syntax error", e);
//Catch any other error
}).catch(function(e) {
console.log("Error reading file", e);
});
I could not tell from your pseudo-code exactly which async operations depend upon results from with other ones and knowing that is key to the most efficient way to code a series of asynchronous operations. If two operations do not depend upon one another, they can run in parallel which generally gets to an end result faster. I also can't tell exactly what data needs to be passed on to later parts of the async requests (too much pseudo-code and not enough real code to show us what you're really attempting to do).
So, without that level of detail, I'll show you two ways to approach this. The first runs each operation sequentially. Run the first async operation, when it's done, run the next one and accumulates all the results into an object that is passed along to the next link in the chain. This is general purpose since all async operations have access to all the prior results.
This makes use of promises built into the AWS.DynamboDB interface and makes our own promise for conn.search() (though if I knew more about that interface, it may already have a promise interface).
Here's the sequential version:
// promisify the search method
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
// utility function that does a search and adds the result to the object passed in
// returns a promise that resolves to the object
function ldapSearch(data, key) {
var conn = new LDAP();
return conn.searchAsync('ou=groups,o=amazon.com', { ... }).then(results => {
// put our results onto the passed in object
data[key] = results;
// resolve with the original object (so we can collect data here in a promise chain)
return data;
});
}
router.post('/writeRecord', jsonParser, function(req, res) {
let post = req.post;
let smdb = new AWS.DynamoDB.DocumentClient();
let params = { ... }
// The latest AWS interface gets a promise with the .promise() method
smdb.query(params).promise().then(dbresult => {
return ldapSearch({post, dbresult}, "ldap1");
}).then(result => {
// result.dbresult
// result.ldap1
return ldapSearch(result, "ldap2")
}).then(result => {
// result.dbresult
// result.ldap1
// result.ldap2
// doSomething with all the collected data here
}).catch(err => {
console.log(err);
res.status(500).send("Internal Error");
});
});
And, here's a parallel version that runs all three async operations at once and then waits for all three of the to be done and then has all the results at once:
// if the three async operations you show can be done in parallel
// first promisify things
const util = require('util');
LDAP.prototype.searchAsync = util.promisify(LDAP.prototype.search);
function ldapSearch(params) {
var conn = new LDAP();
return conn.searchAsync('ou=groups,o=amazon.com', { ... });
}
router.post('/writeRecord', jsonParser, function(req, res) {
let post = req.post;
let smdb = new AWS.DynamoDB.DocumentClient();
let params = { ... }
Promise.all([
ldapSearch(...),
ldapSearch(...),
smdb.query(params).promise()
]).then(([ldap1Result, ldap2Result, queryResult]) => {
// process ldap1Result, ldap2Result and queryResult here
}).catch(err => {
console.log(err);
res.status(500).send("Internal Error");
});
});
Keep in mind that due to the pseudo-code nature of the code in your question, this is also pseudo-code where implementation details (exactly what parameters you're searching for, what response you're sending, etc...) have to be filled in. This should be illustrative of promise chaining to serialize operations and the use of Promise.all() for parallelizing operations and promisifying a method that didn't have promises built in.
I am using nodejs v8+ which supports default async await style of code.
In my problem, I am trying to push all the promises into an array and then use Promise.all to with await keyword to get the responses. But the problem is, I am not able to map the promises with the keys.
Here is the example:
let users = ["user1", "user2", "user3"];
let promises = [];
for(let i = 0; i < users.length; i++){
let response = this.myApiHelper.getUsersData(users[i]);
promises.push(response);
}
let allResponses = await Promise.all(promises);
Here I get all the collected response of all the responses, but I actually want to map this by user.
For example currently I am getting data in this format:
[
{
user1 Data from promise 1
},
{
user2 Data from promise 2
},
{
user3 Data from promise 3
}
]
But I want data in this format:
[
{
"user1": Data from promise 1
},
{
"user2": Data from promise 2
},
{
"user3": Data from promise 3
}
]
I am sure there must be a way to map every promise by user, but I am not aware of.
We gonna create an array of user. Iterate over it to create an array of Promise we give to Promise.all. Then iterate on the answer to create an object that's matching the user with the associated answer from getUsersData.
Something to know about Promise.all is that the order of the returned data depends on the order of the promises given in entry of it.
const users = ['user1', 'user2', 'user3'];
const rets = await Promise.all(users.map(x => this.myApiHelper.getUsersData(x)));
const retWithUser = rets.map((x, xi) => ({
user: users[xi],
ret: x,
}));
Here you have a great tutorial about Array methods (map, filter, some...).
Though the answer accepted by me is one of the solution, which I think is perfect and I will keep as accepted answer but I would like to post my solution which I figured out later which is much simpler:
We could simply use this:
let finalData = {};
let users = ["user1", "user2", "user3"];
await Promise.all(users.map(async(eachUser) => {
finalData[user] = await this.myApiHelper.getUsersData(eachUser);
}))
console.log(finalData);
as described here
How to use Promise.all with an object as input
Here is a simple ES2015 function that takes an object with properties that might be promises and returns a promise of that object with resolved properties.
function promisedProperties(object) {
let promisedProperties = [];
const objectKeys = Object.keys(object);
objectKeys.forEach((key) => promisedProperties.push(object[key]));
return Promise.all(promisedProperties)
.then((resolvedValues) => {
return resolvedValues.reduce((resolvedObject, property, index) => {
resolvedObject[objectKeys[index]] = property;
return resolvedObject;
}, object);
});
}
And then you will use it like so:
var promisesObject = {};
users.map((element, index) => object[element] = this.myApiHelper.getUsersData(users[index]));
promisedProperties(object).then(response => console.log(response));
Note that first of all you need an object with key/promise.