AWS Lambda function flow - node.js

I'm having some issues with how my functions flow in Lambda. I'm trying to grab a value stored in S3, increment it, and put it back. However, my program doesn't flow the way I expect it to. I'm using async.waterfall to run my functions in sequence.
Here's my code:
let AWS = require('aws-sdk');
let async = require('async');

let bucket = "MY_BUCKET";
let key = "MY_FILE.txt";

exports.handler = async (event) => {
    let s3 = new AWS.S3();

    async.waterfall([
        download,
        increment,
        upload
    ], function (err) {
        if (err) {
            console.error(err);
        } else {
            console.log("Increment successful");
        }
        console.log("test4");
        return null;
    });
    console.log("test5");

    function download(next) {
        console.log("test");
        s3.getObject({
            Bucket: bucket,
            Key: key
        }, next);
    }

    function increment(response, next) {
        console.log("test2");
        console.log(response.Body);
        let newID = parseInt(response.Body, 10) + 1;
        next(response.ContentType, newID);
    }

    function upload(contentType, data, next) {
        console.log("test3");
        s3.putObject({
            Bucket: bucket,
            Key: key,
            Body: data,
            ContentType: contentType
        }, next);
    }
};
I'm only getting test and test5 in my log. I was under the impression that after the download function, increment should run if it succeeded, or the callback function at the end of the waterfall should run if there was an error. The program doesn't give an error on execution, and it doesn't appear to reach either function.
Could someone guide me to what I'm missing in my understanding?
EDIT: So it seems my issue was related to my function declaration. The default template declared it as async (event). I thought this was odd, as usually they are declared as (event, context, callback). Switching to the latter (or even just (event) without the async) fixed this. It looks like my issue was with declaring the function as asynchronous. Did this block the waterfall's async calls? Can anyone elaborate on this?

Your problem is that your handler is declared as an async function, which creates a promise for you automatically; but since you never await anything, your function effectively ends synchronously.
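To see the shape of the problem, here is a minimal sketch (plain Node, no AWS, hypothetical names) of what your handler is doing: the async function's implicit promise resolves as soon as the synchronous code finishes, so the callbacks queued by the asynchronous work may never get a chance to run before Lambda freezes the environment.

const async = require('async');

exports.handler = async (event) => {
    async.waterfall([
        (next) => setTimeout(() => next(null, 'downloaded'), 100) // stands in for s3.getObject
    ], (err, result) => {
        console.log('test4', err, result); // in Lambda this may never be reached
    });
    console.log('test5');
    // Nothing is awaited, so the implicit promise resolves here and Lambda stops waiting.
};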
There are a couple of ways to solve this, all of which we'll go over.
1. Do not use promises; use callbacks, as the async library is designed to.
2. Do not use the async library or callbacks and instead use async/await.
3. Mix both together and make your own promise, resolving/rejecting it manually.
1. Do not use promises
In this solution, you would remove the async keyword and add the callback parameter Lambda is passing to you. Calling it will end the Lambda; passing it an error will signal that the invocation failed.
// Include the callback parameter that Lambda passes in
exports.handler = (event, context, callback) => {
    const params = [
        download,
        increment,
        upload
    ];
    async.waterfall(params, (err) => {
        // Calling the callback here ends the Lambda invocation
        if (err) return callback(err);    // error case
        callback(null, { ok: true });     // success case (the first argument must be null)
    });
};
2. Use async/await
The idea here is to not use the callback style but to instead use the Promise-based async/await keywords. If you return a promise, Lambda will use that promise to handle completion instead of the callback.
If a function is declared with the async keyword, it automatically returns a promise; this is transparent to your code.
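For example (a minimal standalone sketch, not part of the question's code):

async function addOne(n) {
    return n + 1;            // looks like a plain return...
}

addOne(1).then(console.log); // ...but the caller actually receives a Promise that resolves to 2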
To do this we need to modify your code to no longer use the async library and to make your other functions async as well.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const Bucket = "MY_BUCKET";
const Key = "MY_FILE.txt";

async function download() {
    const params = {
        Bucket,
        Key
    };
    return s3.getObject(params).promise(); // You can await or return a promise
}

function increment(response) {
    // This function is synchronous, no need for promises or callbacks
    const { ContentType: contentType, Body } = response;
    const newId = parseInt(Body, 10) + 1;
    return { contentType, newId };
}

async function upload({ contentType: ContentType, newId }) {
    const params = {
        Bucket,
        Key,
        Body: String(newId), // S3 expects a string or Buffer here, not a number
        ContentType
    };
    return s3.putObject(params).promise();
}
exports.handler = async (event) => {
    const obj = await download();  // await the promise completion
    const data = increment(obj);   // call synchronously, no await needed
    await upload(data);
    // The handler's promise will be resolved after the above have all
    // completed; the return value becomes the Lambda's return value.
    return { ok: true };
};
3. Mix promises and callbacks
In this approach we are still using the async library, which is callback-based, but our outer function is promise-based. That is fine, but in this scenario we need to make our own promise manually and resolve or reject it in the waterfall handler.
exports.handler = async (event) => {
    // In an async function you can either use one or more `await`'s or
    // return a promise, or both.
    return new Promise((resolve, reject) => {
        const steps = [
            download,
            increment,
            upload
        ];
        async.waterfall(steps, function (err) {
            // Instead of a callback we are calling resolve or reject
            // given to us by the promise we are running in.
            if (err) return reject(err);
            resolve({ ok: true });
        });
    });
};
Misc
In addition to the main callbacks-vs-promises problem you are encountering, there are a few minor issues I noticed:
Misc 1
You should be using const rather than let most of the time. The only time to use let is when you intend to reassign the variable, and most of the time you shouldn't be doing that. I would challenge you to find ways to write code that never requires let; it will help improve your code in general.
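A small standalone illustration (not taken from the question's code):

const body = "41";

// Reassignment forces `let`:
let counter = parseInt(body, 10);
counter = counter + 1;

// Computing the final value in one expression allows `const`:
const newId = parseInt(body, 10) + 1;

console.log(counter, newId); // 42 42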
Misc 2
You have an issue in one of your waterfall steps: you are passing response.ContentType as the first argument to next, which is a bug, because async will interpret that as an error. The signature of the callback is next(err, result), so you should be doing this in your increment and upload functions:
function increment(response, next) {
    const { ContentType: contentType, Body: body } = response;
    const newId = parseInt(body, 10) + 1;
    next(null, { contentType, newId }); // pass null for err
}

function upload(result, next) {
    const { contentType, newId } = result;
    s3.putObject({
        Bucket: bucket,
        Key: key,
        Body: String(newId), // S3 expects a string or Buffer, not a number
        ContentType: contentType
    }, next);
}
If you don't pass null or undefined for err when calling next, async will interpret that value as an error, skip the rest of the waterfall, and go straight to the completion handler, passing in that error.
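A minimal sketch of that short-circuit behaviour (hypothetical steps, not the question's code):

const async = require('async');

async.waterfall([
    (next) => next(null, 1),                 // ok: err is null, 1 is passed along
    (value, next) => next('oops'),           // a truthy first argument is treated as an error
    (value, next) => next(null, value + 1)   // never runs
], (err, result) => {
    console.log(err);    // 'oops'
    console.log(result); // undefined
});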
Misc 3
What you need to know about context.callbackWaitsForEmptyEventLoop is that even if you complete the function correctly in one of the ways discussed above, your Lambda may still hang open and eventually time out rather than complete successfully. Based on your code sample you probably won't need to worry about that, but it can happen if something isn't closed properly, such as a persistent connection to a database or a websocket. Setting this flag to false at the beginning of your Lambda execution will cause the process to exit regardless of anything keeping the event loop alive, forcing those things to close ungracefully.
In the case below, your Lambda can do the work successfully and even return a success result, but it will hang open until it times out and be reported as an error. It can even be re-invoked over and over, depending on how it's triggered.
exports.handler = async (event) => {
    const db = await connect();
    await db.write(data);
    // await db.close() // Whoops, forgot to close my connection!
    return { ok: true };
};
In that case, simply calling db.close() would solve the issue, but sometimes it's not obvious what is hanging around in the event loop and you just need a sledgehammer-type solution to close the Lambda, which is what context.callbackWaitsForEmptyEventLoop = false is for!
exports.handler = async (event, context) => {
    context.callbackWaitsForEmptyEventLoop = false;
    const db = await connect();
    await db.write(data);
    return { ok: true };
};
The above will complete the Lambda as soon as the function returns, killing any connections or anything else still living in the event loop.

Your function terminates before the waterfall is resolved; that is, the asynchronous calls aren't executed at all. That is why you don't see the console.log calls from within the waterfall's asynchronous callbacks and only see the ones that run synchronously, the last of which is called immediately after the call to async.waterfall.
I'm not sure how well async.waterfall is supported by AWS Lambda, but since promises are natively supported and provide the same functionality (with fewer lines of code), you could use promises instead. Your code would look something like this:
module.exports.handler = (event, context) =>
    s3.getObject({
        Bucket: bucket,
        Key: key
    }).promise()
        .then(response => ({
            Body: String(parseInt(response.Body, 10) + 1),
            ContentType: response.ContentType,
        }))
        .then(modifiedResponse => s3.putObject({
            Bucket: bucket,
            Key: key,
            Body: modifiedResponse.Body,
            ContentType: modifiedResponse.ContentType
        }).promise())
        .catch(err => console.error(err));

Related

Async - Await issue using Twitter API

I'm currently trying to practice using an API, starting with the Twitter API.
I'm using the Twit package to connect to Twitter's API, but when I try to do a GET request I get
Promise { <pending> }
I have tried using async/await, but I'm not sure what I'm doing wrong here.
Here is my code:
const Twit = require('twit');
const twitterAPI = require('../secrets');

// Twit uses OAuth to establish a connection with Twitter
let T = new Twit({
    consumer_key: twitterAPI.apiKey,
    consumer_secret: twitterAPI.apiSecretKey,
    access_token: twitterAPI.accessToken,
    access_token_secret: twitterAPI.accessTokenSecret
})

const getUsersTweets = async (userName) => {
    let params = { screen_name: userName, count: 1 }
    const userTweets = await T.get('search/tweets', params, await function (err, data, response) {
        if (err) {
            return 'There was an Error', err.stack
        }
        return data
    })
    return userTweets
}

console.log(getUsersTweets('Rainbow6Game'));
console.log(getUsersTweets('Rainbow6Game'));
Problem
The biggest wrong assumption in the sample code is that T.get is expected to eventually resolve with the data you want.
const userTweets = await T.get('search/tweets', params, await function (err, data, response) {
    if (err) {
        return 'There was an Error', err.stack
    }
    return data // 'data' returned from here is not necessarily going to be received in 'userTweets' variable
})
The callback function provided as the last argument to T.get does not need to be preceded by await.
The 'data' returned from the callback function is not necessarily going to be received in the 'userTweets' variable. That depends entirely on how T.get is implemented and cannot be controlled from your code.
Reason
The thing to note here is that async/await works well with promise-returning functions that eventually resolve with the expected data; however, that is not guaranteed here.
Relying on the result of the asynchronous T.get call will probably not work, because it immediately returns a pending promise that may resolve with no data. The best-case scenario is that everything in your function works but getUsersTweets still returns undefined.
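This is also why the console.log(getUsersTweets('Rainbow6Game')) call at the bottom of the question prints a pending promise: an async function always returns a promise, and logging it synchronously happens before it settles. A minimal standalone sketch:

const getNumber = async () => 42;

console.log(getNumber());              // Promise { <pending> } – logged before the promise settles
getNumber().then(n => console.log(n)); // 42 – read the value via .then() or await instead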
Solution
The best solution is to make sure that your getUsersTweets function returns a promise which eventually resolves with the correct data. The following change is suggested:
const getUsersTweets = (userName) => {
    return new Promise((resolve, reject) => {
        let params = { screen_name: userName, count: 1 };
        T.get('search/tweets', params, function (err, data, response) {
            if (err) {
                return reject(err); // return so we don't also call resolve below
            }
            resolve(data);
        });
    });
};
The above function now returns a promise that resolves with the expected data and can be used in the following way:
const printTweets = async (userName) => {
    const tweets = await getUsersTweets(userName);
    console.log(tweets);
};

printTweets('Rainbow6Game');
From what I can see in your code, getUsersTweets is an async function, so it will return a promise. I'm assuming you will use this value in another function, so you will need to call it inside an async function and use await; otherwise you will always get a promise.
const logTweets = async (username) => {
    try {
        const userTweets = await getUsersTweets(username);
        // Do something with the tweets
        console.log(userTweets);
    } catch (err) {
        // catch any error
        console.log(err);
    }
};
If logging is all you want and you wrapped it inside a function in which you console.log it, you can call that function directly:
logTweets('someUsername');

AWS SQS Node.js script does not await

Trying to send several messages (from AWS SQS lambda, if that matters) but it's never waiting for the promises.
function getEndpoint(settings){
    return new Promise(function(resolve, reject) {
        // [...] more stuff here
    });
}
Which is then called in a loop:
exports.handler = async (event) => {
    var messages = [];
    event.Records.forEach(function(messageId, body) {
        // options object created from some stuff
        messages.push(getEndpoint(options).then(function(response){
            console.log("anything at all"); // NEVER LOGGED
        }));
    });
    await Promise.all(messages);
};
But the await seems to be flat-out skipped. I'm not sure how I'm getting Process exited before completing request with an explicit await. I have similar async/await and promise setups in other scripts that work, but I cannot spot what I've done wrong with this one.
You forgot to return something to lambda:
exports.handler = async (event) => {
    var messages = [];
    event.Records.forEach(function(messageId, body) {
        // options object created from some stuff
        messages.push(getEndpoint(options));
    });
    await Promise.all(messages);
    return 'OK';
};
this should also work:
exports.handler = (event) => { // async is not mandatory here
    var messages = [];
    event.Records.forEach(function(messageId, body) {
        // options object created from some stuff
        messages.push(getEndpoint(options));
    });
    return Promise.all(messages); // returning a promise
};
and you could use map:
exports.handler = (event) => { // async is not mandatory here
    const messages = event.Records.map(function(messageId, body) {
        // options object created from some stuff
        return getEndpoint(options);
    });
    return Promise.all(messages); // returning a promise
};
To understand why this happens, you must dive a bit into Lambda's implementation: it essentially waits for the function stack to be cleared, and since you did NOT return anything at all in there, the function stack was empty right after all the work was queued. Adding a simple return after the await call keeps the function stack from being empty, which means Lambda will wait for it to finish.
If you ran this on standard Node, your function would also return before the promises were finished, BUT your Node process would NOT exit until the stack was cleared. This is where Lambda diverges from stock Node.
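A small sketch of that difference in plain Node (hypothetical names, not the poster's code): the synchronous part of main finishes immediately, but the process stays alive until the pending timer completes, so the final log still appears.

function fakeRequest(id) {
    return new Promise(resolve => setTimeout(() => {
        console.log(`request ${id} finished`); // still prints, because the process waits for the event loop
        resolve(id);
    }, 100));
}

function main() {
    fakeRequest(1); // not awaited and not returned
    console.log('main returned');
}

main(); // prints "main returned", then "request 1 finished" before the process exits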

What's the right way to return from an AWS Lambda function in Node.js?

I am new to AWS Lambda and there is one thing I find very confusing.
So far, I found following options how to return from a function in Node.js:
1.
exports.handler = (event, context) => {
    context.succeed('ok');
}
2.
exports.handler = (event, context) => {
    context.done(null, 'ok');
}
3.
exports.handler = (event, context, callback) => {
    callback(null, 'ok');
}
4.
exports.handler = async event => {
    return "ok";
}
How are these different? Are there any functional or performance distinctions?
Can anyone explain the right way to terminate a function?
You're probably using Node.js 8.10, which is so far the latest Node.js version supported by AWS Lambda; otherwise the last (4.) snippet wouldn't work at all (due to a syntax error).
In Node.js 8.10 all the variants listed above are valid; most of them are still there only for compatibility with earlier runtime versions.
The first two (1. and 2.) are the oldest ones, and it's not recommended to use them anymore. The done(err?, res?) function is equivalent to the later-added callback(err?, res?), which was frequently used before Node.js 8.10 and still appears in a lot of code examples, even in the official documentation. It's a typical callback and can be used in asynchronous processing:
exports.handler = (event, context, callback) => {
    var params = {
        Bucket: "examplebucket",
        Key: "HappyFace.jpg"
    };
    s3.getObject(params, function(err, data) {
        if (err) return callback(err);
        callback(null, data);
    });
}
Nevertheless, this approach has all the general drawbacks of callbacks (callback hell), as the sketch below illustrates.
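For illustration, here is a minimal sketch (the copy.jpg key is hypothetical) of how quickly callbacks nest once one result has to feed the next request:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
    // read an object, write a derived copy, then read the copy back – three levels of nesting already
    s3.getObject({ Bucket: "examplebucket", Key: "HappyFace.jpg" }, function (err, original) {
        if (err) return callback(err);
        s3.putObject({ Bucket: "examplebucket", Key: "copy.jpg", Body: original.Body }, function (err) {
            if (err) return callback(err);
            s3.getObject({ Bucket: "examplebucket", Key: "copy.jpg" }, function (err, copy) {
                if (err) return callback(err);
                callback(null, copy);
            });
        });
    });
};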
As of Node.js 8.10 you can use promises together with the async/await syntactic sugar, which makes asynchronous code look synchronous. The AWS JavaScript SDK added a promise() function to almost all the functions that previously used callbacks. Now you can write:
exports.handler = async event => {
    var params = {
        Bucket: "examplebucket",
        Key: "HappyFace.jpg"
    };
    var data = await s3.getObject(params).promise();
    // here you process the already resolved data...
    return data;
    // or you can omit `await` here whatsoever:
    // return s3.getObject(params).promise();
}
This produces shorter, more elegant code that is easier for humans to read.
Of course, in the end you can choose whichever you like; just compare your example snippets...

Why does my lambda only work when I give a callback with some message?

I am trying to run a small snippet of Lambda code where I push data to S3 using Firehose. Here is my snippet:
const AWS = require( 'aws-sdk' );
var FIREhose = new AWS.Firehose();

exports.handler = async (event,context,callback) => {
    // TODO implement
    const response = {
        statusCode:200,
        Name:event.Name,
        Value:event.Value
    };
    const params = {
        DeliveryStreamName: 'kinesis-firehose',
        Record: { Data: new Buffer(JSON.stringify(response)) }
    };
    FIREhose.putRecord(params, (err, data) => {
        if (err) console.log(err, err.stack); // an error occurred
        else console.log(data);
    });
};
Here are my events
{
    "Name": "Mike",
    "Value": "66"
}
When I run this Lambda, all I get as a response is null. Since I am not passing any callback, Lambda runs the implicit callback by default and returns null. I see that no data is pushed to the S3 bucket.
But when I add a callback(null, "success") line at the end, like this:
    FIREhose.putRecord(params, (err, data) => {
        if (err) console.log(err, err.stack); // an error occurred
        else console.log(data);
    });
    callback(null,"success")
};
I see the data is pushed to S3. Why is that?
Do async functions always need a callback with some text appended to it?
Any help is appreciated. Thanks.
The problem here is that you're mixing your node.js lambda patterns.
Either you use an asynchronous function and return or throw:
exports.handler = async (event,context,callback) => {
    // code goes here.
    await FIREhose.putRecord(params).promise();
    return null; // or whatever result.
};
Or you use the callback approach:
exports.handler = (event,context,callback) => {
    // code goes here.
    FIREhose.putRecord(params)
        .promise()
        .then((data) => {
            // do stuff with data.
            // n.b. you could have used the cb instead of a promise here too.
            callback(null, null); // or whatever result.
        });
};
(There's a third way using context, but that's a very legacy approach.)
This is all due to how lambda works and detects when there's been a response.
In your first example (no callback), Lambda expects your handler to return a promise that it has to wait on to resolve or reject, which, in turn, becomes the response. However, you're not returning a promise (you return undefined), so there's nothing to wait for and the invocation ends immediately, quite probably before the putRecord call has completed.
When you used the callback though, you explicitly told Lambda that you're using the "old" way. The interesting thing about the callback approach is that, by default, it waits for Node's event loop to empty, which means the .putRecord call will probably complete.

NodeJs delay each promise within Promise.all()

I'm trying to update a tool that was created a while ago which uses nodejs (I am not a JS developer, so I'm trying to piece the code together) and am getting stuck at the last hurdle.
The new functionality will take in a swagger .json definition, compare the endpoints against the matching API Gateway on the AWS Service, using the 'aws-sdk' SDK for JS and then updates the Gateway accordingly.
The code runs fine on a small definition file (about 15 endpoints) but as soon as I give it a bigger one, I start getting tons of TooManyRequestsException errors.
I understand that this is due to my calls to the API Gateway service being too quick, and that a delay/pause is needed. This is where I am stuck.
I have tried adding:
a delay() to each promise being returned
running a setTimeout() in each promise
adding a delay to the Promise.all and Promise.mapSeries
Currently my code loops through each endpoint within the definition and then adds the response of each promise to a promise array:
promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath));
Once the loop is finished I run this:
return Promise.all(promises)
    .catch((err) => {
        winston.error(err);
    })
I have tried the same with a mapSeries (no luck).
It looks like the functions within the getMethodResponse promise are run immediately and hence, no matter what type of delay I add, they all still just execute. My suspicion is that I need to make getMethodResponse return a function and then use mapSeries, but I can't get this to work either.
Code I tried:
Wrapped the getMethodResponse in this:
return function(value){}
Then added this after the loop (and within the loop - no difference):
Promise.mapSeries(function (promises) {
    return 'a'();
}).then(function (results) {
    console.log('result', results);
});
I also tried many other suggestions from related answers.
Any suggestions please?
EDIT
As requested, some additional code to try to pinpoint the issue.
The code currently working with a small set of endpoints (within the Swagger file):
module.exports = (apiName, externalUrl) => {
    return getSwaggerFromHttp(externalUrl)
        .then((swagger) => {
            let paths = swagger.paths;
            let resourcePath = '';
            let resourceMethod = '';
            let promises = [];
            _.each(paths, function (value, key) {
                resourcePath = key;
                _.each(value, function (value, key) {
                    resourceMethod = key;
                    let statusList = [];
                    _.each(value.responses, function (value, key) {
                        if (key >= 200 && key <= 204) {
                            statusList.push(key)
                        }
                    });
                    _.each(statusList, function (value, key) { //Only for 200-201 range
                        //Working with small set
                        promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath))
                    });
                });
            });
            //Working with small set
            return Promise.all(promises)
                .catch((err) => {
                    winston.error(err);
                })
        })
        .catch((err) => {
            winston.error(err);
        });
};
I have since tried adding this in place of the return Promise.all():
Promise.map(promises, function() {
    // Promise.map awaits for returned promises as well.
    console.log('X');
}, {concurrency: 5})
    .then(function() {
        return console.log("y");
    });
This spits out something like the following (it's the same for each endpoint, and there are many):
Error: TooManyRequestsException: Too Many Requests
X
Error: TooManyRequestsException: Too Many Requests
X
Error: TooManyRequestsException: Too Many Requests
The AWS SDK is being called 3 times within each promise; the functions involved (all initiated from the getMethodResponse() function) are:
apigateway.getRestApisAsync()
return apigateway.getResourcesAsync(resourceParams)
apigateway.getMethodAsync(params, function (err, data) {}
The AWS SDK documentation states that this is typical behaviour when too many consecutive calls are made (too fast). I've had a similar issue in the past, which was resolved by simply adding a .delay(500) into the code being called, something like:
return apigateway.updateModelAsync(updateModelParams)
    .tap(() => logger.verbose(`Updated model ${updatedModel.name}`))
    .tap(() => bar.tick())
    .delay(500)
EDIT #2
In the name of thoroughness, I thought I'd include my entire .js file.
'use strict';

const AWS = require('aws-sdk');
let apigateway, lambda;
const Promise = require('bluebird');
const R = require('ramda');
const logger = require('../logger');
const config = require('../config/default');
const helpers = require('../library/helpers');
const winston = require('winston');
const request = require('request');
const _ = require('lodash');
const region = 'ap-southeast-2';
const methodLib = require('../aws/methods');
const emitter = require('../library/emitter');

emitter.on('updateRegion', (region) => {
    region = region;
    AWS.config.update({ region: region });
    apigateway = new AWS.APIGateway({ apiVersion: '2015-07-09' });
    Promise.promisifyAll(apigateway);
});

function getSwaggerFromHttp(externalUrl) {
    return new Promise((resolve, reject) => {
        request.get({
            url: externalUrl,
            header: {
                "content-type": "application/json"
            }
        }, (err, res, body) => {
            if (err) {
                winston.error(err);
                reject(err);
            }
            let result = JSON.parse(body);
            resolve(result);
        })
    });
}

/*
  Deletes a method response
*/
function deleteMethodResponse(httpMethod, resourceId, restApiId, statusCode, resourcePath) {
    let methodResponseParams = {
        httpMethod: httpMethod,
        resourceId: resourceId,
        restApiId: restApiId,
        statusCode: statusCode
    };
    return apigateway.deleteMethodResponseAsync(methodResponseParams)
        .delay(1200)
        .tap(() => logger.verbose(`Method response ${statusCode} deleted for path: ${resourcePath}`))
        .error((e) => {
            return console.log(`Error deleting Method Response ${httpMethod} not found on resource path: ${resourcePath} (resourceId: ${resourceId})`); // an error occurred
            logger.error('Error: ' + e.stack)
        });
}

/*
  Deletes an integration response
*/
function deleteIntegrationResponse(httpMethod, resourceId, restApiId, statusCode, resourcePath) {
    let methodResponseParams = {
        httpMethod: httpMethod,
        resourceId: resourceId,
        restApiId: restApiId,
        statusCode: statusCode
    };
    return apigateway.deleteIntegrationResponseAsync(methodResponseParams)
        .delay(1200)
        .tap(() => logger.verbose(`Integration response ${statusCode} deleted for path ${resourcePath}`))
        .error((e) => {
            return console.log(`Error deleting Integration Response ${httpMethod} not found on resource path: ${resourcePath} (resourceId: ${resourceId})`); // an error occurred
            logger.error('Error: ' + e.stack)
        });
}

/*
  Get Resource
*/
function getMethodResponse(httpMethod, statusCode, apiName, resourcePath) {
    let params = {
        httpMethod: httpMethod.toUpperCase(),
        resourceId: '',
        restApiId: ''
    }
    return getResourceDetails(apiName, resourcePath)
        .error((e) => {
            logger.unimportant('Error: ' + e.stack)
        })
        .then((result) => {
            //Only run the comparison of models if the resourceId (from the url passed in) is found within the AWS Gateway
            if (result) {
                params.resourceId = result.resourceId
                params.restApiId = result.apiId
                var awsMethodResponses = [];
                try {
                    apigateway.getMethodAsync(params, function (err, data) {
                        if (err) {
                            if (err.statusCode == 404) {
                                return console.log(`Method ${params.httpMethod} not found on resource path: ${resourcePath} (resourceId: ${params.resourceId})`); // an error occurred
                            }
                            console.log(err, err.stack); // an error occurred
                        }
                        else {
                            if (data) {
                                _.each(data.methodResponses, function (value, key) {
                                    if (key >= 200 && key <= 204) {
                                        awsMethodResponses.push(key)
                                    }
                                });
                                awsMethodResponses = _.pull(awsMethodResponses, statusCode); //List of items not found within the Gateway - to be removed.
                                _.each(awsMethodResponses, function (value, key) {
                                    if (data.methodResponses[value].responseModels) {
                                        var existingModel = data.methodResponses[value].responseModels['application/json']; //Check if there is currently a model attached to the resource / method about to be deleted
                                        methodLib.updateResponseAssociation(params.httpMethod, params.resourceId, params.restApiId, statusCode, existingModel); //Associate this model to the same resource / method, under the new response status
                                    }
                                    deleteMethodResponse(params.httpMethod, params.resourceId, params.restApiId, value, resourcePath)
                                        .delay(1200)
                                        .done();
                                    deleteIntegrationResponse(params.httpMethod, params.resourceId, params.restApiId, value, resourcePath)
                                        .delay(1200)
                                        .done();
                                })
                            }
                        }
                    })
                    .catch(err => {
                        console.log(`Error: ${err}`);
                    });
                }
                catch (e) {
                    console.log(`getMethodAsync failed, Error: ${e}`);
                }
            }
        })
};

function getResourceDetails(apiName, resourcePath) {
    let resourceExpr = new RegExp(resourcePath + '$', 'i');
    let result = {
        apiId: '',
        resourceId: '',
        path: ''
    }
    return helpers.apiByName(apiName, AWS.config.region)
        .delay(1200)
        .then(apiId => {
            result.apiId = apiId;
            let resourceParams = {
                restApiId: apiId,
                limit: config.awsGetResourceLimit,
            };
            return apigateway.getResourcesAsync(resourceParams)
        })
        .then(R.prop('items'))
        .filter(R.pipe(R.prop('path'), R.test(resourceExpr)))
        .tap(helpers.handleNotFound('resource'))
        .then(R.head)
        .then([R.prop('path'), R.prop('id')])
        .then(returnedObj => {
            if (returnedObj.id) {
                result.path = returnedObj.path;
                result.resourceId = returnedObj.id;
                logger.unimportant(`ApiId: ${result.apiId} | ResourceId: ${result.resourceId} | Path: ${result.path}`);
                return result;
            }
        })
        .catch(err => {
            console.log(`Error: ${err} on API: ${apiName} Resource: ${resourcePath}`);
        });
};

function delay(t) {
    return new Promise(function(resolve) {
        setTimeout(resolve, t)
    });
}

module.exports = (apiName, externalUrl) => {
    return getSwaggerFromHttp(externalUrl)
        .then((swagger) => {
            let paths = swagger.paths;
            let resourcePath = '';
            let resourceMethod = '';
            let promises = [];
            _.each(paths, function (value, key) {
                resourcePath = key;
                _.each(value, function (value, key) {
                    resourceMethod = key;
                    let statusList = [];
                    _.each(value.responses, function (value, key) {
                        if (key >= 200 && key <= 204) {
                            statusList.push(key)
                        }
                    });
                    _.each(statusList, function (value, key) { //Only for 200-201 range
                        promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath))
                    });
                });
            });
            //Working with small set
            return Promise.all(promises)
                .catch((err) => {
                    winston.error(err);
                })
        })
        .catch((err) => {
            winston.error(err);
        });
};
You apparently have a misunderstanding about what Promise.all() and Promise.map() do.
All Promise.all() does is keep track of a whole array of promises to tell you when the async operations they represent are all done (or one returns an error). When you pass it an array of promises (as you are doing), ALL those async operations have already been started in parallel. So, if you're trying to limit how many async operations are in flight at the same time, it's already too late at that point. So, Promise.all() by itself won't help you control how many are running at once in any way.
I've also noticed since then that this line, promises.push(getMethodResponse(resourceMethod, value, apiName, resourcePath)), seems to actually be executing the promises rather than simply adding them to the array. It seems like the last Promise.all() doesn't actually do much.
Yep, when you execute promises.push(getMethodResponse()), you are calling getMethodResponse() immediately right then. That starts the async operation immediately. That function then returns a promise and Promise.all() will monitor that promise (along with all the other ones you put in the array) to tell you when they are all done. That's all Promise.all() does. It monitors operations you've already started. To keep the max number of requests in flight at the same time below some threshold, you have to NOT START the async operations all at once like you are doing. Promise.all() does not do that for you.
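A small standalone sketch of that point (hypothetical startRequest function, not the poster's code): the work starts the moment the function is called, before Promise.all() ever sees the promises.

function startRequest(i) {
    console.log(`request ${i} started`); // logs immediately, when the promise is created
    return new Promise(resolve => setTimeout(() => resolve(i), 100));
}

const promises = [1, 2, 3].map(startRequest); // all three "requests" are already in flight here

Promise.all(promises).then(results => console.log('all done', results)); // only observes them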
For Bluebird's Promise.map() to help you at all, you have to pass it an array of DATA, not promises. When you pass it an array of promises that represent async operations that you've already started, it can do no more than Promise.all() can do. But, if you pass it an array of data and a callback function that can then initiate an async operation for each element of data in the array, THEN it can help you when you use the concurrency option.
Your code is pretty complex so I will illustrate with a simple web scraper that wants to read a large list of URLs, but for memory considerations, only process 20 at a time.
const rp = require('request-promise');

let urls = [...]; // large array of URLs to process

Promise.map(urls, function(url) {
    return rp(url).then(function(data) {
        // process scraped data here
        return someValue;
    });
}, {concurrency: 20}).then(function(results) {
    // process array of results here
}).catch(function(err) {
    // error here
});
In this example, hopefully you can see that an array of data items is being passed into Promise.map() (not an array of promises). This then allows Promise.map() to manage how and when the array is processed; in this case it uses the concurrency: 20 setting to make sure that no more than 20 requests are in flight at the same time.
Your effort to use Promise.map() was passing an array of promises, which does not help you since the promises represent async operations that have already been started:
Promise.map(promises, function() {
    ...
});
Then, in addition, you really need to figure out what exactly causes the TooManyRequestsException error, either by reading the documentation for the target API or by doing a whole bunch of testing, because a variety of things might cause it, and without knowing exactly what you need to control, it takes a lot of wild guessing to figure out what might work. The most common things an API might detect are:
Simultaneous requests from the same account or source.
Requests per unit of time from the same account or source (such as request per second).
The concurrency option in Promise.map() will easily help you with the first case, but it won't necessarily help with the second, since you can limit yourself to a low number of simultaneous requests and still exceed a requests-per-second limit. The second needs some actual time control. Inserting delay() statements will sometimes work, but even that is not a very direct way of managing it, and it will lead either to inconsistent control (something that works sometimes but not others) or to sub-optimal control (limiting yourself to something far below what you could actually use).
To manage to a request per second limit, you need some actual time control with a rate limiting library or actual rate limiting logic in your own code.
Here's an example of a scheme for limiting the number of requests per second you are making: How to Manage Requests to Stay Below Rate Limiting.
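As a rough illustration of that kind of time control, here is a minimal sketch (the doRequest function and the 250 ms gap are hypothetical, not the poster's code) that processes items one at a time with a fixed gap between calls, which caps the request rate at roughly one per interval:

function delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Process items sequentially, waiting `gapMs` between calls to stay under a requests-per-second limit
async function rateLimited(items, doRequest, gapMs) {
    const results = [];
    for (const item of items) {
        results.push(await doRequest(item));
        await delay(gapMs); // e.g. 250 ms keeps you at or below roughly 4 requests per second
    }
    return results;
}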
