I have a Node app in which there is a Gremlin client:
var Gremlin = require('gremlin');
const client = Gremlin.createClient(
    443,
    config.endpoint,
    {
        "session": false,
        "ssl": true,
        "user": `/dbs/${config.database}/colls/${config.collection}`,
        "password": config.primaryKey
    }
);
With this client I then make calls to a Cosmos DB to add some records, using:
async.forEach(pData, function (data, innercallback) {
    if (data.type == 'Full') {
        client.execute("g.addV('test').property('id', \"" + data.$.id + "\")", {}, innercallback);
    } else {
        innercallback(null);
    }
}, outercallback);
However, on the Azure side there is a limit of 400 requests/second, and subsequently I get the error:
ExceptionType : RequestRateTooLargeException
ExceptionMessage : Message: {"Errors":["Request rate is large"]}
Does anyone have any ideas on how I can restrict the number of requests made per second, without having to scale up on Azure (as that costs more :) )?
Additionally:
I tried using
async.forEachLimit(pData, 400, function (data, innercallback) {
    if (data.type == 'Full') {
        client.execute("g.addV('test').property('id', \"" + data.$.id + "\")", {}, innercallback);
    } else {
        innercallback(null);
    }
}, outercallback);
However, I keep seeing RangeError: Maximum call stack size exceeded if the limit is too high, and if I reduce it I just get the same request-rate-too-large exception.
Thanks.
RangeError: Maximum call stack size exceeded
That might happen because innercallback is called synchronously in the else case. It should be:
} else {
    process.nextTick(function() {
        innercallback(null);
    });
}
The call to forEachLimit looks generally correct, but you need to make sure that when a request is actually fired (the if branch), innercallback is not called earlier than one second later, to guarantee that no more than 400 requests are fired in one second. The easiest approach is to delay the callback execution by exactly one second:
client.execute("g.addV('test').property('id', \"" + data.$.id + "\")", {},
function(err) {
setTimeout(function() { innercallback(err); }, 1000);
});
A more accurate solution would be to measure the actual request+response time and setTimeout only for the time remaining until the full second has elapsed.
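For illustration, a minimal sketch of that more accurate variant (client, data, and innercallback come from the snippets above; the fixed 1000 ms budget is the assumption being made here):

var start = Date.now();
client.execute("g.addV('test').property('id', \"" + data.$.id + "\")", {},
    function(err) {
        var elapsed = Date.now() - start;            // actual request+response time
        var remaining = Math.max(0, 1000 - elapsed); // wait only for the rest of the second
        setTimeout(function() { innercallback(err); }, remaining);
    });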
As a further improvement, it looks like you can filter your pData array before doing the async work to get rid of the if...else, so eventually:
var pDataFull = pData.filter(function(data) {
    return data.type == 'Full';
});
async.forEachLimit(pDataFull, 400, function (data, innercallback) {
    client.execute("g.addV('test').property('id', \"" + data.$.id + "\")", {},
        function(err) {
            setTimeout(function() { innercallback(err); }, 1000);
        }
    );
}, outercallback);
Let's clear something up first. You don't have a 400 requests/second collection but a 400 RU/s collection. RU stands for request unit, and one request does not translate to one RU.
Roughly:
A read of a document that's 1KB in size will cost 1 RU.
A modification of a document that's 1KB in size will cost about 5 RU.
Assuming your documents are 1KB in size, you can only add around 80 documents per second.
Now that we have that out of the way, it sounds like async.queue() can do the trick for you.
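A minimal sketch of that idea, reusing client, pData, and outercallback from the question (the ~80 writes/second budget comes from the RU arithmetic above, and the 13 ms spacing is an approximation, not a definitive implementation):

var async = require('async');

// One worker; after each write completes, wait ~13 ms before taking the
// next task, so that at most roughly 80 writes complete per second.
var writeQueue = async.queue(function (task, done) {
    client.execute("g.addV('test').property('id', \"" + task.id + "\")", {}, function (err) {
        setTimeout(function () { done(err); }, 13);
    });
}, 1);

pData.filter(function (d) { return d.type == 'Full'; })
     .forEach(function (d) { writeQueue.push({ id: d.$.id }); });

writeQueue.drain = function () { outercallback(); };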
I am new to Node.js. I want to limit my external API calls to 5 per minute. If I exceed 5 API calls per minute, I get the following error:
You have exceeded the maximum requests per minute.
This is my code. The tickerSymbol array that is passed to the scheduleTickerDetails function will be a large array with almost 100k elements in it.
public async scheduleTickerDetails(tickerSymbol: any) {
    for (let i = 0; i < tickerSymbol.length; i++) {
        if (i % 5 == 0) {
            await this.setTimeOutForSymbolAPICall(60000);
        }
        await axios.get('https://api.polygon.io/v1/meta/symbols/' + tickerSymbol[i] + '/company?apiKey=' + process.env.POLYGON_API_KEY)
            .then(async function (response: any) {
                console.log("Logo : " + response.data.logo + 'TICKER :' + tickerSymbol[i]);
                let logo = response.data.logo;
                if (await stockTickers.updateOne({ ticker: tickerSymbol[i] }, { $set: { "logo": logo } }))
                    return true;
                else
                    return false;
            })
            .catch(function (error: any) {
                console.log("Error from symbol service file : " + error + 'symbol:' + tickerSymbol[i]);
            });
    }
}
/**
 * Set a timeout before calling the symbol API
 * @param minute delay in milliseconds (despite the parameter name)
 * @return Promise
 */
public setTimeOutForSymbolAPICall(minute: any) {
    return new Promise(resolve => {
        setTimeout(() => { resolve(''); }, minute);
    });
}
I want to send the first 5 API calls, then after a minute the next 5, and so on. I created the setTimeOut function above for this, but sometimes I still see this in my console:
Error: Request failed with status code 429 : You've exceeded the maximum requests per minute.
The for loop in JS runs immediately to completion while all your asynchronous operations are started.
Refer to this answer.
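A hedged sketch of one way to respect the limit: process the symbols in chunks of 5 and wait a minute between chunks (written as plain JS; updateLogo is a hypothetical helper wrapping the axios call and the database update from the question):

async function scheduleTickerDetails(tickerSymbols) {
    for (let i = 0; i < tickerSymbols.length; i += 5) {
        const chunk = tickerSymbols.slice(i, i + 5);
        // Fire this chunk's five calls together and wait for all of them.
        await Promise.all(chunk.map(symbol => updateLogo(symbol)));
        // Wait a minute before starting the next chunk (skip the wait after the last one).
        if (i + 5 < tickerSymbols.length) {
            await new Promise(resolve => setTimeout(resolve, 60000));
        }
    }
}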
I took one of the sample functions from the Firestore documentation and was able to successfully run it from my local firebase environment. However, once I deployed to my firebase server, the function completes, but no entries are made in the firestore database. The firebase function logs show "Deadline Exceeded." I'm a bit baffled. Anyone know why this is happening and how to resolve this?
Here is the sample function:
exports.testingFunction = functions.https.onRequest((request, response) => {
    var data = {
        name: 'Los Angeles',
        state: 'CA',
        country: 'USA'
    };
    // Add a new document in collection "cities" with ID 'LA'
    var db = admin.firestore();
    var setDoc = db.collection('cities').doc('LA').set(data);
    response.status(200).send();
});
Firestore has limits, and the “Deadline Exceeded” error probably happens because of them. See https://firebase.google.com/docs/firestore/quotas, in particular:
Maximum write rate to a document: 1 per second
https://groups.google.com/forum/#!msg/google-cloud-firestore-discuss/tGaZpTWQ7tQ/NdaDGRAzBgAJ
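If repeated writes to the same document are the cause, a hedged workaround is to space those writes out to at most one per second. A minimal sketch, reusing the admin SDK from the sample above (the updates array and the { merge: true } choice are illustrative, not from the question):

const admin = require('firebase-admin');
const db = admin.firestore();

// Hypothetical updates that all target the same document.
const updates = [{ population: 1 }, { population: 2 }, { population: 3 }];

async function writeThrottled() {
    for (const data of updates) {
        await db.collection('cities').doc('LA').set(data, { merge: true });
        // Respect the "1 write per second per document" limit quoted above.
        await new Promise(resolve => setTimeout(resolve, 1000));
    }
}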
In my own experience, this problem can also happen when you try to write documents over a bad internet connection.
I use a solution similar to Jurgen's suggestion: insert documents in batches smaller than 500 at once. This error appears when I'm on an unstable Wi-Fi connection; when I plug in the cable, the same script with the same data runs without errors.
I have written this little script, which uses batch writes (max 500) and only writes one batch after another.
Use it by first creating a batch worker: let batch: any = new FbBatchWorker(db);. Then add anything to the worker with batch.set(ref.doc(docId), MyObject); and finish it via batch.commit().
The API is the same as for the normal Firestore batch (https://firebase.google.com/docs/firestore/manage-data/transactions#batched-writes); however, it currently only supports set and delete.
import { firestore } from "firebase-admin";

class FBWorker {
    callback: Function;

    constructor(callback: Function) {
        this.callback = callback;
    }

    work(data: {
        type: "SET" | "DELETE";
        ref: FirebaseFirestore.DocumentReference;
        data?: any;
        options?: FirebaseFirestore.SetOptions;
    }) {
        if (data.type === "SET") {
            // tslint:disable-next-line: no-floating-promises
            data.ref.set(data.data, data.options).then(() => {
                this.callback();
            });
        } else if (data.type === "DELETE") {
            // tslint:disable-next-line: no-floating-promises
            data.ref.delete().then(() => {
                this.callback();
            });
        } else {
            this.callback();
        }
    }
}

export class FbBatchWorker {
    db: firestore.Firestore;
    batchList2: {
        type: "SET" | "DELETE";
        ref: FirebaseFirestore.DocumentReference;
        data?: any;
        options?: FirebaseFirestore.SetOptions;
    }[] = [];
    elemCount: number = 0;
    private _maxBatchSize: number = 490;

    public get maxBatchSize(): number {
        return this._maxBatchSize;
    }

    public set maxBatchSize(size: number) {
        if (size < 1) {
            throw new Error("Size must be positive");
        }
        if (size > 490) {
            throw new Error("Size must not be larger than 490");
        }
        this._maxBatchSize = size;
    }

    constructor(db: firestore.Firestore) {
        this.db = db;
    }

    async commit(): Promise<any> {
        const workerProms: Promise<any>[] = [];
        const maxWorker = this.batchList2.length > this.maxBatchSize ? this.maxBatchSize : this.batchList2.length;
        for (let w = 0; w < maxWorker; w++) {
            workerProms.push(
                new Promise((resolve) => {
                    const A = new FBWorker(() => {
                        if (this.batchList2.length > 0) {
                            A.work(this.batchList2.pop());
                        } else {
                            resolve();
                        }
                    });
                    // tslint:disable-next-line: no-floating-promises
                    A.work(this.batchList2.pop());
                }),
            );
        }
        return Promise.all(workerProms);
    }

    set(dbref: FirebaseFirestore.DocumentReference, data: any, options?: FirebaseFirestore.SetOptions): void {
        this.batchList2.push({
            type: "SET",
            ref: dbref,
            data,
            options,
        });
    }

    delete(dbref: FirebaseFirestore.DocumentReference) {
        this.batchList2.push({
            type: "DELETE",
            ref: dbref,
        });
    }
}
I tested this by having 15 concurrent AWS Lambda functions write 10,000 requests into the database, into different collections/documents, milliseconds apart. I did not get the DEADLINE_EXCEEDED error.
Please see the Firebase documentation:
'deadline-exceeded': Deadline expired before operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully. For example, a successful response from a server could have been delayed long enough for the deadline to expire.
In our case we are writing small amounts of data and it works most of the time, but losing data is unacceptable. I have not concluded why Firestore fails to write simple, small bits of data.
SOLUTION:
I am using an AWS Lambda function that uses an SQS event trigger.
# This function receives requests from the queue and handles them
# by persisting the survey answers for the respective users.
QuizAnswerQueueReceiver:
  handler: app/lambdas/quizAnswerQueueReceiver.handler
  timeout: 180 # The SQS visibility timeout should always be greater than the Lambda function’s timeout.
  reservedConcurrency: 1 # optional, reserved concurrency limit for this function. By default, AWS uses the account concurrency limit.
  events:
    - sqs:
        batchSize: 10 # Wait for 10 messages before processing.
        maximumBatchingWindow: 60 # The maximum amount of time in seconds to gather records before invoking the function.
        arn:
          Fn::GetAtt:
            - SurveyAnswerReceiverQueue
            - Arn
  environment:
    NODE_ENV: ${self:custom.myStage}
I am using a dead letter queue connected to my main queue for failed events.
Resources:
  QuizAnswerReceiverQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: ${self:provider.environment.QUIZ_ANSWER_RECEIVER_QUEUE}
      # VisibilityTimeout MUST be greater than the Lambda function's timeout: https://lumigo.io/blog/sqs-and-lambda-the-missing-guide-on-failure-modes/
      # The length of time during which a message will be unavailable after it is delivered from the queue.
      # This blocks other components from receiving the same message and gives the initial component time to process and delete the message from the queue.
      VisibilityTimeout: 900 # The SQS visibility timeout should always be greater than the Lambda function’s timeout.
      # The number of seconds that Amazon SQS retains a message. You can specify an integer value from 60 seconds (1 minute) to 1,209,600 seconds (14 days).
      MessageRetentionPeriod: 345600
      RedrivePolicy:
        deadLetterTargetArn:
          "Fn::GetAtt":
            - QuizAnswerReceiverQueueDLQ
            - Arn
        maxReceiveCount: 5 # The number of times a message is delivered to the source queue before being moved to the dead-letter queue.
  QuizAnswerReceiverQueueDLQ:
    Type: "AWS::SQS::Queue"
    Properties:
      QueueName: "${self:provider.environment.QUIZ_ANSWER_RECEIVER_QUEUE}DLQ"
      MessageRetentionPeriod: 1209600 # 14 days in seconds
If the error appears after around 10 seconds, it's probably not your internet connection; it might be that your functions are not returning any promise. In my experience I got the error simply because I had wrapped a Firebase set operation (which returns a promise) inside another promise.
You can do this
return db.collection("COL_NAME").doc("DOC_NAME").set(attribs).then(ref => {
var SuccessResponse = {
"code": "200"
}
var resp = JSON.stringify(SuccessResponse);
return resp;
}).catch(err => {
console.log('Quiz Error OCCURED ', err);
var FailureResponse = {
"code": "400",
}
var resp = JSON.stringify(FailureResponse);
return resp;
});
instead of
return new Promise((resolve, reject) => {
    db.collection("COL_NAME").doc("DOC_NAME").set(attribs).then(ref => {
        var SuccessResponse = {
            "code": "200"
        };
        var resp = JSON.stringify(SuccessResponse);
        return resp;
    }).catch(err => {
        console.log('Quiz Error OCCURED ', err);
        var FailureResponse = {
            "code": "400",
        };
        var resp = JSON.stringify(FailureResponse);
        return resp;
    });
});
I'm developing an app with the following Node.js stack: Express/Socket.IO + React. In React I have DataTables in which you can search, and with every keystroke the data gets dynamically updated! :)
I use Socket.IO for data fetching, so on every keystroke the client socket emits some parameters and the server then calls the callback to return data. This works like a charm, but it is not guaranteed that the returned data comes back in the same order the client sent the requests.
To simulate: when I type 'a', the server responds with that same 'a', and so on for every character.
I found the async module for Node.js and tried to use its queue to return tasks in the same order they were received. For simplicity I delayed the second incoming task with setTimeout to simulate a slow database query:
Declaration:
const async = require('async');
var queue = async.queue(function(task, callback) {
    if (task.count == 1) {
        setTimeout(function() {
            callback();
        }, 3000);
    } else {
        callback();
    }
}, 10);
Usage:
socket.on('result', function(data, fn) {
    var filter = data.filter;
    if (filter.length === 1) { // TEST SYNCHRONOUSLY
        queue.push({name: filter, count: 1}, function(err) {
            fn(filter);
            // console.log('finished processing slow');
        });
    } else {
        // add some items to the queue
        queue.push({name: filter, count: filter.length}, function(err) {
            fn(data.filter);
            // console.log('finished processing fast');
        });
    }
});
But the way I receive it in the client console when I search for abc is as follows:
ab -> abc -> a (after 3 sec)
I want to receive it like this: a (after 3 sec) -> ab -> abc
My thought is that the queue runs the setTimeout, moves on, and the setTimeout eventually fires later on the event loop. This results in later search filters being returned earlier than the slow-performing one.
How can I solve this problem?
First a few comments, which might help clear up your understanding of async calls:
Using "timeout" to try and align async calls is a bad idea, that is not the idea about async calls. You will never know how long an async call will take, so you can never set the appropriate timeout.
I believe you are misunderstanding the usage of queue from the async library. The documentation for the queue can be found here.
Copy-pasting the documentation here, in case things change or the link goes down:
Creates a queue object with the specified concurrency. Tasks added to the queue are processed in parallel (up to the concurrency limit). If all workers are in progress, the task is queued until one becomes available. Once a worker completes a task, that task's callback is called.
The above means that the queue can simply be used to prioritize which async tasks a given worker performs. The different async tasks can still finish at different times.
Potential solutions
There are a few solutions to your problem, depending on your requirements.
1. Send only one async call at a time and wait for it to finish before sending the next one.
2. Store the results and only display them to the user when all calls have finished.
3. Disregard all calls except for the latest async call.
In your case I would pick solution 3, since you are dealing with a search. Why would you care about the results for "a" if the user is already searching for "abc" before the response for "a" arrives?
This can be done by giving each request a timestamp (or an increasing counter) and then keeping only the response that carries the latest one.
SOLUTION:
Server:
exports = module.exports = function(io) {
    io.sockets.on('connection', function (socket) {
        socket.on('result', function(data, fn) {
            var filter = data.filter;
            var counter = data.counter;
            if (filter.length === 1 || filter.length === 5) { // TEST SYNCHRONOUSLY
                setTimeout(function() {
                    fn({ filter: filter, counter: counter }); // return to client
                }, 3000);
            } else {
                fn({ filter: filter, counter: counter }); // return to client
            }
        });
    });
};
Client:
export class FilterableDataTable extends Component {
    constructor(props) {
        super(props);
        this.state = {
            endpoint: "http://localhost:3001",
            filters: {},
            counter: 0
        };
        this.onLazyLoad = this.onLazyLoad.bind(this);
    }

    onLazyLoad(event) {
        var offset = event.first;
        if (offset === null) {
            offset = 0;
        }
        var filter = ''; // filter is the search character
        if (event.filters.result2 != undefined) {
            filter = event.filters.result2.value;
        }
        var returnedData = null;
        var counter = this.state.counter + 1;
        this.setState({ counter: counter }); // don't mutate state directly
        this.socket.emit('result', {
            offset: offset,
            limit: 20,
            filter: filter,
            counter: counter
        }, (data) => { // arrow function keeps `this` bound to the component
            returnedData = data;
            console.log(returnedData);
            if (returnedData.counter === this.state.counter) {
                console.log('DATA: ' + JSON.stringify(returnedData));
            }
        });
    }
}
This does, however, send unneeded data to the client, which then ignores it. Does anybody have ideas for further optimizing this kind of communication? For example, a method to keep track of stale requests on the server and only send the latest response?
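One possible server-side refinement, as a sketch under the assumption that each connection only ever needs its own latest result (latestCounter and runQuery are names invented here; runQuery stands in for the real async database call): keep the most recent counter per connection and simply skip replying to anything older.

io.sockets.on('connection', function (socket) {
    var latestCounter = 0; // highest counter seen on this connection

    socket.on('result', function (data, fn) {
        latestCounter = Math.max(latestCounter, data.counter);
        runQuery(data.filter, function (rows) { // hypothetical async DB call
            if (data.counter === latestCounter) {
                fn({ filter: data.filter, counter: data.counter, rows: rows });
            }
            // otherwise a newer search superseded this one: send nothing
        });
    });
});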
exports.updateFullCentralRecordSheet = function (req, _id, type) {
    FullCentralRecordSheet.remove({_ExternalParty: _id, centralRecordType: type, centralSheetType: "Central Sheet"}, function (err) {
        if (err) {
            saveErrorLog(req, err);
        }
        let query = {"structure.externalPartyRelationships": {$elemMatch: {_ExternalParty: _id}}, disabled: {$mod: [2, 0]}, initialized: true, profitLossType: type};
        let fullCentralRecordSheetObjects = [];
        ProfitLossSheet.find(query).sort({profitLossDate: 1}).lean().exec(function (err, profitLossSheetObjects) {
            if (err) {
                saveErrorLog(req, err);
            }
            async.each(profitLossSheetObjects, function (profitLossSheetObject, callback) {
                /// HEAVY COMPUTATION HERE
                callback();
            }, function (err) { // final callback of async.each (the stray `});` before it has been removed)
                if (err) {
                    saveErrorLog(req, err);
                } else {
                    query = {centralRecordMode: {$in: ["Payment In", "Payment Out", "Transfer", "General Out"]}, disabled: {$mod: [2, 0]}, centralRecordType: {$in: ["Split", type]}, _ExternalParty: _id, status: {$ne: "Reject"}};
                    CentralRecordSheet.find(query).lean().exec(function (err, centralRecordSheetObjects) {
                        if (err) {
                            saveErrorLog(req, err);
                        }
                        _.each(centralRecordSheetObjects, function (centralRecordSheetObject) {
                            // SOME MORE PROCESSING
                        });
                        fullCentralRecordSheetObjects = _.sortBy(fullCentralRecordSheetObjects, function (fullCentralRecordSheetObject) {
                            return new Date(fullCentralRecordSheetObject.centralRecordDate).getTime();
                        });
                        let runningBalance = 0;
                        _.each(fullCentralRecordSheetObjects, function (fullCentralRecordSheetObject) {
                            runningBalance = runningBalance - fullCentralRecordSheetObject.paymentIn.total + fullCentralRecordSheetObject.paymentOut.total + fullCentralRecordSheetObject.moneyIn.total - fullCentralRecordSheetObject.moneyOut.total + fullCentralRecordSheetObject.transferIn.total - fullCentralRecordSheetObject.transferOut.total;
                            fullCentralRecordSheetObject.balance = runningBalance;
                            const newFullCentralSheetRecordObject = new FullCentralRecordSheet(fullCentralRecordSheetObject);
                            newFullCentralSheetRecordObject.save(); // Asynchronous save
                        });
                    });
                }
            });
        });
    });
};
This is my code to process some data and save it to the database. As you can see, there is some computation involved in each async iteration, and after the loop there is final processing of the data. It works fine if I pass in one _id at a time. However, when I try to do the task like this:
exports.refreshFullCentralRecordSheetObjects = function (req, next) {
    ExternalParty.find().exec(function (err, externalPartyObjects) {
        if (err) {
            utils.saveErrorLog(req, err);
            return next(err, null, [req.__(err.message)], []);
        }
        _.each(externalPartyObjects, function (externalPartyObject) {
            updateFullCentralRecordSheet(req, externalPartyObject._id, "Malay");
            updateFullCentralRecordSheet(req, externalPartyObject._id, "Thai");
        });
        return next(err, null, ["Ddd"], ["Ddd"]);
    });
};
I have about 273 objects to loop through. This causes a fatal memory error. I tried increasing --max-old-space-size=16000 but it still crashes. I used Task Manager to track the memory of the node.exe process, and it goes over 8 GB.
I am not sure why increasing the memory to 16 GB does not help; it still crashes at around 8 GB (according to Task Manager). Another thing: when I try to only process 10 records instead of 273, Task Manager reports about 500 MB of usage. This 500 MB will not disappear unless I make another request to the server. I find this very odd, because why isn't Node.js garbage collecting after it is done processing the 10 records? Those 10 records are successfully processed and saved to the database, yet the memory usage remains unchanged in Task Manager.
I tried using async.forEachLimit, turning my update function into an asynchronous one, and playing around with process.nextTick(), but I still get the fatal memory error. What can I do to make sure this runs?
Another thing: when I try to only process 10 records instead of 273, Task Manager reports about 500 MB of usage. This 500 MB will not disappear unless I make another request to the server. I find this very odd, because why isn't Node.js garbage collecting after it is done processing the 10 records? Those 10 records are successfully processed and saved to the database, yet the memory usage remains unchanged in Task Manager.
That's normal; Node's GC is lazy (GC is a synchronous operation that blocks the event loop, so laziness is a good thing).
Try paginating the query?
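A minimal sketch of what that pagination could look like here (the batch size and the processParty helper are assumptions; the point is to hold only one page of ExternalParty documents in memory at a time and let each batch finish before the next is loaded):

var BATCH = 25; // hypothetical page size

function processPage(skip, done) {
    ExternalParty.find().skip(skip).limit(BATCH).lean().exec(function (err, parties) {
        if (err || parties.length === 0) return done(err);
        async.eachSeries(parties, function (party, cb) {
            processParty(party, cb); // hypothetical async wrapper around updateFullCentralRecordSheet
        }, function (err) {
            if (err) return done(err);
            processPage(skip + BATCH, done); // the previous page can now be garbage-collected
        });
    });
}

processPage(0, function (err) { /* all 273 objects processed */ });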
I have an application where a database query returns a number of rows (typically fewer than 100). For each row, I need to make an HTTP call to get supplemental data. I'd like to fire off all of the requests and then, when the last callback completes, move on to rendering the result page.
So far, the answers to similar questions I've looked at have either chained the requests, making request #2 in the callback for request #1 (advantages: simple, avoids burying the server in multiple requests), or fired all of the requests with no tracking of whether they have all completed (which works well in the browser, where the callback updates the UI).
My current plan is to keep a counter of requests made and have each callback decrement it; if it reaches zero, I can call the render function. I may also need to handle the case where responses come in faster than requests are being made (not likely, but a possible edge case).
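For illustration, a minimal sketch of that counter pattern (fetchSupplemental and render are hypothetical stand-ins for the HTTP call and the page render):

var pending = rows.length;
rows.forEach(function (row) {
    fetchSupplemental(row, function (err, extra) { // hypothetical async HTTP call per row
        row.extra = extra;
        pending--;
        if (pending === 0) {
            render(rows); // the last callback just finished
        }
    });
});

Initializing pending to rows.length up front, rather than incrementing it as requests are fired, also sidesteps the edge case where early responses arrive before the loop has finished.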
Are there other useful patterns for this type of problem?
When using async, the code could look roughly like this:
var async = require('async');
var results = [];

var queue = async.queue(function(row, callback) {
    http.fetchResultForRow(row, function(data) {
        results.push(data); // was `result.push(data)`, a typo
        callback();
    });
}, 1);

queue.drain = function() {
    console.log("All results loaded");
    renderEverything(results);
};

database.fetch(function(rows) {
    for (var i = 0; i < rows.length; i++) {
        queue.push(rows[i]);
    }
});
If the order does not matter, you could also use map.
Look around in the documentation of async; there are a lot of useful patterns.
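A sketch of the same flow with async.map, reusing the hypothetical http.fetchResultForRow and renderEverything from above (async.map runs the calls in parallel but delivers results in input order):

var async = require('async');

database.fetch(function(rows) {
    async.map(rows, function(row, callback) {
        http.fetchResultForRow(row, function(data) {
            callback(null, data); // node-style callback: error first, then the value
        });
    }, function(err, results) {
        console.log("All results loaded");
        renderEverything(results);
    });
});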
You can implement this quite nicely with promises using the when library, though if you want to rate-limit the calls that fetch the extra info, you will need to do a little more work than in TheHippo's async approach, I think.
Here's an example:
var when = require('when');

// This is the function that gets the extra info.
// I've added a setTimeout to show how it is async.
function get_extra_info_for_row(x, callback) {
    setTimeout(function() { return callback(null, x + 10); }, 1);
}

var rows = [1, 2, 3, 4, 5];

var row_promises = rows.map(
    function(x) {
        var deferred = when.defer();
        get_extra_info_for_row(x, function(err, extra_info) {
            if (err) return deferred.reject(err);
            deferred.resolve([x, extra_info]);
        });
        return deferred.promise;
    });

when.all(row_promises)
    .then(
        function(augmented_rows) { console.log(augmented_rows); },
        function(err) { console.log("Error", err); }
    );
This outputs
[ [ 1, 11 ], [ 2, 12 ], [ 3, 13 ], [ 4, 14 ], [ 5, 15 ] ]