How to extend the block time after the maximum number of invalid login attempts (node-rate-limiter-flexible) - node.js

Basically, I want to protect my login endpoint from brute-force attacks. The idea is that once a user has consumed the maximum number of invalid attempts (say 5 retries), the user is locked out, and every further invalid attempt extends the lock by another 30 seconds.
I am protecting that endpoint with the node-rate-limiter-flexible package. (Feel free to suggest a better library for this.)
const opts = {
  points: 5,    // 5 attempts
  duration: 30, // per 30 seconds
};
const rateLimiter = new RateLimiterMemory(opts);
rateLimiter.consume(userid)
  .then((rateLimiterRes) => {
    // Login endpoint code
  })
  .catch((rateLimiterRes) => {
    // Too many invalid attempts
  });
The above code works fine: after a maximum of 5 invalid attempts, the user is blocked for 30 seconds. What I want is that, once the user has consumed the maximum number of invalid attempts, every further invalid attempt extends the block by another 30 seconds (i.e. the block time gradually increases with each invalid attempt, up to a maximum of 1 day).

Increase rateLimiterRes.msBeforeNext by 30 seconds every time the userId gets blocked, and use the rateLimiter.block method to set the new block duration.
rateLimiter.consume(userid)
  .then((rateLimiterRes) => {
    // Login endpoint code
  })
  .catch((rateLimiterRes) => {
    const newBlockLifetimeSecs = Math.round(rateLimiterRes.msBeforeNext / 1000) + 30;
    rateLimiter.block(userid, newBlockLifetimeSecs)
      .then(() => {
        // Too many invalid attempts
      })
      .catch(() => {
        // In case a store-backed limiter is used (not in-memory)
      });
  });
There is also an example of Fibonacci-like increase of the block duration on the package's wiki.
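To respect the 1-day cap mentioned in the question, the new block lifetime can simply be clamped before calling block. A minimal sketch of that variant (the MAX_BLOCK_SECS constant is illustrative, not part of the library):
const MAX_BLOCK_SECS = 24 * 60 * 60; // cap the block at 1 day

rateLimiter.consume(userid)
  .then((rateLimiterRes) => {
    // Login endpoint code
  })
  .catch((rateLimiterRes) => {
    // Extend the current block by 30 seconds, but never beyond 1 day
    const extendedSecs = Math.round(rateLimiterRes.msBeforeNext / 1000) + 30;
    const newBlockLifetimeSecs = Math.min(extendedSecs, MAX_BLOCK_SECS);
    rateLimiter.block(userid, newBlockLifetimeSecs)
      .then(() => {
        // Too many invalid attempts; blocked for newBlockLifetimeSecs
      })
      .catch(() => {
        // In case a store-backed limiter is used (not in-memory)
      });
  });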

Related

How to handle adding up to 100K entries to Firestore database in a Node.js application

Here is my function where I am trying to save data extracted from an Excel file. I am using the XLSX npm package to extract the data from the Excel file.
function myFunction() {
  const excelFilePath = "/ExcelFile2.xlsx"
  if (fs.existsSync(path.join('uploads', excelFilePath))) {
    const workbook = XLSX.readFile(`./uploads${excelFilePath}`)
    const [firstSheetName] = workbook.SheetNames;
    const worksheet = workbook.Sheets[firstSheetName];
    const rows = XLSX.utils.sheet_to_json(worksheet, {
      raw: false, // Use raw values (true) or formatted strings (false)
      // header: 1, // Generate an array of arrays ("2D Array")
    });
    // res.send({rows})
    const serviceAccount = require('./*******-d75****7a06.json');
    admin.initializeApp({
      credential: admin.credential.cert(serviceAccount)
    });
    const db = admin.firestore()
    rows.forEach((value) => {
      db.collection('users').doc().onSnapshot((snapShot) => {
        docRef.set(value).then((respo) => {
          console.log("Written")
        })
        .catch((reason) => {
          console.log(reason.note)
        })
      })
    })
    console.log(rows.length)
  }
}
Here is the error that I am getting; this process uses up all of my system memory:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
It's pretty normal in Firebase/Firestore land to have errors like this when trying to add too much data at once.
Firebase Functions tend to time out, and even if you configure them to be able to run all the way to 9 minutes, they'll still time out eventually and you end up with partial data and/or errors.
Here's how I do things like this:
Write a function that writes 500 entries at a time (using batch write)
Use an entry identifier (let's call it userId), so the function knows which was the last user recorded to the database. Let's call it lastUserRecorded.
After each iteration (batch write of 500 entries), have your function record the value of lastUserRecorded inside a temporary document in the database.
When the function runs again, it should first read the value of lastUserRecorded from the db, then write a new batch of 500 users starting AFTER that value (it would select a new set of 500 users from your Excel file, but start after the value of lastUserRecorded).
To avoid running into function timeout issues, I would schedule the function to run every minute (Cloud Scheduler trigger). This way, it's very highly likely that the function will be able to handle the batch of 500 writes, without timing out and recording partial data.
If you do it this way, 100k entries will take around 3 hours and 34 minutes to finish.
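A rough sketch of that approach, assuming each row has a userId column, the checkpoint lives in a hypothetical meta/importState document, and the scheduled function re-reads the Excel file on every run (in practice you would likely read it from Cloud Storage; all names here are placeholders, not the asker's actual schema):
const admin = require('firebase-admin');
const functions = require('firebase-functions');
const XLSX = require('xlsx');

admin.initializeApp();
const db = admin.firestore();

const BATCH_SIZE = 500; // Firestore allows at most 500 operations per batched write

// Runs once a minute via Cloud Scheduler
exports.importUsers = functions.pubsub.schedule('every 1 minutes').onRun(async () => {
  const workbook = XLSX.readFile('./uploads/ExcelFile2.xlsx');
  const rows = XLSX.utils.sheet_to_json(workbook.Sheets[workbook.SheetNames[0]], { raw: false });

  // Read the checkpoint: the last user that was written in the previous run
  const stateRef = db.doc('meta/importState');
  const stateSnap = await stateRef.get();
  const lastUserRecorded = stateSnap.exists ? stateSnap.data().lastUserRecorded : null;

  // Select the next 500 rows, starting AFTER the last recorded user
  const startIndex = lastUserRecorded === null
    ? 0
    : rows.findIndex((row) => row.userId === lastUserRecorded) + 1;
  const chunk = rows.slice(startIndex, startIndex + BATCH_SIZE);
  if (chunk.length === 0) return null; // import finished

  const batch = db.batch();
  chunk.forEach((row) => batch.set(db.collection('users').doc(), row));
  await batch.commit();

  // Record the checkpoint so the next run knows where to continue
  await stateRef.set({ lastUserRecorded: chunk[chunk.length - 1].userId });
  console.log(`Wrote ${chunk.length} users; last recorded: ${chunk[chunk.length - 1].userId}`);
  return null;
});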

Improve HTTP request response time

I've created a Node.js script that makes an HTTP request every 50 ms, but the responses take longer and longer to arrive as the number of outstanding requests grows.
How can I improve the response time?
function makeRequest() {
  superagent
    .post('http://example.com')
    .send({ "test": "test" })
    .set('Connection', 'keep-alive')
    .then(console.log, console.log);
}

setInterval(() => makeRequest(), 50);
This is troublesome code. If your http request takes longer than 50ms to complete, then the number of active requests in flight will get larger and larger until eventually, you will consume too many system resources (sockets, memory, etc...). Things may get slower and slower or you may actually exhaust some resource and start to get errors or crash.
In addition, you don't want to be hitting the target server with thousands of simultaneous requests as it may also slow down under that type of load. This type of issue can also lead to an avalanche failure where a slight delay in the responsiveness of the response causes sudden build-up of requests which slows down the target server which leads to more build-up which quickly gets out of control and something dies. It's important to always code these types of things to avoid any sort of avalanche failure.
What I would suggest is making a new request some fixed number of ms after completion of the previous request (so there is only one request at a time in flight). A more complicated version would make a new request 50ms from when the previous one started, but not before the previous one finishes. This way, you'd only ever have one request in flight at a time, they would never build up and accumulate, and resource usage should stay fairly constant rather than building over time, even if the target server gets slow for some reason.
Here's a way to make the next request after the completion of the previous request and no more often than once every 50ms:
function makeRequest() {
  return superagent
    .post('http://example.com')
    .send({ "test": "test" })
    .set('Connection', 'keep-alive');
}

function delay(t) {
  return new Promise(resolve => {
    setTimeout(resolve, t);
  });
}

function run() {
  const repeatTime = 50;
  const startTime = Date.now();
  return makeRequest().catch(err => {
    console.log(err);
    // decide here if you want to keep going or not
    // if so, then just return
    // if not, then throw
  }).then(result => {
    console.log(result);
    let delta = Date.now() - startTime;
    if (delta < repeatTime) {
      // wait until at least repeatTime has passed before starting the next request
      return delay(repeatTime - delta).then(run);
    } else {
      return run();
    }
  }).catch(() => {
    // aborted because of error
  });
}

run();

Run a Cron Job every 30mins after onCreate Firestore event

I want to have a cron job/scheduler that will run every 30 minutes after an onCreate event occurs in Firestore. The cron job should trigger a cloud function that picks up the documents created in the last 30 minutes, validates them against a JSON schema, and saves them in another collection. How do I achieve this, i.e. programmatically write such a scheduler?
What would also be a fail-safe mechanism, and some sort of queuing/tracking of the documents created before the cron job runs, so they can be pushed to the other collection?
Building a queue with Firestore is simple and fits your use case perfectly. The idea is to write tasks to a queue collection with a due date; the tasks are then processed once they are due.
Here's an example.
Whenever your initial onCreate event for your collection occurs, write a document with the following data to a tasks collection (a sketch of such a trigger follows the field list):
duedate: new Date() + 30 minutes
type: 'yourjob'
status: 'scheduled'
data: '...' // <-- put whatever data here you need to know when processing the task
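A minimal sketch of that trigger, assuming the watched collection is called items (the collection name and data payload are placeholders) and using the same functions/admin/db handles as the snippets below:
export const enqueueTask = functions.firestore
  .document('items/{itemId}')
  .onCreate((snapshot) =>
    db.collection('tasks').add({
      // due 30 minutes after creation
      duedate: admin.firestore.Timestamp.fromMillis(Date.now() + 30 * 60 * 1000),
      type: 'yourjob',
      status: 'scheduled',
      data: { path: snapshot.ref.path }, // whatever the worker needs later
    })
  );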
Have a worker pick up available work regularly - e.g. every minute depending on your needs
// Define what happens for each task type
const workers: Record<string, (data: any) => Promise<unknown>> = {
  yourjob: (data) => db.collection('xyz').add({ foo: data }),
}

// The following needs to be scheduled
export const checkQueue = functions.https.onRequest(async (req, res) => {
  // Consistent timestamp
  const now = admin.firestore.Timestamp.now();
  // Check which tasks are due
  const query = db.collection('tasks')
    .where('duedate', '<=', now)
    .where('status', '==', 'scheduled');
  const tasks = await query.get();
  // Process tasks and mark them in the queue as done
  const jobs: Promise<unknown>[] = [];
  tasks.forEach(snapshot => {
    const { type, data } = snapshot.data();
    console.info('Executing job for task ' + JSON.stringify(type) + ' with data ' + JSON.stringify(data));
    const job = workers[type](data)
      // Update the task doc with status or error
      .then(() => snapshot.ref.update({ status: 'complete' }))
      .catch((err) => {
        console.error('Error when executing worker', err);
        return snapshot.ref.update({ status: 'error' });
      });
    jobs.push(job);
  });
  return Promise.all(jobs).then(() => {
    res.send('ok');
    return true;
  }).catch((onError) => {
    console.error('Error', onError);
  });
});
You have different options to trigger the checking of the queue if there is a task that is due:
Using an HTTP function as in the example above. This requires you to perform an HTTP call to this function regularly so it executes and checks whether there is a task to be done. Depending on your needs, you could do it from your own server or use a service like cron-job.org to perform the calls. Note that the HTTP function will be publicly available, so others could potentially call it as well; however, if you make your check code idempotent, that shouldn't be an issue.
Use Firebase's "internal" cron option, which relies on Cloud Scheduler. With that, you can trigger the queue check directly:
export const scheduledFunctionCrontab = functions.pubsub
  .schedule('* * * * *')
  .onRun((context) => {
    console.log('This will be run every minute!');
    // Include the checkQueue code from above here
  });
Using such a queue also makes your system more robust: if something goes wrong in between, you will not lose tasks that would otherwise only exist in memory; as long as they are not marked as processed, a fixed worker will pick them up and reprocess them. This of course depends on your implementation.
You can trigger a cloud function on the Firestore create event which schedules a Cloud Task to run 30 minutes later. This gives you a queuing and retry mechanism.
An easy way is to add a created field with a timestamp, and then have a scheduled function run at a predefined period (say, once a minute) and execute your code for all records where created >= NOW - 31 mins AND created <= NOW - 30 mins (pseudocode). If your time precision requirements are not extremely high, that should work for most cases.
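A rough sketch of that scheduled check, assuming the documents live in a records collection, created is stored as a Firestore Timestamp, and validated documents go to a hypothetical validated collection (all names are placeholders):
export const processRecentRecords = functions.pubsub
  .schedule('every 1 minutes')
  .onRun(async () => {
    const now = Date.now();
    const from = admin.firestore.Timestamp.fromMillis(now - 31 * 60 * 1000);
    const to = admin.firestore.Timestamp.fromMillis(now - 30 * 60 * 1000);

    // Documents created between 31 and 30 minutes ago
    const snapshot = await db.collection('records')
      .where('created', '>=', from)
      .where('created', '<=', to)
      .get();

    await Promise.all(snapshot.docs.map((doc) =>
      // validate doc.data() against your JSON schema here, then copy it over
      db.collection('validated').doc(doc.id).set(doc.data())
    ));
  });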
If this doesn't suit your needs, you can add a Cloud Task (Google Cloud product). The details are specified in this good article.
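If you go the Cloud Tasks route, the onCreate trigger can enqueue an HTTP task with a scheduleTime 30 minutes in the future. A hedged sketch using the @google-cloud/tasks client (the project, queue, collection, and target URL below are placeholders):
import { CloudTasksClient } from '@google-cloud/tasks';

const tasksClient = new CloudTasksClient();

export const scheduleValidation = functions.firestore
  .document('items/{itemId}')
  .onCreate(async (snapshot) => {
    const parent = tasksClient.queuePath('my-project', 'us-central1', 'validation-queue');
    await tasksClient.createTask({
      parent,
      task: {
        // Run 30 minutes from now
        scheduleTime: { seconds: Math.floor(Date.now() / 1000) + 30 * 60 },
        httpRequest: {
          httpMethod: 'POST',
          url: 'https://us-central1-my-project.cloudfunctions.net/validateDocument',
          headers: { 'Content-Type': 'application/json' },
          body: Buffer.from(JSON.stringify({ path: snapshot.ref.path })).toString('base64'),
        },
      },
    });
  });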

How can the AWS Lambda concurrent execution limit be reached?

UPDATE
The original test code below is largely correct, but in Node.js the various AWS services should be set up a bit differently, as per the SDK link provided by @Michael-sqlbot:
// manager
const AWS = require("aws-sdk")
const https = require('https');
const agent = new https.Agent({
  maxSockets: 498 // workers hit this level; expect plus 1 for the manager instance
});
const lambda = new AWS.Lambda({
  apiVersion: '2015-03-31',
  region: 'us-east-2', // Initial concurrency burst limit = 500
  httpOptions: {       // <--- replace the default of 50 (https) by
    agent: agent       // <--- plugging the modified Agent into the service
  }
})
// NOW begin the manager handler code
In planning for a new service, I am doing some preliminary stress testing. After reading about the 1,000 concurrent execution limit per account and the initial burst rate (which in us-east-2 is 500), I was expecting to achieve at least the 500 burst concurrent executions right away. The screenshot below of CloudWatch's Lambda metric shows otherwise. I cannot get past 51 concurrent executions no matter what mix of parameters I try. Here's the test code:
// worker
exports.handler = async (event) => {
  // declare sleep promise
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  // return after one second
  let nStart = new Date().getTime()
  await sleep(1000)
  return new Date().getTime() - nStart; // report the exact ms the sleep actually took
};
// manager
exports.handler = async (event) => {
  const invokeWorker = async () => {
    try {
      let lambda = new AWS.Lambda() // NO! DO NOT DO THIS, SEE UPDATE ABOVE
      var params = {
        FunctionName: "worker-function",
        InvocationType: "RequestResponse",
        LogType: "None"
      };
      return await lambda.invoke(params).promise()
    }
    catch (error) {
      console.log(error)
    }
  };
  try {
    let nStart = new Date().getTime()
    let aPromises = []
    // invoke workers
    for (var i = 1; i <= 3000; i++) {
      aPromises.push(invokeWorker())
    }
    // record time to complete spawning
    let nSpawnMs = new Date().getTime() - nStart
    // wait for the workers to ALL return
    let aResponses = await Promise.all(aPromises)
    // sum all the actual sleep times
    const reducer = (accumulator, response) => { return accumulator + parseInt(response.Payload) };
    let nTotalWorkMs = aResponses.reduce(reducer, 0)
    // show me
    let nTotalET = new Date().getTime() - nStart
    return {
      jobsCount: aResponses.length,
      spawnCompletionMs: nSpawnMs,
      spawnCompletionPct: `${Math.floor(nSpawnMs / nTotalET * 10000) / 100}%`,
      totalElapsedMs: nTotalET,
      totalWorkMs: nTotalWorkMs,
      parallelRatio: Math.floor(nTotalET / nTotalWorkMs * 1000) / 1000
    }
  }
  catch (error) {
    console.log(error)
  }
};
Response:
{
  "jobsCount": 3000,
  "spawnCompletionMs": 1879,
  "spawnCompletionPct": "2.91%",
  "totalElapsedMs": 64546,
  "totalWorkMs": 3004205,
  "parallelRatio": 0.021
}
Request ID:
"43f31584-238e-4af9-9c5d-95ccab22ae84"
Am I hitting a different limit that I have not mentioned? Is there a flaw in my test code? I was attempting to hit the limit here with 3,000 workers, but there was NO throttling encountered, which I guess is due to the Asynchronous invocation retry behaviour.
Edit: There is no VPC involved on either Lambda; the setting in the select input is "No VPC".
Edit: Showing Cloudwatch before and after the fix
There were a number of potential suspects, particularly due to the fact that you were invoking Lambda from Lambda, but your focus on consistently seeing a concurrency of 50 — a seemingly arbitrary limit (and a suspiciously round number) — reminded me that there's an anti-footgun lurking in the JavaScript SDK:
In Node.js, you can set the maximum number of connections per origin. If maxSockets is set, the low-level HTTP client queues requests and assigns them to sockets as they become available.
Here of course, "origin" means any unique combination of scheme + hostname, which in this case is the service endpoint for Lambda in us-east-2 that the SDK is connecting to in order to call the Invoke method, https://lambda.us-east-2.amazonaws.com.
This lets you set an upper bound on the number of concurrent requests to a given origin at a time. Lowering this value can reduce the number of throttling or timeout errors received. However, it can also increase memory usage because requests are queued until a socket becomes available.
...
When using the default of https, the SDK takes the maxSockets value from the globalAgent. If the maxSockets value is not defined or is Infinity, the SDK assumes a maxSockets value of 50.
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html
Lambda concurrency is not the only factor that decides how scalable your functions are. If your Lambda function is running within a VPC, it will require an ENI (Elastic Network Interface), which allows for Ethernet traffic from and to the container (the Lambda function).
It's possible your throttling occurred due to too many ENIs being requested (50 at a time). You can check this by viewing the logs of the manager Lambda function and looking for an error message when it's trying to invoke one of the child containers. If the error looks something like the following, you'll know ENIs are your issue.
Lambda was not able to create an ENI in the VPC of the Lambda function because the limit for Network Interfaces has been reached.

Azure Search .NET SDK - How to use "FindFailedActionsToRetry"?

Using the Azure Search .NET SDK, when you try to index documents you might get an IndexBatchException.
From the documentation here:
try
{
    var batch = IndexBatch.Upload(documents);
    indexClient.Documents.Index(batch);
}
catch (IndexBatchException e)
{
    // Sometimes when your Search service is under load, indexing will fail for some of the documents in
    // the batch. Depending on your application, you can take compensating actions like delaying and
    // retrying. For this simple demo, we just log the failed document keys and continue.
    Console.WriteLine(
        "Failed to index some of the documents: {0}",
        String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
}
How should e.FindFailedActionsToRetry be used to create a new batch to retry the indexing for failed actions?
I've created a function like this:
public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count) where T : class, IMyAppSearchDocument
{
    try
    {
        searchIndexClient.Documents.Index(batch);
    }
    catch (IndexBatchException e)
    {
        if (count == 5) // we will try to index 5 times and give up if it still doesn't work
        {
            throw new Exception("IndexBatchException: Indexing Failed for some documents.");
        }

        Thread.Sleep(5000); // we got an error, wait 5 seconds and try again (in case it's an intermittent or network issue)

        var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
        UploadDocuments(searchIndexClient, retryBatch, count++);
    }
}
But I think this part is wrong:
var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
The second parameter to FindFailedActionsToRetry, named keySelector, is a function that should return whatever property on your model type represents your document key. In your example, your model type is not known at compile time inside UploadDocuments, so you'll need to change UploadDocuments to also take the keySelector parameter and pass it through to FindFailedActionsToRetry. The caller of UploadDocuments would then specify a lambda specific to type T. For example, if T is the sample Hotel class from the sample code in this article, the lambda must be hotel => hotel.HotelId, since HotelId is the property of Hotel that is used as the document key.
Incidentally, the wait inside your catch block should not be a constant amount of time. If your search service is under heavy load, waiting for a constant delay won't really give it time to recover. Instead, we recommend exponentially backing off (e.g. the first delay is 2 seconds, then 4 seconds, then 8 seconds, then 16 seconds, up to some maximum).
I've taken Bruce's recommendations from his answer and comment and implemented them using Polly:
Exponential backoff up to one minute, after which it retries once a minute.
Retry as long as there is progress; time out after 5 requests without any progress.
IndexBatchException is also thrown for unknown documents. I chose to ignore such non-transient failures since they are likely indicative of requests that are no longer relevant (e.g. a document removed by a separate request).
int curActionCount = work.Actions.Count();
int noProgressCount = 0;

await Polly.Policy
    .Handle<IndexBatchException>() // One or more of the actions has failed.
    .WaitAndRetryForeverAsync(
        // Exponential backoff (2s, 4s, 8s, 16s, ...) and constant delay after 1 minute.
        retryAttempt => TimeSpan.FromSeconds( Math.Min( Math.Pow( 2, retryAttempt ), 60 ) ),
        (ex, _) =>
        {
            var batchEx = ex as IndexBatchException;
            work = batchEx.FindFailedActionsToRetry( work, d => d.Id );
            // Verify whether any progress was made.
            int remainingActionCount = work.Actions.Count();
            if ( remainingActionCount == curActionCount ) ++noProgressCount;
            curActionCount = remainingActionCount;
        } )
    .ExecuteAsync( async () =>
    {
        // Limit retries if no progress is made after multiple requests.
        if ( noProgressCount > 5 )
        {
            throw new TimeoutException( "Updating Azure search index timed out." );
        }
        // Only retry if the error is transient (determined by FindFailedActionsToRetry).
        // IndexBatchException is also thrown for unknown document IDs;
        // consider them outdated requests and ignore.
        if ( curActionCount > 0 )
        {
            await _search.Documents.IndexAsync( work );
        }
    } );
