Async task in a Firebase Cloud Function - Node.js

I want to implement a function using Firebase Cloud Functions.
When this function receives a request from a client, it saves the client's id and score into a dictionary. Normally I would save this data to Firestore and respond to the client, but I want to reduce the number of write operations to Firestore, so instead I update the client's data in the dictionary and respond to the client. Then, every 15 minutes, I save all the data in the dictionary to Firestore and reset it.
This is the main flow I wrote:
var dic = {}; // temporary client data
var timeNow = new Date().getTime(); // time when the function was deployed

exports.testSucCollectionQueryFunction = functions.https.onRequest((request, response) => {
  const { id, score } = getDataInfoFromRequest(request); // extract id and score
  dic[id] = score; // update the cached data
  response.send(result); // only acknowledge that the request was received
  const currentTime = new Date().getTime();
  if (currentTime - timeNow > 15 * 60 * 1000) { // 15 minutes
    saveDatainDicToFireStore();
    dic = {}; // reset the dictionary
    timeNow = currentTime;
  }
});
I tested with a small number of concurrent connections from clients (<5) and it was still OK, but I don't know what happens at 1000 requests/second.
Can you give me any ideas to help me understand this problem?
Thanks for your help

So there are a few issues here.
First, response.send(result); should be the last line of code in the function. Once you call response.send you are signaling to the cloud function that your function is complete. While the code after it might run, that isn't guaranteed. I'm referring to your if statement:
if (currentTime - timeNow > 15 * 60 * 1000) { // 15 minutes
  saveDatainDicToFireStore();
  dic = {}; // reset the dictionary
}
Next, cloud functions, whether they are Firebase, AWS Lambda, or Azure Functions, should be stateless. Depending on your workload, there might be multiple containers created by the Firebase Functions system, and you have no control over which instance you are caching data in. On top of that, within that 15-minute window those instances might be shut down due to inactivity, and your dictionary would be lost completely.
If I were you, I would redesign the architecture. One option is to write to a temporary location in the Firestore database that acts as a "caching" mechanism, then set up a separate HTTP function that you call every 15 minutes to aggregate the cached data and save it to its intended location.
Or simply write the data to its final location right away instead of caching it at all. Finally, one other alternative is to write the data to a file, save that file in a Firebase Cloud Storage bucket for later processing, and then use that same separate HTTP function to aggregate the data and write it to Firestore.
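For illustration, here is a minimal sketch of the first redesign. It is only an assumption of how you might wire it up: it uses a scheduled (Pub/Sub) function instead of an externally pinged HTTP function, and the scoreCache/scores collection names and JSON request shape are made up for the example.

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

// Each request writes one small doc into a temporary "cache" collection.
exports.collectScore = functions.https.onRequest(async (req, res) => {
  const { id, score } = req.body; // assumes a JSON body with id and score
  await admin.firestore().collection('scoreCache').doc(id).set({ score });
  res.send('ok'); // acknowledge receipt immediately
});

// Every 15 minutes, aggregate the cached docs and clear them.
exports.flushScores = functions.pubsub
  .schedule('every 15 minutes')
  .onRun(async () => {
    const db = admin.firestore();
    const snap = await db.collection('scoreCache').get();
    const batch = db.batch();
    snap.forEach((doc) => {
      batch.set(db.collection('scores').doc(doc.id), doc.data(), { merge: true });
      batch.delete(doc.ref); // reset the cache
    });
    // note: a batch is limited to 500 operations, so a large cache
    // would need to be committed in chunks
    await batch.commit();
  });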

Related

How to have more than a 30 sec response timeout in Heroku

Guys, Heroku is terminating the request if the response takes more than 30 seconds to return, so is there any way I can wait for as long as it takes for the response to come back?
The user is uploading a file, I need to do something with the file on my server, and after the updates are done I will give a download link to the user. But it usually takes more than 30 seconds for the server to process the file, so the user needs to wait for the response.
From the official Heroku Help Center: https://devcenter.heroku.com/articles/request-timeout
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
The short answer is: no, you can't change this configuration. I suggest you investigate why your application needs more than 30 seconds to process that request. If it takes longer than 10 seconds, you really should consider the steps suggested in the Heroku Help Center 👆
Your Problem
You mention you need this for file processing. I understand that file processing can easily take longer than 30 seconds. Normally what I would do is create some sort of task reference and keep it in a database along with a status ("processing", "finished", "failed"), store the original file, and then just end the user's request. This shouldn't take long. Then process the task ... with another endpoint or a websocket connection, the user can check whether the task has been fulfilled.
Use a Task Queue
The following is just a basic interpretation of a solution - it's not meant for copy & pasting as it depends on so many things.
Routes (Endpoints)
Basically you need to have 3 routes in your backend: one for uploading the file, one for downloading the processed file, and one for checking the status of the task.
1. Upload
app.post('/files', /* some middleware, e.g. multer, */ async (req, res) => {
  // This is your upload controller.
  // I assume at this point the file has been uploaded and
  // req.file contains a reference to the uploaded file.

  // Create a new processing task and add it to the queue.
  const task = await createNewTask(req.file);
  queue.push(task);

  // A task has been created, but the user doesn't need to
  // wait for it to finish, so let's end the request here.
  return res.status(200).json(task);
});
2. Check Status
app.get('/task/:id', async (req, res) => {
  // From uploading a file in the first step, you'll
  // get back a task id. Use the task id to check on
  // the status.
  const task = await getTask(req.params.id);
  if (!task) {
    return res.status(404).end();
  } else {
    return res.status(200).json(task);
  }
});
The task can include information like the status, progress percentage, original filename, new filename, or even a download link to the processed file once it's finished. The status could be something like pending, processing, finished or failed.
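For illustration, a task record could look something like this (the field names are just an example):

{
  id: 'd9428888-...',        // uuid, doubles as the stored filename
  status: 'processing',      // pending | processing | finished | failed
  progress: 42,              // percentage
  originalFilename: 'report.pdf',
  downloadUrl: null          // filled in once the status is 'finished'
}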
3. Download
app.get('/file/:filename', (req, res) => {
  // sendFile needs an absolute path unless you pass a root directory
  return res.status(200)
    .sendFile(req.params.filename, { root: './path/to/file/' });
});
Notes
It might be a good idea to rename the incoming files with a random id like a uuid, so it's easier to work with them in the automation process. The random id can double as the task id at the same time.
It's up to you how big you want to go with this. For the task queue there are many different libraries to help you out with it. It could be an in-memory queue or one that's backed by a database; a bare-bones in-memory version is sketched below.
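For completeness, here is one rough way the helpers used above (createNewTask, getTask, queue) could look. This is only a sketch under the assumptions that everything runs in a single process and that processFile is a hypothetical stand-in for your long-running work; a real setup might use a library like bull instead.

const { randomUUID } = require('crypto'); // Node 14.17+

const tasks = new Map(); // taskId -> task record

async function createNewTask(file) {
  const id = randomUUID(); // doubles as the task id and stored filename
  // file.originalname assumes multer; adjust for your upload middleware
  const task = { id, status: 'pending', originalFilename: file.originalname };
  tasks.set(id, task);
  return task;
}

async function getTask(id) {
  return tasks.get(id);
}

const queue = {
  items: [],
  working: false,
  push(task) {
    this.items.push(task);
    this.drain(); // kick the worker if it's idle
  },
  async drain() {
    if (this.working) return;
    this.working = true;
    while (this.items.length) {
      const task = this.items.shift();
      task.status = 'processing';
      try {
        await processFile(task); // your actual file processing goes here
        task.status = 'finished';
      } catch (err) {
        task.status = 'failed';
      }
    }
    this.working = false;
  },
};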

How to handle Firebase Cloud Functions infinite loops?

I have a Firebase Cloud Function which is triggered by an update to some data in a Firebase Realtime Database. When the data is updated, I want to read the data, perform some calculations on it, and then save the results of the calculations back to the Realtime Database. It looks like this:
exports.onUpdate = functions.database.ref("/some/path").onUpdate((change) => {
  const values = change.after.val();
  const newValues = performCalculations(values);
  return change.after.ref.update(newValues);
});
My concern is that this may create an indefinite loop of updates. I saw a note on the Cloud Firestore Triggers that says:
"Any time you write to the same document that triggered a function,
you are at risk of creating an infinite loop. Use caution and ensure
that you safely exit the function when no change is needed."
So my first question is: Does this same problem apply to the Firebase Realtime Database?
If it does, what is the best way to prevent the infinite looping?
Should I be comparing before/after snapshots, the key/value pairs, etc.?
My idea so far:
exports.onUpdate = functions.database.ref("/some/path").onUpdate((change) => {
  // Get old values
  const beforeValues = change.before.val();
  // Get current values
  const afterValues = change.after.val();
  // Something like this???
  if (beforeValues === afterValues) return null;
  const newValues = performCalculations(afterValues);
  return change.after.ref.update(newValues);
});
Thanks
Does this same problem apply to the Firebase Realtime Database?
Yes, the risk of an infinite loop occurs whenever you write back to the same location that triggered your Cloud Function to run, no matter what trigger type was used.
To prevent an infinite loop, you have to detect its condition in the code. You can:
either flag the node/document after processing it by writing a value into it, and check for that flag at the start of the Cloud Function.
or you can detect whether the Cloud Function code made any effective change/improvement to the data, and not write it back to the database when there was no change/improvement.
Either of these can work, and which one to use depends on your use case. Your if (beforeValues === afterValues) return null is a form of the second approach, and can indeed work, but be aware that val() returns an object for non-primitive data and === compares object references, so as written the check only works for primitive values; beyond that, it depends on details about the data that you haven't shared. A sketch of the first (flagging) approach follows.
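Here is a minimal sketch of the flagging approach applied to the function above; the calculated flag name is made up, and clients writing fresh data would need to omit or clear the flag so the function runs again:

exports.onUpdate = functions.database.ref("/some/path").onUpdate((change) => {
  const values = change.after.val();
  // Exit early if this write came from the function itself.
  if (values.calculated) return null;
  const newValues = performCalculations(values);
  newValues.calculated = true; // flag the node so the next trigger exits
  return change.after.ref.update(newValues);
});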

Tracking currently active users in node.js

I am building an application using node.js and socket.io. I would like to create a table of users who are actively browsing the site at any given moment, which will update dynamically.
I am setting a cookie to give each browser a unique ID, and have a mysql database of all users (whether online or not); however, I'm not sure how best to use these two pieces of information to determine who is, and who isn't, actively browsing right now.
The simplest way would seem to be to store the cookie & socket IDs in an array, but I have read that global variables (which presumably this would have to be) are generally bad, and to be avoided.
Alternatively I could create a new database table, where IDs are inserted and deleted when a socket connects/disconnects; but I'm not sure whether this would be overkill.
Is one of these methods any better than the other, or is there a way of tracking this information which I haven't thought of yet?
You can keep track of active users in memory without it being a global variable; it can simply be a module-level variable. This is one of the advantages of the nodejs module system.
The reasons to put it in a database instead of memory are:
You have multiple servers so you need a centralized place to put the data
You want the data stored persistently so if the server is restarted (normally or abnormally) you will have the recent data
The reasons for not putting it directly in a database:
It's a significant load of new database operations since you have to update the data on every single incoming request.
You can sometimes get the persistence without directly using a database by logging the access to a log file and then running cron jobs that parse the logs and do bulk additions to the database. The downside is that it's not as easy to query live data (since the most recent data is sitting in log files and hasn't been parsed yet).
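A minimal sketch of that log-file variant, assuming Express with cookie-parser and an illustrative log path (a cron job would parse these lines in bulk later):

const fs = require('fs');

// middleware that appends one line per request to an access log
app.use((req, res, next) => {
  const line = `${Date.now()}\t${req.cookies.userID}\n`;
  fs.appendFile('/var/log/myapp/access.log', line, (err) => {
    if (err) console.error('access log write failed', err);
  });
  next();
});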
For an in-memory store, you could do something like this:
// middleware that keeps track of user access
let userAccessMap = new Map();

app.use((req, res, next) => {
  // get userId from the cookie (substitute your own cookie logic here;
  // req.cookies assumes the cookie-parser middleware is installed)
  let id = req.cookies.userID;
  let lastAccess = Date.now();
  // if you want to keep track of more than just lastAccess,
  // you can store an object of data here instead of just the lastAccess time.
  // To update it, you would get the previous object, update some properties
  // in it, and then set it back in the userAccessMap
  userAccessMap.set(id, lastAccess);
  next();
});

// routinely clean up the userAccessMap to remove old access times
// so it doesn't just grow forever
const cleanupFrequency = 30 * 60 * 1000;   // run cleanup every 30 minutes
const cleanupTarget = 24 * 60 * 60 * 1000; // clean out users who haven't been here in the last day

setInterval(() => {
  let now = Date.now();
  for (let [id, lastAccess] of userAccessMap.entries()) {
    if (now - lastAccess > cleanupTarget) {
      // delete users who haven't been here in a long time
      userAccessMap.delete(id);
    }
  }
}, cleanupFrequency);

// Then, create some sort of administrative interface (probably with some sort
// of access protection) that gives you access to the user access info.
// This might even live in a separate web server on a separate port that isn't
// open to the general public.
app.get("/userAccessData", (req, res) => {
  // perhaps convert this to a human-readable user name by looking up the user id
  // also may want to sort the data by recentAccess
  res.json(Array.from(userAccessMap));
});

How can AWS API Gateway listen to 2 lambda functions?

My design is that the API will trigger the first lambda function; this function then sends an SNS message and returns, and SNS triggers the second lambda function. Now I want the API to get the response from the second lambda function.
Here is the flow:
The API gets the request from the user and triggers the first lambda function. The first lambda function publishes to SNS and returns. Now the API is at the first lambda stage and still waiting for the response from the second lambda. SNS triggers the second lambda function; the second lambda function returns some result and passes it to the API. The API gets the response and sends it back to the user.
I know there is a way using the SDK to invoke the second lambda function and set the event type to make it async. But here I want to use SNS; is it possible?
Need some help/advice. Thanks in advance!
You need something to share lambda_func_2's result with lambda_func_1. The API Gateway request context is only returned when you call the callback in func1; you cannot save or send the request context to another lambda function.
My solution for this case is to use DynamoDB (or any database) to share F2's result.
F1 sends data to SNS; the data includes a key such as a transactionID (a uuid or a timestamp). F1 then "waits" until the result appears in a table (e.g. tbl_f2_result) and executes its callback function with that result. You could query by transactionID until you receive the data, or only try 10 times (with a 2-second timeout per try; in the worst case you will wait 20 seconds).
F2 is triggered by SNS, does something with the data (including the transactionID), then inserts the result (success or not, error message, ...) into the result table (tbl_f2_result) keyed by transactionID, and calls its callback to finish F2.
transactionID is the index key of the table :D
You have to increase F1's timeout - the default is 6 seconds.
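A rough sketch of F1's publish-then-poll flow with the AWS SDK for Node (the topic ARN, table name, and item shape are placeholders):

const AWS = require('aws-sdk');
const sns = new AWS.SNS();
const ddb = new AWS.DynamoDB.DocumentClient();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

exports.handler = async (event) => {
  const transactionID = Date.now().toString(); // or a uuid

  await sns.publish({
    TopicArn: process.env.TOPIC_ARN,
    Message: JSON.stringify({ transactionID, payload: event.body }),
  }).promise();

  // Poll tbl_f2_result until F2 writes the result: 10 tries, 2 s apart.
  for (let i = 0; i < 10; i++) {
    const { Item } = await ddb.get({
      TableName: 'tbl_f2_result',
      Key: { transactionID },
    }).promise();
    if (Item) return { statusCode: 200, body: JSON.stringify(Item.result) };
    await sleep(2000);
  }
  return { statusCode: 504, body: 'timed out waiting for F2' };
};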
Of course you can. Lambda provides you a way to implement almost any arbitrary functionality that you want, whether it's inserting a record into your DynamoDB, reading an object from your S3 bucket, calculating the tax amount for the selected item on an e-commerce site, or simply calling an API.
Notice that here you don't need any event to call your api from the lambda, as you simply call the api directly.
As you are using Node, you can simply use an http request; something like this:
var http = require('http');

var options = {
  host: YOUR_API_URL,
  port: 80,
  path: 'REST_API_END_POINT',
  method: 'YOUR_HTTP_METHOD' // POST/GET/...
};

http.request(options, function(res) {
  // Whatever you want to do with the reply...
}).end();
Below is what is possible for your problem, but it requires polling.

API GTW
- integrates with --> Lambda1

Lambda1
- creates a unique SHA and a folder for it inside the bucket, say s3://response-bucket/<sha>/
- triggers SNS through the SDK with a payload containing the SHA
- polls the key s3://response-bucket/<sha>/ (with a timeout set)
- if the result has been placed there, the response is sent back from Lambda1 --> API GTW
- if it times out, an error is returned
- on success, it triggers SNS for cleanup of the response data in the bucket, with the payload being the SHA, which will be cleaned up by another lambda

SNS
- now the payload with the SHA is in SNS

Lambda2
- SNS triggers Lambda2
- it pulls the unique SHA out of the payload
- the lambda result is placed in the same s3://response-bucket/<sha>/
- exit from Lambda2
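For the polling step in Lambda1, a rough sketch (bucket name and result key are placeholders; the error-code check follows the AWS SDK v2 convention for a missing object):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function waitForResponse(sha, tries = 10, delayMs = 2000) {
  const params = { Bucket: 'response-bucket', Key: sha + '/result.json' };
  for (let i = 0; i < tries; i++) {
    try {
      const obj = await s3.getObject(params).promise();
      return JSON.parse(obj.Body.toString()); // Lambda2 has responded
    } catch (err) {
      if (err.code !== 'NoSuchKey') throw err; // a real failure
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error('timed out waiting for Lambda2');
}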

How to think asynchronously with nodejs?

I just started developing with nodejs. I'm confused by the async model. I believe there is a way to turn most SYNC use cases into ASYNC ones. For example, by SYNC, we load some data and wait until it returns, then show it to the user; by ASYNC, we load data and return immediately, telling the user the data will be presented later. I can understand why ASYNC is used in this scenario.
But here I have a use case. I'm building a web app that allows a user to place an order (buying something). Before saving the order data into the db, I want to put some user data together with the order data (I'm using a document NoSQL db, by the way). So by SYNC, after I get the order data, I make a SYNC call to the database and wait for the returned user data. After I get the returned data, I combine them together and ingest them into the db.
I think there might be an issue if I make an ASYNC call to the db to query the user data, because the user data may be returned after I save the order to the db. And that's not what I want.
So in this case, how can I do this thing ASYNCHRONOUSLY?
Couple of things here. First, if your application already has the user data (the user is already logged in), then this information should be stored in session so you don't have to access the DB. If you are allowing the user to register at the time of purchase, you would simply want to pass a callback function that handles saving the order into your call that saves the user data. Without knowing specifically what your code looks like, something like this is what you would be looking for.
function saveOrder(userData, orderData, callback) {
  // save the user data to the DB
  db.save(userData, function(rec) {
    // if you need to add the user ID or something to the order...
    orderData.userId = rec.id; // this would be dependent on your DB of choice
    // save the order data to the DB
    db.save(orderData, callback);
  });
}
Sync code goes something like this: step by step, one after the other. There can be ifs and loops (for) etc.; all of us get it.
fetchUserDataFromDB();
integrateOrderDataAndUserData();
updateOrderData();
Think of async programming with nodejs as event driven, like UI programming: code (a function) is executed when an event occurs. E.g. on a click event, the framework calls back the registered clickHandler.
nodejs async programming can also be thought of along these lines. When the (async) db query execution completes, your callback is called. When the order data has been updated, your callback is called. The above code goes something like this:
function nodejsOrderHandler(req, res)
{
  var orderData;
  db.queryAsync(..., onqueryasync);

  function onqueryasync(userdata)
  {
    // integrate user data with order data
    db.update(updateParams, onorderupdate);
  }

  function onorderupdate(e, r)
  {
    // handle errors
    // write the response
  }
}
JavaScript closures provide the way to keep state in variables across functions.
There is certainly much more to async programming, and there are helper modules that help with basic constructs like chain, parallel, join, etc. as you write more involved async code, but this probably gives you a quick idea.
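For comparison, here is the same flow written with promises and async/await, assuming the db driver exposes promise-returning methods (most modern drivers do):

async function nodejsOrderHandler(req, res) {
  try {
    const userData = await db.queryAsync(/* ... */);
    const updateParams = integrateOrderDataAndUserData(userData); // as above
    await db.update(updateParams);
    res.end('order saved'); // write the response
  } catch (e) {
    res.statusCode = 500; // handle errors
    res.end('error');
  }
}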
