In reference to this answer, large number of promises, on how to handle a high number of queries, I have regrouped the queries as shown below using the 'lodash' library. This works for a low number of queries; however, Firebase returns an error:
#firebase/database: FIREBASE WARNING: Exception was thrown by user callback. RangeError: Maximum call stack size exceeded
I know this means that the arrays have grown too large. However, when I run pure JavaScript promises with a 10 ms timer, the code seems to hold up to 1,000,000 promises, as shown in that answer. I am not sure whether this is a Firebase or a Node.js issue, but given that the Firebase Realtime Database can store millions of records in a JSON tree, there must be a better way to process so many promises. I have largely based the approach on these three questions: the original problem, Find element nodes contained in another node; this approach for checking the database, which requires so many reads, check if data exists in firebase; and this approach for speeding up the requests, Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly.
I am not sure if I am performing all of these reads correctly, especially at such a high volume. Thank you.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
const _ = require('lodash');

exports.postMadeByFriend = functions.https.onCall(async (data, context) => {
  const mainUserID = "hJwyTHpoxuMmcJvyR6ULbiVkqzH3";
  const follwerID = "Rr3ePJc41CTytOB18puGl4LRN1R2";
  const otherUserID = "q2f7RFwZFoMRjsvxx8k5ryNY3Pk2";
  console.log("start");

  // Build the list of references to read (only one of them points at the follower).
  var refs = [];
  for (var x = 0; x < 100000; x += 1) {
    if (x === 999) {
      const ref = admin.database().ref(`Followers`).child(mainUserID).child(follwerID);
      refs.push(ref);
      continue;
    }
    const ref = admin.database().ref(`Followers`).child(mainUserID).child(otherUserID);
    refs.push(ref);
  }

  function runQuery(ref) {
    return ref.once('value');
  }

  // Run the reads in chunks of 10,000 so only one chunk is in flight at a time.
  const batches = _.chunk(refs, 10000);
  refs = [];
  const results = [];
  while (batches.length) {
    const batch = batches.shift();
    const result = await Promise.all(batch.map(runQuery));
    results.push(result);
  }

  // _.flatten returns a new array; keep it instead of discarding the result.
  const flattened = _.flatten(results);
  console.log("results: " + JSON.stringify(flattened));
});
Related
I have the following code where I am using MongoDB and Node.js. I read data from a collection, perform some arithmetic operations on the data, and then update the document. My issue is that when multiple requests come in at the same time, some data is lost. How can I avoid this?
// Read the document
const commissionRecord = await CommissionTable.getCommissionRecord(publicKey);

// Check if a record exists or not
if (commissionRecord.responseData.exists === true) {
  // Assign values to variables
  commissionLimit = commissionRecord.responseData.data.commissionLimit;
  commission = commissionRecord.responseData.data.commission;
}

// Perform arithmetic operations
commissionLimit = parseInt(commissionLimit) + parseInt(amount);
if (commissionLimit >= 20) {
  remainder = commissionLimit % 20;
  commission = parseInt(commission) + Math.floor(commissionLimit / 20);
  commissionLimit = parseInt(remainder);
}

if (commissionRecord.responseData.exists === true) {
  // Update the document
  const result = await CommissionTable.updateCommissionNormal(
    publicKey,
    commission,
    commissionLimit
  );
  if (result.success) {
    return result;
  }
}
The problem is that when all the requests come in at the same time, they all read the data together, and each update is then based on its own read. How can I solve this situation?
To avoid race conditions you can use a mutex; Node.js has a module for this, async-mutex:
https://www.npmjs.com/package/async-mutex
The idea is to prevent multiple requests from running a specific function at the same time. Everything else keeps running asynchronously, but calls to this function are queued and executed one at a time.
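A minimal sketch of how that could look here, assuming the CommissionTable helpers from the question and a single Node.js process (the applyCommission wrapper name is just for illustration; with several processes or instances an in-memory mutex would not be enough):
const { Mutex } = require('async-mutex');

const commissionMutex = new Mutex();

async function applyCommission(publicKey, amount) {
  // Only one caller at a time runs the read-modify-write sequence;
  // the others wait in the mutex's queue.
  return commissionMutex.runExclusive(async () => {
    const commissionRecord = await CommissionTable.getCommissionRecord(publicKey);
    if (commissionRecord.responseData.exists !== true) return null;

    let commissionLimit =
      parseInt(commissionRecord.responseData.data.commissionLimit) + parseInt(amount);
    let commission = parseInt(commissionRecord.responseData.data.commission);
    if (commissionLimit >= 20) {
      commission += Math.floor(commissionLimit / 20);
      commissionLimit = commissionLimit % 20;
    }
    return CommissionTable.updateCommissionNormal(publicKey, commission, commissionLimit);
  });
}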
Edit: Removing irrelevant code to improve readability
Edit 2: Reducing example to only uploadGameRound function and adding log output with times.
I'm working on a mobile multiplayer word game and was previously using the Firebase Realtime Database with fairly snappy performance apart from the cold starts. Saving an updated game and setting stats would take at most a few seconds. Recently I decided to switch to Firestore for my game data and player stats / top lists, primarily because of the more advanced queries and the automatic scaling with no need for manual sharding. Now I've got things working on Firestore, but the time it takes to save an updated game and update a number of stats is just ridiculous. I'm clocking, on average, between 3 and 4 minutes before the game is updated, the stats are added, and everything is available in the database for other clients and viewable in the web interface. I'm guessing and hoping that this is because of something I've messed up in my implementation, but the transactions all go through and there are no warnings or anything else to go on, really. Looking at the Cloud Functions log, the total time from function call to completion log statement appears to be a bit more than a minute, but that log doesn't appear until after the same 3-4 minute wait for the data.
Here's the code as it is. If someone has time to have a look and maybe spot what's wrong I'd be hugely grateful!
This function is called from Unity client:
exports.uploadGameRound = functions.https.onCall((roundUploadData, response) => {
console.log("UPLOADING GAME ROUND. TIME: ");
var d = new Date();
var n = d.toLocaleTimeString();
console.log(n);
// CODE REMOVED FOR READABILITY. JUST PREPARING SOME VARIABLES TO USE BELOW. NOTHING HEAVY, NO DATABASE TRANSACTIONS. //
// Get a new write batch
const batch = firestoreDatabase.batch();
// Save game info to activeGamesInfo
var gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
batch.set(gameInfoRef, gameInfo);
// Save game data to activeGamesData
const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });
if (foundWord !== undefined && foundWord !== null) {
const wordId = foundWord.timeStamp + "_" + foundWord.word;
// Save word to allFoundWords
const wordRef = firestoreDatabase.collection('allFoundWords').doc(wordId);
batch.set(wordRef, foundWord);
exports.incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word);
}
console.log("COMMITTING BATCH. TIME: ");
var d = new Date();
var n = d.toLocaleTimeString();
console.log(n);
// Commit the batch
batch.commit().then(result => {
return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
console.log("DONE COMMITTING BATCH. TIME: ");
var d = new Date();
var n = d.toLocaleTimeString();
console.log(n);
return;
});
});
});
Again, any help with understanding this weird behaviour massively appreciated!
Ok, so I found the problem now and thought I should share it:
Simply adding a return statement before the batch commit fixed the function and reduced the time from 4 minutes to less than a second:
RETURN batch.commit().then(result => {
return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
console.log("DONE COMMITTING BATCH. TIME: ");
var d = new Date();
var n = d.toLocaleTimeString();
console.log(n);
return;
});
});
Your function isn't returning a promise that resolves with the data to send to the client app. In the absence of a returned promise, it will return immediately, with no guarantee that any pending asynchronous work will terminate correctly.
Calling then on a single promise isn't enough to handle promises. You likely have lots of async work going on here, between commit() and other functions like incrementNumberOfTimesWordFound. You will need to handle all of the promises correctly, and make sure your overall function returns only a single promise that resolves when all that work is complete.
I strongly suggest taking some time to learn how promises work in JavaScript - this is crucial to writing effective functions. Without a full understanding, things will appear to go wrong, or not happen at all, in strange ways.
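For example, a minimal sketch of what the end of uploadGameRound could return (assuming incrementNumberOfTimesWordFound returns a promise when called directly, which is not shown in the question):
const pending = [batch.commit()];
if (foundWord !== undefined && foundWord !== null) {
  // Assumed to return a promise; await it together with the batch commit.
  pending.push(incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word));
}
// Return a single promise that resolves only when all of the work is done;
// its resolved value is what gets sent back to the calling client.
return Promise.all(pending)
  .then(() => gameInfoRef.update({ roundUploaded: true }))
  .then(() => ({ roundUploaded: true }));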
My Firestore contains 17500 documents.
It's a list of tokens, in order to send push notifications.
I store this data in a dictionary, to be able to use it later:
users = {"fr":[token, token], "en":[token, token]....}
My code:
async function getAllUsers() {
const snapshot = await admin.firestore().collection('users').get();
var users= {};
snapshot.forEach(doc => {
const userId = doc.id;
var lang = doc.data().language
if (!(lang in users)) {
users[lang] = [];
users[lang].push(doc.data().token);
} else {
users[lang].push(doc.data().token);
}
});
return users;
}
My code doesn't work anymore. I get a timeout during the forEach loop.
Is it because I have too many documents?
Any idea?
Thanks
It's not clear from your question what exactly is timing out, but there are a couple things you should be aware of.
You certainly can get errors if you attempt to read too many documents in one query. The alternative to this is to use pagination to read the documents in smaller batches so that you don't exceed any query limits.
By default Cloud Functions assumes a 60 second timeout on any function invocations. If you need more than that, you can increase the timeout, but you can only go up to 9 minutes. After that, you have to split your work up among multiple function invocations.
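A rough sketch of the pagination approach, using the Admin SDK's query cursors (the page size of 1,000 and ordering by document ID are just illustrative choices):
async function getAllUsersPaged() {
  const users = {};
  let lastDoc = null;

  while (true) {
    // Read the next page of up to 1,000 user documents.
    let query = admin.firestore().collection('users')
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(1000);
    if (lastDoc) {
      query = query.startAfter(lastDoc);
    }
    const snapshot = await query.get();
    if (snapshot.empty) {
      break;
    }

    // Group tokens by language, as in the original function.
    snapshot.forEach(doc => {
      const lang = doc.data().language;
      if (!(lang in users)) {
        users[lang] = [];
      }
      users[lang].push(doc.data().token);
    });

    lastDoc = snapshot.docs[snapshot.docs.length - 1];
  }
  return users;
}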
Here is the test code (in an express environment just because that's what I happen to be messing around with):
const fs = require('fs-extra');
const fsPromises = fs.promises;
const express = require('express');
const app = express();
const speedtest = async function (req, res, next) {
const useFsPromises = (req.params.promises == 'true');
const jsonFileName = './json/big-file.json';
const hrstart = process.hrtime();
if (useFsPromises) {
await fsPromises.readFile(jsonFileName);
} else {
fs.readFileSync(jsonFileName);
}
res.send(`time taken to read: ${process.hrtime(hrstart)[1]/1000000} ms`);
};
app.get('/speedtest/:promises', speedtest);
The big-file.json file is around 16 MB. Using node 12.18.4.
Typical results (varies quite a bit around these values, but the following are "typical"):
https://dev.mydomain.com/speedtest/false
time taken to read: 3.948152 ms
https://dev.mydomain.com/speedtest/true
time taken to read: 61.865763 ms
UPDATE to include two more variants... plain fs.readFile() and also a promisified version of this:
const fs = require('fs-extra');
const fsPromises = fs.promises;
const util = require('util');
const readFile = util.promisify(fs.readFile);
const express = require('express');
const app = express();
const speedtest = async function (req, res, next) {
const type = req.params.type;
const jsonFileName = './json/big-file.json';
const hrstart = process.hrtime();
if (type == 'readFileFsPromises') {
await fsPromises.readFile(jsonFileName);
} else if (type == 'readFileSync') {
fs.readFileSync(jsonFileName);
} else if (type == 'readFileAsync') {
return fs.readFile(jsonFileName, function (err, jsondata) {
res.send(`time taken to read: ${process.hrtime(hrstart)[1]/1000000} ms`);
});
} else if (type == 'readFilePromisified') {
await readFile(jsonFileName);
}
res.send(`time taken to read: ${process.hrtime(hrstart)[1]/1000000} ms`);
};
app.get('/speedtest/:type', speedtest);
I am finding that fsPromises.readFile() is the slowest, while the others are much faster and all roughly the same in terms of reading time. I should add that in a different example (which I can't fully verify, so I'm not sure what was going on) the time difference was vastly bigger than reported here. It seems to me, at present, that fsPromises.readFile() should simply be avoided, because there are other async/promise options.
After stepping through each implementation in the debugger (fs.readFileSync and fs.promises.readFile), I can confirm that the synchronous version reads the entire file in one large chunk (the size of the file). Whereas fs.promises.readFile() reads 16,384 bytes at a time in a loop, with an await on each read. This is going to make fs.promises.readFile() go back to the event loop multiple times before it can read the entire file. Besides giving other things a chance to run, it's extra overhead to go back to the event loop every cycle through a for loop. There's also memory management overhead because fs.promises.readFile() allocates a series of Buffer objects and then combines them all at the end, whereas fs.readFileSync() allocates one large Buffer object at the beginning and just reads the entire file into that one Buffer.
So, the synchronous version, which is allowed to hog the entire CPU, is just faster from a pure time to completion point of view (it's significantly less efficient from a CPU cycles used point of view in a multi-user server because it blocks the event loop from doing anything else during the read). The asynchronous version is reading in smaller chunks, probably to avoid blocking the event loop too much so other things can effectively interleave and run while fs.promises.readFile() is doing its thing.
For a project I worked on a while ago, I wrote my own simple asynchronous version of readFile() that reads the entire file at once, and it was significantly faster than the built-in implementation. I was not concerned about event loop blockage in that particular project, so I did not investigate whether that's an issue.
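For illustration only (this is a sketch of that idea, not the actual code from that project), a one-shot asynchronous read could stat the file, allocate a single Buffer, and read it with one call:
const fsp = require('fs').promises;

// Read the whole file into one pre-allocated Buffer with a single read(),
// instead of looping over 16 KB chunks like fs.promises.readFile().
// (A production version would loop until bytesRead covers the whole file.)
async function readFileWhole(path) {
  const handle = await fsp.open(path, 'r');
  try {
    const { size } = await handle.stat();
    const buffer = Buffer.allocUnsafe(size);
    await handle.read(buffer, 0, size, 0);
    return buffer;
  } finally {
    await handle.close();
  }
}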
In addition, fs.readFile() reads the file in 524,288 byte chunks (much larger chunks than fs.promises.readFile()) and does not use await, using just plain callbacks. It is apparently just coded more optimally than the promise implementation. I don't know why they rewrote the function in a slower way for the fs.promises.readFile() implementation. For now, it appears that wrapping fs.readFile() in a promise would be faster.
My app can have a large number of writes, reads and updates (it can even go above 10,000) under certain circumstances.
While developing the application locally, these operations usually take a few seconds at most (great!); however, they can easily take minutes when running the application on Google Cloud, to the point that the Firebase function times out.
I developed a controlled test in a separate project, whose sole purpose is to write, get and delete thousands of items for bench-marking. These were the results (averaged out from several tests):
Local Emulator:
5000 items, 4.2s write, 2.2s delete
5000 items, batch mode ON, 0.75s write, 0.11s delete
Cloud Firestore:
100 items, 15.8s write, 14.5s delete
1000 items, batch mode ON, 4.8s write, 3.0s delete
5000 items, async mode ON, 10.2s write, 8.0s delete
5000 items, batch & async ON, 4.5s write, 3.9s delete
NOTE: My local emulator crashes whenever I try to perform db operations async (which is a problem for another day) but it is why I was unable to test the write/delete speeds asynchronously locally. Also, write and read values usually vary +-25% between runs.
However, as you can see, the fact that my local emulator is faster in its slowest mode compared to the fastest test in the cloud definitely raises some questions.
Could it be that I have some sort of configuration issue? Or is it just that these numbers are standard for Firestore? Here is the (summarised) TypeScript code if you wish to try it:
functions.runWith({ timeoutSeconds: 540, memory: "2GB" }).https.onRequest(async (req, res) => {
//getting the settings from the request
var data = req.body;
var numWrites: number = data.numWrites;
var syncMode: boolean = !data.asyncMode;
var batchMode: boolean = data.batchMode;
var batchLimit: number = data.batchLimit;
//pre-run setup
var dbObj = {
number: 123,
string: "abc",
boolean: true,
object: { var1: "var1", num1: 1 },
array: [1, 2, 3, 4]
};
var collection = db.collection("testCollection");
var startTime = moment();
//insert requested number of items, using requested settings
var allInserts: Promise<any>[] = [];
if (!batchMode) { //sequential writes
for (var i = 0; i < numWrites; i++) {
var set = collection.doc().set(dbObj);
allInserts.push(set);
if (syncMode) await set;
}
} else { //batch writes
var batch = db.batch();
for (var i = 1; i <= numWrites; i++) {
batch.set(collection.doc(), dbObj);
if (i % batchLimit === 0) {
var commit = batch.commit();
allInserts.push(commit);
batch = db.batch();
if (syncMode) await commit;
}
}
}
//some logging information. Getting items to delete
var numInserts = allInserts.length;
await Promise.all(allInserts);
var insertTime = moment();
var alldocs = (await collection.get()).docs;
var numDocs = alldocs.length;
var getTime = moment();
//deletes all of the items in the collection
var allDeletes: Promise<any>[] = [];
if (!batchMode) { //sequential deletes
for (var doc of alldocs) {
var del = doc.ref.delete();
allDeletes.push(del);
if (syncMode) await del;
}
} else { //batch deletes
var batch = db.batch();
for (var i = 1; i <= numDocs; i++) {
var doc = alldocs[i - 1];
batch.delete(doc.ref);
if (i % batchLimit === 0) {
var commit = batch.commit();
allDeletes.push(commit);
batch = db.batch();
if (syncMode) await commit;
}
}
}
var numDeletes = allDeletes.length;
await Promise.all(allDeletes);
var deleteTime = moment();
res.status(200).send(/* a whole bunch of metrics for analysis */);
});
EDIT: just to clarify, the UI does not perform these write operations, so latency between the end-user machine and cloud servers should (in theory) not cause any major latency issues. The communication to the database is handled fully by Firebase Functions
EDIT 2: I have run this test on two deployments, one in Europe and another in the US. Both took around the same amount of time to run, even though my ping to these two servers is vastly different.
It is normal to get faster responses from the local emulator than from Cloud Firestore, since the remote environment adds network traffic that takes time.
For large numbers of operations from a single source, the recommendation is to use batch operations, as these reduce the number of transactions and, with them, the round trips.
The reason async mode is faster is that the caller is not waiting for each transaction to complete before sending the next one, so it also makes sense that the calls are faster with it.
The times you have in the table seem normal to me.
As an additional optimization, make sure that the region where your Firestore database is located is the closest one to where your functions run.
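For example, if the Firestore database were located in europe-west1 (an assumed region, just for illustration), the function could be deployed to the same region so the function-to-database hop stays within one location; a minimal sketch:
const functions = require('firebase-functions');
const admin = require('firebase-admin');

admin.initializeApp();
const db = admin.firestore();

// Deploy the function in the same region as the Firestore database
// ('europe-west1' is an assumed example region, not the poster's actual one).
exports.benchmark = functions
  .region('europe-west1')
  .runWith({ timeoutSeconds: 540, memory: '2GB' })
  .https.onRequest(async (req, res) => {
    // ... the same write/get/delete benchmark as above ...
    res.status(200).send('done');
  });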