Firebase Firestore transactions incredibly slow (3-4 minutes) - node.js

Edit: Removing irrelevant code to improve readability
Edit 2: Reducing example to only uploadGameRound function and adding log output with times.
I'm working on a mobile multiplayer word game and was previously using the Firebase Realtime Database with fairly snappy performance apart from the cold starts. Saving an updated game and setting stats would take at most a few seconds. Recently I made the decision to switch to Firestore for my game data and player stats / top lists, primarily because of the more advanced queries and the automatic scaling with no need for manual sharding. Now I've got things working on Firestore, but the time it takes to save an updated game and update a number of stats is just ridiculous. I'm clocking an average of 3-4 minutes before the game is updated, stats are added and everything is available in the database for other clients and viewable in the web interface. I'm guessing and hoping that this is because of something I've messed up in my implementation, but the transactions all go through and there are no warnings or anything else to go on, really. Looking at the Cloud Functions log, the total time from function call to completion log statement appears to be a bit more than a minute, but that log doesn't appear until after the same 3-4 minute wait for the data.
Here's the code as it is. If someone has time to have a look and maybe spot what's wrong I'd be hugely grateful!
This function is called from Unity client:
exports.uploadGameRound = functions.https.onCall((roundUploadData, response) => {
    console.log("UPLOADING GAME ROUND. TIME: ");
    var d = new Date();
    var n = d.toLocaleTimeString();
    console.log(n);

    // CODE REMOVED FOR READABILITY. JUST PREPARING SOME VARIABLES TO USE BELOW. NOTHING HEAVY, NO DATABASE TRANSACTIONS. //

    // Get a new write batch
    const batch = firestoreDatabase.batch();

    // Save game info to activeGamesInfo
    var gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
    batch.set(gameInfoRef, gameInfo);

    // Save game data to activeGamesData
    const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
    batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });

    if (foundWord !== undefined && foundWord !== null) {
        const wordId = foundWord.timeStamp + "_" + foundWord.word;
        // Save word to allFoundWords
        const wordRef = firestoreDatabase.collection('allFoundWords').doc(wordId);
        batch.set(wordRef, foundWord);
        exports.incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word);
    }

    console.log("COMMITTING BATCH. TIME: ");
    d = new Date();
    n = d.toLocaleTimeString();
    console.log(n);

    // Commit the batch
    batch.commit().then(result => {
        return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
            console.log("DONE COMMITTING BATCH. TIME: ");
            d = new Date();
            n = d.toLocaleTimeString();
            console.log(n);
            return;
        });
    });
});
Again, any help with understanding this weird behaviour would be massively appreciated!

Ok, so I found the problem now and thought I should share it:
Simply adding a return statement before the batch commit fixed the function and reduced the time from 4 minutes to less than a second:
return batch.commit().then(result => {   // <- the added "return" is the fix
    return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
        console.log("DONE COMMITTING BATCH. TIME: ");
        var d = new Date();
        var n = d.toLocaleTimeString();
        console.log(n);
        return;
    });
});

Your function isn't returning a promise that resolves with the data to send to the client app. In the absence of a returned promise, it will return immediately, with no guarantee that any pending asynchronous work will terminate correctly.
Calling then on a single promise isn't enough to handle all the asynchronous work here. You likely have lots of async work going on, between commit() and other functions like incrementNumberOfTimesWordFound. You will need to handle all of the promises correctly, and make sure your overall function returns a single promise that resolves only when all of that work is complete.
I strongly suggest taking some time to learn how promises work in JavaScript - this is crucial to writing effective functions. Without a full understanding, things will appear to go wrong, or not at all, in strange ways.
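For illustration, here is a minimal sketch of that idea (it assumes the same variable preparation as in the question, and it assumes incrementNumberOfTimesWordFound returns a promise, which the posted code doesn't show). With async/await, the function returns one promise that only resolves once every write has finished, and whatever is returned becomes the data the Unity client receives:

exports.uploadGameRound = functions.https.onCall(async (roundUploadData, context) => {
    // ...prepare gameId, gameInfo, updatedGameDataGzippedString, foundWord as before...

    const batch = firestoreDatabase.batch();

    const gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
    batch.set(gameInfoRef, gameInfo);

    const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
    batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });

    const pendingWork = [];
    if (foundWord !== undefined && foundWord !== null) {
        const wordRef = firestoreDatabase.collection('allFoundWords').doc(foundWord.timeStamp + "_" + foundWord.word);
        batch.set(wordRef, foundWord);
        // Assumption: this helper returns a promise; if it doesn't, its work cannot be awaited here.
        pendingWork.push(incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word));
    }

    // Awaiting keeps the function alive until all writes have completed.
    await batch.commit();
    await gameInfoRef.update({ roundUploaded: true });
    await Promise.all(pendingWork);

    // This value is what the Unity client receives as the call result.
    return { roundUploaded: true };
});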

Related

Poor Performance Writing to Firebase Realtime Database from Google Cloud Function

I have been developing a game where on the user submitting data, the client writes some data to a Firebase Realtime Database. I then have a Google Cloud Function which is triggered onUpdate. That function checks the submissions from various players in a particular game and if certain criteria are met, the function writes an update to the DB which causes the clients to move to the next round of the game.
This is all working, however, I have found performance to be quite poor.
I've added logging to the function and can see that the function takes anywhere from 2-10ms to complete, which is acceptable. The issue is that the update is often written anywhere from 10-30 SECONDS after the function has returned the update.
To determine this, my function obtains the UTC epoch timestamp from just before writing the update, and stores this as a key with the value being the Firebase server timestamp.
I then manually checked the two timestamps to arrive at the time between the return and the database write.
The strange thing is that I have another cloud function which is triggered by an HTTP request, and found that the update from that function is typically from 0.5-2 seconds after the function calls the DB update() API.
The difference between these two functions, aside from how they are triggered, is how the data is written back to the DB.
The onUpdate() function writes data by returning:
return after.ref.update(updateToWrite);
Whereas the HTTP request function writes data by calling the update API:
dbRef.update({
// object to write
});
I've provided a slightly stripped-down version of my onUpdate function here (same structure but sanitised function names):
exports.onGameUpdate = functions.database.ref("/games/{gameId}")
    .onUpdate(async (snapshot, context) => {
        console.time("onGameUpdate");
        let id = context.params.gameId;
        let after = snapshot.after;
        let updatedSnapshot = snapshot.after.val();

        if (updatedSnapshot.serverShouldProcess && !isFinished) {
            processUpdate().then((res) => {
                // some logic to check res, and if criteria met, move on to promises to determine update data
                determineDataToWrite().then((updateToWrite) => {
                    // get current time
                    var d = new Date();
                    let triggerTime = d.getTime();
                    // I use triggerTime as the key, and firebase.database.ServerValue.TIMESTAMP as the value
                    if (updateToWrite["timestamps"] !== null && updateToWrite["timestamps"] !== undefined) {
                        let timestampsCopy = updateToWrite["timestamps"];
                        timestampsCopy[triggerTime] = firebase.database.ServerValue.TIMESTAMP;
                        updateToWrite["timestamps"][triggerTime] = firebase.database.ServerValue.TIMESTAMP;
                    } else {
                        let timestampsObj = {};
                        timestampsObj[triggerTime] = firebase.database.ServerValue.TIMESTAMP;
                        updateToWrite["timestamps"] = timestampsObj;
                    }
                    // write the update to the database
                    return after.ref.update(updateToWrite);
                }).catch((error) => {
                    // error handling
                });
                // this is just here because ES Linter complains if there's no return
                return null;
            }).catch((error) => {
                // error handling
            });
        }
    });
I'd appreciate any help! Thanks :)
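For comparison, here is a minimal sketch of the same handler with the promise chain returned (assuming processUpdate and determineDataToWrite both return promises; error handling and the res checks are condensed), so the function is only allowed to terminate after the update() write has completed:

exports.onGameUpdate = functions.database.ref("/games/{gameId}")
    .onUpdate((snapshot, context) => {
        const after = snapshot.after;
        const updatedSnapshot = after.val();

        // isFinished comes from the elided logic in the original post.
        if (!(updatedSnapshot.serverShouldProcess && !isFinished)) {
            return null;
        }

        // Returning the whole chain ties the function's lifetime to the final write.
        return processUpdate()
            .then((res) => {
                // logic to check res would go here, as in the original
                return determineDataToWrite();
            })
            .then((updateToWrite) => {
                const triggerTime = new Date().getTime();
                const timestamps = updateToWrite["timestamps"] || {};
                timestamps[triggerTime] = firebase.database.ServerValue.TIMESTAMP;
                updateToWrite["timestamps"] = timestamps;
                return after.ref.update(updateToWrite);
            });
    });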

Google Cloud Functions + Realtime Database extremely slow

I'm developing a multiplayer word game for mobile in Unity. I've been using GameSparks as a backend, but felt that it didn't give me the control I wanted over the project. After looking around a bit, I decided to go with Firebase. It felt like a good decision, but now I'm experiencing some serious slowness with the database + Cloud Functions.
The game has a turn based mode where you play a round and get a notification when your opponent has finished. A round is uploaded via a cloudscript call, with only the necessary data. The new game state is then pieced together in the cloudscript and the game database entry (activeCasualGames/{gameId}) updated. After this is done, I update a gameInfo (activeCasualGamesInfo/{gameId}) entry with some basic info about the game, then send a cloud message to notify the opponent that it's their turn.
The data sent is only 32 kb, and no complex changes are made to the code (see cloud function code below). The complete database entry for the game would be around 100kb max at this point, yet the time from sending the round to the game being updated on server and opponent notified ranges from 50 seconds to more than a minute. On GameSparks, the same operation takes maybe a few seconds.
Does anyone else experience this slowness with Firebase? Have I made some mistake in the cloud script function, or is this the performance I should expect? The game also has a live mode, which I haven't started implementing in Firebase yet, and with this kind of lag I'm not feeling that it's gonna work out too well.
Many thanks in advance!
exports.uploadCasualGameRound = functions.https.onCall(
    (roundUploadData, response) => {
        const roundData = JSON.parse(roundUploadData);
        const updatedGameData = JSON.parse(roundData.gameData);
        const updatedGameInfo = JSON.parse(roundData.gameInfo);

        console.log("game data object:");
        console.log(updatedGameData);
        console.log("game info:");
        console.log(updatedGameInfo);

        const gameId = updatedGameData.gameId;
        console.log("updating game with id: " + gameId);

        admin
            .database()
            .ref("/activeCasualGames/" + gameId)
            .once("value")
            .then(function (snapshot: { val: any }) {
                let game = snapshot.val();
                console.log("old game:");
                console.log(game);

                const isNewGame = (game as boolean) === true;
                if (isNewGame === false) {
                    console.log("THIS IS AN EXISTING GAME!");
                } else {
                    console.log("THIS IS A NEW GAME!");
                }

                // make the end state of the currently stored TurnBasedGameData the beginning state of the uploaded one (limits the upload of data and prevents timeout errors)
                if (isNewGame === true) {
                    updatedGameData.gameStateRoundStart.playerOneRounds =
                        updatedGameData.gameStateRoundEnd.playerOneRounds;
                } else {
                    // Player one rounds are not uploaded by player two, and vice versa. Compensate for this here:
                    if (updatedGameData.whoseTurn === updatedGameData.playerTwoId) {
                        updatedGameData.gameStateRoundEnd.playerTwoRounds =
                            game.gameStateRoundEnd.playerTwoRounds;
                    } else {
                        updatedGameData.gameStateRoundEnd.playerOneRounds =
                            game.gameStateRoundEnd.playerOneRounds;
                    }
                    updatedGameData.gameStateRoundStart = game.gameStateRoundEnd;
                }

                game = updatedGameData;
                console.log("Game after update:");
                console.log(game);

                // game.lastRoundFinishedDate = Date.now();
                game.finalRoundWatchedByOtherPlayer = false;

                admin
                    .database()
                    .ref("/activeCasualGames/" + gameId)
                    .set(game)
                    .then(() => {
                        updatedGameInfo.roundUploaded = true;
                        return admin
                            .database()
                            .ref("/activeCasualGamesInfo/" + gameId)
                            .set(updatedGameInfo)
                            .then(() => {
                                exports.sendOpponentFinishedCasualGameRoundMessage(gameId, updatedGameInfo.lastPlayer, updatedGameInfo.whoseTurn);
                                return true;
                            });
                    });
            });
    }
);
There are some things you must consider when using a Firebase Cloud Function together with the Realtime Database. This question may help regarding serverless performance issues such as cold starts. Also, I think the code itself could be refactored to make fewer calls to the Realtime Database, since every external request takes time. Some tips are to use more locally stored resources (in memory, browser cache, etc.) and to use some global variables.
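To illustrate the global-variable tip, here is a rough sketch (the /wordLists/en path and the cached value are made up for the example). Module-scope variables survive between invocations on a warm instance, so data that rarely changes does not have to be re-read on every call:

// Module scope: lives for the lifetime of the warm function instance.
let cachedWordList = null;

exports.uploadCasualGameRound = functions.https.onCall(async (roundUploadData, context) => {
    // Only hit the database when this warm instance has no cached copy yet.
    if (cachedWordList === null) {
        const snapshot = await admin.database().ref("/wordLists/en").once("value");
        cachedWordList = snapshot.val();
    }
    // ...use cachedWordList instead of reading it from the database on every call...
    return true;
});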

Difficulty processing CSV file, browser timeout

I was asked to import a CSV file from a server daily and parse the respective headers to the appropriate fields in Mongoose.
My first idea was to make it run automatically with a scheduler, using the cron module.
const CronJob = require('cron').CronJob;
const fs = require("fs");
const csv = require("fast-csv")
new CronJob('30 2 * * *', async function() {
    await parseCSV();
    this.stop();
}, function() {
    this.start()
}, true);
Next, the parseCSV() function code is as follow:
(I have simplified some of the data)
function parseCSV() {
    let buffer = [];
    let stream = fs.createReadStream("data.csv");
    csv.fromStream(stream, { headers: ["lot", "order", "cwotdt"], trim: true })
        .on("data", async (data) => {
            // shape the row; only order and lot end up in the product below
            let row = { "order": data.order, "lot": data.lot, "date": data.cwotdt };
            // Only add product that fulfill the following condition
            if (data.cwotdt !== "000000") {
                let product = { "order": data.order, "lot": data.lot };
                // Check whether product exist in database or not
                await db.Product.find(product, function (err, foundProduct) {
                    if (foundProduct && foundProduct.length !== 0) {
                        console.log("Product exists");
                    } else {
                        buffer.push(product);
                        console.log("Product not exists");
                    }
                });
            }
        })
        .on("end", function () {
            db.Product.find({}, function (err, productAvailable) {
                // Check whether database exists or not
                if (productAvailable.length !== 0) {
                    // console.log("Database Exists");
                    // Add subsequent onward
                    db.Product.insertMany(buffer);
                    buffer = [];
                } else {
                    // Add first time
                    db.Product.insertMany(buffer);
                    buffer = [];
                }
            });
        });
}
It is not a problem if it's just a few rows in the CSV file, but at only around 2k rows I encountered a problem. The culprit is the if-condition check when listening in the on event handler: it needs to check every single row to see whether the database already contains the data or not.
The reason I'm doing this is that the CSV file will have new data added to it, and I need to add all the data the first time if the database is empty, or else look at every single row and only add the new data into Mongoose.
The 1st approach I took (as in the code) was using async/await to make sure that all the data has been read before proceeding to the end event handler. This helps, but I see from time to time (with mongoose.set("debug", true);) that some data is being queried twice, and I have no idea why.
The 2nd approach was not to use async/await. This has a downside: the data was not fully queried, so execution proceeded straight to the end event handler and insertMany ran with only the part of the data that had made it into the buffer.
If I stick with the current approach it is not an issue, but the query will take 1 to 2 minutes, not to mention even more as the database keeps growing. During those few minutes of querying, the event queue is blocked, so when a request is sent to the server, the server times out.
I used stream.pause() and stream.resume() before this code, but I can't get it to work as it just jumps straight to the end event handler first. This causes the buffer to be empty every single time, since the end event handler runs before the on event handler.
I can't remember the links that I used, but the fundamentals that I got came from this:
Import CSV Using Mongoose Schema
I saw these threads:
Insert a large csv file, 200'000 rows+, into MongoDB in NodeJS
Can't populate big chunk of data to mongodb using Node.js
to be similar to what I need, but it's a bit too complicated for me to understand what is going on. Seems like using a socket or a child process, maybe? Furthermore, I still need to check conditions before adding into the buffer.
Anyone care to guide me on this?
Edit: await is removed from console.log as it is not asynchronous
Forking a child process approach (a rough sketch follows below):
When the web service gets a request with a CSV data file, save it somewhere in the app
Fork a child process -> child process example
Pass the file URL to the child process to run the insert checks
When the child process finishes processing the CSV file, delete the file
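A rough sketch of those steps, assuming an Express handler in the parent and a separate worker script (the file names, the saveUploadedCsv helper and the message shapes are made up for the example):

// parent.js (Express handler that hands the CSV off to a worker)
const { fork } = require("child_process");
const fs = require("fs");

app.post("/upload", (req, res) => {
    const filePath = saveUploadedCsv(req); // assumption: helper that writes the uploaded file and returns its path
    const worker = fork("./csv-worker.js");
    worker.send({ filePath });
    worker.on("message", (msg) => {
        if (msg.done) {
            fs.unlinkSync(filePath); // delete the file once the worker is finished
        }
    });
    // Respond immediately; the heavy row-by-row checks happen in the child process.
    res.status(202).send("Import started");
});

// csv-worker.js (runs the insert checks without blocking the web process)
process.on("message", async ({ filePath }) => {
    await parseCSV(filePath); // the same parsing/insert logic as above, taking the path as an argument
    process.send({ done: true });
});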
Like what Joe said, indexing the DB would speed up the processing time by a lot when there are lots (millions) of tuples.
If you create an index on order and lot, the query should be very fast.
db.Product.createIndex({ order: 1, lot: 1 })
Note: This is a compound index and may not be the ideal solution. Index strategies
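Since the question uses Mongoose, the same index can also be declared on the schema itself (ProductSchema here is a guess at the schema name behind db.Product):

// Assumption: ProductSchema is the schema used to build the db.Product model.
ProductSchema.index({ order: 1, lot: 1 }); // compound index; Mongoose ensures it on startup when autoIndex is enabled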
Also, your await on console.log is weird. That may be causing your timing issues. console.log is not async. Additionally, the function is not marked async.
// removing await from console.log
let product = { "order": data.order, "lot": data.lot };
// Check whether product exist in database or not
await db.Product.find(product, function (err, foundProduct) {
    if (foundProduct && foundProduct.length !== 0) {
        console.log("Product exists");
    } else {
        buffer.push(product);
        console.log("Product not exists");
    }
});
I would try removing the await on console.log (that may be a red herring if the console.log is just for Stack Overflow and is hiding the actual async method). However, be sure to mark the function with async if that is the case.
Lastly, if the problem still exists, I may look into a two-tiered approach (sketched below):
Insert all lines from the CSV file into a mongo collection.
Process that mongo collection after the CSV has been parsed, removing the CSV from the equation.
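A rough sketch of that two-tiered idea, reusing the fs/csv/db requires from the question and assuming a staging model (RawRow is a made-up name) is acceptable:

// Tier 1: dump every CSV row into a staging collection as fast as possible.
function stageCsv(filePath) {
    const rows = [];
    return new Promise((resolve, reject) => {
        csv.fromStream(fs.createReadStream(filePath), { headers: ["lot", "order", "cwotdt"], trim: true })
            .on("data", (row) => rows.push(row))
            .on("end", () => {
                db.RawRow.insertMany(rows).then(resolve).catch(reject);
            })
            .on("error", reject);
    });
}

// Tier 2: process the staging collection afterwards, with the CSV stream out of the picture.
async function promoteStagedRows() {
    const staged = await db.RawRow.find({ cwotdt: { $ne: "000000" } }).lean();
    for (const row of staged) {
        // An upsert replaces the per-row "does it exist?" query.
        await db.Product.updateOne(
            { order: row.order, lot: row.lot },
            { $setOnInsert: { order: row.order, lot: row.lot } },
            { upsert: true }
        );
    }
    await db.RawRow.deleteMany({});
}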

Getting data pushed to an array outside of a Promise

I'm using https://github.com/Haidy777/node-youtubeAPI-simplifier to grab some information from a playlist of Bounty Killers. The way this library is set up, it seems to use Promises via Bluebird (https://github.com/petkaantonov/bluebird), which I don't know much about. Looking up the beginner's guide for Bluebird gives http://bluebirdjs.com/docs/beginners-guide.html, which literally just shows
This article is partially or completely unfinished. You are welcome to create pull requests to help completing this article.
I am able to set up the library
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');
As well as list some information about Bounty Killers
ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
    .then(function (data) {
        for (var i = 0, len = data.length; i < len; i++) {
            ytapi.videoFunctions.getDetailsForVideoIds([data[i].videoId])
                .then(function (video) {
                    console.log(video);
                    // ytdata.push(video); <- Push a Bounty Killer Video
                });
        }
    });
// console.log(ytdata); This gives []
Basically, the above pulls the full playlist (normally there will be some pagination here depending on the length), then takes the data from getVideosForPlaylist, iterates the list and calls getDetailsForVideoIds for each YouTube video. All good here.
The issue arises with getting data out of this. I would like to push the video object to the ytdata array, and I'm unsure whether the empty array at the end is due to scoping or to things being out of sync such that console.log(ytdata) gets called before the API calls are finished.
How will I be able to get each Bounty Killer video into the ytdata array to be available globally?
console.log(ytdata) gets called before the API calls are finished
Spot on, that's exactly what's happening here: the API calls are async. Once you're using async functions, you must go the async way if you want to deal with the returned data. Your code could be written like this:
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');

// this function returns a promise you can "wait" on
function getVideos() {
    return ytapi.playlistFunctions
        .getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
        .then(function (videos) {
            // extract all videoIds
            var videoIds = videos.map(video => video.videoId);
            // getDetailsForVideoIds is called with an array of videoIds
            // and returns a promise, so one API call is enough
            return ytapi.videoFunctions.getDetailsForVideoIds(videoIds);
        });
}

getVideos().then(function (ydata) {
    // this is the only place ydata is full of data
    console.log(ydata);
});
I made use of ES6's arrow functions in videos.map(video => video.videoId); that should work if your Node.js is v4+.
console.log(ytdata) should be immediately AFTER your FOR loop (inside the THEN block). This data is NOT available until the promise is resolved and the FOR loop execution is complete; attempting to access it beforehand will give you an empty array.
(Your current console.log is not working because that code is executed immediately, before the promise is resolved.) Only code inside the THEN block is executed AFTER the promise is resolved.
If you NEED the data available NOW or ASAP and the requests for the videos are taking a long time, then can you request one video at a time, on demand, or on a separate thread (using a web worker maybe)? Can you implement caching?
Can you make the requests up front behind the scenes before the user even visits this page? (not sure this is a good idea but it is an idea)
Can you use video thumbnails (like youtube does) so that when the thumbnail is clicked then you start streaming and playing the video?
Some ideas ... Hope this helps
ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
    .then(function (data) {
        // THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE VIDEO IDS HAVE BEEN RETRIEVED AND ARE AVAILABLE
        // YOU COULD SAVE THESE TO A DATASTORE IF YOU WANT
        for (var i = 0, len = data.length; i < len; i++) {
            var videoIds = [data[i].videoId];
            ytapi.videoFunctions.getDetailsForVideoIds(videoIds)
                .then(function (video) {
                    // THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE DETAILS HAVE BEEN DOWNLOADED FOR ALL videoIds PROVIDED
                    // AGAIN YOU CAN DO WHATEVER YOU WANT WITH THESE DETAILS
                    // ALSO NOW THAT THE DATA IS AVAILABLE YOU MIGHT WANT TO HIDE THE LOADING ICON AND RENDER THE PAGE! AGAIN JUST AN IDEA, A DATA STORE WOULD PROVIDE FASTER ACCESS BUT YOU WOULD NEED TO UPDATE THE CACHE EVERY SO OFTEN
                    // ytdata.push(video); <- Push a Bounty Killer Video
                });
            // THE DETAILS FOR ANOTHER VIDEO BECOME AVAILABLE AFTER EACH ITERATION OF THE FOR LOOP
        }
        // ALL THE DATA IS AVAILABLE WHEN THE FOR LOOP HAS COMPLETED
    });
// This is executed immediately, before YTAPI has responded.
// console.log(ytdata); This gives []
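To tie this back to the original goal of getting everything into ytdata, here is a minimal sketch using Promise.all (assuming the same node-youtubeapi-simplifier calls as above), which waits for every per-video request before logging:

var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');

ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
    .then(function (data) {
        // Start one details request per video and keep the promises.
        var detailPromises = data.map(function (entry) {
            return ytapi.videoFunctions.getDetailsForVideoIds([entry.videoId]);
        });
        // Resolves only when every details request has finished.
        return Promise.all(detailPromises);
    })
    .then(function (ytdata) {
        // One result per video, in playlist order; safe to use or store from here.
        console.log(ytdata);
    });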

A persistent multiset that is built up sequentially and processed at once for Node.js

In Node.js I am trying to get the following behaviour: during the runtime of my Express application I accumulate several IDs of objects that need further processing. For further processing, I need to transmit these IDs to a different service. The other service, however, cannot handle a lot of requests, but rather requires batch transmission. Hence I need to accumulate a lot of individual requests into a bigger one while allowing persistence.
tl;dr — Over 15 minutes, several IDs are accumulated in my application; after this 15-minute window, they're all emitted at once. At the same time, the next window is opened.
From my investigation, this is probably the abstract data type of a multiset: The items in my multiset (or bag) can have duplicates (hence the multi-), they're packaged by the time window, but have no index.
My infrastructure is already using redis, but I am not sure whether there's a way to accumulate data into one single job. Or is there? Are there any other sensible ways to achieve this kind of behavior?
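Since Redis is already in the stack, one possible sketch (using ioredis here; the key name, the 15-minute interval and sendBatchToOtherService are just placeholders) is to RPUSH each ID onto a list and atomically drain the list when the window closes:

const Redis = require("ioredis");
const redis = new Redis();
const BAG_KEY = "pending-object-ids"; // made-up key name

// Called whenever an ID is accumulated during the window.
function addId(id) {
    return redis.rpush(BAG_KEY, id);
}

// Runs when the 15-minute window closes: read and clear the list in one atomic step.
async function flushWindow() {
    const results = await redis.multi().lrange(BAG_KEY, 0, -1).del(BAG_KEY).exec();
    const ids = results[0][1]; // exec() returns [error, value] pairs per queued command
    if (ids.length > 0) {
        await sendBatchToOtherService(ids); // placeholder for the batch transmission to the other service
    }
}

setInterval(flushWindow, 15 * 60 * 1000);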
I might be misunderstanding some of the subtlety of your particular situation, but here goes.
Here is a simple sketch of some code that processes a batch of 10 items at a time. The way you would do this differs slightly depending on whether the processing step is synchronous or asynchronous. You don't need anything more complicated than an array for this, since arrays have constant-time push and length methods and that is the only thing you need. You may want to add another option to flush the batch after a given item is inserted.
Synchronous example:
var batch = [];
var batchLimit = 10;
var sendItem = function (item) {
    batch.push(item);
    if (batch.length >= batchLimit) {
        processBatchSynch(batch);
        batch = [];
    }
};
Asynchronous example:
// note that in this case the job of emptying the batch array
// has to be done inside the callback.
var batch = [];
var batchLimit = 10;
// your callback might look something like function(err, data) { ... }
var sendItem = function (item, cb) {
    batch.push(item);
    if (batch.length >= batchLimit) {
        processBatchAsync(batch, cb);
    }
};
I came up with an npm module to solve this specific problem, using a MySQL database for persistence:
persistent-bag: This is to bags what redis is to queues. A bag (or multiset) is filled over time and processed at once.
On instantiation of the object, the required table is created in the provided MySQL database if necessary.
var PersistentBag = require('persistent-bag');
var bag = new PersistentBag({
    host: 'localhost',
    port: '3306',
    user: 'root',
    password: '',
    database: 'test'
});
Then items can be .add()ed during the runtime of any number of applications:
var item = {
    title: 'Test item to store to bag'
};
bag.add(item, function (err, itemId) {
    console.log('Item id: ' + itemId);
});
Working with the emitted, aggregated items every 15 minutes is done like in kue for redis by subscribing to .process():
bag.process(function worker(bag, done) {
    // bag.data is now an array of all items
    doSomething(bag.data, function () {
        done();
    });
});
