Firestore Query slow on empty collection - node.js

Below is the start of my code for a Firebase function. It gets to the "oncreate" log statement in less than 2 seconds, but it takes almost 2 minutes to get to the "got snapshot" log statement. The Invitations collection does not exist; it has zero documents. Why does running a query on an empty collection take so long, and how do I speed this up? Thanks in advance.
exports.register = functions.firestore.document("Users/{Email}").onCreate(
    async (snap, context) => {
      // see if Invitation exists, if yes get FamilyId from there
      const collectionRef = admin.firestore().collection("Invitations");
      functions.logger.info("oncreate", {structuredData: true});
      collectionRef.where("Email", "==", snap.id)
          .get().then((querySnapshot) => {
            functions.logger.info("got snapshot", {structuredData: true});
            if (querySnapshot.empty) {
              addUser(snap);
              return;
            } ....

Since you're performing an asynchronous operation in the Cloud Functions code, you need to return a promise from the top level of your code so that the Cloud Functions container knows how long to keep it running.
From the code you shared, that means you need to add a return here:
return collectionRef.where("Email", "==", snap.id)
...
I recommend checking out the Firebase documentation on sync, async, and promises, which explains more about how to deal with asynchronous calls.
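Applied to the code in the question, that looks roughly like this (a sketch only; it assumes addUser also returns a promise so it can be returned too):
exports.register = functions.firestore.document("Users/{Email}").onCreate(
    async (snap, context) => {
      const collectionRef = admin.firestore().collection("Invitations");
      functions.logger.info("oncreate", {structuredData: true});
      // returning the query's promise chain tells Cloud Functions to keep the
      // instance alive until the work finishes, instead of freezing it early
      return collectionRef.where("Email", "==", snap.id)
          .get().then((querySnapshot) => {
            functions.logger.info("got snapshot", {structuredData: true});
            if (querySnapshot.empty) {
              return addUser(snap); // assumed to return a promise
            }
            // ... handle the existing invitation here
          });
    });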

Related

Retrieving a value in a function in Node.js

I'm struggling with callbacks in Node.js. I simply want playerNumber to be set to the number of players in my collection of Players. The console.log works, but I can't get the variable out of the function and into the playerNumber variable.
And if there's a simpler way to get this value for use in the rest of my backend code, I'm all ears. I'm clearly new to Node.js, but the code always seems more involved than I'm expecting.
Thanks in advance!
var playerNumber = function countPlayers(callback){
  Player.count(function(err, numOfDocs) {
    console.log('I have '+numOfDocs+' documents in my collection');
    callback(err, numOfDocs);
  });
}
It's probably async, and it's a typical first-timer experience to want to "get back to normal" on the call chain on the way back from an async call. This can't be done, but it's not so bad to live with it. Here's how...
Step 1: Promises are better than callbacks. I'll leave the long story to others.
Step 2: Callbacks can be made into promises
In the OP case...
// The promise constructor takes a function that has two functions as params:
// one to call on success, and one to call on error. Instead of a callback,
// call the 'resolve' param with the data and the 'reject' param with any error.
// Mark the function 'async' so callers know it can be 'await'-ed.
const playerNumber = async function countPlayers() {
  return new Promise((resolve, reject) => {
    Player.count(function(err, numOfDocs) {
      err ? reject(err) : resolve(numOfDocs);
    });
  });
}
Step 3: Yes, the callers must deal with this, and the callers of the callers, and so on. It's not so bad.
In the OP case (in the most modern syntax)...
// this has to be async because it 'awaits' the first function
// think of await as stopping serial execution until the async function finishes
// (it's not that at all, but that's an okay starting simplification)
async function printPlayerCount() {
  const count = await playerNumber();
  console.log(count);
}

// as long as we're calling something async (something that must be awaited)
// we mark the function as async
async function printPlayerCountAndPrintSomethingElse() {
  await printPlayerCount();
  console.log('do something else');
}
Step 4: Enjoy it, and do some further study. It's actually great that we can do such a complex thing so simply. Here's good reading to start with: MDN on Promises.

Node.js .map function causing express server to return the response object early

My goal is to create an abstracted POST function for an Express server running on Node, similar to Django's built-in REST methods. As part of my abstracted POST function I'm checking the database (Mongo) for valid foreign keys, duplicates, etc., which is the step we're at here. The function below is the one that actually makes the first calls to Mongo to check that the incoming foreign keys exist in the tables/collections where they should exist.
In short, the built-in response functionality inside the native .map function seems to be causing an early return from the called/subsidiary functions, and yet execution still continues inside those functions after the early return happens.
Here is the code:
const db = require(`../../models`)

const findInDB_by_id = async params => {
  console.log(`querying db`)
  const records = await Promise.all(Object.keys(params).map(function(table){
    return db[table].find({
      _id: {
        $in: params[table]._ids
      }
    })
  }))
  console.log('done querying the db')
  // do stuff with records
  return records
}

// call await findInDB_by_id and do other stuff
// eventually return the response object
And here are the server logs
querying db
POST <route> <status code> 49.810 ms - 9 //<- and this is the appropriate response
done querying the db
... other stuff
When the function is modified so that the map callback doesn't return anything, (a) it doesn't query the database, and (b) it doesn't return the Express response object early. So by modifying this:
const records = await Promise.all(Object.keys(params).map(function(table){
  return db[table].find({ // going to delete the `return` command here
    _id: {
      $in: params[table]._ids
    }
  })
}))
to this
const records = await Promise.all(Object.keys(params).map(function(table){
  db[table].find({ // not returning this out of map
    _id: {
      $in: params[table]._ids
    }
  })
}))
the server logs change to:
querying db
done querying the db
... other stuff
POST <route> <status code> 49.810 ms - 9 // <- appropriate response
But then I'm not actually building my query, so the query response is empty. I'm experiencing this behavior with the anonymous map function being an arrow function of this format .map(table => (...)) as well.
Any ideas what's going on, why, or suggestions?
The first version of your code is working as expected, and the logs are as expected.
All async functions return a promise. So findInDB_by_id() will always return a promise. In fact, it returns that promise as soon as the first await inside the function is encountered, while the calling code continues to run. The calling code itself needs to use await or .then() to get the resolved value from that promise.
Then, some time later, when you do a return someValue in that function, someValue becomes the resolved value of that promise, the promise resolves, and the calling code is notified via await or .then() that the final result is now ready.
await only suspends execution of the async function that it is in. It does not cause the caller to block at all. The caller still has to deal with the promise returned from the async function.
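Concretely, that means whatever calls findInDB_by_id has to await it (or chain .then()) before sending the response. A rough sketch, using a hypothetical Express route and request shape (not from your code):
app.post('/things', async (req, res, next) => {
  try {
    // wait for the db checks to finish before responding
    const records = await findInDB_by_id(req.body.foreignKeys);
    // ... check foreign keys / duplicates using records ...
    res.status(201).json(records);
  } catch (err) {
    next(err); // db errors surface here instead of being silently lost
  }
});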
The second version of your code just runs the db queries open loop, with no control and no ability to collect results or process errors. This is often referred to as "fire and forget". The rest of your code continues to execute without regard for anything happening in those db queries. It's highly unlikely that the second version of the code is correct. While there are a very few situations where "fire and forget" is appropriate, I would always want to at least log errors in such a situation; and since it appears you want results, it cannot be correct here.
You could also try something like this; since execution continues past the async calls, collect the promises first and then await them all:
let promiseArr = []
Object.keys(params).map(function(table){
  promiseArr.push(db[table].find({
    _id: {
      $in: params[table]._ids
    }
  }))
})
let records = await Promise.all(promiseArr)
and if you still want to use the map approach:
await Promise.all(Object.keys(params).map(async function(table){
  return await db[table].find({
    _id: {
      $in: params[table]._ids
    }
  })
}))

Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? Best practices for using Knex.Transaction

When working with a big application that has several tables and several DB operations, it's very difficult to keep track of what transactions are occurring. To work around this, we started by passing around a trx object.
This has proven to be very messy.
For example:
async getOrderById(id: string, trx?: Knex.Transaction) { ... }
Depending on the function calling getOrderById, it will either pass a trx object or not. The above function will use trx if it is not null.
This seems simple at first, but it leads to mistakes: if you're in the middle of a transaction in one function and call another function that does NOT use that transaction, knex will hang with the famous Knex: Timeout acquiring a connection. The pool is probably full.
async getAllPurchasesForUser(userId: string) {
  ..
  const trx = await knex.transaction();
  try {
    ..
    getPurchaseForUserId(userId); // Forgot to make this consume trx, hence Knex times out acquiring a connection.
    ..
  }
Based on that, I'm assuming this is not a best practice, but I would love it if someone from the Knex developer team could comment.
To improve this, we're considering instead using knex.transactionProvider(), accessed throughout the app wherever we perform DB operations.
The example on the website seems incomplete:
// Does not start a transaction yet
const trxProvider = knex.transactionProvider();
const books = [
  {title: 'Canterbury Tales'},
  {title: 'Moby Dick'},
  {title: 'Hamlet'}
];

// Starts a transaction
const trx = await trxProvider();
const ids = await trx('catalogues')
  .insert({name: 'Old Books'}, 'id');
books.forEach((book) => book.catalogue_id = ids[0]);
await trx('books').insert(books);

// Reuses same transaction
const sameTrx = await trxProvider();
const ids2 = await sameTrx('catalogues')
  .insert({name: 'New Books'}, 'id');
books.forEach((book) => book.catalogue_id = ids2[0]);
await sameTrx('books').insert(books);
In practice here's how I'm thinking about using this:
SingletonDBClass.ts:
const trxProvider = knex.transactionProvider();
export default trxProvider;
Orders.ts:
import trx from '../SingletonDBClass';
..
async getOrderById(id: string) {
  const trxInst = await trx;
  try {
    const order = await trxInst<Order>('orders').where({id});
    trxInst.commit();
    return order;
  } catch (e) {
    trxInst.rollback();
    throw new Error(`Failed to fetch order, error: ${e}`);
  }
}
..
Am I understanding this correctly?
Another example function where a transaction is actually needed:
async cancelOrder(id: string) {
  const trxInst = await trx;
  try {
    trxInst('orders').update({ status: 'CANCELED' }).where({ id });
    trxInst('active_orders').delete().where({ orderId: id });
    trxInst.commit();
  } catch (e) {
    trxInst.rollback();
    throw new Error(`Failed to cancel order, error: ${e}`);
  }
}
Can someone confirm if I'm understanding this correctly? And more importantly if this is a good way to do this. Or is there a best practice I'm missing?
Appreciate your help knex team!
No. You cannot have a global singleton class returning the transaction for all of your internal functions. Otherwise you are always trying to use the same transaction for all the concurrent users doing different things in the application.
Also, once you commit / rollback the transaction returned by the provider, it will not work anymore for other queries. A transaction provider can give you only a single transaction.
A transaction provider is useful in a case where you have, for example, middleware that provides a transaction for request handlers, but the transaction should not be started up front, since it might not be needed and you don't want to allocate a connection for it from the pool yet.
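As an illustration only (a hypothetical Express middleware, not code from the question), that lazy pattern could look like:
// each request gets its own provider; no connection is taken from the pool
// until a handler actually asks for the transaction
app.use((req, res, next) => {
  req.trxProvider = knex.transactionProvider();
  next();
});

app.post('/orders', async (req, res, next) => {
  const trx = await req.trxProvider(); // the transaction starts here
  try {
    const ids = await trx('orders').insert(req.body, 'id');
    await trx.commit();
    res.json({ id: ids[0] });
  } catch (err) {
    await trx.rollback();
    next(err);
  }
});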
A good way to do this is to pass the transaction, or some request context or user session, around, so that each concurrent user can have their own separate transactions.
for example:
async cancelOrder(trxInst, id: string) {
  try {
    trxInst('orders').update({ status: 'CANCELED' }).where({ id });
    trxInst('active_orders').delete().where({ orderId: id });
    trxInst.commit();
  } catch (e) {
    trxInst.rollback();
    throw new Error(`Failed to cancel order, error: ${e}`);
  }
}
Depending on the function calling getOrderById, it will either pass a trx object or not. The above function will use trx if it is not null.
This seems simple at first, but it leads to mistakes: if you're in the middle of a transaction in one function and call another function that does NOT use that transaction, knex will hang with the famous Knex: Timeout acquiring a connection. The pool is probably full.
We usually do it in a way that if trx is null, the query throws an error, so that you need to explicitly pass either knex or trx to be able to execute the method; in some methods trx is actually required to be passed.
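For example, a sketch of that convention (the helper below is illustrative, not from your codebase):
async function getOrderById(id, trx) {
  if (!trx) {
    // fail fast instead of silently grabbing another connection from the pool
    throw new Error('getOrderById requires an explicit knex instance or transaction');
  }
  return trx('orders').where({ id }).first();
}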
Anyhow, if you really want to force everything to go through a single transaction in a session by default, you could create your API modules in such a way that for each user session you create an API instance which is initialized with a transaction:
const dbForSession = new DbService(trxProvider);
const users = await dbForSession.allUsers();
and .allUsers() does something like return this.trx('users');
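A rough sketch of what such a per-session API instance could look like (DbService is illustrative; only knex.transactionProvider() is actual Knex API):
class DbService {
  constructor(trxProvider) {
    this.trxProvider = trxProvider; // one provider per user session, not a global
  }

  async allUsers() {
    const trx = await this.trxProvider(); // same transaction reused within this session
    return trx('users');
  }
}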

Updating a user concurrently in mongoose using model.save() throws error

I am getting a mongoose error when I attempt to update a user field multiple times.
What I want to achieve is to update that user based on some conditions after making an API call to an external resource.
From what I observe, I am hitting both conditions at the same time in the processUser() function; hence, user.save() is getting called almost concurrently, and mongoose is not happy about that, throwing me this error:
MongooseError [ParallelSaveError]: Can't save() the same doc multiple times in parallel. Document: 5ea1c634c5d4455d76fa4996
I know I am guilty and my code is the culprit here because I am a novice. But is there any way I can achieve my desired result without hitting this error? Thanks.
function getLikes(){
  var users = [user1, user2, ...userN]
  users.forEach((user) => {
    processUser(user)
  })
}

async function processUser(user){
  var result = await makeAPICall(user.url)
  // I want to update the user based on the returned value from this call
  // I am updating the user using `mongoose save()`
  if (result === someCondition) {
    user.meta.likes += 1
    user.markModified("meta.likes")
    try {
      await user.save()
      return
    } catch (error) {
      console.log(error)
      return
    }
  } else {
    user.meta.likes -= 1
    user.markModified("meta.likes")
    try {
      await user.save()
      return
    } catch (error) {
      console.log(error)
      return
    }
  }
}

setInterval(getLikes, 2000)
There are some issues that need to be addressed in your code.
1) processUser is an asynchronous function. Array.prototype.forEach doesn't respect asynchronous functions as documented here on MDN.
2) setInterval doesn't respect the return value of your function as documented here on MDN, therefore passing a function that returns a promise (async/await) will not behave as intended.
3) setInterval shouldn't be used with functions that could potentially take longer to run than your interval as documented here on MDN in the Usage Section near the bottom of the page.
Hitting an external api for every user every 2 seconds and reacting to the result is going to be problematic under the best of circumstances. I would start by asking myself if this is absolutely the only way to achieve my overall goal.
If it is the only way, you'll probably want to implement your solution using the recursive setTimeout() mentioned at the link in #2 above, or perhaps using an async version of setInterval(); there's one on npm.
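A rough sketch of that approach, keeping the original function names: processing the users one at a time avoids the parallel save() entirely, and the next run is only scheduled after the current one has finished.
async function getLikes() {
  var users = [user1, user2, ...userN]
  // for...of works with await, unlike forEach
  for (const user of users) {
    await processUser(user) // each save finishes before the next starts
  }
  // schedule the next run only after this one has fully completed
  setTimeout(getLikes, 2000)
}

getLikes()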

Using firebase cloud function to copy all documents from a master collection in Firestore to new sub-collection

I have a master collection in firestore with a couple hundred documents (which will grow to a few thousand in a couple of months).
I have a use case, where every time a new user document is created in /users/ collection, I want all the documents from the master to be copied over to /users/{userId}/.
To achieve this, I have created a firebase cloud function as below:
// setup for new user
exports.setupCollectionForUser = functions.firestore
    .document('users/{userId}')
    .onCreate((snap, context) => {
      const userId = context.params.userId;
      db.collection('master').get().then(snapshot => {
        if (snapshot.empty) {
          console.log('no docs found');
          return;
        }
        snapshot.forEach(function(doc) {
          return db.collection('users').doc(userId).collection('slave').doc(doc.get('uid')).set(doc.data());
        });
      });
    });
This works; the only problem is that it takes forever (~3-5 minutes) for only about 200 documents. This has been such a bummer because a lot depends on how fast these documents get copied over. I was hoping this would take no more than a few seconds at most. Also, the documents show up all together rather than as they are written, or at least it seems that way.
Am I doing anything wrong? Why should it take so long?
Is there a way I can break this operation into multiple reads and writes so that I can guarantee a minimum documents in a few seconds and not wait until all of them are copied over?
Please advise.
If I am not mistaken, correctly managing the parallel writes with Promise.all() and returning the promise chain should normally improve the speed.
Try to adapt your code as follows:
exports.setupCollectionForUser = functions.firestore
    .document('users/{userId}')
    .onCreate((snap, context) => {
      const userId = context.params.userId;
      return db.collection('master').get().then(snapshot => {
        if (snapshot.empty) {
          console.log('no docs found');
          return null;
        } else {
          const promises = [];
          const slaveRef = db.collection('users').doc(userId).collection('slave');
          snapshot.forEach(doc => {
            promises.push(slaveRef.doc(doc.get('uid')).set(doc.data()))
          });
          return Promise.all(promises);
        }
      });
    });
I would suggest you watch the 3 videos about "JavaScript Promises" from the Firebase video series, which explain why it is key to return a Promise or a value in a background-triggered Cloud Function.
Note also that if you are sure you have fewer than 500 documents to save in the slave collection, you could use a batched write. (You could use it for more than 500 docs, but then you would have to manage several different batched writes...)
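For illustration, here is a sketch of the batched-write variant of the same function (valid only while the master collection stays under the 500-writes-per-batch limit):
exports.setupCollectionForUser = functions.firestore
    .document('users/{userId}')
    .onCreate((snap, context) => {
      const userId = context.params.userId;
      return db.collection('master').get().then(snapshot => {
        if (snapshot.empty) {
          console.log('no docs found');
          return null;
        }
        const batch = db.batch();
        const slaveRef = db.collection('users').doc(userId).collection('slave');
        snapshot.forEach(doc => {
          batch.set(slaveRef.doc(doc.get('uid')), doc.data());
        });
        // a single atomic commit instead of one write per document
        return batch.commit();
      });
    });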
