In short, imagine I have a Cloud Firestore database where I store some user data such as email, geo-location (as a geopoint) and a few other things.
In Cloud Functions I have "myFunc", which tries to "link" two users to each other based on a geo-query (I use GeoFirestore for it).
Now everything works well, but I cannot figure out how to avoid this kind of situation:
User A calls myFunc trying to find a person to be associated with, and finds User B as a possible one.
At the same time, User B calls myFunc too, trying to find a person to be associated with, BUT finds User C as a possible one.
In this case User A would be associated with User B, but User B would be associated with User C.
I already have a field called "associated", set to FALSE on each user's initialization, that becomes TRUE whenever a possible association has been found.
But this code cannot guarantee the right association if User A and User B trigger the function at the same time: at the moment the function triggered by User A finds User B, B's "associated" field will still be set to false, because B is still searching and has not found anybody yet.
I need to find a solution, otherwise I'll end up with wrong associations (User A pointing at User B, but User B pointing at User C).
I also thought about adding a snapshot listener to the user who is searching, so that if another user updated the searching user's document I could terminate the function, but I'm not really sure it would work as expected.
I'd be incredibly grateful if you could help me with this problem.
Thanks a lot!
Cheers,
David
HERE IS MY CODE:
exports.myFunction = functions.region('europe-west1').https.onCall(async (data, context) => {
    const userDoc = await firestore.collection('myCollection').doc(context.auth.token.email).get();
    if (!userDoc.exists) {
        return null;
    }
    const userData = userDoc.data();
    if (userData.associated) { // IF THE USER HAS ALREADY BEEN ASSOCIATED
        return null;
    }
    const latitude = userData.g.geopoint.latitude;
    const longitude = userData.g.geopoint.longitude;
    // Create a GeoQuery based on a location
    const query = geocollection.near({ center: new firebase.firestore.GeoPoint(latitude, longitude), radius: userData.maxDistance });
    // Run the query and wait for the result (otherwise otherUser is still empty below)
    const value = await query.get();
    let otherUser = null; // THE FIRST USER FOUND
    // CHECK EVERY USER DOC
    value.docs.forEach((doc) => {
        const docData = doc.data();
        // IF THE USER HAS NOT BEEN ASSOCIATED YET, SAVE ONLY THE FIRST ONE FOUND
        if (!docData.associated && otherUser === null) {
            otherUser = docData;
        }
    });
    // HERE I HAVE TO RETURN AN .update() OF DATA ON 2 DOCUMENTS, IN ORDER TO UPDATE THE
    // "associated" AND "userAssociated" FIELDS OF THE USER WHO WAS SEARCHING AND THE USER FOUND
    return ........update({
        associated: true,
        userAssociated: otherUser.name
    });
}); // END FUNCTION
You should use a Transaction in your Cloud Function. Since Cloud Functions are using the Admin SDK in the back-end, Transactions in a Cloud Function use pessimistic concurrency controls.
Pessimistic transactions use database locks to prevent other operations from modifying data.
See the doc for more details. In particular, you will read that:
In the server client libraries, transactions place locks on the documents they read. A transaction's lock on a document blocks other transactions, batched writes, and non-transactional writes from changing that document. A transaction releases its document locks at commit time. It also releases its locks if it times out or fails for any reason.
When a transaction locks a document, other write operations must wait for the transaction to release its lock. Transactions acquire their locks in chronological order.
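To make that concrete, here is a rough sketch of what the final association step could look like inside a transaction. currentUserRef and otherUserRef are placeholders for the searching user and the candidate your GeoFirestore query found, and firestore is the already-initialized Admin SDK instance from your function:
async function associateUsers(currentUserRef, otherUserRef) {
    return firestore.runTransaction(async (transaction) => {
        // Re-read both documents inside the transaction; their locks block concurrent writers.
        const currentSnap = await transaction.get(currentUserRef);
        const otherSnap = await transaction.get(otherUserRef);

        // If a concurrent call already associated either user, abort instead of overwriting.
        if (currentSnap.data().associated || otherSnap.data().associated) {
            throw new Error('One of the users has already been associated');
        }

        transaction.update(currentUserRef, {
            associated: true,
            userAssociated: otherSnap.data().name,
        });
        transaction.update(otherUserRef, {
            associated: true,
            userAssociated: currentSnap.data().name,
        });
    });
}
If the transaction throws because the candidate was taken in the meantime, you can simply run the geo-query again and try the next candidate.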
Related
I use MongoDB to store user data. The user id increases incrementally, such as 1, 2, 3, 4, etc., as new users register.
I have the following code to generate the user id. "users" is the name of the collection where I store the user data.
// generate new user id
let uid;
const collections = await db.listCollections().toArray();
const collectionNames = collections.map(collection => collection.name);
if (collectionNames.indexOf("users") == -1) {
    uid = 1;
} else {
    const newest_user = await db.collection("users").find({}).sort({ "_id": -1 }).limit(1).toArray();
    uid = newest_user[0]["_id"] + 1;
}
user._id = uid;
// add and save user
await db.collection("users").insertOne(user).catch((error) => {
    throw error;
});
One concern I have now is that when two users make a request to register at the same time, they will read the same maximum user id and create the same new user id. One way to prevent it would be a thread lock, but I don't think Node.js and Next.js support multi-threading.
What are some alternatives I have to solve this problem?
In addition, _id will be the field for the uid. Will it make a difference, since _id can't be duplicated?
Why not have the database generate the auto-incrementing ID? https://www.mongodb.com/basics/mongodb-auto-increment
One idea I have is to use a transaction, which can solve the concurrency issue. Transactions obey the ACID rules, so the writes to the database from concurrent requests will run in isolation.
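Along the lines of the linked article, here is a minimal sketch of the counter-document pattern, assuming a dedicated "counters" collection (the name is an assumption, and option names vary slightly between driver versions). The $inc update is atomic, so two concurrent registrations can never receive the same value:
// Atomically increment and read a counter document; upsert creates it on first use.
// Note: older driver versions wrap the returned document in result.value.
async function getNextUserId(db) {
    const result = await db.collection("counters").findOneAndUpdate(
        { _id: "userId" },
        { $inc: { seq: 1 } },
        { upsert: true, returnDocument: "after" }
    );
    const doc = result.value ?? result;
    return doc.seq;
}

// usage in the registration handler:
// user._id = await getNextUserId(db);
// await db.collection("users").insertOne(user);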
I'm creating an app in react-native with a nodejs backend. The app is almost done, and I'm now stress testing the backend.
In my postgresql database, I have a table called notifications to store all the notifications a user receives.
In my app, a user can follow pages. When a page posts a new message, I want to send a notification to all users following that page. Every user should receive an individual notification.
Let's say a page is followed by 1 million users, and the page posts a new message: this means 1 million notifications (eg. 1 million rows) should be inserted into the database.
My solution (for now) is to split the array of user IDs (of the users following the page) into chunks of 1000 user IDs each, and to run an insert query for every chunk.
const db = require('./db');
const format = require('pg-format');

const userIds = [1, 2, 3, 4, 5, ..., 1000000];

// split the user IDs array into chunks of 1000 user IDs each
// (chunkArray is a function that splits an array into multiple arrays of x items each, here x = 1000)
const chunks = chunkArray(userIds, 1000);

// loop over each chunk
for (const chunk of chunks) {
    // build an array of 1000 rows, each containing a user ID, notification type and page ID
    const rows = chunk.map(userId => [userId, 'post', _PAGE_ID_]);
    // create and run the query
    const query = format("INSERT INTO notifications (userId, type, pageId) VALUES %L", rows);
    const result = await db.query(query);
}
I'm using node-postgres for the database connection, and I'm creating a connection pool. I fetch one client from the pool, so only one connection is used for all the queries in the loop above.
This all works, but inserting 1 million rows takes a few minutes. I'm not sure this is the right way to do this.
Another solution I came up with is using "general notifications". When a page posts an update, I only insert one notification into the notifications table, and when I query all notifications for a specific user, I check which pages the user is following and fetch the general notifications of those pages with the query. Would this be a better solution? It would leave me with A LOT fewer notification rows, and I think it would increase performance.
Thank you for all the responses!
I'm trying to implement my other solution. When a page updates a post, I insert only one notification without a user ID (because it has no specific destination), but with the page ID.
When I fetch all the notifications for a user, I first check for all the notifications with that user ID, and then for all notifications without a user ID but with a page ID of a page that the user is following.
I think this is not the easiest solution, but it will reduce the number of rows, and if I do a good job with indexes and such, I think I'll be able to write a pretty performant query.
Without getting into which solution would be better, one way to solve it could be this, provided that you keep all the pages and followers in the same database.
INSERT INTO notifications (userId, type, pageId)
SELECT users.id, 'post', pages.id
FROM pages
JOIN followers ON followers.pageId = pages.id
JOIN users ON followers.userId = users.id
WHERE pages.id = _PAGE_ID_
This would allow the DB to handle everything, which should speed up the insert since you don't need to send each individual row from the server.
If you don't have the users/pages in the same DB then it's a bit more tricky.
You could prepare a CSV file, upload it to the database server and use the COPY command. If you don't have access to the server, you might be able to stream the data directly, as the COPY command can read from stdin (whether that's possible depends on the library; I'm not familiar with node-postgres so I can't tell).
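For what it's worth, node-postgres can do this through the separate pg-copy-streams package. A rough sketch (table and column names taken from the question; client is a checked-out pool client) could look like this:
const { pipeline } = require('stream/promises');
const { Readable } = require('stream');
const { from: copyFrom } = require('pg-copy-streams');

// Stream one CSV line per follower into Postgres via COPY ... FROM STDIN,
// without building the whole payload in memory.
async function copyNotifications(client, userIds, pageId) {
    const ingest = client.query(
        copyFrom('COPY notifications (userId, type, pageId) FROM STDIN WITH (FORMAT csv)')
    );
    const source = Readable.from(
        (function* () {
            for (const id of userIds) {
                yield `${id},post,${pageId}\n`;
            }
        })()
    );
    await pipeline(source, ingest);
}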
Alternatively, you can do everything in a transaction by issuing a BEGIN before you do the inserts. This is the slowest alternative, but it saves the overhead of Postgres creating an implicit transaction for each statement. Just don't forget to COMMIT afterwards. The library might even have ways to create explicit transactions and insert data through them.
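For example, with node-postgres the explicit transaction could be wrapped around the chunked inserts roughly like this (assuming a pg Pool instance called pool; chunks and _PAGE_ID_ as in the question):
// Run all chunked inserts in one explicit transaction on a single client.
const client = await pool.connect();
try {
    await client.query('BEGIN');
    for (const chunk of chunks) {
        const rows = chunk.map(userId => [userId, 'post', _PAGE_ID_]);
        await client.query(format("INSERT INTO notifications (userId, type, pageId) VALUES %L", rows));
    }
    await client.query('COMMIT');
} catch (err) {
    await client.query('ROLLBACK');
    throw err;
} finally {
    client.release();
}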
That said, I would probably do a variation of your second solution, since it would create fewer rows in the DB. But that depends on your other requirements; it might not be possible if you need to track notifications or perform other actions on them.
Use async.eachOfLimit to insert X chunks in parallel.
In the following example you will insert 10 chunks in parallel:
const async = require('async');
const format = require('pg-format');

const userIds = [1, 2, 3, 4, 5, ..., 1000000];
const chunks = chunkArray(userIds, 1000);
const BATCH_SIZE_X = 10;

async.eachOfLimit(chunks, BATCH_SIZE_X, async (chunk) => {
    // build one row per user ID and insert the whole chunk in a single query
    const rows = chunk.map(userId => [userId, 'post', _PAGE_ID_]);
    const query = format("INSERT INTO notifications (userId, type, pageId) VALUES %L", rows);
    await db.query(query);
}, (err) => {
    if (err) {
        console.error("ERROR: ", err);
    } else {
        console.log("All chunks inserted");
    }
});
I am evaluating MikroORM for a future project. There are several questions I either could not find an answer to in the docs or did not fully understand.
Let me describe a minimal complex example (NestJS): I have an order processing system with two entities, Orders and Invoices, as well as a counter table for sequential invoice numbers (a legal requirement). It's important to mention that the OrderService create method is not always called by a controller, but also via a cronjob/queue system. My question is about the use case of creating a new order:
class OrderService {
    async createNewOrder(orderDto) {
        const order = new Order();
        order.customer = orderDto.customer;
        order.items = orderDto.items;
        const invoice = await this.InvoiceService.create(orderDto.items);
        order.invoice = invoice;
        await order.persistAndFlush();
        return order;
    }
}

class InvoiceService {
    async create(items): Promise<Invoice> {
        const invoice = new Invoice();
        invoice.number = await this.InvoiceNumberService.getNextInSequence();
        // the next two lines are external APIs; if they throw, the whole transaction should roll back
        const pdf = await this.PdfCreator.createPdf(invoice);
        const upload = await s3Api.upload(pdf);
        return invoice;
    }
}

class InvoiceNumberService {
    async getNextInSequence(): Promise<number> {
        return await db.collection("counter").findOneAndUpdate({ type: "INVOICE" }, { $inc: { value: 1 } });
    }
}
The whole use case of creating a new order, with all subsequent service calls, should happen in one MikroORM transaction. So if anything throws in OrderService.createNewOrder() or in one of the subsequently called methods, the whole transaction should be rolled back.
MikroORM does not allow the atomic update-increment shown in InvoiceNumberService. I can fall back to the native mongo driver, but how do I ensure the call to collection.findOneAndUpdate() shares the same transaction as the entities managed by MikroORM?
MikroORM needs a unique request context. In the examples for NestJS, this unique context is created at the controller level. In the example above, the service methods are not necessarily called by a controller, so I would need a new context for each call to OrderService.createNewOrder() that has a lifetime scoped to the function call, correct? How can I achieve this?
How can I share the same request context between services? In the example above, InvoiceService and InvoiceNumberService would need the same context as OrderService for MikroORM to work properly.
I will start with the bad news: MongoDB transactions are not yet supported in MikroORM (although they will probably land within weeks, the PoC is already implemented). You can subscribe here for updates: https://github.com/mikro-orm/mikro-orm/issues/34
But let me answer the rest as it will then apply:
You can use const collection = (em as EntityManager<MongoDriver>).getConnection().getCollection('counter'); to get the collection from the internal mongo connection instance. You can also use orm.em.getTransactionContext() to get the current transaction context (currently implemented only in SQL drivers, but in the future this will probably return the session object in mongo).
Also note that in the mongo driver, implicit transactions won't be enabled by default (it will be configurable though), so you will need to use explicit transaction demarcation via em.transactional(...).
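Once that lands, a rough sketch of how these pieces could fit together (this is my assumption of the eventual usage, not a confirmed API; package names are the current scoped ones, and Order/Invoice are the entities from the question):
import { EntityManager } from '@mikro-orm/core';    // assumption: current package layout
import { MongoDriver } from '@mikro-orm/mongodb';

// Everything inside the callback shares one transactional EntityManager;
// if anything throws (PDF creation, S3 upload, ...), the transaction is rolled back.
async function createNewOrder(em: EntityManager, orderDto: any) {
    return em.transactional(async (tem) => {
        // Native counter update through the underlying mongo collection.
        // For this call to join the same transaction, you would also pass the session
        // obtained from tem.getTransactionContext() once the mongo driver returns it.
        const counters = (tem as EntityManager<MongoDriver>).getConnection().getCollection('counter');
        const counter = await counters.findOneAndUpdate(
            { type: 'INVOICE' },
            { $inc: { value: 1 } }
        );

        const invoice = new Invoice();
        invoice.number = counter.value.value; // pre-increment value; adjust to your numbering scheme

        const order = new Order();
        order.customer = orderDto.customer;
        order.items = orderDto.items;
        order.invoice = invoice;

        tem.persist(order); // flushed and committed when the callback resolves
        return order;
    });
}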
The RequestContext helper works automatically. You just register it as a middleware (done automatically in the nestjs orm adapter) and then your request handler (route/endpoint/controller method) is run inside a domain that shares the context. Thanks to this, all services in the DI container can share singleton instances of repositories, but they will automatically pick the right context from the domain.
So you basically have this automatic request context, and on top of it you can create new (nested) contexts manually via em.transactional(...).
https://mikro-orm.io/docs/transactions/#approach-2-explicitly
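And for the cronjob/queue entry points that never hit a controller, you can create the context yourself before calling into the services. A sketch (orm and orderService are assumed to be available in the job's scope; in older versions the awaitable variant is RequestContext.createAsync):
import { RequestContext } from '@mikro-orm/core';   // assumption: current package layout

// Wrap every non-HTTP entry point so all injected services resolve a fresh,
// request-scoped EntityManager instead of the global one.
async function handleOrderJob(orderDto: any) {
    await RequestContext.create(orm.em, async () => {
        await orderService.createNewOrder(orderDto);
    });
}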
When I create a new document in the note collection, I want to update the quantity in the info document. What am I doing wrong?
exports.addNote = functions.region('europe-west1').firestore
    .collection('users/{userId}/notes').onCreate((snap, context) => {
        const uid = admin.user.uid.toString();
        var t;
        db.collection('users').doc('{userId}').collection('info').doc('info').get((querySnapshot) => {
            querySnapshot.forEach((doc) => {
                t = doc.get("countMutable").toString();
            });
        });
        let data = {
            countMutable: t + 1
        };
        db.collection("users").doc(uid).collection("info").doc("info").update({ countMutable: data.get("countMutable") });
    });
You have... a lot going on here. A few problems:
You can't trigger firestore functions on collections, you have to supply a document.
It isn't clear you're being consistent about how to treat the user id.
You aren't using promises properly (you need to chain them, and return them out of the function if you want them to execute properly).
I'm not clear about the relationship between the userId context parameter and the uid you are getting from the auth object. As far as I can tell, admin.user isn't actually part of the Admin SDK.
You risk multiple function calls doing an increment at the same time giving inconsistent results, since you aren't using a transaction or the increment operation. (Learn More Here)
The document won't be created if it doesn't already exist. Maybe this is ok?
In short, this all means you can do this a lot more simply.
This should do you though. I'm assuming that the uid you actually want is actually the one on the document that is triggering the update. If not, adjust as necessary.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();
exports.addNote = functions.firestore.document('users/{userId}/notes/{noteId}').onCreate((snap,context) => {
const uid = context.params.userId;
return db.collection("users").doc(uid).collection("info").doc("info").set({
countMutable: admin.firestore.FieldValue.increment(1)
}, { merge: true });
});
If you don't want to create the info document if it doesn't exist, and instead you want to get an error, you can use update instead of set:
return db.collection("users").doc(uid).collection("info").doc("info").update({
countMutable: admin.firestore.FieldValue.increment(1)
});
I am using Firebase Cloud Functions and the Firebase Realtime Database.
My database structure is:
-users
-userid32
-userid4734
-flag=true
-userid722
-flag=false
-userid324
I want to query only the users whose field 'flag' is 'true'.
What I am doing currently is going over all the users and checking them one by one. But this is not efficient, because we have a lot of users in the database and it takes more than 10 seconds for the function to run:
const functions = require('firebase-functions');
const admin = require("firebase-admin");
admin.initializeApp(functions.config().firebase);

exports.test1 = functions.https.onRequest((request, response) => {
    // Read users from the database
    admin.database().ref('/users').once('value').then((snapshot) => {
        var values = snapshot.val(),
            current,
            numOfRelevantUsers,
            res = {}; // Result string
        numOfRelevantUsers = 0;
        // Traverse through all users to check whether the user is eligible to get a discount.
        for (val in values) {
            current = values[val]; // Assign current user to avoid values[val] calls.
            // Do something with the user
        }
        ...
    });
Is there a more efficient way to make this query and get only the relevant records? (and not getting all of them and checking one by one?)
You'd use a Firebase Database query for that:
admin.database().ref('/users')
.orderByChild('flag').equalTo(true)
.once('value').then((snapshot) => {
const numOfRelevantUsers = snapshot.numChildren();
When you need to loop over child nodes, please don't treat the resulting snapshot as an ordinary JSON object. While that may work here, it will give unexpected results when you order on a value with an actual range. Instead use the built-in Snapshot.forEach() method:
snapshot.forEach(function(userSnapshot) {
    console.log(userSnapshot.key, userSnapshot.val());
});
Note that all of this is fairly standard Firebase Database usage, so I recommend spending some extra time in the documentation for both the Web SDK and the Admin SDK for that.
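One more thing worth adding: for the server to execute orderByChild('flag') efficiently, you should also define an index on that field in your database rules; without it the SDK logs a warning and downloads the whole node to filter it in memory. Roughly (merge into your existing rules):
{
  "rules": {
    "users": {
      ".indexOn": ["flag"]
    }
  }
}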