In a reservation system, only 5 different users can create bookings. If 100 users call the booking API at the same time, how do I handle concurrency with locking? I am using Node.js with MongoDB. I went through the MongoDB concurrency article and the documentation on transactions, but could not find any sample code that uses locking.
I have already implemented a solution with optimistic concurrency control (suited to low contention for the resource; it can be easily implemented using a versionNumber or timeStamp field).
Thank you in advance for suggesting a solution with locking.
The current algorithm is:
Step 1: Get userAllowedNumber from userSettings collection.
//Query
db.getCollection('userSettings').find({})
//Response Data
{ "userAllowedNumber": 5 }
Step 2: Get the current bookedCount from the bookings collection.
//Query
db.getCollection('bookings').count({ })
//Response Data
2
Step 3: If bookedCount < userAllowedNumber, insert into bookings.
//Query
db.getCollection('bookings').insertOne({ user_id: "usr_1" })
I had an in-depth discussion about locking with transactions in the MongoDB community. In conclusion, I learned the limitations of transactions: there is no lock we can use to handle concurrent requests for this task.
You can see the full MongoDB community conversation at this link:
https://www.mongodb.com/community/forums/t/implementing-locking-in-transaction-for-read-and-write/127845
A GitHub demo with JMeter testing shows the limitation: it is not able to handle concurrent requests for this task.
https://github.com/naisargparmar/concurrencyMongo
New suggestions are still welcome and appreciated.
To solve your problem, you can also try Redlock (https://redis.com/redis-best-practices/communication-patterns/redlock/) for distributed locking, or a mutex for locking within a single instance.
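For the single-instance case, a mutex can be as small as a promise chain. The sketch below is only an illustration: `Mutex`, `tryBook` and the in-memory `bookings` array are hypothetical names standing in for the real check-then-insert against MongoDB. It only serialises requests inside one Node.js process, so several instances would still need a distributed lock such as Redlock.

```javascript
// Minimal promise-based mutex: callers run one at a time, in arrival order.
class Mutex {
  constructor() {
    this._tail = Promise.resolve();
  }

  // Runs fn once every previously queued task has settled.
  runExclusive(fn) {
    const result = this._tail.then(() => fn());
    // Keep the chain usable even if fn rejects.
    this._tail = result.catch(() => {});
    return result;
  }
}

const mutex = new Mutex();
const bookings = [];          // stand-in for the bookings collection
const userAllowedNumber = 5;  // stand-in for the userSettings value

// Check-then-insert, serialised by the mutex so the count cannot go stale.
function tryBook(userId) {
  return mutex.runExclusive(async () => {
    // Simulates the asynchronous count query.
    const bookedCount = await Promise.resolve(bookings.length);
    if (bookedCount >= userAllowedNumber) return false;
    bookings.push({ user_id: userId });
    return true;
  });
}
```

Firing 100 `tryBook` calls at once should leave exactly 5 entries in `bookings`, because each call sees the count left by the previous one.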
For a simple example of a transaction, you first need a client connected to MongoDB.
const client = new MongoClient(uri);
await client.connect();
Once you have the client, you can create a session from which you can make transactions:
const session = await client.startSession();
const transactionResults = await session.withTransaction(async () => {
  // Pass the session so the insert actually runs inside the transaction.
  await client.db().collection("example").insertOne(example, { session });
}, transactionOptions);
With transactionOptions being:
const transactionOptions = {
readPreference: 'primary',
readConcern: { level: 'majority' },
writeConcern: { w: 'majority' }
};
You can read more about read and write concerns in the MongoDB documentation.
Depending on your use case, you may also consider findAndModify (findOneAndUpdate in the Node.js driver), which updates a single document atomically.
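One way to apply single-document atomicity to the booking limit is to keep the count on one counter document and claim a slot with a conditional `$inc`. This is a sketch under assumptions that are not in the question: a `counters` collection, a pre-seeded document `{ _id: "bookings", bookedCount: 0 }`, and the function names `claimBookingSlot` and `createBooking`. It uses `updateOne` rather than findAndModify because only the "did it match" result is needed.

```javascript
// Sketch: claim one of `limit` slots with a single atomic update.
// Assumes a pre-seeded counter document: { _id: "bookings", bookedCount: 0 }.
async function claimBookingSlot(counters, limit) {
  const result = await counters.updateOne(
    // The filter only matches while slots remain...
    { _id: "bookings", bookedCount: { $lt: limit } },
    // ...and the increment is applied atomically with that check.
    { $inc: { bookedCount: 1 } }
  );
  return result.modifiedCount === 1; // false: the limit was already reached
}

// `db` is assumed to be a Db instance from the official mongodb driver.
async function createBooking(db, userId, limit) {
  const claimed = await claimBookingSlot(db.collection("counters"), limit);
  if (!claimed) return null;
  // The slot is reserved; a failed insert here would need a compensating $inc of -1.
  const { insertedId } = await db.collection("bookings").insertOne({ user_id: userId });
  return insertedId;
}
```

The trade-off is that the counter and the bookings collection are updated in two steps, so the sketch leaves error handling between them (or wrapping both in a transaction) to the caller.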
I am working on a Node project and I installed the firebase package.
This is the structure of my database.
I want to extract all the data for the client aaa#gmail.com in a single operation: the name, email, age and all addresses.
const docClient = doc(db,"Client", "aaa#gmail.com");
const collectionAddresses = collection(docClient, "addresses");
const DataClient = await getDoc(docClient)
const querySnapshot = await getDocs(collectionAddresses);
console.log(DataClient.data());
querySnapshot.forEach((doc) => {
console.log(doc.id, " => ", doc.data());
});
I think that is too many operations just to extract customer data that lives under the same document.
ChatGPT told me that if I use a transaction, both operations are performed as if they were one, and it gave me code that does not work.
I think the code is wrong because a transaction only works with documents, not collections.
try{
await runTransaction(db,async (transaccion)=>{
var collectionAddresses=collection(doc(db,"Client","aaaa#gmail.com"),"addresses");
var a=await transaccion.get(collectionAddresses);
});
}catch(errors){
console.log(errors);
}
In fact, Node prints an error to the console.
So my question is: how can I get all the data in a single operation?
How can I get all data in only one operation?
I understand that by "one operation" you mean "one query". With your data model this is not possible. You'll have to query the client doc and all the address docs.
So for your example (with 3 addresses) you'll pay for 4 document reads.
Note that using a transaction will not change anything from a number-of-reads perspective. ChatGPT's answer is not very good. The "one operation" it refers to is the fact that changes in a transaction are grouped into one atomic operation, not that there is only one read or one write from/to the Firestore DB.
One possible approach would be to store the addresses in an array in the client doc, but you have to be sure that you'll never reach the 1 MiB maximum size limit for a document.
Hello and thanks in advance!
MERN stack student here. I'm just getting to know the MVC design pattern as well.
So, here's my question:
I'm trying to get some docs from a Mongo collection (working with Mongoose) and the request carries a limit query parameter (let's say I need only the first 5 docs from a collection of 30 docs). Should I fetch everything and slice the result in the controller, or pass the limit down to the query? What is considered best practice in general? Would you use one or the other in different cases (for example, depending on how big the db is)?
Something like this:
Controller:
async getProducts(req, res) {
  const { limit } = req.query;
  // Loads every product, then trims the list in application memory.
  const products = await productManagerDB.getProducts();
  res.status(200).json({ success: true, limitedProductsList: products.slice(0, Number(limit)) });
}
Or
Like this:
Controller:
async getProducts(req, res) {
  const { limit } = req.query;
  const products = await productManagerDB.getProducts(limit);
  res.status(200).json({ success: true, limitedProductsList: products });
}
Service:
async getProducts(query) {
  try {
    // A limit of 0 means "no limit" in MongoDB.
    const limit = query ? Number(query) : 0;
    const products = await ProductsModel.find().limit(limit);
    return products;
  } catch (error) {
    throw new Error(error.message);
  }
}
I tried both ways with the same outcome. I expect the second to be more efficient since it doesn't load data that I'm not using, but I'm curious whether in some cases it would be better to fetch the whole collection.
As you have already stated, the second query is far more efficient, especially when you have a large collection.
With the first approach, the two differences would be:
1. The MongoDB engine has to fetch all the documents from disk. The way MongoDB is designed (the internal WiredTiger storage engine), frequently used data is cached either in its own cache or in the OS cache. In any case, it's quite possible that the whole collection will not fit in memory, so a large number of disk operations will happen (comparatively very slow, even with the latest NVMe disks).
2. A large amount of data has to flow over the network from the database server to the application server, which wastes bandwidth and is slower.
Whether you ever need the full collection obviously depends on the use case.
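Since the limit arrives as a query-string value, it may be worth validating it before it reaches `.limit()`. The helper below is a hypothetical sketch: the name `parseLimit` and the default and maximum values are assumptions, not part of the question.

```javascript
// Hypothetical helper: turn a raw query-string value into a safe limit.
function parseLimit(raw, { defaultLimit = 20, maxLimit = 100 } = {}) {
  const n = Number.parseInt(raw, 10);
  // Missing, non-numeric, zero or negative values fall back to the default.
  if (!Number.isInteger(n) || n <= 0) return defaultLimit;
  // Cap the value so a client cannot request the whole collection.
  return Math.min(n, maxLimit);
}
```

In the service this could be used as `ProductsModel.find().limit(parseLimit(query))`, so that a missing or malformed value never turns into an unbounded fetch.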
I'm developing an app using Sequelize and I just started to use transaction() because I want my queries to be able to roll back on errors. So I had my regular functions
const customer = await Customer.findOne({ where: { email: { [Op.eq]: body.email } } });
and now I also have
const newCustomer = await Customer.create(body, { transaction: t });
Everything is working fine for the moment, but I can't help wondering whether it is a good idea, performance-wise, to use both in the same operation (looking up an existing customer and, if none exists, creating a new one based on the email address). I think the two queries use different transactions, but I'm not sure how that affects, for example, my pool's max connection count.
PS: I'm facing some issues where a query seems to block my Node process and I have to restart the server to make everything work again.
Thanks in advance for your help!
If you forget to pass the transaction to one or more queries, it can lead to deadlocks, because you will have two transactions changing or reading the same or linked records (a query without the transaction option runs on a separate pooled connection).
You should indicate a transaction even in read operations like findOne:
const customer = await Customer.findOne({ where: { email: { [Op.eq]: body.email } }, transaction: t });
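Putting both queries on the same transaction could look like the sketch below. The function name `findOrCreateCustomer` and passing `sequelize` and `Customer` in as parameters are my own assumptions for illustration; it relies on Sequelize's managed transactions, where the callback form of `sequelize.transaction` commits on success and rolls back when the callback throws.

```javascript
// Sketch: look up a customer by email and create it if missing,
// with both queries on the same transaction.
async function findOrCreateCustomer(sequelize, Customer, body) {
  // Managed transaction: commits if the callback resolves, rolls back if it throws.
  return sequelize.transaction(async (t) => {
    const existing = await Customer.findOne({
      where: { email: body.email },
      transaction: t,
    });
    if (existing) return existing;
    return Customer.create(body, { transaction: t });
  });
}
```

Sequelize also ships a `findOrCreate` model method that accepts a `transaction` option, which may be enough on its own for this lookup-then-insert case.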
I'm very new to systems design in general, so let me try and explain my question to the best of my ability!
I have two EC2 t2.micro instances up and running: one is housing my MongoDB, which is storing 10,000,000 primary records, and the other has my express server on it.
The structure of my MongoDB documents is as follows:
{
_id: 1,
images: ["url_1.jpg", "url_2.jpg", "url_3.jpg"],
}
This is what my mongo connection looks like:
const { MongoClient } = require('mongodb');
const { username, password, ip } = require('./config.js');
const client = new MongoClient(`mongodb://${username}:${password}@${ip}`,
  { useUnifiedTopology: true, poolSize: 10 });
client.connect();
const Images = client.db('imagecarousel').collection('images');
module.exports = Images;
I am using loader.io to run a 1000 RPS stress test against my server's GET API endpoint. The first test uses a .findOne() query, the second a .find().limit(1) query, like so:
const query = { _id: +req.params.id };
Images.findOne(query).then((data) => res.status(200).send(data))
.catch((err) => {
console.log(err);
res.status(500).send(errorMessage);
});
//////////////////////////////////////////
const query = { _id: +req.params.id };
Images.find(query).limit(1).toArray().then((data) => res.status(200).send(data[0]))
.catch((err) => {
console.log(err);
res.status(500).send(errorMessage);
});
When I looked at the results on New Relic, I was a little perplexed by what I saw: New Relic Results
After some research, I figured this has something to do with .findOne() returning a document and .find() returning a cursor?
So my question is: How do I determine if the bottleneck is Node.js or MongoDB, and do the queries I use determine that for me (in this specific case)?
I would suggest that you start with the MongoDB shell and explore your queries in detail. This way you will isolate the MongoDB behavior from the driver behavior.
A good way to analyse your queries is:
cursor.explain() - https://docs.mongodb.com/manual/reference/method/cursor.explain/
$explain - https://docs.mongodb.com/manual/reference/operator/meta/explain/
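The same explain output is also reachable from the Node.js driver. The following is a rough sketch, assuming `images` is a Collection from the official mongodb driver and a driver version whose cursor `explain()` accepts a verbosity argument; the function name `explainById` is made up for the example.

```javascript
// Sketch: pull a few headline numbers out of an explain() result.
// `images` is assumed to be a Collection from the official mongodb driver.
async function explainById(images, id) {
  const plan = await images.find({ _id: id }).limit(1).explain("executionStats");
  const stats = plan.executionStats;
  return {
    executionTimeMillis: stats.executionTimeMillis, // time spent inside MongoDB
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
    nReturned: stats.nReturned,
  };
}
```

If `executionTimeMillis` stays small while the endpoint is slow under load, the time is probably being spent outside the query itself (driver, network, or the Node.js process).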
If you aim at pitch-perfect database performance tuning, you need to understand every detail of how your queries execute. It will take some time to get to grips with it, but it's totally worth it!
Another detail of interest is real-world performance monitoring and profiling in production, which reveals the true picture of the bottlenecks in your application, as opposed to the more "sterile" non-production stress testing. Here is a good profiler, which allows you to insert custom profiling points in your app and to easily turn profiling on and off without restarting the app:
https://www.npmjs.com/package/mc-profiler
A good practice would be to first let the application run in production as a beta, inspect the profiling data and optimize the slow code. Otherwise you could waste a lot of time chasing optimizations that have little to no impact on overall app performance.
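To illustrate what a custom profiling point amounts to, here is a bare-bones sketch that does not use any profiler package. The helper name `timed` and its `sink` parameter are hypothetical; it only measures wall-clock time around one awaited step.

```javascript
// Hypothetical helper: measure how long an async step takes, in milliseconds.
async function timed(label, fn, sink = console.log) {
  const start = process.hrtime.bigint();
  try {
    return await fn();
  } finally {
    // Runs on success and on failure, so slow errors are recorded too.
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    sink({ label, ms });
  }
}
```

Wrapping only the database call, for example `await timed("findOne", () => Images.findOne(query))`, and comparing it with the total request time gives a first hint of whether the wait is in MongoDB or in the Node.js process.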
I'm working on caching strategies for an application that uses knex.js for everything SQL-related.
Is there a way to intercept the query to check if it can be fetched from a cache instead of querying the database?
I briefly looked into knex.js events, which include a query event.
Doc:
A query event is fired just before a query takes place, providing data about the query, including the connection's __knexUid / __knexTxId properties and any other information about the query as described in toSQL. Useful for logging all queries throughout your application.
Which means that it's possible to do something like this (also from the docs):
knex.select('*')
  .from('users')
  .on('query', function(data) {
    app.log(data);
  })
  .then(function() {
    // ...
  });
But is it possible to make the query event handler intercept the call and run some logic before the query is actually executed against the database?
I note that this suggestion is attached to a Knex GitHub issue (credit to Arian Santrach) which seems relevant:
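// Simple in-memory store; swap in whatever cache you actually use.
const cache = {};

knex.QueryBuilder.extend('cache', async function () {
  // The rendered query string identifies the query.
  const cacheKey = this.toString();
  if (cache[cacheKey]) {
    return cache[cacheKey];
  }
  // Awaiting the builder executes the query.
  const data = await this;
  cache[cacheKey] = data;
  return data;
});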
This would allow:
knex('tablename').where(criteria).cache()
to check for cached data for the same query. I would think a similar sort of structure could be used for whatever your caching solution was, using the query's string representation as the key.
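A plain object never expires its entries, so stale rows would be served indefinitely. As one possible refinement, here is a sketch of a small in-memory cache with a time-to-live, keyed by the query string. `TtlCache` and `cached` are hypothetical names, and the TTL idea is an addition for illustration rather than part of the GitHub suggestion.

```javascript
// Hypothetical in-memory cache with per-entry expiry, keyed by query string.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Returns the cached value for key, or runs load() and caches its result.
async function cached(cache, key, load) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await load();
  cache.set(key, value);
  return value;
}
```

Inside a `cache` extension like the one above, the lookup could then become something like `cached(queryCache, this.toString(), () => this)`, with `queryCache` being a shared `TtlCache` instance.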