Hello and thanks in advance!
MERN stack student here. I'm just getting to know the MVC design pattern as well.
So, here's my question:
I'm trying to get some docs from a MongoDB collection (working with Mongoose), and my request comes with a limit query parameter (let's say I need only the first 5 docs from a collection of 30). What is considered best practice in general? Would you use one or the other in different cases (e.g. depending on how big the collection is)?
Something like this:
Controller:
async getProducts(req, res) {
  const { limit } = req.query;
  // fetch everything, then slice in the controller
  const products = await productManagerDB.getProducts();
  res.status(200).json({ success: true, limitedProductsList: products.slice(0, Number(limit)) });
}
Or
Like this:
Controller:
async getProducts(req, res) {
  const { limit } = req.query;
  // let the service (and ultimately MongoDB) apply the limit
  const products = await productManagerDB.getProducts(limit);
  res.status(200).json({ success: true, limitedProductsList: products });
}
Service:
async getProducts(query) {
  try {
    // limit(0) means "no limit" in MongoDB
    const limit = query ? Number(query) : 0;
    const products = await ProductsModel.find().limit(limit);
    return products;
  } catch (error) {
    throw new Error(error.message);
  }
}
Tried both ways with the same outcome. I expect the second to be more efficient, since it's not loading data I'm not using, but I'm curious whether in some cases it would be better to fetch the whole collection...
As you have already stated, the second query is far more efficient, especially when you have a large collection.
The two differences would be:
1. The MongoDB engine will have to fetch all the documents from disk. The way MongoDB is designed (the internal WiredTiger storage engine), it caches frequently used data either in its own in-process cache or in the OS cache. Even so, it is quite possible that the whole collection will not fit in memory, and therefore a large number of disk operations will happen (comparatively very slow, even with the latest NVMe disks).
2. A large amount of data will have to flow over the network from the database server to the application server, which is a waste of bandwidth and will be slower.
Where you might need the full collection obviously depends on the use case.
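To make the second approach concrete, here is a minimal sketch of pushing the limit into the query and clamping the user input before it reaches the database (the route path, the file location and the clamping values are assumptions, not something from the question):

const express = require('express');
const ProductsModel = require('./models/products.model'); // assumed location

const router = express.Router();

router.get('/products', async (req, res) => {
  try {
    // Clamp the user-supplied limit to a sane range; limit(0) means "no limit".
    const requested = Number(req.query.limit) || 0;
    const limit = Math.min(Math.max(requested, 0), 100);

    // The limit is applied by the database, so only the needed documents
    // are read from disk and sent over the network.
    const products = await ProductsModel.find().limit(limit).lean();

    res.status(200).json({ success: true, limitedProductsList: products });
  } catch (error) {
    res.status(500).json({ success: false, message: error.message });
  }
});

module.exports = router;

.lean() is optional here; it skips hydrating full Mongoose documents, which helps a bit when the result is only serialised to JSON.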
I am working on a Node project and I installed the firebase package.
This is the structure of my database.
I want to extract all the data for the client aaa@gmail.com in only one operation, i.e.
the name, email, age and all the addresses...
const { doc, collection, getDoc, getDocs } = require('firebase/firestore'); // db is an initialized Firestore instance

const docClient = doc(db, "Client", "aaa@gmail.com");
const collectionAddresses = collection(docClient, "addresses");

const DataClient = await getDoc(docClient);
const querySnapshot = await getDocs(collectionAddresses);

console.log(DataClient.data());
querySnapshot.forEach((doc) => {
  console.log(doc.id, " => ", doc.data());
});
I think there are too many operations just to extract customer data that lives in the same document.
ChatGPT told me that if I use transactions, both operations are performed as if they were one, and it gave me code that does not work.
But that code is wrong, because I think a transaction only works with documents, not collections.
try {
  await runTransaction(db, async (transaccion) => {
    const collectionAddresses = collection(doc(db, "Client", "aaaa@gmail.com"), "addresses");
    // This is where it fails: transaction.get() expects a document reference, not a collection.
    const a = await transaccion.get(collectionAddresses);
  });
} catch (errors) {
  console.log(errors);
}
In fact, Node prints an error to the console...
So my question is: does anyone know how I can get all the data in only one operation?
How can I get all data in only one operation?
I understand that by "one operation" you mean "one query". With your data model this is not possible. You'll have to query the client doc and all the address docs.
So for your example (with 3 addresses) you'll pay for 4 document reads.
Note that using a transaction will not change anything from a number-of-reads perspective. The ChatGPT answer is not very good... The "one operation" that the AI refers to is the fact that the changes in a transaction are grouped into an atomic operation, not that there is only one read or one write from/to the Firestore DB.
One possible approach would be to store the addresses in an Array in the client doc, but you have to be sure that you'll never reach the 1MiB maximum size limit for a document.
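If the goal is mainly to avoid waiting for the two reads one after the other, they can at least be issued in parallel. A minimal sketch using the same modular SDK calls as in the question (the collection names come from the question; the function name and the shape of the return value are assumptions):

const { doc, collection, getDoc, getDocs } = require('firebase/firestore');

async function getClientWithAddresses(db, email) {
  const clientRef = doc(db, 'Client', email);
  const addressesRef = collection(clientRef, 'addresses');

  // Still one read per document, but the two requests run concurrently.
  const [clientSnap, addressesSnap] = await Promise.all([
    getDoc(clientRef),
    getDocs(addressesRef),
  ]);

  return {
    ...clientSnap.data(),
    addresses: addressesSnap.docs.map((d) => ({ id: d.id, ...d.data() })),
  };
}

Billing-wise this is exactly the situation described above: one read for the client document plus one read per address document.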
I'm developing an app using Sequelize and I just started to use transaction() because I want my queries to be able to roll back on errors. So I had my regular functions:
const customer = await Customer.findOne({ where: { email: { [Op.eq]: body.email } } });
and now I also have
const newCustomer = await Customer.create(body, { transaction: t });
Everything is working fine for the moment, but I can't help wondering whether it's a good idea in terms of performance to use both in the same operation (looking for an already created customer and, if it doesn't exist, creating a new one based on the email address). I think they are each using a different transaction, but I'm not sure how that can affect, for example, my pool's max number.
PS: I'm also facing an issue where a query seems to block my Node process and I have to restart the server to make everything work again.
Thanks in advance for your help!
If you forget to indicate a transaction in one or more queries, then it can lead to deadlocks, because you will have two transactions changing/reading the same or linked records.
You should indicate the transaction even in read operations like findOne:
const customer = await Customer.findOne({ where: { email: { [Op.eq]: body.email } }, transaction: t });
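For the find-or-create pattern described in the question, both statements can share one managed transaction, which commits if the callback resolves and rolls back if it throws. A sketch (Customer, body and the sequelize instance are taken from the question; the function name is an assumption):

const { Op } = require('sequelize');

async function findOrCreateCustomer(sequelize, body) {
  return sequelize.transaction(async (t) => {
    const existing = await Customer.findOne({
      where: { email: { [Op.eq]: body.email } },
      transaction: t,
    });
    if (existing) return existing;

    // Created inside the same transaction, so it rolls back together
    // with anything else that fails in this callback.
    return Customer.create(body, { transaction: t });
  });
}

Sequelize also provides Customer.findOrCreate(), which accepts a transaction option and covers this exact pattern in a single call.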
I'm very new to systems design in general, so let me try and explain my question to the best of my ability!
I have two EC2 t2.micro instances up and running: one is housing my MongoDB, which is storing 10,000,000 primary records, and the other has my express server on it.
The structure of my MongoDB documents is as follows:
{
_id: 1,
images: ["url_1.jpg", "url_2.jpg", "url_3.jpg"],
}
This is what my mongo connection looks like:
const { MongoClient } = require('mongodb');
const { username, password, ip } = require('./config.js');
const client = new MongoClient(`mongodb://${username}:${password}@${ip}`,
  { useUnifiedTopology: true, poolSize: 10 });
client.connect();
const Images = client.db('imagecarousel').collection('images');
module.exports = Images;
I am using loader.io to run a 1,000 RPS stress test against my server's GET API endpoint. The first test uses a .findOne() query, the second a .find().limit(1) query, like so:
const query = { _id: +req.params.id };
Images.findOne(query).then((data) => res.status(200).send(data))
.catch((err) => {
console.log(err);
res.status(500).send(errorMessage);
});
//////////////////////////////////////////
const query = { _id: +req.params.id };
Images.find(query).limit(1).toArray().then((data) => res.status(200).send(data[0]))
.catch((err) => {
console.log(err);
res.status(500).send(errorMessage);
});
When I looked at the results in New Relic, I was a little perplexed by what I saw (see the New Relic results screenshot).
After some research, I figured this has something to do with .findOne() returning a document, and .find() returning a cursor?
So my question is: how do I determine whether the bottleneck is Node.js or MongoDB, and do the queries I use determine that for me (in this specific case)?
I would suggest that you start with the MongoDB shell and explore your queries in detail. This way you will isolate MongoDB's behavior from the driver's behavior.
A good way to analyse your queries is:
cursor.explain() - https://docs.mongodb.com/manual/reference/method/cursor.explain/
$explain - https://docs.mongodb.com/manual/reference/operator/meta/explain/
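For example, in the shell (the collection name comes from the connection code in the question), execution statistics for both variants of the query can be compared like this:

// Winning plan and per-stage timings for the find().limit(1) variant
db.images.find({ _id: 1 }).limit(1).explain("executionStats")

// The same statistics for the findOne() variant
db.images.explain("executionStats").findOne({ _id: 1 })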
If you aim at pitch-perfect database performance tuning, you need to understand every detail of the execution of your queries. It will take some time to get a grip on it, but it's totally worth it!
Another detail of interest is the real-world performance monitoring and profiling in production, which reveals the true picture of the bottlenecks in your application, as opposed to the more "sterile" non-production stress-testing. Here is a good profiler, which allows you to insert custom profiling points in your app and to easily turn profiling on and off without restarting the app:
https://www.npmjs.com/package/mc-profiler
A good practice would be to first let the application run in production as a beta, inspect the profiling data and optimize the slow code. Otherwise you could waste swathes of time going after optimizations that have little to no impact on overall app performance.
I'm trying to build a filter menu to filter the incoming data from MongoDB.
I'm using the .find() function to limit the incoming data,
used like this:
Post.find({boatType: "Cruiser"})
So here is what I've got so far.
I transfer the data that I need to filter via the query string:
const res = await axios.get(`/api/posts/`,
{
params: {
hull: "Catamaran",
boatType: "Cruiser",
seller: "Private",
etc..
}
});
Express backend:
const posts = await Post.find({exampleField: "exampleFilter"});
And this is where I'm stuck. After I pass the queries to the Node backend, I have no idea how to get the .find() function to work across multiple fields. Even more confusing for me is that these queries will be dynamic: sometimes the "hull" query will not be there, or "seller" might be missing, and so on. Is there a better way to do what I'm doing? Should I be using $regex or $in... I'm lost and the documentation is doing me no favors.
All help is appreciated.
You should just pass your query object to Post.find(), like this:
app.get('/api/posts', async (req, res) => {
  // the filters sent by axios as `params` are available on req.query
  const posts = await Post.find(req.query);
  res.json(posts);
});
You can check here and here for more details.
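Since the filters are dynamic, one common variation is to build the filter object from a whitelist, so missing parameters are simply skipped and unexpected ones are ignored. A sketch (the field names come from the question; everything else is an assumption):

app.get('/api/posts', async (req, res) => {
  const allowedFields = ['hull', 'boatType', 'seller'];

  // Copy only the whitelisted, non-empty query parameters into the filter.
  const filter = {};
  for (const field of allowedFields) {
    if (req.query[field]) {
      filter[field] = req.query[field];
    }
  }

  // An empty filter ({}) simply returns all posts.
  const posts = await Post.find(filter);
  res.json(posts);
});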
I'm working on caching strategies for an application that uses Knex.js for all SQL-related work.
Is there a way to intercept the query to check if it can be fetched from a cache instead of querying the database?
I briefly looked into Knex.js events, which include a query event.
Doc:
A query event is fired just before a query takes place, providing data about the query, including the connection's __knexUid / __knexTxId properties and any other information about the query as described in toSQL. Useful for logging all queries throughout your application.
This means it's possible to do something like the following (also from the docs):
knex.select('*')
  .from('users')
  .on('query', function (data) {
    app.log(data);
  })
  .then(function () {
    // ...
  });
But is it possible to have the query handler intercept the query and run some logic before it is actually executed against the database?
I note that this suggestion is attached to a Knex GitHub issue (credit to Arian Santrach), which seems relevant:
// `cache` is assumed to be a simple in-memory object shared by the app
knex.QueryBuilder.extend('cache', async function () {
  try {
    const cacheKey = this.toString();
    if (cache[cacheKey]) {
      return cache[cacheKey];
    }
    const data = await this;
    cache[cacheKey] = data;
    return data;
  } catch (e) {
    throw new Error(e);
  }
});
This would allow:
knex('tablename').where(criteria).cache()
to check for cached data for the same query. I would think a similar structure could be used for whatever your caching solution is, using the query's string representation as the key.
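The snippet assumes a cache object already exists in scope. A minimal in-memory version with a crude time-to-live (every name and number here is an assumption, purely for illustration) could look like:

// Tiny in-memory cache keyed by the query's SQL string.
const cache = {};
const TTL_MS = 30 * 1000; // 30 seconds, arbitrary

knex.QueryBuilder.extend('cache', async function () {
  const cacheKey = this.toString();
  const hit = cache[cacheKey];

  // Serve a fresh-enough cached result, otherwise run the query and store it.
  if (hit && Date.now() - hit.storedAt < TTL_MS) {
    return hit.data;
  }

  const data = await this;
  cache[cacheKey] = { data, storedAt: Date.now() };
  return data;
});

Invalidation is the hard part: any write to the underlying tables should clear the affected keys, which this sketch deliberately ignores.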