MongoDB query an array of documents for specific match - node.js

I want to search the transactions array looking for a specific match. In this example, by pipedrive_id.
This is what I tried (following the MongoDB docs and another Stack Overflow post):
const pipedrive_id = 1677;
const inner_pipedrive_id = 1838;
const result = await Transactions.find({
  pipedrive_id,
  'transactions': { $elemMatch: { 'pipedrive_id': inner_pipedrive_id } }
});
const result2 = await Transactions.find({
  'transactions': { $elemMatch: { 'pipedrive_id': inner_pipedrive_id } }
});
const result3 = await Transactions.find({
  'transactions.pipedrive_id': inner_pipedrive_id
});
Each of these queries returns all the transaction items (all 6 of them, instead of the 2 that Mark Smith has in the array).

You can use an aggregation pipeline to filter the array, something like this (remove the $project stage if you want all the fields):
db.collection.aggregate([
  {
    $match: {
      pipedrive_id: "1677"
    }
  },
  {
    $unwind: "$transactions"
  },
  {
    $match: {
      "transactions.pipedrive_id": "1838"
    }
  },
  {
    $project: {
      _id: 0,
      pipedrive_id: 1,
      transactions: 1
    }
  }
])
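You can try this in the Mongo Playground. Note that this pipeline matches pipedrive_id as strings; if the field is stored as a number, as the question's code suggests, use 1677 and 1838 without quotes, because MongoDB does not match the number 1677 against the string "1677".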

As the documentation says, $elemMatch matches documents that contain an array field with at least one element that matches the criteria. It selects whole documents; it does not trim the array.
To filter the elements inside the array, you need the $filter aggregation operator.
Ref: https://www.mongodb.com/docs/manual/reference/operator/aggregation/filter/
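A minimal sketch of the $filter approach, assuming pipedrive_id is stored as a number as in the question's code. buildFilteredTransactionsPipeline is a hypothetical helper; it only builds the pipeline object, so it runs without a database, and you would pass its result to Transactions.aggregate(). Unlike the $unwind version, it returns one document per parent with the transactions array trimmed to the matching elements.

```javascript
// Sketch: build a pipeline that returns each matching parent document with
// its `transactions` array trimmed to the elements whose pipedrive_id matches.
function buildFilteredTransactionsPipeline(outerId, innerId) {
  return [
    // Select parent documents that contain at least one matching transaction.
    { $match: { pipedrive_id: outerId, "transactions.pipedrive_id": innerId } },
    {
      $project: {
        _id: 0,
        pipedrive_id: 1,
        // $filter keeps only the array elements for which `cond` is true.
        transactions: {
          $filter: {
            input: "$transactions",
            as: "t",
            cond: { $eq: ["$$t.pipedrive_id", innerId] },
          },
        },
      },
    },
  ];
}

// With a live connection (assumed): await Transactions.aggregate(pipeline)
const pipeline = buildFilteredTransactionsPipeline(1677, 1838);
console.log(JSON.stringify(pipeline, null, 2));
```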


How to filter with pagination efficiently with millions of records in mongodb?

I know there are a LOT of questions on this subject. And while most of the answers work, they perform really poorly when there are millions of records.
I have a collection with 10,000,000 records.
At first I was using mongoose-paginate-v2, and it took around 8 s to get each page with no filtering and 25 s with filtering, which is fairly decent compared to the other answers I found googling around. Then I read about aggregate (in another question on the same topic on this site) and it was a marvel: 7 ms to get each page without filtering, no matter which page it is:
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
let recordCount;
ServiceClass.find().count().then((count) => {
  recordCount = count;
  ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
    res.status(200).json({
      message: msgGettingRecordsSuccess,
      serviceClasses: documents,
      count: recordCount,
    });
  })
  .catch((error) => {
    res.status(500).json({ message: msgGettingRecordsError });
  });
}).catch((error) => {
  res.status(500).json({ message: "Error getting record count" });
});
What I'm having issues with is filtering. aggregate doesn't work like find, so my conditions don't work. I read the aggregate docs and, as a start, tried passing [ {$match: {description: {$regex: regex}}} ] to aggregate, but it did not return anything.
This is my current working function for filtering and pagination (which takes 25s):
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
const filter = req.params.filter;
const regex = new RegExp(filter, 'i');
ServiceClass.paginate({
  $or: [
    { code: { $regex: regex } },
    { description: { $regex: regex } },
  ]
}, { limit: pageSize, page: currentPage }).then((documents) => {
  res.status(200).json({
    message: msgGettingRecordsSuccess,
    serviceClasses: documents
  });
}).catch((error) => {
  res.status(500).json({ message: "Error getting the records." });
});
code and description are both indexed: code has a unique index and description a regular one. I need to find documents whose code or description field contains a given string.
What is the most efficient way to filter and paginate when you have millions of records?
The code below gets the paginated results from the database along with the total number of documents matching the query, in a single aggregation:
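// Must run inside an async function (for example, an async Express handler).
const pageSize = +req.query.pagesize;
const currentPage = +req.query.currentpage;
const skip = currentPage * pageSize - pageSize; // i.e. (currentPage - 1) * pageSize
const regex = new RegExp(req.params.filter, "i"); // same regex as in the question
const query = [
  {
    $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
  },
  {
    $facet: {
      // One branch returns the requested page...
      result: [
        { $skip: skip },
        { $limit: pageSize },
        { $project: { createdAt: 0, updatedAt: 0, __v: 0 } },
      ],
      // ...the other counts every matching document.
      count: [{ $count: "count" }],
    },
  },
  {
    $project: {
      result: 1,
      count: { $arrayElemAt: ["$count", 0] },
    },
  },
];
const result = await ServiceClass.aggregate(query);
console.log(result);
// result is an array holding one object: [{ result: [...], count: { count: <total> } }]
// (count is missing when nothing matches)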
Hope it helps.
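Because aggregate() resolves to an array, the caller still has to unwrap the single $facet document. A small sketch of that step, assuming the output shape of the pipeline above; toPage is a hypothetical helper name, and the sample inputs are made up to mirror that shape:

```javascript
// Sketch: flatten the $facet output into { documents, total }.
// aggregate() resolves to an array; with $facet it holds exactly one object.
function toPage(aggregateResult) {
  const [facet] = aggregateResult;
  return {
    documents: facet ? facet.result : [],
    // When nothing matches, $count emits no document, so `count` is missing.
    total: facet && facet.count ? facet.count.count : 0,
  };
}

// Sample shapes only, not real data.
const withMatches = toPage([{ result: [{ code: "A1" }], count: { count: 42 } }]);
const noMatches = toPage([{ result: [] }]);
console.log(withMatches, noMatches);
```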
One efficient way to filter and paginate millions of records is to use MongoDB's built-in stages in the aggregate() pipeline: $match, $skip and $limit.
$skip skips a given number of documents, $limit caps the number of documents returned, and $match filters the documents on the conditions you give it.
To filter your documents on the code or description field, use the $match stage with the $or operator, like this:
ServiceClass.aggregate([
  { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
  { $skip: (currentPage - 1) * pageSize }, // $skip counts documents, not pages
  { $limit: pageSize }
])
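A sketch of how the page number and the user's search text could be turned into that pipeline. buildSearchPagePipeline and escapeRegExp are hypothetical helpers; escaping the input (a common pattern) makes characters such as + or ( match literally instead of breaking the regex or changing its meaning.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch: 1-based page number + search text -> $match/$skip/$limit pipeline.
function buildSearchPagePipeline(filter, currentPage, pageSize) {
  const regex = new RegExp(escapeRegExp(filter), "i");
  // Page 1 skips 0 documents, page 2 skips pageSize, and so on.
  const skip = (currentPage - 1) * pageSize;
  return [
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    { $skip: skip },
    { $limit: pageSize },
  ];
}

const stages = buildSearchPagePipeline("a+b", 3, 25);
console.log(stages);
```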
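You can also use the $text operator instead of $regex; backed by a text index, it usually performs much better for text search. Note that $text matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for a "contains" search.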
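It's also important to index the relevant fields (code and description). Be aware, though, that a case-insensitive or unanchored regex such as new RegExp(filter, 'i') cannot use an index efficiently: at best MongoDB scans every index key. Only a case-sensitive prefix regex (for example /^abc/) can narrow the index scan.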
You might have to adjust the query according to your specific use case and data.
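For the $text alternative, a sketch assuming a text index over both fields has already been created; buildTextSearchPagePipeline and the serviceClassSchema name in the comment are hypothetical. Keep in mind that $text matches whole (stemmed) words, so searching "cons" would not find "consulting" the way the regex does.

```javascript
// Sketch of the $text variant. Assumes a text index covering both fields,
// e.g. declared on a (hypothetical) Mongoose schema before the model is built:
//   serviceClassSchema.index({ code: "text", description: "text" });
// A collection can have only one text index, but it may span several fields.
function buildTextSearchPagePipeline(filter, currentPage, pageSize) {
  return [
    // A $match containing $text must be the first stage of the pipeline.
    { $match: { $text: { $search: filter } } },
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize },
  ];
}

const textStages = buildTextSearchPagePipeline("consulting", 1, 25);
console.log(JSON.stringify(textStages));
```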

mongodb push if key exists otherwise set array

If the Investment document already has a deletedDocuments key, I push the new item to the array, and that works fine.
But if the deletedDocuments key does not exist, I want to set it to an array containing the item; instead, it sets an empty array [] in the DB and the item's values are not saved.
const deletedDoc = {
  _id: updatedInvestment.id,
  docType: 'CADocument',
  deletedBy: user,
  deletedDate: new Date()
}
if (updatedInvestment.deletedDocuments && updatedInvestment.deletedDocuments.length) {
  await Investment.updateOne(
    { "_id": updatedInvestment._id },
    { $push: { "deletedDocuments": deletedDoc } }
  );
  // this works fine
} else {
  await Investment.updateOne(
    { "_id": updatedInvestment._id },
    { $set: { "deletedDocuments": [deletedDoc] } }
    // this is setting a blank array [] in the db
  );
}
I can think of two options here:
Using $nin ("not in") in the filter together with $push to add the deleted doc to the array.
Using the $addToSet operator.
1:
If you have defined deletedDocuments as an array of objects in your schema you could do something like the following:
await Investment.updateOne(
  {
    "_id": updatedInvestment._id,
    // assuming each entry in deletedDocuments has an _id
    "deletedDocuments._id": { $nin: [deletedDoc._id] }
  },
  { $push: { "deletedDocuments": deletedDoc } }
);
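The filter only matches when deletedDoc is not already in the array (or when deletedDocuments does not exist yet), and $push then appends it. If the field is missing, $push creates the array with deletedDoc as its only element, so no if/else is needed.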
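To see why a single update is enough, here is an in-memory model of what option 1 does to one document. It is an illustration only, not MongoDB or Mongoose code; applyPushIfAbsent is a hypothetical name, and _id values are plain strings here, whereas real ObjectIds are compared by value on the server.

```javascript
// Illustration only: models the $nin filter + $push update on a plain object.
function applyPushIfAbsent(doc, item) {
  const list = doc.deletedDocuments;
  // Filter: "deletedDocuments._id": { $nin: [item._id] } also matches
  // when the field is missing entirely.
  const alreadyThere = Array.isArray(list) && list.some((d) => d._id === item._id);
  if (alreadyThere) return doc; // filter does not match, nothing is updated
  // Update: $push appends, and creates the array if the field is absent.
  return { ...doc, deletedDocuments: [...(list || []), item] };
}

const item = { _id: "a1", docType: "CADocument" };
const fresh = applyPushIfAbsent({ _id: "inv1" }, item); // field did not exist
const again = applyPushIfAbsent(fresh, item); // same item a second time
console.log(fresh.deletedDocuments.length, again.deletedDocuments.length);
```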
2:
If you have defined deletedDocuments as an array of document ObjectIds instead of objects, you could use the $addToSet operator.
The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array.
await Investment.updateOne(
  { "_id": updatedInvestment._id },
  // Be careful: if your schema is an ObjectId array, add deletedDoc's _id, not the whole object
  { $addToSet: { "deletedDocuments": deletedDoc._id } }
);
See the MongoDB documentation for $addToSet and for $nin (not in).

Mongoose SUM gets stuck

I'm trying to do a trivial SUM in MongoDB to total the prices for a single client.
My collection:
{"_id":"5d973c71dd93adfbda4c7272","name":"Faktura2019006","clientId":"5d9c87a6b9676069c8b5e15b","expiration":"2019-10-02T01:11:18.965Z","price":999999,"userId":"123"},
{"_id":"5d9e07e0b9676069c8b5e15d","name":"Faktura2019007","clientId":"5d9c87a6b9676069c8b5e15b","expiration":"2019-10-02T01:11:18.965Z","price":888,"userId":"123"}
What I tried:
// invoice.model.js
const mongoose = require("mongoose");
const InvoiceSchema = mongoose.Schema({
  _id: String,
  name: String,
  client: String,
  userId: String,
  expiration: Date,
  price: Number
});
module.exports = mongoose.model("Invoice", InvoiceSchema, "invoice");
and
// invoice.controller.js
const Invoice = require("../models/invoice.model.js");
exports.income = (req, res) => {
  console.log("Counting Income");
  Invoice.aggregate([
    {
      $match: {
        userId: "123"
      }
    },
    {
      $group: {
        total: { $sum: ["$price"] }
      }
    }
  ]);
};
What happens:
When I open the browser and the code above is called, I get the 'Counting Income' log in the terminal, but the browser just keeps loading forever and nothing happens.
I'm most likely missing some minor thing, but I've been trying to find it for quite a long time without success, so any advice is welcome.
The controller never finishes because you never end the response: you need to use the res object to send something back to the caller.
To get the aggregate value, you also need to execute the pipeline, for example by calling .then() on (or awaiting) the object that aggregate() returns.
Also, as someone pointed out in the comments, you need to add _id: null to your $group stage to specify that you are not grouping by any particular field (see the $group documentation).
Finally, for what you're trying to do, remove the array brackets in the $sum operator, since you only want to sum a single field (see the $sum documentation).
Here is the modified code:
// invoice.controller.js
const Invoice = require("../models/invoice.model.js");
exports.income = (req, res) => {
  console.log("Counting Income");
  Invoice.aggregate([
    {
      $match: {
        userId: "123"
      }
    },
    {
      $group: {
        _id: null,
        total: { $sum: "$price" }
      }
    }
  ]).then((response) => {
    res.json(response);
  }).catch((err) => {
    // without this, a failed query would also leave the request hanging
    res.status(500).json({ message: err.message });
  });
};
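As a quick check of what the fixed pipeline should return for the two sample invoices in the question, here is the same $match + $group logic written as plain JavaScript. This is an in-memory illustration only (totalForUser is a hypothetical name); MongoDB does not run it this way. It also shows why a userId with no invoices yields an empty array.

```javascript
// Illustration only: the two sample invoices from the question (fields trimmed).
const invoices = [
  { _id: "5d973c71dd93adfbda4c7272", name: "Faktura2019006", price: 999999, userId: "123" },
  { _id: "5d9e07e0b9676069c8b5e15d", name: "Faktura2019007", price: 888, userId: "123" },
];

function totalForUser(docs, userId) {
  const matched = docs.filter((d) => d.userId === userId); // $match
  // $group with _id: null puts every matched document into one group;
  // with no input documents there is no group at all, hence [].
  if (matched.length === 0) return [];
  const total = matched.reduce((sum, d) => sum + d.price, 0); // $sum: "$price"
  return [{ _id: null, total }];
}

console.log(totalForUser(invoices, "123")); // → [ { _id: null, total: 1000887 } ]
console.log(totalForUser(invoices, "1123")); // → []
```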
Edit, regarding your comment about an empty array being returned:
If you want to always return the same type of object, I would handle that in the controller. I'm not sure whether there is a fancy way to do this in the aggregation pipeline itself, but this is what I would do:
Invoice.aggregate([
  {
    $match: {
      userId: "123"
    }
  },
  {
    $group: {
      _id: null,
      total: { $sum: "$price" }
    }
  },
  {
    $project: {
      _id: 0,
      total: "$total"
    }
  }
]).then((response) => {
  if (response.length === 0) {
    res.json({ total: 0 });
  } else {
    // always return the first (and only) value
    res.json(response[0]);
  }
});
Here, for userId "123" you get this as the result:
{
"total": 1000887
}
But if you change the userId to, say, 1123 which doesn't exist in your db, the result will be:
{
"total": 0
}
This way, your client can always consume the same type of object.
Also, the reason I put the $project stage in there was to suppress the _id field (see the $project documentation for more info).

How to $match multiple values for MongoDB Mongoose query

I am trying to find specific fields for more than one value. For example, I have a database with different countries and I am trying to retrieve their name, year, and nominalGDP (renamed to y in the result for some other important reason). It works perfectly for this example, where I am only retrieving USA, but how would I add another country, like China?
Country.aggregate([
  {
    $match: {
      name: "USA"
    }
  },
  {
    $project: {
      _id: 0,
      name: 1,
      year: 1,
      'y': '$nominalGDP'
    }
  }
], function(err, recs) {
  if (err) {
    console.log(err);
  } else {
    console.log(recs);
  }
});
This is probably really simple but I have not been able to find out how.
Use the $in operator to match any of several values. For example:
{
  $match: {
    name: { $in: [ "USA", "China" ] }
  }
}
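A sketch that builds the question's full pipeline for any list of names, so the $match and the y rename stay in one place. buildCountriesPipeline is a hypothetical helper; you would call it as Country.aggregate(buildCountriesPipeline(req.body.allCountries)), assuming the client sends an array.

```javascript
// Sketch: the question's pipeline, generalized to a list of country names.
function buildCountriesPipeline(names) {
  // MongoDB rejects a non-array $in, so fail early with a clearer message.
  // (An empty array is valid; it simply matches nothing.)
  if (!Array.isArray(names)) {
    throw new TypeError("$in needs an array of names");
  }
  return [
    { $match: { name: { $in: names } } }, // matches any of the listed names
    { $project: { _id: 0, name: 1, year: 1, y: "$nominalGDP" } },
  ];
}

const countryPipeline = buildCountriesPipeline(["USA", "China"]);
console.log(JSON.stringify(countryPipeline));
```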
const results = await SchemaName.aggregate([{ $match: { name: { $in: ["USA", "China"] } } }]);
res.status(200).json(results);
But if you get the country names from the frontend, through the request body or similar, pass the whole array to $in; no loop is needed:
const allCountries = req.body.allCountries; // expected to be an array of names
const result = await SchemaName.aggregate([{ $match: { name: { $in: allCountries } } }]);
(This assumes the code runs inside an async function.)

How do I create a conditional query that has optional $in with Mongoose?

I'm trying to return a list of documents that match a, or b, or c conditions.
Right now, I can only get it to work by matching ALL conditions, not just one...
I have tried this:
return User
  .find()
  .where({ $or: [
    { $in: skillTags },
    { $in: roleTags }
  ]})
But I get an error.
This one works but is not what I want, as it only returns results that match both a skillTag and a roleTag. I want docs that match one or the other, or both:
return User
  .find()
  .where({
    skillTags: { $in: skillTags }
  }, {
    roleTags: { $in: roleTags }
  })
This works to find docs that match either a skillTag or a roleTag:
return User
  .find({
    $or: [{
      skillTags: {
        $in: skillTags
      }
    }, {
      roleTags: {
        $in: roleTags
      }
    }]
  })
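Since the title asks for optional $in conditions, here is a sketch that only adds a clause for tag lists that are actually present. buildTagQuery is a hypothetical helper; the fallback of {} (match every user) when no tags are given is a design choice, because MongoDB rejects an empty $or array. Use it as User.find(buildTagQuery(skillTags, roleTags)).

```javascript
// Sketch: build the $or only from the tag lists that were actually supplied.
function buildTagQuery(skillTags, roleTags) {
  const clauses = [];
  if (Array.isArray(skillTags) && skillTags.length > 0) {
    clauses.push({ skillTags: { $in: skillTags } });
  }
  if (Array.isArray(roleTags) && roleTags.length > 0) {
    clauses.push({ roleTags: { $in: roleTags } });
  }
  // An empty $or is rejected by MongoDB, so with no tags fall back to {}
  // (match every user). Returning "no results" instead would also be valid.
  return clauses.length > 0 ? { $or: clauses } : {};
}

console.log(JSON.stringify(buildTagQuery(["js"], [])));
```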
