I am trying to use the aggregation framework in MongoDB for some data stats. The query below takes barely a minute when run locally, but when I run the same query on the server it never returns a response, and after waiting far too long I had to cancel it. Can anyone suggest why this is happening?
var orderIds = db.delivery.find({"status":"DELIVERED"}).map(function(o) {
return o.order
});
var userIds = db.order.aggregate([{
$match : { _id : { $in : orderIds } }
}, {
$group: { _id : "$customer" }
}]).map(function(u) { return u._id });
var userstats = db.order.aggregate([{
$sort : { customer : 1, dateCreated : 1 }
}, {
$match : { status : "DELIVERED", customer : { $in : userIds } }
}, {
$group: {
_id : "$customer", orders : { $sum : 1 },
firstOrderDate : { $first : "$dateCreated" },
lastOrderDate : { $last : "$dateCreated" }
}
}]);
userstats.forEach(function(x) {
db.user.update({ _id : x._id }, {
$set : {
totalOrders : x.orders,
firstOrderDate : x.firstOrderDate,
lastOrderDate : x.lastOrderDate
}
})
})
I am not sure, but shouldn't it be faster on the server? Instead, it does not produce any output.
To speed up the process you could refactor your operations in a couple of ways.
The first would be to eliminate unnecessary pipeline operations such as the $sort stage, which can be replaced by the $max and $min operators within the $group stage.
Secondly, use the Bulk() API, which will increase performance on update operations, especially when dealing with large collections, since the operations are sent to the server in batches (for example, a batch size of 500) rather than one request per update (as you are currently doing with the update statement within the forEach() loop).
Consider the following refactored operations:
var orderIds = db.delivery.find({"status": "DELIVERED"}).map(function(d){return d.order;}),
counter = 0,
bulk = db.user.initializeUnorderedBulkOp();
var userstatsCursor = db.order.aggregate([
{ "$match": { "_id": { "$in": orderIds } } },
{
"$group": {
"_id": "$customer",
"orders": { "$sum": 1 },
"firstOrderDate": { "$min": "$dateCreated" },
"lastOrderDate": { "$max": "$dateCreated" }
}
}
]);
userstatsCursor.forEach(function (x){
bulk.find({ "_id": x._id }).updateOne({
"$set": {
"totalOrders": x.orders,
"firstOrderDate": x.firstOrderDate,
"lastOrderDate": x.lastOrderDate
}
});
counter++;
if (counter % 500 == 0) {
bulk.execute(); // Execute per 500 operations and
// re-initialize every 500 update statements
bulk = db.user.initializeUnorderedBulkOp();
}
});
// Clean up remaining operations in queue
if (counter % 500 != 0) { bulk.execute(); }
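The batching pattern above can be pulled into a small helper and checked without a database. This is only a sketch: `runInBatches` and its `execute` callback are hypothetical names standing in for queuing updates on the bulk object and calling bulk.execute(); the batch size of 500 simply mirrors the example.

```javascript
// Hypothetical helper: groups operations and hands each full group to
// `execute`, then flushes whatever is left over at the end.
function runInBatches(operations, batchSize, execute) {
    let batch = [];
    let flushed = 0;
    for (const op of operations) {
        batch.push(op);
        if (batch.length === batchSize) {
            execute(batch);      // stands in for bulk.execute()
            flushed += 1;
            batch = [];          // stands in for re-initializing the bulk op
        }
    }
    if (batch.length > 0) {      // clean up remaining operations in the queue
        execute(batch);
        flushed += 1;
    }
    return flushed;              // number of times execute was called
}
```

With 1,201 queued updates and a batch size of 500, `execute` is called three times (500, 500 and 201 operations) instead of once per update.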
I recommend you make $match the first operation in your pipeline as the $match operator can only use an index if it is first in the aggregation pipeline:
var userstats = db.order.aggregate([{
$match : {
status :"DELIVERED",
customer : { $in : userIds }
}
}, {
$sort : {
customer : 1,
dateCreated : 1
}
}, {
$group : {
_id : "$customer",
orders : { $sum : 1 },
firstOrderDate: { $first : "$dateCreated" },
lastOrderDate : { $last:"$dateCreated" }
}
}]);
You should also add an index on status and customer if you have not already defined one:
db.order.createIndex({status:1,customer:1})
Is it possible to get n documents from a collection, with fetching starting from a specific object? E.g. if a collection has 100 documents, I need 10 documents starting from 46 [a specific id], i.e. 46-55.
return new Promise((resolve, reject) => {
return products_model.find({'from_id' : 9677270841774}).limit(limit).exec((err, records) => {
if(err)
reject(err)
else
resolve(records)
})
})
Edit: The original document structure is as follows:
{
"_id" : ObjectId("6128a0d9cf34c208c30e6800"),
"id" : 3238425384636,
"title" : "i3jMH8CHPWY6Ru18KrmsDGdiyl2qDuFjxXD1M4yCzJHrOmSF8v",
"body_html" : "This is body",
"vendor" : NumberInt(1),
"__v" : NumberInt(0)
},
{
"_id" : ObjectId("6128a0d9cf34c208c30e6805"),
"id" : 30336405569734,
},
{},
{},
{},
{}
and the API is
http://localhost:3000/products.json?since_id=30336405569734&limit=10
Example with 10 ids, where we want 2 ids starting from 4:
filter id >= 4
sort ascending by id
limit 2
For your data, read the query string and then apply id >= 30336405569734 and limit 10 (as numbers, not strings).
You can use the aggregation in mongoose.
db.collection.aggregate([
{
"$match": {
"id": {
"$gte": 4 // "$gte": 30336405569734 (for your data)
}
}
},
{
"$sort": {
"id": 1
}
},
{
"$limit": 2 // "$limit": 10 (for your data)
}
])
Or write a find in Mongoose. I don't use Mongoose, but I think it will look like this:
products_model.find({ id: { $gte: 30336405569734 } })
.sort({id: 1})
.limit(10)
.then(products => {
console.log(products)
});
or
const products = await products_model.find({ id: { $gte: 30336405569734 } })
.sort({id: 1})
.limit(10);
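Since the values arrive as strings in the query string, they need converting before they reach the query. A minimal sketch of that step, plus an in-memory version of the filter/sort/limit logic for checking the expected result; `parsePaging` and `pageFrom` are hypothetical helper names, and the default limit of 10 is an assumption:

```javascript
// Hypothetical helper: reads since_id and limit from the request URL and
// converts them to numbers so they compare correctly with the numeric id field.
function parsePaging(requestUrl, defaultLimit = 10) {
    const params = new URL(requestUrl).searchParams;
    const sinceId = Number(params.get("since_id"));
    const limit = Number(params.get("limit")) || defaultLimit;
    if (!Number.isSafeInteger(sinceId)) {
        throw new Error("since_id must be an integer");
    }
    return { sinceId, limit };
}

// In-memory equivalent of: filter id >= sinceId, sort ascending by id, limit
function pageFrom(docs, sinceId, limit) {
    return docs
        .filter(doc => doc.id >= sinceId)
        .sort((a, b) => a.id - b.id)
        .slice(0, limit);
}
```

The `sinceId` and `limit` values returned by `parsePaging` are what would be passed to `$gte` and `.limit()` in the queries above.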
I have the following documents in my collection. I need to get all employees in the stores using the aggregate function.
//Store 1
{
"_id" : ObjectId("5b0d3fa6b426ea12ec0f6e5a"),
"store_name": "KFC",
"employees":[
ObjectId("5b0d4c5ec47e6223a08af5fd"), //query id
ObjectId("5b3b0ea9074f944699f1bcfc"),
ObjectId("5b11558d0a50c067a91875e9"),
],.. },
//Store 2
{
"_id" : ObjectId("5b0d3fa6b426ea12ec0f6e5a"),
"store_name": "McDonalds",
"employees":[
ObjectId("5b0d4c5ec47e6223a08af5fd"),
ObjectId("5b3b0ea9074f944699f1bcfc"),
ObjectId("5b11558d0a50c067a91875e9"),
],.. },
//Store 3
{
"_id" : ObjectId("5b0d3fa6b426ea12ec0f6e5a"),
"store_name": "Dominos",
"employees":[
ObjectId("5b0d4c5ec47e6223a08af5fd"),
ObjectId("5b1623905bc92d76abfe0ab1"),
ObjectId("5b14e0b1fc1507569f830f7d")
],.. }
Using aggregate function
db.getCollection('stores').aggregate([
{
$match:{
"employees":{
$in:[ ObjectId("5b0d4c5ec47e6223a08af5fd")] //employee_id
}
}
},{
$unwind: "$employees"
},{
$group: {
"_id": null,
"emps": {
$addToSet: "$employees"
}
}
}
])
OUTPUT
{
"_id" : null,
"emps" : [
ObjectId("5b0d4c5ec47e6223a08af5fd"), // employee id
ObjectId("5b3b0ea9074f944699f1bcfc"),
ObjectId("5b11558d0a50c067a91875e9"),
ObjectId("5b1623905bc92d76abfe0ab1"),
ObjectId("5b14e0b1fc1507569f830f7d")
]
}
That works, but I need this result without the queried employee id. How can I remove the queried employee id and get a result like this?
{
"_id" : null,
"emps" : [
ObjectId("5b3b0ea9074f944699f1bcfc"),
ObjectId("5b11558d0a50c067a91875e9"),
ObjectId("5b1623905bc92d76abfe0ab1"),
ObjectId("5b14e0b1fc1507569f830f7d")
]
}
You can use $filter and repeat your $in condition inside $not:
db.getCollection('stores').aggregate([
// your pipeline,
{
$addFields: {
emps: {
$filter: {
input: "$emps",
as: "emp",
cond: { $not: { $in: [ "$$emp", [ ObjectId("5b0d4c5ec47e6223a08af5fd")] ] } }
}
}
}
}
])
$addFields is used here to replace the existing emps field.
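To see what the $filter stage computes, here is a plain JavaScript analogue of the same condition. It is only an illustration: it uses hex strings in place of ObjectId values (an assumption made so the snippet runs without a driver), and `withoutExcluded` is a hypothetical name.

```javascript
// Plain-JS analogue of
//   $filter with cond: { $not: { $in: [ "$$emp", excluded ] } }
// Keeps every element of `emps` that is not listed in `excluded`.
function withoutExcluded(emps, excluded) {
    const blocked = new Set(excluded);
    return emps.filter(emp => !blocked.has(emp));
}
```

Applied to the five ids in the OUTPUT above with the queried id as the only excluded value, this leaves the other four ids in their original order.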
Here is my code
Bill.aggregate(
{$unwind:'$detais'},
{ $match : {
createdOn : {
$gt : moment().startOf('day'),
$lt : moment().endOf('day')
}
}
},
{
$group : {
_id : '$detais.product_id',
total : { $sum : '$detais.quantity' },
}
},
{ $sort :{ total: -1 } }
)
.limit(10)
.exec((err, records) => {
if (err) {console.log(err)};
res.send({
data : records
});
});
The query
createdOn : {
$gt : moment().startOf('day'),
$lt : moment().endOf('day')
}
works fine in other cases.
But with aggregate the result is empty. Can someone please tell me where my mistake is?
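You need to apply an $and condition in $match. Note that $and is a top-level operator that wraps complete conditions, so each entry names the field. Also, Mongoose does not cast values inside an aggregation pipeline the way it does for find(), so convert the moment objects to native Date values with toDate():
Bill.aggregate([{
$unwind: '$detais'
}, {
$match: {
$and: [{
createdOn: { $gt: moment().startOf('day').toDate() }
}, {
createdOn: { $lt: moment().endOf('day').toDate() }
}]
}
}]);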
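Because values in an aggregation pipeline are not cast for you, the bounds passed to $gt and $lt should be real Date objects. A minimal sketch of computing them with the built-in Date type only; `dayBounds` is a hypothetical helper intended to match the local-time start and end of day that moment produces:

```javascript
// Returns native Date objects for the start (00:00:00.000) and end
// (23:59:59.999) of the local day containing `now`.
function dayBounds(now = new Date()) {
    const y = now.getFullYear();
    const m = now.getMonth();
    const d = now.getDate();
    return {
        start: new Date(y, m, d),
        end: new Date(y, m, d, 23, 59, 59, 999)
    };
}
```

The two values can then be used as `createdOn: { $gt: bounds.start, $lt: bounds.end }` inside the $match stage.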
I have a Post collection like the following:
{ "_id" : ObjectId(..), "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : ObjectId(..), "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : ObjectId(..), "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : ObjectId(..), "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : ObjectId(..), "date" : ISODate("2014-04-04T21:23:13.331Z") }
I need to get the total count and the max date of the posts. So the desired result for the documents above is the following:
{count: 5, date: ISODate("2014-04-04T21:23:13.331Z")}
How can I get the desired result with a single query to MongoDB, without handling and counting in application code?
EDIT: @chridam thanks for the response. I've accepted your answer as the best one! Could you help me with one more thing?
Let's say that no posts exist yet, so I need to fetch a result with a zero count and the current date as the timestamp, like the following:
{count: 0, date: Date.now()}
Is it possible with MongoDB?
Use the $max and $sum operators as follows:
Model.aggregate([
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"date": { "$max": "$date" }
}
}
]).exec(function (err, result) {
console.log(result);
})
EDIT: Addressing your further question with regard to an empty collection, the aggregation will return an empty result since there won't be any documents to aggregate in the collection. So you would need to handle this logic on the client, i.e. check the results from the above aggregation, and if the result is an empty array, create the placeholder doc as required:
Model.aggregate([
{
"$group": {
"_id": null,
"count": { "$sum": 1 },
"date": { "$max": "$date" }
}
}
]).exec(function (err, result) {
if (err) { return console.log(err); }
if (!result.length) {
result = [{ count: 0, date: new Date() }];
}
console.log(result);
});
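The fallback can also be kept in a small function so it is easy to reuse and test on its own. This is a sketch under the assumption that the aggregation yields an array with at most one document; `summaryOrDefault` is a hypothetical name:

```javascript
// Hypothetical helper: takes the array returned by the aggregation and
// falls back to a zero-count placeholder when no documents were grouped.
function summaryOrDefault(result, now = new Date()) {
    if (!Array.isArray(result) || result.length === 0) {
        return { count: 0, date: now };
    }
    const { count, date } = result[0];
    return { count, date };
}
```

Inside the exec callback this would be called as `summaryOrDefault(result)`, which returns either the aggregated values or the placeholder with the current date.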
My structure.
User:
{
name: "One",
favoriteWorkouts: [ids of workouts],
workouts: [ { name: "My workout 1" },...]
}
I want to get a list of the favorite/hottest workouts from the database.
db.users.aggregate(
{ $unwind : "$favorite" },
{ $group : { _id : "$favorite" , number : { $sum : 1 } } },
{ $sort : { number : -1 } }
)
This returns
{
"hot": [
{
"_id": "521f6c27145c5d515f000006",
"number": 1
},
{
"_id": "521f6c2f145c5d515f000007",
"number": 1
},...
]}
But I want
{
hot: [
{object of hottest workout 1, object of hottest workout 2,...}
]}
How do you sort the hottest data and fill the result with objects, not just ids?
You are correct to want to use MongoDB's aggregation framework. Aggregation will give you the output you are looking for if used correctly. If you are looking for just a list of the _ids of all users' favorite workouts, then I believe you would need to add an additional $group operation to your pipeline:
db.users.aggregate(
{ $unwind : "$favoriteWorkouts" },
{ $group : { _id : "$favoriteWorkouts", number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $group : { _id : "oneDocumentWithWorkoutArray", hot : { $push : "$_id" } } }
)
This will yield a document of the following form, with the workout ids listed by popularity:
{
"_id" : "oneDocumentWithWorkoutArray",
"hot" : [
"workout6",
"workout1",
"workout5",
"workout4",
"workout3",
"workout2"
]
}
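The pipeline above returns only ids. One way to turn them into workout objects is a lookup step in application code, sketched below. It rests on an assumption not shown in the question: that each embedded workout carries an `_id` matching the values stored in favoriteWorkouts. `hottestWorkouts` is a hypothetical name.

```javascript
// Assumption: every embedded workout has an _id that matches the ids
// stored in favoriteWorkouts.
function hottestWorkouts(users, hotIds) {
    const byId = new Map();
    for (const user of users) {
        for (const workout of user.workouts || []) {
            byId.set(String(workout._id), workout);
        }
    }
    // Keep the popularity order of hotIds; skip ids with no matching workout
    return hotIds
        .map(id => byId.get(String(id)))
        .filter(workout => workout !== undefined);
}
```

Here `hotIds` would be the `hot` array from the aggregation result, and `users` the user documents that hold the embedded workouts.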