mongodb/mongoose aggregate memory usage very big+

I have a MongoDB database into which multiple sensors dump their data once a day. Each document is essentially { sid, date, data }: a sensor ID, a date (I only use the date component), and a data array of a couple hundred values.
Now I want an overview statistic showing how many sensors I have data for on each day. This aggregation works nicely while I have a few dozen documents, but with even a couple of hundred documents the query never finishes.
function dailyStatistic(callback) {
  return air
    .aggregate([
      { $match: {} },
      { $group: { _id: { date: '$date' }, myCount: { $sum: 1 } } }
    ])
    .allowDiskUse(true);
}
air is the name of my mongoose collection.
The aggregation should really just return:
[ {date: 2017-08-07, myCount: 10}, {date: 2017-08-08, myCount: 26} ]
Now when I watch the machine (via glances) I get CPU_IOWAIT and MEMSWAP errors, which ultimately kill the node.js process before it gets the data.
When I check out the collection on robomongo, I can easily browse the
different data points. But also in robomongo, this script never gets me
a result:
db.getCollection('air').find({}).length()
Any ideas?
Thanks Andreas

You probably do not have an index on date. Create one with db.getCollection('air').createIndex({date:1})
db.getCollection('air').find({}).length() browses all the results just to count them.
Use db.getCollection('air').count({}) instead.

The best way to do this without crashing MongoDB would be to fetch data for a date range. In your case, that means one day at a time.
function dailyStatistic(dateMin, dateMax, callback) {
  return air
    .aggregate([
      { $match: { date: { $gte: dateMin, $lte: dateMax } } },
      {
        $project: {
          sid: 1,
          date: 1,
          data: 1,
          // $dayOfMonth is the aggregation operator for the day of the month; $day does not exist
          day: { $dayOfMonth: "$date" },
          month: { $month: "$date" },
          year: { $year: "$date" }
        }
      },
      { $group: { _id: { day: "$day", month: "$month", year: "$year" }, myCount: { $sum: 1 } } }
    ])
    .allowDiskUse(true);
}
You can take this further by adding pagination when the number of records per hour or minute is also too large.
And as pagetronic suggested, create the indexes if you haven't.
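A minimal sketch of how the dateMin and dateMax arguments for a single day could be computed. The helper name utcDayBounds is hypothetical, and it assumes the dates are stored in UTC; adjust if your sensors write local time.

```javascript
// Hypothetical helper (not part of the original answer): returns the first and
// last millisecond, in UTC, of the calendar day that contains `day`.
function utcDayBounds(day) {
  const dateMin = new Date(
    Date.UTC(day.getUTCFullYear(), day.getUTCMonth(), day.getUTCDate())
  );
  // Last millisecond of the same day, because the $match above uses $lte.
  const dateMax = new Date(dateMin.getTime() + 24 * 60 * 60 * 1000 - 1);
  return { dateMin, dateMax };
}

const { dateMin, dateMax } = utcDayBounds(new Date('2017-08-07T15:30:00Z'));
console.log(dateMin.toISOString()); // 2017-08-07T00:00:00.000Z
console.log(dateMax.toISOString()); // 2017-08-07T23:59:59.999Z
```

The two values can then be passed to dailyStatistic(dateMin, dateMax, callback), calling it once per day you want to report on.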

Related

How do I get count in MongoDB based on specific fields

I have documents like this in my MongoDB Listings collection.
listingID: 'abcd',
listingData: {
category: 'resedetial'
},
listingID: 'xyz',
listingData: {
category: 'resedetial'
},
listingID: 'efgh',
listingData: {
category: 'office'
}
I am trying to get the total count of all listings and a count per category.
I can get the total count of listings with an aggregation query, but I am not sure how to get output like this: resedentialCount: 2, officeCount: 1, ListingsCount: 3
This is my aggregation query:
{
$match: {
listingID,
},
},
{
$group: {
_id: 1,
ListingsCount: { $sum: 1 },
},
}

Try this:
let listingAggregationCursor = db.collection.aggregate([
  { $group: { _id: "$listingData.category", ListingsCount: { $sum: 1 } } }
])
let listingAggregation = await listingAggregationCursor.toArray();
(I got this query from https://www.statology.org/mongodb-group-by-count)
This will give you an array of objects, one per listing category, each with the number of listings in that category.
To get the total listings count, sum the ListingsCount fields of those objects. You can do that like this:
let listingsCount = 0;
for (const listingCategory of listingAggregation) {
  // The $group stage above names the counter ListingsCount, not count
  listingsCount += listingCategory.ListingsCount;
}
You should have the data you need at this point. Now it's just a matter of extracting and formatting it as you see fit.
Hope this helps!
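As a rough sketch of the "extracting and formatting" step, the grouped result can be folded into one object with a key per category plus the total. The sample array below is made up to match the three documents in the question, and summarizeCounts is a hypothetical helper name.

```javascript
// Sample data shaped like the $group output (one object per category).
// The values mirror the three documents shown in the question.
const listingAggregation = [
  { _id: 'resedetial', ListingsCount: 2 },
  { _id: 'office', ListingsCount: 1 },
];

// Hypothetical helper: builds { <category>Count: n, ..., ListingsCount: total }.
function summarizeCounts(groups) {
  const summary = { ListingsCount: 0 };
  for (const group of groups) {
    summary[`${group._id}Count`] = group.ListingsCount;
    summary.ListingsCount += group.ListingsCount;
  }
  return summary;
}

const summary = summarizeCounts(listingAggregation);
console.log(summary);
// { ListingsCount: 3, resedetialCount: 2, officeCount: 1 }
```

The key names come straight from the category values stored in the collection, so a misspelled category produces a misspelled key.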

Mongo db - how to join and sort two collection with pagination

I have 2 collections:
Office -
{
_id: ObjectId(someOfficeId),
name: "some name",
..other fields
}
Documents -
{
_id: ObjectId(SomeId),
name: "Some document name",
officeId: ObjectId(someOfficeId),
...etc
}
I need to get a list of offices sorted by the count of documents that refer to each office. Pagination should also be implemented.
I tried to do this with aggregation, using $lookup:
const aggregation = [
{
$lookup: {
from: 'documents',
let: {
id: '$id'
},
pipeline: [
{
$match: {
$expr: {
$eq: ['$officeId', '$id']
},
// sent_at: {
// $gte: start,
// $lt: end,
// },
}
}
],
as: 'documents'
},
},
{ $sortByCount: "$documents" },
{ $skip: (page - 1) * limit },
{ $limit: limit },
];
But this doesn't work for me.
Any ideas how to implement this?
P.S. I need to show offices with 0 documents too, so getting the offices from the documents doesn't work for me.

Query
You can use $lookup to join on that field, with a pipeline that groups, so you count the documents of each office (instead of putting the documents into an array, because you only care about the count).
$set moves that count to a top-level field.
Sort using the noffices field.
You can use skip/limit for pagination, but if your collection is very big it will be slow. Alternatively, you can paginate using the natural _id order, or retrieve more documents in each query and keep them in memory (instead of retrieving just one page's documents).
offices.aggregate(
[{"$lookup":
{"from":"documents",
"localField":"_id",
"foreignField":"officeId",
"pipeline":[{"$group":{"_id":null, "count":{"$sum":1}}}],
"as":"noffices"}},
{"$set":
{"noffices":
{"$cond":
[{"$eq":["$noffices", []]}, 0,
{"$arrayElemAt":["$noffices.count", 0]}]}}},
{"$sort":{"noffices":-1}}])
As the other answer pointed out, you forgot the _ of _id, but with the lookup above you don't need let, or a $match with $expr inside the pipeline. Also, $sortByCount doesn't count the members of an array; for that you would need $size ($sortByCount is just a group and count, it is not for arrays). But you don't need $size either, since you can count them in the pipeline, as above.
Edit
Query
You can add whatever you need to the pipeline, or just remove it.
This keeps all documents and counts the array size,
and then sorts.
offices.aggregate(
[{"$lookup":
{"from":"documents",
"localField":"_id",
"foreignField":"officeId",
"pipeline":[],
"as":"alldocuments"}},
{"$set":{"ndocuments":{"$size":"$alldocuments"}}},
{"$sort":{"ndocuments":-1}}])
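The skip/limit pagination mentioned above could be appended to the second pipeline roughly like this. It is only a sketch: paginationStages is a hypothetical helper, and the extra _id tiebreaker in $sort is my own assumption, added so that offices with the same count keep a stable order between pages.

```javascript
// Hypothetical helper: turns a 1-based page number and a page size into
// the $skip and $limit stages used in the question.
function paginationStages(page, limit) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be an integer >= 1');
  }
  return [{ $skip: (page - 1) * limit }, { $limit: limit }];
}

// Pipeline from the edit above, with pagination added at the end.
const pipeline = [
  {
    $lookup: {
      from: 'documents',
      localField: '_id',
      foreignField: 'officeId',
      as: 'alldocuments',
    },
  },
  { $set: { ndocuments: { $size: '$alldocuments' } } },
  // _id: 1 is an assumed tiebreaker, not part of the original answer.
  { $sort: { ndocuments: -1, _id: 1 } },
  ...paginationStages(3, 10),
];

console.log(JSON.stringify(pipeline.slice(-2)));
// [{"$skip":20},{"$limit":10}]
```

The array can be passed as-is to offices.aggregate(pipeline). Offices without documents get an empty alldocuments array, so they appear with ndocuments 0.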

There are two errors in your lookup.
When passing the variable in with let, you forgot the _ of the $_id local field:
let: {
id: '$id'
},
In the $expr, since you are using the variable id and not a field of the
Documents collection, you should use $$ to reference the variable:
$expr: {
$eq: ['$officeId', '$$id']
},
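Putting both fixes together, the $lookup stage from the question might look like the sketch below. It is written as a plain object so it can be inspected without a database; the stage name lookupStage is just for illustration.

```javascript
// Corrected $lookup stage, assuming the collection and field names from the question.
const lookupStage = {
  $lookup: {
    from: 'documents',
    // '$_id' (with the underscore) is the office's own _id field.
    let: { id: '$_id' },
    pipeline: [
      {
        $match: {
          // '$officeId' is a field of the joined document,
          // '$$id' refers to the variable declared in `let`.
          $expr: { $eq: ['$officeId', '$$id'] },
        },
      },
    ],
    as: 'documents',
  },
};

console.log(JSON.stringify(lookupStage.$lookup.let)); // {"id":"$_id"}
```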

nodejs: $lte for date is not working as expected in mongo db

As the question explains, I am trying to query on dates and the result is not as expected. Here is what the objects I am trying to query look like:
{
"_id":{"$oid":"5f660cfcde436c3035b59648"},
"orderid":"2020-09-19-8939",
"orderdate":"2021-08-09T11:02:31.202+00:00",
"paycondition_ref":{"$oid":"5f211e9690e310990ea6aa5d"},
"receivedate":{"$date":"2020-09-22T05:59:38.211Z"},
"duedate":"2020-09-19T13:51:56.219Z",
"currency":{"$oid":"5f660cfcde436c3035b59647"},
"supplier_ref":{"$oid":"5f2e12286b15925440f03b56"},
"history":false,
"remark":"Clarance - Tuesday"
}
And here is how I am querying it:
"$match": {
"$and": [
{ 'purchaseorder.orderdate': { $gte: new Date(startDate), $lte: new Date(endDate) } },
{ 'purchaseorder.history': false },
]
}
And the startDate and endDate objects look like this:
startDate 2021-08-08T19:00:00.000Z
endDate 2021-08-08T19:00:00.000Z
This should return the object above, shouldn't it? But it doesn't! What am I doing wrong here? How can I resolve this?

use ISODate instead of new Date and remove "purchaseorder"
"$match": {
"$and": [
{ 'orderdate': { $gte: ISODate(startDate), $lte: ISODate(endDate) } },
{ 'history': false },
]
}

Looking at your data object and query, I noticed a few things:
Why are you fetching data using purchaseorder.orderdate and purchaseorder.history? You can access the fields directly.
You are storing the date fields as strings instead of the date type, which is why the comparison doesn't work when you use new Date. Change the schema so the date fields are stored as dates.
Also, you are storing dates in a local time zone, so you should not compare them against UTC values. To keep it simple, it is always recommended to store dates in UTC.
The following comparisons work for me:
db.getCollection("test-code").aggregate({
"$match": {
'orderdate':
{ $gte: new Date("2021-08-08T16:32:31.202+05:30"), $lte: new Date("2021-08-10T16:32:31.202+05:30") },
'history': false,
}
})
Note: I have used local TZ and my fields were of type date in DB 2021-08-10T16:32:31.202+05:30
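One more thing worth checking, independent of the field type: the startDate and endDate printed in the question are the same instant, so a $gte/$lte pair built from them can only match a document stamped at exactly that millisecond. The sketch below reproduces the comparison in plain JavaScript; inRange is a hypothetical helper and the second end date is a made-up value one day later.

```javascript
// Hypothetical helper mirroring { $gte: start, $lte: end } on real Date values.
function inRange(value, start, end) {
  const t = new Date(value).getTime();
  return t >= new Date(start).getTime() && t <= new Date(end).getTime();
}

// orderdate from the sample document in the question.
const orderdate = '2021-08-09T11:02:31.202+00:00';

// startDate and endDate exactly as printed in the question.
const sameInstant = inRange(
  orderdate,
  '2021-08-08T19:00:00.000Z',
  '2021-08-08T19:00:00.000Z'
);

// Assumed fix: move the end of the range one day later.
const oneDayWindow = inRange(
  orderdate,
  '2021-08-08T19:00:00.000Z',
  '2021-08-09T19:00:00.000Z'
);

console.log(sameInstant, oneDayWindow); // false true
```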

Combine multiple query with one single $in query and specify limit for each array field?

I am using mongoose with node.js for MongoDB. I need to make 20 parallel find queries against my database, each with a limit of 4 documents, like the one shown below; only brand_id changes for each brand.
areamodel.find({ brand_id: brand_id }, { '_id': 1 }, { limit: 4 }, function(err, docs) {
  if (err) {
    console.log(err);
  } else {
    console.log('fetched');
  }
});
To run all these queries in parallel, I thought about putting all 20 brand_ids in an array of strings and using an $in query to get the results, but I don't know how to specify the limit of 4 for each array element that is matched.
I wrote the code below with aggregation, but I don't know where to specify the limit for each element of my array.
var brand_ids = ["brandid1", "brandid2", "brandid3", "brandid4", "brandid5", "brandid6", "brandid7", "brandid8", "brandid9", "brandid10", "brandid11", "brandid12", "brandid13", "brandid14", "brandid15", "brandid16", "brandid17", "brandid18", "brandid19", "brandid20"];
areamodel.aggregate(
{ $project: { _id: 1 } },
{ $match : { 'brand_id': { $in: brand_ids } } },
function(err, docs) {
if (err) {
console.error(err);
} else {
}
}
);
Can anyone please tell me how can i solve my problem using only one query.
UPDATE- Why i don't think $group be helpful for me.
Suppose my brand_ids array contains these strings
brand_ids = ["id1", "id2", "id3", "id4", "id5"]
and my database have below documents
{
  "brand_id": "id1",
  "name": "Levis",
  "loc": "india"
},
{
  "brand_id": "id1",
  "name": "Levis",
  "loc": "america"
},
{
  "brand_id": "id2",
  "name": "Lee",
  "loc": "india"
},
{
  "brand_id": "id2",
  "name": "Lee",
  "loc": "america"
}
Desired JSON output
{
"name": "Levis"
},
{
"name": "Lee"
}
In the example above, suppose I have 25000 documents with "name" as "Levis" and 25000 documents where "name" is "Lee". If I use $group, all 50000 documents will be queried and grouped by "name".
But with the solution I want, once the first documents with "Levis" and "Lee" are found, I don't have to look through the remaining thousands of documents.
Update- I think if someone can tell me the following, I can probably get to my solution.
Consider a case where I have 1000 documents in total in my MongoDB, and suppose 100 of those 1000 pass my match query.
If I apply a limit of 4 to this query, will it take the same time to execute as the query without any limit, or not?
Why I am thinking about this case:
Because if the query takes the same time, then I don't think $group will increase my time, as all documents will be queried anyway.
But if the time taken by the query with the limit is more than the time taken without the limit, then:
If I can apply a limit of 4 to each array element, my question will be solved.
If I cannot apply a limit to each array element, then I don't think $group will be useful, as in this case I have to scan all the documents to get the results.
FINAL UPDATE- As I read in the answer below and in the MongoDB docs, using $limit does not affect the time taken by the query; it is the network bandwidth that is affected. So if someone can tell me how to apply a limit to array fields (using $group or anything else), my problem will be solved.
mongodb: will limit() increase query speed?
Solution
Actually, my thinking about MongoDB was very wrong. I thought adding a limit to queries decreased the time taken by the query, but that is not the case. That's why I stumbled for so many days before trying the answer that Gregory NEUT and JohnnyHK gave me. Thanks a lot to both of you; I would have found the solution on day one if I had known about this. I really appreciate the help.

I suggest using the $group aggregation stage to group all the data you got from $match by brand_id, and then limiting each group using $slice.
Look at this stack overflow post
db.collection.aggregate([
  { $sort: { created: -1 } },
  {
    $group: {
      _id: '$city',
      title: { $push: '$title' }
    }
  },
  {
    $project: {
      _id: 0,
      city: '$_id',
      mostRecentTitle: { $slice: ['$title', 0, 2] }
    }
  }
])
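Adapted to the question's own fields, the same $group plus $slice idea could be sketched as below. The function name topNPerBrandPipeline and the output field ids are hypothetical; note that every matching document is still grouped before the slice is taken, so this limits the result size, not the scan.

```javascript
// Hypothetical builder: keeps at most `n` document _ids per brand_id.
function topNPerBrandPipeline(brandIds, n) {
  return [
    // Match first, so brand_id is still available for grouping.
    { $match: { brand_id: { $in: brandIds } } },
    // Collect the _id of every matching document, per brand.
    { $group: { _id: '$brand_id', ids: { $push: '$_id' } } },
    // Keep only the first n ids of each brand.
    { $project: { _id: 0, brand_id: '$_id', ids: { $slice: ['$ids', n] } } },
  ];
}

const brandPipeline = topNPerBrandPipeline(['brandid1', 'brandid2'], 4);
console.log(JSON.stringify(brandPipeline[2].$project.ids));
// {"$slice":["$ids",4]}
```

With mongoose this array would be passed to areamodel.aggregate(brandPipeline).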

I propose using distinct, since that will return all different brand names in your collection. (I assume this is what you are trying to achieve?)
db.runCommand ( { distinct: "areamodel", key: "name" } )
MongoDB docs
In mongoose i think it is: areamodel.db.db.command({ distinct: "areamodel", key: "name" }) (Untested)

Mongo aggregate query pulling out last 7 days worth of data (node.js)

I have a large collection of data which I'm trying to pull out of Mongo (node js) in order to render some graphs.
I need to pull the last 7 days worth of data out of a few thousand users. The specific piece of data on each user is formatted like so:
{
"passedModules" :
[{
"module" : ObjectId("53ea17dcac1d13a66fb6d14e"),
"date" : ISODate("2014-09-17T00:00:00.000Z")
},
{
"module" : ObjectId("53ec5c91af2792f1112554e8"),
"date" : ISODate("2014-09-17T00:00:00.000Z")
},
{
"module" : ObjectId("53ec5c5baf2792f1112554e6"),
"date" : ISODate("2014-09-17T00:00:00.000Z")
}]
}
At the moment I have a really messy group of queries which is working, but I believe this is possible to do entirely with Mongo?
Basically, I need to pull out all the entries from 7 days ago until now, in a dynamic fashion.
Is there a tried and tested way of working with dynamic dates like this, more specifically using the aggregation framework in mongo? The reason for the aggregation framework is that I need to group these afterwards.
Many thanks

The general pattern for this type of query is:
// Compute the time 7 days ago to use in filtering the data
var d = new Date();
d.setDate(d.getDate()-7);
db.users.aggregate([
// Only include the docs that have at least one passedModules element
// that passes the filter.
{$match: {'passedModules.date': {$gt: d}}},
// Duplicate the docs, one per passedModules element
{$unwind: '$passedModules'},
// Filter again to remove the non-matching elements
{$match: {'passedModules.date': {$gt: d}}}
])

@JohnnyHK has a good answer already. But if you are using a querying tool (like Robo 3T or Metabase) and don't have programmatic access to create and change variables, here's another option.
{
"$match": {
"$expr":{
"$gte":[
"$passedModules.date",
{ $add: [new Date(), -604800000]}
]
}
}
}
Where the number 604800000 is just the time in milliseconds: 1000*60*60*24*<Number of Days>
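If you do have programmatic access, the offset can be computed instead of hard-coded. A small sketch, with MS_PER_DAY and daysAgo as made-up names:

```javascript
// Milliseconds in one day: 1000 ms * 60 s * 60 min * 24 h.
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Hypothetical helper: the Date exactly `days` days before `now`.
function daysAgo(days, now = new Date()) {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

console.log(7 * MS_PER_DAY); // 604800000
console.log(daysAgo(7, new Date('2014-09-24T00:00:00.000Z')).toISOString());
// 2014-09-17T00:00:00.000Z
```

The value returned by daysAgo(7) can be used directly on the right-hand side of a $gt or $gte date filter.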
You can make it relative to the week or month as well.
{
"$match": {
"$expr":{
"$eq":[
{ $add:[ {$multiply:[1000, { $year: ["$passedModules.date"] }]}, { $week: ["$passedModules.date"] } ]},
{ $add:[ {$multiply:[1000, { $year: [new Date()] }]}, { $week: [new Date()] } ]}
]
}
}
},
Hope it can help others in the same context as I was.
