MongoDB aggregation pipeline in Python

I have a collection of log files and I need to find the number of times each system shows the message "Average limit exceeded while connecting ..." within a given date range, then display the results for all systems in that range in descending order of count.
Currently my documents in the MongoDB collection look like this:
{'computerName':'APOOUTRDFG',
'datetime': '11/27/2019 10:45:23.123',
'message': 'Average limit ....'
}
So far I have tried filtering by first matching the message string and then grouping by computer name, but this does not give me the result I need:
db.collection.aggregate([
{ "$match": {
'message': re.compile(r".*Average limit.*")
} },
{ "$group": {
"_id": { "$toLower": "$computerName" },
"count": { "$sum": 1 }
} }
])
Expected results:
Date: 01-01-2012 to 31-01-2012
Computer Name    Number of "Average limit exceeded"
computername1    120
computername2    83
computername3    34

Assuming you have the following data in DB:
[
{
"computerName": "APOOUTRDFG",
"datetime": "11/27/2019 10:45:23.123",
"message": "Average limit ...."
},
{
"computerName": "BPOOUTRDFG",
"datetime": "01/02/2012 10:45:23.123",
"message": "Average limit ...."
},
{
"computerName": "CPOOUTRDFG",
"datetime": "01/30/2012 10:45:23.123",
"message": "Average limit ...."
},
{
"computerName": "DPOOUTRDFG",
"datetime": "01/30/2012 10:45:23.123",
"message": "Some other message ...."
}
]
Note: 'datetime' is in the format %m/%d/%Y %H:%M:%S.%L and the input date range is in the format %d-%m-%Y.
The following query can get you the expected output:
db.collection.aggregate([
{
$match:{
"message": /.*Average limit.*/i,
$expr:{
$and:[
{
$gte:[
{
$dateFromString:{
"dateString":"$datetime",
"format":"%m/%d/%Y %H:%M:%S.%L"
}
},
{
$dateFromString:{
"dateString":"01-01-2012",
"format":"%d-%m-%Y"
}
}
]
},
{
$lt:[
{
$dateFromString:{
"dateString":"$datetime",
"format":"%m/%d/%Y %H:%M:%S.%L"
}
},
{
// Exclusive upper bound: the day after 31-01-2012, so all of the 31st is included.
$dateFromString:{
"dateString":"01-02-2012",
"format":"%d-%m-%Y"
}
}
]
}
]
}
}
},
{
$group:{
"_id":{
$toLower:"$computerName"
},
"count":{
$sum:1
}
}
},
{
$sort:{
"count":-1
}
}
]).pretty()
Recommended: It's better to store dates as ISODate or as a timestamp in the DB.
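Since the question is asked from Python, here is a minimal sketch of the same idea expressed as a PyMongo-style pipeline (a plain list of dicts). The function name `build_pipeline` and its parameters are hypothetical. It assumes the `datetime` field is already stored as a real BSON date, as recommended above, so the range can be matched directly without `$dateFromString`. The upper bound is exclusive (midnight after the last day) so that the whole last day is counted.

```python
import re
from datetime import datetime, timedelta

def build_pipeline(start_day, end_day, pattern="Average limit"):
    """Return a pipeline counting matching messages per computer.

    start_day / end_day are 'DD-MM-YYYY' strings, both inclusive.
    Assumes the 'datetime' field is stored as a BSON date, not a string.
    """
    start = datetime.strptime(start_day, "%d-%m-%Y")
    # Exclusive upper bound: midnight after the last requested day.
    end = datetime.strptime(end_day, "%d-%m-%Y") + timedelta(days=1)
    return [
        {"$match": {
            "message": re.compile(pattern, re.IGNORECASE),
            "datetime": {"$gte": start, "$lt": end},
        }},
        {"$group": {"_id": {"$toLower": "$computerName"}, "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},  # highest count first
    ]

pipeline = build_pipeline("01-01-2012", "31-01-2012")
# With PyMongo this would be passed as: db.collection.aggregate(pipeline)
```

Matching on a stored date also lets MongoDB use an index on `datetime`, which is not possible when every document's string has to be parsed inside `$expr`.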

Related

MongoDB Aggregate - How to use values of previous stage as a field name in next stage?

I have an aggregation query like this:
...
{
'$unwind': {
path: '$modifier'
}
}, {
'$group': {
'_id': {
'date': {
'$dateToString': {
'format': '%d/%m/%Y',
'date': '$updatedTime'
}
}
},
'$$modifier': { '$sum': 1 }
}
},
...
and I would like to use the modifier values produced by the previous stage ($unwind) as field names in the next stage ($group). The details are shown below. How can I accomplish this?
Current:
This is the output of $unwind stage:
updatedTime:2020-03-27T11:02:43.608+00:00
modifier:"james#email.com"
updatedTime:2020-03-27T11:02:43.608+00:00
modifier:"eric#email.com"
This is the output of $group stage :
_id: { date:"27/03/2020" }
modifier:1
Expected:
the output of $unwind stage:
updatedTime:2020-03-27T11:02:43.608+00:00
modifier:"james#email.com"
updatedTime:2020-03-27T11:02:43.608+00:00
modifier:"eric#email.com"
This is the output of $group stage:
_id: { date:"27/03/2020" }
james#email.com:1
eric#email.com:1
total:2
Notice that "james#email.com" and "eric#email.com" come from the $unwind stage, which precedes the $group stage. total is the sum of the counts across all modifier values (here 'james#email.com' and 'eric#email.com').
Any suggestion is appreciated.
There is no direct way to do this. You need to group by both date and modifier, then group again by date alone to build an array of modifier counts, and finally promote the result to the document root:
$group by the date (formatted from updatedTime) and modifier to get the count for each pair
$group by the date only, building an array of { k, v } pairs from each modifier and its count, and summing the counts into total
$arrayToObject converts that array of key-value pairs into an object
$mergeObjects merges that object with the other required properties (date and total)
$replaceRoot replaces the document root with the merged object
{
"$group": {
"_id": {
"date": {
"$dateToString": {
"format": "%d/%m/%Y",
"date": "$updatedTime"
}
},
"modifier": "$modifier"
},
"count": { "$sum": 1 }
}
},
{
"$group": {
"_id": "$_id.date",
"modifiers": {
"$push": {
"k": "$_id.modifier",
"v": "$count"
}
},
"total": { "$sum": "$count" }
}
},
{
"$replaceRoot": {
"newRoot": {
"$mergeObjects": [
{
"date": "$_id",
"total": "$total"
},
{ "$arrayToObject": "$modifiers" }
]
}
}
}
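To make the reshaping easier to follow, here is a small plain-Python sketch that mimics what the three stages compute. It does not talk to MongoDB. The function name `count_modifiers_by_date` and the sample documents are hypothetical and only mirror the data shown in the question.

```python
from collections import Counter, defaultdict
from datetime import datetime

def count_modifiers_by_date(docs):
    """Mimic the two $group stages plus $arrayToObject/$mergeObjects."""
    # First $group: count per (date, modifier) pair.
    pair_counts = Counter(
        (doc["updatedTime"].strftime("%d/%m/%Y"), doc["modifier"]) for doc in docs
    )
    # Second $group + $replaceRoot: one row per date, modifiers become keys.
    result = defaultdict(lambda: {"total": 0})
    for (date, modifier), count in pair_counts.items():
        result[date]["date"] = date
        result[date][modifier] = count
        result[date]["total"] += count
    return list(result.values())

docs = [
    {"updatedTime": datetime(2020, 3, 27, 11, 2, 43), "modifier": "james#email.com"},
    {"updatedTime": datetime(2020, 3, 27, 11, 2, 43), "modifier": "eric#email.com"},
]
rows = count_modifiers_by_date(docs)
```

As in the pipeline, a modifier whose value happens to be "date" or "total" would collide with those keys, so this shape is only safe when the modifier values cannot take those names.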

Elasticsearch query is not returning a total value of a summed field

I have the following elasticsearch query. It's being executed as part of AWS Amplify serverless backend.
const elasticBody = {
...defaultBody,
aggs: {
points: {
date_histogram: {
field: "createdAt",
interval: "day",
},
aggs: {
points: {
sum: {
field: "points",
},
},
},
},
total: {
sum: {
field: "points",
},
},
},
};
const data = await search(index, elasticBody);
The response contains most of what I'm attempting to get; however, the 'total' aggregation in the lower portion of the query is not yielding a result.
I've been poring over the Elasticsearch documentation but I'm unable to find a solution.
I was expecting the following structure in the response.
count: x,
data: [{...}],
I was expecting the count to be the summed value of all the points within the returned data set.
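You need to use the Sum Bucket aggregation to get the total across the returned buckets. Its buckets_path "points>points" points at the inner points sum of each bucket in the outer points histogram. Please check the query below: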
"aggs": {
"points": {
"date_histogram": {
"field": "createdAt",
"interval": "day"
},
"aggs": {
"points": {
"sum": {
"field": "points"
}
}
}
},
"total":{
"sum_bucket": {
"buckets_path": "points>points"
}
}
}
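To arrive at the count/data structure the question expects, the aggregation response still has to be reshaped on the client. A minimal sketch in Python, assuming the usual aggregation response layout in which each histogram bucket carries its sum under `points.value` and the `sum_bucket` result sits under `total.value`. The function name `summarize` and the sample response are hypothetical.

```python
def summarize(response):
    """Reshape an aggregation response into {'count': ..., 'data': [...]}."""
    aggs = response["aggregations"]
    data = [
        {"date": bucket["key_as_string"], "points": bucket["points"]["value"]}
        for bucket in aggs["points"]["buckets"]
    ]
    return {"count": aggs["total"]["value"], "data": data}

# Hypothetical response, trimmed to the fields used above.
sample = {
    "aggregations": {
        "points": {"buckets": [
            {"key_as_string": "2020-01-01T00:00:00.000Z", "doc_count": 2,
             "points": {"value": 30.0}},
            {"key_as_string": "2020-01-02T00:00:00.000Z", "doc_count": 1,
             "points": {"value": 12.0}},
        ]},
        "total": {"value": 42.0},
    }
}
summary = summarize(sample)
```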

Mongodb Aggregate with natural order

I am looking for a way to query the database to fetch the last saved entry in the collection.
I have a function which saves documents to the collection; this is how a saved document looks:
{
"_id": {
"$oid": "5ebbf2b4586b4946226e2c88"
},
"name": "Stept_1",
"description": "",
"coordinates": {
"pose": {
"$numberInt": "0"
},
"x": {
"$numberDouble": "-9.760518723445719"
},
"y": {
"$numberDouble": "-3.4586615766853854"
},
"z": {
"$numberInt": "0"
}
},
"depth": {
"$numberInt": "1"
},
"_neighbours": [],
"optional": {},
"__v": {
"$Reference": "1111111"
}
}
Each document is saved with its name in ascending order, e.g. name: Step_1, Step_2, etc.
I have tried fetching the last saved document using the aggregate method like this:
db.collection.aggregate([
{ $sort: { name: -1 } },
{ $group: { _id: "$Reference", name: { $first: "$name" } } }
])
This works in ascending order up to name: Step_10 (i.e. Step_1, 2, 3, ...), but once Step_10 has been saved, if I restart the app and fetch the last saved document again, the result goes back to name: Step_9, which then leads to duplicate entries for some documents. What I am looking for is a query that always follows natural order (i.e. Step_1, 2, 3, ..., 11, 12, 13, etc.) when fetching those documents.
Thanks in advance.
You can use collation to specify numericOrdering like so:
db.collection.aggregate([
{ $sort: { name: -1 } },
{ $group: { _id: "$Reference", name: { $first: "$name" } } }
],
{ collation : {
locale: "en_US",
numericOrdering: true
}}
)
From mongo docs:
numericOrdering boolean
Optional. Flag that determines whether to compare numeric strings as numbers or as strings.
If true, compare as numbers; i.e. "10" is greater than "2".
If false, compare as strings; i.e. "10" is less than "2".
Default is false.
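The difference between the two orderings can be reproduced outside the database. The sketch below is a plain-Python illustration of string order versus numeric ("natural") order, not MongoDB's actual collation implementation. The helper name `natural_key` is hypothetical.

```python
import re

def natural_key(name):
    """Split 'Step_10' into ['step_', 10, ''] so digit runs compare as numbers."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

names = ["Step_1", "Step_10", "Step_2", "Step_9"]

# Plain string comparison: "Step_9" sorts after "Step_10", as in the question.
plain = sorted(names, reverse=True)

# Numeric-aware comparison: "Step_10" is correctly the last saved entry.
natural = sorted(names, key=natural_key, reverse=True)
```

With plain string comparison the first element is "Step_9", which matches the behaviour described in the question. With the numeric-aware key it is "Step_10".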

MongoDB how to filter by two fields(id and month) and also return a sum by all the filtered documents?

Hi, I am writing a Node/Express endpoint and trying to write a MongoDB query using Mongoose that returns the sum of my documents that have an expiration within the current day or month.
Currently, I am just trying to return the sum for the current day.
First I want to match all documents that contain the user's id.
Then I want to match all documents within the current day.
Lastly, I want to return the sum of the $amount field for all those documents.
I know that the syntax for this is wrong and I am having a hard time finding good examples matching what I am doing.
let now = Date.now(),
oneDay = 1000 * 60 * 60 * 24,
today = new Date(now - (now % oneDay)),
tomorrow = new Date(today.valueOf() + oneDay);
router.get("/user_current_month/:id", [jsonParser, jwtAuth], (req, res) => {
return Expense.aggregate([
{
$match: { user: new mongoose.Types.ObjectId(req.params.id) }
},
{
$match: {
$expiration: {
$gte: today,
$lt: tomorrow
}
}
},
{
$group: {
_id: "$month",
total: {
$sum: "$amount"
}
}
}
])
.then(sum => {
res.status(200).json(sum);
})
.catch(err => res.status(500).json({ message: "Something went terribly wrong!" }));
});
This is what a document looks like:
{
"_id": {
"$oid": "5daea018b3d92643fc2c8e6c"
},
"user": {
"$oid": "5d9929f065ef083d2cdbf66c"
},
"expense": "Post 1",
"expenseType": "One Time",
"notes": "",
"amount": {
"$numberDecimal": "100"
},
"expiration": {
"$date": "2019-10-22T06:22:01.628Z"
},
"status": "Active",
"createdAt": {
"$date": "2019-10-22T06:22:16.644Z"
},
"__v": 0
}
You need to combine the two match conditions in a single $match (an $and clause) and then group to compute the sum. Note that the field name in the query is expiration, without a leading $; a key starting with $ is treated as an operator. You need to do something like this:
Expense.aggregate([
{ $match: { $and: [
{ user: new mongoose.Types.ObjectId(req.params.id) },
{
expiration: {
$gte: today,
$lt: tomorrow
}
}
]}},
{ $group: {
_id: "$month",
total: {
$sum: "$amount"
}
}}
]);
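The day boundaries in the question are computed with modulo arithmetic on the epoch timestamp, which yields the start of the current UTC day. A minimal sketch of the same boundaries and the resulting pipeline in Python, assuming `expiration` is stored as a date. The names `utc_day_bounds` and `daily_total_pipeline` are hypothetical, and `user_id` stands in for the ObjectId taken from the request.

```python
from datetime import datetime, timedelta, timezone

def utc_day_bounds(now):
    """Return (start, end) of the UTC day containing `now`; end is exclusive."""
    start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return start, start + timedelta(days=1)

def daily_total_pipeline(user_id, now):
    """Build a pipeline summing `amount` for one user's expenses expiring today."""
    today, tomorrow = utc_day_bounds(now)
    return [
        # Both conditions live in one $match; they are ANDed implicitly.
        {"$match": {"user": user_id,
                    "expiration": {"$gte": today, "$lt": tomorrow}}},
        # _id of None puts every matched document into a single group.
        {"$group": {"_id": None, "total": {"$sum": "$amount"}}},
    ]

now = datetime(2019, 10, 22, 6, 22, 1, tzinfo=timezone.utc)
today, tomorrow = utc_day_bounds(now)
pipeline = daily_total_pipeline("hypothetical-user-id", now)
```

Because the boundaries are UTC midnights, "today" here may differ from the user's local day. Whether that matters depends on how the expiration values were written.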

MongoDB aggregation pipeline with loop

I have the aggregation pipeline code below that I would like to run for every day of the year, essentially calculating the minimum, maximum and average temperature ("TEMP" field) for each day. At the moment I am calling this piece of code 365 times, passing the start date and the end date of a day.
Obviously this is very inefficient. Is there any way to loop this within Mongo so that it's faster and returns an array of 365 average values, 365 min values and 365 max values, or something like that? I'm using a timezone library to derive the start date and end date.
collection.aggregate([
{
$match:{$and:[
{"UID" : uid},
{"TEMP" :{$exists:true}},
{"site" : "SITE123"},
{"updatedAt": {$gte : new Date(START_DATE_ARG), $lte : new Date(END_DATE_ARG)} }
]}
},
{ "$group": {
"_id": "$UID",
"avg": { $avg: "$TEMP" },
"min": { $min: "$TEMP" },
"max": { $max: "$TEMP" }
}
}
], function(err, result){
if (err){
cb(1, err);
}
else{
cb(0, result);
}
});
});
The datasets look like this
....
{UID: "123", TEMP: 11, site: "SITE123", updatedAt: ISODate("2014-09-12T21:55:19.326Z")}
{UID: "123", TEMP: 10, site: "SITE123", updatedAt: ISODate("2014-09-12T21:55:20.491Z")}
....
Any ideas? Maybe we can pass all the timestamps of all the days of the year in the aggregation pipeline?
Thank you!!
Why run this for every day when you can simply make the date part of the grouping key? This is what the date aggregation operators exist for, so you can aggregate by time frames in a whole period at once without looping:
collection.aggregate([
{ "$match": {
"UID": uid,
"TEMP": { "$exists": true },
"site": "SITE123",
"updatedAt": {
"$gte": new Date(START_DATE_ARG),
"$lte": new Date(END_DATE_ARG)
}
}},
{ "$group": {
"_id": {
"uid": "$UID",
"year": { "$year": "$updatedAt" },
"month": { "$month": "$updatedAt" },
"day": { "$dayOfMonth": "$updatedAt" }
},
"avg": { "$avg": "$TEMP" },
"min": { "$min": "$TEMP" },
"max": { "$max": "$TEMP" }
}}
])
Or possibly just condensing the date to a timestamp value instead. A little trick of date math with date objects:
collection.aggregate([
{ "$match": {
"UID": uid,
"TEMP": { "$exists": true },
"site": "SITE123",
"updatedAt": {
"$gte": new Date(START_DATE_ARG),
"$lte": new Date(END_DATE_ARG)
}
}},
{ "$group": {
"_id": {
"uid": "$UID",
"date": {
"$subtract": [
{ "$subtract": [ "$updatedAt", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$updatedAt", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
}
},
"avg": { "$avg": "$TEMP" },
"min": { "$min": "$TEMP" },
"max": { "$max": "$TEMP" }
}}
])
Of course, the "date range" here now covers all of the dates you want in the results, i.e. the start and end dates of the whole period you intended to loop over. The grouping in either case reflects "one day", but you could change that to any interval you want.
Also note that your use of $and here is not necessary. MongoDB queries "and" their conditions by default. The only time you need that operator is for multiple conditions on the same field that would otherwise not be valid JSON/BSON.
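The date math in the second pipeline (subtract the epoch, then subtract the remainder modulo one day) can be checked outside the database. A minimal sketch in Python using the two sample documents from the question. The helper name `day_bucket_ms` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 1000 * 60 * 60 * 24
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def day_bucket_ms(ts):
    """Milliseconds since the epoch, rounded down to the start of the UTC day."""
    ms = (ts - EPOCH) // timedelta(milliseconds=1)  # like $subtract on two dates
    return ms - ms % MS_PER_DAY                     # like $subtract with $mod

a = datetime(2014, 9, 12, 21, 55, 19, 326000, tzinfo=timezone.utc)
b = datetime(2014, 9, 12, 21, 55, 20, 491000, tzinfo=timezone.utc)

# Both readings fall into the same one-day bucket.
same_bucket = day_bucket_ms(a) == day_bucket_ms(b)
bucket_start = datetime.fromtimestamp(day_bucket_ms(a) / 1000, tz=timezone.utc)
```

Both approaches bucket by UTC days. If the days must follow a local timezone, as the mention of a timezone library in the question suggests, the bucket boundaries would need to be shifted accordingly.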
