I've created a database schema very similar to the one in the MongoDB documentation on time series data.
I ended up with the following documents:
{
  "_id" : "unique_name_1",
  "data" : {
    "2017" : {
      "5" : {
        "21" : {
          "61" : [
            { "timestamp" : 1498162890460.0, "value" : true }
          ],
          "84" : [
            { "timestamp" : 1498183202126.0, "value" : false }
          ]
        },
        "22" : {
          "24" : [
            { "timestamp" : 1498215602957.0, "value" : true }
          ],
          "61" : [
            { "timestamp" : 1498249322863.0, "value" : false }
          ]
        }
      },
      "9" : {
        "16" : {
          "36" : [
            { "timestamp" : 1508249031987.0, "value" : true },
            { "timestamp" : 1508249429052.0, "value" : false }
          ]
        }
      }
    }
  }
}
The first level of subdocuments under 'data' represents the year, the second the month, the third the day, and the fourth the 15-minute interval of the day in which the event happened.
I want to query the database and get the first true (and maybe the immediate false that follows if possible) every day while ignoring all subsequent entries.
The first entry every day may or may not be true. The data does not necessarily always go from true to false back to true. For example, I may have several trues in a row, in which case I would want to ignore all subsequent true values.
This structure is really great for querying specific times, if you know them, but I'm at a loss for querying specific values. Should I just query the entire document and parse it on the front end? That becomes more difficult if I want to run the same query on hundreds of documents.
Thanks for the help.
To analyze your data in the context of a single day, you have to use a long series of $objectToArray stages combined with $unwind. That's because the only way to figure out what represents the day, month, and year is the level of nesting. Once you've done that, though, the rest is quite simple. Try:
db.col.aggregate([
  { $project: { data: { $objectToArray: "$data" } } },
  { $unwind: "$data" },
  { $project: { year: "$data.k", data: { $objectToArray: "$data.v" } } },
  { $unwind: "$data" },
  { $project: { year: 1, month: "$data.k", data: { $objectToArray: "$data.v" } } },
  { $unwind: "$data" },
  { $project: { year: 1, month: 1, day: "$data.k", data: { $objectToArray: "$data.v" } } },
  { $unwind: "$data" },
  { $project: { year: 1, month: 1, day: 1, interval: "$data.k", data: "$data.v" } },
  { $unwind: "$data" },
  { $match: { "data.value": true } },
  { $sort: { "data.timestamp": 1 } },
  {
    $group: {
      _id: { year: "$year", month: "$month", day: "$day" },
      firstPositive: { $first: "$data" }
    }
  }
])
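To sanity-check what this pipeline returns, the same computation can be sketched in plain JavaScript against the sample document from the question (illustrative only; it is not part of the MongoDB query):

```javascript
// Sample document from the question
const doc = {
  _id: "unique_name_1",
  data: {
    "2017": {
      "5": {
        "21": {
          "61": [{ timestamp: 1498162890460, value: true }],
          "84": [{ timestamp: 1498183202126, value: false }]
        },
        "22": {
          "24": [{ timestamp: 1498215602957, value: true }],
          "61": [{ timestamp: 1498249322863, value: false }]
        }
      },
      "9": {
        "16": {
          "36": [
            { timestamp: 1508249031987, value: true },
            { timestamp: 1508249429052, value: false }
          ]
        }
      }
    }
  }
};

// Flatten the year/month/day/interval nesting into events, then keep the
// earliest `value: true` event per calendar day (what the $group stage does).
const firstTruePerDay = {};
for (const [year, months] of Object.entries(doc.data)) {
  for (const [month, days] of Object.entries(months)) {
    for (const [day, intervals] of Object.entries(days)) {
      for (const events of Object.values(intervals)) {
        for (const ev of events) {
          if (ev.value !== true) continue;
          const key = `${year}-${month}-${day}`;
          if (!firstTruePerDay[key] || ev.timestamp < firstTruePerDay[key].timestamp) {
            firstTruePerDay[key] = ev;
          }
        }
      }
    }
  }
}
```

Running this yields one entry per day, mirroring the firstPositive field produced by the pipeline.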
Note that because you're using key/value pairs rather than an array, you would need wildcard keys to perform such a query directly, and wildcard keys simply aren't supported in MongoDB.
That being said, version 3.6 introduced an aggregation operator called $objectToArray which may prove useful for resolving this problem. I would recommend researching that operator and attempting to develop an appropriate aggregation pipeline around it.
If you get stuck on the pipeline, then that would be a good time to return with a new question showing what you've managed to write up so far. Until then, there's unfortunately not much more that can be done to help you work around your existing document structure.
The alternatives, of course, are either to completely restructure your documents (the nuclear option), which may not be a possibility if you're working on a production system which would require updating all relevant pieces of code, or to process this data directly on the backend instead of through a MongoDB query.
Related
I have a test document like that:
{
  "_id" : ObjectId("5fb6b0ed9cad6e97cfc24c2d"),
  "dates" : [
    { "date" : ISODate("2020-01-01T00:00:00.000Z") },
    { "date" : ISODate("2020-02-01T00:00:00.000Z") },
    { "date" : ISODate("2020-03-01T00:00:00.000Z") },
    { "date" : ISODate("2020-04-01T00:00:00.000Z") },
    { "date" : ISODate("2020-05-01T00:00:00.000Z") },
    { "date" : ISODate("2020-06-01T00:00:00.000Z") },
    { "date" : ISODate("2020-07-01T00:00:00.000Z") },
    { "date" : ISODate("2020-08-01T00:00:00.000Z") },
    { "date" : ISODate("2020-09-01T00:00:00.000Z") },
    { "date" : ISODate("2020-10-01T00:00:00.000Z") },
    { "date" : ISODate("2020-11-01T00:00:00.000Z") },
    { "date" : ISODate("2020-12-01T00:00:00.000Z") }
  ]
}
Now I want to retrieve only dates $gt: 2020-14-01T00:00:00.000Z. I tried a lot of combinations but none of them worked for me.
This is one of the queries I tried (taken from Mongodb docs):
db.getCollection('things').find({_id: ObjectId("5fb6b0ed9cad6e97cfc24c2d"), "dates.date": { $gt: new Date("2020-04-01T00:00:00.000Z")} } )
But it returns the whole document, not just the dates matching the $gt condition. I tried using new Date and new ISODate too, with the same effect.
Thank you.
First of all, according to the mongo documentation, dates are in the format "<YYYY-mm-dd>".
More formats are allowed, but if you try to use 2020-14-01 as a date it will fail (unless you convert the string to a date with a specific format), because the month is 14.
That said, to answer the question, you need a query like this:
db.collection.aggregate([
  {
    "$match": {
      "_id": ObjectId("5fb6b0ed9cad6e97cfc24c2d")
    }
  },
  {
    "$project": {
      "dates": {
        "$filter": {
          "input": "$dates",
          "as": "item",
          "cond": {
            "$gt": [
              "$$item.date",
              ISODate("2020-01-14T00:00:00.000Z")
            ]
          }
        }
      }
    }
  }
])
First, $match by _id to get only the document you want. Then use $project to build the fields you want returned: with $filter and $gt you keep only those array elements whose date field is greater than your date.
Note that I've used 2020-01-14 to avoid errors.
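For intuition, $filter behaves like Array.prototype.filter over the embedded array; here is the same condition in plain JavaScript, using a trimmed-down version of the document:

```javascript
const doc = {
  dates: [
    { date: new Date("2020-01-01T00:00:00.000Z") },
    { date: new Date("2020-04-01T00:00:00.000Z") },
    { date: new Date("2020-12-01T00:00:00.000Z") }
  ]
};

const cutoff = new Date("2020-01-14T00:00:00.000Z");

// Keep only the array elements whose date is strictly greater than the
// cutoff, mirroring the $gt condition inside $filter.
const filtered = doc.dates.filter(item => item.date > cutoff);
```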
If your dates are stored as strings rather than Dates, you can also use $dateFromString with a specific format to convert them before comparing.
My Schema looks something like this.
{
  _id: '1',
  items: {
    'id1': 'item1',
    'id2': 'item2',
    'id3': 'item3'
  }
}
Following is the query
ItemModel.find({}, {
items: 1,
_id: 0
});
And the result of the find query is:
{ "items" : { "21" : "item21", "22" : "item22", "23" : "item23" } }
{ "items" : { "31" : "item31", "32" : "item32", "33" : "item33" } }
{ "items" : { "11" : "item11", "12" : "item12", "13" : "item13" } }
What I want is:
["item21", "item22", "item23",
"item31", "item32", "item33",
"item11", "item12", "item13"]
Currently, I am doing the processing on the Node.js end to get the above. I want to reduce the output payload size coming from MongoDB: the "items" key is redundant, and the IDs are not required when I fetch the data. Here the IDs are short (21, 22, 13, etc.), but they are actually 50 characters long.
If not the above, any other efficient alternatives are also welcome.
One example of how to achieve that is the following aggregation:
[
  { $project: { items: { $objectToArray: '$items' } } },
  { $unwind: '$items' },
  { $project: { items: '$items.v' } },
  { $group: { _id: null, items: { $push: '$items' } } }
];
What this does: first we use $project with $objectToArray to convert the items field to an array so that we can $unwind it, giving one document per item. Another $project turns each entry (which has the shape { k: <key>, v: <value> }) into just its value string. Finally, we $group everything together with $push.
Final result: to get exactly the list you want, access the items field in your code, e.g. result[0].items ([0] because the aggregation returns an array).
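For comparison, this is the same transformation written in plain JavaScript, essentially what the asker currently does client-side (the aggregation moves this work to the server):

```javascript
// Simulated find() output from the question
const docs = [
  { items: { "21": "item21", "22": "item22", "23": "item23" } },
  { items: { "31": "item31", "32": "item32", "33": "item33" } },
  { items: { "11": "item11", "12": "item12", "13": "item13" } }
];

// Drop the redundant keys and concatenate all values into one flat array
const flat = docs.flatMap(d => Object.values(d.items));
```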
Here is my sample document:
{
  "_id" : ObjectId("5c10fd9def1d420ef80007af"),
  "date_claimed" : "01-14-2018"
}
I'm trying to sort date_claimed in ascending order. It worked properly while all the years were 2018, but after I added some 2019 values it no longer follows the order. How can I fix this? I need to keep the mm-dd-yyyy format.
Code:
Bloodrequest.aggregate([{$group: {_id: "$date_claimed", count: {$sum: 1}}}, {$sort: {_id: 1}}])
console.log(date2);
Yields:
[ { _id: '01-01-2019', count: 1 }, //this should be the most bottom
{ _id: '01-14-2018', count: 1 },
{ _id: '01-20-2018', count: 1 },
{ _id: '02-13-2018', count: 2 },
{ _id: '03-13-2018', count: 3 },
{ _id: '04-25-2018', count: 1 }]
Since mm-dd-yyyy is not naturally sortable, any solution based on this field content would require a full collection scan and/or full collection manipulation to be able to do this, since you're essentially requiring a custom sort method on a string field.
This will be a major performance killer, and not practical to do in the long term.
I would suggest you store your date_claimed field into a proper ISODate() format, put an index on it for sorting purposes, and do $project (or similar method) into the required mm-dd-yyyy format for output purposes.
For example, if your document is structured like:
> db.test.find()
{ "_id": 0, "date_claimed": ISODate("2018-01-01T00:00:00Z") }
{ "_id": 1, "date_claimed": ISODate("2018-01-02T00:00:00Z") }
{ "_id": 2, "date_claimed": ISODate("2019-01-01T00:00:00Z") }
You can then create an index on the date_claimed field:
> db.test.createIndex({date_claimed:1})
You can display the sorted date in descending order as expected:
> db.test.aggregate([ {$sort: {date_claimed: -1}} ])
{ "_id": 2, "date_claimed": ISODate("2019-01-01T00:00:00Z") }
{ "_id": 1, "date_claimed": ISODate("2018-01-02T00:00:00Z") }
{ "_id": 0, "date_claimed": ISODate("2018-01-01T00:00:00Z") }
You can also use $project to display the date in mm-dd-yyyy format as required. Note that the documents are sorted properly:
> db.test.aggregate([
{$sort: {date_claimed: -1}},
{$project: {date_string: {$dateToString: {format: '%m-%d-%Y', date:'$date_claimed'}}}}
])
{ "_id": 2, "date_string": "01-01-2019" }
{ "_id": 1, "date_string": "01-02-2018" }
{ "_id": 0, "date_string": "01-01-2018" }
See the $dateToString and the $project manual page for more information.
There are a couple of good things from this method:
You can put an index on the date field, removing the necessity to perform ad-hoc manipulation on the whole collection.
By using the ISODate() field type, the whole Date Expression Operators is now available for your use.
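If you do migrate, the existing strings have to be parsed once. A small helper like the following (hypothetical name, and it assumes every stored value really is mm-dd-yyyy) could be used in a one-off Node script before writing the values back as Dates:

```javascript
// Parse an "mm-dd-yyyy" string such as "01-14-2018" into a UTC Date.
function parseClaimDate(s) {
  const [mm, dd, yyyy] = s.split("-").map(Number);
  return new Date(Date.UTC(yyyy, mm - 1, dd));
}
```

Once converted, the index and the $dateToString projection shown above apply directly.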
Using moment.js to convert each date into milliseconds and sorting with .sort() would be your best bet, I think. Something like this perhaps:
date2
  .map(res => ({ ...res, ms: moment(res._id, 'MM-DD-YYYY').valueOf() })) // parse with an explicit format; valueOf() gives milliseconds
  .sort((a, b) => a.ms - b.ms);
Try something like below:
Bloodrequest.aggregate([
  {
    $group: {
      // note: this assumes date_claimed is stored as a Date, not an "mm-dd-yyyy" string
      _id: {
        "day": { "$dayOfMonth": "$date_claimed" },
        "month": { "$month": "$date_claimed" },
        "year": { "$year": "$date_claimed" }
      },
      count: { $sum: 1 }
    }
  },
  {
    $sort: {
      "_id.year": 1,
      "_id.month": 1,
      "_id.day": 1
    }
  }
])
Kevin's answer is the ideal solution. But if you're stuck not being able to change the schema for any reason, then you can use this solution:
Bloodrequest.aggregate([
  {
    $group: { _id: "$date_claimed", count: { $sum: 1 } }
  },
  {
    $addFields: {
      date: {
        $dateFromString: {
          dateString: "$_id",
          format: "%m-%d-%Y"
        }
      }
    }
  },
  {
    $sort: { date: 1 }
  }
])
As highlighted by Kevin, this is a performance killer. Should be used only when the data size is small and/or the frequency of this query being used is very low.
I want to build an online test application using MongoDB and Node.js. There is a feature where an admin can view a user's test history (with a date filter option).
How do I write the query if I want to display only users whose test results array contains the date specified by the admin?
The date filter is based on the day, month, and year of scheduledAt.startTime, and I think I need the aggregation framework to achieve this.
Let's say I have users document like below:
{
  "_id" : ObjectId("582a7b315c57b9164cac3295"),
  "username" : "lalalala#gmail.com",
  "displayName" : "lalala",
  "testResults" : [
    {
      "applyAs" : [ "finance" ],
      "scheduledAt" : {
        "endTime" : ISODate("2016-11-15T16:00:00.000Z"),
        "startTime" : ISODate("2016-11-15T01:00:00.000Z")
      },
      "results" : [
        ObjectId("582a7b3e5c57b9164cac3299"),
        ObjectId("582a7cc25c57b9164cac329d")
      ],
      "_id" : ObjectId("582a7b3e5c57b9164cac3296")
    },
    {
      .....
    }
  ],
  "password" : "andi"
}
testResults document:
{
  "_id" : ObjectId("582a7cc25c57b9164cac329d"),
  "testCategory" : "english",
  "testVersion" : "EAX",
  "testTakenTime" : ISODate("2016-11-15T03:10:58.623Z"),
  "score" : 2,
  "userAnswer" : [
    { "answer" : 1, "problemId" : ObjectId("581ff74002bb1218f87f3fab") },
    { "answer" : 0, "problemId" : ObjectId("581ff78202bb1218f87f3fac") },
    { "answer" : 0, "problemId" : ObjectId("581ff7ca02bb1218f87f3fad") }
  ],
  "__v" : 0
}
What I have tried so far is below. If I want to count the total number of matching documents, which part of my aggregation pipeline should I change? In the query below, totalData is summed per group, not across the whole result set.
User
  .aggregate([
    { $unwind: '$testResults' },
    {
      $project: {
        '_id': 1,
        'displayName': 1,
        'testResults': 1,
        'dayOfTest': { $dayOfMonth: '$testResults.scheduledAt.startTime' },
        'monthOfTest': { $month: '$testResults.scheduledAt.startTime' },
        'yearOfTest': { $year: '$testResults.scheduledAt.startTime' }
      }
    },
    {
      $match: {
        dayOfTest: date.getDate(),
        monthOfTest: date.getMonth() + 1,
        yearOfTest: date.getFullYear()
      }
    },
    {
      $group: {
        _id: { id: '$_id', displayName: '$displayName' },
        testResults: { $push: '$testResults' },
        totalData: { $sum: 1 }
      }
    }
  ])
  .then(function(result) {
    res.send(result);
  })
  .catch(function(err) {
    console.error(err);
    next(err);
  });
You can try something like this. I added a $project stage that keeps a user only if at least one element of the test results matches the date passed in. Add this as the first step of the pipeline, and then append whatever grouping stages you want.
$map applies an equality comparison between the passed date and the start date of each test result element, generating an array of true and false values. $anyElementTrue inspects this array and returns true only if there is at least one true value in it. The $match stage then keeps only the documents whose matched value is true.
aggregate([
  {
    "$project": {
      "_id": 1,
      "displayName": 1,
      "testResults": 1,
      "matched": {
        "$anyElementTrue": {
          "$map": {
            "input": "$testResults",
            "as": "result",
            "in": {
              "$eq": [
                { $dateToString: { format: "%Y-%m-%d", date: "$$result.scheduledAt.startTime" } },
                "2016-11-15"
              ]
            }
          }
        }
      }
    }
  },
  {
    "$match": { "matched": true }
  }
])
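In plain JavaScript terms, the matched computation above is roughly an Array.prototype.some over testResults, comparing calendar days. Here is a sketch with the sample user (sameUTCDay is a hypothetical helper, not part of any API):

```javascript
// True if the Date d falls on the given UTC year/month/day
const sameUTCDay = (d, y, m, day) =>
  d.getUTCFullYear() === y && d.getUTCMonth() + 1 === m && d.getUTCDate() === day;

const user = {
  displayName: "lalala",
  testResults: [
    { scheduledAt: { startTime: new Date("2016-11-15T01:00:00.000Z") } }
  ]
};

// $map + $anyElementTrue ~ "does any element match the requested day?"
const matched = user.testResults.some(r =>
  sameUTCDay(r.scheduledAt.startTime, 2016, 11, 15)
);
```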
Alternative Version:
Similar to the version above, but this one combines the project and match stages into one. $redact with $cond performs the match: when a match is found it keeps the whole document ($$KEEP), otherwise it discards it ($$PRUNE).
aggregate([
  {
    "$redact": {
      "$cond": [
        {
          "$anyElementTrue": {
            "$map": {
              "input": "$testResults",
              "as": "result",
              "in": {
                "$eq": [
                  { $dateToString: { format: "%Y-%m-%d", date: "$$result.scheduledAt.startTime" } },
                  "2016-11-15"
                ]
              }
            }
          }
        },
        "$$KEEP",
        "$$PRUNE"
      ]
    }
  }
])
I have a collection db.activities, each item of which has a dueDate. I need to present the data in the following format: basically a list of activities due today and a list due this week:
{
  "today": [
    { _id: 1, name: "activity #1" ... },
    { _id: 2, name: "activity #2" ... }
  ],
  "thisWeek": [
    { _id: 3, name: "activity #3" ... }
  ]
}
I managed to accomplish this by querying for the last week's activities as a flat list and then grouping them with JavaScript on the client, but I suspect this is a very dirty solution and would like to do it on the server.
look up the mongo aggregation pipeline.
your aggregation needs a $match by date, a $group by date, and maybe a $sort/order stage, also by date.
lacking the data scheme, it will be along the lines of
db.collection.aggregate([
  { $match: { "dueDate": { "$gte": start_dt, "$lte": end_dt } } },
  { $group: { _id: "$dueDate", activities: { $push: { recordid: "$_id", name: "$name" } } } },
  { $sort: { "_id": 1 } }
]);
if you want 'all' records remove the $match or use { $match: {} } as one does with find.
in my opinion, you cannot aggregate both by day and by week within one command. the weekly one may be achieved by projecting dueDate using mongo's $dayOfWeek, along the lines of
db.collection.aggregate([
  { $match: { "dueDate": { "$gte": start_dt, "$lte": end_dt } } },
  { $project: { name: 1, dayOfWeek: { $dayOfWeek: "$dueDate" } } },
  { $group: { _id: "$dayOfWeek", activities: { $push: { recordid: "$_id", name: "$name" } } } },
  { $sort: { "_id": 1 } }
]);
check out http://docs.mongodb.org/manual/reference/operator/aggregation/dayOfWeek/
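For reference, the client-side grouping described in the question can be sketched like this (the dueDate field name is assumed from the question, and "this week" is taken to mean the six days after today):

```javascript
// Split activities into "today" and "thisWeek" buckets by dueDate.
function bucketActivities(activities, now = new Date()) {
  const startOfDay = new Date(now);
  startOfDay.setHours(0, 0, 0, 0);
  const endOfDay = new Date(startOfDay);
  endOfDay.setDate(endOfDay.getDate() + 1);
  const endOfWeek = new Date(startOfDay);
  endOfWeek.setDate(endOfWeek.getDate() + 7);

  const out = { today: [], thisWeek: [] };
  for (const a of activities) {
    if (a.dueDate >= startOfDay && a.dueDate < endOfDay) out.today.push(a);
    else if (a.dueDate >= endOfDay && a.dueDate < endOfWeek) out.thisWeek.push(a);
  }
  return out;
}
```

Doing the same on the server would take two $match queries, or, on MongoDB 3.4+, a single $facet pipeline with one branch per bucket.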