Compare with time part only in mongodb query [duplicate] - node.js

I have a MongoDB datastore set up with location data stored like this:
{
    "_id" : ObjectId("51d3e161ce87bb000792dc8d"),
    "datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
    "loc" : {
        "coordinates" : [
            0.297716,
            18.050614
        ],
        "type" : "Point"
    },
    "vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example, but on a time range instead, i.e. retrieve all points recorded between 12PM (noon) and 4PM (equivalently, 1200 and 1600 in 24-hour time).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query (pseudocode):
db.points.find({ 'datetime_recorded': {
    $gte: Date(1200 hours),
    $lt: Date(1600 hours)
}});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?

Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate([
    { $project: {
        loc: 1,
        vid: 1,
        datetime_recorded: 1,
        minutes: { $add: [
            { $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
            { $minute: '$datetime_recorded' }
        ] }
    } },
    { $match: { 'minutes': { $gte: 12 * 60, $lt: 16 * 60 } } }
]);
In the first step, $project, we calculate the minutes since midnight as hour * 60 + minute, which we then match against in the second step, $match.
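As a side note, on MongoDB 3.6 and up the same hour/minute computation can be used directly in a find() filter via $expr, avoiding the separate $project stage. A minimal sketch, assuming the points collection from the question:

// Sketch, assuming MongoDB 3.6+: $expr allows aggregation operators
// such as $hour/$minute inside a plain find()
db.points.find({
    $expr: {
        $and: [
            { $gte: [
                { $add: [
                    { $multiply: [ { $hour: "$datetime_recorded" }, 60 ] },
                    { $minute: "$datetime_recorded" }
                ] },
                12 * 60
            ] },
            { $lt: [
                { $add: [
                    { $multiply: [ { $hour: "$datetime_recorded" }, 60 ] },
                    { $minute: "$datetime_recorded" }
                ] },
                16 * 60
            ] }
        ]
    }
})

Note this still computes the value for every document, so it cannot use an index on datetime_recorded; the next answer covers the indexed approach.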

Adding an answer since I disagree with the other answers: even though there are great things you can do with the aggregation framework, it really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute it for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can now be done in a single request, since aggregation pipelines are allowed in "update" statements.
db.collection.updateMany(
    {},
    [{ "$set": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
    batch.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": {
                    "timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
                }
            }
        }
    });

    // write once only per reasonable batch size
    if (batch.length >= 1000) {
        db.collection.bulkWrite(batch);
        batch = [];
    }
});

if (batch.length > 0) {
    db.collection.bulkWrite(batch);
    batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$subtract": [ "$datetime_recorded", new Date(0) ] },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    { "$out": "newcollection" }
])
All using the same basic conversion:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours in a day
Taking the modulo of the milliseconds since epoch (which is the value a BSON Date actually stores internally) by the number of milliseconds in a day leaves exactly the milliseconds elapsed in the current day.
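A possible alternative, assuming MongoDB 4.4 or later (which allows $merge to write back into the collection being aggregated), would be to add the field in place rather than replacing the whole collection the way $out does:

db.collection.aggregate([
    { "$addFields": {
        "timeOfDay": {
            "$mod": [
                { "$toLong": "$datetime_recorded" },
                1000 * 60 * 60 * 24
            ]
        }
    }},
    // merges on _id by default, keeping all other fields intact
    { "$merge": { "into": "collection", "whenMatched": "merge" } }
])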
Query
Querying is then really simple, and as per the question example:
db.collection.find({
    "timeOfDay": {
        "$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
    }
})
This of course uses the same conversion from hours into milliseconds to match the stored format, but just as before you can use whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this remove the run-time overhead of calculating the value, but with an index you also avoid collection scans, as outlined in the MongoDB documentation on indexing.
For optimal performance you never want to calculate such things at query time: at any real-world scale, processing every document in the collection just to work out which ones you want takes an order of magnitude longer than referencing an index and fetching only those documents.
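A quick way to confirm the index is actually used is explain(); the winning plan should report an IXSCAN rather than a COLLSCAN:

db.collection.find({
    "timeOfDay": { "$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000 }
}).explain("executionStats")
// winningPlan should show an IXSCAN on { timeOfDay: 1 }, not a COLLSCAN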
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.

Related

How to store hh(hour):mm(minutes) time in MongoDB

Currently I am making a scheduling system and storing information in MongoDB.
Part of my model looks like this:
{
    timings: [{ from: "00:00", to: "01:00" }],
    weekend: [0, 5, 6]
}
I am not sure if this will be good in the long run.
Can you please help me decide how to better store time in my documents?
MongoDB does not have any time/interval data type, so you have to build your own solution. Some general notes:
Use week days according to the ISO-8601 standard, i.e. the first day of the week (1) is Monday; then it is easier to create a Date value with $dateFromParts. Always use the 24-hour format for hour values.
You may consider storing times as
{ from: { hour: 0, minute: 0 }, to: { hour: 13, minute: 0 } }
Otherwise, whenever you have to create a Date value (e.g. for comparison), it would be:
{
    $dateFromParts : {
        ...,
        hour: { $toInt: { $arrayElemAt: [ { $split: [ "$from", ":" ] }, 0 ] } },
        minute: { $toInt: { $arrayElemAt: [ { $split: [ "$from", ":" ] }, 1 ] } }
    }
}
compared to:
{
    $dateFromParts : {
        ...,
        hour: "$from.hour",
        minute: "$from.minute"
    }
}
Another approach is to store a real Date, e.g. 0000-01-01T13:00:00 or any other arbitrary day value; when you use such values, simply ignore the date part.
Or you can store the number of minutes (0..1,440) or seconds (0..86,400) from midnight. However, such numbers are not user-friendly to read.
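If readability is the only drawback of the minutes-from-midnight approach, a small formatting helper on the application side takes care of display. A sketch (the helper name and schema are made up for illustration):

// e.g. store { from: 780, to: 840 } for 13:00-14:00
function minutesToHHMM(minutes) {
    var h = String(Math.floor(minutes / 60)).padStart(2, "0");
    var m = String(minutes % 60).padStart(2, "0");
    return h + ":" + m;
}
minutesToHHMM(780); // "13:00"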

Elasticsearch query causing NodeJS heap out of memory

What's happening now?
Recently I built an Elasticsearch query whose main function is to get data counts per hour going back 12 weeks.
When the query gets called over and over, NodeJS memory grows from 20MB up to 1024MB. Surprisingly, the memory doesn't hit the top immediately: it stays stable under 25MB (for several minutes) and then suddenly starts growing (25MB, 46MB, 125MB, 350MB... until 1024MB), finally causing NodeJS to run out of memory. Whether I call this query or not, the memory keeps growing and is never released. And this scenario only happens on the remote server (running in Docker); the local Docker env is totally fine (memory usage stays flat).
How am I querying?
Like below:
const query = {
    "size": 0,
    "query": {
        "bool": {
            "must": [
                { terms: { '_id.keyword': array_id } },
                {
                    "range": {
                        "date_created": {
                            "gte": start_timestamp - timestamp_twelve_weeks,
                            "lt": start_timestamp
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "shortcode_log": {
            "date_histogram": {
                "field": "date_created",
                "interval": "3600ms"
            }
        }
    }
}
What's the return value?
Like below (total query time is around 2 sec):
{
    "aggs_res": {
        "shortcode_log": {
            "buckets": [
                {
                    "key": 1594710000,
                    "doc_count": 2268
                },
                {
                    "key": 1594713600,
                    "doc_count": 3602
                },
                { // ..... total item count 2016
            ]
        }
    }
}
If your histogram interval is really 3600ms (shouldn't it be 3600s?), that's a really short period over which to aggregate 12 weeks of data.
3600ms is only 0.06 minutes, which means:
24,000 periods per day
168,000 per week
2,016,000 for 12 weeks
That can explain:
why your script waits a long time before doing anything
why your memory explodes when you try to loop over the buckets
In your example, you only got 2016 buckets back, which is exactly 12 weeks × 7 days × 24 hours, i.e. hourly buckets. I think there is a small difference between your two tests.
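For what it's worth, a corrected aggregation clause for hourly buckets might look like this sketch (an assumption about the version: Elasticsearch 7.x+ splits the old interval parameter into calendar_interval/fixed_interval, so the exact key depends on your cluster):

const aggs = {
    shortcode_log: {
        date_histogram: {
            field: "date_created",
            fixed_interval: "1h" // older ES versions: interval: "1h"
        }
    }
};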
New update: the issue is solved. The project has a layer between the server and the DB, and the code in that layer was preventing the query's memory from being released.

Mongoose find time furthest from other times in db

I have a program that runs every 5 minutes and checks the last time a user's data was updated. If it's been more than 4 hours, an update routine is called, but as the service grows I've seen spikes in the number of calls at given times, so I want to start spreading out the update times. Since I know when the program last updated each user's data, I was wondering if there is an elegant way to find the largest gap between times and set the new user's update time inside it?
Here's an example. Given the following data:
{
    "_id": "1",
    "updatedAt": "2018-01-17T01:12:33.807Z"
}, {
    "_id": "2",
    "updatedAt": "2018-01-17T03:17:33.807Z"
}, {
    "_id": "3",
    "updatedAt": "2018-01-17T02:22:33.807Z"
}, {
    "_id": "4",
    "updatedAt": "2018-01-17T02:37:33.807Z"
}
The largest gap between the given updates is 1 hour and 10 minutes, between id: 1 and id: 3. I want a function that can find that largest gap of time and return a suggested update time for the next item added to the database: '2018-01-17T01:47:33.807Z', calculated by taking the 1 hour and 10 minutes, dividing it by 2, and adding the result to id: 1's date.
I would also like to spread out all the existing users' update times, but I suppose that would be a different function.
Before MongoDB 3.4 the aggregation framework couldn't easily express this kind of difference comparison between adjacent documents, but you can use map-reduce to get the largest time diff.
Something like:
db.col.mapReduce(
    function () {
        if (typeof this.updatedAt != "undefined") {
            var date = new Date(this.updatedAt);
            emit(null, date);
        }
    },
    function (key, dates) {
        var result = { "prev": dates[0].getTime(), "last": dates[0].getTime(), "diff": 0 };
        for (var ix = 1; ix < dates.length; ix++) {
            var value = dates[ix].getTime();
            var curdiff = value - result.prev;
            if (result.diff < curdiff) {
                // new largest gap: remember its start ("last") and size
                result = { "prev": value, "last": result.prev, "diff": curdiff };
            } else {
                // still advance to the current document so the next
                // difference is measured between adjacent dates
                result.prev = value;
            }
        }
        return result;
    },
    {
        "sort": { "updatedAt": 1 },
        "out": { "inline": 1 },
        "finalize": function (key, result) {
            // midpoint of the largest gap
            return new Date(result.last + result.diff / 2);
        }
    }
)
Alternatively, on MongoDB 3.4 and up, the same logic as an aggregation query using $reduce:
db.col.aggregate([
    { "$match": { "updatedAt": { "$exists": true } } },
    { "$sort": { "updatedAt": 1 } },
    { "$group": {
        "_id": null,
        "dates": { "$push": "$updatedAt" }
    }},
    { "$project": {
        "_id": 0,
        "next": {
            "$let": {
                "vars": {
                    "result": {
                        "$reduce": {
                            "input": { "$slice": [ "$dates", 1, { "$subtract": [ { "$size": "$dates" }, 1 ] } ] },
                            "initialValue": { "prev": { "$arrayElemAt": [ "$dates", 0 ] }, "last": { "$arrayElemAt": [ "$dates", 0 ] }, "diff": 0 },
                            "in": {
                                "$cond": [
                                    { "$lt": [ "$$value.diff", { "$subtract": [ "$$this", "$$value.prev" ] } ] },
                                    { "prev": "$$this", "last": "$$value.prev", "diff": { "$subtract": [ "$$this", "$$value.prev" ] } },
                                    { "prev": "$$this", "last": "$$value.last", "diff": "$$value.diff" }
                                ]
                            }
                        }
                    }
                },
                "in": {
                    "$add": [ "$$result.last", { "$divide": [ "$$result.diff", 2 ] } ]
                }
            }
        }
    }}
])
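A sketch of consuming this from Node.js (the collection name and driver usage here are assumptions): since the pipeline groups everything into a single document, it returns at most one result, with next already a BSON date.

// assumed: inside an async function, `db` is a connected MongoDB database handle
const pipeline = [ /* the stages shown above */ ];
const [res] = await db.collection("col").aggregate(pipeline).toArray();
console.log(res.next); // suggested update time, e.g. 2018-01-17T01:47:33.807Z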

Elasticsearch: aggregation with interval hour

I have ~100 documents coming in per hour. Every document has a viewers property (integer).
By the end of the day I want to aggregate an array of 24 documents, one for every hour of the day, each represented by the document with the highest viewers count.
My query so far:
// query, fetch all documents of a specific day
var query = {
    bool: {
        filter: [
            {
                range: {
                    'created': {
                        gte: day,
                        lte: day + (60 * 60 * 24)
                    }
                }
            }
        ]
    }
}

// aggregation
var aggs = {
    // ?
}
I think this can be achieved using a Date Histogram aggregation plus a Top Hits aggregation. Look at this:
{
    "size": 0,
    "aggs": {
        "articles_over_time": {
            "date_histogram": {
                "field": "created",
                "interval": "hour"
            },
            "aggs": {
                "Viewers": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "viewers": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
The Date Histogram aggregation will create a bucket for each hour in the filtered date range, and the Top Hits aggregation will bring back the document with the highest viewers in each bucket (we're ordering documents by viewers in descending order and bringing back the top 1 hit).
Let me know if this works.
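For completeness, a sketch of pulling the hourly winners out of the response in Node.js (an assumption about the client: newer clients return the body directly, older ones wrap it in a body property):

const hourly = res.aggregations.articles_over_time.buckets.map(b => ({
    hour: b.key_as_string, // bucket start time
    topDoc: b.Viewers.hits.hits[0] && b.Viewers.hits.hits[0]._source
}));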

Aggregation Framework on Date

I'm trying to aggregate data by date in Mongo, but I can't quite achieve what I want.
Right now, I'm using this:
db.aggregData.aggregate(
    { $group: { _id: "$Date".toString(),
                tweets: { $sum: "$CrawledTweets" } } },
    { $match: { _id: { $gte: ISODate("2013-03-19T12:31:00.247Z") } } },
    { $sort: { Date: -1 } }
)
It results in this:
{
    "result" : [
        {
            "_id" : ISODate("2013-03-19T12:50:00.641Z"),
            "tweets" : 114
        },
        {
            "_id" : ISODate("2013-03-19T12:45:00.631Z"),
            "tweets" : 114
        },
        {
            "_id" : ISODate("2013-03-19T12:55:00.640Z"),
            "tweets" : 123
        },
        {
            "_id" : ISODate("2013-03-19T12:40:00.628Z"),
            "tweets" : 91
        },
        {
            "_id" : ISODate("2013-03-19T12:31:00.253Z"),
            "tweets" : 43
        },
        {
            "_id" : ISODate("2013-03-19T13:20:00.652Z"),
            "tweets" : 125
        },
        {
            "_id" : ISODate("2013-03-19T12:31:00.252Z"),
            "tweets" : 30
        }
    ],
    "ok" : 1
}
It seems to do the job, but on further inspection we see that there is repetition:
ISODate("2013-03-19T12:31:00.253Z") and ISODate("2013-03-19T12:31:00.252Z").
The only thing that changes is the last bit before the Z.
So here is my question: what is this part? And how can I ignore it in the aggregation?
Thank you in advance.
EDIT: I want to aggregate by date, so whole year/month/day plus hour and minute. I don't care about the rest.
EDIT: My db is on MongoLab, so I'm on 2.2.
Well, I did it another way: I save all my dates with seconds/milliseconds set to 0, so I can keep a simple aggregate with only a little more code server-side, thanks to moment.js.
You are trying to aggregate by "whole" date, in other words to drop the time from ISODate(), right? There are several ways to do it; I describe them in detail on my blog in the post called
Stupid Date Tricks with Aggregation Framework.
You can see the full step-by-step breakdown there, but to summarize you have two choices:
if you don't care about the aggregated-on value being an ISODate(), then you can use the {$year}, {$month} and {$dayOfMonth} operators in the {$project} phase to pull out just Y-M-D to then {$group} on.
if you do care about the grouped-on value staying an ISODate, you can {$subtract} the time part in the {$project} phase and be left with an ISODate() type; the caveat is that this method requires MongoDB 2.4 (just released), which adds support for date arithmetic and the $millisecond operator (see the exact code in the blog post, and a sketch of the idea after this list).
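A sketch of that second choice (my reconstruction of the idea, not the exact code from the blog post), for MongoDB 2.4+:

db.aggregData.aggregate([
    { $project: {
        CrawledTweets: 1,
        // strip ms/sec/min/hour so only Y-M-D remains, still as an ISODate
        day: { $subtract: [ "$Date", { $add: [
            { $millisecond: "$Date" },
            { $multiply: [ { $second: "$Date" }, 1000 ] },
            { $multiply: [ { $minute: "$Date" }, 60 * 1000 ] },
            { $multiply: [ { $hour: "$Date" }, 60 * 60 * 1000 ] }
        ] } ] }
    }},
    { $group: { _id: "$day", tweets: { $sum: "$CrawledTweets" } } }
])

To keep hour and minute (as the edit above asks), simply leave those two terms out of the $add.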
Here is probably what you want:
db.aggregData.aggregate([
    {
        $project: {
            CrawledTweets: 1,
            newDate: {
                year: { $year: "$Date" },
                month: { $month: "$Date" },
                day: { $dayOfMonth: "$Date" },
                hour: { $hour: "$Date" },
                min: { $minute: "$Date" }
            }
        }
    },
    {
        $group: {
            _id: "$newDate",
            tweets: { $sum: "$CrawledTweets" }
        }
    }
])
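The grouped _id then comes back as a sub-document of date parts; an illustrative result row (values made up):

// { "_id": { "year": 2013, "month": 3, "day": 19, "hour": 12, "min": 31 }, "tweets": 73 }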
Without being a Mongo expert and without knowing your db fields, I'd come up with something like this. Perhaps you can build upon it:
db.aggregData.aggregate(
    {
        $project: {
            CrawledTweets: 1,
            groupedTime: {
                year: { $year: "$_id" },
                month: { $month: "$_id" },
                day: { $dayOfMonth: "$_id" },
                hour: { $hour: "$_id" },
                min: { $minute: "$_id" }
            }
        }
    },
    {
        $group: {
            _id: "$groupedTime",
            tweets: { $sum: "$CrawledTweets" }
        }
    }
)
You can now use the MongoDB date aggregation operators. I have a post on my blog that goes over the schema setup, using them in Node.js with Mongoose, etc:
http://smyl.es/how-to-use-mongodb-date-aggregation-operators-in-node-js-with-mongoose-dayofmonth-dayofyear-dayofweek-etc/
