Currently I am making a scheduling system and storing the information in MongoDB.
Part of my model looks like this:
{
timings: [{from: "00:00", to: "01:00"}],
weekend: [0, 5, 6]
}
I am not sure if this will be good in the long run.
Can you please help me decide how to better store times in my documents?
MongoDB does not have any time/interval data type, so you have to build your own solution. Some general notes:
Use weekdays according to the ISO-8601 standard, i.e. the first day of the week (1) is Monday. Then it is easier to create a Date value with $dateFromParts. For hour values, always use the 24-hour format.
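For example, an ISO weekday number drops straight into $dateFromParts; a minimal sketch (requires MongoDB 3.6+, with the reference year/week and the isoDay field name made up for illustration):
{
  $dateFromParts: {
    isoWeekYear: 2024,        // arbitrary reference year
    isoWeek: 1,               // arbitrary reference week
    isoDayOfWeek: "$isoDay",  // 1 = Monday ... 7 = Sunday, per ISO-8601
    hour: 13,
    minute: 0
  }
}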
You may consider storing times as
{from: {hour: 0, minute: 0}, to: {hour: 13, minute: 0}}
Otherwise, when you have to create a Date value (e.g. for comparison), it would be:
{
$dateFromParts : {
...,
hour: { $toInt: { $arrayElemAt: [ { $split: [ "$from", ":" ] }, 0 ] } },
minute: { $toInt: { $arrayElemAt: [ { $split: [ "$from", ":" ] }, 1 ] } }
}
}
Compared to
{
$dateFromParts : {
...,
hour: "$from.hour",
minute: "$from.minute"
}
}
Another approach is to store a real Date, e.g. 0000-01-01T13:00:00 or any other arbitrary day. When you use such values, simply ignore the date part.
Or you can store the number of minutes (0..1,440) or seconds (0..86,400) since midnight. However, such numbers are not user-friendly to read.
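If you go with minutes from midnight, two small helpers can translate between the stored number and a readable "HH:mm" string; a minimal sketch in plain JavaScript (the function names are mine, not part of any driver API):
// "13:05" -> 785 (minutes since midnight)
function toMinutes(hhmm) {
  const [hour, minute] = hhmm.split(":").map(Number);
  return hour * 60 + minute;
}

// 785 -> "13:05"
function toHHMM(minutes) {
  const h = String(Math.floor(minutes / 60)).padStart(2, "0");
  const m = String(minutes % 60).padStart(2, "0");
  return `${h}:${m}`;
}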
I'm making an application with chats and posts, and in the chat, I want to show the time of the most recent post.
The date part of my Mongoose schema for posts is below, as well as a picture of what my MongoDB date field looks like.
I'm trying to get the date to the front end, but am unsure about how to format it. I want to format it as "yesterday at 12:45pm" or "6 days ago", etc.
Any help is very much appreciated.
Mongoose Schema field for date:
date: {
type: Date,
default: Date.now
}
MongoDB date field (screenshot omitted)
Mongo does not store constants like "yesterday" or "last week" (and almost no other database in the world does either).
The problem with concepts like "yesterday" is that they are very semantic. If it's 00:01, is something from 2 minutes ago "yesterday"? If the answer is yes, you will actually have to update your database every minute; if you are willing to compromise and only look at the time difference, you will still have to do it every day.
I'm not sure what actual business need makes you want to do this, but I recommend you compute it while fetching documents; otherwise this is not scalable.
Here is a quick example of how to do this:
db.collection.aggregate([
{
"$addFields": {
currDay: {
"$dayOfMonth": "$$NOW"
},
dateDay: {
"$dayOfMonth": "$date"
},
dayGap: {
"$divide": [
{
"$subtract": [
"$$NOW",
"$date"
]
},
86400000 /* milliseconds in a day */
]
}
}
},
{
$addFields: {
date: {
"$switch": {
"branches": [
{
"case": {
$and: [
{
$lt: [
"$dayGap",
1
]
},
{
$eq: [
"$dateDay",
"$currDay"
]
}
]
},
"then": "today"
},
{
"case": {
$lt: [
"$dayGap",
2
]
},
"then": "yesterday"
}
],
default: {
"$concat": [
{
"$toString": {
"$round": "$dayGap"
}
},
" days ago"
]
}
}
}
}
}
],
{
allowDiskUse: true
})
MongoPlayground
As you can see, you'll have to manually construct the "phrase" you want for every single option. You can obviously do the same in code; I just chose to show the "Mongo" way as I feel it is more complicated.
If you do end up choosing to update your database ahead of time, you can use the same pipeline combined with $out to achieve this, as sketched below.
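A rough sketch of that variant (the target collection name is made up):
db.collection.aggregate([
  // ... the two $addFields stages shown above ...
  { "$out": "postsWithRelativeDates" }
])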
One final note: I cheated a little, as this aggregation looks at the millisecond difference only (apart from the "today" branch). Meaning that if it's 1 AM, a document from 50 hours ago will still show up as "2 days ago" even though by calendar date it is three days ago.
I hope this example shows you why this formatting is not used anywhere and the difficulties it brings. Mind you, I haven't even brought up timezones; concepts like "yesterday" are even more semantic across different regions.
In my opinion, the only viable "real" solution is to build a custom function that does this in code. Mind you, this is not much fun, as you have to account for things like leap years, timezones, geographical regions and more; however, it is doable.
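For completeness, here is roughly what such a custom function could look like in application code, using the built-in Intl.RelativeTimeFormat (a minimal sketch that shares the same day-boundary and timezone caveats discussed above):
const rtf = new Intl.RelativeTimeFormat("en", { numeric: "auto" });

// Rough relative label for a stored Date, computed while fetching.
function relativeLabel(date, now = new Date()) {
  const dayGap = Math.round((date - now) / 86400000); // milliseconds in a day
  return rtf.format(dayGap, "day"); // "today", "yesterday", "2 days ago", ...
}

relativeLabel(new Date(Date.now() - 2 * 86400000)); // "2 days ago"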
I am using gRPC v3 on Node.js.
I have a .proto file with the following message:
message ProductAvailabilityWithRatesResponse {
// How many slots are available for this day/time. Mandatory.
int32 capacity = 1;
// Date for when this product is available. Mandatory.
Date date = 2;
// When does this event start? Unset if this product does not support times.
Time time = 3;
// When does a pickup for this product start? Unset if this product does not support times or pickups.
Time pickupTime = 4;
// Rates with prices. Mandatory (should at least have one entry).
repeated RateWithPrice rates = 5;
}
On the server, using console.log, I see this output:
{ capacity: 1,
date: { year: 2019, month: 7, day: 1 },
rates: [ { rateId: 1, pricePerPerson: [Object] } ],
time: { hour: 9, minute: 0 } }
And on a client, also using Node.js:
{ rates:
[ { rateId: '1',
pricePerPerson: [Object],
pricingOptions: 'pricePerPerson' } ],
capacity: 0,
date: { year: 2019, month: 7, day: 1 },
time: { hour: 9, minute: 0 },
pickupTime: null }
But another person using a Java client tells me that he sees:
2019-06-26 10:59:39,442 ← getProductAvailability::grpc response {date { year: 2019 month: 7 day: 1 } time { hour: 9 } rates { rateId: "1" pricePerPerson { pricingCategoryWithPrice { pricingCategoryId: "30" price { currency: "EUR" amount: "145" } } pricingCategoryWithPrice { pricingCategoryId: "31" price { currency: "EUR" amount: "150" } } } }}
where capacity is not set.
If its value is 1 and not 0, everything works well everywhere.
Is it possible?
How can I force the server to output the value?
I already tried using capacity = parseInt(capacity)
I have a MongoDB datastore set up with location data stored like this:
{
"_id" : ObjectId("51d3e161ce87bb000792dc8d"),
"datetime_recorded" : ISODate("2013-07-03T05:35:13Z"),
"loc" : {
"coordinates" : [
0.297716,
18.050614
],
"type" : "Point"
},
"vid" : "11111-22222-33333-44444"
}
I'd like to be able to perform a query similar to the date range example but instead on a time range. i.e. Retrieve all points recorded between 12AM and 4PM (can be done with 1200 and 1600 24 hour time as well).
e.g.
With points:
"datetime_recorded" : ISODate("2013-05-01T12:35:13Z"),
"datetime_recorded" : ISODate("2013-06-20T05:35:13Z"),
"datetime_recorded" : ISODate("2013-01-17T07:35:13Z"),
"datetime_recorded" : ISODate("2013-04-03T15:35:13Z"),
a query
db.points.find({'datetime_recorded': {
$gte: Date(1200 hours),
$lt: Date(1600 hours)}
});
would yield only the first and last point.
Is this possible? Or would I have to do it for every day?
Well, the best way to solve this is to store the minutes separately as well. But you can get around this with the aggregation framework, although that is not going to be very fast:
db.so.aggregate( [
{ $project: {
loc: 1,
vid: 1,
datetime_recorded: 1,
minutes: { $add: [
{ $multiply: [ { $hour: '$datetime_recorded' }, 60 ] },
{ $minute: '$datetime_recorded' }
] }
} },
{ $match: { 'minutes' : { $gte : 12 * 60, $lt : 16 * 60 } } }
] );
In the first step, $project, we calculate the minutes of the day as hour * 60 + minute, which we then match against in the second step, $match.
Adding an answer since I disagree with the other answers in that even though there are great things you can do with the aggregation framework, this really is not an optimal way to perform this type of query.
If your identified application usage pattern is that you rely on querying for "hours" or other times of the day without wanting to look at the "date" part, then you are far better off storing that as a numeric value in the document. Something like "milliseconds from start of day" would be granular enough for as many purposes as a BSON Date, but of course gives better performance without the need to compute for every document.
Set Up
This does require some set-up in that you need to add the new fields to your existing documents and make sure you add these on all new documents within your code. A simple conversion process might be:
MongoDB 4.2 and upwards
This can actually be done in a single request due to aggregation operations being allowed in "update" statements now.
db.collection.updateMany(
{},
[{ "$set": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}}]
)
Older MongoDB
var batch = [];
db.collection.find({ "timeOfDay": { "$exists": false } }).forEach(doc => {
batch.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$set": {
"timeOfDay": doc.datetime_recorded.valueOf() % (60 * 60 * 24 * 1000)
}
}
}
});
// write once only per reasonable batch size
if ( batch.length >= 1000 ) {
db.collection.bulkWrite(batch);
batch = [];
}
})
if ( batch.length > 0 ) {
db.collection.bulkWrite(batch);
batch = [];
}
If you can afford to write to a new collection, then looping and rewriting would not be required:
db.collection.aggregate([
{ "$addFields": {
"timeOfDay": {
"$mod": [
{ "$subtract": [ "$datetime_recorded", Date(0) ] },
1000 * 60 * 60 * 24
]
}
}},
{ "$out": "newcollection" }
])
Or with MongoDB 4.0 and upwards:
db.collection.aggregate([
{ "$addFields": {
"timeOfDay": {
"$mod": [
{ "$toLong": "$datetime_recorded" },
1000 * 60 * 60 * 24
]
}
}},
{ "$out": "newcollection" }
])
All using the same basic conversion of:
1000 milliseconds in a second
60 seconds in a minute
60 minutes in an hour
24 hours a day
Taking the modulo of the numeric milliseconds since epoch (which is the value actually stored internally in a BSON Date) is a simple way to extract the current milliseconds within the day.
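As a quick sanity check of that arithmetic, using the sample document from the question:
// 2013-07-03T05:35:13Z as milliseconds since epoch
new Date("2013-07-03T05:35:13Z").valueOf();   // 1372829713000
1372829713000 % (1000 * 60 * 60 * 24);        // 20113000
// 20113000 ms = 5 h 35 min 13 s into the day, i.e. 05:35:13 UTC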
Query
Querying is then really simple, and as per the question example:
db.collection.find({
"timeOfDay": {
"$gte": 12 * 60 * 60 * 1000, "$lt": 16 * 60 * 60 * 1000
}
})
This of course uses the same conversion from hours into milliseconds to match the stored format, but just like before you can make this whatever scale you actually need.
Most importantly, as real document properties which don't rely on computation at run-time, you can place an index on this:
db.collection.createIndex({ "timeOfDay": 1 })
So not only does this remove the run-time overhead of calculation, but with an index you can also avoid collection scans, as outlined in the MongoDB documentation on indexing.
For optimal performance you never want to calculate such things at query time: at any real-world scale it simply takes orders of magnitude longer to process every document in the collection just to work out which ones you want than to reference an index and fetch only those documents.
The aggregation framework may just be able to help you rewrite the documents here, but it really should not be used as a production system method of returning such data. Store the times separately.
Now that I've had a weekend of banging my head on $project, aggregate(), and $group, it's time for another round of throwing myself on your mercy. I'm trying to do a call where I get back the totals for users, grouped by sex (this was the easier part) and grouped by age range (this is defeating me).
I got it to work with one group:
Person.aggregate([
{
$match: {
user_id: id
}
},
{
$group: {
_id: '$gender',
total: { $sum: 1 }
}
}
])
.exec(function(err, result) {
etc...
From that, it'll give me how many men, how many women in a nice json output. But if I add a second group, it seems to skip the first and throw hissy fits about the second:
Person.aggregate([
{
$match: {
user_id: id
}
},
{
$group: {
_id: '$gender',
total: { $sum: 1 }
},
$group: {
_id: '$age',
age: { $gte: 21 },
age: { $lte: 30 },
total: { $sum: 1 }
}
}
])
.exec(function(err, result) {
etc...
It doesn't like the $gte or $lte. If I switch it to $project, then it'll do the gte/lte but throws fits about $sum or $count. On top of that, I can't find any examples anywhere of how to construct a multi-request return. It's all just "here's this one thing," but I don't want to make 12+ calls just to get all the Person age-groups. I was hoping for output that looks something like this:
[
{"_id":"male","total":49},
{"_id":"woman","total":42},
{"_id":"age0_10", "total": 1},
{"_id":"age11_20", "total": 5},
{"_id":"age21_30", "total": 15}
]
(I have no idea how to make the _id for age be something other than the actual age, which doesn't make sense, b/c I don't want an id of 1517191919 or whatever, I want a reliable name so I know where to output it in my template. So I do know that _id: "$age" won't give me what I want, but I don't know how to get what I want, either.)
The only time I've seen more than one thing, it was a $match, a $group, and a $project. But if $project means I can't use $sum or $count, can I do multiple $groups, and if I can, what's the trick to it?
As for the case of producing the results in different age groupings, the $cond operator of the aggregation framework can help here. As a ternary operator, it takes a logical result ( if condition ) and can return a value where true ( then ) or otherwise where false ( else ). In the case of varying age groups you would "nest" the calls in the else condition to meet each range until logically exhausted.
The overall case is not really practical to do in a single pass with both the "gender" and the "age" groupings. Whilst it "could" be done, the only method is basically accumulating all data in arrays and working that out again for subsequent groupings. Not a great idea, as it would almost always break the practical BSON limit of 16MB when attempting to keep the data. So a better approach is generally required.
As such, where the API supports it (you are under Node.js, so it does), it is usually best to run each query separately and combine the results. The node async library has just such features:
async.concat(
[
// Gender aggregator
[
{ "$group": {
"_id": "$gender",
"total": { "$sum": 1 }
}}
],
// Age aggregator
[
{ "$group": {
"_id": {
"$cond": {
"if": { "$lte": [ "$age", 10 ] },
"then": "age_0_10",
"else": {
"$cond": {
"if": { "$lte": [ "$age", 20 ] },
"then": "age_11_20",
"else": {
"$cond": {
"if": { "$lte": [ "$age", 30 ] },
"then": "age_21_30",
"else": "age_over_30"
}
}
}
}
}
},
"total": { "$sum": 1 }
}}
]
],
function(pipeline,callback) {
Person.aggregate(pipeline,callback);
},
function(err,results) {
if (err) throw err;
console.log(results);
}
);
The default execution of async.concat here will kick off the tasks to run in parallel, so both can be running on the server at the same time. Each pipeline in the input array will be passed to the aggregate method, which is going to then return the results and combine the output arrays in the final result.
The end result is not only do you have the results nicely keyed to age groups, but the two result sets appear to be in the same combined response, with no other work required to merge the content.
This is not only convenient, but the parallel execution makes it much more time-efficient and far less taxing than (if not outright impossible for) a single aggregation that tries to return the same combined results. A promise-based variant of the same idea is sketched below.
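If you would rather not pull in the async library, the same fan-out can be written with promises; a sketch assuming the Person mongoose model from the question and a promise-returning aggregate (Mongoose 4+):
// The same two pipelines as above, run in parallel with Promise.all.
const genderPipeline = [
  { $group: { _id: "$gender", total: { $sum: 1 } } }
];

const agePipeline = [
  { $group: {
    _id: {
      $cond: {
        if: { $lte: [ "$age", 10 ] }, then: "age_0_10",
        else: { $cond: {
          if: { $lte: [ "$age", 20 ] }, then: "age_11_20",
          else: { $cond: {
            if: { $lte: [ "$age", 30 ] }, then: "age_21_30",
            else: "age_over_30"
          } }
        } }
      }
    },
    total: { $sum: 1 }
  } }
];

Promise.all([
  Person.aggregate(genderPipeline).exec(),
  Person.aggregate(agePipeline).exec()
])
  .then(([byGender, byAge]) => {
    // Concatenate the two result arrays, mirroring async.concat's output
    console.log(byGender.concat(byAge));
  })
  .catch(console.error);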
I'm trying to aggregate datas by Date in Mongo, but I can't quite achieve what I want.
Right now, I'm using this:
db.aggregData.aggregate( { $group: {_id: "$Date".toString(),
tweets: { $sum: "$CrawledTweets"} } },
{ $match:{ _id: {$gte: ISODate("2013-03-19T12:31:00.247Z") }}},
{ $sort: {Date:-1} }
)
It results with this:
"result" : [
{
"_id" : ISODate("2013-03-19T12:50:00.641Z"),
"tweets" : 114
},
{
"_id" : ISODate("2013-03-19T12:45:00.631Z"),
"tweets" : 114
},
{
"_id" : ISODate("2013-03-19T12:55:00.640Z"),
"tweets" : 123
},
{
"_id" : ISODate("2013-03-19T12:40:00.628Z"),
"tweets" : 91
},
{
"_id" : ISODate("2013-03-19T12:31:00.253Z"),
"tweets" : 43
},
{
"_id" : ISODate("2013-03-19T13:20:00.652Z"),
"tweets" : 125
},
{
"_id" : ISODate("2013-03-19T12:31:00.252Z"),
"tweets" : 30
}
],
"ok" : 1
It seems to do the job, but on further inspection we see that there is repetition:
ISODate("2013-03-19T12:31:00.253Z") and ISODate("2013-03-19T12:31:00.252Z").
The only thing that changes is the last bit before the Z.
So here is my question: what is this part, and how can I ignore it in the aggregation?
Thank you in advance.
EDIT: I want to aggregate by date, i.e. the whole year/month/day plus hour and minute. I don't care about the rest.
EDIT: My db is on MongoLab, so I'm on 2.2.
Well, I did it another way: I save all my dates with seconds/milliseconds set to 0. That way I can keep a simple aggregate, with just a little more code server-side, thanks to moment.js.
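That normalization might look roughly like this with moment.js (a sketch; the exact code depends on where the documents are written):
const moment = require("moment");

// Truncate the timestamp to the minute before saving the document,
// so a plain $group on the Date field lines up cleanly.
const crawledAt = moment().startOf("minute").toDate();
// e.g. 2013-03-19T12:31:27.253Z -> 2013-03-19T12:31:00.000Z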
You are trying to aggregate by "whole" date, in other words to drop the time from ISODate(), right? There are several ways to do it, I describe them in detail on my blog in the post called
Stupid Date Tricks with Aggregation Framework.
You can see the full step-by-step breakdown there, but to summarize you have two choices:
if you don't care about the aggregated-on value being an ISODate() then you can use the {$year}, {$month} and {$dayOfMonth} operators in the {$project} phase to pull out just Y-M-D to then {$group} on.
if you do care about the grouped-on value staying an ISODate you can {$subtract} the time part in {$project} phase and be left with ISODate() type - the caveat is that this method requires MongoDB 2.4 (just released) which adds support for date arithmetic and for $millisecond operator (see exact code in the blog post).
Here is probably what you want:
db.aggregData.aggregate([
{
$project:{
CrawledTweets: 1,
newDate: {
year:{$year:"$Date"},
month: {$month:"$Date"},
day: {$dayOfMonth:"$Date"},
hour: {$hour: "$Date"},
min: {$minute: "$Date"}
}
}
},
{
$group: {
_id: "$newDate",
tweets: { $sum: "$CrawledTweets"}
}
}
])
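For the second choice above (keeping the grouped-on value an ISODate), the idea boils down to subtracting the sub-minute part before grouping; a rough sketch (requires MongoDB 2.4+ for $millisecond, and keeps hour/minute as the question asks):
db.aggregData.aggregate([
  {
    $project: {
      CrawledTweets: 1,
      // Strip seconds and milliseconds but keep an ISODate
      truncatedDate: {
        $subtract: [
          "$Date",
          { $add: [
            { $multiply: [ { $second: "$Date" }, 1000 ] },
            { $millisecond: "$Date" }
          ] }
        ]
      }
    }
  },
  {
    $group: {
      _id: "$truncatedDate",
      tweets: { $sum: "$CrawledTweets" }
    }
  }
])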
Without being a Mongo expert and without knowing your db fields I'd come up with something like this. Perhaps you can build upon this:
db.aggregData.aggregate(
{
$project:{
CrawledTweets: 1,
groupedTime: {
year:{$year:"$_id"},
month: {$month:"$_id"},
day: {$dayOfMonth:"$_id"},
hour: {$hour: "$_id"},
min: {$minute: "$_id"}
}
}
},
{
$group: {
_id: "$groupedTime",
tweets: { $sum: "$CrawledTweets" }
}
}
)
You can now use the MongoDB date aggregation operators, I have a post on my blog that goes over the Schema setup, using it in Node.js, etc:
http://smyl.es/how-to-use-mongodb-date-aggregation-operators-in-node-js-with-mongoose-dayofmonth-dayofyear-dayofweek-etc/