How to organize MongoDB data by time? - node.js

I have an application that's saving data every second to MongoDB. This is important data, but holding data every second forever isn't necessary. After some time, I'd like to run a process (background worker) to clean up this data into hourly chunks, which includes every piece of data (1 per second) for each hour of that day. Kinda like Time Machine does on Mac.
From researching and thinking about it, I can see a couple of ways to make this happen:
Mongo aggregators (not sure exactly how this would work)
Node background process with momentjs and sort by date, hour, etc. (really long time)
What's the best way to do this with MongoDB?

I think the Date Aggregation Operators could be a better option for your case. Given a schema like the one below
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var dataSchema = new Schema({
    // other fields are here...
    updated: Date,
});
var Data = mongoose.model('Data', dataSchema);
Just save the data with a normal Date.
Then you can retrieve the hourly chunks through an aggregate operation in mongoose. Some sample code:
Data.aggregate([
    // keep only documents inside the requested time window
    {$match: {updated: {$gte: start_date_hour, $lte: end_date_hour}}},
    {$group: {
        _id: {
            year: {$year: "$updated"},
            month: {$month: "$updated"},
            day: {$dayOfMonth: "$updated"},
            hour: {$hour: "$updated"}
            // other fields should be here to meet your requirement
        }
    }},
    // the grouped fields live under _id, so sort on those paths
    {$sort: {"_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1}}
], callback);
For more arguments of aggregate, please refer to this doc.
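As a rough sketch of how the hourly chunks could also carry every per-second record, the pipeline below groups by year, month, day and hour and pushes each document into its bucket. The function name buildHourlyPipeline and the output fields records and count are made up for this example; only the updated field comes from the schema above. The date operators work in UTC unless told otherwise, and a bucket holding a full hour of large documents could approach the 16 MB document limit, so treat this as a starting point.

```javascript
// Build an aggregation pipeline that buckets per-second documents by hour.
// `updated` is the Date field from the schema; `records` and `count` are
// hypothetical output field names chosen for this sketch.
function buildHourlyPipeline(start, end) {
  return [
    // half-open window: start <= updated < end
    { $match: { updated: { $gte: start, $lt: end } } },
    {
      $group: {
        _id: {
          year: { $year: '$updated' },
          month: { $month: '$updated' },
          day: { $dayOfMonth: '$updated' },
          hour: { $hour: '$updated' },
        },
        // keep every original document inside its hourly bucket
        records: { $push: '$$ROOT' },
        count: { $sum: 1 },
      },
    },
    { $sort: { '_id.year': 1, '_id.month': 1, '_id.day': 1, '_id.hour': 1 } },
  ];
}

const hourlyPipeline = buildHourlyPipeline(
  new Date('2024-03-05T00:00:00Z'),
  new Date('2024-03-06T00:00:00Z')
);
// With a Mongoose model this could be run as: Data.aggregate(hourlyPipeline)
```

The background worker could write each resulting bucket to a separate collection and then delete the per-second documents it covers.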

Related

How to limit users daily post limit (MERN)

I'm currently using passport for authentication and mongodb to store user information.
However, I'm stuck trying to enforce a daily post limit per user. I was thinking of having a field like dailyPostLimit in the User schema and decrementing it whenever the user posts something.
const user = new mongoose.Schema({
githubId: {
required: true,
type: String,
},
username: {
required: true,
type: String,
},
dailyPostLimit: {
type: Number,
default: 3,
},
});
However, I'm not sure if there's a way to reset that count to the default (3) every day. Is a cron task suitable here, or is there a simpler way to accomplish this?
A cron task works well for resetting a value like this one, and caching a value like this one is a reasonable approach to solving this problem. But, keep in mind that you're caching this value, and cache invalidation is a hard problem that can often lead to bugs & additional complexity.
counting posts
Rather than caching, my first instinct would be to count the number of posts each time. Here's some pseudo code:
const count = await posts.countDocuments({userId, createdAt: {$gte: startOfDay}});
// alternative: const count = await posts.countDocuments({userId, _id: {$gte: startOfDayConvertedToMongoId}});
// the user already has 3 posts today, so reject the 4th
if (count >= 3) throw new Error("no more posts for you");
await posts.create(newPost);
(note: if you're worried about race conditions, any solution you choose will need to check the count in a transaction)
If you have an index that starts with {userId: 1, createdAt: 1}, or if you use the _id instead {userId: 1, _id: 1} (assuming that you're not allowing client _id creation), these queries will be quite cheap, and it'll be hard for them to get out of sync.
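The two values the pseudo code leaves open could be computed roughly like this. It is a sketch that assumes "day" means the UTC calendar day; the helper names startOfUtcDay and objectIdHexFromDate are invented for the example. The _id variant relies on the first 4 bytes of an ObjectId being its creation time in seconds.

```javascript
// Midnight (UTC) of the day that `now` falls in.
function startOfUtcDay(now) {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

// 24-character hex string for the smallest ObjectId created at `date`:
// 8 hex digits of the Unix timestamp in seconds, then 16 zeros.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

const startOfDay = startOfUtcDay(new Date('2024-03-05T13:45:10Z'));
const startOfDayIdHex = objectIdHexFromDate(startOfDay);
// With Mongoose the hex string can be wrapped as:
//   new mongoose.Types.ObjectId(startOfDayIdHex)
```

If the limit should follow each user's local day instead of UTC, the start of day would need to be computed from that user's time zone.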
separate cache collection
Even if you do decide to cache these creation values, I'd recommend caching them away from your users collection to keep your collections more focused. You could create a post_count collection and then update only the cache collection for these counts: post_count.updateOne({userId, day}, {$inc: {count: 1}}, {upsert: true});. With upsert: true, the equality fields from the filter (userId and day) are copied into the new document and $inc creates count, so no $setOnInsert is needed (setting count in both $inc and $setOnInsert would be rejected as a conflicting update). One nice benefit of this approach is you can use a TTL index on day (stored as a Date) to have mongo automatically remove the old documents in this collection after they've expired.
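A minimal sketch of the arguments for that counter upsert and the TTL index, assuming day is stored as a Date at UTC midnight and that counters may be dropped two days later. The helper name buildPostCountUpsert, the ttlIndex object and the two-day retention are illustrative choices, and $inc is MongoDB's increment operator.

```javascript
const TWO_DAYS_IN_SECONDS = 2 * 24 * 60 * 60;

// Arguments for post_count.updateOne(filter, update, options).
function buildPostCountUpsert(userId, now) {
  // one counter document per user per UTC day
  const day = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  return {
    filter: { userId, day },
    // on insert, userId and day come from the filter and count starts at 1
    update: { $inc: { count: 1 } },
    options: { upsert: true },
  };
}

// TTL index: documents are removed some time after day + expireAfterSeconds.
const ttlIndex = {
  keys: { day: 1 },
  options: { expireAfterSeconds: TWO_DAYS_IN_SECONDS },
};

const upsertArgs = buildPostCountUpsert('user-1', new Date('2024-03-05T13:45:10Z'));
// post_count.updateOne(upsertArgs.filter, upsertArgs.update, upsertArgs.options)
// post_count.createIndex(ttlIndex.keys, ttlIndex.options)
```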
Since you are using MongoDB, I would suggest:
Use agenda and create a job that runs at 00:00 UTC (if you have users in different time zones) or at midnight in the time zone of your users' country.
In this job, call the updateMany function on your user model to reset the dailyPostLimit field.
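Whichever scheduler runs the job, it needs two pieces: when the next 00:00 UTC is, and what to pass to updateMany. The sketch below covers only those two pieces and deliberately does not use agenda's API; msUntilNextUtcMidnight and buildDailyResetArgs are hypothetical helper names, and the limit of 3 is the default from the schema in the question.

```javascript
const DAILY_POST_LIMIT = 3; // default from the user schema in the question

// Milliseconds from `now` until the next 00:00 UTC.
function msUntilNextUtcMidnight(now) {
  // Date.UTC rolls an out-of-range day over into the next month or year
  const next = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1);
  return next - now.getTime();
}

// Arguments for User.updateMany(filter, update): only touch users whose
// remaining limit differs from the default.
function buildDailyResetArgs() {
  return {
    filter: { dailyPostLimit: { $ne: DAILY_POST_LIMIT } },
    update: { $set: { dailyPostLimit: DAILY_POST_LIMIT } },
  };
}

const resetArgs = buildDailyResetArgs();
// Inside the scheduled job: await User.updateMany(resetArgs.filter, resetArgs.update)
```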

Passing current time to Mongoose query

I've run into a problem. I made a field in my Mongoose schema with type "Date":
...
timeOfPassingQuestion: Date,
...
Now, I want to pass the current time in hours, minutes, seconds and milliseconds and save it into that field. How should I format my Node variable so I can pass it without errors?
Edit: Also, I forgot to say that I wanna later see how much time user spent answering question by subtracting current time and time that I pulled from DB, timeOfPassingQuestion field.
This is the syntax to create a schema that supports a date field:
// Schema
{ //...
someDate: Date,
}
// date object that you can use whenever you decide to set it
var dateObj = new Date();
This will create a JavaScript date object that you can then pass into your Mongoose object for the date field.
Or, if you will always want it on creation, put it directly in your mongoose schema
{ //...
createdDate: { type: Date, default: Date.now },
}
To compare against that time later on, I suggest you use moment.js. You can then get a human-readable description of how long ago the stored date was, like so:
moment(doc.createdDate).fromNow(); // doc is a document loaded from the model
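For the exact time a user spent on a question, plain Date arithmetic may be enough, since fromNow only returns a rounded description. The sketch below is one way to do it without any library; elapsedMs and formatElapsed are names invented for the example, and the start value stands in for the timeOfPassingQuestion field read back from the database.

```javascript
// Exact difference in milliseconds between two Date objects.
function elapsedMs(startedAt, finishedAt = new Date()) {
  return finishedAt.getTime() - startedAt.getTime();
}

// Render a millisecond duration as minutes, seconds and milliseconds.
function formatElapsed(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}m ${seconds}s ${ms % 1000}ms`;
}

const spent = elapsedMs(
  new Date('2024-03-05T10:00:00.000Z'), // e.g. doc.timeOfPassingQuestion
  new Date('2024-03-05T10:01:30.250Z')
);
console.log(formatElapsed(spent)); // prints "1m 30s 250ms"
```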
Sources:
Mongoose Schema
Moment.js fromNow

mongoose aggregate keeps node server blocked

I have a couple hundred thousand documents in my collection, each with a timestamp field.
I want to count the number of records per day for the last month.
I'm running a mongoose aggregate command to do that, which is unfortunately taking longer than I was expecting.
The following is the aggregate function:
function dailyStats()
{
    var lastMonth = new Date();
    lastMonth.setMonth(lastMonth.getMonth() - 1);
    // Mongoose 5+ expects the pipeline stages as a single array
    MyModel.aggregate([
        {
            $match: {timestamp: {$gte: lastMonth}}
        },
        {
            $group: {
                // Data count grouped by date
                _id: {$dateToString: {format: "%Y-%m-%d", date: "$timestamp"}},
                count: {$sum: 1}
            }
        },
        {$sort: {"_id": 1}}
    ], function (err, docs)
    {
        if (err) return console.error(err);
        console.log(docs);
    });
}
Now, the whole point of callbacks is non-blocking code. However, when this function is executed, it takes around 20-25 seconds. For this whole time, my node application doesn't respond to other APIs!
At first I thought that my CPU was so busy that it wasn't responding to anything else. So I ran a small separate node app alongside it, and that one works fine!
So I don't understand why this application does not respond to other requests until the mongodb driver returns the result.

Finding documents using date arithmetic with Mongoose

I have a mongoose document which has these properties:
var DocSchema = new Schema({
    issue_date: Date,
    days_to_expire: Number
});
// a virtual needs a name; expiration_date is used here for illustration
DocSchema.virtual('expiration_date').get(function () {
    return moment(this.issue_date).add(this.days_to_expire, 'days');
});
I want to find all documents that are due to expire within a week. Is this possible with this schema definition?
The operation I'm looking for is something like this:
(issue_date + days_to_expire) - today <= 7
How can I query something like that using mongoose?
You'll be saving yourself a lot of trouble if you just store the expiration date inside the document as well. Then you're looking at a simple find({ expiration_date: { $gte: today, $lte: todayPlusSeven } }). Otherwise you're left calculating the expiration date in an aggregation and matching based off that.
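If storing the expiration date is not an option, the aggregation route mentioned above might look roughly like this. It is a sketch assuming MongoDB 3.4+ (for $addFields) and relies on $add treating a number added to a date as milliseconds; buildExpiringSoonPipeline is a made-up helper name, and expiration_date here is a computed field, not something stored in the document.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Pipeline that computes expiration_date per document and keeps those
// expiring between `today` and `today + windowDays`.
function buildExpiringSoonPipeline(today, windowDays = 7) {
  const windowEnd = new Date(today.getTime() + windowDays * MS_PER_DAY);
  return [
    {
      $addFields: {
        // issue_date + days_to_expire days, expressed in milliseconds
        expiration_date: {
          $add: ['$issue_date', { $multiply: ['$days_to_expire', MS_PER_DAY] }],
        },
      },
    },
    { $match: { expiration_date: { $gte: today, $lte: windowEnd } } },
  ];
}

const expiringPipeline = buildExpiringSoonPipeline(new Date('2024-03-05T00:00:00Z'));
// With a Mongoose model: Doc.aggregate(expiringPipeline)
```

Because the match runs on a computed field, it cannot use an index and has to scan every document, which is the main reason storing the expiration date is the simpler choice.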

How to calculate Rating in my MongoDB design

I'm creating a system where users can write a review about an item and rate it from 0-5. I'm using MongoDB for this. My problem is finding the best way to calculate the overall rating in the product schema. I don't think querying all comments, summing their ratings and dividing by the number of comments is a good solution. Here is my schema. I appreciate any advice:
Comments:
var commentSchema = new Schema({
Rating : { type: Number, default:0 },
Helpful : { type: Number, default:0 },
User :{
type: Schema.ObjectId,
ref: 'users'
},
Content: String,
});
Here is my Item schema:
var productSchema = new Schema({
//id is barcode
_id : String,
Rating : { type: Number, default:0 },
Comments :[
{
type: Schema.ObjectId,
ref: 'comments'
}
],
});
EDIT: HERE is the solution I got from another topic : calculating average in Mongoose
You can get the total using the aggregation framework. First you use the $unwind operator to turn the comments into a document stream:
{ $unwind: "$Comments" }
The result is that each product-document is turned into one product-document per entry in its Comments array. Each comment entry becomes a single object under the field Comments; all other fields are taken from the originating product-document.
Then you use $group to rejoin the documents for each product by their _id, while you use the $avg operator to calculate the average of the rating-field:
{ $group: {
_id: "$_id",
average: { $avg: "$Comments.Rating" }
} }
Putting those two steps into an aggregation pipeline calculates the average rating for every product in your collection. You might want to narrow it down to one or a small subset of products, depending on what the user requested right now. To do this, prepend the pipeline with a $match step. The $match object works just like the one you pass to find().
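Put together, the pipeline described above could be assembled like this. It is a sketch that assumes each entry in Comments carries its own Rating field, as the $unwind and $avg steps imply; with the ObjectId references from the question's schema, the comments would first have to be joined in (for example with a $lookup stage), which is not shown here. buildAverageRatingPipeline is a hypothetical helper name.

```javascript
// Average rating per product, optionally limited to some product ids.
function buildAverageRatingPipeline(productIds) {
  const pipeline = [];
  if (productIds && productIds.length > 0) {
    // narrow down first so later stages process fewer documents
    pipeline.push({ $match: { _id: { $in: productIds } } });
  }
  pipeline.push(
    // one document per comment
    { $unwind: '$Comments' },
    // regroup by product and average the comment ratings
    { $group: { _id: '$_id', average: { $avg: '$Comments.Rating' } } }
  );
  return pipeline;
}

const onePipeline = buildAverageRatingPipeline(['0123456789012']);
const allPipeline = buildAverageRatingPipeline();
// With a Mongoose model: Product.aggregate(onePipeline)
```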
The underlying question that it would be useful to understand is why you don't think that finding all of the ratings, summing them up, and dividing by the total number is a useful approach. Understanding the underlying reason would help drive a better solution.
Based on the comments below, it sounds like your main concern is performance and the need to run map-reduce (or another aggregation framework) each time a user wants to see total ratings.
This person addressed a similar issue here: http://markembling.info/2010/11/using-map-reduce-in-a-mongodb-app
The solution they identified was to separate out the execution of the map-reduce function from the need in the view to see the total value. In this case, the optimal solution would be to run the map-reduce periodically and store the results in another collection, and have the average rating based on the collection that stores the averages, rather than doing the calculation in real-time each time.
As I mentioned in the previous version of this answer, you can improve performance further by limiting the map-reduce to addressing only the ratings that were created or updated since the last map-reduce aggregation.
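The arithmetic behind such an incremental update can be sketched in plain JavaScript: keep the stored average together with the number of ratings behind it, and fold in only the ratings that arrived since the last run. The mergeAverage name and the { average, count } shape are assumptions for this example, and it only handles newly created ratings; edited ratings would also need their old values subtracted.

```javascript
// Fold newly created ratings into a stored { average, count } pair.
function mergeAverage(stored, newRatings) {
  const newSum = newRatings.reduce((sum, rating) => sum + rating, 0);
  const count = stored.count + newRatings.length;
  if (count === 0) return { average: 0, count: 0 };
  // old average times old count recovers the old sum of ratings
  const average = (stored.average * stored.count + newSum) / count;
  return { average, count };
}

const merged = mergeAverage({ average: 4, count: 2 }, [5, 3]);
console.log(merged); // { average: 4, count: 4 }
```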

Resources