How to limit a user's daily post count (MERN) - node.js

I'm currently using Passport for authentication and MongoDB to store user information.
However, I'm stuck trying to enforce a daily post limit per user. I was thinking of having a field like dailyPostLimit in the User schema and decrementing the count whenever the user posts something.
const user = new mongoose.Schema({
  githubId: {
    required: true,
    type: String,
  },
  username: {
    required: true,
    type: String,
  },
  dailyPostLimit: {
    type: Number,
    default: 3,
  },
});
However, I'm not sure if there's a way to reset that count to the default (3) every day. Is a cron task suitable here, or is there a simpler way to accomplish this?

A cron task works well for resetting a value like this, and caching the remaining count is a reasonable approach to solving this problem. But keep in mind that you're caching a derived value, and cache invalidation is a hard problem that can often lead to bugs and additional complexity.
counting posts
Rather than caching, my first instinct would be to count the number of posts each time. Here's some pseudo code:
const count = await posts.count({userId, createdAt: {$gte: startOfDay}});
// alternative: const count = await posts.count({userId, _id: {$gte: startOfDayConvertedToMongoId}});
if (count >= 3) throw new Error("no more posts for you");
await posts.create(newPost);
(note: if you're worried about race conditions, any solution you choose will need to check the count in a transaction)
If you have an index that starts with {userId: 1, createdAt: 1}, or if you use the _id instead {userId: 1, _id: 1} (assuming that you're not allowing client _id creation), these queries will be quite cheap, and it'll be hard for them to get out of sync.
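For completeness, a minimal sketch of declaring that compound index in Mongoose (postSchema is an assumed name for your post schema):
// compound index so the per-user, per-day count query is a cheap index scan
postSchema.index({ userId: 1, createdAt: 1 });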
separate cache collection
Even if you do decide to cache these counts, I'd recommend caching them away from your users collection to keep your collections focused. You could create a post_count collection and update only that cache collection: post_count.updateOne({userId, day}, {$inc: {count: 1}}, {upsert: true}); (the equality fields in the filter are written into the document on insert, so $inc alone is enough to create and increment the counter). One nice benefit of this approach is that you can put a TTL index on day to have MongoDB automatically remove the old documents in this collection after they've expired.
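As a rough illustration (collection and field names here are just examples), the cache collection and its TTL index could look like this in Mongoose:
const postCountSchema = new mongoose.Schema({
  userId: mongoose.Schema.Types.ObjectId,
  day: Date,
  count: Number,
});
// TTL index: MongoDB removes documents ~24 hours after the value stored in `day`
postCountSchema.index({ day: 1 }, { expireAfterSeconds: 60 * 60 * 24 });
const PostCount = mongoose.model('PostCount', postCountSchema);

// increment (or create) today's counter; the filter fields are copied into the
// new document on insert, so no $setOnInsert is needed
await PostCount.updateOne(
  { userId, day: startOfDay },
  { $inc: { count: 1 } },
  { upsert: true }
);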

Since you are using MongoDB, I would suggest:
Use agenda and create a job that runs at 00:00 UTC (if you have users from many different time zones) or at a time specific to your users' time zone.
In this job, call updateMany on your User model to reset the dailyPostLimit field, as in the sketch below.
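A minimal sketch of what that Agenda job might look like (the connection string, job name, and the reset value of 3 are assumptions taken from the question):
const Agenda = require('agenda'); // in newer versions: const { Agenda } = require('agenda');
const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/agenda-jobs' } }); // assumed connection string

agenda.define('reset daily post limits', async () => {
  // put every user back at the default of 3 posts per day
  await User.updateMany({}, { $set: { dailyPostLimit: 3 } });
});

(async () => {
  await agenda.start();
  await agenda.every('0 0 * * *', 'reset daily post limits'); // midnight UTC, cron syntax
})();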

Related

index new document and get the indexed document in the same query

Is it possible to index a new document and return it once it has been indexed?
I tried to use the _id that the index call returns, but that means two queries, and since the index action takes some time, the second query doesn't always find the _id, so it doesn't work reliably.
This is the query that indexes the document:
const query = await elasticClient.index({
  routing: "dasdsad34_d",
  index: "milan",
  body: {
    text: "san siro",
    user: {
      user_id: "3",
      username: "maldini",
    },
    tags: ["Forza Milan", "grande milan"],
    publish_date: new Date(),
    likes: [],
    users_tags: [1, 5],
    type: {
      name: "comment",
      parent: "dasdsad34_d",
    },
  },
});
No, it's not possible with the default behavior. By default, Elasticsearch only offers near real-time search: its default refresh interval is 1 second, because an index refresh is a costly operation.
To overcome this, you can add refresh=true to your indexing operation. You can get further details from the links below.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
Please note that this is NOT a recommended option, as it comes with a huge overhead. Only use it if the insert rate on the index in question is very low.
The recommended way is to use refresh=wait_for on your indexing operation. The downside is that the call waits for the natural refresh to complete. If your refresh interval is the default 1 second and that's an acceptable trade-off, this is the way to go.
However, if you have a higher refresh interval set, the wait time for the indexing operation will be as high as that refresh interval. So choose your option carefully.
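For reference, a minimal sketch of the same index call with the wait_for option (assuming the Elasticsearch client from the question, here named elasticClient):
const result = await elasticClient.index({
  index: "milan",
  routing: "dasdsad34_d",
  refresh: "wait_for", // resolve only once the document is searchable
  body: {
    text: "san siro",
    // ...rest of the document as in the question
  },
});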

How can I count the views of a specific post by a user? Count every user just once

We have a User model and a News model. In the News model we have a viewsCount field, and I want to increment this view count when a GET request is made by a user.
When a specific user makes a GET request, the view count should increment by one, but each user should count only once.
const NEWSModel = new Schema({
  viewesCount: { type: Number },
  Publisher: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User',
    required: true
  },
  LikesCount: { type: Number },
  DislikeCount: { type: Number },
  Comments: CommenTs
});
Every user can view the news as many times as they want, but should only ever count as one view. How can I do that?
You can change your model as below, and then whenever a news item is fetched, push the user id onto the viewedBy field:
viewedBy: [{
  type: mongoose.Schema.Types.ObjectId,
  ref: "User"
}]
news.viewedBy.push(userId)
If you don't have a lot of users, you can add an additional field to the News model, like users_viewed, which would be an array of unique user ids.
Then make an additional check before incrementing the views count.
If the user who requested the news is already in the users_viewed array, skip any additional actions.
If not, increment the views counter.
But if you do have a lot of users, it's better to store the views counter in Redis, so you skip the request to the database and increment an in-memory counter instead.
The logic for storing and showing the data would be the same, but you'll reduce the load on your database and speed up the whole process.
[UPDATE] According to your comment about the number of users:
To make things work you can use this package.
First of all, after a client requests a news item, you can store the news data in your cache (to reduce the number of requests to your database).
Now you have a few possible ways to handle the number of views.
I think the easiest to implement would be to add the user's unique identifier to a SET, and return the number of users in the SET using SCARD.
With this solution you don't need to check whether a user has already watched the news, because a set holds only unique values (which is also why you need the user's unique identifier).
And you only use two Redis requests, which is pretty good for heavily loaded services.
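A rough sketch of that set-based counter with the node redis v4 client (the key naming scheme here is just an example):
const { createClient } = require('redis');
const redis = createClient();
await redis.connect();

// record a view: SADD keeps one entry per unique viewer id for this news item
await redis.sAdd(`news:${newsId}:viewers`, String(userId));

// read the unique view count with SCARD
const uniqueViews = await redis.sCard(`news:${newsId}:viewers`);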
You can have another field called viewedBy, of type array, in which you store user ids. Then it will be easier to check whether a user has already viewed your post, or to count them.
File: news.model.js
const News = new Schema({
  viewedBy: [{
    type: mongoose.Schema.Types.ObjectId,
    ref: "User"
  }],
  // other properties...
});
File: news.controller.js
const user = await User.findOne({...}); // get the current user
const news = await News.findOne({...}); // get a news item
/*
  Update the views count by adding the current user's id if it's not already there.
  Thanks to '$addToSet', the update does nothing if the user id is already present.
*/
await news.update({ $addToSet: { viewedBy: user._id } });
// Getting the views count
console.log('Total views:', news.viewedBy.length);
More about $addToSet: https://docs.mongodb.com/manual/reference/operator/update/addToSet/

How to organize MongoDB data by time?

I have an application that's saving data every second to MongoDB. This is important data, but holding data every second forever isn't necessary. After some time, I'd like to run a process (background worker) to clean up this data into hourly chunks, which includes every piece of data (1 per second) for each hour of that day. Kinda like Time Machine does on Mac.
From researching and thinking about it, there are a couple of ways I can think of to make this happen:
Mongo aggregators (not sure exactly how this would work)
A Node background process with moment.js, sorting by date, hour, etc. (would take a really long time)
What's the best way to do this with MongoDB?
I think the Date Aggregation Operators could be a better option for your case. Given your schema as below:
var dataSchema = new Schema({
  // other fields are here...
  updated: Date,
});
var Data = mongoose.model('Data', dataSchema);
Just save that data as a normal Date.
Then you can retrieve the hourly chunks through an aggregate operation in Mongoose. One sample:
Data.aggregate([
  {$match: {updated: {$gte: start_date_hour, $lte: end_date_hour}}},
  {$group: {
    _id: {
      year: {$year: "$updated"},
      month: {$month: "$updated"},
      day: {$dayOfMonth: "$updated"}
      // other fields should be here to meet your requirement
    },
  }},
  {$sort: {"_id.year": 1, "_id.month": 1, "_id.day": 1}}
], callback);
For more arguments of aggregate, please refer to this doc.
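For the hourly chunks specifically, a hedged sketch along the same lines (assuming each document also has a numeric value field you want to roll up; swap in whatever accumulator fits your data):
Data.aggregate([
  { $match: { updated: { $gte: start_date_hour, $lte: end_date_hour } } },
  { $group: {
      _id: {
        year:  { $year: "$updated" },
        month: { $month: "$updated" },
        day:   { $dayOfMonth: "$updated" },
        hour:  { $hour: "$updated" },
      },
      avgValue: { $avg: "$value" }, // example accumulator over the per-second samples
      samples:  { $sum: 1 },
  } },
  { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1 } },
], callback);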

Custom fields in MongoDB query result

After being so used to SQL, I have come across this problem with MongoDB.
First, I am using Mongoose.
Now, the problem. I have a collection named User.
var UserSchema = new Schema({
  id: ObjectId,
  name: { type: String, trim: true, required: true },
  email: { type: String, trim: true, required: true, index: { unique: true } },
  password: { type: String, required: true, set: passwordToMD5 },
  age: { type: Number, min: 18, required: true, default: 18 },
  gender: { type: Number, default: 0, required: true },
  height: { type: Number, default: 180, min: 140, max: 220 },
  _eye_color: { type: ObjectId, default: null },
  location: {
    lon: { type: Number, default: 0 },
    lat: { type: Number, default: 0 }
  },
  status: { type: Number, required: true, default: 0 }
}, {
  toObject: { virtuals: true },
  toJSON: { virtuals: true },
  collection: "user"
});
Now I need to select all users from this collection and sort them by a special attribute (say "rank"). This rank is calculated with certain logic depending on their distance from a point, their age compared with a given age, etc...
So now I was wondering how to compute this rank and then use it for sorting? I have tried virtuals; they are handy for computing additional info, but unfortunately it is not possible to sort the find() results by a virtual field.
Of course I can calculate this rank in a virtual, then select all records, and after that, in the callback, do some JavaScript. But in this case, as I select all the users, then sort, and then limit, the JavaScript part might take too long...
I was thinking of using mapReduce, but I am not sure it will do what I want.
Can someone give me a hint as to whether my task is possible in MongoDB/Mongoose?
EDIT 1
I have also tried the aggregation framework, and at first it seemed to be the best solution thanks to $project. But then, when I needed to do the rank calculations, I found out that aggregation does not support many mathematical functions like sin, cos and sqrt. It was also impossible to use predefined plain JavaScript functions in the projection. I mean, the function got called, but I was not able to pass the current record's fields to it.
{$project: {
  distance_from_user: mUtils.getDistance(point, this.location)
}}
Inside the function, the second argument was undefined.
So I guess it is impossible to do my rank calculations with aggregation framework.
EDIT 2
Ok, I know everyone tells me not to use mapReduce as it is not good for real-time queries, but as I cannot use aggregation, I think I'll try mapReduce. So let's say I have this map-reduce:
function map() {
  emit(1, // or put a GROUP BY key here
    {
      name: this.name, // the field you want stats for
      age: this.age,
      lat: this.location.lat,
      lon: this.location.lon,
      distance: 0,
      rank: 0
    });
}
function reduce(key, values) {
  return val;
}
function finalize(key, value) {
  return value;
}
var command = {
  'mapreduce': "user",
  'map': map.toString(),
  'reduce': reduce.toString(),
  query: {$and: [{gender: user_params.gender}, {_id: {$ne: current_user_id}}]},
  'out': {inline: 1}
};
mongoose.connection.db.executeDbCommand(command, function (error, result) {
  if (error) {
    log(error);
    return;
  }
  log(result);
  return;
});
What should I write in reduce (or maybe change map) to calculate rank for every user?
The only real solution is to calculate your rank for each document and store it in the document. Since this value stays constant as long as the values in your document remain constant, you can simply recalculate it whenever you update the fields that affect it.
Map/reduce certainly isn't a good solution for this, nor is any other type of aggregation. Precalculating your rank and storing it with the document is the only option that scales if you're using MongoDB.
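As an illustration only (computeRank is a hypothetical helper, and the field names are assumptions based on the question's schema), the stored rank could be maintained with a pre-save hook and then used in a plain indexed sort:
// add a stored, indexable rank to the existing schema
UserSchema.add({ rank: { type: Number, default: 0, index: true } });

UserSchema.pre('save', function (next) {
  // recompute only when the fields the rank depends on have changed
  if (this.isModified('age') || this.isModified('location')) {
    this.rank = computeRank(this); // hypothetical ranking function
  }
  next();
});

// reading then becomes a normal indexed query, for example:
// User.find({ gender: user_params.gender }).sort({ rank: -1 }).limit(20).exec(callback);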
You are aware of the amount of computation such a thing would need - if you did it every time a user logs in, you'd get interesting load peaks when lots of people log in within a short period of time - and your page (interface) would be heavily resource-bound (which is not good).
I'd recommend something a bit different: keep a ranking for every logged-on user and update it at intervals. Keep a "short session" and a "long session" (long session - the one you use in the web browser; short - "online, currently using the site") and generate ranks regularly only for recently active users, and only rarely for those logged on in the long session. Something like every five minutes. Much more scalable - and if a user is unhappy about not having his rank computed, you can always tweak the system to compute his rank on demand.
You could use mapReduce in such a case - your map function should only emit the data you need for computing the rank for a given user (like age, lat, lon, whatever you need) AND a result (rank) for the tested user (emit it empty). For the reduce function you'd need to look at sorting with mapReduce (it highly depends on the way you compute the rank) - there you'd also compute the rank (or some kind of sub-value) for the other users.
It looks like a good use case for MongoDB + Hadoop.
This presentation shows some of the possibilities of this combination.

Mongoose: populate() / DBref or data duplication?

I have two collections:
Users
Uploads
Each upload has a User associated with it, and I need to know their details when an Upload is viewed. Is it best practice to duplicate this data inside the Uploads record, or to use populate() to pull in these details from the Users collection referenced by _id?
OPTION 1
var UploadSchema = new Schema({
  _id: { type: Schema.ObjectId },
  _user: { type: Schema.ObjectId, ref: 'users' },
  title: { type: String },
});
OPTION 2
var UploadSchema = new Schema({
  _id: { type: Schema.ObjectId },
  user: {
    name: { type: String },
    email: { type: String },
    avatar: { type: String },
    //...etc
  },
  title: { type: String },
});
With 'Option 2' if any of the data in the Users collection changes I will have to update this across all associated Upload records. With 'Option 1' on the other hand I can just chill out and let populate() ensure the latest User data is always shown.
Is the overhead of using populate() significant? What is the best practice in this common scenario?
If you need to query on your users, keep users alone. If you need to query on your uploads, keep uploads alone.
Another question you should ask yourself is: every time I need this data, do I need the embedded objects (and vice versa)? How many times will this data be updated? How many times will this data be read?
Think about a friendship request:
each time you need the request you also need the user who made it, so embed the request inside the user document.
You will be able to create an index on the embedded object too, and your search will be a single query / fast / consistent.
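A small sketch of that friendship-request idea (the field names here are made up for illustration): the requests live inside the user document, and the embedded field can still be indexed.
var userSchema = new Schema({
  name: String,
  friendRequests: [{
    from: { type: Schema.ObjectId, ref: 'users' },
    sentAt: { type: Date, default: Date.now },
  }],
});
// index on the embedded object, so looking up requests by sender is a single indexed query
userSchema.index({ 'friendRequests.from': 1 });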
Just a link to my previous reply on a similar question:
Mongo DB relations between objects
I think this post will be right for you http://www.mongodb.org/display/DOCS/Schema+Design
Use Cases
Customer / Order / Order Line-Item
Orders should be a collection; customers a collection; line items should be an array of line items embedded in the order object.
Blogging system.
Posts should be a collection. The post author might be a separate collection, or simply a field within posts if it's only an email address. Comments should be embedded objects within a post for performance.
Schema Design Basics
Kyle Banker, 10gen
http://www.10gen.com/presentation/mongosf2011/schemabasics
Indexing & Query Optimization
Alvin Richards, Senior Director of Enterprise Engineering
http://www.10gen.com/presentation/mongosf-2011/mongodb-indexing-query-optimization
These two videos are the best on MongoDB I have ever seen, imho.
Populate() is just a query. So the overhead is whatever the query is, which is a find() on your model.
Also, best practice for MongoDB is to embed what you can; it will result in faster queries. It sounds like you'd be duplicating a ton of data though, which makes relations (linking) a good fit here.
"Linking" is just putting an ObjectId in a field from another model.
Here is the Mongo Best Practices http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-SummaryofBestPractices
Linking/DBRefs http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-SimpleDirect%2FManualLinking
