The "right way" to architect voting with Mongoose?

I'm creating a web app using Mongoose/MongoDB to store information that will be voted on. I'll be storing usernames and IP addresses with the vote (so voters can update/modify their votes if desired).
Root question: What's the best way to securely architect voting in a Mongoose schema?
Currently, my schema looks like this (simplified):
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var Thing = new Schema({
    title: {
        type: String
    },
    creator: {
        type: String
    },
    options: [{
        description: {
            type: String
        },
        // One entry per vote cast for this option
        votes: [{
            username: {
                type: String
            },
            ip: {
                type: String
            }
        }]
    }]
});
mongoose.model('Thing', Thing);
While this makes querying the db for any given Thing super easy, it is problematic for security for obvious reasons: I don't want to return usernames and IP addresses to the browser.
The problem is, I'm not sure which is the best/least painful scenario for securely returning Thing data to the browser:
1. Loop through each option in Thing.options, then sub-loop through each vote in Thing.options[i].votes to find the vote cast by the user requesting the data, then delete all votes to get rid of other users' data. This seems to be very resource intensive, but I couldn't find a way to use indexOf in subarrays (guidance welcome on this one), i.e. Thing.options.votes.indexOf(username) or something to that effect.
2. Store vote information in the already-existing User schema, then search through all users for vote data and stick it all together every time I want to query a single Thing. This also seems inefficient, more resource intensive, and more complicated than necessary.
3. Create a separate Vote schema that stores the data more conveniently, but adds another database call (one for the Thing, one for the Vote).
This problem is somewhat compounded by the fact that there are different ways to vote, with this being the simplest.
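For option 1, the nested loop can be written so that it builds a new, browser-safe object from a whitelist of fields instead of deleting sensitive ones afterwards. The following is only a minimal sketch under assumptions: `thing` is a plain object shaped like the schema above, and the helper name `toPublicThing` and the output fields `voteCount` and `votedByMe` are hypothetical, not part of the original schema.

```javascript
// Hypothetical helper: returns a copy of a Thing that is safe to send to
// the browser. `thing` is assumed to be a plain object shaped like the
// schema above; `username` is the user requesting the data.
function toPublicThing(thing, username) {
  return {
    title: thing.title,
    creator: thing.creator,
    options: thing.options.map(function (option) {
      return {
        description: option.description,
        // Expose only an aggregate, never the individual vote records.
        voteCount: option.votes.length,
        // True if the requesting user voted for this option.
        votedByMe: option.votes.some(function (vote) {
          return vote.username === username;
        })
      };
    })
  };
}

// Usage with made-up data:
const exampleThing = {
  title: 'Lunch',
  creator: 'carol',
  options: [
    { description: 'Pizza', votes: [{ username: 'alice', ip: '10.0.0.1' }] },
    { description: 'Salad', votes: [{ username: 'bob', ip: '10.0.0.2' }] }
  ]
};
const safeThing = toPublicThing(exampleThing, 'alice');
```

Because the result is built from scratch, a field added to the vote subdocument later cannot leak by accident; the cost is still one pass over every vote.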
Research...for posterity's sake:
This question addresses voting in databases, but for a relational db, not MongoDB/Mongoose.
This question addresses Mongoose/Node.js app architecture, but nothing about votes.
This NPM Module adds voting to Mongoose schemas, but doesn't quite fit my needs.
This post looks very promising, as the author is sort of doing what I'm describing in point 1 above (see Listing 13 on the author's post), but he still creates a nested loop, starting in line 22 of Listing 13, to loop through each choice/option, then through each vote for each choice/option.

As a quick hint: to prevent leaking IP addresses from the DB, I would suggest adding an extra collection that stores all sensitive vote data, while keeping the other vote data in the same document.
This adds a small overhead when storing data, but by design the IP info is never provided to the caller, so there is no need for extra data scrubbing on every call to secure the data.

Related

Limit N documents per user for a specific Schema

I think it would be easier for me to explain starting with an example.
Let's say we have this Schema:
var entrySchema = new Schema({
    _id: mongoose.Schema.Types.ObjectId,
    user: {
        type: String,
        ref: 'User' // ref takes the model name as a string
    },
    // more fields here
});
So basically there is a 1:M kind of relationship where a user can have multiple entries stored in the DB.
Now for optimization's sake and to reduce costs I would like to only allow N (50-100) entries to be stored for a specific user.
Of course, the trivial solution would be to check each time I add an entry for a user if the limit has been reached and delete the oldest entry.
I was wondering if there is any built-in mechanism to make this easier to implement. I'm not saying it's hard to implement, just that it looks like something that maybe could be solved with a feature of mongo/mongoose that I'm not aware of.
I'm having a hard time finding anything in both mongo documentation and mongoose.
Note an npm package is welcome too.
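The "trivial solution" described above (add the entry, then drop the oldest ones beyond the limit) can be sketched as follows. This is only an in-memory illustration under assumptions: entries are plain objects in an array rather than documents in a collection, `createdAt` is a hypothetical field standing in for whatever defines "oldest", and `addEntryWithLimit` is a made-up name. Against a real database the same steps would be an insert followed by a query for, and deletion of, the excess entries.

```javascript
// In-memory sketch: `entries` stands in for the entries collection.
// Returns a new array containing `entry`, with the oldest entries of the
// same user removed so that at most `maxPerUser` remain for that user.
function addEntryWithLimit(entries, entry, maxPerUser) {
  const next = entries.concat([entry]);
  // This user's entries, oldest first (createdAt is a hypothetical field).
  const ownByAge = next
    .filter(function (e) { return e.user === entry.user; })
    .sort(function (a, b) { return a.createdAt - b.createdAt; });
  const excess = Math.max(0, ownByAge.length - maxPerUser);
  const toDelete = new Set(ownByAge.slice(0, excess));
  return next.filter(function (e) { return !toDelete.has(e); });
}

// Usage: keep at most 2 entries per user.
let exampleEntries = [];
exampleEntries = addEntryWithLimit(exampleEntries, { user: 'u1', createdAt: 1 }, 2);
exampleEntries = addEntryWithLimit(exampleEntries, { user: 'u1', createdAt: 2 }, 2);
exampleEntries = addEntryWithLimit(exampleEntries, { user: 'u1', createdAt: 3 }, 2);
```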

In CouchDB, should I use the _id for relation and _changes?

I've been reading a lot of best practices and how I should embrace the _id. To be honest, I'm getting kind of paranoid about the repercussions I might face when I start scaling up my application if I don't do this.
Currently I have about 50k documents per database. It's only been a few months with heavy usage. I expect this to grow A LOT. I do a lot of .find() Mango Queries, not much indexing; and to be honest working off a relational style document structuring.
For example:
First get Project from ID.
Then do a find query that:
grabs all type:signature where project_id: X.
grabs all type:revisions where project_id: X.
The reason for this is I try VERY hard not to update documents. A lot of these documents are created offline, so doing a write once workflow is very important for me to avoid conflicts.
I'm currently at a point of no return as scheduling is getting pretty intense. If I want to change the way I'm doing things, now is the best time, before it gets too crazy.
I'd love to hear your thoughts about using the _id for data structuring and what people think.
Being able to make one call with a _all_docs grab like this sounds appealing to me:
{
    "include_docs": true,
    "startkey": "project:{ID}",
    "endkey": "project:{ID}:\ufff0"
}
An example of how ONE type of my documents are set is like so:
Main Document
{
    _id: {COUCH_GENERATED_1},
    type: "project",
    ..
    .
}
Signature Document
{
    _id: {COUCH_GENERATED_2},
    type: "signature",
    project_id: {COUCH_GENERATED_1},
    created_at: {UNIX_TIMESTAMP}
}
Change to Main Document
{
    _id: {COUCH_GENERATED_3},
    type: "revision",
    project_id: {COUCH_GENERATED_1},
    created_at: {UNIX_TIMESTAMP},
    data: [{..}]
}
I was wondering whether I should do something like this:
Main Document: _id: project:{kuuid_1}
Signature Document: _id: project:{kuuid_1}:signature:{kuuid_2}
Change to Main Document: _id: project:{kuuid_1}:rev:{kuuid_3}
I'm just trying to set up my database in a way that isn't going to mess with me in the future. I know problems are going to come up but I'd like not to heavily change the structure if I can avoid it.
Another reason I am thinking of this is that I watch for _changes in my databases, and being able to know what types are coming through without fetching each document every time a document changes sounds appealing too.
Setting up your database structure so that it makes data retrieval easier is good practice. It seems to me you have some options:
1. If there is a field called project_id in the documents of interest, you can create an index on project_id, which would allow you to fetch all documents pertaining to a known project_id cheaply. See CouchDB Find.
2. Create a MapReduce index keyed on project_id, e.g. if (doc.project_id) { emit(doc.project_id)}. The index that this produces would allow you to fetch documents by known project_id with judicious use of start_key & end_key when querying the view. See Introduction to views.
3. As you say, packing more information into the _id field allows you to perform range queries on the _all_docs endpoint.
If you choose a key design of:
project{project_id}:signature{kuuid}
then the primary index of the database has all of a single project's documents grouped together. Putting the project_id before the ':' character is preparation for a forthcoming CouchDB feature called "partitioned databases", which groups logically related documents in their own partition, making it quicker and easier to perform queries on a single partition, in your case a project. This feature isn't ready yet but it's likely to have a {partition_key}:{document_key} format for the _id field, so there's no harm in getting your document _ids ready for it for when it lands (see the CouchDB mailing list). In the meantime, a range query on _all_docs will work.
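To illustrate the range query on _all_docs, here is a minimal sketch that builds the query parameters for one project and checks which _ids would fall inside the range. It assumes the `project:{id}:type:{id}` layout from the question (not the variant without the colon), and it imitates key ordering with plain string comparison, which is only an approximation of how the database collates keys. The function names `projectRange` and `inRange` are made up for this example.

```javascript
// Builds _all_docs query parameters covering one project's documents,
// assuming _ids of the form "project:{id}" and "project:{id}:{type}:{id}".
function projectRange(projectId) {
  const prefix = 'project:' + projectId;
  return {
    include_docs: true,
    startkey: prefix,
    // \ufff0 is a high code point, so it sorts after any realistic suffix.
    endkey: prefix + ':\ufff0'
  };
}

// Approximates the range check with plain string comparison.
function inRange(id, range) {
  return id >= range.startkey && id <= range.endkey;
}

// Usage with made-up ids:
const exampleIds = [
  'project:abc',
  'project:abc:rev:2',
  'project:abc:signature:1',
  'project:abd'
];
const exampleRange = projectRange('abc');
const exampleMatches = exampleIds.filter(function (id) {
  return inRange(id, exampleRange);
});
```

A side effect of this layout is that the document type is readable from the _id alone, which is what the question hoped for when watching _changes.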

Which mongoose model would be more efficient?

I am new to NoSQL. I am trying to build a simple e-commerce app in Node.js. For the product I need to build CRUD operations so that only the owner can edit it; everyone else has read-only access. The main question is: which would be the better implementation?
The code I currently have looks like this:
var mongoose = require('mongoose');
module.exports = mongoose.model('product', new mongoose.Schema({
    owner: {type: String},
    title: {type: String},
    ...
}));
The owner is actually the _id from my user model. Basically this is something like a foreign key. Is this a valid way to go about it, or should I add an array inside the user model to store the list of objects the user owns?
I would also like confirmation of whether what I did for owner, storing the UID as a String, is best practice, or whether I should do something else to reference the user model.
Thanks in advance for helping me out.
The whole point of document databases is that you shouldn't have foreign relationships; all the data your document needs should be denormalized in the document.
So inside your product document, you should be duplicating all the owner details you need. You can store their _id as well for lookup, but don't use a string for this, use an actual ObjectId().
For more about denormalization see The Little MongoDB Book
Yet another alternative to using joins is to denormalize your data. Historically, denormalization was reserved for performance-sensitive code, or when data should be snapshotted (like in an audit log). However, with the ever-growing popularity of NoSQL, many of which don’t have joins, denormalization as part of normal modeling is becoming increasingly common. This doesn’t mean you should duplicate every piece of information in every document. However, rather than letting fear of duplicate data drive your design decisions, consider modeling your data based on what information belongs to what document.

MongoDB query comments along with user information

I am creating an application with Node.js and MongoDB (not Mongoose). I have a problem that has given me a headache for a few days; can anyone please suggest a way to solve it?
I have a MongoDB design like this:
post {
    _id: ObjectId(...),
    picture: 'some_url',
    comments: [
        {
            _id: ObjectId(...),
            user_id: ObjectId('123456'),
            body: "some content"
        },
        {
            _id: ObjectId(...),
            user_id: ObjectId('...'),
            body: "other content"
        }
    ]
}
user {
    _id: ObjectId('123456'),
    name: 'some name',        // changeable at any time
    username: 'some_name',    // changeable at any time
    picture: 'url_link'       // changeable at any time
}
I want to query the post along with all the user information, so the query result will look like this:
[{
    _id: ObjectId(...),
    picture: 'some_url',
    comments: [
        {
            _id: ObjectId(...),
            user_id: ObjectId('123456'),
            user_data: {
                _id: ObjectId('123456'),
                name: 'some name',
                username: 'some_name',
                picture: 'url_link'
            },
            body: "some content"
        },
        {
            _id: ObjectId(...),
            user_id: ObjectId('...'),
            body: "other content"
        }
    ]
}]
I tried to use a loop to manually get the user data and add it to each comment, but it proved to be difficult and beyond my coding skill :(
If anybody has a suggestion, I would really appreciate it.
P.S. I am also trying another approach where I embed all the user data in the comment, and whenever users update their username, name or picture, the change is applied to all of their comments as well.
The problem(s)
As written before, there are several problems when over-embedding:
Problem 1: BSON size limit
As of the time of this writing, BSON documents are limited to 16MB. If that limit is reached, MongoDB would throw an exception and you simply could not add more comments and in worst case scenarios not even change the (user-)name or the picture if the change would increase the size of the document.
Problem 2: Query limitations and performance
It is not easily possible to query or sort the comments array under certain conditions. Some things would require a rather costly aggregation, others rather complicated statements.
While one could argue that once the queries are in place, this isn't much of a problem, I beg to differ. First, the more complicated a query is, the harder it is to optimize, both for the developer and subsequently for MongoDB's query optimizer. I have had the best results with simplifying data models and queries, speeding up responses by a factor of 100 in one instance.
When scaling, the resources needed for complicated and/or costly queries might even add up to whole machines when compared to a simpler data model and the corresponding queries.
Problem 3: Maintainability
Last but not least you might well run into problems maintaining your code. As a simple rule of thumb
The more complicated your code becomes, the harder it is to maintain. The harder code is to maintain, the more time it needs to maintain the code. The more time it needs to maintain code, the more expensive it gets.
Conclusion: Complicated code is expensive.
In this context, "expensive" both refers to money (for professional projects) and time (for hobby projects).
(My!) Solution
It is pretty easy: simplify your data model. Consequently, your queries will become less complicated and (hopefully) faster.
Step 1: Identify your use cases
That's going to be a wild guess for me, but the important thing here is to show you the general method. I'd define your use cases as follows:
For a given post, users should be able to comment
For a given post, show the author and the comments, along with the commenters' and the author's usernames and pictures
For a given user, it should be easily possible to change the name, username and picture
Step 2: Model your data accordingly
Users
First of all, we have a straightforward user model
{
    _id: new ObjectId(),
    name: "Joe Average",
    username: "HotGrrrl96",
    picture: "some_link"
}
Nothing new here, added just for completeness.
Posts
{
    _id: new ObjectId(),
    title: "A post",
    content: " Interesting stuff",
    picture: "some_link",
    created: new ISODate(),
    author: {
        username: "HotGrrrl96",
        picture: "some_link"
    }
}
And that's about it for a post. There are two things to note here. First, we store the author data we immediately need when displaying a post, since this saves us a query for a very common, if not ubiquitous, use case. Second, we do not store the comments and the commenters' data in the same way: because of the 16 MB size limit, we want to avoid accumulating an unbounded number of references in a single document. Rather, we store the references in comment documents:
Comments
{
    _id: new ObjectId(),
    post: someObjectId,
    created: new ISODate(),
    commenter: {
        username: "FooBar",
        picture: "some_link"
    },
    comment: "Awesome!"
}
The same as with posts, we have all the necessary data for displaying a comment.
The queries
What we have achieved now is that we circumvented the BSON size limit and we don't need to refer to the user data in order to be able to display posts and comments, which should save us a lot of queries. But let's come back to the use cases and some more queries
Adding a comment
That's totally straightforward now.
Getting all or some comments for a given post
For all comments
db.comments.find({post:objectIdOfPost})
For the 3 latest comments
db.comments.find({post:objectIdOfPost}).sort({created:-1}).limit(3)
So for displaying a post and all (or some) of its comments including the usernames and pictures we are at two queries. More than you needed before, but we circumvented the size limit and basically you can have an indefinite number of comments for every post. But let's get to something real
Getting the latest 5 posts and their latest 3 comments
This is a two step process. However, with proper indexing (will come back to that later) this still should be fast (and hence resource saving):
var posts = db.posts.find().sort({created:-1}).limit(5)
posts.forEach(
    function(post) {
        doSomethingWith(post);
        // sort() takes a document, so the sort spec needs braces
        var comments = db.comments.find({"post":post._id}).sort({"created":-1}).limit(3);
        doSomethingElseWith(comments);
    }
)
Get all posts of a given user sorted from newest to oldest and their comments
var posts = db.posts.find({"author.username": "HotGrrrl96"},{_id:1}).sort({"created":-1});
var postIds = [];
posts.forEach(
    function(post){
        postIds.push(post._id);
    }
)
var comments = db.comments.find({post: {$in: postIds}}).sort({post:1, created:-1});
Note that we have only two queries here. Although you need to "manually" make the connection between posts and their respective comments, that should be pretty straightforward.
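The "manual" connection between posts and comments could look roughly like the sketch below. It assumes both query results have already been loaded into arrays of plain objects, and that each comment's `post` field holds the `_id` of its post, as in the data model above. The helper name `attachComments` and the resulting `comments` property are made up for illustration.

```javascript
// Groups comments by the post they belong to and returns copies of the
// posts with a `comments` array attached. Ids are compared via String()
// so that both plain strings and ObjectId-like values work as map keys.
function attachComments(posts, comments) {
  const byPost = new Map();
  comments.forEach(function (comment) {
    const key = String(comment.post);
    if (!byPost.has(key)) {
      byPost.set(key, []);
    }
    byPost.get(key).push(comment);
  });
  return posts.map(function (post) {
    return Object.assign({}, post, {
      comments: byPost.get(String(post._id)) || []
    });
  });
}

// Usage with made-up data:
const examplePosts = [{ _id: 'p1', title: 'A post' }, { _id: 'p2', title: 'Another post' }];
const exampleComments = [
  { _id: 'c1', post: 'p1', comment: 'Awesome!' },
  { _id: 'c2', post: 'p1', comment: 'Agreed.' }
];
const postsWithComments = attachComments(examplePosts, exampleComments);
```

Since the comments are grouped in a single pass and their relative order is preserved, the sort order requested from the database carries over to each post's list.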
Change a username
This presumably is a rarely executed use case. However, it isn't very complicated with said data model.
First, we change the user document
db.users.update(
    { username: "HotGrrrl96"},
    {
        $set: { username: "Joe Cool"},
        $push: {oldUsernames: "HotGrrrl96" }
    },
    {
        writeConcern: {w: "majority"}
    }
);
We push the old username to an according array. This is a security measure in case something goes wrong with the following operations. Furthermore, we set the write concern to a rather high level in order to make sure the data is durable.
db.posts.update(
    { "author.username": "HotGrrrl96"},
    { $set:{ "author.username": "Joe Cool"} },
    {
        multi:true,
        writeConcern: {w:"majority"}
    }
)
Nothing special here. The update statement for the comments looks pretty much the same. While those queries take some time, they are rarely executed.
The indices
As a rule of thumb, one can say that MongoDB can only use one index per query. While this is not entirely true since there are index intersections, it is easy to deal with. Another thing is that individual fields in a compound index can be used independently. So an easy approach to index optimization is to find the query with the most fields used in operations which make use of indices and create a compound index of them. Note that the order of occurrence in the query matters. So, let's go ahead.
Posts
db.posts.createIndex({"author.username":1,"created":-1})
Comments
db.comments.createIndex({"post":1, "created":-1})
Conclusion
A fully embedded document per post admittedly is the fastest way of loading it and its comments. However, it does not scale well, and because of the possibly complex queries necessary to deal with it, this performance advantage may be diminished or even eliminated.
With the above solution, you trade some speed (if any!) for basically unlimited scalability and a much more straightforward way of dealing with the data.
Hth.
You are following the normalized data model approach. With this model, you have to write another query to get the user info. If you use embedded documents instead, every embedded copy of the user data must be updated whenever the user document changes.
http://docs.mongodb.org/v3.0/reference/database-references/ read this link for more information.

Database Design for "Likes" in a social network (MongoDB)

I'm building a photo/video sharing social network using MongoDB. The social network has a feed, profiles and a follower model. I basically followed a similar approach to this article for my "social feed" design. Specifically, I used the fan-out on write with bucket approach when users post stories.
My issue is when a user "likes" a story. I'm currently also using the fan-out on write approach that basically increments/decrements a story's "like count" for every user's feed. I think this might be a bad design since users "like" more frequently than they post. Users can quickly saturate the server by liking and unliking a popular post.
What design pattern do you guys recommend here? Should I use fan-out on read? Keep using fan-out on write with background workers? If the solution is "background workers", what approach do you recommend for them? I'm using Node.js.
Any help is appreciated!
Thanks,
Henri
I think the best approach is:
1. Increment and decrement a counter in your database to keep track of the number of likes.
2. Insert each like as a single document in a collection called 'like', where you track the id of the user who liked the story and the id of the liked story.
Then, if you just need the number of likes, you can read the counter, which is really fast. If instead you need to know who the likes were from, you query the 'like' collection by story id and get the ids of all users who liked the story.
The documents I am talking about in the 'like' collection look like this:
{_id: 'dfggsdjtsdgrhtd',
'story_id': 'ertyerdtyfret',
'user_id': 'sdrtyurertyuwert'}
You can store the counter in the story's document itself:
{
...
likes: 56
}
You can also keep track of the latest likes in your story's document (for example, only the last 1000, because MongoDB documents are limited to 16 MB and, if your application scales, you will run into problems storing potentially unlimited data in a single document). With this approach you can easily query the 'like' collection and get the latest likes.
When someone unlikes a story, you can simply remove the like document from the 'like' collection. A better approach (e.g. if you send a notification when someone's story is liked) is to record in that document that it was unliked, so that if the same user likes the story again you can see that the like was already inserted and you won't send another notification.
Example:
First-time insert:
{_id: 'dfggsdjtsdgrhtd',
'story_id': 'ertyerdtyfret',
'user_id': 'sdrtyurertyuwert',
active: true}
When unliked, update it to this:
{_id: 'dfggsdjtsdgrhtd',
'story_id': 'ertyerdtyfret',
'user_id': 'sdrtyurertyuwert',
active: false}
Whenever a like is added, check whether a document with the same story id and user id already exists. If it does and active is false, the user has already liked and unliked the story, so when it is liked again you don't resend the notification that was already sent.
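The counter, the per-like document and the active flag fit together roughly as in the following sketch. It is an in-memory stand-in only: two Maps play the role of the 'like' collection and the counter stored on the story, and the names `createLikeStore`, `like`, `unlike` and `count` are hypothetical. In a real deployment each step would be a database write, and the existence check and the counter update would need to be made safe against concurrent requests.

```javascript
// In-memory sketch of the like/unlike bookkeeping described above.
function createLikeStore() {
  const likes = new Map();    // "storyId:userId" -> { story_id, user_id, active }
  const counters = new Map(); // storyId -> current number of active likes

  function like(storyId, userId) {
    const key = storyId + ':' + userId;
    const existing = likes.get(key);
    if (existing && existing.active) {
      return { changed: false, notify: false }; // already liked
    }
    if (existing) {
      existing.active = true; // re-like after an unlike
    } else {
      likes.set(key, { story_id: storyId, user_id: userId, active: true });
    }
    counters.set(storyId, (counters.get(storyId) || 0) + 1);
    // Notify only the first time this user likes this story.
    return { changed: true, notify: !existing };
  }

  function unlike(storyId, userId) {
    const existing = likes.get(storyId + ':' + userId);
    if (!existing || !existing.active) {
      return { changed: false };
    }
    existing.active = false; // keep the document, just mark it inactive
    counters.set(storyId, counters.get(storyId) - 1);
    return { changed: true };
  }

  function count(storyId) {
    return counters.get(storyId) || 0;
  }

  return { like: like, unlike: unlike, count: count };
}

// Usage:
const exampleStore = createLikeStore();
const firstLike = exampleStore.like('story1', 'user1');  // first like: notify
exampleStore.unlike('story1', 'user1');
const secondLike = exampleStore.like('story1', 'user1'); // re-like: no notification
```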
