I am working on a Node.js + MongoDB application, and my question is about the design of my database and queries. First, let me describe my database structure. It has two collections, users and values. A document in the users collection looks like this:
{
"_id" : 31450861,
"first_name" : "Jon",
"last_name" : "Doe",
"sex" : 2,
"bdate" : ISODate("1981-08-01T21:00:00Z"),
"city" : {
"id" : 282,
"title" : "Minsk"
},
"country" : {
"id" : 3,
"title" : "Belarussia"
},
"photo_max" : "https://foto.ru/RnhOKp2YJE4.jpg",
"relation" : 4,
"interests" : "science",
"music" : "pop",
"lang" : "RU",
}
This collection is filled with user data (language, birthday, etc.).
Documents in the values collection look like this:
{
"_id" : ObjectId("548023e5c16f310c23b75075"),
"userId" : "31450861", //reference to _id in users collection
"group1" : {
"value1" : 13,
"value2" : 7,
"value3" : 3,
"value4" : 14,
"value5" : 17
},
"group2" : {
"value1" : 9,
"value2" : 6,
"value3" : 17,
"value4" : 12,
"value5" : 13
}
}
So I need to search for users in this database using many parameters from these two collections (their values from the values collection, sex, city, language, etc.). I didn't embed the values documents into users because I run a lot of queries on them separately (but maybe that is the wrong design anyway; I need help with this). In the future there will be more collections structured like values (or at least containing a reference to userId) that I'll have to include in the search query, so I need the flexibility to extend my query to more collections.
So I need to run a complex query across these collections (I know there are no JOINs in MongoDB, so I have to query twice or use map-reduce).
Here are my thoughts on this so far (not tested in code, just thoughts).
I need to write a search function that performs 2 queries (and more in the future):
Find users with the same values and get their ids.
var values = db.collection('values');
var ids = values.find({ value1: 1, value2: 2, value3: 3 }, {userId: 1 } ) //then transform ids so it became array with userId
Then, within this set, find users by some more parameters (sex, birthday, language, etc.):
var users = db.collection('users');
users.find({ $and: [{ _id: { $in: ids } }, { sex: 2 }, { lang: "RU" }] });
My questions are:
1. Is this a normal approach, or will I end up with very slow performance and messy code when adding new collections and queries?
2. If it is normal, how can I easily add one more query against one more collection?
Any help, any thoughts are welcome! Thanks in advance!
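One possible shape for the two-step search described above, as a minimal sketch rather than a tested design. It assumes the official Node.js driver's promise-based find(...).toArray(); the helper name searchUsers and its steps parameter are made up for illustration. Two details from the sample documents matter: the values live under group1/group2, so filters need dot notation, and userId is stored as a string while users._id is a number, so the ids are converted before the $in.

```javascript
// Hypothetical helper: narrow down userIds across any number of side
// collections, then apply the remaining filters on `users`.
async function searchUsers(db, steps, userFilter) {
  let ids = null; // null means "no restriction yet"
  for (const { collection, filter } of steps) {
    // After the first step, only look at users that survived earlier steps.
    const query = ids === null ? filter : { ...filter, userId: { $in: ids } };
    const docs = await db.collection(collection)
      .find(query, { projection: { userId: 1, _id: 0 } })
      .toArray();
    ids = docs.map((d) => d.userId);
    if (ids.length === 0) return []; // nothing can match any more
  }
  const query = { ...userFilter };
  if (ids !== null) {
    // userId is a string in `values`, _id is a number in `users` (per the samples).
    query._id = { $in: ids.map(Number) };
  }
  return db.collection('users').find(query).toArray();
}
```

A call could look like searchUsers(db, [{ collection: 'values', filter: { 'group1.value1': 13 } }], { sex: 2, lang: 'RU' }). Adding another collection then means adding one more entry to the steps array, provided that collection also carries a userId field.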
Related
I have two different collections with the same data structure; the only difference is the collection name. How can I search across these two collections based on dateReceived, as shown in the documents below? We create a new collection each month, and I want to search across two months, for example from 2022-10-15 to 2022-11-15, which covers the 2 collections report_202210 and report_202211.
First Collection Name: report_202210
{
"_id" : ObjectId("6392a26e4e3209b2bb23a250"),
"customerId" : "CUST0002",
"content" : "This is a test message",
"dateReceived" : "2022-10-22",
"sentStatus" : "Passed"
}
Second Collection Name: report_202211
{
"_id" : ObjectId("6392a26e4e3209b2bb23a198"),
"customerId" : "CUST0003",
"content" : "This is a test message",
"dateReceived" : "2022-11-11",
"sentStatus" : "Passed"
}
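A minimal sketch of one way to approach this, assuming MongoDB 4.4 or later (where the $unionWith aggregation stage is available) and assuming the range start is not after the range end. The helper names collectionsForRange and buildRangeSearch are hypothetical. Because dateReceived is stored as a 'YYYY-MM-DD' string, plain string comparison with $gte/$lte orders the dates correctly.

```javascript
// Hypothetical helper: list the monthly collections touched by [from, to].
// Dates are 'YYYY-MM-DD' strings, matching the stored dateReceived format.
function collectionsForRange(from, to) {
  const names = [];
  let year = Number(from.slice(0, 4));
  let month = Number(from.slice(5, 7));
  const endYear = Number(to.slice(0, 4));
  const endMonth = Number(to.slice(5, 7));
  while (year < endYear || (year === endYear && month <= endMonth)) {
    names.push('report_' + year + String(month).padStart(2, '0'));
    month += 1;
    if (month > 12) { month = 1; year += 1; }
  }
  return names;
}

// Build one pipeline to run on the first collection;
// the remaining collections are pulled in via $unionWith.
function buildRangeSearch(from, to) {
  const [first, ...rest] = collectionsForRange(from, to);
  const match = { $match: { dateReceived: { $gte: from, $lte: to } } };
  const pipeline = [match];
  for (const coll of rest) {
    pipeline.push({ $unionWith: { coll: coll, pipeline: [match] } });
  }
  return { collection: first, pipeline: pipeline };
}
```

With the result in a variable plan, the search would run as db.collection(plan.collection).aggregate(plan.pipeline). On older servers, running the same $match against each collection name and concatenating the results in application code is the fallback.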
I am trying to count the number of models in a collection based on a property:
I have an upvote model, that has: post (objectId) and a few other properties.
First, is this good design? Posts could get many upvotes, so I didn’t want to store them in the Post model.
Regardless, I want to count the number of upvotes on posts with a specific property with the following and it’s not working. Any suggestions?
upvote.count({'post.specialProperty': mongoose.Types.ObjectId("id"), function (err, count) {
console.log(count);
});
Post Schema Design
Regarding design, I would structure the documents in the posts collection like this:
{
"_id" : ObjectId(),
"property1" : "some value",
"property2" : "some value",
"voteCount" : 1,
"votes": [
{
"voter": ObjectId(), // voter id
other properties...
}
]
}
You will have an array that will hold objects that can contain info such as voter id and other properties.
Updating
When a post is updated, you can simply increment or decrement voteCount accordingly. You can increment by 1 like this:
db.posts.update(
{"_id" : postId},
{
$inc: { voteCount: 1},
$push : {
"votes" : {"voter":ObjectId, "otherproperty": "some value"}
}
}
)
The $inc modifier can be used to change the value of an existing key or to create a new key if it does not already exist. It's very useful for updating votes.
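The decrement case is not spelled out above, so here is a hedged sketch of both directions. The builder names addVote and removeVote are hypothetical; they only produce the filter and update documents that would be passed to db.posts.update(filter, update). The extra condition on votes.voter is an assumption about the desired behaviour: it keeps one voter from being counted twice and keeps voteCount from dropping for someone who never voted.

```javascript
// Hypothetical builders for the filter/update pairs.
function addVote(postId, voterId) {
  return {
    // Only match the post if this voter is not already in the votes array.
    filter: { _id: postId, 'votes.voter': { $ne: voterId } },
    update: { $inc: { voteCount: 1 }, $push: { votes: { voter: voterId } } },
  };
}

function removeVote(postId, voterId) {
  return {
    // Only match the post if this voter has actually voted.
    filter: { _id: postId, 'votes.voter': voterId },
    update: { $inc: { voteCount: -1 }, $pull: { votes: { voter: voterId } } },
  };
}
```

Because the counter and the array change in a single update on a single document, they cannot drift apart between the two operations.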
Totaling votes of particular Post Criteria
If you want to total the amount for posts fitting a certain criteria, you must use the Aggregation Framework.
You can get the total like this:
db.posts.aggregate(
[
{
$match : {property1: "some value"}
},
{
$group : {
_id : null,
totalNumberOfVotes : {$sum : "$voteCount" }
}
}
]
)
I have 2 schemas:
var UserSchema = new Schema({
login: {type:String, unique: true, trim:true, index:true},
password: String
});
var MessageSchema = new Schema({
text: String,
from_id: {type:mongoose.Schema.ObjectId, ref: 'user', index:true},
to_id: {type:mongoose.Schema.ObjectId, ref: 'user', index:true},
date: Number
});
Now, for example, I have 4 users in the database. Assume user1 sent 5 messages to each of the other users, and user2 and user3 each replied to user1's message. Now I need to return only the most recent message from each dialog. How can I do this?
One excellent way to do this would be by using MongoDB's aggregation framework.
I'm not 100% clear on your specific use case, but I would think of using $unwind to get all of the messages in the same place, then perhaps a $setUnion for uniqueness, and then a $sort and a $limit if you want it to display only the most recent messages.
Here's a link to the quick reference: http://docs.mongodb.org/manual/meta/aggregation-quick-reference/
Hope this helps, and good luck!
Everything done here is in the mongo console; you will have to translate it to your driver yourself.
The easy way
If you want to query for specific conversations, add a conversation_id to your messages (maybe the usernames combined, or just a random string). Then you can aggregate with
db.messages.aggregate([{$sort:{date:-1}},
{$group:{_id:"$conversation_id", message:{$first:"$$ROOT"}}}])
Or just embed your messages in a conversation document, like David K. seems to have thought you were already doing.
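For the "usernames combined" variant, the id has to come out the same no matter who is the sender and who is the receiver. A minimal sketch, with a hypothetical helper name and the assumption that user ids never contain the ':' separator:

```javascript
// Hypothetical helper: same id regardless of message direction.
function conversationId(userA, userB) {
  // Sort the two ids so (a, b) and (b, a) produce the same string.
  return [String(userA), String(userB)].sort().join(':');
}
```

The value would be computed once when a message is saved and stored in its conversation_id field, so the two-stage aggregate can group on it directly.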
The "I want this to work" way
I built an aggregate that does what you want, but you may want to go with the easy way after you see this (maybe someone can think of a shorter one).
First the collection modeled after your schema:
> db.chat.find().pretty()
{ "from" : 1,
"to" : 2,
"text" : "First from 1 to 2",
"date" : 1}
{ "from" : 2,
"to" : 1,
"text" : "First answer from 2 to 1",
"date" : 2}
{ "from" : 1,
"to" : 2,
"text" : "Second from 1 to 2",
"date" : 3}
{ "from" : 1,
"to" : 3,
"text" : "First 1 to 3",
"date" : 3}
{ "from" : 3,
"to" : 1,
"text" : "Reply 3 to 1",
"date" : 4}
_id stripped for readability. As you can see, there are two conversations, one between users 1 and 2 and one between users 1 and 3.
Now the aggregate query:
db.chat.aggregate({$project: {to:1,from:1,date:1,text:1,duplicator:{$const: [0,1]}}},
{$unwind:"$duplicator"},
{$group:{_id:"$_id",
parts:{$push:{$cond:[{$eq:["$duplicator",0]},"$from","$to"]}},
messages:{$addToSet: {text:"$text",date:"$date"}}}},
{$unwind:"$parts"},
{$sort:{parts:1}},
{$group:{_id:"$_id",
parts:{$push:"$parts"},
messages:{$first:"$messages"}}},
{$sort:{"messages.date":-1}},
{$group:{_id:"$parts", last_message:{$first:"$messages"}}})
Output:
{ "_id" : [1,2],
"last_message" : [{ "text" : "Second from 1 to 2",
"date" : 3 }]
}
{ "_id" : [1,3],
"last_message" : [{"text" : "Reply 3 to 1",
"date" : 4}]
}
The last two parts ($sort and $group) are the same as in the easy way. The rest builds a conversation_id, named parts (for participants). This is done by:
1. Adding a new array, duplicator
2. Unwinding that array to get a duplicate of each message
3. Pushing to and from into an array parts for each message; parts will work as a conversation_id
Those steps above are inspired by this answer.
4. Unwinding parts again
5. Sorting with $sort, because [1,2] != [2,1]
6. Grouping with $group into one array again
Those steps result in a conversation id, which can then be used as proposed in the two-line aggregate above.
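A possibly shorter variant, offered as a sketch under assumptions rather than a verified replacement: on servers that accept array literals as aggregation expressions (MongoDB 3.2 and later, as far as I know), the ordered pair can be built directly with $cond and $lt. The name lastMessagePipeline is made up, and the pipeline has not been run against the chat collection above. The plain JavaScript function below it applies the same logic in memory, which makes the expected result easy to check on the sample data.

```javascript
// Pipeline sketch: newest first, then group on the ordered [smaller, larger] pair.
const lastMessagePipeline = [
  { $sort: { date: -1 } },
  { $group: {
      _id: { $cond: [{ $lt: ['$from', '$to'] }, ['$from', '$to'], ['$to', '$from']] },
      last_message: { $first: '$$ROOT' },
  } },
];

// Same logic in plain JavaScript, for checking expectations on sample data.
function lastMessages(messages) {
  const byPair = new Map();
  const sorted = [...messages].sort((a, b) => b.date - a.date);
  for (const m of sorted) {
    const pair = m.from < m.to ? [m.from, m.to] : [m.to, m.from];
    const key = JSON.stringify(pair);
    // The first message seen for a pair is its newest one.
    if (!byPair.has(key)) byPair.set(key, { _id: pair, last_message: m });
  }
  return [...byPair.values()];
}
```

Unlike the longer aggregate, last_message here is the whole message document rather than a one-element array of text and date.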
The lesson
Build your schema according to the way you want to query it, because forcing MongoDB through strange aggregates is only fun until it comes back to bite you...
The "you formulated your question wrong"
If you just want the last message of a specific conversation you could use
db.chat.find({to:{$in:[ user1,user2]},from:{$in:[user1,user2]}})
.sort({date:-1})
.limit(1)
I have created a Node.js module that can already query MongoDB for a set of documents using a find and output those results as JSON. My question is: knowing that Node.js is asynchronous, how can I use the results of this query (items) to build a second query that goes back to MongoDB for another set of documents? The first query basically returns a list of employee ids that can be used to query documents containing information on those employees (e.g. firstName, lastName, etc.), and I then want to output those results as JSON instead. The first query is basically saying: give me all of the employees that can be viewed by a particular user. I then need to take the employee ids and run a query on another set of documents that contains those individuals' information, as you see below.
Here are the two documents schema:
Employee
{
"_id" : ObjectId("5208db78ecc00915e0900699"),
"clientId" : 1,
"employeeId" : "12345",
"lastName" : "DOE",
"firstName" : "JOHN",
"middleName" : "A",
"badge" : "8675309",
"birthDate" : "10/12/1978"
}
Users an employee can access (User Cache)
{
"_id" : ObjectId("520920a99bc417b7c5e36abf"),
"clientSystem" : "SystemX",
"customerNumber" : "1",
"clientUserId" : "jdoe3",
"securityCode" : "authorize",
"employeeId" : "12345",
"creationDate" : "2013-Aug-12 13:51:37"
}
Here is my code:
exports.employeeList = function(req, res) {
console.log(req.params);
var clientSystem = req.query["clientSystem"];
var clientUserId = req.query["clientUserId"];
var customerNumber = req.query["customerNumber"];
var securityCode = req.query["securityCode"];
if (clientSystem != null && clientUserId != null && customerNumber != null && securityCode != null){
db.collection('ExtEmployeeList', function(err, collection){
collection.find({'clientSystem': clientSystem, 'clientUserId':clientUserId, 'customerNumber':customerNumber, 'securityCode': securityCode}).toArray(function (err, items){
console.log(items);
res.jsonp(items);
});//close find
});//close collection
}//close if
else {
res.send(400);
}//close else
};//close function
What you're wanting to do is possible, but probably not the most effective use of Mongo. I tend to design Mongo documents around how the data will actually be used. So if I needed the user's names to show up in a list of users I can view, I would embed that data so I don't have to do multiple round trips to mongo to get all the information I need. I would do something like the following:
{
"_id" : ObjectId("520920a99bc417b7c5e36abf"),
"clientSystem" : "SystemX",
"customerNumber" : "1",
"clientUserId" : "jdoe3",
"securityCode" : "authorize",
"employeeId" : "12345",
"creationDate" : "2013-Aug-12 13:51:37",
"employee": {
"_id" : ObjectId("5208db78ecc00915e0900699"),
"clientId" : 1,
"employeeId" : "12345",
"lastName" : "DOE",
"firstName" : "JOHN",
"middleName" : "A",
"badge" : "8675309",
"birthDate" : "10/12/1978"
}
}
Yes, you are duplicating data but you're dramatically reducing the number of round trips to the database. This is typically the tradeoff you make when using document based databases since you can't join tables.
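If duplicating the employee data is not acceptable, servers from MongoDB 3.2 onward offer the $lookup aggregation stage, which fetches the related documents in the same round trip. A minimal sketch: the builder name is hypothetical, and the collection name Employee is an assumption, since the question only names ExtEmployeeList.

```javascript
// Hypothetical builder; `employeeCollection` is an assumed collection name.
function buildEmployeeListPipeline(criteria, employeeCollection = 'Employee') {
  return [
    // Same filter the question passes to find().
    { $match: criteria },
    { $lookup: {
        from: employeeCollection,
        localField: 'employeeId',
        foreignField: 'employeeId',
        as: 'employee',
    } },
    // $lookup yields an array; unwind to one employee document per cache entry.
    { $unwind: '$employee' },
  ];
}
```

In the handler, collection.aggregate(buildEmployeeListPipeline({...})).toArray(callback) would take the place of the find call. Note that $unwind drops cache entries that have no matching employee document.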
I'm running a blog-style web application on AppFog (ex Nodester).
It's written in NodeJS + Express and uses Mongoose framework to persist to MongoDB.
MongoDB is version 1.8 and I don't know whether AppFog is going to upgrade it to 2.2 or not.
Why this intro? Well, now my "posts" are shown in a basic "paginated" visualization, I mean they're just picked up from mongo, sorted by date descending, a page at a time. Here's a snippet:
Post
.find({pubblicato:true})
.populate("commenti")
.sort("-dataInserimento")
.skip(offset)
.limit(archivePageSize)
.exec(function(err,docs) {
var result = {};
result.postsArray = (!err) ? docs : [];
result.currentPage = currentPage;
result.pages = howManyPages;
cb(null, result);
});
Now, my goal is to GROUP BY 'dataInserimento' and show posts like a "diary", I mean:
1st page => 2012/10/08: I show 3 posts
2nd page => 2012/10/10: I show 2 posts (2012/10/09 has no posts, so I don't allow a white page)
3rd page => 2012/10/11: 35 posts and so on...
My idea is to get first the list of all dates with grouping (and maybe counting posts for each day) then build the pages link and, when a page (date) is visited, query like above, adding date as parameter.
SOLUTIONS:
Aggregation framework would be perfect for that, but I can't get my hands on that version of Mongo, now
Using .group() in some way, but the idea it doesn't work in sharded environments does NOT excite me! :-(
writing a MAP-REDUCE! I think this is the right way to go but I can't imagine how map() and reduce() should be written.
Can you help me with a little example, please?
Thanks
EDIT :
The answer of peshkira is correct, however, I don't know if I need exactly that.
I mean, I will have URLs like /archive/2012/10/01, /archive/2012/09/20, and so on.
On each page, it's enough to have the date to query for posts. But then I have to show "NEXT" or "PREV" links, so I need to know the next or previous day containing posts, if any. Maybe I can just query for posts with dates greater or smaller than the current one, and take the first one's date?
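The idea in the last sentence can be sketched as two small query descriptions. This assumes dataInserimento is stored as a Date and that days are cut at UTC midnight; the helper names dayBounds and neighbourQueries are hypothetical.

```javascript
// Hypothetical helper: UTC day boundaries for the archive page being shown.
function dayBounds(year, month, day) {
  const start = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC rolls over month and year ends, so day + 1 is safe.
  const end = new Date(Date.UTC(year, month - 1, day + 1));
  return { start: start, end: end };
}

// Query descriptions for the neighbouring days that contain posts.
function neighbourQueries(year, month, day) {
  const bounds = dayBounds(year, month, day);
  return {
    // Newest post strictly before the current day.
    prev: { filter: { pubblicato: true, dataInserimento: { $lt: bounds.start } }, sort: '-dataInserimento' },
    // Oldest post on or after the following midnight.
    next: { filter: { pubblicato: true, dataInserimento: { $gte: bounds.end } }, sort: 'dataInserimento' },
  };
}
```

With Mongoose, each description maps to something like Post.findOne(q.next.filter).sort(q.next.sort).exec(cb). The date of the returned post gives the target of the link, and a null result means there is no such link to show.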
Assuming you have something similar to:
{
"author" : "john doe",
"title" : "Post 1",
"article" : "test",
"created" : ISODate("2012-02-17T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 2",
"article" : "foo",
"created" : ISODate("2012-02-17T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 3",
"article" : "bar",
"created" : ISODate("2012-02-18T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 4",
"article" : "foo bar",
"created" : ISODate("2012-02-20T00:00:00Z")
}
{
"author" : "john doe",
"title" : "Post 5",
"article" : "lol cat",
"created" : ISODate("2012-02-20T00:00:00Z")
}
then you can use map reduce as follows:
Map
It just emits the date as key and the post title. You can change the title to the _id, which will probably be more useful to you. If you store the time of the date you will want to use only the date (without time) as the key, otherwise mongo will group by date time and not only date. In my test case all posts have the same time 00:00:00 so it does not matter.
function map() {
emit(this.created, this.title);
}
Reduce
It does nothing more than push all values for a key into an array, which is then wrapped in a result object, because mongo does not allow arrays to be the result of a reduce function.
function reduce(key, values) {
var array = [];
var res = {posts:array};
values.forEach(function (v) {res.posts.push(v);});
return res;
}
Execute
Using db.runCommand({mapreduce: "posts", map: map, reduce: reduce, out: {inline: 1}}) will output the following result:
{
"results" : [
{
"_id" : ISODate("2012-02-17T00:00:00Z"),
"value" : {
"posts" : [
"Post 2",
"Post 1"
]
}
},
{
"_id" : ISODate("2012-02-18T00:00:00Z"),
"value" : "Post 3"
},
{
"_id" : ISODate("2012-02-20T00:00:00Z"),
"value" : {
"posts" : [
"Post 5",
"Post 4"
]
}
}
],
...
}
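Note that in the output above the 18th comes back as a bare string while the other days come back as objects: reduce is not called for a key with a single value. MongoDB may also call reduce again on its own earlier output. A common way to handle both, sketched here under assumptions, is to emit values that already have the shape reduce returns. The sketch also truncates the key to the day, as suggested in the Map section. The function names and the local runner are hypothetical; in the shell, the map function would use this and the global emit instead of parameters.

```javascript
// Truncate a Date to midnight UTC so all posts of one day share a key.
function dayKey(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Emits the same {posts: [...]} shape that reduce returns.
function mapPost(post, emit) {
  emit(dayKey(post.created), { posts: [post.title] });
}

// Input values and result share one shape, so re-reducing is safe.
function reducePosts(key, values) {
  var posts = [];
  values.forEach(function (v) { posts = posts.concat(v.posts); });
  return { posts: posts };
}

// Tiny local stand-in for the server, only to show the resulting shape.
function runLocally(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapPost(doc, function (key, value) {
      var k = key.toISOString();
      (groups[k] = groups[k] || []).push(value);
    });
  });
  return Object.keys(groups).sort().map(function (k) {
    var vals = groups[k];
    // Like the server, skip reduce when a key has a single value.
    return { _id: k, value: vals.length === 1 ? vals[0] : reducePosts(k, vals) };
  });
}
```

With this shape, every day has a value.posts array, including days with only one post.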
I hope this helps