What is the best strategy for grouping linked users - node.js

There are people in a city. A city is represented by a mongodb collection called "cities".
The people in the city are either alone or walking together with another person in the same city.
The schema is:
{
name: String,
people: [
{
name: String,
status?: String,
walkingWith?: String
}
]
}
Fields "status" and "walkingWith" are the ones I would like to use, if my strategy is correct.
Here are some entries:
var newyorkPeople = [];
newyorkPeople[0] = {"name": "Jack", "status": "alone", "walkingWith": "none"};
newyorkPeople[1] = {"name": "James", "status": "meeting", "walkingWith": "Maria"};
newyorkPeople[2] = {"name": "Robert", "status": "meeting", "walkingWith": "Nina"};
newyorkPeople[3] = {"name": "Steven", "status": "alone", "walkingWith": "none"};
newyorkPeople[4] = {"name": "Maria", "status": "meeting", "walkingWith": "James"};
newyorkPeople[5] = {"name": "Nina", "status": "meeting", "walkingWith": "Robert"};
I then enter a new city with people in it:
db.cities.insert({"name": "New York", "people": newyorkPeople});
Now, the goal is to make it easy for a client (frontend) to describe which people are in this city, and to group them.
First show all the lone people, and then the "couples" that are walking together.
I'm not sure whether the grouping is better done in the backend or in the frontend (Angular).
In the backend (API) I'm using express.js. The API could just return the whole city document to the frontend, and the frontend would then be responsible for sorting/grouping the people.
In that case, the strategy I'm thinking about would be:
Loop through the people and only print the lone people. Those that are walking with somebody should go in another array.
So the first step, to show all the lone people, is accomplished.
Now, I still need to show couples. First I need to show the couple "James and Maria" and then the couple "Robert and Nina".
Should I create an array for each couple? In the example above, it should create 2 arrays.
However, I'm not sure this is the best strategy. I'm fine with modifying the db schema, or even letting the backend deliver the grouped people, if somebody can come up with a good suggestion.

You can use the following schema (a simplified version of yours), with one document per person:
{
name:String, //name of the person
city:String, //name of the city
status:String, //status
walkingWith:String //name of the person walking with
}
The benefit of using this schema is, it can make your queries easier.
Let's query your need.
1- all people in a city
db.city.aggregate([
{$group:{_id:"$city", people:{$push:"$name"}}}
])
2- all people in a city alone
db.city.aggregate([
{$match:{status:"alone"}},
{$group:{_id:"$city", people:{$push:"$name"}}}
])
3- all people in a city meeting with someone
db.city.aggregate([
{$match:{status:"meeting"}},
{$group:{_id:"$city", people:{$push:{name:"$name", walkingWith:"$walkingWith"}}}}
])
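The third query returns each person who is meeting someone, so every couple appears twice (once per partner). Here is a minimal sketch of how the client or the Express handler could collapse that into lone people plus one entry per couple. It assumes every "meeting" person's walkingWith names a partner who points back at them; groupPeople is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: split a city's people into lone people and couples.
// Assumes each "meeting" person names a partner who names them back.
function groupPeople(people) {
  const alone = [];
  const couples = [];
  const paired = new Set(); // names already placed in a couple

  for (const person of people) {
    if (person.status === "alone") {
      alone.push(person.name);
    } else if (!paired.has(person.name)) {
      // Record the couple once, when its first member is encountered.
      couples.push([person.name, person.walkingWith]);
      paired.add(person.name);
      paired.add(person.walkingWith);
    }
  }
  return { alone, couples };
}

// Sample data taken from the question.
const newyorkPeople = [
  { name: "Jack", status: "alone", walkingWith: "none" },
  { name: "James", status: "meeting", walkingWith: "Maria" },
  { name: "Robert", status: "meeting", walkingWith: "Nina" },
  { name: "Steven", status: "alone", walkingWith: "none" },
  { name: "Maria", status: "meeting", walkingWith: "James" },
  { name: "Nina", status: "meeting", walkingWith: "Robert" },
];

const grouped = groupPeople(newyorkPeople);
console.log(grouped.alone);   // [ 'Jack', 'Steven' ]
console.log(grouped.couples); // [ [ 'James', 'Maria' ], [ 'Robert', 'Nina' ] ]
```

The same function works on the embedded people array from the question or on flat per-person documents, since it only reads name, status and walkingWith.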

Related

Best practices for structuring hierarchical/classified data in mongodb

Summary:
I am building my first large-scale full-stack application (MERN stack) that tries to mimic a large clothing store. Each article of clothing has many 'tags' that represent its features: top/bottom/accessory/shoes/etc., and subcategories (for example, under top there is shirt/outerwear/sweatshirt/etc.), and sub-subcategories within those (for example, under shirt there is blouse/t-shirt/etc.). Each article also has tags for primary colors, hemline, pockets, technical features; the list goes on.
Main question:
How should I best organize the data in MongoDB with Mongoose schemas so that it is quickly searchable when I plan on having 50,000 or more articles? And, genuinely curious, how do large clothing retailers typically design databases to be easily searchable by customers when items have so many identifying features?
Things I have tried or thought of:
On the MongoDB website there is a recommendation to use a tree structure with child references. Here is the link: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-child-references/ I like this idea, but I read here: https://developer.mongodb.com/article/mongodb-schema-design-best-practices/ that when storing more than a few thousand pieces of data, using ObjectID references is no longer sufficient and could create issues because of data limits.
Further, each clothing item would fall into many different parts of the tree. For example it could be a blouse so it would be in the blouse "leaf" of the tree, and then if its blue, it would be in the blue "leaf" of the tree, and if it is sustainably sourced, it would fall into that "leaf" of the tree as well. Considering this, a tree like data structure seems not the right way to go. It would be storing the same ObjectID in many different leaves.
My other idea was to store the article information (description, price, and picture) separate from the tagging/hierarchical information. Then each tagging object would have an ObjectID reference to the item. This way I could take advantage of the populate method of Mongoose if I wanted to collect that information.
I also created part of the large tree structure as a proof of concept for a design idea I had. This is only for the front end right now, but it also makes for bad lookups, because they would look like taxonomy[0].options[0].options[0].options[0].title to get to 'blouse', which, from what I learned in my classes, doesn't seem like a good way to make the code readable. This is only a snippet of a long, long branching object. I was going to try to make this a Mongoose schema, but it's a lot of work and I want to make sure that I do it well.
const taxonomy = [
{
title: 'Category',
selected: false,
options: [
{
title: 'top',
selected: false,
options: [
{
title: 'Shirt',
selected: false,
options: [
{
title: 'Blouse',
selected: false,
},
{
title: 'polo',
selected: false,
},
{
title: 'button down',
selected: false,
},
],
},
{
title: 'T-Shirt',
selected: false,
},
{
title: 'Sweater',
selected: false,
},
{
title: 'Sweatshirt and hoodie',
selected: false,
},
],
},
Moving forward:
I am not looking for a perfect answer, but I am sure that someone has tackled this issue before (all big businesses that sell lots of categorized products have). If someone could just point me in the right direction, for example, give me some terms to google, some articles to read, or some videos to watch, that would be great.
Thank you for any direction you can provide.
MongoDB is a document based database. Each record in a collection is a document, and every document should be self-contained (it should contain all information that you need inside it).
The best practice would be to create one collection for each logical whole that you can think of. This is the best practice when you have documents with a lot of data, because it is scalable.
For example, you should create Collections for: Products, Subproducts, Categories, Items, Providers, Discounts...
Now, when you are creating schemas, instead of creating a nested structure, you can just store a reference to a document in one collection as a property of a document in another collection.
NOTE: The maximum document size is 16 megabytes.
BAD PRACTICE
Let us first see what would be the bad practice. Consider this structure:
Product = {
"name": "Product_name",
"sub_products": [{
"sub_product_name": "Subproduct_name_1",
"sub_product_description": "Description",
"items": [{
"item_name": "item_name_1",
"item_description": "Description",
"discounts": [{
"discount_name": "Discount_1",
"percentage": 25
}]
},
{
"item_name": "item_name_2",
"item_description": "Description",
"discounts": [{
"discount_name": "Discount_1",
"percentage": 25
},
{
"discount_name": "Discount_2",
"percentage": 50
}]
},
]
},
...
]
}
Here, the product document has a sub_products property, which is an array of sub_products. Each sub_product has items, and each item has discounts. Because everything is nested inside one document, that document keeps growing with every sub_product, item and discount, and can quickly approach the maximum document size.
GOOD PRACTICE
Consider this structure:
Product = {
"name": "Product_name",
"sub_products": [
'sub_product_1_id',
'sub_product_2_id',
'sub_product_3_id',
'sub_product_4_id',
'sub_product_5_id',
...
]
}
Subproduct = {
"id": "sub_product_1_id",
"sub_product_name": "Subproduct_name",
"sub_product_description": "Description",
"items": [
'item_1_id',
'item_2_id',
'item_3_id',
'item_4_id',
'item_5_id',
...
]
}
Item = {
"id": "item_1_id",
"item_name": "item_name_1",
"item_description": "Description",
"discounts": [
'discount_1_id',
'discount_2_id',
'discount_3_id',
'discount_4_id',
'discount_5_id',
...
]
}
Discount = {
"id": "discount_1_id",
"discount_name": "Discount_1",
"percentage": 25
}
Now you have a collection for each logical whole, and you are just storing a reference to a document in one collection as a property of a document in another collection.
Now you can use one of the best features of Mongoose, called population. If you store a reference to a document in one collection as a property of a document in another collection (declared with ref in the schema), then when you query the database and call populate(), Mongoose will replace the references with the actual documents.
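To make the idea concrete without a database, here is a rough in-memory sketch of what replacing references with documents amounts to. It is not how Mongoose is implemented; the Map stands in for the Subproduct collection, the data is made up, and resolveRefs is a hypothetical helper name.

```javascript
// In-memory stand-in for the Subproduct collection, keyed by id (made-up data).
const subproducts = new Map([
  ["sub_product_1_id", { id: "sub_product_1_id", sub_product_name: "Subproduct 1" }],
  ["sub_product_2_id", { id: "sub_product_2_id", sub_product_name: "Subproduct 2" }],
]);

// A product that only stores references, as in the GOOD PRACTICE structure.
const product = {
  name: "Product_name",
  sub_products: ["sub_product_1_id", "sub_product_2_id"],
};

// Hypothetical helper: return a copy of doc in which the ids stored in
// `field` are replaced by the documents they point to. Unknown ids are dropped.
function resolveRefs(doc, field, collection) {
  const resolved = doc[field]
    .map((id) => collection.get(id))
    .filter((found) => found !== undefined);
  return { ...doc, [field]: resolved };
}

const populated = resolveRefs(product, "sub_products", subproducts);
console.log(populated.sub_products.map((s) => s.sub_product_name));
// [ 'Subproduct 1', 'Subproduct 2' ]
```

The original product is left untouched, so the stored document stays small while the resolved copy is built only when it is needed.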

How to sort data in mongodb - best practice

I'm rather new to working with MongoDB.
In my application, the user can create to-do-lists. I save the data of these to-do-lists to my database using node.js with express framework and mongoose (with a ReactJS front-end), however, the user is supposed to be able to create several to-do-lists and I'm not sure about how to best sort the data of these lists so I can always access the correct data in my corresponding to-do-list.
Let's say I have this schema:
var TodoSchema = new mongoose.Schema({
task: String,
prio: String,
updated_at: { type: Date, default: Date.now },
});
module.exports = mongoose.model("Todo", TodoSchema);
for my database called tododb.
I was first planning on creating a new collection for each new list, but in this question ( how to create a new collection automatically in mongodb ) it says that it would be much better to create one collection for all lists, however, I'm not sure about how you would filter out the correct data in this case.
I imagine that I'm not the first person to encounter this problem, so how is it done usually? What other options do I have besides collections? And how would I access exactly the data that I need?
Edit: I was also thinking about just adding an element called "name" or something similar, where the user could enter a name for the list, and when fetching the data I would iterate over all data and filter out the ones whose name matches. However, that seems terribly inefficient.
I'd model a todo list like the following:
{
"_id": "id of the todo list",
"name": "name of the todo list (e.g. daily tasks)",
"tasks" : [
{"name": "drink coffee", priority: 1, updated: "sometime" },
{"name": "write code", priority: 2, updated: "sometime" },
{"name": "drink tea", priority: 3, updated: "sometime" }
]
}
and then put them all in the same collection. If you need to split by user, just add a userId field to the todo list document.
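As a small sketch of how picking the right list then works, here is the same model held in a plain array. The ids, the userId values and the findList helper are made up for illustration; against a real collection the same two conditions would go into the query (for example a findOne on userId and name), so nothing has to be iterated in application code.

```javascript
// In-memory stand-in for a single collection holding every todo list (made-up data).
const todoLists = [
  {
    _id: "list1", userId: "u1", name: "daily tasks",
    tasks: [
      { name: "write code", priority: 2 },
      { name: "drink coffee", priority: 1 },
    ],
  },
  { _id: "list2", userId: "u1", name: "shopping", tasks: [{ name: "buy tea", priority: 1 }] },
  { _id: "list3", userId: "u2", name: "daily tasks", tasks: [] },
];

// Hypothetical helper: fetch one user's list by name, tasks sorted by priority.
function findList(lists, userId, name) {
  const list = lists.find((l) => l.userId === userId && l.name === name);
  if (!list) return null;
  // Copy before sorting so the stored document is not mutated.
  const tasks = [...list.tasks].sort((a, b) => a.priority - b.priority);
  return { ...list, tasks };
}

const daily = findList(todoLists, "u1", "daily tasks");
console.log(daily.tasks.map((t) => t.name)); // [ 'drink coffee', 'write code' ]
```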

Conditionally update an array in mongoose [duplicate]

Currently I am working on a mobile app. Basically, people can post their photos and followers can like the photos, as on Instagram. I use MongoDB as the database. As on Instagram, there might be a lot of likes for a single photo, so using a separate indexed document for each "like" does not seem reasonable because it would waste a lot of memory. However, I'd like a user to be able to add a like quickly. So my question is: how should I model the "like"? Basically the data model is very similar to Instagram's, but using MongoDB.
No matter how you structure your overall document there are basically two things you need. That is basically a property for a "count" and a "list" of those who have already posted their "like" in order to ensure there are no duplicates submitted. Here's a basic structure:
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"photo": "imagename.png",
"likeCount": 0,
"likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever information you want, but then the other fields as mentioned. The "likes" property here is an array, and that is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, either in local storage or OpenId or something, but a unique identifier. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
},
{
"$inc": { "likeCount": 1 },
"$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The important thing here is to keep a record of the users who voted, and to look at what is happening in the "query" part of the statement. Apart from selecting the document to update by its own unique "_id", the other important thing is to check the "likes" array to make sure the current voting user is not in there already.
The same is true for the reverse case or "removing" the "like":
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": ObjectId("54bb2244a3a0f26f885be2a4")
},
{
"$inc": { "likeCount": -1 },
"$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The main important thing here is the query conditions being used to make sure that no document is touched if all conditions are not met. So the count does not increase if the user had already voted or decrease if their vote was not actually present anymore at the time of the update.
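To see why the query conditions matter, here is a plain JavaScript sketch that mirrors the logic of the two updates on an in-memory object. It only illustrates the semantics: on the server the match and the modification happen as one atomic operation on the document, which this sketch does not reproduce. The like and unlike function names and the string user ids are stand-ins for illustration.

```javascript
// In-memory stand-in for one photo document.
const photo = { _id: "photo1", photo: "imagename.png", likeCount: 0, likes: [] };

// Mirrors the first update: only modify when the user is NOT in "likes" yet.
function like(doc, userId) {
  if (doc.likes.includes(userId)) return false; // query would not match
  doc.likes.push(userId); // $push
  doc.likeCount += 1;     // $inc by 1
  return true;
}

// Mirrors the second update: only modify when the user IS in "likes".
function unlike(doc, userId) {
  const index = doc.likes.indexOf(userId);
  if (index === -1) return false; // query would not match
  doc.likes.splice(index, 1); // $pull
  doc.likeCount -= 1;         // $inc by -1
  return true;
}

console.log(like(photo, "userA"));   // true
console.log(like(photo, "userA"));   // false, already liked
console.log(photo.likeCount);        // 1
console.log(unlike(photo, "userA")); // true
console.log(unlike(photo, "userA")); // false, nothing to remove
console.log(photo.likeCount);        // 0
```

Because each change is guarded by the matching condition, repeating a request never pushes the counter out of step with the array.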
Of course it is not practical to read an array with a couple of hundred entries in a document back in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
},
{
"photo": 1,
"likeCount": 1,
"likes": {
"$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
}
}
)
This usage of $elemMatch in projection will only return the current user if they are present or just a blank array where they are not. This allows the rest of your application logic to be aware if the current user has already placed a vote or not.
That is the basic technique and may work for you as is, but you should be aware that embedded arrays should not be extended indefinitely, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but it cannot be used on its own if you are expecting thousands of "like votes" on your content. There is a concept known as "bucketing", which is discussed in some detail in this example for Hybrid Schema design, that offers one solution for storing a high volume of "likes". You can look at that to use along with the basic concepts here as a way to do this at volume.
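As a very rough idea of what bucketing can look like, here is an in-memory sketch that spreads the likes for one photo over several small "bucket" documents instead of one ever-growing array. This is only one possible reading of the concept and not necessarily the design from the linked example; BUCKET_SIZE, the bucket shape and the helper names are all assumptions made for illustration.

```javascript
const BUCKET_SIZE = 3; // assumed capacity per bucket, kept tiny for the demo

// Add a like, opening a new bucket document when the existing ones are full.
// Returns false when the user has already liked the photo.
function addLike(buckets, photoId, userId) {
  const mine = buckets.filter((b) => b.photoId === photoId);
  if (mine.some((b) => b.users.includes(userId))) return false;

  let bucket = mine.find((b) => b.users.length < BUCKET_SIZE);
  if (!bucket) {
    bucket = { photoId, users: [] };
    buckets.push(bucket);
  }
  bucket.users.push(userId);
  return true;
}

// Total likes for a photo is the sum over its buckets.
function countLikes(buckets, photoId) {
  return buckets
    .filter((b) => b.photoId === photoId)
    .reduce((sum, b) => sum + b.users.length, 0);
}

const buckets = []; // stand-in for a separate "likes" collection
["u1", "u2", "u3", "u4"].forEach((u) => addLike(buckets, "photo1", u));
console.log(buckets.length);                 // 2
console.log(countLikes(buckets, "photo1"));  // 4
console.log(addLike(buckets, "photo1", "u2")); // false, duplicate
```

No single document grows past the bucket capacity, at the price of touching more than one document when checking for duplicates or counting.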

How to populate a non related field mongoose

Role documents:
{ "_id": "12345", "Name": "Developer" },
{ "_id": "67890", "Name": "Manager" }
Employee document:
{ "_id": "00000", "Name": "Jack", "Roles": [{ "_id": "12345" }, { "_id": "67890" }] }
I want to select one Role and list all the users that have that role.
How can I do that?
I want to get something like:
{ "_id": "12345", "Name": "Developer", "Employees": [{ "_id": "00000" }] }
Is it possible to use populate to achieve this?
Mongoose .populate() and other methods you might find are not "join magic" for MongoDB. What they all in fact do is execute "additional" query operations on the database and "merge" the results "under the hood" for you, as opposed to you doing the work yourself.
So your best option, as long as you can deal with it, is to use "embedding", which keeps the "related" information in the document you are "pairing" it with, such as for "Roles":
{
"_id": "12345",
"name": "Developer",
"employees": [{ "_id": "00000", "name": "Jack" }]
}
Which is simple, but of course comes at its own cost: you have to deal with the "embedded" entries when "updating" or "reading" as appropriate. It's a single "read" operation, but "updates" may be more costly due to the need to update the embedded information in multiple places, and multiple documents.
If you can "live" with "referencing" and the cost it incurs then you can always do this:
var rolesSchema = Schema({
"name": String,
"employees": [{ "type": Schema.Types.ObjectId, "ref": "Employee" }]
});
var employeesSchema = Schema({
"name": String,
"roles": [{ "type": Schema.Types.ObjectId, "ref": "Role" }]
});
var Role = mongoose.model('Role',rolesSchema);
var Employee = mongoose.model('Employee',employeesSchema);
Role.find({ "_id": "12345"}).populate("employees").exec(function(err,docs) {
// populated "joined" results in here
})
What this does behind the scenes is effectively (basic JavaScript representation and "at best"):
var roles = db.roles.find({ "_id": "12345" }).map(function(doc) {
// one additional query per role, fetching all referenced employees at once
doc.employees = db.employees.find({ "_id": { "$in": doc.employees } }).toArray();
return doc;
});
Mongoose works on the concept of using the "schema" definition to "know" which collection to execute the "other query" on and then return the "joined" results to you. But it is not a single query but multiple hits to the database.
Other schemes might "keep" the referenced collection information in the document itself, as opposed to relying on the "model code" to get that information. But the same principle applies where you need to make another call to the database and perform some type of "merge" in the API provided.
So it all falls down to your choice. Either you "embed" the data and live with that cost, or you "reference" the data and live with the network "cost" that is associated with multiple database hits.
The key point here is "nothing is free". That includes the way SQL RDBMS perform "joins", which has a "cost" of its own, and is a large part of the reasoning why NoSQL solutions like MongoDB do it this way and "do not support joins" in a native fashion: the "cost" involved in distributed data systems.
The main lesson here is to "do what suits you and your application", and not just choose the "coolest thing right now", but basically expect what you get from choosing different storage solutions. They all have their own purposes. Horses for Courses as the saying goes.
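If you go the "doing the work yourself" route with the documents exactly as they are in the question, the merge is small. Here is an in-memory sketch of it: one lookup for the role, a second for the employees whose Roles array contains that id, then the two are combined. The arrays stand in for the two collections, the second employee is made up for contrast, and roleWithEmployees is a hypothetical helper name.

```javascript
// In-memory stand-ins for the two collections from the question.
const roles = [
  { _id: "12345", Name: "Developer" },
  { _id: "67890", Name: "Manager" },
];
const employees = [
  { _id: "00000", Name: "Jack", Roles: [{ _id: "12345" }, { _id: "67890" }] },
  { _id: "11111", Name: "Jill", Roles: [{ _id: "67890" }] }, // made-up extra employee
];

// Hypothetical helper: two lookups plus a merge, i.e. two database hits.
function roleWithEmployees(roleId) {
  const role = roles.find((r) => r._id === roleId); // first "query"
  if (!role) return null;
  const matches = employees // second "query"
    .filter((e) => e.Roles.some((r) => r._id === roleId))
    .map((e) => ({ _id: e._id }));
  return { ...role, Employees: matches };
}

console.log(JSON.stringify(roleWithEmployees("12345")));
// {"_id":"12345","Name":"Developer","Employees":[{"_id":"00000"}]}
```

This keeps the reference on one side only (the employee), so there is no second array to keep in sync, but it is still two hits to the database rather than one.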

CouchDB: is it possible to access linked documents inside filter function?

In a contact management app, each user will have his own database. When users wish to share certain categories of contacts with others, a backend will initiate a replication. Each contact is its own document, but also has various children documents such as notes and appointments.
Here is an example...
Contact:
{
"_id": 123,
"type": "contact",
"owner": "jimmy",
"category": "customer",
"name": "Bob Jones",
"email": "bob@example.com"
}
Note:
{
"_id": 456,
"type": "note",
"owner": "jimmy",
"contact_id": 123,
"timestamp": 1383919278,
"content": "This is a note about Bob Jones"
}
So let's say Jimmy wants to share only his customers with sales manager Kevin, while his personal contacts remain private. When the note passes through the replication filter, is it possible to access the linked contact's category field?
Or do I have to duplicate the category field in every single child of a contact? I would prefer not to have to do this, as each contact may have many children which I would have to update manually every time the category changes.
Here is some pseudo-code for the filter function:
function(doc, req)
{
if(doc.type == "contact") {
if(doc.category == req.query.category) {
return true;
}
}
else if(doc.contact_id) {
if(doc.contact.category == req.query.category) {
return true;
}
}
return false;
}
If this is possible, please describe how to do it. Thanks!
There are some other options.
There's a not-so-well-known JOIN trick in CouchDB. Instead of using replication, however, you'll have to share the results of a MapReduce View -- unfortunately you can't use a view as a filter for replication. If you're using Cloudant (disclaimer: I'm employed by Cloudant) you can use chained-MapReduce to output the result to another database that you could then replicate from...
Additionally, I think this SO post/answer on document structures and this join trick could be helpful: Modeling relationships on CouchDB between documents?
No, this is not possible. Each document must be self-consistent, so it has no explicit relations with other documents. Having a contact_id value as a reference is just a convention on your side; CouchDB isn't aware of it.
You need to literally nest the related data within the contact document to do such a trick, i.e. have a single document for the filter function to process. This is a good solution when you need the contact document to be in a consistent state.
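A minimal sketch of that nesting, assuming the notes move into a "notes" array on the contact (the array name is made up). The filter then only ever needs the one document it is handed. In a design document the function would be anonymous; it is named here only so it can be called directly.

```javascript
// Contact with its notes nested inside it (hypothetical "notes" array).
const bob = {
  _id: 123,
  type: "contact",
  owner: "jimmy",
  category: "customer",
  name: "Bob Jones",
  notes: [
    { timestamp: 1383919278, content: "This is a note about Bob Jones" },
  ],
};

// Filter function: everything it needs is on the document itself,
// so the nested notes travel with the contact or not at all.
function categoryFilter(doc, req) {
  return doc.type === "contact" && doc.category === req.query.category;
}

console.log(categoryFilter(bob, { query: { category: "customer" } })); // true
console.log(categoryFilter(bob, { query: { category: "personal" } })); // false
```

The trade-off is that adding a note now means updating the contact document, so concurrent edits to the same contact can conflict.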

Resources