I have a question about the best way to structure a "user connections" system in MongoDB, similar to how LinkedIn works.
I have the following options, but I don't know which will hold up best in the future, with millions of documents in the collections.
Option 1: Two collections, user and user-connect. Each relation is a
single document in user-connect with these fields: user_one, user_two,
status ("pending", "accepted", "rejected", "blocked"). When a user
requests a connection to another user, I just need to insert one new
document into the user-connect collection. The problem: every query for
a user's "contacts" has to check two fields, user_one and user_two,
because only one document exists per pair.
Option 2: The same structure as option 1, but with 2 inserts instead of
1. The first is the request, a document with status pending, and the
second is added when the connection is accepted. Two documents, two
inserts, but when I need to query for a user's contacts, I only need to
look at one field: user_one.
Option 3: A one-to-many structure: user-connect will have a user_one
field and a contacts field, an array of user ids. But from reading other
forums, this is not recommended because of the large number of "match"
operations it would need in the future.
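Option 1 can stay at one document per pair if the pair is always stored in a fixed order and contacts are queried on both fields with $or. The following is only a sketch under assumptions: the helper names (makeConnection, contactsFilter, contactIds) and the requested_by field are hypothetical, and user ids are assumed to be comparable strings. The filter object is the kind of value you would pass to find() on the user-connect collection.

```javascript
// Hypothetical helpers for Option 1: one document per pair of users.

// Store the pair in a fixed order so (A, B) and (B, A) map to the same document.
function makeConnection(requesterId, targetId) {
  const [user_one, user_two] =
    requesterId < targetId ? [requesterId, targetId] : [targetId, requesterId];
  // requested_by is an assumed extra field that remembers who sent the request.
  return { user_one, user_two, requested_by: requesterId, status: "pending" };
}

// Filter for find(): accepted connections where the user is on either side.
function contactsFilter(userId) {
  return {
    status: "accepted",
    $or: [{ user_one: userId }, { user_two: userId }],
  };
}

// Given the matching documents, return the id on the "other" side of each one.
function contactIds(userId, connections) {
  return connections.map((c) => (c.user_one === userId ? c.user_two : c.user_one));
}

console.log(makeConnection("u42", "u07"));
console.log(JSON.stringify(contactsFilter("u42")));
```

With this layout, an index on user_one and another on user_two would both be relevant, since each branch of the $or can be served by its own index.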
I am making an admin dashboard. I want to show all users' details and their orders. When I try to fetch all documents in the users collection, it returns empty. For context, each document in the users collection has some sub-collections. In the account sub-collection there is a document named details, which holds the user's account details, as shown in the snapshots.
My code is
export function getUsers() {
return firebase.firestore().collection("users").get();
}
If you store the user's details directly in the document instead of in the 'account' sub-collection, then fetching the "users" collection will return all users' documents with their data. If there is no particular reason for the sub-collection, I'd recommend doing this.
The other option is to use a collectionGroup query on "account", which fetches all documents from every sub-collection named "account", i.e. it gives you every user's account details.
const snap = await db.collectionGroup('account').get()
const users = snap.docs.map(d => ({ id: d.ref.parent.parent.id, data: d.data() }))
Here, id is the user's document ID.
Firestore queries only access a single collection, or all collections with a specific name. There is no way to query a collection based on values in another collection.
The most common options are:
Query the parent collection first, then check the subcollection for each document. This approach works best if you have relatively few false positives in the parent collection.
Query all child collections with a collection group query, then check the parent document for each result. This approach works best if you have relatively few false positives in your child collection query.
Replicate the relevant information from the child documents into the parent document, and then query the parent collection based on that. For example, you could add a hasOrders field or an orderCount in the user document. This approach always gives optimal results while querying, but requires that you modify the code that writes the data to accommodate.
The third approach is typically the best for a scalable solution. If you come from a background in relational databases, this sort of data duplication may seem unnatural, but it is actually very common in NoSQL databases, where you often have to change your data model to allow the queries your app needs.
To learn more about this, I recommend reading NoSQL data modeling and watching Getting to know Cloud Firestore.
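The third option (keeping hasOrders or orderCount on the user document) could be sketched as a batched write like the one below. This is only an illustration under assumptions: the "orders" sub-collection name and the buildOrderBatch helper are hypothetical, `db` stands for your Firestore instance, and `increment` stands for FieldValue.increment; both are passed in so the sketch does not depend on a particular SDK setup.

```javascript
// Hypothetical helper: queue a new order and the matching counter update
// on the parent user document in a single batch.
function buildOrderBatch(db, increment, userId, order) {
  const userRef = db.collection("users").doc(userId);
  // "orders" is an assumed sub-collection name; doc() with no argument
  // asks Firestore for an auto-generated document ID.
  const orderRef = userRef.collection("orders").doc();

  const batch = db.batch();
  batch.set(orderRef, order);
  // Duplicate the relevant information into the parent so that the
  // "users" collection can be queried on it directly.
  batch.update(userRef, { orderCount: increment(1), hasOrders: true });
  return batch; // the caller decides when to commit
}

// Assumed usage with the namespaced web SDK:
//   const batch = buildOrderBatch(
//     firebase.firestore(), firebase.firestore.FieldValue.increment, uid, order);
//   await batch.commit();
```

Because both writes go through one batch, the counter and the order are written together or not at all. Note that update() fails if the user document does not exist yet.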
I have a database which contains data from two separate systems/servers. The first is generated locally [I develop and create this data] (users, activity logs, orders, ...). The second comes from a "product provider" [I only have READ access from API] These objects were created by MySQL and sent in JSON. They already have an "id" property.
With NodeJS, I use request to get a product by "id", and then store it with newProduct.save(), which appends an _id.
In products, "id" is necessary to form relationships with the other collections in my database (such as products_price), and to access dynamic endpoints, such as "products/:id/promos".
Note that products are constantly being updated externally and I need to be able to update my documents by "id" not by "_id" as the external server has no knowledge about "_id." [id is unique on a collection level, as each collection is a fresh iteration]
For my first question: should I treat "product.id" as a "regular" MongoDB field and use aggregate/lookup to merge documents from my collections? Or should I overwrite ObjectID() with id? (before saving rename "id" to "_id")
At some point, Orders (local) and Products (external) need to form a relationship where Order _id and Product id (or _id) are stored together for easy retrieval.
Which id do I use in this case?
If you are sure that the 'id' coming from your product provider's API is unique, you are better off using it as _id (overwriting the default _id). It will save you:
an unneeded index ('_id' is indexed anyway)
some CPU cycles that MongoDB would spend generating the ObjectID
some disk and memory space
(*) Even if you find yourself dealing with many different product providers, assuming each one uses its own unique product ids, you could use a combined _id to make it unique, such as:
_id = {provider: 'foo', id: xxx}
or _id = provider_name + product_id
etc. (note that MongoDB does not accept an array as an _id value, so a pair like [provider_name, product_id] has to be stored as an embedded document or a string instead)
In the multiple-provider case, the format depends on how you plan to fetch those products later.
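As a rough sketch of the "rename id to _id before saving" idea, the helpers below build an upsert keyed on the provider's id, so repeated syncs replace the stored copy instead of creating duplicates. The helper names (toMongoDoc, upsertOp) and the "provider:id" string format are assumptions for illustration; the returned object has the shape of a replaceOne operation for bulkWrite().

```javascript
// Hypothetical helper: turn a provider product (which carries its own "id")
// into a document whose _id is that id, optionally namespaced by provider.
function toMongoDoc(product, provider) {
  const { id, ...rest } = product;
  const _id = provider ? `${provider}:${id}` : id;
  return { _id, ...rest };
}

// Build one bulkWrite() operation that inserts the product if it is new
// or replaces the stored copy if it already exists.
function upsertOp(product, provider) {
  const doc = toMongoDoc(product, provider);
  return {
    replaceOne: { filter: { _id: doc._id }, replacement: doc, upsert: true },
  };
}

console.log(JSON.stringify(upsertOp({ id: 1234, name: "Widget" }, "foo")));
```

If an embedded document such as {provider, id} is used as _id instead of a string, keep in mind that equality on embedded documents is sensitive to field order, so the fields should always be written in the same order.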
I'm building an address-book app that uses a back-end Cloudant database. The database stores 3 types of documents:
-> User Profile document
-> Group document
-> User-to-Group Link document
As the document names suggest, there are users in my database, there are groups of users (like WhatsApp), and there is a link document for each user in a group (the link document also stores that user's settings/privileges in that group).
On login, my client-side app queries Cloudant for the user document and for each group document, using view collation over that user's link documents.
Then, using the groups identified above, I find all the other users of those groups.
Now, the challenge is that I need to monitor any changes to the group and user documents. I am using PouchDB on the app side, and can invoke the 'changes' API against the ids of all the group and user documents. But at scale there may be 500 users in each group, with a logged-in user being part of 10-50 groups. Multiplied by thousands of users, that will become a nightmare for the back-end to support.
Is my scalability concern warranted? Or is this normal for Cloudant?
If I understand your schema correctly, you have documents of this form:
{
_id: "user:glynn",
type: "user",
name: "Glynn Bird"
}
{
_id: "group:Developers",
type: "group",
name: "Software Developers"
}
{
_id: "user:glynn:developers"
}
In the above example, the sort order of the primary keys allows a user and all of its memberships to be retrieved by passing startkey and endkey parameters to the database's _all_docs endpoint.
This is "scalable" in the sense that it is efficient for Cloudant to retrieve data from a primary or secondary index, because the index is held in a b-tree, so data with adjacent keys is stored next to each other. A limit parameter can be used to paginate through larger data sets.
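The startkey/endkey range described above could be built roughly as follows. This is a sketch with assumed helper names (userRange, idsInRange); "\ufff0" is the high-sorting character conventionally used as an upper bound for prefix ranges. The second function only imitates the server locally with plain string comparison, which is an approximation of Cloudant's real collation.

```javascript
// Build _all_docs parameters that select a user document plus every
// document whose _id starts with "<userId>:" (its memberships).
function userRange(userId) {
  return { startkey: userId, endkey: userId + ":\ufff0" };
}

// Local stand-in for the server: keep the ids that fall inside the range.
// Plain string comparison is used here, which only approximates the
// collation rules Cloudant applies to keys.
function idsInRange(sortedIds, { startkey, endkey }) {
  return sortedIds.filter((id) => id >= startkey && id <= endkey);
}

const allIds = [
  "group:Developers",
  "user:glynn",
  "user:glynn:developers",
  "user:glynnis",
  "user:greg",
];
console.log(idsInRange(allIds, userRange("user:glynn")));
```

Because the matching ids are adjacent in the index, the server can read them as one contiguous slice rather than looking each one up separately.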
Yes, the documents are more or less as you've specified.
Link documents are as follows:
{
"_id": <AutoGeneratedID>,
"type": "link",
"user": user_id,
"group": group_id
}
I've written the following view map function:
function (doc) {
  // Only link documents contribute rows to this view.
  if (doc.type == "link") {
    emit(doc.user, {"_id": doc.user});
    emit([doc.user, doc.group], {"_id": doc.group});
    emit([doc.group, doc.user], {"_id": doc.user});
  }
}
Using the above 3 emitted keys and include_docs=true, the 1st lets me get my logged-in user document, the 2nd lets me get all group documents for my logged-in user (using start and end keys), and the 3rd lets me get all other user documents for a group (using start and end keys again).
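For reference, the three lookups described above might translate into view query options like these. The helper names are hypothetical, and the sketch relies on two CouchDB behaviours: an object sorts after any string in view collation, so {} can serve as an upper bound for the second element of an array key, and a row value of the form {"_id": ...} makes include_docs fetch that linked document.

```javascript
// 1st emit: a single row keyed by the user id.
function userRow(userId) {
  return { key: userId, include_docs: true };
}

// 2nd and 3rd emits: all rows whose array key starts with the given id.
// With a user id this returns that user's groups; with a group id it
// returns that group's users, because both key orders are emitted.
function rowsStartingWith(id) {
  return { startkey: [id], endkey: [id, {}], include_docs: true };
}

console.log(JSON.stringify(userRow("user_1")));
console.log(JSON.stringify(rowsStartingWith("group_7")));
```

This assumes user ids and group ids never collide, since both kinds of array key live in the same view.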
Fetching the documents is done, but now I need to monitor changes to the users of each group. For this, don't I need to query the changes API with an array of user ids? Is there any other way?
"it is efficient for Cloudant to retrieve data from a primary or secondary
index, because the index is held in a b-tree, so data with adjacent keys is
stored next to each other"
Sorry, I did not understand this statement.
Thanks.
Part 1.
I recommend getting rid of the "link" type here. It works well in the SQL world, but not in CouchDB.
Instead, it is better to take advantage of document storage, i.e. store a user's groups in a "Groups" property on the "User" document, and a group's users in a "Users" property on the "Group" document.
With this approach you can set up filtered replication to process only changes to specific groups, and these changes will already contain all the users of the group.
Note that I am assuming the number of groups per user and the number of users per group is reasonable (hundreds at most) and doesn't change frequently.
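A filter for such a replication might look like the sketch below. The groupIds parameter name and the groupFilter function are hypothetical; the sketch assumes PouchDB's convention that values given as query_params are available to the filter as params.query. As far as I know, a filter passed as a JavaScript function is evaluated by PouchDB on the client, so filtering on the server would need a filter stored in a design document instead.

```javascript
// Pass only the group documents that the logged-in user belongs to.
// params.query is assumed to carry the query_params given to replicate().
function groupFilter(doc, params) {
  return doc.type === "group" && params.query.groupIds.includes(doc._id);
}

// Assumed usage with PouchDB:
//   localDb.replicate.from(remoteDb, {
//     live: true,
//     retry: true,
//     filter: groupFilter,
//     query_params: { groupIds: ["group:Developers"] },
//   });

const sample = { _id: "group:Developers", type: "group", Users: ["user:glynn"] };
console.log(groupFilter(sample, { query: { groupIds: ["group:Developers"] } }));
```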
Part 2.
You can store just ids in these properties and then use views to "join" the other data. I was also considering another approach (for my own use case, but yours is similar):
1) A group contains only the ids of its users; no views needed.
2) You create a view of each user's contacts, i.e. for each user, all users with whom they share a group.
3) Replicate this view to the client app.
When the user opens a group, values such as the names and pictures of contacts are taken from this local "dictionary".
This approach can save some traffic.
Please let me know what you think, because I'm currently designing the architecture of my own solution. Thank you!
I have a Post model in mongoose:
{
sender: ObjectId, // User Id
title : String,
...
}
I want to list my Posts with their User's title.
I have two choices:
1- List Posts > Extract unique Senders > Query for User titles > Replace Ids with Titles in the results
One query to list Posts and one query to list unique Users
2- Use the mongoose populate method, with this in the schema: sender: {type:ObjectId, ref: User},
and use the populated value for sender in the result, like: sender.title
Depending on how mongoose populates values, this may take a different number of queries
Question:
When mongoose populates the 'sender' property, what does it do?
I need to pick the best option for my project (and a readable one)!
1- Use a new query for each Id
Listing 1000 Posts takes 1001 queries, even when users are repeated!
2- Or use a query for each unique Id
Listing 1000 Posts from 100 Users takes 101 queries!
3- Or, even better, list the unique User ids and query them all together (like choice one)
Only 2 queries! (the best, if possible)
Option 3: Mongoose will get the posts and then query users exactly once, with the $in operator.
Even so, doing it manually may perform better, because mongoose keeps the event loop busy for the time it takes to process both queries, whereas your code will occupy it in separate, shorter stretches. You can use the blocked package to benchmark this.
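The manual two-query version (choice one in the question) could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and posts and users are assumed to be plain objects, for example results of queries run with .lean().

```javascript
// Step 1: collect each sender id once. Ids are compared as strings so
// that ObjectId values are matched by value rather than by reference.
function uniqueSenderIds(posts) {
  return [...new Set(posts.map((p) => String(p.sender)))];
}

// Step 2: filter for the single user query,
// e.g. User.find(usersFilter(ids), "title").lean()
function usersFilter(ids) {
  return { _id: { $in: ids } };
}

// Step 3: replace each sender id with the matching user's title.
function attachSenderTitles(posts, users) {
  const titleById = new Map(users.map((u) => [String(u._id), u.title]));
  return posts.map((p) => ({ ...p, sender: titleById.get(String(p.sender)) }));
}

const posts = [
  { title: "First", sender: "a1" },
  { title: "Second", sender: "b2" },
  { title: "Third", sender: "a1" },
];
const users = [{ _id: "a1", title: "Alice" }, { _id: "b2", title: "Bob" }];
console.log(uniqueSenderIds(posts));
console.log(attachSenderTitles(posts, users));
```

However many posts are listed, this costs one query for the posts and one for the users, which is the same query count populate achieves with $in.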
Ok so I have a pretty simple DB setup in a MEAN app (node, mongoose, mongo) where I have Book records, and User records. A book has a single Owner, and can have any number of shared users which are stored in an array in a field called sharedWith:. Originally I was storing the user records with an email address as the _id field. I now realize this was a dumb move on my part because if someone wants to change their email address it effectively cuts them off from their books.
The app is not live yet, so it's not a fatal mistake.
My question is, once I revert the User documents to using the original hash value for _id, and store those in the Owner and sharedWith fields in the book documents, will I have to query each hash just to retrieve the actual usable user data?
I know mongoose has a .populate() method which will resolve sub documents, but as for inserting them? Will I POST the users as email addresses, then query each and store the resulting hashes? I can do this manually, but I wanted to make sure there is not some secret mongo-sauce that can do this in the DB itself.
Thanks!
If you have the user's _id available in the frontend, you can share a book with them directly by adding the _id to the book's sharedWith array. If you don't have the _id available in the frontend, you need to look it up manually by querying with the email, and then store the _id in sharedWith. As for retrieving the books, populate is indeed the best way to get the user data.
And to get all books shared with a user, you can do something like this:
Book.find({sharedWith:user1._id},function(err,docs){ });
This query can be made efficient with an index on sharedWith, but whether that is worthwhile depends on your use case.
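The "look up by email, then store the _id" step could be sketched with two small helpers that build the query and the update. The helper names and the email field on the user document are assumptions; the objects they return are meant to be passed to calls such as User.find() and Book.updateOne().

```javascript
// Filter that finds the users behind a list of email addresses,
// e.g. User.find(usersByEmail(emails), "_id")
function usersByEmail(emails) {
  return { email: { $in: emails } };
}

// Update that adds user ids to sharedWith without creating duplicates,
// e.g. Book.updateOne({ _id: bookId }, shareUpdate(userIds))
function shareUpdate(userIds) {
  return { $addToSet: { sharedWith: { $each: userIds } } };
}

console.log(JSON.stringify(usersByEmail(["a@example.com", "b@example.com"])));
console.log(JSON.stringify(shareUpdate(["id1", "id2"])));
```

Using $addToSet rather than $push means that sharing a book with the same user twice leaves the array unchanged.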