Ok so I have a pretty simple DB setup in a MEAN app (node, mongoose, mongo) where I have Book records, and User records. A book has a single Owner, and can have any number of shared users which are stored in an array in a field called sharedWith:. Originally I was storing the user records with an email address as the _id field. I now realize this was a dumb move on my part because if someone wants to change their email address it effectively cuts them off from their books.
The app is not live yet, so it's not a fatal mistake.
My question is, once I revert the User documents to using the original hash value for _id, and store those in the Owner and sharedWith fields in the book documents, will I have to query each hash just to retrieve the actual usable user data?
I know mongoose has a .populate() method which will resolve sub documents, but as for inserting them? Will I POST the users as email addresses, then query each and store the resulting hashes? I can do this manually, but I wanted to make sure there is not some secret mongo-sauce that can do this in the DB itself.
Thanks!
If the user's _id is available in the frontend, you can share a book with them directly by adding that _id to the book's sharedWith array. If it isn't, you need to look up the _id by querying on the email address first, and then store that _id in sharedWith. For retrieving the books, populate is indeed the best way to get the user data.
And to get all books shared with a user you can do something like this,
Book.find({sharedWith:user1._id},function(err,docs){ });
This query can be made efficient if you use an index on sharedWith but that depends on your use case.
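As a rough sketch of the manual lookup described above, the posted email addresses can be resolved to _id values in one query and then added to sharedWith. The helper names (mapEmailsToIds, shareBook) and the email field on the User model are assumptions for illustration, not part of the original schema; Book and User are expected to be Mongoose models passed in by the caller.

```javascript
// Pure helper: given user docs ({ _id, email }) and the requested emails,
// return the matching _ids plus any emails that had no matching user.
function mapEmailsToIds(users, emails) {
  const byEmail = new Map(users.map((u) => [u.email, u._id]));
  const ids = [];
  const missing = [];
  for (const email of emails) {
    if (byEmail.has(email)) ids.push(byEmail.get(email));
    else missing.push(email);
  }
  return { ids, missing };
}

// Assumed usage with Mongoose models (hypothetical function, not a Mongoose API).
async function shareBook(Book, User, bookId, emails) {
  // One query for all emails instead of one query per email.
  const users = await User.find({ email: { $in: emails } }, "_id email");
  const { ids, missing } = mapEmailsToIds(users, emails);
  // $addToSet avoids storing the same user twice in sharedWith.
  await Book.updateOne(
    { _id: bookId },
    { $addToSet: { sharedWith: { $each: ids } } }
  );
  return missing; // emails that could not be resolved to a user
}
```

If the sharedWith lookup turns out to be slow, an index could be declared on the schema with something like bookSchema.index({ sharedWith: 1 }), assuming the schema variable is named bookSchema.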
Related
I am making an admin dashboard and want to show every user's details and their orders. When I fetch all documents in the users collection, the result comes back empty. In the users collection, each document has some sub-collections. In the account sub-collection there is a document named details that holds the user's account details, as shown in the snapshots.
My code is
export function getUsers() {
return firebase.firestore().collection("users").get();
}
If you store the user's details directly in the user document instead of in the 'account' sub-collection, then fetching the "users" collection will return all users' documents with their data. If there's no particular reason for the sub-collection, I'd recommend doing this.
The other option is to use a collectionGroup query on "account", which fetches all the documents from every sub-collection named "account", i.e. it gives you every user's account details.
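const db = firebase.firestore()
const snap = await db.collectionGroup('account').get()
// d.ref.parent is the 'account' sub-collection; its parent is the user document
const users = snap.docs.map(d => ({ id: d.ref.parent.parent.id, data: d.data() }))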
Here, id is user's document ID.
Firestore queries only access a single collection, or all collections with a specific name. There is no way to query a collection based on values in another collection.
The most common options are:
Query the parent collection first, then check the subcollection for each document. This approach works best if you have relatively few false positives in the parent collection.
Query all child collections with a collection group query, then check the parent document for each result. This approach works best if you have relatively few false positives in your child collection query.
Replicate the relevant information from the child documents into the parent document, and then query the parent collection based on that. For example, you could add a hasOrders field or an orderCount in the user document. This approach always gives optimal results while querying, but requires that you modify the code that writes the data to accommodate.
The third approach is typically the best for a scalable solution. If you come from a background in relational databases, this sort of data duplication may seem unnatural, but it is actually very common in NoSQL databases, where you often have to change your data model to allow the queries your app needs.
To learn more about this, I recommend reading NoSQL data modeling and watching Getting to know Cloud Firestore.
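A minimal sketch of the third option, assuming the namespaced Firestore API used in the question, an "orders" sub-collection under each user, and an orderCount field (the function name addOrder and both of those names are illustrative assumptions). The order and the counter are written in one batch so they cannot drift apart.

```javascript
// db is a Firestore instance, e.g. firebase.firestore().
// increment is expected to be firebase.firestore.FieldValue.increment.
function addOrder(db, increment, userId, order) {
  const userRef = db.collection("users").doc(userId);
  const orderRef = userRef.collection("orders").doc(); // auto-generated ID
  const batch = db.batch();
  batch.set(orderRef, order);
  // Replicate "this user has orders" into the parent document.
  batch.update(userRef, { orderCount: increment(1) });
  return batch.commit(); // Promise that resolves when both writes are applied
}

// The dashboard could then query the parent collection only, for example:
//   db.collection("users").where("orderCount", ">", 0).get()
```

The trade-off is the one described above: every code path that writes orders has to keep the counter up to date.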
Let's say we have a collection of user documents in a MongoDB database, and each document contains a huge number of fields and a lot of data.
We may want to save the user's email to their document, right? So first we need to fetch the user's document and then save the email:
const account = await db.Account.findOne({ email }); account.email = value; await account.save();
As I said, the document has a lot of extra fields we don't want to modify. We only want to select and modify the email field. Now I wonder:
1) Does this have any impact on performance and the duration of the request?
2) Do I need to select only the email field to improve performance?
What is the best practice to do this?
I'm rebuilding my website, which is a search engine for nicknames from the most active forum in France: you search for a nickname and you get all of its messages.
My current database contains more than 60 GB of data, stored in MySQL. I'm now rewriting it on top of MongoDB, and after importing 1 million messages (1 message = 1 document), find() started to take a while.
The structure of a document is as such:
{
"_id" : ObjectId(),
"message": "<p>Hai guys</p>",
"pseudo" : "mahnickname", //from a nickname (*pseudo* in my db)
"ancre" : "774497928", //its id in the forum
"datepost" : "30/11/2015 20:57:44"
}
I made the ancre id unique, so I don't get the same entry twice.
Then the user enters the nickname and it finds all documents that have that nickname.
Here is the request:
Model.find({pseudo: "danickname"}).sort('-datepost').skip((r_page -1) * 20).limit(20).exec(function(err, bears)...
Should I structure it differently? Instead of having one document for each message, should I have one document per nickname and update it whenever I get a new message from that nickname?
I was using the first method with MySQL and it wasn't taking that long.
Edit: Or maybe should I just index the nicknames (pseudo)?
Thanks!
Here are some recommendations for your problem about big data:
The ObjectId already contains a timestamp, and you can sort on it. You could save some disk space by removing the datepost field. Note that this timestamp records when the ObjectId was generated (normally at insert time), so this only works if you don't need the original forum post time.
Do you absolutely need the ancre field? The ObjectId is already unique and indexed. If you absolutely need it and need to keep datepost separate too, you could use your ancre value as the _id.
As many mentioned, you should add an index on pseudo. This will make the "get all messages where the pseudo is mahnickname" search much faster.
If the number of messages per user is low, you could store all of them inside a single document per user. This would avoid having to skip to a specific page, which can be slow. However, be aware of the 16 MB document size limit. I would personally still keep them in multiple documents.
To keep fast query speeds, ensure that all your indexed fields fit in RAM. You can see the RAM consumption of indexed fields by typing db.collection.stats() and looking at the indexSizes sub-document.
Would there be a way for you to not skip documents, but use the time it got written to the database as your pages? If so, use the datepost field or the timestamp in _id for your paging strategy. If you decide on using the datepost, make a compound index on pseudo and datepost.
As for your benchmarks, you can closely monitor MongoDB by using mongotop and mongostat.
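The paging idea above (filter on a key instead of skipping) could look roughly like the sketch below. It pages on _id, which assumes that insertion order is an acceptable stand-in for post order; buildPageQuery is a hypothetical helper, not a Mongoose function.

```javascript
// Build the arguments for "the next page of messages for this nickname".
// lastSeenId is the _id of the last message on the previous page
// (omit it for the first page).
function buildPageQuery(pseudo, lastSeenId, pageSize = 20) {
  const filter = { pseudo };
  if (lastSeenId !== undefined && lastSeenId !== null) {
    // Newest first, so the next page holds strictly older _ids.
    filter._id = { $lt: lastSeenId };
  }
  return { filter, sort: { _id: -1 }, limit: pageSize };
}

// Assumed usage with the Model from the question:
//   const q = buildPageQuery("danickname", lastIdFromPreviousPage);
//   Model.find(q.filter).sort(q.sort).limit(q.limit).exec(...)
// A compound index such as { pseudo: 1, _id: -1 } would support this query.
```

Unlike skip(), the cost of this query does not grow with the page number, because the index is entered directly at lastSeenId. The price is that users can only move to the next page, not jump to an arbitrary one.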
In the past, with my PHP / Rails and MySQL apps, I've used the unique ID of a table record to keep track of that record in an HTML file.
So I'd keep track of which record to delete with a link like this (15 being the ID of the record):
Delete this record
So now I'm using MongoDB. I've tried the same method, but the ObjectId in the _id attribute seems to be a loooong byte string that I can't use conveniently.
What's the most sensible way of binding a link in the view to a record (for deletion, or other purposes or whatever)?
If the answer is to create a new ID that's unique for each document in the collection, then what's the best way to generate those unique IDs?
Thank you.
You could use a counter instead of the ObjectID.
But this could cause a problem when you insert a new document after deleting a previous one.
See this blog post for more detailed info on sequential unique identifiers with Node.js and MongoDB.
Or you could use the timestamp part of the ObjectID:
objectId.getTimestamp().toString()
See the node objectid docs
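For what it's worth, an ObjectId rendered as a string is just 24 hexadecimal characters, so it can go into a link the same way a numeric ID would. The sketch below assumes a made-up /records/<id>/delete route, and both helper names are hypothetical. It also shows that the timestamp is only the first 4 bytes of the id with one-second resolution, so on its own it is not guaranteed to be unique.

```javascript
// An ObjectId as a string is 24 hex characters, which is URL-safe.
const OBJECT_ID_HEX = /^[0-9a-f]{24}$/i;

// Build a delete link from the hex form of an _id (the route is hypothetical).
function deleteLinkFor(idHex) {
  if (!OBJECT_ID_HEX.test(idHex)) {
    throw new Error("not an ObjectId hex string");
  }
  return `/records/${idHex}/delete`;
}

// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function timestampOf(idHex) {
  return new Date(parseInt(idHex.slice(0, 8), 16) * 1000);
}

console.log(deleteLinkFor("507f1f77bcf86cd799439011"));
console.log(timestampOf("507f1f77bcf86cd799439011").toISOString());
```

On the server, the same regular expression can be used to reject malformed ids before they reach the database.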
I've come to really love the CouchDB style of organizing and updating data, but there are a few situations where I really need to be able to create an entry and determine whether an equivalent entry already exists before returning to the user. The only situation where this is absolutely necessary in my application is user registration. I'm fine with having all user registration writes go to a particular, designated CouchDB instance known as the "registration-instance".
I want to hash the user_id into some _id to use. Then execute a put with this _id, but fail if the _id is already inserted. I need to return to the user that the user name is already reserved, and I cannot detect the conflict later and resolve it at that point, because the user would be under the impression that they had reserved the user name.
I don't see why couchdb couldn't provide some way to do this, under the assumption that you designate that inserts for a particular "type" of document always are routed to a particular instance.
If you send a single CouchDB server a PUT request for a new user document you should get the behavior you want already.
If the document does not exist then it will create the new document.
If the document does exist then it is guaranteed to return a 409 conflict error. This is due to the fact that you did not supply a _rev property because you aren't trying to update the pre-existing document.
Only when the _id and _rev properties match will CouchDB update the existing document.
You might also want to read up on document update handlers:
http://wiki.apache.org/couchdb/Document_Update_Handlers
You might use an update handler to hash the user_id and dynamically assign the appropriate _id. You can also customize what kind of error response couch sends with an update handler.
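A small client-side sketch of the PUT-without-_rev behavior described above, assuming a single CouchDB server. The SHA-256 hashing scheme, the user: prefix, the document body, and the function names are all assumptions for illustration; fetchFn is injected so any fetch-compatible client can be used.

```javascript
const crypto = require("node:crypto");

// Derive a deterministic _id from the user name (hashing scheme is an assumption).
function userDocId(userName) {
  const hash = crypto.createHash("sha256").update(userName).digest("hex");
  return `user:${hash}`;
}

// Map the HTTP status of the PUT to an outcome.
function interpretPutStatus(status) {
  if (status === 201) return "reserved"; // document was created
  if (status === 409) return "taken"; // _id already exists and no _rev was sent
  throw new Error(`unexpected CouchDB status ${status}`);
}

// baseUrl points at the database, e.g. "http://localhost:5984/users" (assumed).
async function reserveUserName(fetchFn, baseUrl, userName) {
  const url = `${baseUrl}/${encodeURIComponent(userDocId(userName))}`;
  const res = await fetchFn(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    // No _rev property: this request can only create, never update.
    body: JSON.stringify({ type: "user", name: userName }),
  });
  return interpretPutStatus(res.status);
}
```

Because the outcome is known from the response to this one request, the user can be told immediately whether the name was reserved.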
Good luck!