For those of you who have worked with Firebase (Firestore): a collection holds documents, each with its own id, and a document can in turn hold a sub collection (roughly the equivalent of an embedded array of documents stored as a property).
This sub collection can hold many documents, each with its own id.
In Firestore the sub collection is lazy loaded.
Fetching a collection returns its documents, but if a document has one or more sub collections, they are not retrieved unless you explicitly go to that route, e.g. collection/document/subcollectionname/anotherdocument
So 2 questions:
Are embedded documents lazy loaded? I don't want to get a document with all of its embedded documents (possibly a million) unless I explicitly access them.
How can I make sure each embedded document in MongoDB gets an "_id" in the form of ObjectID("blablabla")?
EDIT:
I currently have a Firestore implementation that is built around subcollections.
Example: organization => documentId => projects => projectId => activities => :activityType => activityId
The organization collection holds documents (each document = one organization).
Each organization document holds a schema (id, name, language, etc.) and a few subcollections, one of which is the projects subcollection.
The projects subcollection holds the project documents.
A project document holds the project schema (id, name, location, etc.) and one subcollection named activities.
The activities subcollection holds its own schema (id, type, category, etc.) and 6 more subcollections, each representing an activity type.
Each activity-type subcollection holds its own schema. No more subcollections.
Now, the good thing about it is that if I choose to get all organizations, then I will get only the documents of the organization collection and NOT the embedded subcollections (projects, etc..) while in MongoDB, I would get EVERYTHING per document.
How do I achieve the same nested documents with their own nested documents structure with the lazy load effect in MongoDB?
How many activities can a single project have? If there's no limit, then you're better off creating a root-level collection for activities. In MongoDB, the maximum BSON document size is 16 MB. That being said, you may not be able to store all projects and their activities in a single document (the organization document).
I would create 3 collections namely - organizations, projects and activities.
Each organization should have a document in the organizations collection, similar to the one you have in Firestore.
Each project should have a document in the projects collection containing a field "organizationID", so you can query the projects of a specific organization by its ID. This is the equivalent of a document in your projects sub-collection. Every project must also have its own unique ID.
Each activity should have a document in activities collection containing a field "projectID" so activities of a specific project can be retrieved.
I've added those additional organizationID, projectID fields even though you have _id just in case you'd like to have Firestore Document IDs there for easier side-by-side queries.
You don't have to worry about 16 MB document size limit this way and it'll be easier to query both projects and activities as long as you have the correct IDs.
Querying activities of a certain project:
await db.collection("activities").find({projectID: "myProjectID"}).toArray()
Thereafter it's up to you how you want to write queries with projections, aggregation, etc.
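To get the Firestore-style "lazy" effect with the three-collection layout above, each level can be fetched on its own. A minimal sketch, assuming `db` is a connected `Db` handle from the Node.js MongoDB driver and that the `organizationID` and `projectID` fields are stored as suggested above; the helper names and the projected fields are hypothetical.

```javascript
// Hypothetical helpers; `db` is assumed to be a connected Db from the Node.js MongoDB driver.

// Only organization documents are read; projects and activities live in other collections.
async function listOrganizations(db) {
  return db.collection("organizations").find({}).toArray();
}

// Projects are loaded only when a specific organization is opened.
async function listProjects(db, organizationID) {
  return db
    .collection("projects")
    .find({ organizationID })
    .project({ name: 1, location: 1 }) // return just the fields a list view needs
    .toArray();
}

// Activities are loaded per project, optionally narrowed to one activity type.
async function listActivities(db, projectID, type) {
  const filter = type ? { projectID, type } : { projectID };
  return db.collection("activities").find(filter).toArray();
}
```

An index on `projects.organizationID` and on `activities.projectID` would likely be needed for these lookups to stay fast as the collections grow.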
I am trying to query my Firestore collection (in Node.js, for my Flutter app) to get the 10 documents which have the most objects in their subcollection called Purchases (to get the best sellers).
Is it possible in Firestore? Or do I have to keep an int field outside of my subcollection to represent its length?
Thank you!
I thought this was answered recently, but can't find it right now, so...
Firestore queries (and other read operations) work on a single collection, or a group of collections with the same name. They don't consider any data in other (nested or otherwise) collections, nor can they query based on aggregates (such as the number of documents), unless those aggregates are stored in a document in the queried collection.
So the solution is indeed to keep a counter in a document in the collection you are querying against, and to update that counter with every add/delete to the subcollection.
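The counter approach could look roughly like this. It is a sketch under assumptions: `db` is an initialized Firestore instance and `FieldValue` is the one exported by the Firebase Admin SDK, both passed in by the caller; the parent collection name `sellers` and the field name `purchaseCount` are hypothetical.

```javascript
// Sketch only: `db` is an initialized Firestore instance, `FieldValue` comes from the Admin SDK.
// "sellers" and "purchaseCount" are hypothetical names.

// Add a purchase and bump the parent's counter in one atomic batch.
async function addPurchase(db, FieldValue, sellerId, purchase) {
  const sellerRef = db.collection("sellers").doc(sellerId);
  const purchaseRef = sellerRef.collection("Purchases").doc(); // auto-generated ID
  const batch = db.batch();
  batch.set(purchaseRef, purchase);
  batch.update(sellerRef, { purchaseCount: FieldValue.increment(1) });
  await batch.commit();
}

// The ten documents with the most purchases, read from the stored counter.
async function topSellers(db) {
  const snapshot = await db
    .collection("sellers")
    .orderBy("purchaseCount", "desc")
    .limit(10)
    .get();
  return snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));
}
```

Deleting a purchase would need the mirror-image batch with `FieldValue.increment(-1)`.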
I am trying to implement relations on collections. My requirement is
Post request 1, json body:
{
"username":"aaa",
"password":"bbb",
"role":"owner",
"company":"SAS"
}
Post request 2, creating from first document so I got company name from previous json body:
{
"username":"eee",
"password":"fff",
"role":"engineer",
"company":"SAS"
}
Post request 3, creating from first document so I got company name from previous json body:
{
"username":"uuu",
"password":"kkk",
"role":"engineer",
"company":"SAS"
}
Post request 4, next company json body:
{
"username":"hhh",
"password":"ggg",
"role":"owner",
"company":"GVG"
}
Here company is a foreign key field. How can I store the company with an id field so that the inserts do not fail independently of one another, as they would not inside a transaction?
In MySQL I would create two tables, company and user, and use a transaction to insert into both tables in a single POST using ids; if the company name is later updated, the id remains the same for owner and engineer.
How can I achieve this in MongoDB with Node.js?
In online searches I have found that most people suggest avoiding transactions and using MongoDB features such as embedded documents.
I would suggest you start by making schemas for user and company using mongoose. It is an ODM (object document mapper) that is very commonly used with Node.js and MongoDB.
Now, this is a one-to-many relation. In relational databases, as you have mentioned, you would make a company table and a user table.
In MongoDB it "depends". If it is a one-to-"few" relationship, you would just nest the users array inside the company's document. Then, since you are only updating a single document (pushing a user to the users array in the company's document), you won't need any transactions. A single-document update is always atomic (no matter how many fields you update on the same document).
But if each company can have a large number of users (an ever-growing nested array is not good, as it can cause data fragmentation and bad performance), then it is better to store the company's id in the user's document. Even in this case you would not need a transaction, since you are not updating the company's document.
Another reason for storing users in a separate collection is querying. If you just want to query users, it is difficult if they are nested in companies. So basically you need to consider how you will query and estimate the size of the relation, then decide whether to nest or to store in separate collections.
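The second option (users in their own collection, each holding the company's id) might be sketched as follows with the plain Node.js driver. Assumptions: `db` is a connected `Db` handle, the collection names `companies` and `users` and the field `companyId` are hypothetical, and a unique index exists on `companies.name` (without it, two concurrent upserts could create duplicate companies).

```javascript
// Sketch only: `db` is a connected Db from the Node.js MongoDB driver.
// Collection names and the companyId field are assumptions, not part of the question.
async function createUser(db, { username, password, role, company }) {
  const companies = db.collection("companies");

  // Create the company if it does not exist yet. A single-document upsert is atomic.
  await companies.updateOne(
    { name: company },
    { $setOnInsert: { name: company } },
    { upsert: true }
  );
  const companyDoc = await companies.findOne({ name: company });

  // The user stores the company's _id, so renaming the company never touches users.
  // NOTE: hash the password before storing it in a real application.
  const result = await db.collection("users").insertOne({
    username,
    password,
    role,
    companyId: companyDoc._id,
  });
  return result.insertedId;
}
```

If the user insert fails, the worst case is a company document without users, which is harmless here; that is why no multi-document transaction is used in this sketch.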
First of all, you should note that Mongo is a document-oriented DB, not a relational one. So if you need transactions and a relational model, perhaps you should use an SQL relational database, especially if you are more familiar with them?
About relations and data modeling: you should read this article (or even the entire section) in the official Mongo docs, Data Modelling.
TL;DR: you could create two separate collections (the same as tables in SQL), like employees and companies (by default, the collection's name will be in plural form), and store the data separately.
So your employees will be stored as you mention above, but companies will look like:
{
_id: ObjectID("35473645632")
name: "SAS"
}, ...
and in your employees collection you should store not "company": "SAS" but "company": ObjectID("35473645632"), or even an array of ids if you want. But don't forget to edit your schema then.
You don't have to use MongoDB's default _id; you could use your own, which can be any unique number/string combination.
So if your company is renamed, its link to the other documents (employees) will still be there.
To fetch all/any of your employees together with their company names, you should use the aggregation framework with $lookup instead of .find.
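A $lookup pipeline for that could be sketched as below. It assumes each employee's `company` field holds the `_id` of a document in the `companies` collection, as described above; the output field names `companyInfo` and `companyName` are hypothetical.

```javascript
// Sketch only: assumes employees.company holds the _id of a companies document.
function employeesWithCompanyPipeline() {
  return [
    {
      $lookup: {
        from: "companies",      // collection to join
        localField: "company",  // field on the employee document
        foreignField: "_id",    // field on the company document
        as: "companyInfo",      // output array with the matched company
      },
    },
    // Flatten the array produced by $lookup (this drops employees with no matching company).
    { $unwind: "$companyInfo" },
    { $project: { username: 1, role: 1, companyName: "$companyInfo.name" } },
  ];
}

// Usage with the Node.js driver (db is a connected Db):
// const rows = await db.collection("employees").aggregate(employeesWithCompanyPipeline()).toArray();
```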
Question:
I'm using the Node MongoDB driver. I'm trying to determine whether I should write a single query that gets data from three collections, or whether the database needs one collection (with references, embedded documents, etc.) that joins these three unrelated collections.
Use case:
During search I get an array of objects and take the first 10 from it. Each object is metadata about a document belonging to one of the three collections. The collections are unrelated but have some common fields, and this metadata is the only way to fetch the information at later stages.
For example, during search I get this array and store it in React state (see the example object below). Then, when the user clicks on a search result, I have to loop over this array to grab the relevant metadata so that I can retrieve more content…
Example Object inside Array of Objects (Meta data):
[{
collection: 'pmc_test',
id_field: 'id_int',
id_type: 'int',
id_value: 2657156
},
{
collection: 'arxiv',
id_field: 'id_int',
id_type: 'int',
id_value: 2651582
},
{
collection: 'crossref',
id_field: 'DOI',
id_type: 'string',
id_value: "10.1098/rsbm.1955.0005"
},
...] // different collections, usually passed with 10 objects
However, to display the 10 search results in the first place, I have to loop over each object in the array, then modify and run a query, which could result in 10 separate queries. I can at least reduce this to 3 queries by using the $in operator and providing three arrays of IDs, one per collection.
This is still multiple queries: I have to go to the 1st collection, then the 2nd, then the 3rd, and then combine all the results to display the search results. This is what I'm trying to avoid. This is roughly what each of the three collections looks like.
Any suggestions on what querying approach I could use? Would the database benefit from a single collection, or some other approach that avoids having to use the metadata to look in three different collections?
Currently this would be a massive breaking change to the application, with at least 15 features / API calls needing updates. I'd like to keep the ability to query one collection and suggest this as an optimal change.
Thanks in advance.
Edit
Example collections here:
Arxiv collection: https://gist.github.com/Natedeploys/6734dffccea7b293ca16b5bd7c73a6b6
Crossref collection:
https://gist.github.com/Natedeploys/9b0d3b02c665d7507ed75c9d5fbff159
Pubmed collection (pmc_test):
https://gist.github.com/Natedeploys/09527e8ceaf5d3f0f70ba28984b87a73
You can do all of these operations with the MongoDB aggregation framework; in your case the $lookup and $group stages will be applicable. For further help, please share one JSON document from each collection so that it is easier to guide you.
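As one possible shape for a single round trip, the metadata array from the question can be grouped into one $in filter per collection and chained with the `$unionWith` aggregation stage. Note that this is a different stage from the $lookup and $group mentioned above, and it needs MongoDB 4.4 or later. The sketch assumes every entry for a given collection uses the same `id_field`; the function names and the added `source` field are hypothetical.

```javascript
// Group search metadata by collection so each collection gets one $in filter.
function buildFilters(meta) {
  const byCollection = new Map();
  for (const { collection, id_field, id_value } of meta) {
    if (!byCollection.has(collection)) {
      byCollection.set(collection, { field: id_field, values: [] });
    }
    byCollection.get(collection).values.push(id_value);
  }
  return [...byCollection].map(([collection, { field, values }]) => ({
    collection,
    filter: { [field]: { $in: values } },
  }));
}

// Chain the filters into one pipeline: $match on the first collection,
// then one $unionWith per remaining collection.
function buildUnionPipeline(filters) {
  const [first, ...rest] = filters;
  // Tag each result with the collection it came from (hypothetical field name).
  const tag = (name) => ({ $addFields: { source: name } });
  return {
    collection: first.collection,
    pipeline: [
      { $match: first.filter },
      tag(first.collection),
      ...rest.map(({ collection, filter }) => ({
        $unionWith: { coll: collection, pipeline: [{ $match: filter }, tag(collection)] },
      })),
    ],
  };
}

// Usage with the Node.js driver (db is a connected Db):
// const { collection, pipeline } = buildUnionPipeline(buildFilters(meta));
// const results = await db.collection(collection).aggregate(pipeline).toArray();
```

The results come back grouped by collection rather than in search-rank order, so the client would still have to re-sort them against the original metadata array.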
I am very new to Cosmos DB (DocumentDB). While going through the documentation I keep reading that DocumentDB is schema free, but I feel like a collection is analogous to a schema and that both are logical views.
Wikipedia defines schema as 'The term "schema" refers to the organization of data as a blueprint of how the database is constructed'. I believe a collection is the same thing: it is the organization of documents, stored procedures, triggers and UDFs.
So my question is, how is a schema different from a collection?
Collections really have nothing to do with schema. They are just an organizational construct for documents. With Cosmos DB, they serve as:
a transaction boundary. Within a collection, you can perform multiple queries / updates within a transaction, utilizing stored procedures. These updates are constrained to a single collection (more specifically, to a single partition within a collection).
a billing/performance boundary. Cosmos DB lets you specify the number of Request Units (RU) / second to allocate to a collection. Every collection can have a different RU setting. Every collection has a minimum cost (due to minimum amount of RU that must be allocated), regardless of how much storage you consume.
a server-side code boundary. Stored procedures, triggers, etc. are uploaded to a specific collection.
Whether you choose to create a single collection per object type, or store multiple object types within a single collection, is entirely up to you. And unrelated to the shape of your data.
The schema of a relational database is slightly different from the schema of a document database. In simple terms, a relational schema is stricter than a document schema. In other words, records in an RDBMS table must strictly adhere to the schema, whereas we have some flexibility when storing a document in a document collection.
Conventionally a collection is a set of documents which follow the same schema. But document DBs don't stop you from storing documents with different schemas in a single collection. That is the flexibility they give to users.
Let us take an example. Let us assume we are storing some customer information.
In relational DB, we might have some structure like
Customer ID INT
Name VARCHAR(50)
Phone VARCHAR(15)
Email VARCHAR(255)
Depending on customer having an email or phone number, they will be recorded as proper values or null values.
ID, Name, Phone, Email
1, John, 83453452, -
2, Victor, -, -
3, Smith, 34535345, smith#jjjj
However, in document databases, fields need not appear in a document at all if they don't have any values.
[
{
id: "123",
name: "John",
phone:"2572525",
},
{
id: "456",
name: "Stephen",
},
{
id: "789",
name: "King",
phone:"2572525",
email:"king#asfaf"
}
]
However, for maintainability it is always advisable to stick to a schema in document DBs, even though they provide the flexibility to store schema-less documents in a collection.
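One lightweight way to stick to a schema is an application-level check before writing. The following is only an illustrative sketch for the customer documents above: the rule table and the `validateCustomer` function are hypothetical, and it treats `id` and `name` as required and `phone` and `email` as optional.

```javascript
// Hypothetical application-level check for the customer documents above:
// id and name are required, phone and email may be absent.
const customerShape = {
  id: { type: "string", required: true },
  name: { type: "string", required: true },
  phone: { type: "string", required: false },
  email: { type: "string", required: false },
};

// Returns a list of problems; an empty list means the document fits the shape.
function validateCustomer(doc) {
  const errors = [];
  for (const [field, rule] of Object.entries(customerShape)) {
    const present = doc[field] !== undefined;
    if (!present && rule.required) errors.push(`missing ${field}`);
    if (present && typeof doc[field] !== rule.type) errors.push(`${field} must be a ${rule.type}`);
  }
  for (const field of Object.keys(doc)) {
    if (!(field in customerShape)) errors.push(`unexpected field ${field}`);
  }
  return errors;
}
```

All three example documents above pass this check, including the one without phone and email.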
I have models based on schemas, e.g. Users, Events, Rooms. All of these entities can have comments enabled. Comments are stored in a separate model, Comments, since I cannot control how many comments each entity might end up having. I want to be able to find the entity by the id of a comment, searching through all the models I have. Basically something like this:
[list of models to search].Find( { comment_id: id });
Any ideas?
A MongoDB find command can search only one collection, so for your use case you would need a separate query for each collection.
You may consider storing all your comments in a single collection and associating them back to their parent model by reference, with modelId and modelName fields. I have used that schema successfully. It sounds like you are storing your comments as an array of comment IDs in your parent model, which seems less practical than either fully embedding the comments or fully separating them.
And there's nothing to stop you from saving the data into multiple collections, i.e. in a 'search' collection and also in the separate 'users' and 'events' collections.
You might even want to use a technology like Solr or Elasticsearch to index your data.
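With the single comments collection described above, finding the parent entity from a comment id takes two targeted reads. A minimal sketch, assuming `db` is a connected `Db` handle from the Node.js MongoDB driver and that each comment stores `modelId` and `modelName`; the `collectionFor` mapping and the function name are hypothetical.

```javascript
// Sketch only: `db` is a connected Db from the Node.js MongoDB driver.
// The mapping from modelName to collection name is hypothetical.
const collectionFor = { User: "users", Event: "events", Room: "rooms" };

async function findEntityByCommentId(db, commentId) {
  const comment = await db.collection("comments").findOne({ _id: commentId });
  if (!comment) return null;

  const collectionName = collectionFor[comment.modelName];
  if (!collectionName) throw new Error(`Unknown model: ${comment.modelName}`);

  // The second read goes straight to the one collection that owns the comment.
  const entity = await db.collection(collectionName).findOne({ _id: comment.modelId });
  return entity ? { modelName: comment.modelName, entity } : null;
}
```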