I need to add a field to each document of a very large collection in Firestore via the Admin SDK for Node.js. Some of the documents already have the field, so I need to check its value before setting/updating it. The collection is around 150k documents. Even trying to get the documents with the code below times out.
const documents = await db.collection('collectionName').get()
Is there a special way to handle very large collections?
You should use pagination to avoid loading all the documents into memory with a single query. This lets you process documents in batches: you pass the last document snapshot you saw in the prior query to a new query that fetches the next page of data.
You will also want to place a limit on each query so that each page contains a manageable number of documents.
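Roughly, the paging-and-backfill loop could look like the following sketch with the Admin SDK; the field name 'myField' and its default value are placeholders for whatever you need to backfill:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function backfillField() {
  const pageSize = 300; // keep each page well under the 500-write limit of a batched write
  const baseQuery = db.collection('collectionName')
    .orderBy(admin.firestore.FieldPath.documentId())
    .limit(pageSize);
  let lastDoc = null;

  while (true) {
    const page = lastDoc ? await baseQuery.startAfter(lastDoc).get() : await baseQuery.get();
    if (page.empty) break;

    const batch = db.batch();
    let writes = 0;
    for (const doc of page.docs) {
      // Only touch documents that are missing the field
      if (doc.get('myField') === undefined) {
        batch.update(doc.ref, { myField: 'defaultValue' });
        writes++;
      }
    }
    if (writes > 0) await batch.commit();

    lastDoc = page.docs[page.docs.length - 1];
  }
}

Each iteration reads one page, updates only the documents that lack the field, and then continues from the last snapshot of that page.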
I'm working on a project that will have a large number (thousands, possibly millions) of documents in a Firebase collection. I need to access the average value by day of documents of the same type; each document has a "registered_value" field, a "date", and a "code" identifying its value, registered date, and type.
I need to show users the average value by day of the documents that have the same code.
Users can add new documents, edit existing ones, or delete the ones they created.
Since I need to get this data frequently, it would be too expensive to read the entire collection every time a user loads the pages that display this info. Is there a better way to store or get the average?
I'm working with ReactJS and Node.js
There's nothing built into Firestore to get aggregated values like that. The common approach is to store the aggregated value in a separate document somewhere, and update that document upon every relevant write operation. You can do this either from client-side code, or from Cloud Functions.
For more on this, see the Firebase documentation on aggregation queries and on distributed counters.
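As a rough sketch of the Cloud Functions approach, assuming the source documents live in a 'values' collection, that 'date' is a Firestore Timestamp, and that the running totals go into a 'daily_averages' collection keyed by code and day (all of these names are illustrative):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.updateDailyAverage = functions.firestore
  .document('values/{docId}')
  .onCreate(async (snap) => {
    const { code, date, registered_value } = snap.data();
    // 'date' is assumed to be a Firestore Timestamp
    const day = date.toDate().toISOString().slice(0, 10); // e.g. '2024-05-31'
    const aggRef = admin.firestore().doc(`daily_averages/${code}_${day}`);

    // Keep a running sum and count; the average is just sum / count
    await admin.firestore().runTransaction(async (tx) => {
      const agg = await tx.get(aggRef);
      const sum = (agg.exists ? agg.data().sum : 0) + registered_value;
      const count = (agg.exists ? agg.data().count : 0) + 1;
      tx.set(aggRef, { code, day, sum, count, average: sum / count });
    });
  });

Edits and deletes would need similar onUpdate/onDelete handlers that adjust the sum and count, and the pages that display the averages then read these small aggregate documents instead of the whole collection.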
I have an application in which most of the collections are read far more heavily than they are written, so I denormalized the data in them, and now I need to keep the denormalized copies in sync. For some collections I used jobs to sync the data, but that is not good enough, because in some cases I need the data to be synced in real time.
For example:
Let's say I have an orders collection and a users collection.
Orders have the user's email (for search):
{
  _id: ObjectId(),
  user_email: 'test#email.email',
  ....
}
Now, whenever I change the user's email in users, I want to change it in orders as well.
I found that MongoDB has change streams, which look like a pretty awesome feature. I have played with them a bit and they give me the results I need to update my other collections. My questions are: does anyone use them in production? Can I trust the stream to always deliver the updated data so I can update the other collections? How does it affect DB performance if I have many streams open? Also, I use the Node.js MongoDB driver; does that have any effect?
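For reference, a minimal sketch of what such a listener could look like with the Node.js driver; it assumes orders also carry a user_id referencing the users document, and the connection string, database name, and field names are placeholders:

const { MongoClient } = require('mongodb');

async function watchUserEmails() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');

  // Only react to updates on users that change the email field
  const changeStream = db.collection('users').watch(
    [{ $match: { operationType: 'update', 'updateDescription.updatedFields.email': { $exists: true } } }],
    { fullDocument: 'updateLookup' }
  );

  changeStream.on('change', async (change) => {
    // Propagate the new email to the denormalized copies in orders
    await db.collection('orders').updateMany(
      { user_id: change.documentKey._id },
      { $set: { user_email: change.fullDocument.email } }
    );
  });
}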
I haven't worked with change streams yet, but these cases are very common and can easily be solved by building a more normalized schema.
Normal form 1 says, among other things, "don't repeat data", so you would save the email in the users collection only.
The orders collection won't have the email field but will have a user_id for joining with the users collection via the $lookup aggregation stage:
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
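For illustration, a query along these lines would join each order with its user and project the email at read time (a sketch; the field names mirror the example above):

const orders = await db.collection('orders').aggregate([
  {
    $lookup: {
      from: 'users',          // collection to join with
      localField: 'user_id',  // field in orders
      foreignField: '_id',    // field in users
      as: 'user'
    }
  },
  { $unwind: '$user' },
  { $project: { _id: 1, user_email: '$user.email' } }
]).toArray();

With this schema, changing the email only ever touches the users document, so the orders collection never goes stale.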
I'm developing a web service that is being load tested. So that I don't waste anyone's time, I won't go into detail about the REST API because it's too complicated to explain.
I'm developing with Node and using MongoDB's native Node driver. I use the following query:
let mediaIdQuery = {_id: {$in: media}};
let result = await db.collection(COLLECTION_MEDIA).findOne(mediaIdQuery);
'media' is an array of strings. I'm essentially searching the database to see if I have a document with an _id that is in the array.
I add these documents elsewhere. In testing, it seems to work fine when the creation of the documents and the search query for them are far apart in time. However, if I send a request that creates a document and a search request within ~100 ms of each other, this query seems to fail to find the document.
This is actually an attempt at optimization on my part. In the past, I had a for loop that went through each array element and sent a separate search query each time. This should be able to do it all in one query, so I figured it would be faster. When I used the for-loop method, everything worked properly.
I'm just really confused about why the database can't find the documents when the requests are close in time. I'm 100% sure that the insert queries for the documents occur before the search request, because it takes pretty much 0 ms to insert the documents and the insert queries are sent first. Is there some sort of delay in MongoDB before the documents become visible? I've never experienced issues like that before, so I'm hesitant to blame it on a delay in the system. Any ideas?
Edit:
My database is not sharded. Everything is stored on 1 instance.
Let's say I have a document:
{ "_id": ObjectId("544946347db27ca99e20a95f"), "nameArray": [ { "id": 1, "first_name": "foo" } ] }
Now I need to push an array into nameArray using $push. How does the document get updated in that case? Is the document retrieved on the client, with the update happening on the client and the changes then reflected back to the MongoDB server? Or is the entire operation carried out in the MongoDB database?
What you are asking here is whether MongoDB operations are client-side operations. The short answer is NO.
In MongoDB, a query targets a specific collection of documents, as mentioned in the documentation, and a collection is a group of MongoDB documents that exists within a single database. Collections are to MongoDB what tables are to an RDBMS. So if queries target a specific collection, they are performed at the database level, thus server-side. The same applies to data modification and aggregation operations.
Sometimes your operations may involve client-side processing, because MongoDB doesn't provide a way to achieve what you want out of the box. Generally speaking, you only need that kind of processing when you want to modify the structure of the documents in your collection or change the type of your fields. In such situations, you will need to retrieve your documents and then apply your modifications using bulk operations.
See the documentation:
Your array is inserted into the existing array as one element. If the array does not exist, it is created. If the target is not an array, the operation fails.
There is nothing stated like "retrieving the element to the client and updating it there". The operation is done completely on the database server side. I don't know of any operation that works the way you described, unless you chain a query with a modification of the item on your client and then an update. But those are two separate operations, not one single command.
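For example, a single server-side update along these lines pushes a new element onto nameArray without the document ever leaving the server (a sketch; the collection name 'people' is assumed):

const { ObjectId } = require('mongodb');

await db.collection('people').updateOne(
  { _id: new ObjectId("544946347db27ca99e20a95f") },
  { $push: { nameArray: { id: 2, first_name: "bar" } } }
);

If you want to append several elements at once rather than a single nested array, combine $push with the $each modifier.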
I am evaluating CouchDB for persistent cart functionality. If I create one document per user and have each cart item as a field, how many items can I store? In the current scenario I can have up to 500 items in a cart.
Doc-per-cart and doc-per-item are both fine choices; neither document sounds like it would get very large (JSON encoding/decoding is slower for very large documents, and they must be held entirely in memory). On balance, I'd prefer doc-per-item. Of course, you will need to create a (simple) view to display the cart if you go with doc-per-item.
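For illustration, the map function of such a view could be as small as this sketch; it assumes each item document carries a 'type' marker and the owner's 'user_id':

// map function of a view, e.g. _design/cart/_view/items_by_user
function (doc) {
  if (doc.type === 'cart_item') {
    // Key by the owning user so one key query returns the whole cart
    emit(doc.user_id, doc);
  }
}

Querying the view with key="<user id>" then returns all of that user's items in a single request.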
One good reason to prefer doc-per-item is CouchDB's MVCC. Adding an item to a cart always creates a new document, so you will not need to know the current _rev of the item. When a user wants to delete an item, you will have the _id and the _rev and can easily delete it. If you go with doc-per-cart, you will be constantly updating a document, which requires you to have the current _rev all the time.
Note that doc-per-item will allow duplicates in your cart (the user hits Reload and makes two additions instead of one), but as long as the display of the cart shows this, and the final checkout page does too, I think it's a reasonable failure mode.
A quick review of the CouchDB overview should make it clear that there is no inherent limit on the number of fields in a CouchDB document, and therefore no limit (aside from available memory) to the number of items you can store in your cart.