I need to either add a new document with children or add a child to an already existing parent.
The only way I know how to do it is ugly:
public async Task AddOrUpdateCase(params)
{
    try
    {
        await UpdateCase(params);
    }
    catch (RequestFailedException ex)
    {
        if (ex.Status != (int)HttpStatusCode.NotFound)
            throw;
        await AddCase(params);
    }
}

private async Task UpdateCase(params)
{
    // this line throws when the document is not found
    var caseResponse = await searchClient.GetDocumentAsync<Case>(params.CaseId);
    // need to add the child to the existing collection
    caseResponse.Value.Children.Add(params.child);
    // then send the full document (with the complete collection) back to the index
    await searchClient.UploadDocumentsAsync(new[] { caseResponse.Value });
}
I think there wouldn't be any problem if this document didn't contain a collection. You cannot use MergeOrUpload if there are child collections; you need to load the document from the index and add the element.
Is there better way to do it?
Azure Cognitive Search doesn't support partial updates to collection fields, so retrieving the entire document, modifying the relevant collection field, and sending the document back to the index is the only way to accomplish this.
The only improvement I would suggest to the code you've shown is to search for the documents you want to update instead of retrieving them one-by-one. That way, you can update them in batches. Index updates are much more expensive than queries, so to reduce overhead you should batch updates together wherever possible.
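For illustration, here is a rough sketch of that batched flow using the JavaScript/TypeScript SDK (@azure/search-documents); the same idea applies to the .NET client shown in the question. The Case/Child shapes, the filter field, and the endpoint/key are placeholders rather than anything from the question:

import { SearchClient, AzureKeyCredential } from "@azure/search-documents";

// Hypothetical document shape mirroring the question's Case type.
interface Child { name: string; }
interface Case { CaseId: string; Children: Child[]; }

const client = new SearchClient<Case>(
  "https://<service>.search.windows.net",
  "cases",
  new AzureKeyCredential("<admin-key>")
);

// Fetch every case that needs a new child with one query instead of one GET
// per document, then push all modified documents back in a single batch.
async function addChildren(updates: Map<string, Child>): Promise<void> {
  const filter = [...updates.keys()].map(id => `CaseId eq '${id}'`).join(" or ");
  const searchResults = await client.search("*", { filter });

  const modified: Case[] = [];
  for await (const hit of searchResults.results) {
    const doc = hit.document;
    doc.Children.push(updates.get(doc.CaseId)!); // append to the existing collection
    modified.push(doc);
  }

  // One indexing round trip for the whole batch; the full documents
  // (including the complete Children collection) are written back.
  await client.uploadDocuments(modified);
}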
Note that if you have all the data needed to re-construct the entire document at indexing time, you can skip the step of retrieving the document first, which would be a big improvement. Azure Cognitive Search doesn't yet support concurrency control for updating documents in the index, so you're better off having a single process writing to the index anyway. This should hopefully eliminate the need to read the documents before updating and writing them back. This is assuming you're not using the search index as your primary store, which you really should avoid.
If you need to add or update items in complex collections often, it's probably a sign that you need a different data model for your index. Complex collections have limitations (see "Maximum elements across all complex collections per document") that make them impractical for scenarios where the cardinality of the parent-to-child relationship is high. For situations like this, it's better to have a secondary index that includes the "child" entities as top-level documents instead of elements of a complex collection. That has benefits for incremental updates, but also for storage utilization and some types of queries.
I am trying to prevent duplicate insertion of an item into the collection due to multiple parallel requests arriving at the same time.
My business logic is: if I don't have a unique item XYZ in the collection, I will insert it; otherwise I will just return the document.
These items cannot be duplicated in the DB.
With multiple concurrent requests in Node.js, we are getting duplicate items in the database: all the requests read from the database at the same time, find the item to be absent, and then insert it, leading to duplication.
I know we can prevent this using unique indexes, but I don't want to use indexes, as the collection is very large and holds different kinds of data, and we are already using heavy indexing on other collections.
Kindly suggest some other methods for handling this.
I can use indexes, but I need another solution to avoid excessive RAM usage.
Are you using insert? If so, I'd suggest using update with opts upsert=true. This, however, is only atomic when there is a unique index, according to this.
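A minimal sketch of that upsert approach with the Node.js driver (the collection name, key field, and database are made up):

import { MongoClient } from "mongodb";

async function ensureItem(uri: string, itemKey: string, payload: Record<string, unknown>) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const items = client.db("mydb").collection("items");

    // One atomic round trip: if a document with this key already exists nothing
    // changes; otherwise $setOnInsert writes the new item. As noted above, this
    // is only a hard guarantee against duplicates when itemKey has a unique index.
    const result = await items.updateOne(
      { itemKey },
      { $setOnInsert: { itemKey, ...payload, createdAt: new Date() } },
      { upsert: true }
    );

    return result.upsertedCount === 1 ? "inserted" : "already existed";
  } finally {
    await client.close();
  }
}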
Other than that, I don't know if you're using any sort of mutex for your concurrency; if not, you should look into it. Here is a good place to start.
Without either atomic operations or mutex locks, you're not guaranteed any data race safety in parallel threads.
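And a sketch of the mutex idea using the async-mutex package. Note that a per-process mutex only serializes the check-then-insert within a single Node.js instance; it does not protect against multiple app instances writing concurrently (names here are illustrative):

import { Mutex } from "async-mutex";
import { Collection } from "mongodb";

const mutex = new Mutex();

// Serialize the read-check-insert sequence so two requests in this process
// can't both observe "not found" and insert the same item.
export async function insertIfAbsent(items: Collection, itemKey: string, doc: Record<string, unknown>) {
  return mutex.runExclusive(async () => {
    const existing = await items.findOne({ itemKey });
    if (existing) return existing;
    await items.insertOne({ itemKey, ...doc });
    return { itemKey, ...doc };
  });
}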
I have this sample data of two objects. Both can be put in a document using the two structures below, and it's easy to perform CRUD with either one. But I want to know which is more efficient.
Structure 1:
key1: { sr: 1, name: 'Raj', city: 'Mumbai' }
key2: { sr: 2, name: 'Aman', city: 'Delhi' }
It's easy to create different objects inside a single document using the merge property, and deletion can be performed using the code below.
db.collection('colName')
  .doc('docName')
  .update({
    // key1 is a variable holding the name of the field to remove
    [key1]: firebase.firestore.FieldValue.delete(),
  })
Structure 2:
It is basically objects in an array.
arr: [ { sr: 1, name: 'Raj', city: 'Mumbai' },
       { sr: 2, name: 'Aman', city: 'Delhi' } ]
The data can be pushed into the array arr using the code below.
['arr']: firebase.firestore.FieldValue.arrayUnion(object3)
And deletion can be performed like this. Note that arrayRemove matches by value, so you pass the element itself rather than its index:
['arr']: firebase.firestore.FieldValue.arrayRemove(objectToBeDeleted)
Which one is more efficient when it comes to CRUD operations?
CRUD covers four different operations, each with its own measurable attributes. Talking about CRUD in the context of Firestore adds even more attributes on top of those.
There are Firestore limits/quotas: https://cloud.google.com/firestore/quotas
And there are Firestore costs: https://firebase.google.com/docs/firestore/pricing
Firestore charges per read.
Storing all your data into one document is cost efficient.
Firestore is optimized for reads.
In the limits/quotas document you may notice that there is a maximum write rate to a single document of 1 per second. How frequently do you plan on writing new data into the array of that one document? Is one document still efficient?
Firestore has a max document size of 1MB.
Are you going to write more than 1 MB to a document? After adding the logic to split your document apart, is it still efficient?
There are many aspects to think about in designing your data structures. An efficiency of one quality is bound to create inefficiencies in another.
So I've been trying to wrap my head around this one for weeks, but I just can't seem to figure it out. MongoDB isn't equipped to deal with rollbacks as we typically understand them (e.g. when a client adds information to the database, like a username, but quits in the middle of the registration process; now the DB is left with some "hanging" information that isn't associated with anything). How can MongoDB handle that? Or, if no one can answer that question, maybe they can point me to a source/example that can? Thanks.
MongoDB does not support transactions; you can't perform atomic multi-statement transactions to ensure consistency. You can only perform an atomic operation on a single document at a time. When dealing with NoSQL databases you need to validate your data as much as you can, since they seldom complain about anything. There are some workarounds and patterns to achieve SQL-like transactions. For example, in your case, you can store the user's information in a temporary collection, check the data's validity, and store it in the users collection afterwards.
That should be straightforward, but things get more complicated when we deal with multiple documents. In that case, you need to create a designated collection for transactions. For instance:
// transaction collection
{
    id: ...,
    state: "new_transaction",
    value1: <values from document_1 before updating document_1>,
    value2: <values from document_2 before updating document_2>
}
// update document 1
// update document 2
Ooohh!! something went wrong while updating document 1 or 2? No worries, we can still restore the old values from the transaction collection.
This pattern is known as compensation, and it mimics the transactional behavior of SQL.
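A rough sketch of that compensation flow with the Node.js driver; the orders/inventory collections and the qty field are made-up stand-ins for document_1 and document_2:

import { Db, ObjectId } from "mongodb";

async function updateTwoDocuments(db: Db, orderId: ObjectId, itemId: ObjectId, qty: number) {
  const orders = db.collection("orders");
  const inventory = db.collection("inventory");
  const transactions = db.collection("transactions");

  // 1. Record the pre-update values so they can be restored on failure.
  const oldOrder = await orders.findOne({ _id: orderId });
  const oldItem = await inventory.findOne({ _id: itemId });
  const tx = await transactions.insertOne({
    state: "new_transaction",
    value1: oldOrder,
    value2: oldItem,
  });

  try {
    // 2. Update document 1 and document 2.
    await orders.updateOne({ _id: orderId }, { $set: { qty } });
    await inventory.updateOne({ _id: itemId }, { $inc: { qty: -qty } });

    // 3. Mark the transaction as complete.
    await transactions.updateOne({ _id: tx.insertedId }, { $set: { state: "done" } });
  } catch (err) {
    // Something went wrong while updating document 1 or 2: compensate by
    // restoring the values saved in the transaction collection.
    if (oldOrder) await orders.replaceOne({ _id: orderId }, oldOrder);
    if (oldItem) await inventory.replaceOne({ _id: itemId }, oldItem);
    await transactions.updateOne({ _id: tx.insertedId }, { $set: { state: "rolled_back" } });
    throw err;
  }
}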
I'd like to swap out all documents for a specific index's type. I'm thinking about this like a database transaction, where I'd:
Delete all documents inside of the type
Create new documents
Commit
It appears that this is possible with ElasticSearch's bulk API, but is there a more direct way?
Based on the following statement from the Elasticsearch Delete by Query API documentation:
Note, delete by query bypasses versioning support. Also, it is not recommended to delete "large chunks of the data in an index", many times, it’s better to simply reindex into a new index.
You might want to reconsider deleting entire types and recreating them in the same index. As this statement suggests, it is better to simply reindex.
In fact, I have a scenario where we maintain an index of manufacturer products, and when a manufacturer sends an updated list of products, we load the new data into our persistent store and then completely rebuild the index. I use index aliases to mask the actual index being used. When product changes occur, a process rebuilds the new index in the background (currently about 15 minutes), then switches the alias to the new index once the data load is complete and deletes the old index. This is completely seamless and causes no downtime for our users.
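For illustration, here is a rough sketch of that alias swap using the current JavaScript client; the index/alias names and the document loader are made up, and it assumes the alias already exists:

import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: "http://localhost:9200" });

async function rebuildProducts(loadDocs: () => AsyncIterable<object>) {
  const alias = "products";
  const newIndex = `products-${Date.now()}`;

  // 1. Build the replacement index in the background while the old one
  //    keeps serving queries through the alias.
  await es.indices.create({ index: newIndex });
  for await (const doc of loadDocs()) {
    await es.index({ index: newIndex, document: doc });
  }
  await es.indices.refresh({ index: newIndex });

  // 2. Find whatever the alias currently points to.
  const current = await es.indices.getAlias({ name: alias });
  const oldIndices = Object.keys(current);

  // 3. Atomically switch the alias to the new index, then drop the old one.
  await es.indices.updateAliases({
    actions: [
      ...oldIndices.map((index) => ({ remove: { index, alias } })),
      { add: { index: newIndex, alias } },
    ],
  });
  if (oldIndices.length > 0) {
    await es.indices.delete({ index: oldIndices });
  }
}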
I have a MongoDB database with 2 collections:
groups: { group_slug, members }
users: { id, display name, groups }
All changes to groups are done by changing the members array of the group to include the users' IDs.
I want to sync these changes across to the users collection by using map/reduce. How can I output the results of map/reduce into an existing collection (without merging or reducing)?
My existing code is here: https://gist.github.com/morgante/5430907
How can I output the results of map/reduce into an existing collection
You really can't do it this way. Nor is this really suggested behaviour. There are other solutions:
Solution #1:
Output the map / reduce into a temporary collection
Run a follow-up task that updates the primary data store from the temporary collection
Clean-up the temporary collection
Honestly, this is a safe way to do this. You can implement some basic retry logic in the whole loop.
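A sketch of Solution #1 against the collections in the question; the temporary collection name and the exact map/reduce functions are illustrative:

import { Db, Code } from "mongodb";

async function syncGroupsToUsers(db: Db) {
  // 1. Run the map/reduce, writing its output to a temporary collection.
  await db.command({
    mapReduce: "groups",
    map: new Code("function () { var slug = this.group_slug; this.members.forEach(function (m) { emit(m, [slug]); }); }"),
    reduce: new Code("function (userId, slugLists) { return Array.prototype.concat.apply([], slugLists); }"),
    out: "tmp_user_groups",
  });

  // 2. Follow-up task: copy the results into the primary users collection.
  const tmp = db.collection("tmp_user_groups");
  for await (const row of tmp.find()) {
    await db.collection("users").updateOne(
      { id: row._id },
      { $set: { groups: row.value } }
    );
  }

  // 3. Clean up the temporary collection.
  await tmp.drop();
}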
Solution #2:
Put the change on a queue (e.g. "user subscribes to group").
Update both collections from separate workers that are listening for such events on the queue.
This solution may require a separate piece (the queue), but any large system is going to have such denormalization problems. So this will not be the only place you see this.
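A minimal sketch of Solution #2, with a plain in-memory array standing in for a real queue (RabbitMQ, SQS, etc.); in practice the publisher and the worker would be separate processes:

import { Db } from "mongodb";

type GroupEvent = { type: "user_subscribed"; userId: string; groupSlug: string };

const queue: GroupEvent[] = [];

export function publish(event: GroupEvent) {
  queue.push(event);
}

// The worker drains the queue and applies the same change to both collections,
// so the denormalized data is only ever written from one place.
export async function runWorker(db: Db) {
  while (true) {
    const event = queue.shift();
    if (!event) {
      await new Promise((resolve) => setTimeout(resolve, 100));
      continue;
    }
    await db.collection("groups").updateOne(
      { group_slug: event.groupSlug },
      { $addToSet: { members: event.userId } }
    );
    await db.collection("users").updateOne(
      { id: event.userId },
      { $addToSet: { groups: event.groupSlug } }
    );
  }
}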