How to update a Cosmos DB document's partition key element?

I need to update a document changing the value of the element being used as the partition key. The documentation says that a document is uniquely identified by its id and partition key.
So, if I change the partition key, will this always create a new document?
Or will it only create a new document if it lands in another partition?
If a new document is always created, then I think the safest way to update is:
Create the new document.
If successful, delete the old document.
Failure to delete will leave duplicate data, but at least the data is not lost.
If a new document is not always created, how can I identify the cases where a new document was created, so that I can delete the old one? I don't want to delete anything before the new document exists, since there is no transactional way to do this.
Regards All.

Trying to update the partition key value will simply fail.
Trying to upsert the partition key value will create a new document with the same id in a different logical partition.
What the process should be is:
Keep the old document in memory
Delete the old document
Create the new document
If the latter fails, recreate the old document
Cosmos DB doesn't support cross-partition transactions, so there is no way to do this atomically, and you can't use a stored procedure either, since stored procedures only run against a single logical partition.
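A minimal sketch of that sequence with the Python SDK (azure-cosmos), assuming placeholder connection details and a hypothetical /category partition key path:

from azure.cosmos import CosmosClient, exceptions

# Assumed account, database, and container names -- replace with your own.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

def change_partition_key_value(item_id, old_pk, new_pk):
    # Keep the old document in memory.
    old_doc = container.read_item(item=item_id, partition_key=old_pk)
    # Delete the old document.
    container.delete_item(item=item_id, partition_key=old_pk)
    # Create the new document with the updated partition key value
    # (assumes /category is the partition key path).
    new_doc = dict(old_doc, category=new_pk)
    try:
        container.create_item(new_doc)
    except exceptions.CosmosHttpResponseError:
        # If the create fails, recreate the old document so nothing is lost.
        container.create_item(old_doc)
        raise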

Related

Azure Cosmos DB as key/value store indexing mode

What indexing mode / policy should I use when using Cosmos DB as a simple key/value store?
From https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy :
None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes.
Is this because the property used as partition key is indexed even when indexMode is set to “none”? I would expect to need to turn indexing on but specify just the partition key’s path as the only included path.
If it matters, I’m planning to use the SQL API.
EDIT:
Here's the information I was missing to understand this:
The item must have an id property; otherwise Cosmos DB will assign one. https://learn.microsoft.com/en-us/azure/cosmos-db/account-databases-containers-items#properties-of-an-item
Since I'm using Azure Data Factory to load the items, I can tell ADF to duplicate the column that has the value I want to use as my id into a new column called id: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
To get the item without using a query, I need to use ReadItemAsync, or better yet ReadItemStreamAsync, which skips deserializing the response.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemasync?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemstreamasync?view=azure-dotnet
When you set indexingMode to "none", the only way to efficiently retrieve a document is by id (e.g. ReadDocumentAsync() or read_item()). This is akin to a key/value store, since you wouldn't be performing queries against other properties; you'd be specifically looking up a document by some known id, and returning the entire document. Cost-wise, this would be ~1 RU for a 1 KB document, just like point-reads with an indexed collection.
You could still run queries, but without indexes, you'll see unusually-high RU cost.
You would still specify the partition key's value with your point-reads, as you'd normally do.
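A rough sketch of that pattern with the Python SDK (the account, database, container names, and the /pk partition key path are placeholders): create the container with indexing disabled, then rely on point-reads.

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("mydb")

# Pure key/value container: indexing disabled entirely (mode "none"
# requires automatic indexing to be turned off as well).
container = database.create_container_if_not_exists(
    id="kvstore",
    partition_key=PartitionKey(path="/pk"),
    indexing_policy={"indexingMode": "none", "automatic": False},
)

# Point-read by id and partition key: ~1 RU for a 1 KB item, no index needed.
item = container.read_item(item="item-1", partition_key="pk-value")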

Does Cosmos DB update delete the record even if only a single field changes?

I am trying to understand how a Cosmos DB update works. In Cosmos DB, there is an upsert operation that updates or inserts, depending on whether the item already exists in the container. Usually the flow is like this:
record = container.read_item(item=item_id, partition_key=pk)
record['one_field'] = 'new_value'
container.upsert_item(record)
My doubt here is whether such an update operation will delete (and rewrite) the original record even if only a single field is changed. If that is the case, then updates become expensive when the record is large. Is my understanding correct?
Cosmos DB updates a document by replacing it, not by in-place update.
If you query (or read) a document, and then update some properties, you would then replace the document. Or, as you've done, call upsert() (which is similar to a replace, except that it will create a new document if the specified partition+id doesn't exist already).
The notion of "expensive" is not exactly easy to quantify; look at the returned headers to see the RU charge for a given upsert/replace, to determine the overall cost, and whether you'll need to scale your RU/sec setting based on overall usage patterns.
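As a minimal sketch (assuming the Python SDK and the container client from the question's flow), the charge can be read back from the response headers after the call:

container.upsert_item(record)
# The RU cost of the last operation is reported in the response headers.
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"Upsert cost: {charge} RU")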

Adding a partition key to an existing collection - Azure Cosmos DB

Is there any way to add partition keys to the collections we already have in Azure Cosmos DB, or do we need to drop them, create new collections with partition keys, and import the data from the previous collections?
I tried googling a lot and checking the settings of the collection, but nothing helped. Any help would be great; thanks in advance.
Once created, a collection's partition key definition cannot change. This means that you cannot add, remove, or alter the partition key of an existing collection.
You can use the Cosmos DB change feed to migrate to a new collection with the appropriate partition key.
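A rough sketch of that migration with the Python SDK, assuming old_container and new_container clients where new_container was created with the desired partition key (is_start_from_beginning replays the feed from the start):

# Replay every item from the old container's change feed into the new
# container, which was created with the desired partition key definition.
for item in old_container.query_items_change_feed(is_start_from_beginning=True):
    new_container.upsert_item(item)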

Cosmos DB bulk delete without partition key

I am trying to do a queried bulk delete of documents using a stored procedure on a Cosmos DB collection. I have used the sample code from here:
https://github.com/Azure/azure-documentdb-js-server/blob/master/samples/stored-procedures/bulkDelete.js
When I try to execute the query, I am forced to provide a partition key, which I do not know. I want to execute a fan-out delete based on query criteria that do not include the partition key. What other ways can I try to delete documents in bulk from a Cosmos DB collection?
If the collection the stored procedure is registered against is a single-partition collection, then the transaction is scoped to all the documents within the collection. If the collection is partitioned, then stored procedures are executed in the transaction scope of a single partition key. Each stored procedure execution must then include a partition key value corresponding to the scope the transaction must run under.
The description above is quoted from the documentation here.
So, if your collection is partitioned, you also need to supply the partition key when you operate on the collection or the documents in it. More details here.
Since you do not know the partition key, I suggest you set EnableCrossPartitionQuery to true in the FeedOptions when querying for the documents to delete (note that this has a performance cost), then delete each matching document individually, as sketched below.
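Translated to the Python SDK for illustration (the query, the /pk partition key path, and the container client are assumptions), the fan-out is a cross-partition query followed by per-document deletes:

# The query fans out across all partitions; each matching document is then
# deleted using the partition key value carried on the document itself.
docs = container.query_items(
    query="SELECT c.id, c.pk FROM c WHERE c.someField = 'someValue'",
    enable_cross_partition_query=True,
)
for doc in docs:
    container.delete_item(item=doc["id"], partition_key=doc["pk"])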
Hope it helps you.

Can ElasticSearch delete all and insert new documents in a single query?

I'd like to swap out all documents for a specific index's type. I'm thinking about this like a database transaction, where I'd:
Delete all documents inside of the type
Create new documents
Commit
It appears that this is possible with Elasticsearch's bulk API, but is there a more direct way?
Based on the following statement, from the elasticsearch Delete by Query API Documentation:
Note, delete by query bypasses versioning support. Also, it is not recommended to delete "large chunks of the data in an index", many times, it’s better to simply reindex into a new index.
You might want to reconsider removing entire types and recreating them within the same index. As this statement suggests, it is better to simply reindex. In fact, I have a scenario where we have an index of manufacturer products, and when a manufacturer sends an updated list of products, we load the new data into our persistent store and then completely rebuild the entire index. I use index aliases to mask the actual index being used. When product changes occur, a process starts rebuilding the new index in the background (a process that currently takes about 15 minutes), then switches the alias to the new index once the data load is complete and deletes the old index. This is completely seamless and does not cause any downtime for our users.
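A condensed sketch of that alias pattern with the elasticsearch Python client (the index names, alias name, and new_products iterable are placeholders):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Build the replacement index while the "products" alias still serves reads.
es.indices.create(index="products_v2")
helpers.bulk(es, (
    {"_index": "products_v2", "_id": doc["id"], "_source": doc}
    for doc in new_products  # new_products: your updated product dicts
))

# Atomically repoint the alias, then drop the old index.
es.indices.update_aliases(body={"actions": [
    {"remove": {"index": "products_v1", "alias": "products"}},
    {"add": {"index": "products_v2", "alias": "products"}},
]})
es.indices.delete(index="products_v1")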
