Why does my Azure Cosmos DB SQL API container refuse multiple items with the same partition key value?

In Azure Cosmos DB (SQL API) I've created a container whose "partition key" is set to /part_key and I am now trying to create and edit data in Data Explorer.
I created an item that looks like this:
{
    "id": "test_id",
    "value": "val000",
    "magicNumber": 32,
    "part_key": "asdf"
}
I am now trying to create an item that looks like this:
{
    "id": "frank",
    "value": "val001",
    "magicNumber": 33,
    "part_key": "asdf"
}
Based on the documentation I believe that each item within a partition key needs a distinct id, which to me implies that multiple items can in fact share a partition key, which makes a lot of sense.
However, I get an error when I try to save this second item:
{"code":409,"body":{"code":"Conflict","message":"Entity with the specified id already exists in the system...
I see that if I change the value of part_key to something else (say asdf2), then I can save this new item.
Either my expectations about this functionality are wrong, or else I'm doing this wrong somehow. What is wrong here?

Your understanding is correct. This happens if you try to insert a new document whose id equals the id of an existing document in the same logical partition. That is not allowed, so the operation fails.
Before you insert the modified copy, you need to assign a new id to it. I tested the scenario and it works fine. Maybe try to create a new document and check.
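For reference, here is a minimal sketch (using the Cosmos DB .NET SDK v3; the endpoint, key, and database/container names are placeholders) showing that two items with the same partition key value but distinct ids can be created without a conflict:
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class PartitionKeyDemo
{
    static async Task Main()
    {
        // Placeholder endpoint, key, and names -- substitute your own.
        var client = new CosmosClient("https://<your-account>.documents.azure.com:443/", "<your-key>");
        Container container = client.GetContainer("mydb", "mycontainer"); // partition key path: /part_key

        // Two items sharing the partition key value "asdf" but with distinct ids succeed.
        await container.CreateItemAsync(
            new { id = "test_id", value = "val000", magicNumber = 32, part_key = "asdf" },
            new PartitionKey("asdf"));
        await container.CreateItemAsync(
            new { id = "frank", value = "val001", magicNumber = 33, part_key = "asdf" },
            new PartitionKey("asdf"));

        // Reusing an existing id within the same partition would throw a 409 Conflict instead.
    }
}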

Related

Comparing data in an Azure Index to data that is about to be uploaded

I have an index using the Azure Cognitive Search service. I'm writing a program to automate the upload of new data to this index, and I don't want to unnecessarily delete and re-create the index from scratch each time. Is there a way of comparing what is currently in the index with the data that I am about to upload, without having to download that data first and compare it manually? I have been looking at the MS documentation and other articles but cannot see a way to do this comparison.
You can use the MergeOrUpload operation: if the document isn't there it will be inserted, otherwise it will be updated.
Please make sure the IDs are the same, otherwise you'll end up always adding new items.
IndexAction.MergeOrUpload(
    new Customer()
    {
        Id = "....",
        UpdatedBy = new
        {
            Id = "..."
        }
    }
)
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.search.models.indexactiontype?view=azure-dotnet
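For context, here is a minimal sketch of sending such a batch with the Microsoft.Azure.Search client library (the Customer model, service name, admin key, and index name are assumptions for illustration):
using System.Collections.Generic;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// Hypothetical model matching the index schema; Id is the index key field.
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class IndexUploader
{
    public static void MergeOrUploadCustomers(IEnumerable<Customer> customers)
    {
        // Placeholder service name, admin key, and index name.
        var serviceClient = new SearchServiceClient("<service-name>", new SearchCredentials("<admin-key>"));
        ISearchIndexClient indexClient = serviceClient.Indexes.GetClient("customers-index");

        // MergeOrUpload inserts documents that don't exist yet and merges into those that do,
        // matching on the index's key field, so existing documents are updated in place.
        var batch = IndexBatch.MergeOrUpload(customers);
        indexClient.Documents.Index(batch);
    }
}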

How to show unique keys on Cosmos DB container?

This link implies that unique keys can be seen in a Cosmos DB container by looking at the settings. However, I can't seem to find them using either the portal or the storage explorer. How can you view the unique keys on an existing Cosmos DB container? I have a document that fails to load due to a key violation that should be impossible, so I need to confirm what the keys are.
A slightly easier way to view your Cosmos DB unique keys is to view the ARM template for your resource.
On your Cosmos DB account, click Settings > Export template, let the template be generated, and view it online once complete. You will find them under the "uniqueKeyPolicy" label.
Based on this, the unique key policy should be visible like below:
"uniqueKeyPolicy": {
"uniqueKeys": [
{
"paths": [
"/name",
"/country"
]
},
{
"paths": [
"/users/title"
]
}
]
}
However, like you, I could not see it in the portal. Maybe it's a bug.
As a workaround, you could use the Cosmos DB SDK to get the unique key policy; please see my Java sample code.
ResourceResponse<DocumentCollection> response1 = documentClient.readCollection("dbs/db/colls/test", null);
DocumentCollection coll = response1.getResource();
UniqueKeyPolicy uniqueKeyPolicy = coll.getUniqueKeyPolicy();
Collection<UniqueKey> uniqueKeyCollections = uniqueKeyPolicy.getUniqueKeys();
for (UniqueKey uniqueKey : uniqueKeyCollections) {
    System.out.println(uniqueKey.getPaths());
}
Here is the basic code that worked for me. Writing out the collection prints it in JSON format; I think this is similar to what you see in the portal, but it skips or omits the uniqueKeyPolicy information.
As a side note, I think I found a bug or odd behavior: inserting a new document can throw a unique index constraint violation, but updates do not.
// Read the endpoint, key, and target database/container names from configuration.
this.EndpointUrl = ConfigurationManager.AppSettings["EndpointUrl"];
this.PrimaryKey = ConfigurationManager.AppSettings["PrimaryKey"];
string dbname = ConfigurationManager.AppSettings["dbname"];
string containername = ConfigurationManager.AppSettings["containername"];
this.client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey);
// Reading the collection and printing it serializes it as JSON, but uniqueKeyPolicy is missing from that output.
DocumentCollection collection = await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri(dbname, containername));
Console.WriteLine("\n4. Found Collection \n{0}\n", collection);
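As a possible follow-up sketch (assuming a version of the .NET DocumentDB SDK where DocumentCollection exposes a UniqueKeyPolicy property), the paths can be printed directly instead of relying on the serialized output:
// Print the unique key paths straight from the collection object, if the SDK version exposes them.
foreach (UniqueKey uniqueKey in collection.UniqueKeyPolicy.UniqueKeys)
{
    Console.WriteLine(string.Join(", ", uniqueKey.Paths));
}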
Support for showing the unique key policy in collection properties will be added soon. Meanwhile, you can use DocumentDBStudio to see the unique keys on a collection. Once a unique key policy is set, it cannot be modified.
Regarding the odd behavior, can you please share a full isolated repro and explain the expected and actual behavior?
You can view the ARM template in the Azure portal and, as the accepted answer says, you will find the unique keys under the "uniqueKeyPolicy" label.

Case insensitive search in arrays for CosmosDB / DocumentDB

Let's say I have these documents in my Cosmos DB (DocumentDB API, .NET SDK).
{
    // partition key of the collection
    "userId": "0000-0000-0000-0000",
    "emailAddresses": [
        "someaddress@somedomain.com", "Another.Address@someotherdomain.com"
    ]
    // some more fields
}
I now need to find out if I have a document for a given email address. However, I need the query to be case insensitive.
There are ways to do a case-insensitive search on a field (they do a full scan, however):
How to do a Case Insensitive search on Azure DocumentDb?
select * from json j where LOWER(j.name) = 'timbaktu'
e => e.Id.ToLower() == key.ToLower()
These do not work for arrays. Is there an alternative way? A user defined function looks like it could help.
I am mainly looking for a temporary low-effort solution to support the scenario (I have multiple collections like this). I probably need to switch to a data structure like this at some point:
{
    "userId": "0000-0000-0000-0000",
    // Option A
    "emailAddresses": [
        {
            "displayName": "someaddress@somedomain.com",
            "normalizedName": "someaddress@somedomain.com"
        },
        {
            "displayName": "Another.Address@someotherdomain.com",
            "normalizedName": "another.address@someotherdomain.com"
        }
    ],
    // Option B
    "emailAddressesNormalized": [
        "someaddress@somedomain.com", "another.address@someotherdomain.com"
    ]
}
Unfortunately, my production database already contains documents that would need to be updated to support the new structure.
My production collections contain only 100s of these items, so I am even tempted to just get all items and do the comparison in memory on the client.
If performance matters then you should consider one of the normalization solutions you have proposed yourself in the question. Then you could index the normalized field and get results without doing a full scan.
If for some reason you really don't want to touch the documents, then perhaps the feature you are missing is a simple join.
Example query which will do case-insensitive search from within array with a scan:
SELECT c FROM c
join email in c.emailAddresses
where lower(email) = lower('ANOTHER.ADDRESS@someotherdomain.com')
You can find more examples about joining from Getting started with SQL commands in Cosmos DB.
Note that the WHERE criteria in the given example cannot use an index, so consider using it only alongside another, more selective (indexed) criterion.
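For completeness, here is a minimal sketch of running that query from the .NET DocumentDB SDK (the database and collection names are placeholders, and client is an already-constructed DocumentClient; EnableCrossPartitionQuery is set because the filter is not on the partition key):
// Cross-partition, case-insensitive lookup inside the emailAddresses array (this scans the collection).
var collectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "users");
var query = client.CreateDocumentQuery<dynamic>(
        collectionUri,
        "SELECT c FROM c JOIN email IN c.emailAddresses " +
        "WHERE LOWER(email) = LOWER('ANOTHER.ADDRESS@someotherdomain.com')",
        new FeedOptions { EnableCrossPartitionQuery = true })
    .AsDocumentQuery();

while (query.HasMoreResults)
{
    foreach (var doc in await query.ExecuteNextAsync<dynamic>())
    {
        Console.WriteLine(doc);
    }
}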

DocumentDB and Azure Search: Document removed from documentDB isn't updated in Azure Search index

When I remove a document from DocumentDB, it won't be removed from the Azure Search index. The index will update if I change something in a document.
I'm not quite sure how I should use this "SoftDeleteColumnDeletionDetectionPolicy" in the data source.
My datasource is as follows:
{
    "name": "mydocdbdatasource",
    "type": "documentdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://myDocDbEndpoint.documents.azure.com;AccountKey=myDocDbAuthKey;Database=myDocDbDatabaseId"
    },
    "container": {
        "name": "myDocDbCollectionId",
        "query": "SELECT s.id, s.Title, s.Abstract, s._ts FROM Sessions s WHERE s._ts > @HighWaterMark"
    },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "isDeleted",
        "softDeleteMarkerValue": "true"
    }
}
And I have followed this guide:
https://azure.microsoft.com/en-us/documentation/articles/documentdb-search-indexer/
What am I doing wrong? Am I missing something?
I will describe what I understand about SoftDeleteColumnDeletionDetectionPolicy in a data source. As the name suggests, it is a soft-delete policy, not a hard-delete policy. In other words, the data is still there in your data source, but it is somehow marked as deleted.
Essentially, the way it works is that the Search Service periodically queries the data source and checks for entries that are deleted by inspecting the value of the attribute defined in SoftDeleteColumnDeletionDetectionPolicy. So in your case, it will query the DocumentDB collection, find the documents whose isDeleted attribute is true, and then remove the matching documents from the index.
The reason it is not working for you is that you are actually deleting the records instead of changing the value of isDeleted from false to true. Thus it never finds matching values and no changes are made to the index.
One thing you could possibly do is instead of doing Hard Delete, you do Soft Delete in your DocumentDB collection to begin with. When the Search Service re-indexes your data, because the document is soft deleted from the source it will be removed from the index. Then to save storage costs at the DocumentDB level, you simply delete these documents through a background process some time later.
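As an illustration of that flow (a sketch only, using the older DocumentDB .NET SDK; the document id and an already-constructed client are assumptions), the application flags the document instead of deleting it, and cleans it up later:
// Soft delete: mark the document so the indexer's deletion detection policy can pick it up.
var docUri = UriFactory.CreateDocumentUri("myDocDbDatabaseId", "myDocDbCollectionId", "some-session-id");
Document doc = await client.ReadDocumentAsync(docUri);
doc.SetPropertyValue("isDeleted", "true");   // must match softDeleteMarkerValue in the data source
await client.ReplaceDocumentAsync(docUri, doc);

// Later, after the indexer has run and removed the entry from the index,
// a background job can hard-delete the document to reclaim storage.
await client.DeleteDocumentAsync(docUri);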

Point-in-time restores of databases and documents using Cloudant

How can I save changes in CouchDB / Cloudant in order to later do point-in-time restores of my databases, or even specific documents?
We’re working on making this a first-class feature, but until we roll it out, this is how one of our customers did it:
You have collections, and within those collections, resources. So, you keep a logging database where every document has an ID like collection-resource, so for a collection named "cars" and a resource named "Ford", you'd have a document in your logging database named cars-ford. That document looks like this:
{
versions: [...]
}
Any time that resource is touched or modified, your application updates the logging document by appending the new version to the end of the versions field. That version might look like this:
{
timestamp: '...', # some integer timestamp, for sorting
doc: {...} # attributes of the document as of the save
}
Later, we'll use a view over these logging documents to return a list of all versions of all documents, sorted by when each change occurred.
Then, here's how you use that to do restores and the like:
Getting the most recent version of a resource
Get the document in its entirety, and grab the last element in the versions field. That's the most recent version.
See all versions relative to a timestamp
We'll create a view to sort by timestamp. The view looks like this:
{
    map: "function(doc) {
        for(var i in doc.versions){
            emit(doc.versions[i].timestamp, doc.versions[i].doc);
        }
    }"
}
Say our database is named loggy, the design doc where our views live is named restore, and the view itself is named time. Then we'll make a GET request to this URL:
{CLOUDANT_HOST}/loggy/_design/restore/_view/time?startkey='...'
...where the value for startkey is some timestamp. This, unmodified, will return every version after the indicated timestamp. Add limit=X and you'll get the X versions after the timestamp. Add descending=true and you'll get versions before the timestamp, instead of after.
See the Nth revision for a resource
Much like above, but we'll tweak our view a little:
{
    map: "function(doc){
        for(var i in doc.versions){
            emit(i, doc.versions[i].doc);
        }
    }"
}
Now our view results are keyed by index rather than timestamp. So, instead of passing a timestamp to startkey, we just pass N to get the versions around the Nth revision.
Getting the number of revisions for a collection or resource
We'll use another view to group by collection and resource:
{
    map: "function(doc){
        // split the ID into collection and resource
        var parts = doc._id.split('-');
        // emit them as keys so we can group by them
        emit([parts[0], parts[1]], null);
    }",
    reduce: "_count"
}
Use the query parameters group and group_level to group results by their keys. So, if we want the number of events that have touched resources in the cars collection, we would use a querystring like this:
?group=true&group_level=1&key="cars"
group groups results whose keys are the same, but group_level=1 says "only group on the first key", which in our case is the collection. key specifies to only return documents whose key matches the given value.
Getting all resources for a given collection
Using the _all_docs view, we'll use a querystring like this:
?reduce=false&startkey="{collection}-"&endkey="{collection}0"
Remember the reduce part of our function? That _count value means "return the number of records emitted by map". reduce=false means "Don't do that." Instead, only the map function is run.
That startkey and endkey pair relies on how Cloudant sorts results to exclude everything except the rows whose IDs start with the given collection.
Updating docs
Once you've got the versions you'd like to restore, GET the current version of the resource, GET the past version from the loggy database, and PUT the past version to the resource using the current version's _rev value. Bam, restored. Rinse and repeat for point-in-time restore.
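Here is a rough sketch of that restore step in C# (assuming the resource lives at /cars/ford and its log at /loggy/cars-ford, as in the example above; authentication and error handling are omitted):
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

static class PointInTimeRestore
{
    public static async Task RestoreAsync(HttpClient http, string host)
    {
        // 1. GET the current version of the resource to learn its _rev.
        var current = JObject.Parse(await http.GetStringAsync($"{host}/cars/ford"));
        string currentRev = (string)current["_rev"];

        // 2. GET the logging document and pick the past version to restore
        //    (here, simply the first entry in the versions array).
        var log = JObject.Parse(await http.GetStringAsync($"{host}/loggy/cars-ford"));
        var pastDoc = (JObject)log["versions"][0]["doc"];

        // 3. PUT the past version back, using the current _rev so the update is accepted.
        pastDoc["_rev"] = currentRev;
        var content = new StringContent(pastDoc.ToString(), Encoding.UTF8, "application/json");
        var response = await http.PutAsync($"{host}/cars/ford", content);
        Console.WriteLine(response.StatusCode); // expect Created (201) when the restore succeeds
    }
}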
