Azure Cosmos DB as key-value store: indexing mode

What indexing mode / policy should I use when using cosmos db as a simple key/value store?
From https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy :
None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes.
Is this because the property used as the partition key is indexed even when indexingMode is set to "none"? I would expect to need to turn indexing on but specify just the partition key's path as the only included path.
If it matters, I’m planning to use the SQL API.
EDIT:
here's the information I was missing to understand this:
The item must have an id property; otherwise, Cosmos DB will assign one. https://learn.microsoft.com/en-us/azure/cosmos-db/account-databases-containers-items#properties-of-an-item
Since I'm using Azure Data Factory to load the items, I can tell ADF to duplicate the column that has the value I want to use as my id into a new column called id: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
I need to use ReadItemAsync, or better yet ReadItemStreamAsync (which skips deserializing the response), to get the item without running a query.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemasync?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemstreamasync?view=azure-dotnet
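Putting those pieces together, here's a minimal point-read sketch with the v3 .NET SDK (the container variable, the MyItem type, and the id value are assumptions; the partition key is assumed to be the id itself):

    using Microsoft.Azure.Cosmos;

    // Assumes "container" is an initialized Container whose partition key path is /id.
    string id = "some-known-id"; // hypothetical key

    // Typed point read: deserializes the document into MyItem.
    ItemResponse<MyItem> typed = await container.ReadItemAsync<MyItem>(id, new PartitionKey(id));

    // Stream point read: hands back the raw payload without deserialization.
    using ResponseMessage raw = await container.ReadItemStreamAsync(id, new PartitionKey(id));
    if (raw.IsSuccessStatusCode)
    {
        // raw.Content is the undeserialized document stream.
    }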

When you set indexingMode to "none", the only way to efficiently retrieve a document is by id (e.g. ReadDocumentAsync() or read_item()). This is akin to a key/value store, since you wouldn't be performing queries against other properties; you'd be looking up a document by some known id and returning the entire document. Cost-wise, this would be ~1 RU for a 1 KB document, just like point reads against an indexed collection.
You could still run queries, but without indexes you'll see unusually high RU costs.
You would still specify the partition key's value with your point-reads, as you'd normally do.
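As a sketch, creating such a container with the v3 .NET SDK looks roughly like this (the database variable and names are assumptions; note that Automatic must be false when the mode is None):

    using Microsoft.Azure.Cosmos;

    // Assumes "database" is an initialized Database; container name is hypothetical.
    var props = new ContainerProperties(id: "kvstore", partitionKeyPath: "/id")
    {
        IndexingPolicy = new IndexingPolicy
        {
            IndexingMode = IndexingMode.None, // no secondary indexes at all
            Automatic = false                 // required when IndexingMode is None
        }
    };
    Container container = await database.CreateContainerIfNotExistsAsync(props);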

Related

Can you make an Azure Data Factory data flow for updating data using a foreign key?

I've tried this a few ways and seem to be blocked.
This is nothing more than a daily ETL process. What I'm trying to do is use ADF to pull in a CSV as one of my datasets. With that data I need to update docs in a Cosmos DB container, which is the other dataset in this flow. My data is really simple:
ForeignId string
Value1 int
Value2 int
Value3 int
The Cosmos docs all have these data items and more. ForeignId is unique in the container and is the partition key. The docs are a composite dataset that actually has 3 other id fields that would be considered the PK in the system of origin.
When you try to use a data flow UPDATE with this data, validation complains that you have to map "id" to use UPDATE. I have an id in my document, but it only relates to my collection, not to the old, external systems. I have no choice but to use the ForeignId. I have the flow working with UPSERT but, even though I have the ForeignId mapped between the datasets, I get inserts instead of updates.
Is there something I'm missing, or is ADF not set up to sync data based on anything other than a data item named "id"? Is there another option in ADF aside from the straightforward approach? I've read that you can drop updates into Lookup tasks, but that seems like a hack.
The id is needed by Cosmos DB to know which document to update; it has nothing to do with ADF.
To make this work in ADF, add an Exists transformation in your data flow to see if the row already exists in your collection. Check using the foreign key column in your incoming source data against the existing collection.
If a row is found with that foreign key, you can then add the corresponding id to your column metadata, allowing you to include it in your sink.

How to check if multiple keys exist in an EntityKind without also fetching the data at the same time?

I am using Cloud Firestore in Datastore mode. I have a list of keys of the same Kind, some exist already and some do not. For optimal performance, I want to run a compute-intensive operation only for the keys that do not yet exist. Using the Python client library, I know I can run client.get_multi() which will retrieve the list of keys that exist as needed. The problem is this will also return unneeded Entity data associated with existing keys, increasing the latency and cost of the request.
Is there a better way to check for existence of multiple keys?
You could check whether a key exists using keys-only queries as they return only the keys instead of the entities themselves, at lower latency and cost than retrieving entire entities.
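For consistency with the other examples here, a minimal sketch with the .NET client (Google.Cloud.Datastore.V1); the Python client library exposes the same idea via query.keys_only(). The kind and project names are assumptions:

    using System.Collections.Generic;
    using Google.Cloud.Datastore.V1;

    DatastoreDb db = DatastoreDb.Create("my-project-id"); // hypothetical project

    // A keys-only query is a projection query on the special "__key__" property:
    // it returns keys without entity payloads, at lower latency and cost.
    Query query = new Query("MyKind") { Projection = { "__key__" } };

    var existingKeys = new HashSet<Key>();
    foreach (Entity entity in db.RunQueryLazily(query))
    {
        existingKeys.Add(entity.Key);
    }
    // Diff your candidate keys against existingKeys and run the compute-intensive
    // operation only for the keys that are missing.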

Azure Table Storage data modeling considerations

I have a list of users. A user can log in using either a username or an e-mail address.
As a beginner in Azure Table Storage, this is the data model I came up with for fast keyed lookups:
PartitionKey       RowKey             Property
users:email        jacky@email.com    nickname:jack123
users:username     jack123            email:jacky@email.com
So when a user logs in via e-mail, I would supply PartitionKey eq 'users:email' in the Azure table query; if it is a username, PartitionKey eq 'users:username'.
Since it doesn't seem possible to simulate contains or like in an Azure table query, I'm wondering: is it normal practice to store multiple rows of data for one user?
This is a perfectly valid practice and, in fact, a recommended one. Essentially, you will have to identify the attributes on which you could potentially query your table storage and use them as combinations of PartitionKey and RowKey.
Please see Guidelines for table design for more information. From this link:
Consider storing duplicate copies of entities. Table storage is cheap, so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
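A minimal sketch of the pattern with the Azure.Data.Tables SDK (the connection string, table name, and property names are assumptions):

    using Azure.Data.Tables;

    string connectionString = "UseDevelopmentStorage=true"; // hypothetical
    var table = new TableClient(connectionString, "users");

    // Store the same user twice, keyed once per lookup path.
    await table.AddEntityAsync(new TableEntity("users:email", "jacky@email.com")
    {
        ["Nickname"] = "jack123"
    });
    await table.AddEntityAsync(new TableEntity("users:username", "jack123")
    {
        ["Email"] = "jacky@email.com"
    });

    // Login by e-mail resolves to a point lookup on (PartitionKey, RowKey),
    // the cheapest and fastest query Table storage offers.
    TableEntity byEmail = await table.GetEntityAsync<TableEntity>("users:email", "jacky@email.com");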

Get all the Partition Keys in Azure Cosmos DB collection

I have recently started using Azure Cosmos DB in our project. For the reporting purpose, we need to get all the Partition Keys in the collection. I could not find any suitable API to achieve it.
UPDATE: According to Brian in the comments below, DISTINCT is now supported. Try something like:
SELECT DISTINCT c.partitionKey FROM c
Prior answer: an idea that could work, but for one thing...
The only way to get the actual partition key values is to do a unique aggregate on that field.
You can directly hit the REST endpoint at https://{your endpoint domain}.documents.azure.com/dbs/{your collection's uri fragment}/pkranges to pull back the minInclusive and maxExclusive ranges for each partition, but those are hash-space ranges; I don't know how to convert them back into partition key values, nor how to do a fanout using the actual minInclusive hash.
Also, there is a slim possibility that the pkranges can change between the time you retrieve them and the time you go to do something with them.
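As a sketch with the v3 .NET SDK (the container variable and the property name partitionKey are assumptions; VALUE flattens each result to a plain string):

    using System;
    using Microsoft.Azure.Cosmos;

    // Assumes "container" is an initialized Container.
    var query = new QueryDefinition("SELECT DISTINCT VALUE c.partitionKey FROM c");
    FeedIterator<string> iterator = container.GetItemQueryIterator<string>(query);
    while (iterator.HasMoreResults)
    {
        FeedResponse<string> page = await iterator.ReadNextAsync();
        foreach (string pk in page)
        {
            Console.WriteLine(pk); // one distinct partition key value per row
        }
    }

Note that this is still a cross-partition scan, so its RU cost grows with the size of the container.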

Get value of multiple keys for sorted set using stackexchange.redis

I'm using StackExchange.Redis for my Azure Redis processes. I'm storing my data as sorted sets (ZSETs): each key holds (value, score) pairs.
What I need is to query multiple keys in the same request. I'm using SortedSetRangeByRankWithScores, but it only allows one key per request.
If it's not possible, I'll need to change my whole data structure to simple strings with JSON values; then I might be able to query multiple keys.
Any suggested way of doing that?
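One way to cut the round trips without restructuring the data (a sketch, not from the thread): pipeline one SortedSetRangeByRankWithScores call per key through a batch, so all the commands go out in a single request. The connection string and key names are hypothetical:

    using System.Linq;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(
        "contoso.redis.cache.windows.net:6380,ssl=true,password=..."); // hypothetical
    IDatabase db = muxer.GetDatabase();

    string[] keys = { "scores:user:1", "scores:user:2", "scores:user:3" };

    // CreateBatch pipelines the commands: one network round trip, one reply per key.
    IBatch batch = db.CreateBatch();
    Task<SortedSetEntry[]>[] pending = keys
        .Select(k => batch.SortedSetRangeByRankWithScoresAsync(k, 0, -1))
        .ToArray();
    batch.Execute();

    SortedSetEntry[][] results = await Task.WhenAll(pending);

Each element of results lines up with the key at the same index in keys.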
