Get all the Partition Keys in Azure Cosmos DB collection - azure

I have recently started using Azure Cosmos DB in our project. For reporting purposes, we need to get all the partition key values in a collection. I could not find a suitable API to achieve this.

UPDATE: According to Brian in the comments below, DISTINCT is now supported. Try something like:
SELECT DISTINCT c.partitionKey FROM c
Prior answer: Idea that could work but for one thing...
The only way to get the actual partition key values is to do a unique aggregate on that field.
You can hit the REST endpoint at https://{your endpoint domain}.documents.azure.com/dbs/{your collection's uri fragment}/pkranges directly to pull back the minInclusive and maxExclusive ranges for each partition. However, those are hash-space ranges, and I don't know how to convert them into partition key values, nor how to do a fanout using the actual minInclusive hash.
Also, there is a slim possibility that the pkranges can change between the time you retrieve them and the time you go to do something with them.
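For what it's worth, here is a minimal Python sketch of running the DISTINCT query above through an SDK client. It assumes `container` behaves like the `ContainerProxy` from the azure-cosmos package (only its `query_items` method is used) and that the partition key path is `/partitionKey`, as in the query. The function name is made up for illustration. The client-side de-duplication is a precaution, in case results from different physical partitions come back merged without it.

```python
def distinct_partition_keys(container, pk_property="partitionKey"):
    """Return the distinct partition key values, in first-seen order.

    `container` is assumed to expose query_items(query=..., 
    enable_cross_partition_query=...) like azure-cosmos' ContainerProxy.
    `pk_property` is the property name behind the partition key path.
    """
    # VALUE makes each result a bare scalar instead of {"partitionKey": ...}.
    query = f"SELECT DISTINCT VALUE c.{pk_property} FROM c"
    results = container.query_items(
        query=query,
        enable_cross_partition_query=True,  # the query spans all partitions
    )

    seen = set()
    ordered = []
    for value in results:
        # Partition key values are scalars, so they are hashable.
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered
```

Keep in mind this is still a cross-partition query over the whole collection, so the RU cost grows with the data size.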

Related

Azure Cosmos Db as key value store indexing mode

What indexing mode / policy should I use when using cosmos db as a simple key/value store?
From https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy :
None: Indexing is disabled on the container. This is commonly used when a container is used as a pure key-value store without the need for secondary indexes.
Is this because the property used as partition key is indexed even when indexMode is set to “none”? I would expect to need to turn indexing on but specify just the partition key’s path as the only included path.
If it matters, I’m planning to use the SQL API.
EDIT:
Here's the information I was missing to understand this:
The item must have an id property, otherwise cosmos db will assign one. https://learn.microsoft.com/en-us/azure/cosmos-db/account-databases-containers-items#properties-of-an-item
Since I'm using Azure Data Factory to load the items, I can tell ADF to duplicate the column that has the value I want to use as my id into a new column called id: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
I need to use ReadItemAsync, or better yet, ReadItemStreamAsync since it doesn't deserialize the response, to get the item without using a query.
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemasync?view=azure-dotnet
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.container.readitemstreamasync?view=azure-dotnet
When you set indexingMode to "none", the only way to efficiently retrieve a document is by id (e.g. ReadDocumentAsync() or read_item()). This is akin to a key/value store, since you wouldn't be performing queries against other properties; you'd be specifically looking up a document by some known id, and returning the entire document. Cost-wise, this would be ~1 RU for a 1 KB document, just like point-reads with an indexed collection.
You could still run queries, but without indexes, you'll see an unusually high RU cost.
You would still specify the partition key's value with your point-reads, as you'd normally do.
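As a rough illustration of the point-read pattern, here is a small Python sketch. It assumes `container` behaves like the `ContainerProxy` from the azure-cosmos package, whose `read_item(item=..., partition_key=...)` call is the Python counterpart of ReadItemAsync. The helper name `get_value` and the "partition key defaults to the id" convention are my own assumptions, not something the SDK does for you.

```python
def get_value(container, key, partition_key_value=None):
    """Point-read one document by id, key/value style.

    `container` is assumed to expose read_item(item=..., partition_key=...)
    like azure-cosmos' ContainerProxy. No query is issued, so no index is
    needed for this lookup.
    """
    # Assumption: if the caller passes no partition key value, the container
    # is partitioned on /id, so the key doubles as the partition key value.
    if partition_key_value is None:
        partition_key_value = key
    return container.read_item(item=key, partition_key=partition_key_value)
```

If your container is partitioned on some other property, pass that property's value explicitly as `partition_key_value`.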

Azure Table Storage data modeling considerations

I have a list of users. A user can log in using either a username or an e-mail address.
As a beginner in Azure Table Storage, this is the data model I use to get a fast index scan.
PartitionKey      RowKey             Property
users:email       jacky#email.com    nickname:jack123
users:username    jack123            email:jacky#email.com
So when a user logs in via e-mail, I supply PartitionKey eq users:email in the Azure Table query. If it is a username, I supply PartitionKey eq users:username.
Since it doesn't seem possible to simulate contains or like in an Azure Table query, I'm wondering whether it is normal practice to store multiple rows of data for one user?
This is a perfectly valid practice and in fact is a recommended practice. Essentially you will have to identify the attributes on which you could potentially query your table storage and somehow use them as a combination of PartitionKey and RowKey.
Please see Guidelines for table design for more information. From this link:
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
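To make the duplication concrete, here is a small Python sketch that builds both copies of a user and picks the key pair to look up at login time. The function names, the sample values, and the "contains an @ means e-mail" rule are all assumptions made for illustration. The entities are plain dicts, which is the shape most table clients accept for upserts.

```python
def user_entities(username, email):
    """Build the two copies of one user, one per login identifier."""
    return [
        # Copy 1: found when the user logs in with an e-mail address.
        {"PartitionKey": "users:email", "RowKey": email, "nickname": username},
        # Copy 2: found when the user logs in with a username.
        {"PartitionKey": "users:username", "RowKey": username, "email": email},
    ]


def lookup_key(login):
    """Return the (PartitionKey, RowKey) pair for a point lookup.

    Assumption: anything containing "@" is treated as an e-mail address.
    """
    kind = "email" if "@" in login else "username"
    return (f"users:{kind}", login)


entities = user_entities("jack123", "jacky@example.com")
print(lookup_key("jacky@example.com"))
print(lookup_key("jack123"))
```

Both copies have to be written and updated together, so a change to one identifier means rewriting the other entity as well.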

Get value of multiple keys for sorted set using stackexchange.redis

I'm using StackExchange.Redis for my Azure Redis processes. I'm storing my data as a sorted set (zset), laid out as key -> (value, score).
What I need is to query multiple keys in a single request. I'm using "SortedSetRangeByRankWithScores", but it only allows one key per request.
If that's not possible, I'll need to change my whole data structure to simple strings with JSON values. Then I may be able to query multiple keys.
Is there a suggested way of doing that?

Setting RowKey in Azure Table Storage

I am storing some logs in Azure Table Storage. I've identified the PartitionKey I should use. However, I'm having trouble determining what I should use for the RowKey. If I were using SQL Server, I would use an auto-incrementing integer. From what I can tell, having an auto-generated RowKey is not an option with Azure Table Storage. I'm fine using a GUID; however, everyone seems to warn against using a GUID, and I'm not sure what I should be using instead.
Can anyone provide me a pointer for what I should use as the RowKey for storing log data? I've seen the following syntax (RowKey: {'_': '1'}), as shown below, but can't find out what it means:
var task = {
    PartitionKey: {'_': 'hometasks'},
    RowKey: {'_': '1'}
};
Thanks!
There are many approaches you can take. One such approach would be to store date/time value in ticks as RowKey. This would help you in fetching logs data for a particular time range. Just remember that since RowKey is of String data type, you may want to pre-pad it with zeros so that all values are of same length. For example,
DateTime.UtcNow.Ticks.ToString("d20")
With this, you could take 2 approaches:
1. Store them in chronological order, as shown in the example above.
2. Store them in reverse chronological order. The advantage of this approach is that the latest entries will always be added at the top of the table. So you could just query the table on PartitionKey, get the top 'x' rows, and they will be the latest. You would do something like:
(DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d20")
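If you are not on .NET, the same two key formats can be reproduced by hand. Below is a Python sketch, assuming naive datetimes that represent UTC. The helper names are hypothetical. The constant is the value of DateTime.MaxValue.Ticks, and a tick is 100 nanoseconds counted from 0001-01-01, so Python's microsecond resolution maps to multiples of 10 ticks.

```python
from datetime import datetime

_DOTNET_EPOCH = datetime(1, 1, 1)          # tick 0 in .NET
_MAX_TICKS = 3_155_378_975_999_999_999     # DateTime.MaxValue.Ticks


def to_ticks(moment):
    """Convert a naive UTC datetime to .NET ticks (100 ns units)."""
    delta = moment - _DOTNET_EPOCH
    # Integer arithmetic only, to avoid float rounding on large values.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def chronological_row_key(moment):
    """Zero-padded to 20 digits so string order matches time order."""
    return f"{to_ticks(moment):020d}"


def reverse_row_key(moment):
    """Newer moments produce smaller keys, so they sort first."""
    return f"{_MAX_TICKS - to_ticks(moment):020d}"


older = datetime(2020, 1, 1)
newer = datetime(2021, 1, 1)
print(chronological_row_key(older) < chronological_row_key(newer))  # True
print(reverse_row_key(newer) < reverse_row_key(older))              # True
```

Two log entries written within the same tick would collide on RowKey, so you may want to append a short unique suffix if that can happen in your workload.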
You also mentioned in your comment that "I expect the data sets to be quite large". I hope you are not using a single PartitionKey, because if the number of records is large and all of them are put in the same partition, performance might suffer.

Query Azure table storage by list of partition and row key pairs

I have a list of known partition and row key pairs from the same table (e.g. P1R1, P2R2, P3R3, where P is the PartitionKey and R is the RowKey). Does anyone know how to query Azure Table Storage to get these 3 entities in one request?
I don't think there's an option besides just spelling it out in the Where/Filter clause explicitly.
You can do that, but you don't want to. It will result in a table scan and be extremely slow. You'd be much better served by firing off each request separately.
If you're dead set on doing it with one query, comment here and I'll get you the code. I'm on my iPad right now so I don't have it handy.
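For illustration, both options discussed above can be sketched in Python. This is not the code promised in the answer, just one possible shape. `build_pairs_filter` spells out the pairs as an OData filter string (single quotes are escaped by doubling them), and `fetch_separately` fires one point lookup per pair in parallel. Both function names are made up, and `get_entity` is a caller-supplied function taking `(partition_key, row_key)`, for example a thin wrapper around a table client's point-lookup method.

```python
from concurrent.futures import ThreadPoolExecutor


def _quote(value):
    """Wrap a string as an OData literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_pairs_filter(pairs):
    """Build one filter that matches any of the (PartitionKey, RowKey) pairs."""
    clauses = [
        f"(PartitionKey eq {_quote(pk)} and RowKey eq {_quote(rk)})"
        for pk, rk in pairs
    ]
    return " or ".join(clauses)


def fetch_separately(get_entity, pairs, max_workers=8):
    """Run one point lookup per pair in parallel, keeping the input order.

    `get_entity` is a hypothetical caller-supplied callable that takes
    (partition_key, row_key) and returns the entity.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: get_entity(pair[0], pair[1]), pairs))


pairs = [("P1", "R1"), ("P2", "R2"), ("P3", "R3")]
print(build_pairs_filter(pairs))
```

Given the table-scan concern raised above, the parallel point lookups are probably the safer default; the combined filter is mainly useful when a single request really is a hard requirement.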

Resources