Write-heavy partition key strategy for Azure Cosmos DB

We’re using CosmosDB in production to store HTTP request/response audit data. The structure of this data generally looks as follows:
{
"id": "5ff4c51d3a7a47c0b5697520ae024769",
"Timestamp": "2019-06-27T10:08:03.2123924+00:00",
"Source": "Microservice",
"Origin": "Client",
"User": "SOME-USER",
"Uri": "GET /some/url",
"NormalizedUri": "GET /SOME/URL",
"UserAgent": "okhttp/3.10.0",
"Client": "0.XX.0-ssffgg;8.1.0;samsung;SM-G390F",
"ClientAppVersion": "XX-ssffgg",
"ClientAndroidVersion": "8.1.0",
"ClientManufacturer": "samsung",
"ClientModel": "SM-G390F",
"ResponseCode": "OK",
"TrackingId": "739f22d01987470591556468213651e9",
"Response": "[ REDACTED ]", <-- Usually quite long (thousands of chars)
"PartitionKey": 45,
"InstanceVersion": 1,
"_rid": "TIFzALOuulIEAAAAAACACA==",
"_self": "dbs/TIFzAA==/colls/TIFzALOuulI=/docs/TIFzALOuulIEAAAAAACACA==/",
"_etag": "\"0d00c779-0000-0d00-0000-5d1495830000\"",
"_attachments": "attachments/",
"_ts": 1561630083
}
We’re currently writing around 150,000 - 200,000 documents like the one above each day, with /PartitionKey as the partition key path configured on the container. The value of PartitionKey is a random number between 0 and 999, generated in C#.
However, we are seeing daily hotspots where a single physical partition peaks at 2.5K - 4.5K RU/s while the others stay very low (around 200 RU/s). This has a knock-on effect on cost, as we need to provision throughput for our most heavily utilised partition.
The second factor is that we're storing a fair bit of data, close to 1TB of documents, and we add a few GB each day. As a result we currently have around 40 physical partitions.
Combining these two factors means we end up having to provision for at minimum somewhere between 120,000 - 184,000 RU/s.
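The arithmetic behind that estimate can be sketched roughly as follows. It assumes provisioned throughput is split evenly across physical partitions, so the container has to be provisioned for the hottest partition multiplied by the partition count. The function name is illustrative, and the inputs are the rounded figures quoted above, so the result only approximates the 120,000 - 184,000 RU/s range.

```python
def required_container_rus(physical_partitions: int, hottest_partition_rus: int) -> int:
    """RU/s to provision so that the hottest partition is not throttled.

    Assumption: throughput is divided evenly across physical partitions.
    """
    return physical_partitions * hottest_partition_rus


# Rounded figures from the question: ~40 partitions, 2.5K - 4.5K RU/s peaks.
low = required_container_rus(40, 2_500)
high = required_container_rus(40, 4_500)
print(low, high)
```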
I should mention that we barely ever need to query this data, apart from very occasional ad-hoc, manually constructed queries in the Cosmos Data Explorer.
My question is... would we be a lot better off in terms of RU/s required and distribution of data by simply using the “id” column as our partition key (or a randomly generated GUID) - and then setting a sensible TTL so we don't have a continually growing dataset?
I understand this would require us to re-create the collection.
Thanks very much.

While using the id or a GUID would give you better cardinality than the random number you have today, any query you run would be very expensive as it would always be cross-partition and over a huge amount of data.
I think a better choice would be a synthetic key that combines multiple properties that both have high cardinality and are used to query the data. You can learn more about these here: https://learn.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys
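As a minimal sketch of what such a synthetic key could look like for the audit document in the question: the choice of "User" plus the calendar day taken from "Timestamp" is purely an assumption for illustration, and the right properties depend on how the data is actually queried.

```python
def synthetic_partition_key(doc: dict) -> str:
    """Combine two properties into one partition key value.

    Assumption: 'User' and the day part of 'Timestamp' are the properties
    that queries filter on; swap in whatever your queries really use.
    """
    day = doc["Timestamp"][:10]  # ISO 8601 date prefix, e.g. "2019-06-27"
    return f'{doc["User"]}_{day}'


example = {"User": "SOME-USER", "Timestamp": "2019-06-27T10:08:03.2123924+00:00"}
print(synthetic_partition_key(example))
```

The value would be written to the document (for example into the existing PartitionKey property) before the insert, so a query that knows the user and the day can target a single logical partition.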
As for TTL, I would definitely set it to whatever retention you need for this data. Cosmos DB deletes expired items using unused throughput, so TTL will never get in the way of your workload.
Lastly, you should also consider (if you haven't already) using a custom indexing policy that excludes any paths you never query on, especially the "Response" property, since you say it is thousands of characters long. This can save considerable RU/s in write-heavy scenarios like yours.
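A rough sketch of the two settings mentioned above, expressed as plain Python data. The indexing policy follows the JSON shape Cosmos DB uses for container indexing policies. The list of excluded properties and the 90-day retention are assumptions for illustration only.

```python
import json


def indexing_policy_excluding(unqueried_properties):
    """Index everything except the given top-level properties."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": f"/{name}/?"} for name in unqueried_properties],
    }


def ttl_seconds(retention_days: int) -> int:
    """Container default TTL is expressed in seconds."""
    return retention_days * 24 * 60 * 60


# Assumed: these properties are never used in query predicates.
policy = indexing_policy_excluding(["Response", "UserAgent", "Client"])
print(json.dumps(policy, indent=2))
print(ttl_seconds(90))  # assumed 90-day retention
```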

In my experience, Cosmos DB tends to degrade as data accumulates. More data means more physical partitions, and the provisioned throughput is split across all of them, so you need more total throughput to give each partition enough. We are now archiving old data into blob storage to avoid this kind of problem and keep the number of physical partitions stable. We use Cosmos DB as hot storage, and old data goes to blob storage as cold storage. This lets us reduce the RU/s we provision, and we save money.

Related

Azure Cosmos DB: Cross-Partition Queries vs In-Partition Queries

We have a cosmos-db container which has about 1M records containing information about customers. The partition key for the documentDb is customerId which holds a unique GUID reference for the customer. I have read the partitioning and scaling document which would suggest that our choice of key appears appropriate, however if we want to query this data using a field such as DOB or Address, the query will be considered as a cross-partition query and will essentially send the same query to every record in the documentDb before returning.
The query stats in Data Explorer suggest that a query on customer address will return the first 200 documents at a cost of 36.9 RUs, but I was under the impression that this would be far higher given the number of records the query would be sent to. Are these query stats accurate?
It is likely that we will want to extend our app to query on multiple non-partition data elements, so would we be best off replicating the customer identity and the searchable data element within another documentDb, using the desired searchable data element as the partition key? We could then return the identities of all customers who match the query. This essentially changes the query to an in-partition query and should prevent additional expenditure?
Our current production database has a 4000 (Max Throughput)(Shared) so there appears to be adequate provision for cross-partition queries so would I be wasting my time building out a change-feed to maintain a partitioned representation of the data to support in-partition queries over cross-partition queries?

To get an accurate estimate of query cost, you need to measure on a container that holds a realistic amount of data. For example, if I have a container with 5,000 RU/s and 5GB of data, my cross-partition query will be fairly inexpensive because it only runs on a single physical partition.
If I ran that same query on a container with 100,000 RU/s, I would have at least 10 physical partitions, and the query would report a much higher RU charge because it has to execute across all of them. (Note: one physical partition has a maximum of 10,000 RU/s or 50GB of storage.)
It is impossible to say at what amount of RU/s and storage you will begin to get a more realistic number for RU charges. I also don't know how much throughput or storage you need. If the workload is small, then maybe you only need 10K RU/s and < 50GB of storage. It's only if you need to go beyond that that you should first scale the container out and then measure your query's RU charge.
To get accurate query measurements, you need to have a container with the throughput and amount of data you would expect to have in production.
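Using the per-partition limits quoted in this answer (10,000 RU/s and 50GB), a lower bound on the number of physical partitions can be sketched like this. It is only an estimate under those stated limits: the service decides the actual layout, and the function name is illustrative.

```python
import math

MAX_RUS_PER_PARTITION = 10_000  # limit quoted in the answer above
MAX_GB_PER_PARTITION = 50       # limit quoted in the answer above


def min_physical_partitions(provisioned_rus: int, stored_gb: float) -> int:
    """Smallest partition count that satisfies both limits."""
    by_throughput = math.ceil(provisioned_rus / MAX_RUS_PER_PARTITION)
    by_storage = math.ceil(stored_gb / MAX_GB_PER_PARTITION)
    return max(1, by_throughput, by_storage)


print(min_physical_partitions(5_000, 5))    # the small container in the example
print(min_physical_partitions(100_000, 5))  # the scaled-out container
```

A cross-partition query on the second container fans out to every one of those partitions, which is why its RU charge is not comparable to a measurement taken on the first.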

You don't necessarily need to be afraid of cross-partition queries in CosmosDB. Yes, single-partition queries are faster, but if you need to "find any customers matching X", then a cross-partition query is naturally required (unless you really want the hassle of duplicating the info elsewhere in optimized form).
The cross-partition query will not be sent to "each document" as long as you have good indexes in the partitions. Just make sure every query has a predicate on a field that is:
indexed
with good-enough data cardinality
... and that the number of returned docs is limited, either by the business model or by force (TOP N). This way your RU cost should be more or less bounded from above.
36 RU per 200 returned docs does not sound too bad as long as it's not done too many times per second. But if in doubt, test with the predicted data volume and fire off some realistic queries.
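The advice above (an equality predicate on an indexed field plus a forced TOP N) could be captured in a small helper like the following sketch. The helper name is hypothetical and "Address" is taken from the question. The returned dictionary uses the query-plus-parameters shape that Cosmos DB SQL query requests take, but nothing here talks to a real account.

```python
def bounded_query(field: str, value, limit: int = 200) -> dict:
    """Build a parameterized query with a hard cap on returned documents."""
    if not field.isidentifier():
        # Field names are interpolated into the query text, so keep them simple.
        raise ValueError(f"unexpected field name: {field!r}")
    return {
        "query": f"SELECT TOP {int(limit)} * FROM c WHERE c.{field} = @value",
        "parameters": [{"name": "@value", "value": value}],
    }


spec = bounded_query("Address", "1 Example Street", limit=200)
print(spec["query"])
```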

Azure cosmos DB partition key design selection

Selecting a partition key is a simple but important design choice in Azure Cosmos DB, both for performance and for cost (RUs). Azure Cosmos DB does not allow us to change the partition key of an existing container, so it is very important to select the right one.
I have gone through the Microsoft documentation,
but I am still unsure how to choose a partition key.
Below is the item structure I am planning to create:
{
"id": "unique id like UUID", # just to keep some unique ID for item
"file_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/nvidia.mp4", # This value some times contains special symbols like spaces, dollars, caps and many more
"createatedby": "andrew",
"ts": "2022-01-10 16:07:25.773000",
"directory_location": "/videos/news/finance/category/sharemarket/it-sectors/semiconductors/",
"metadata": [
{
"codec": "apple",
"date_created": "2020-07-23 05:42:37",
"date_modified": "2020-07-23 05:42:37",
"format": "mp4",
"internet_media_type": "video/mp4",
"size": "1286011"
}
],
"version_id": "48ad8200-7231-11ec-abda-34519746721"
}
I am using the Azure Cosmos DB SQL API. By default, Azure Cosmos DB takes care of indexing all data, so in the above case all properties are indexed.
For reading items I use the file_location property. Can I make file_location the primary key, or is there anything else to consider?
A few notes:
file_location values contain special characters like spaces, commas, dollar signs and many more.
A few containers contain 150 million entries and a few contain just 20 million.
My operations are:
mostly reads, frequent writes as new videos are added, and fewer updates when videos change.

A few things to keep in mind while selecting partition keys:
Look at the query parameters you use when reading data; they give you good hints as to what the partition key candidates are.
You mentioned that a few containers contain 150 million documents and a few contain 20 million. What matters is not the number of documents stored in a container but which containers receive the most requests. If a few containers are getting too many requests, that is a good indicator of poorly designed partition keys.
Try to distribute the request load as evenly as possible among containers so that it is spread evenly among the physical partitions. Otherwise, you will get hot-partition issues and will have to work around them by increasing throughput, which will cost you more.
Try to limit cross-partition queries as much as possible.

CosmosDb search on the Index vs partition Key

By default in CosmosDb, all properties in documents are indexed, so why should I bother researching the partition key when searches on the index work perfectly well and cost next to nothing?
I have a CosmosDb container with one million documents like this one, each of which contains an array. The partition key is "tankId", e.g.:
{
"id": "67acdb16-80dd-4a6c-a5b0-118d5f5fdb97",
"tankId": "67acdb16-80dd-4a6c-a5b0-118d5f5fdb97",
"UserIds": [
"905336a5-bf96-444f-bb11-3eedb65c3760",
"432270f5-780f-401b-9772-72ec96166be1",
"cfecdf7e-5067-46b1-ab4e-25ca7d597248"
]
}
If I run a query on "UserIds" over these million documents, which is an indexed property but not the partition key, it takes only 3.32 RU! Wow.
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.UserIds, "905336a5-bf96-444f-bb11-3eedb65c3760")
Is it good practice to run that kind of query? I am a little worried about my design.

It starts to matter once your number of physical partitions starts growing. Using the partition key allows Cosmos to map the query to a logical partition that resides in a single physical partition. The query is then not a so-called 'cross-partition query', and it won't have to check the indexes of the other physical partitions (which would also consume RUs).
In your case you are talking about a million documents, which likely use a lot less than 50GB of data (the max size of a physical partition), so it's all stored in the same physical partition. Therefore you won't see any noticeable effect on RU usage.
So, to answer your underlying question of whether you should make any changes: Is your database mostly read-heavy? Do you have a property that is often used for querying? Are you sure that your partitions will remain under the logical partition size limit (20GB)? If yes, then you should consider using that property as the partition key in your design. Even then it will only matter once your data starts to split across physical partitions.

Cosmos DB partition key and query design for sequential access

We would like to store a set of documents in Cosmos DB with a primary key of EventId. These records are evenly distributed across a number of customers. Clients need to access the latest records for a subset of customers as new documents are added. The documents are immutable, and need to be stored indefinitely.
How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?
If we use just CustomerId as the partition key, we would eventually run over the 10GB limit for a logical partition, and if we use EventId, then querying becomes inefficient (would result in a cross-partition query, and high RU usage, which we'd like to avoid).
Another idea would be to group documents into blocks. i.e. PartitionKey = int(EventId / PartitionSize). This would result in all clients hitting the latest partition(s), which presumably would result in poor performance and throttling.
If we use a combined PartitionKey of CustomerId and int(EventId / PartitionSize), then it's not clear to me how we would avoid a cross-partition query to retrieve the correct set of documents.
Edit:
Clarification of a couple of points:
Clients will access the events by specifying a list of CustomerId's, the last EventId they received, and a maximum number of records to retrieve.
For this reason, the use of EventId alone won't perform well, as it will result in a cross partition query (i.e. WHERE EventId > LastEventId).
The system will probably be writing on the order of 1GB a day, in 15 minute increments.
It's hard to know what the read volume will be, but I'd guess probably moderate, with maybe a few thousand clients polling the API at regular intervals.

First things first: the logical partition size limit has now been increased to 20GB.
You can use EventId as the partition key as well: there is a limit on the size of a logical partition in GB, but no limit on the number of logical partitions. So using EventId is fine, and you will get a point read, which is very fast, if you query by EventId. You mention that this way you will have to do cross-partition queries; can you explain how?
A few things to keep in mind, though. Cosmos DB is not really meant for storing this kind of log-based data, as it stores everything on SSDs, so please calculate the size of one document, how many you would have to store per second, and from that how much per day and per month. You can use TTL to delete data from Cosmos DB once you are done with it, store it in Azure Blob Storage for the long term, and for fast retrieval use Azure Search to query the data in Blob Storage by CustomerId and EventId.

How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?
I faced a similar issue some time back, and a PartitionKey of customerId + datekey, e.g. cust1_20200920, worked well for me.
I created the date key as 20200920 (YYYYMMDD), but you can choose to drop the day or even the month (cust1_202009 / cust1_2020), based on your query requirements.
Also, IMO, having multiple known PartitionKeys at query time is kind of a good thing. For example, if you keep YYYYMM in the PartitionKey and want to get data for 4 months, you can run 4 queries in parallel and combine the data. This is faster if you have many clients and these partition keys are distributed among multiple physical partitions.
On a separate note, Cosmos DB has recently introduced an analytical store for transactional data, which can be useful for your use case.
More about it here - https://learn.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction
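The customerId + date key scheme described in this answer can be sketched as follows. The function names are illustrative. The second helper lists the monthly partition keys covering a date range, which is the set of keys one would query in parallel.

```python
from datetime import date


def daily_partition_key(customer_id: str, day: date) -> str:
    """customerId + YYYYMMDD, e.g. cust1_20200920."""
    return f"{customer_id}_{day:%Y%m%d}"


def monthly_keys_for_range(customer_id: str, start: date, end: date) -> list:
    """All customerId_YYYYMM keys from start's month through end's month."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{customer_id}_{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


print(daily_partition_key("cust1", date(2020, 9, 20)))
print(monthly_keys_for_range("cust1", date(2020, 6, 1), date(2020, 9, 20)))
```

Each key in the returned list identifies one logical partition, so the four months in the example translate into four single-partition queries whose results are merged by the caller.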

One approach is using multiple Cosmos containers as "hot/cold" tiers with different partitioning. We could use two containers:
Recent: all writes and all queries for recent items go here. Partitioned by CustomerId.
Archive: all items are copied here for long term storage and access. Partitioned by CustomerId + timespan (e.g. partition per calendar month)
The Recent container would provide single partition queries by customer. Data growth per partition would be limited either by setting reasonable TTL during creation, or using a separate maintenance job (perhaps Azure Function on timer) to delete items when they are no longer candidates for recent-item queries.
A Change Feed processor, implemented by an Azure Function or otherwise, would trigger on each creation in Recent and make a copy into Archive. This copy would have partition key combining the customer ID and date range as appropriate to limit the partition size.
This scheme should provide efficient recent-item queries from Recent and safe long-term storage in Archive, with reasonable Archive query efficiency given a desired date range. The main downside is two writes for each item (one for each container) -- but that's the tradeoff for efficient polling. Whether this tradeoff is worthwhile is probably best determined by simulating the load and observing performance.
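The copy step of the Recent/Archive scheme might look like the sketch below. It deliberately leaves out the Change Feed and Cosmos DB client wiring: `upsert_into_archive` stands in for whatever writes to the Archive container, and the property names ("CustomerId", "Timestamp", "ArchiveKey") are assumptions for illustration.

```python
def archive_partition_key(customer_id: str, timestamp_iso: str) -> str:
    """Customer ID plus calendar month, e.g. c1_2020-09."""
    return f"{customer_id}_{timestamp_iso[:7]}"


def handle_changes(changed_docs, upsert_into_archive) -> None:
    """Copy each changed document into the archive with its new partition key."""
    for doc in changed_docs:
        archived = dict(doc)  # shallow copy so the source batch is not mutated
        archived["ArchiveKey"] = archive_partition_key(
            doc["CustomerId"], doc["Timestamp"]
        )
        upsert_into_archive(archived)


# Usage with an in-memory list standing in for the Archive container.
archive = []
batch = [{"id": "1", "CustomerId": "c1", "Timestamp": "2020-09-20T10:00:00Z"}]
handle_changes(batch, archive.append)
print(archive[0]["ArchiveKey"])
```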

CosmosDB - Querying Across All Partitions

I'm creating a logging system to monitor our (200 ish) main application installations, and Cosmos db seems like a good fit due to the amount of data we'll collect, and to allow a varying schema for the log data (particularly the Tags array - see document schema below).
But, never having used CosmosDb before I'm slightly unsure of what to use for my partition key.
If I partitioned by CustomerId, there would likely be several GB of data in each of the 200 partitions, and the data will usually be queried by CustomerId, so this was my first choice for the partition key.
However I was planning to have a 'log stream' view in the logging system, showing logs coming in for all customers.
Would this lead to running a horribly slow / expensive cross partition query?
If so, is there an obvious way to avoid / limit the cost & speed implications of this cross partition querying? (Other than just taking out the log stream view for all customers!)
{
"CustomerId": "be806507-7cc4-4db4-881b",
"CustomerName": "Our Customer",
"SystemArea": 1,
"SystemAreaName": "ExchangeSync",
"Message": "Updated OK",
"Details": "",
"LogLevel": 2,
"Timestamp": "2018-11-23T10:59:29.7548888+00:00",
"Tags": {
"appointmentId": "109654",
"appointmentGroupId": "86675",
"exchangeId": "AAMkA",
"exchangeAlias": "customer.name#customer.com"
}
}
(Note - There isn't a defined list of SystemArea types we'll use yet, but it would be a lot fewer than the 200 customers)

Cross-partition queries should be avoided as much as possible. If your queries will usually filter on the customer ID, then CustomerId is a good logical partition key. However, you have to keep in mind that there is a size limit per logical partition (10GB when this was written, since raised to 20GB).
A cross-partition query across the whole database will be a very slow and very expensive operation, but if it isn't critical functionality and is only used for infrequent reporting, it's not too much of a problem.
