I'm looking into moving to the new partitioned collections for DocumentDB and have a few questions that the documentation and pricing calculator seem to be a little unclear on.
PRICING:
In the scenario below, my partitioned collection would be charged $30.02/mo at 1GB of data with a constant hourly RU use of 500:
So does this mean that if my users only hit the data at an average of 500 RUs for about 12 hours per day, meaning HALF the time my collection goes UNUSED but is still RUNNING and AVAILABLE (not shut down), the price goes down to $15.13/mo as the calculator indicates here:
Or will I be billed the full $30.02/mo since my collection was up and running?
I get confused when I go to the portal and see an estimate of $606/mo with no details behind it when I attempt to spin up the lowest options on a partitioned collection:
Is the portal just indicating the MAXIMUM that I COULD be billed that month if I use my full allotted 10,100 RUs every second for 744 consecutive hours?
If billing is based on hourly use and the average RUs used drops to 100 during some of the hours in the second scenario, does the cost go down even further? Does Azure billing for partitioned collections fluctuate based on hourly usage rather than total uptime like the existing S1/S2/S3 tiers?
If so, how does the system determine what is billed for that hour? If for most of the hour the RUs used are 100/sec but for a few seconds it spikes to 1,000, does usage average out across the entire hour so I'm only charged for something like 200-300 RUs for that hour, or will I be billed for the highest RUs used that hour?
PERFORMANCE:
Will I see a performance hit by moving to this scenario since my data will be on separate partitions and require partition id/key to access? If so what can I expect, or will it be so minimal that it would be undetected by my users?
RETRIES & FAULT HANDLING:
I'm assuming the TransientFaultHandling NuGet package I use in my current scenario will still work in the new scenario, though it may not be exercised as much since my RU capacity is much larger. Or do I need to rethink how I handle requests that go over the RU cap?
So the way that pricing works for Azure DocumentDB is that you pay to reserve a certain amount of data storage (in GB) and/or throughput (in Request Units, RUs). These charges apply per hour that the reservation is in place and do not require any usage. Additionally, just having a DocumentDB account active is deemed to be an active S1 subscription until a DocumentDB gets created, at which point the pricing of your db takes over. There are two options available:
Option 1 (Original Pricing)
You can choose between S1, S2, or S3. Each offers the same 10GB of storage but varies in throughput: 250 RU / 1,000 RU / 2,500 RU.
Option 2 (User-defined performance)
This is the new pricing structure, which better decouples size and throughput. This option additionally provides for partitioning. Note that with user-defined performance you are charged per GB of data storage used (pay-as-you-go storage).
With user-defined performance levels, storage is metered based on consumption, but with pre-defined performance levels, 10 GB of storage is reserved at the time of collection creation.
Single Partition Collection
The minimum is set at 400RU and 1GB of data storage.
The maximum is set at 10,000RU and 250GB of data storage.
Partitioned Collections
The minimum is set at 10,000RU and 1GB of data storage.
The maximum is set at 250,000RU and 250GB of data storage (EDIT: you can request more).
So at a minimum you will be paying the hourly cost for the option you selected. The only way to not pay for an hour would be to delete the db and the account, unfortunately.
Cost of Varying RU
If you vary your RU within a single hour, you are charged for that hour at the cost of the peak reserved RU you requested. So if you were at 400 RU and you bump it up to 1,000 RU for 1 second, you will be charged at the 1,000 RU rate for that whole hour, even if for the other 59 minutes 59 seconds you set it back to 400 RU.
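To make the hourly math concrete, here is a small back-of-the-envelope sketch in Python. It assumes the ~$6 per 100 RU/s per month rate mentioned later in this thread and a 744-hour month; actual rates vary by region, so treat the numbers as illustrative only.

```python
# Rough sketch of per-hour RU billing: the peak reserved RU/s set during
# the hour is what gets billed. Rate and month length are assumptions.
RATE_PER_100_RU_PER_MONTH = 6.0   # assumed ~$6 per 100 RU/s per month
HOURS_PER_MONTH = 744             # 31-day month

def hourly_cost(peak_ru_in_hour: float) -> float:
    """Cost of one hour, billed at the highest reserved RU/s set during that hour."""
    return (peak_ru_in_hour / 100) * RATE_PER_100_RU_PER_MONTH / HOURS_PER_MONTH

# 400 RU all hour vs. 400 RU with a 1-second spike to 1,000 RU:
print(f"steady 400 RU hour:            ${hourly_cost(400):.4f}")
print(f"hour with a spike to 1,000 RU: ${hourly_cost(1000):.4f}")  # whole hour billed at the peak

# Sanity check against the question: a constant 500 RU for a full month is roughly $30.
print(f"constant 500 RU month:         ${hourly_cost(500) * HOURS_PER_MONTH:.2f}")
```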
Will I see a performance hit by moving to this scenario since my data will be on separate partitions and require partition id/key to access?
On the topic of a performance hit, there are a few things to think about, but in general, no.
If you have a sane partition key with enough values, you should not see a performance penalty. This means you need to partition your data so that the partition key is available when querying, and you need to keep the data you want from a query in the same partition by giving it the same partition key.
If you run queries without a partition key, you will see a severe penalty, as the query has to be parsed and executed per partition.
One thing to keep in mind when selecting a partition key is the limit for each partition, which is 10GB and 10K RU. This means you want an even distribution over the partitions in order to avoid a "hot" partition; even if you scale to more than enough RU in total, you may receive 429 (rate-limited) responses for a specific partition.
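As a minimal illustration of partition-key-aware access, here is a sketch using the Python azure-cosmos SDK (v4); the account URL, key, database, container, and partition key values are placeholders, and the same idea applies to the .NET SDK used in the question.

```python
from azure.cosmos import CosmosClient

# Placeholder connection details
URL, KEY = "https://<account>.documents.azure.com:443/", "<primary-key>"

client = CosmosClient(URL, credential=KEY)
container = client.get_database_client("mydb").get_container_client("orders")

# Point read: cheapest possible access, needs both the id and the partition key.
order = container.read_item(item="order-1", partition_key="customer-42")

# Query scoped to a single partition: served by one partition only.
open_orders = list(container.query_items(
    query="SELECT * FROM c WHERE c.status = 'open'",
    partition_key="customer-42",
))

# Cross-partition query: fanned out to every partition (the severe penalty
# described above), so use it sparingly.
all_open_orders = list(container.query_items(
    query="SELECT * FROM c WHERE c.status = 'open'",
    enable_cross_partition_query=True,
))
```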
Related
We have a Function App that inserts around 8k documents into a Cosmos DB every 6 minutes. Currently we have Cosmos set to autoscale, but since our RU usage is very predictable, I have the feeling we could save some money, because it's quite expensive.
I found out it's possible to set the throughput to manual, and according to this article I could decrease/increase the RUs with a timer. But now I'm wondering if it's a good idea, because our time interval is small, and even if I time the Function App correctly (error prone?) there are maybe only 3 minutes during which I can decrease the throughput. Another thing is that manual throughput costs 50% less per RU.
What do you think: is it worth implementing a timer-triggered Function App that increases/decreases the throughput, or is it not a good idea because it's error prone, etc.? Do you have any experience with this?
The timer with manual throughput will likely save you money because throughput is billed as the highest amount of RU/s per hour. Since your workload needs to scale up every 6 minutes your cost is the highest RU/s during that hour. Given that autoscale is 50% more expensive, you'd save by manually scaling up and down.
However, if you were able to stream this data into Cosmos rather than batch it, you would save even more. Throughput is measured per second. The more you are able to amortize your usage over a greater period of time, the less throughput you need at any given point in time. So if you were able to use, say, a message queue to do load-leveling in front of Cosmos and stream the changes in, you would have better throughput utilization overall and thus lower total cost. Of course you'd need to evaluate the cost of running a message queue to do this, but in general, streaming is more cost-effective than batching.
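If you go the manual-throughput route, the scale-up/scale-down step itself is small. Here is a rough sketch with the Python azure-cosmos SDK; the RU values and names are assumptions for illustration, and the timer-trigger wiring of the Function is left out. Keep in mind that since billing is per hour at the highest RU/s set, scaling down mid-hour won't reduce that hour's charge.

```python
from azure.cosmos import CosmosClient

URL, KEY = "https://<account>.documents.azure.com:443/", "<primary-key>"
HIGH_RU = 4000   # assumed value: enough headroom for the 6-minute batch insert
LOW_RU = 400     # assumed idle baseline

def set_throughput(ru: int) -> None:
    """Set manual throughput on the container (not valid for autoscale containers)."""
    client = CosmosClient(URL, credential=KEY)
    container = client.get_database_client("mydb").get_container_client("documents")
    container.replace_throughput(ru)

# In a timer-triggered Function you would call these on a schedule:
# set_throughput(HIGH_RU)   # shortly before the batch insert starts
# set_throughput(LOW_RU)    # once the batch has finished
```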
I'm trying to provision Azure SQL DWH, mostly the Gen 2 type, but I'm not sure about the DWU that I need to set.
After analyzing the source systems, on average the DWH can expect nearly 1.5 million records per day, inserted/updated across different sets of tables.
Given the number of records, is it possible to ascertain the DWU that needs to be set at the DWH level?
Please advise.
While the volume of inserts is a useful number, it is more important to know the volume of data that will be frequently queried. We call this the active data set.
Let's say that you have 10TB of data. If most of your queries address that whole 10TB, then your active data set is 10TB. However, if most of your queries only deal with 10% of your data, your active data set is 1TB.
Some general guideline examples for DWUc by active data set:
1TB: 500c
3TB: 1000c
10TB: 3000c
That said, in my experience the 1TB/500c recommendation is a little small. That is because you're still working on a single node at less than 1000c, and your number of concurrent queries is limited to 20, with 20 concurrency slots. I like to see customers start at 1000c, and only use a lower DWU for dev/test, or during very quiet periods when the DW can't otherwise be paused.
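Purely as a way to encode the rule of thumb above, here is a small helper; the thresholds are just the guideline figures from this answer, not an official sizing formula.

```python
def suggested_dwu(active_data_set_tb: float, production: bool = True) -> str:
    """Map active data set size (TB) to the guideline DWUc figures above."""
    if active_data_set_tb <= 1:
        dwu = "DW500c"
    elif active_data_set_tb <= 3:
        dwu = "DW1000c"
    else:
        dwu = "DW3000c"
    # Below DW1000c you are on a single node with only 20 concurrency slots,
    # so prefer DW1000c as a production floor.
    if production and dwu == "DW500c":
        dwu = "DW1000c"
    return dwu

print(suggested_dwu(1.5))                     # DW1000c
print(suggested_dwu(0.5, production=False))   # DW500c, fine for dev/test
```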
Overview
I have a user registration / onboarding flow that I am currently trying to optimise and better understand before scaling out to much larger load tests.
Test Collection: (500 RU)
PartitionKey: tenant_email
Multi-Master: 5 Regions
Below are the statistics for a database with only a single region.
Step 1 - Register new user (10.17 RU)
Step 2 - Update some data (3.4 RU)
Step 3 - Create a subscription (13.23 RU)
Step 4 - Update some data (3.43 RU)
Step 4 - Update some data (3.43 RU)
Step 5 - Update some data (3.83 RU)
Step 6 - Refresh access token (3.13RU)
Total: ~40.5 RU per onboard
Problem
Expected throughput: ~12 registrations/sec (84 req/sec)
Actual throughput: heavy rate limiting at ~3 registrations per second (21 req/sec). At ~40 RU per onboard, this seems like I'm only getting ~120 RU of utilisation out of the 500?
The storage distribution is shown below, and the partitionKey should be unique enough to evenly distribute load over the collection and maximise throughput, so I'm not sure why the Max Consumed RU/s is so high.
Storage distribution for the collection and chosen partitionKey looks to be evenly distributed.
Update - Under utilisation
Here is a screenshot showing a collection with a single 500 RU partition. You can clearly see from this that the max consumed RU per partition sat around ~350 the whole time, yet there is still heavy rate limiting even though we never hit 500 RU/s.
Your rate-limiting is likely because you don't have access to all 500 RU in a single physical partition.
Take a close look at your 2nd graph, which has a hint to what's likely going on:
Collection UsersTest has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (100 RU/s per partition).
Under the covers, Cosmos DB creates a set of physical partitions, and your RU are divided across those physical partitions. In your case, Cosmos DB created 5 physical partitions.
Logical partitions may be mapped to any of your 5 physical partitions. So it's possible that, during your test, more than one logical partition mapped to the same physical partition. And given that each physical partition would top out at roughly 2-3 registrations per second, this likely explains why you're seeing throttling.
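A quick back-of-the-envelope check, using the numbers from the question, shows why the observed ceiling sits around 2-3 registrations per second:

```python
# Provisioned throughput is split evenly across the physical partitions,
# so each one only gets a fraction of the collection's 500 RU/s.
provisioned_ru = 500
physical_partitions = 5
ru_per_partition = provisioned_ru / physical_partitions   # 100 RU/s per partition

ru_per_onboard = 40.5                                      # total of the steps listed above
max_onboards = ru_per_partition / ru_per_onboard
print(f"~{max_onboards:.1f} registrations/sec if one physical partition takes the load")
# -> ~2.5/sec, which lines up with the observed ~3 registrations/sec and the 429s
```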
We migrated our mobile app (still in development) from Parse to Azure. Everything is running, but the price of DocumentDB is so high that we can't continue with Azure without fixing that. We're probably doing something wrong.
1) The price seems to have a bottleneck in the DocumentDB requests.
Running a process to load the data (about 0.5 million documents), memory and CPU were fine, but the DocumentDB request limit was a bottleneck, and the price charged was very high.
2) Even after the end of this data migration (a few days of processing), Azure continues to charge us every day.
We can't understand what is going on here. The usage graphs are flat, but the price is still climbing, as you can see in the images.
Any ideas?
Thanks!
From your screenshots, you have 15 collections under the Parse database. With Parse, aside from the system classes, each of your user-defined classes gets stored in its own collection. And given that each (non-partitioned) collection has a starting run-rate of ~$24/month (for an S1 collection), you can see where the baseline cost would be for 15 collections (around $360).
You're paying for reserved storage and RU capacity. Regardless of RU utilization, you pay whatever the cost is for that capacity (e.g. S2 runs around $50/month / collection, even if you don't execute a single query). Similar to spinning up a VM of a certain CPU capacity and then running nothing on it.
The default throughput setting for the Parse collections is 1,000 RU/s. This will cost $60 per collection per month (at the rate of $6 per 100 RU/s). Once you finish the Parse migration, the throughput can be lowered if you believe the workload has decreased. This will reduce the charge.
To learn how to do this, take a look at https://azure.microsoft.com/en-us/documentation/articles/documentdb-performance-levels/ (Changing the throughput of a Collection).
The key thing to note is that DocumentDB delivers predictable performance by reserving resources to satisfy your application's throughput needs. Because application load and access patterns change over time, DocumentDB allows you to easily increase or decrease the amount of reserved throughput available to your application.
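To put rough numbers on what lowering the throughput buys you, here is a small sketch based on the figures above (15 collections at the 1,000 RU/s default, ~$6 per 100 RU/s per month); the 400 RU/s target is an assumption for illustration.

```python
# Back-of-the-envelope: monthly cost of the 15 Parse collections before
# and after lowering their throughput. Rates are the approximate figures
# quoted above, not an official price sheet.
RATE_PER_100_RU_PER_MONTH = 6.0
collections = 15

def monthly_cost(ru_per_collection: int) -> float:
    return collections * (ru_per_collection / 100) * RATE_PER_100_RU_PER_MONTH

print(f"At the 1,000 RU/s default: ${monthly_cost(1000):.0f}/mo")  # ~$900
print(f"Lowered to 400 RU/s:       ${monthly_cost(400):.0f}/mo")   # ~$360
```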
Azure is a "pay-for-what-you-use" model, especially around resources like DocumentDB and SQL Database where you pay for the level of performance required along with required storage space. So if your requirements are that all queries/transactions have sub-second response times, you may pay more to get that performance guarantee (ignoring optimizations, etc.)
One thing I would seriously look into is the DocumentDB Cost Estimation tool; this allows you to get estimates of throughput costs based upon transaction types based on sample JSON documents you provide:
So in this example, I have an 8KB JSON document, where I expect to store 500K of them (to get an approx. storage cost) and specifying I need throughput to create 100 documents/sec, read 10/sec, and update 100/sec (I used the same document as an example of what the update will look like).
NOTE this needs to be done PER DOCUMENT -- if you're storing documents that do not necessarily conform to a given "schema" or structure in the same collection, then you'll need to repeat this process for EVERY type of document.
Based on this information, I can use those values as inputs to the pricing calculator. This tells me I can estimate about $450/mo for DocumentDB services alone (if this were my anticipated usage pattern).
There are additional ways you can optimize Request Units (RUs -- the metric used to measure the cost of a given request/transaction, and what you're getting billed for): optimizing indexing strategies, optimizing queries, etc. Review the documentation on Request Units for more details.
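As one example of the indexing-strategy optimization mentioned above, you can exclude paths you never query so that writes don't pay to index them. A rough sketch with the Python azure-cosmos SDK follows; the container name, partition key, and excluded path are illustrative assumptions.

```python
from azure.cosmos import CosmosClient, PartitionKey

URL, KEY = "https://<account>.documents.azure.com:443/", "<primary-key>"
client = CosmosClient(URL, credential=KEY)
database = client.get_database_client("mydb")

indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],             # index everything by default...
    "excludedPaths": [{"path": "/rawPayload/*"}],  # ...except a large blob we never filter on
}

container = database.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/tenantId"),
    indexing_policy=indexing_policy,
)
```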
So I am hitting the 10 GB maximum storage capacity for Azure DocumentDb every few months.
I noticed recently that Microsoft has added a partitioned mode that raises the maximum storage capacity to 250 GB, but the problem is that the minimum throughput (RU/s) is 10,100, which jumps the price to ~$606 a month from around $25 a month.
Is there any way to increase storage capacity while keeping the throughput around 400?
Without using partitioned collections, you'd need to create multiple non-partitioned collections, as needed. The SDKs have partitioning support (or you can shard data across partitions as you see fit).
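As a sketch of the "shard across multiple non-partitioned collections" idea, the routing can be as simple as hashing a key client-side; the collection names and hash choice here are illustrative, and the SDK partitioning support mentioned above does essentially the same thing for you.

```python
import hashlib

# Hypothetical set of non-partitioned collections acting as shards.
COLLECTIONS = ["users-00", "users-01", "users-02", "users-03"]

def collection_for(key: str) -> str:
    """Stable hash routing: the same key always maps to the same collection."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return COLLECTIONS[int.from_bytes(digest[:4], "big") % len(COLLECTIONS)]

print(collection_for("user@example.com"))   # always the same shard for this key
```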
EDIT: Please note that the minimum RU for partitioned collections is no longer 10,100 RU (as mentioned in the original question). It's now 400 RU (as of late 2018).