CosmosDB: Minimum throughput does not make sense - azure

We have a Cosmos DB database containing 24 containers as of now.
The throughput is provisioned at the database level.
I would expect the minimum throughput to be 2400 RU/s, but 4500 RU/s is actually required. (This is shown in the Azure Portal as well as in an error message from the .NET SDK.)
Expectation:
Count of containers * 100 RU/s = Min. RU/s
or, if the container count is less than or equal to four,
400 RU/s
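The expectation above can be written out as a small sketch. It encodes only the assumed rule stated in this question (100 RU/s per container, with a 400 RU/s floor); the function name is hypothetical, and this is not claimed to be the formula the service actually applies.

```python
def expected_min_throughput(container_count: int) -> int:
    """Assumed rule from the question, not Cosmos DB's documented formula."""
    # 100 RU/s per container, but never below the 400 RU/s floor
    return max(400, container_count * 100)


print(expected_min_throughput(24))  # 2400
print(expected_min_throughput(4))   # 400
```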
I observed that after I delete the database and recreate it, the minimum throughput is as expected.
The unexpected minimum only appears after working with the database for some days.
Is there an explanation for why this is expected, or is this a bug in Cosmos DB itself?
Thanks for your help

You have two regions configured, per your screenshot:
Each region is consuming the 2400 or so RUs. 2x2400 = 4800. Seems like you have slightly less than 2400 RU per region.
In any case: the two-region setup is what's costing you double the expected RU.

Related

Is my understanding of cosmosdb pricing correct?

I’m struggling to understand how the pricing mechanism for RU/s works. Specifically my confusion comes in when the word “first” is used.
I’ve done my research here: https://devblogs.microsoft.com/cosmosdb/build-apps-for-free-with-azure-cosmos-db-free-tier/?WT.mc_id=aaronpowell-blog-aapowell
In the second paragraph it’s mentioned:
“With Azure Cosmos DB Free Tier enabled, you’ll get the first 400 RU/s throughput and 5 GB storage in your account for free each month, for the lifetime of the account.”
So, hypothetically speaking, if I have an app that runs one query and that query costs 1 RU, can I safely assume that:
400 users can execute the query once per second for free?
500 users can execute the query once per second and I will only be charged for 100 RU?
If consumption is consistently less than 401 RU per second, there will be no charge?
Please mention any other costs I should be aware of, e.g. any Cosmos DB dependencies or App Service costs.
You're not really thinking about RU/sec correctly.
If you have, say, 400 RU/sec, then that is your allocated # of RU within a one-second window. It has nothing to do with # of users (as I doubt you're giving end-users direct access to your Cosmos DB instance).
In the case of operations only costing 1 RU, then yes, you should be able to execute 400 operations within a 1-second window without issue (although there is only one type of operation costing 1 RU, and that's a point-read).
In the case you run some type of operation that puts you over the 400 RU quota for that 1-second period, that operation completes, but now you're "in debt" so to speak: you will be throttled until the end of the 1-second period, and likely a bit of time into the next period (or, depending on how deep in RU-debt you went, maybe a few seconds).
When you exceed your RU/sec allocation, you do not get charged. In your example, you asked what happens if you try to consume 500 RU in a 1-second window, and asserted you'd be charged for 100 RU. Nope. You'll just be throttled after exhausting the 400 RU allocation.
The only way you'll end up being charged for more RU/sec, is if you increase your RU/sec allocation.
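To make the billing point concrete, here is a minimal sketch. It assumes the 400 RU/s free-tier allowance quoted in the question and models only what is described above: the bill follows the provisioned RU/s, never the RU actually attempted in a given second. The function name is hypothetical, and no real prices are used.

```python
FREE_TIER_RU_PER_SEC = 400  # allowance quoted in the question


def billable_ru_per_sec(provisioned: int, attempted: int) -> int:
    """RU/s you pay for under this simplified model.

    `attempted` is accepted only to show that it has no effect:
    going over the allocation leads to throttling, not extra charges.
    """
    return max(0, provisioned - FREE_TIER_RU_PER_SEC)


print(billable_ru_per_sec(provisioned=400, attempted=500))  # 0
print(billable_ru_per_sec(provisioned=500, attempted=500))  # 100
```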
There is some more reading out there you can do:
azure cosmos db free tier
pricing examples

How do I identify the source of Cosmos DB RU consumption?

Context
I have an Azure Function that runs daily and tracks changes in a pricing feed, storing the results in Cosmos DB. Each time it runs, it compares the latest price from the feed against the most recent values in the DB collection and writes a new item if there is a difference. It is scheduled to run at 23:55 each day. The overall setup is tiny, with 10 items in the DB and changes usually seen once a week. My consumption is 3.84 RUs for the daily execution when the price hasn't changed.
Question
In addition to the expected activity at 23:55 each day, there is additional activity appearing almost every 3 hrs 45 mins. This unexpected activity consumes 4 RUs each time (around 24 RUs per day). How can I identify the source of the additional activity?
Other info
I noticed that the backups were scheduled every 4 hours, so I changed that to daily. That didn't help.
Since noticing the additional usage I have added diagnostic settings to save all logs into a Log Analytics workspace. I can see that the RUs are Read operations, Type = AzureDiagnostics, requestResourceType = Collection, Source IP = 51.11.40.180, 1 of the 4 is a read against the __Cosmos/colls/__Query collection. This suggests to me that it's the diagnostics causing the cost. I disabled the diagnostic settings to see if that reduced my RU consumption, but it does not.
Is it just a case that diagnostics are run by Microsoft and that is simply part of having a cosmos db?
And that usually it's a smaller proportion of the overall cost, therefore not an issue?

Calculating limit in Cosmos DB [duplicate]

I have a Cosmos DB Gremlin API account set up with 400 RU/s. If I have to run a query that needs 800 RUs, does that mean the query takes 2 seconds to execute? If I increase the throughput to 1600 RU/s, does the query execute in half a second? I am not seeing any significant change in query performance when I adjust the RU/s.
As I explained in a different, but somewhat related answer here, Request Units are allocated on a per-second basis. In the event a given query will cost more than the number of Request Units available in that one-second window:
The query will be executed
You will now be in "debt" by the overage in Request Units
You will be throttled until your "debt" is paid off
Let's say you had 400 RU/sec, and you executed a query that cost 800 RU. It would complete, but then you'd be in debt for around 2 seconds (400 RU per second, times two seconds). At this point, you wouldn't be throttled anymore.
The speed at which a query executes does not depend on the number of RU allocated. Whether you had 1,000 RU/second or 100,000 RU/second, a query would run in the same amount of time (aside from any throttle time preventing the query from running initially). So, aside from throttling, your 800 RU query would run consistently, regardless of RU count.
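As a rough illustration of the debt model described above, the sketch below estimates how many seconds of allocation a single request's charge uses up. It assumes a simple linear payoff (charge divided by RU/s); the real back-off is decided by the service and may differ. The function name is hypothetical.

```python
def seconds_of_allocation(charge_ru: float, provisioned_ru_per_sec: float) -> float:
    """Seconds of per-second RU budget consumed by one request (linear model)."""
    if provisioned_ru_per_sec <= 0:
        raise ValueError("provisioned_ru_per_sec must be positive")
    return charge_ru / provisioned_ru_per_sec


# The 800 RU query against 400 RU/s from the example above
print(seconds_of_allocation(800, 400))  # 2.0
```

A result above 1 means the request overran its one-second window, so under this model later requests can expect throttling until roughly that many seconds have passed.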
A single query is charged a given amount of request units, so it's not quite accurate to say "query needs 800 RU/s". A 1KB doc read is 1 RU, and writing is more expensive starting around 10 RU each. Generally you should avoid any requests that would individually be more than say 50, and that is probably high. In my experience, I try to keep the individual charge for each operation as low as possible, usually under 20-30 for large list queries.
The upshot is that 400/s is more than enough to at least complete 1 query. It's when you have multiple attempts that combine for overage in the timespan that Cosmos tells you to wait some time before being allowed to succeed again. This is dynamic and based on a more or less black box formula. It's not necessarily a simple division of allowance by charge, and no individual request would be faster or slower based on the limit.
You can see if you're getting throttled by inspecting the response, or monitor by checking the Azure dashboard metrics.
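A minimal sketch of the response inspection mentioned above. It assumes you have the raw HTTP status code and headers of a Cosmos DB response: a throttled request comes back as HTTP 429, the `x-ms-retry-after-ms` header carries the suggested wait, and `x-ms-request-charge` carries the RU charge. The helper name and the sample values are hypothetical, and other APIs and SDKs may surface the same information through their own response or exception types.

```python
from typing import Mapping, Optional, Tuple


def throttle_info(
    status_code: int, headers: Mapping[str, str]
) -> Tuple[bool, Optional[float], Optional[float]]:
    """Return (throttled, retry_after_seconds, request_charge_ru)."""
    # Header names are case-insensitive in HTTP, so normalise them first
    lowered = {k.lower(): v for k, v in headers.items()}
    throttled = status_code == 429
    retry_ms = lowered.get("x-ms-retry-after-ms")
    charge = lowered.get("x-ms-request-charge")
    retry_after = float(retry_ms) / 1000 if throttled and retry_ms else None
    request_charge = float(charge) if charge else None
    return throttled, retry_after, request_charge


# Sample values are made up for illustration
print(throttle_info(429, {"x-ms-retry-after-ms": "1500"}))
print(throttle_info(200, {"x-ms-request-charge": "2.83"}))
```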

Why does my CosmosDB Count operation that exceeds RU throughput not get throttled?

I have a database with a shared Throughput of 400. This database contains two containers.
When I run the following query on one of these containers, I get charged for 1183 RUs :
SELECT VALUE COUNT(1) FROM c where c.GroupClaim = 'None'
GroupClaim is the partition key of the container.
How can a 1183 RU query not get rejected or throttled when I have the throughput set to 400?
Here is a screen of my query stats :
I can't explain why your query costs so much but... if you run a query that exceeds subscribed RU/sec, Cosmos DB will complete the query. But now you'll be "in debt" and you'll be throttled until your debt is paid off.
In your case, the query's 1183 RU charge is 783 RU over the 400 RU/s allocation (your debt), so you'll see the throttle time period be somewhere between 2 and 3 seconds (since you would have 1200 RU available over a 3-second period, per your service tier).

Understanding Azure SQL Performance

The facts:
1 Azure SQL S0 instance
a few tables, one of them containing ~8.6 million rows and 1 PK
Running a Count-query on this table takes nearly 30 minutes (!) to complete.
Upscaling the instance from S0 to S1 reduces the query time to 13 minutes:
Looking into Azure Portal (new version) the resource-usage-monitor shows the following:
Questions:
Does anyone else consider even 13 minutes ridiculous for a simple COUNT()?
Does the second screenshot mean that during the 100% period my instance isn't responding to other requests?
Why are my metrics limited to 100% in both S0 and S1? (See under "Which Service Tier is Right for My Database?", stating "These values can be above 100% (a big improvement over the values in the preview that were limited to a maximum of 100).") I'd expect the S0 to be at 150% or so if the quoted statement is true.
I'm interested in other people's experiences with databases holding more than 1,000 records or so. I don't see how an S*-scaled Azure SQL for 22 - 55 € per month could help me in upscaling strategies at the moment.
Azure SQL Database editions provide increasing levels of DTUs (a blend of CPU, IO, memory and other resources; see https://msdn.microsoft.com/en-us/library/azure/dn741336.aspx) from Basic to Standard to Premium. Once your query reaches the DTU limit (100%) in any of these resource dimensions, it will continue to receive resources at that level (but not more), and that may increase the latency in completing the request. It looks like in your scenario above, the query is hitting its DTU limit (10 DTUs for S0 and 20 for S1). You can see the individual resource usage percentages (CPU, Data IO or Log IO) by adding these metrics to the same graph, or by querying the DMV sys.dm_db_resource_stats.
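To illustrate how the per-dimension percentages can be read, the sketch below picks the resource dimension closest to its limit from one row of sys.dm_db_resource_stats. The column names avg_cpu_percent, avg_data_io_percent and avg_log_write_percent are taken from that DMV; the sample values and the function name are made up for illustration, and fetching the row from the database is left out.

```python
from typing import Mapping, Tuple

DIMENSIONS = ("avg_cpu_percent", "avg_data_io_percent", "avg_log_write_percent")


def bottleneck(row: Mapping[str, float]) -> Tuple[str, float]:
    """Return the (column, percent) pair with the highest utilisation."""
    name = max(DIMENSIONS, key=lambda col: row[col])
    return name, row[name]


# Hypothetical row: a COUNT over a large table that saturates data IO
sample = {
    "avg_cpu_percent": 12.5,
    "avg_data_io_percent": 100.0,
    "avg_log_write_percent": 0.3,
}
print(bottleneck(sample))  # ('avg_data_io_percent', 100.0)
```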
Here is a blog that provides more information on appropriately sizing your database performance levels. http://azure.microsoft.com/blog/2014/09/11/azure-sql-database-introduces-new-near-real-time-performance-metrics/
To your specific questions
1) As you have 8.6 million rows, the database needs to scan the index entries to get the count back. So, it may be hitting the IO limit for the edition here.
2) If you have multiple concurrent queries running against your DB, they will be scheduled so that no request is starved. But latencies may increase further for all queries, since you will be hitting the available resource limits.
3) For older Web/Business editions, you may be able to see the metric values going beyond 100% (they are normalized to the limits of an S2 level), as they don't have any specific limits and run in a resource-shared environment with other customer loads. For the new editions, metrics will never exceed 100%, because the system guarantees you resources up to 100% of that edition's limits, but no more. This provides a predictable, guaranteed amount of resources for your DB, unlike Web/Business editions, where you may get very few or a lot more resources at different times depending on other competing customer DB workloads running on the same machine.
Hope this helps.
-- Srini
