Calculate maximum total RUs in 24 hours for Cosmos DB - Azure

I set 400 RU/s for my collection in Cosmos DB. I want to estimate the maximum total RUs over 24 hours. Please let me know how I can calculate it.
Is this calculation right?
400 * 60 * 60 * 24 = 34,560,000 RU in 24 hours

When talking about maximum RU available within a given time window, you're correct (except that would be total RU, not total RU/second - it would remain constant at 400/sec unless you adjusted your RU setting).
But... given that RUs are reserved on a per-second basis: What you consume within a given one-second window is really up to you and your specific operations, and if you don't consume all allocated RUs (400, in this case) in a given one-second window, it's gone (you're paying for reserved capacity). So yes, you're right about absolute maximum, but that might not match your real-world scenario. You could end up consuming far less than the maximum you've allocated (imagine idle-times for your app, when you're just not hitting the database much). Also note that RUs are distributed across partitions, so depending on your partitioning scheme, you could end up not using a portion of your available RUs.
Note that it's entirely possible to use more than your allocated 400 RU in a given one-second window, which then puts you into "RU Debt" (see my answer here for a detailed explanation).
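The ceiling arithmetic from the question can be sketched as a small helper (a minimal illustration; the function name is mine, not from any SDK):

```python
def max_ru_per_day(provisioned_ru_per_sec: int) -> int:
    """Theoretical maximum RUs consumable in a day at a fixed provisioned
    throughput. Actual consumption is usually far lower (idle periods,
    partition skew), and unused per-second capacity does not carry over."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return provisioned_ru_per_sec * seconds_per_day

print(max_ru_per_day(400))  # 34560000 -- the absolute ceiling, not expected usage
```

This confirms the 34,560,000 figure in the question, with the caveat above that it is reserved capacity, not consumption.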

Related

Is my understanding of cosmosdb pricing correct?

I’m struggling to understand how the pricing mechanism for RU/s works. Specifically, my confusion comes in when the word “first” is used.
I’ve done my research here: https://devblogs.microsoft.com/cosmosdb/build-apps-for-free-with-azure-cosmos-db-free-tier/?WT.mc_id=aaronpowell-blog-aapowell
In the second paragraph it’s mentioned:
“With Azure Cosmos DB Free Tier enabled, you’ll get the first 400 RU/s throughput and 5 GB storage in your account for free each month, for the lifetime of the account.”
So hypothetically speaking if I have an app that does one query and that 1 query evaluates to 1RU. Can I safely assume that
400 users can execute the query once per second for free?
500 users can execute the query once per second and I will only be charged for 100RU
If the RU is consistently less than 401 per second, there will be no charge
Please do make mention if there’s any other costing I should be aware of. Ie. Any cosmosDb dependencies, or app service costing
You're not really thinking about RU/sec correctly.
If you have, say, 400 RU/sec, then that is your allocated # of RU within a one-second window. It has nothing to do with # of users (as I doubt you're giving end-users direct access to your Cosmos DB instance).
In the case of operations costing only 1 RU, then yes, you should be able to execute 400 operations within a 1-second window without issue (although there is only one type of operation costing 1 RU, and that's a point-read).
In the case you run some type of operation that puts you over the 400 RU quota for that 1-second period, that operation completes, but now you're "in debt," so to speak: you will be throttled until the end of the 1-second period, and likely a bit of time into the next period (or, depending on how deep in RU debt you went, maybe a few seconds).
When you exceed your RU/sec allocation, you do not get charged. In your example, you asked what happens if you try to consume 500 RU in a 1-second window, and asserted you'd be charged for 100 RU. Nope. You'll just be throttled after exhausting the 400 RU allocation.
The only way you'll end up being charged for more RU/sec, is if you increase your RU/sec allocation.
There is some more reading out there you can do:
azure cosmos db free tier
pricing examples
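The budget-vs-throttling distinction from this answer can be sketched as a tiny check (the function name and the 400 RU default are illustrative, not from any Cosmos DB SDK):

```python
def fits_in_budget(ops_per_sec: int, ru_per_op: float, provisioned_ru: int = 400) -> bool:
    """True if the workload stays within the per-second RU allocation.
    Exceeding the allocation causes throttling (HTTP 429), not extra charges;
    the only way to be billed more is to raise the provisioned RU/s."""
    return ops_per_sec * ru_per_op <= provisioned_ru

print(fits_in_budget(400, 1))  # True  -- 400 point-reads at 1 RU each fit
print(fits_in_budget(500, 1))  # False -- the overflow is throttled, not billed
```

Note the check is about total RU per second, not user count: 40 users issuing 10-RU queries each second exhaust the budget just as fast as 400 users doing point-reads.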

Calculating limit in Cosmos DB [duplicate]

I have a Cosmos DB Gremlin API set up with 400 RU/s. If I have to run a query that needs 800 RUs, does it mean that this query takes 2 sec to execute? If I increase the throughput to 1600 RU/s, does this query execute in half a second? I am not seeing any significant changes in query performance by playing around with the RUs.
As I explained in a different, but somewhat related answer here, Request Units are allocated on a per-second basis. In the event a given query will cost more than the number of Request Units available in that one-second window:
The query will be executed
You will now be in "debt" by the overage in Request Units
You will be throttled until your "debt" is paid off
Let's say you had 400 RU/sec, and you executed a query that cost 800 RU. It would complete, but then you'd be in debt for around 2 seconds (800 RU at 400 RU per second = two seconds). After that, you wouldn't be throttled anymore.
The speed in which a query executes does not depend on the number of RU allocated. Whether you had 1,000 RU/second OR 100,000 RU/second, a query would run in the same amount of time (aside from any throttle time preventing the query from running initially). So, aside from throttling, your 800 RU query would run consistently, regardless of RU count.
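The debt arithmetic above can be sketched as a rough model (my own simplification of the real throttling behavior, which the answer below notes is more of a black box):

```python
def seconds_until_debt_cleared(query_cost_ru: float, provisioned_ru_per_sec: int) -> float:
    """Rough model of 'RU debt': roughly how long until the per-second
    allocation catches up with a single expensive operation. The real
    throttling formula is more involved; this only shows the proportions."""
    return query_cost_ru / provisioned_ru_per_sec

print(seconds_until_debt_cleared(800, 400))   # 2.0 -- the ~2-second debt window above
print(seconds_until_debt_cleared(800, 1600))  # 0.5 -- but the query itself runs no faster
```

The second call illustrates the question's 1600 RU/s scenario: raising throughput shortens throttling, not the query's own execution time.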
A single query is charged a given amount of Request Units, so it's not quite accurate to say a "query needs 800 RU/s". A 1 KB doc point-read is 1 RU, and writes are more expensive, starting at roughly 5-10 RU each. Generally you should avoid any single request that would cost more than, say, 50 RU, and even that is probably high. In my experience, I try to keep the individual charge for each operation as low as possible, usually under 20-30 RU even for large list queries.
The upshot is that 400 RU/s is more than enough to complete at least one such query. It's when multiple requests combine for an overage within the time span that Cosmos tells you to wait some time before being allowed to succeed again. This is dynamic and based on a more or less black-box formula; it's not necessarily a simple division of allowance by charge, and no individual request runs faster or slower based on the limit.
You can see if you're getting throttled by inspecting the response, or monitor by checking the Azure dashboard metrics.

Am I applying Little's Law correctly to model a workload for a website?

Using these metrics (shown below), I was able to utilize a workload modeling formula (Little’s Law) to come up with what I believe are the correct settings to sufficiently load test the application in question.
From Google Analytics:
Users: 2,159
Pageviews: 4,856
Avg. Session Duration: 0:02:44
Pages / Session: 2.21
Sessions: 2,199
The formula is N = Throughput * (Response Time + Think Time)
We calculated Throughput as 1.35 (4,856 pageviews / 3,600 seconds in an hour)
We calculated (Response Time + Think Time) as 74.21 (164 seconds avg. session duration / 2.21 pages per session)
Using the formula, we calculate N as 100 (1.35 Throughput * 74.21 (Response Time + Think Time)).
Therefore, according to my calculations, we can simulate the load the server experienced on the peak day during the peak hour with 100 users going through the business processes at a pace of 75 seconds between iterations (think time ignored).
So, in order to determine how the system responds under a heavier than normal load, we can double (200 users) or triple (300 users) the value of N and record the average response time for each transaction.
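The arithmetic in the steps above can be reproduced in a short script (numbers taken from the Google Analytics metrics listed earlier; variable names are mine):

```python
# Little's Law: N = Throughput * (Response Time + Think Time)
pageviews = 4856
avg_session_sec = 2 * 60 + 44           # 0:02:44 -> 164 seconds
pages_per_session = 2.21

throughput = pageviews / 3600                          # pages/sec in the peak hour
resp_plus_think = avg_session_sec / pages_per_session  # seconds between page requests
n_users = throughput * resp_plus_think                 # concurrent users, N

print(round(throughput, 2), round(resp_plus_think, 2), round(n_users))  # 1.35 74.21 100
```

This reproduces the N = 100 figure, assuming the peak hour carries all 4,856 pageviews.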
Is this all correct?
When you do a direct observation of the logs for the site, grouped into blocks by session duration, what is the maximum number of IP addresses counted in each block?
Little's Law tends to undercount sessions and their overhead in favor of transactional throughput. That's OK if you have instantaneous recovery of your session resources, but most sites are holding onto them for a period longer than 110% of the longest inter-request window for a user (the period from one request to the next).
The formula below has always worked well for me if you are looking to calculate pacing:
"Pacing = No. of Users * Duration of Test (in seconds) / Transactions you want to achieve in said Test Duration"
You should be able to get closer to the transactions you want to achieve using this formula. If it's an API test, it's almost always accurate.
For example, say you want to achieve 1,000 transactions using 5 users in one hour of test duration:
Pacing = 5 * 3600/1000
= 18 seconds
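The pacing formula above can be expressed as a one-line helper (the function name is illustrative):

```python
def pacing_seconds(num_users: int, test_duration_sec: int, target_transactions: int) -> float:
    """Pacing = users * test duration (sec) / target transactions:
    the per-user delay between iterations needed to hit the target."""
    return num_users * test_duration_sec / target_transactions

print(pacing_seconds(5, 3600, 1000))  # 18.0 -- matches the worked example above
```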

Azure VM stats - Network In/Out - what are the measurements?

This may be obvious, but I don't understand the measurement Azure uses for Network In/Out and a few other things.
On the Azure portal -> my VM -> Metrics -> [Host] Network In/Out, it says the metric is measured in bytes, but then it also draws a graph over time. If it were plain bytes, it would be cumulative and therefore grow indefinitely, but it isn't, so I am inclined to believe it is measured per second or something like that. But the Azure docs claim that it is bytes, not bytes per second (link here).
Am I missing something obvious?
I am inclined to believe the data is in bytes per minute; at least, mine appears that way. I set my graph to a 10-minute interval. Taking the mouse off the graph, the total bytes show at the bottom. Hovering over the individual sample points (10 in total), they average between 31-34 MB each. Adding them up, you get close to the total for the graph interval, 326 MB, and 10 * 32.5 is very close to this total, leading me to believe that each point on the graph is a sum over its one-minute interval. That is what I am seeing, anyway. Terrible documentation from Microsoft - why not just specify this in the (i) hover point on the graph?
@eddyP23 - if you add up all the points in your graph, it appears you would come to the same conclusion: each point is a sum over its interval (1 minute). I am not sure how else to read this.
If it were bytes per second, the data total for the complete 10-minute interval would be vastly larger.
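The sanity check in this answer can be written out explicitly (the sample values below are hypothetical points consistent with the 31-34 MB range observed, chosen so the sum matches the reported total):

```python
# Ten one-minute samples averaging ~32.5 MB should sum to roughly the
# 326 MB interval total if each point is a per-minute sum of bytes.
samples_mb = [31, 33, 32, 34, 32, 33, 31, 34, 33, 33]  # hypothetical per-minute points

total_mb = sum(samples_mb)
print(total_mb)  # 326 -- matches the graph's interval total
```

If each point were bytes per second instead, the interval total would be about 60 times larger than the sum of the points, which is not what the graph shows.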
Sorry for the delay.
therefore I am inclined to believe it is measured per second or
something like that. But Azure docs claim that it is bytes and not
bytes per second
You can find the Network In here:
The Network In (bytes per second) metric is used to monitor your VM's performance.

DynamoDB Downscales within the last 24 hours?

Is it possible to get a count of how often DynamoDB throughput (write units/read units) was downscaled within the last 24 hours?
My idea is to downscale as soon as a huge drop (e.g. 50%) in the needed provisioned write units occurs. I have really peaky traffic, so it is interesting to me to downscale after every peak. However, I have an analytics job running at night which provisions a huge amount of read units, making it necessary to downscale after it finishes. Thus I need to limit downscales to 3 times within 24 hours.
The number of decreases is returned in a DescribeTable result as part of the ProvisionedThroughputDescription (the NumberOfDecreasesToday field).
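A sketch of reading that field with boto3 (the live call is commented out since it needs AWS credentials; the parsing helper below runs against a sample response shaped like a DescribeTable result, and the table name is hypothetical):

```python
# Live call (requires AWS credentials and boto3):
# import boto3
# resp = boto3.client("dynamodb").describe_table(TableName="my-table")

def decreases_today(describe_table_response: dict) -> int:
    """Extract NumberOfDecreasesToday from a DescribeTable response."""
    return describe_table_response["Table"]["ProvisionedThroughput"]["NumberOfDecreasesToday"]

sample = {"Table": {"ProvisionedThroughput": {"NumberOfDecreasesToday": 2,
                                              "ReadCapacityUnits": 100,
                                              "WriteCapacityUnits": 50}}}
print(decreases_today(sample))  # 2 -- compare against your self-imposed daily limit
```

Checking this count before scaling down lets you enforce the "at most 3 downscales per 24 hours" policy described in the question.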
