I have a Cosmos DB Gremlin API account set up with 400 RU/s. If I have to run a query that needs 800 RUs, does that mean this query takes 2 seconds to execute? If I increase the throughput to 1600 RU/s, does this query execute in half a second? I am not seeing any significant change in query performance by playing around with the RUs.
As I explained in a different, but somewhat related, answer here, Request Units are allocated on a per-second basis. If a given query costs more than the number of Request Units available in that one-second window:
The query will be executed
You will now be in "debt" by the overage in Request Units
You will be throttled until your "debt" is paid off
Let's say you had 400 RU/sec, and you executed a query that cost 800 RU. It would complete, but you'd then be in debt for around 2 seconds (the 800 RU charge equals two seconds' worth of your 400 RU/sec allocation), and you'd be throttled until that debt was paid off. At that point, you wouldn't be throttled anymore.
The speed at which a query executes does not depend on the number of RU allocated. Whether you had 1,000 RU/second or 100,000 RU/second, a query would run in the same amount of time (aside from any throttle time preventing the query from running initially). So, aside from throttling, your 800 RU query would run consistently, regardless of RU count.
A single query is charged a fixed amount of Request Units, so it's not quite accurate to say a query "needs 800 RU/s". A 1 KB doc read is 1 RU, and writes are more expensive, starting at around 10 RU each. Generally you should avoid any request that individually costs more than, say, 50 RU, and even that is probably high. In my experience, I try to keep the charge for each operation as low as possible, usually under 20-30 RU even for large list queries.
The upshot is that 400 RU/s is more than enough to complete at least one query. It's when multiple requests combine to exceed the allowance within the time window that Cosmos tells you to wait before requests are allowed to succeed again. The wait is dynamic and based on a more or less black-box formula; it isn't necessarily a simple division of allowance by charge, and no individual request runs faster or slower because of the limit.
You can see whether you're getting throttled by inspecting the response, or by checking the metrics on the Azure dashboard.
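As an illustration, here's a minimal sketch using the azure-cosmos Python SDK for the SQL (Core) API (the Gremlin API surfaces the charge in the response status attributes instead). The endpoint, key, database, container and query are placeholders, and note the SDK already retries 429s a few times on its own before surfacing the error:

```python
# Sketch: inspect the request charge and detect throttling with the azure-cosmos SDK.
# Endpoint, key, database and container names below are placeholders.
from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

try:
    items = list(container.query_items(
        query="SELECT * FROM c WHERE c.category = 'widgets'",
        enable_cross_partition_query=True))
    # The charge for the last response is surfaced in the x-ms-request-charge header.
    charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
    print(f"Fetched {len(items)} items, request charge: {charge} RU")
except CosmosHttpResponseError as e:
    if e.status_code == 429:
        # Throttled: the service suggests how long to back off before retrying.
        retry_ms = e.headers.get("x-ms-retry-after-ms") if e.headers else None
        print(f"Throttled (429); retry after {retry_ms} ms")
    else:
        raise
```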
Related
I’m struggling to understand how the pricing mechanism for RU/s works. Specifically, my confusion comes in when the word “first” is used.
I’ve done my research here: https://devblogs.microsoft.com/cosmosdb/build-apps-for-free-with-azure-cosmos-db-free-tier/?WT.mc_id=aaronpowell-blog-aapowell
In the second paragraph it’s mentioned:
“With Azure Cosmos DB Free Tier enabled, you’ll get the first 400 RU/s throughput and 5 GB storage in your account for free each month, for the lifetime of the account.”
So, hypothetically speaking, if I have an app that does one query, and that query evaluates to 1 RU, can I safely assume that:
400 users can execute the query once per second for free?
500 users can execute the query once per second and I will only be charged for 100 RU?
If the RU usage is consistently less than 401 per second, there will be no charge?
Please also mention if there is any other costing I should be aware of, i.e. any Cosmos DB dependencies or App Service costs.
You're not really thinking about RU/sec correctly.
If you have, say, 400 RU/sec, then that is your allocated # of RU within a one-second window. It has nothing to do with # of users (as I doubt you're giving end-users direct access to your Cosmos DB instance).
in the case of operations only costing 1 RU, then yes, you should be able to execute 400 operations within a 1-second window, without issue (although there is only one type of operation costing 1 RU, and that's a point-read).
in the case you run some type of operation that puts you over the 400RU quota for that 1-second period, that operation completes, but now you're "in debt" so to speak: you will be throttled until the end of the 1-second period, and likely a bit of time into the next period (or, depending on how deep in RU-debt you went, maybe a few seconds).
when you exceed your RU/sec allocation, you do not get charged. In your example, you asked what happens if you try to consume 500 RU in a 1-second window, and asserted you'd be charged for 100 RU. Nope. You'll just be throttled after exhausting the 400 RU allocation.
The only way you'll end up being charged for more RU/sec is if you increase your RU/sec allocation.
There is some more reading out there you can do:
azure cosmos db free tier
pricing examples
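To make the billing model concrete, here's a rough sketch of the arithmetic: you pay for what you provision (minus the free-tier allowance), not for what you consume. The hourly rate below is only an illustrative placeholder, not an official figure; check the Azure pricing page for current numbers.

```python
# Sketch of provisioned-throughput billing with the free tier.
# RATE_PER_100RU_HOUR is a placeholder; look up the real rate for your region.
FREE_TIER_RU = 400           # RU/s covered by the free tier
RATE_PER_100RU_HOUR = 0.008  # illustrative $/hour per 100 RU/s, NOT an official figure

def monthly_cost(provisioned_ru_per_sec: int, hours: int = 730) -> float:
    # You are billed on provisioned RU/s, not on how many RU you actually consume.
    billable_ru = max(0, provisioned_ru_per_sec - FREE_TIER_RU)
    return (billable_ru / 100) * RATE_PER_100RU_HOUR * hours

print(monthly_cost(400))   # 0.0 -> fully covered by the free tier
print(monthly_cost(1000))  # 600 billable RU/s, charged whether you use them or not
```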
I set 400 RU/s for my collection in Cosmos DB. I want to estimate the maximum total RU available in 24 hours. Please let me know how I can calculate it.
Is the calculation right?
400 * 60 * 60 * 24 = 34,560,000 RU in 24 hours
When talking about maximum RU available within a given time window, you're correct (except that would be total RU, not total RU/second - it would remain constant at 400/sec unless you adjusted your RU setting).
But... given that RUs are reserved on a per-second basis: What you consume within a given one-second window is really up to you and your specific operations, and if you don't consume all allocated RUs (400, in this case) in a given one-second window, it's gone (you're paying for reserved capacity). So yes, you're right about absolute maximum, but that might not match your real-world scenario. You could end up consuming far less than the maximum you've allocated (imagine idle-times for your app, when you're just not hitting the database much). Also note that RUs are distributed across partitions, so depending on your partitioning scheme, you could end up not using a portion of your available RUs.
Note that it's entirely possible to use more than your allocated 400 RU in a given one-second window, which then puts you into "RU Debt" (see my answer here for a detailed explanation).
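As a tiny sketch of that arithmetic (the 20% utilization figure below is just an invented example to make the "you may consume far less" point concrete):

```python
# Sketch: theoretical maximum RU available in a 24-hour window vs. what you actually consume.
provisioned_ru_per_sec = 400
seconds_per_day = 60 * 60 * 24

max_ru_per_day = provisioned_ru_per_sec * seconds_per_day
print(max_ru_per_day)  # 34,560,000 RU -> the absolute ceiling; unused capacity never rolls over

# Real consumption depends on utilization, e.g. at 20% average utilization:
print(max_ru_per_day * 0.20)  # roughly 6.9 million RU actually consumed
```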
I just launched an Azure SQL Database, and the DTU and CPU usage is behaving strangely. The database is only receiving about 30 requests per minute, and the CPU/DTU will be extremely low for hours, and then jump up to 100% and stay there (with no increase in the number of requests that triggers this). When I click to view the top queries, none of them are above 1% cpu usage. I started out on a 5 DTU plan, and yesterday upgraded to 20 DTUs and the same behavior is occurring. Any idea what else might cause the DTU/CPU to get maxed out? See images below:
https://i.imgur.com/LdbYTPw.png
https://i.imgur.com/jlus3FM.png
Thanks in advance for any advice!
Joe
EDIT: I'm getting closer. I found these repeated entries in the error log (about 8-10 per second):
"The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request."
The thing is, the App Service that queries the database is only doing simple selects, updates, and inserts... none of which uses any complex WHERE IN statement. Furthermore, every query is wrapped in a try/catch block, and I'm never seeing an exception like this.
Where could these large queries be originating from?
You are only seeing the CPU component of the DTU graph; what about the "Data IO" and "Log IO" components? Look at the top 5 queries in all 3 sections, and let me know if you find a query that starts with "SELECT StatMan ...". If you see that, then the Auto Update Statistics process is creating those DTU spikes.
I would suggest installing the sp_whoisactive script so that you can see what's going on more easily:
http://whoisactive.com/
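If you want to poke at this yourself before installing anything, a rough sketch along these lines (Python with pyodbc; the connection string is a placeholder) queries sys.dm_exec_query_stats for the statements doing the most logical reads and writes, which should surface whatever is driving the Data IO and Log IO components:

```python
# Sketch: list the statements consuming the most IO on an Azure SQL database.
# Connection string is a placeholder; requires the pyodbc package and an ODBC driver.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=your-server.database.windows.net;DATABASE=your-db;"
    "UID=your-user;PWD=your-password"
)

sql = """
SELECT TOP (5)
       qs.total_logical_reads,
       qs.total_logical_writes,
       qs.total_worker_time,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS statement_start
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads + qs.total_logical_writes DESC;
"""

for row in conn.cursor().execute(sql):
    print(row.statement_start, row.total_logical_reads, row.total_logical_writes)
```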
Is it possible to get a count of how often DynamoDB throughput (write units/read units) was scaled down within the last 24 hours?
My idea is to scale down as soon as a huge drop (e.g. 50%) in the needed provisioned write units occurs. I have really peaky traffic, so it is interesting for me to scale down after every peak. However, I have an analytics job running at night which provisions a huge amount of read units, making it necessary to be able to scale down after it as well. Thus I need to limit downscales to 3 times within 24 hours.
The number of decreases is returned in a DescribeTable result as part of the ProvisionedThroughputDescription.
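For example, with boto3 (the table name and region are placeholders), the decrease counter and the time of the last decrease can be read like this:

```python
# Sketch: read the throughput-decrease counters for a DynamoDB table with boto3.
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")  # region is a placeholder
desc = dynamodb.describe_table(TableName="my-table")["Table"]["ProvisionedThroughput"]

print(desc["NumberOfDecreasesToday"])    # how many times throughput was lowered today
print(desc.get("LastDecreaseDateTime"))  # absent if the table was never scaled down
```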
I am using JMeter (started using it a few days ago) as a tool to simulate a load of 30 threads, using a CSV data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set my thread group as:
Number of threads: 30
Ramp-up Period: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean and what is classed as good/bad measurements, and what can be suggested to improve the results. Below is a table of the results collated in the Summary report of Jmeter.
I have conducted research, only to find blogs/sites telling me the same info as what is defined on the jmeter.apache.org site. One blog (Nicolas Vahlas) that I came across gave me some very useful information, but it still hasn't helped me understand what to do next with my results.
Can anyone help me understand these results and what I could do next following the execution of this test plan? Or point me in the right direction of an informative blog/site that will help me understand what to do next.
Many thanks.
In my view, the deviation is high.
You know your application better than any of us.
You should focus on whether the average and maximum response times you got, and how often they occur, are acceptable to you and your users. The same applies to throughput.
Your report shows the average response time is below 0.5 seconds and the maximum response time is below 1 second, which is generally acceptable, but acceptability should be defined by you (is it acceptable to your users?). If the answer is yes, try with more load to check how the application scales.
Your requirement mentions that you need 30 concurrent users performing different actions. The response time of your requests is low and you have a ramp-up of 30 seconds. Can you please check the total number of active threads during the test? I believe the time during which there are actually 30 concurrent users in the system is pretty short, so the average response time you are seeing is likely misleading. I would suggest running the test for longer so that there really are 30 concurrent users in the system; that would give a correct reading as per your requirements.
You can use the Aggregate report instead of the Summary report. In performance testing, the following are typically used for analysis:
Throughput - requests per second
Response time - 90th percentile
Target application resource utilization (CPU, processor queue length and memory)
A common SLA for websites is 3 seconds, but this requirement changes from application to application.
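If you'd rather compute these numbers yourself instead of reading them off a listener, a small sketch like the following works on a JMeter CSV results file (.jtl); it assumes the default CSV columns timeStamp, elapsed and success are present, and the file name is a placeholder:

```python
# Sketch: compute throughput, average and 90th-percentile response time from a JMeter .jtl CSV.
# Assumes the default CSV output columns: timeStamp (ms epoch), elapsed (ms), success (true/false).
import csv

timestamps, elapsed, errors = [], [], 0
with open("results.jtl", newline="") as f:
    for row in csv.DictReader(f):
        timestamps.append(int(row["timeStamp"]))
        elapsed.append(int(row["elapsed"]))
        if row["success"].lower() != "true":
            errors += 1

elapsed.sort()
n = len(elapsed)
duration_s = (max(timestamps) - min(timestamps)) / 1000 or 1

print(f"samples:         {n}")
print(f"throughput:      {n / duration_s:.2f} req/s")
print(f"avg response:    {sum(elapsed) / n:.0f} ms")
print(f"90th percentile: {elapsed[int(0.9 * (n - 1))]} ms")
print(f"error %:         {100 * errors / n:.2f}")
```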
Your test results are good, assuming the users are actually logging into the system/portal.
Samples: This means the no. of requests sent on a particular module.
Average: Average Response Time, for 300 samples.
Min: Min Response Time, among 300 samples (fastest among 300 samples).
Max: Max Response Time, among 300 samples (slowest among 300 samples).
Standard Deviation: A measure of the variation (for 300 samples).
Error: Failure percentage.
Throughput: No. of requests processed per second.
Hope this will help.