Am I applying Little's Law correctly to model a workload for a website? - performance-testing

Using these metrics (shown below), I was able to utilize a workload modeling formula (Littleā€™s Law) to come up with what I believe are the correct settings to sufficiently load test the application in question.
From Google Analytics:
Users: 2,159
Pageviews: 4,856
Avg. Session Duration: 0:02:44
Pages / Session: 2.21
Sessions: 2,199
The formula is N = Throughput * (Response Time + Think Time)
We calculated Throughput as 1.35 (4865 pageviews / 3600 (seconds in an hour))
We calculated (Response Time + Think Time) as 74.21 (164 seconds avg. session duration / 2.21 pages per session)
Using the formula, we calculate N as 100 (1.35 Throughput * 74.21 (Response Time + Think Time)).
Therefore, according to my calculations, we can simulate the load the server experienced on the peak day during the peak hour with 100 users going through the business processes at a pace of 75 seconds between iterations (think time ignored).
So, in order to determine how the system responds under a heavier than normal load, we can double (200 users) or triple (300 users) the value of N and record the average response time for each transaction.
Is this all correct?

When you do a direct observation of the logs for the site, blocked by session duration, what are the maximum number of IP addresses counted in each block?
Littles law tends to undercount sessions and their overhead in favor of transactional throughput. That's OK if you have instantaneous recovery on your session resources, but most sites are holding onto them for a period longer than 110% of the longest inter-request window for a user (the period from one request to the next).

Below formula always worked good for me, If you are looking to calculate pacing
"Pacing = No. of Users * Duration of Test (in seconds) / Transactions you want to achieve in said Test Duration"
You should be able to get closer to the Transactions you want to achieve using this Formula. If Its API, then its almost always accurate.
For Example, You want to achieve 1000 transactions using 5 users in one hour of Test Duration
Pacing = 5 * 3600/1000
= 18 seconds

Related

Calculating limit in Cosmos DB [duplicate]

I have a cosmosGB gremlin API set up with 400 RU/s. If I have to run a query that needs 800 RUs, does it mean that this query takes 2 sec to execute? If i increase the throughput to 1600 RU/s, does this query execute in half a second? I am not seeing any significant changes in query performance by playing around with the RUs.
As I explained in a different, but somewhat related answer here, Request Units are allocated on a per-second basis. In the event a given query will cost more than the number of Request Units available in that one-second window:
The query will be executed
You will now be in "debt" by the overage in Request Units
You will be throttled until your "debt" is paid off
Let's say you had 400 RU/sec, and you executed a query that cost 800 RU. It would complete, but then you'd be in debt for around 2 seconds (400 RU per second, times two seconds). At this point, you wouldn't be throttled anymore.
The speed in which a query executes does not depend on the number of RU allocated. Whether you had 1,000 RU/second OR 100,000 RU/second, a query would run in the same amount of time (aside from any throttle time preventing the query from running initially). So, aside from throttling, your 800 RU query would run consistently, regardless of RU count.
A single query is charged a given amount of request units, so it's not quite accurate to say "query needs 800 RU/s". A 1KB doc read is 1 RU, and writing is more expensive starting around 10 RU each. Generally you should avoid any requests that would individually be more than say 50, and that is probably high. In my experience, I try to keep the individual charge for each operation as low as possible, usually under 20-30 for large list queries.
The upshot is that 400/s is more than enough to at least complete 1 query. It's when you have multiple attempts that combine for overage in the timespan that Cosmos tells you to wait some time before being allowed to succeed again. This is dynamic and based on a more or less black box formula. It's not necessarily a simple division of allowance by charge, and no individual request would be faster or slower based on the limit.
You can see if you're getting throttled by inspecting the response, or monitor by checking the Azure dashboard metrics.

Prometheus recording rule to keep the max of (rate of counter)

Iam facing one dillema.
For performance reasons, I'm creating recording rules for my Nginx request/second metrics.
Original Query
sum(rate(nginx_http_request_total[5m]))
Recording Rule
rules:
- expr: sum(rate(nginx_http_requests_total[5m])) by (cache_status, host, env, status)
record: job:nginx_http_requests_total:rate:sum:5m
In original query I can see that my max traffic is 6.6k but in recording rule, its 6.2k. That's 400 TPS difference.
This is the metric for last one week
Question :
Is there any way to take the max of the original query and save it as recording rule. As it's TPS, I only care about the max, not the min.
I think having having 6% difference on value in some very short burst is pretty OK.
In your query your are getting (and recording) an average TPS during the last 5 minutes. There is no "max" being performed there.
The value will change depending on the exact time of the query evaluation - possibly why you see difference between raw query and values stored by recording rule.
Prometheus will extrapolate data some when executing functions like rate(). If you have last data point at time t, but running query at t+30s, then Prometheus will try to extrapolate value at t+30s (often this is noticed as a counter for discrete events will show fractional values)
You may want to use irate() function if you are really after peak values. It will use at each evaluation two most recent points to calculate most current increase as opposed to X minutes average that rate() provides.

Throughput calculation in performance testing

If there are 10k busses and peak point is 9:30Am to 10:30Am, there will be 2% increase of Vusers every year then what is the throughput after 10 years?
Please help me how to solve this type of questions without using a tool.
Thanks in Advance.
The formula would be:
10000 * (1.02)^10 = 12190
With regards to implementation, 10000 busses (whatever it is) per our is 166 per minute or 2.7 per second which doesn't seem to be a very high load for me. Depending on your load testing tool there are different options on how to simulate it, for Apache JMeter it will be Constant Throughput Timer, for LoadRunner there are Pages per minute / Hits Per Second goals, etc.

Performance testing - Jmeter results

I am using Jmeter (started using it a few days ago) as a tool to simulate a load of 30 threads using a csv data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set my thread group as:
Number of threads: 30
Ramp-up Perod: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean and what is classed as good/bad measurements, and what can be suggested to improve the results. Below is a table of the results collated in the Summary report of Jmeter.
I have conducted research only to find blogs/sites telling me the same info as what is defined on the jmeter.apache.org site. One blog (Nicolas Vahlas) that I came across gave me some very useful information,but still hasn't help me understand what to do next with my results.
Can anyone help me understand these results and what I could do next following the execution of this test plan? Or point me in the right direction of an informative blog/site that will help me understand what to do next.
Many thanks.
According to me, Deviation is high.
You know your application better than all of us.
you should focus on, avg response time you got and max response frequency and value are acceptable to you and your users? This applies to throughput also.
It shows average response time is below 0.5 seconds and maximum response time is also below 1 second which are generally acceptable but that should be defined by you (Is it acceptable by your users). If answer is yes, try with more load to check scaling.
In you requirement it is mentioned that you need have 30 concurrent users performing different actions. The response time of your requests is less and you have ramp-up of 30 seconds. Can you please check total active threads during the test. I believe the time for which there will be 30 concurrent users in system is pretty short so the average response time that you are seeing seems to be misleading. I would suggest you run a test for some more time so that there will be 30 concurrent users in the system and that would be correct reading as per your requirements.
You can use Aggregate report instead of summary report. In performance testing
Throughput - Requests/Second
Response Time - 90th Percentile and
Target application resource utilization (CPU, Processor Queue Length and Memory)
can be used for analysis. Normally SLA for websites is 3 seconds but this requirement changes from application to application.
Your test results are good, considering if the users are actually logging into system/portal.
Samples: This means the no. of requests sent on a particular module.
Average: Average Response Time, for 300 samples.
Min: Min Response Time, among 300 samples (fastest among 300 samples).
Max: Max Response Time, among 300 samples (slowest among 300 samples).
Standard Deviation: A measure of the variation (for 300 samples).
Error: failure %age
Throughput: No. of request processed per second.
Hope this will help.

Transaction per second - vuser relation

I want to setup loadtest with Loadrunner. System requirements are as below
1- max 30K users can be online i want to test if system can reach 15TPS.
2- i want to test if system can reach 2000TPS while some of online
users can visit 5 different pages. With how many vusers i should do this test ?
For both browsing and login operations response time is 0.1 or 0.2 seconds but think-time is ignored for login operation but 5 minutes for browsing operations. ( This value can be changed for sake of simplecity.) For login operation i setup vusers count to 30 and used 1000 iterations for reaching 15TPS.
i know that we can calculate vusers with below
number of required VUsers = required transaction per seconds * user
scenario length (sec)
but i m not sure how to apply this to second scenario.
Required TPS =15
users 5
Pacing =5/15
use this and it will work

Resources