Calculating the throughput (requests/sec) and plotting it in Excel

I'm using the JMeter client to test the throughput of a certain workload (PHP + MySQL, one page) on a certain server. Basically I'm doing a "capacity test" with an increasing number of threads over time.
I installed the "Statistical Aggregate Report" JMeter plugin and this was the result (ignore the "Response time" line):
At the same time I used the "Simple Data Writer" listener to write a log file ("JMeter.csv"). Then I tried to "manually" calculate the throughput for every second of the test.
Each line of "JMeter.csv" has this format:
timestamp      elapsedtime  responsecode  success  bytes
1385731020607  42           200           true     325
...            ...          ...           ...      ...
The timestamp refers to the time when the request is made by the client, not when the request is served by the server. So I simply did: totaltime = timestamp + elapsedtime.
In the next step I converted the totaltime to a time format, like 13:17:01.
I have more than 14K samples and with Excel I was able to do this quickly.
Then I counted how many samples there were for each second. Example:
totaltime  samples (requests served/second)
13:17:01   204
13:17:02   297
...        ...
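To make the procedure concrete, here is a minimal Python sketch of the same per-second counting (not what the plugin does, just the manual calculation described above). It assumes a comma-delimited JMeter.csv with the columns shown above and timestamps/elapsed times in milliseconds; adjust the delimiter and column indexes to match your actual log.
import csv
from collections import Counter
from datetime import datetime

served_per_second = Counter()

with open("JMeter.csv", newline="") as f:
    reader = csv.reader(f)                     # JMeter's default delimiter is a comma; adjust if needed
    for row in reader:
        if not row or not row[0].isdigit():    # skip a header line or blank lines
            continue
        timestamp_ms = int(row[0])             # when the request was sent
        elapsed_ms = int(row[1])               # how long it took
        total_ms = timestamp_ms + elapsed_ms   # when the response came back
        second = datetime.fromtimestamp(total_ms / 1000).strftime("%H:%M:%S")
        served_per_second[second] += 1

for second, count in sorted(served_per_second.items()):
    print(second, count)                       # e.g. 13:17:01 204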
When I tried to plot the results I obtained the following graphic:
As you can see, it is very different from the first graphic.
Given that the first graphic is correct, what is the mistake in my formula/procedure for calculating the throughput?

It turns out that this plugin plots something I can't identify. I tried many times and my calculations were actually correct. Be careful with this plugin (or check its source code).

Throughput can be viewed in the JMeter Summary Report, and you can calculate it yourself by saving your test results to an XML file from the Summary Report.
Throughput = (Number of samples / (max(ts + t) - min(ts))) * 1000
where ts is each sample's start timestamp and t its elapsed time, both in milliseconds; i.e. the number of samples divided by the span from the earliest request start to the latest response end, multiplied by 1000 to convert to requests per second.
With this formula you can calculate the throughput for each HTTP request in the Summary Report.
Example (2 samples):
Latest response end time = max(ts + t) = 1485538701633 + 569 = 1485538702202
Earliest request start time = min(ts) = 1485538143112
Throughput = (2 / (1485538702202 - 1485538143112)) * 1000
Throughput = (2 / 559090) * 1000
Throughput ≈ 0.0036/sec
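The same formula is straightforward to reproduce by hand. A minimal Python sketch using the two timestamps from the example above (nothing else is assumed):
# Throughput = (Number of samples / (max(ts + t) - min(ts))) * 1000
latest_end = 1485538701633 + 569       # max(ts + t): latest response end time (ms)
earliest_start = 1485538143112         # min(ts): earliest request start time (ms)
num_samples = 2

throughput = num_samples / (latest_end - earliest_start) * 1000   # requests per second
print(round(throughput, 4))            # ~0.0036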
You can read more, with examples, at http://www.wikishown.com/how-to-calculate-throughput-in-jmeter/; it gave me a good idea of how throughput is calculated.

Related

Prometheus recording rule to keep the max of (rate of counter)

I am facing a dilemma.
For performance reasons, I'm creating recording rules for my Nginx request/second metrics.
Original Query
sum(rate(nginx_http_requests_total[5m]))
Recording Rule
rules:
  - expr: sum(rate(nginx_http_requests_total[5m])) by (cache_status, host, env, status)
    record: job:nginx_http_requests_total:rate:sum:5m
In the original query I can see that my max traffic is 6.6k, but in the recording rule it's 6.2k. That's a 400 TPS difference.
This is the metric for the last week.
Question:
Is there any way to take the max of the original query and save it as a recording rule? Since it's TPS, I only care about the max, not the min.
I think having a 6% difference in value on some very short burst is pretty OK.
In your query you are getting (and recording) an average TPS over the last 5 minutes. There is no "max" being performed there.
The value will change depending on the exact time of the query evaluation, which is possibly why you see a difference between the raw query and the values stored by the recording rule.
Prometheus also extrapolates data when executing functions like rate(). If the last data point is at time t but the query runs at t+30s, Prometheus will try to extrapolate the value at t+30s (this is often noticed when a counter for discrete events shows fractional values).
You may want to use the irate() function if you are really after peak values. At each evaluation it uses the two most recent points to calculate the most current increase, as opposed to the X-minute average that rate() provides.
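To make the rate()-vs-irate() distinction concrete, here is a rough Python illustration using made-up counter samples; it mimics the two calculations in spirit and deliberately ignores Prometheus's exact extrapolation logic:
# Synthetic counter samples: (seconds, cumulative request count), scraped every 15s.
samples = [(t, 10 * t) for t in range(0, 290, 15)]       # steady ~10 req/s
samples.append((300, samples[-1][1] + 100 * 15))         # short burst: ~100 req/s in the last 15s

# rate()-style: increase over the whole 5-minute window divided by its span
(first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
avg_rate = (last_v - first_v) / (last_t - first_t)

# irate()-style: increase between the two most recent points only
(prev_t, prev_v), (curr_t, curr_v) = samples[-2], samples[-1]
inst_rate = (curr_v - prev_v) / (curr_t - prev_t)

print(f"rate-style average : {avg_rate:.1f} req/s")   # the burst is diluted
print(f"irate-style peak   : {inst_rate:.1f} req/s")  # the burst shows up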

HLS protocol: get absolute elapsed time during a live streaming

I have a very basic question, and I can't tell whether I searched for the wrong thing or the answer is so simple that I haven't seen it.
I'm implementing a web app using hls.js as the JavaScript library, and I need a way to get the absolute elapsed time of a live stream, e.g. if a user joins the live stream after 10 minutes, I need a way to detect that the user's 1st second is the 601st second of the stream.
Inspecting the stream fragments I found some information like startPTS and endPTS, but this information was always relative to the retrieved chunks rather than to the whole stream's chunks, e.g. if a user joins the live stream after 10 minutes and the chunk duration is 2 seconds, the first chunk I get will have startPTS = 0 and endPTS = 2, the second chunk will have startPTS = 2 and endPTS = 4, and so on (rounding the values to the nearest integer).
Is there a way to extract the absolute elapsed time I need from an HLS live stream?
I'm having the exact same need on iOS (AVPlayer) and came up with the following solution:
Read the m3u8 manifest; for me it looks like this:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:410
#EXT-X-TARGETDURATION:8
#EXTINF:8.333,
410.ts
#EXTINF:8.333,
411.ts
#EXTINF:8.334,
412.ts
#EXTINF:8.333,
413.ts
#EXTINF:8.333,
414.ts
#EXTINF:8.334,
415.ts
Observe that the first 409 segments are no longer part of the manifest.
Multiply EXT-X-MEDIA-SEQUENCE by EXT-X-TARGETDURATION and you have an approximation of the clock time for the first available segment.
Note also that each segment is not exactly 8 s long, so by using the target duration I'm accumulating an error of about 333 ms per segment:
410 * 8 = 3280 seconds = 54.6666 minutes
In this case my segments are always 8.333 or 8.334 seconds, so by multiplying by the EXTINF value instead, I get:
410 * 8.333 = 3416.53 seconds = 56.9421 minutes
Those roughly 56.94 minutes are still an approximation (since we don't know exactly how many times the remaining 0.001 s error accumulated), but it's much, much closer to the real clock time.
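A rough Python sketch of the same estimate, using a shortened copy of the manifest above; it assumes the stream's segment numbering started at 0 and that the earlier, no-longer-listed segments had roughly the same duration as the listed ones:
import re

manifest = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:410
#EXT-X-TARGETDURATION:8
#EXTINF:8.333,
410.ts
#EXTINF:8.333,
411.ts
#EXTINF:8.334,
412.ts
"""

media_sequence = int(re.search(r"#EXT-X-MEDIA-SEQUENCE:(\d+)", manifest).group(1))
target_duration = int(re.search(r"#EXT-X-TARGETDURATION:(\d+)", manifest).group(1))
extinf_durations = [float(d) for d in re.findall(r"#EXTINF:([\d.]+)", manifest)]

# Coarse estimate: media sequence * target duration (accumulates ~333 ms of error per segment here)
coarse_offset = media_sequence * target_duration

# Better estimate: media sequence * average real segment duration from the EXTINF tags
avg_segment = sum(extinf_durations) / len(extinf_durations)
better_offset = media_sequence * avg_segment

print(f"coarse estimate : {coarse_offset / 60:.2f} minutes")   # ~54.67 min
print(f"better estimate : {better_offset / 60:.2f} minutes")   # ~56.94 min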

Am I applying Little's Law correctly to model a workload for a website?

Using these metrics (shown below), I was able to utilize a workload modeling formula (Little's Law) to come up with what I believe are the correct settings to sufficiently load test the application in question.
From Google Analytics:
Users: 2,159
Pageviews: 4,856
Avg. Session Duration: 0:02:44
Pages / Session: 2.21
Sessions: 2,199
The formula is N = Throughput * (Response Time + Think Time)
We calculated Throughput as 1.35 (4,856 pageviews / 3600 seconds in an hour).
We calculated (Response Time + Think Time) as 74.21 (164 seconds avg. session duration / 2.21 pages per session)
Using the formula, we calculate N as 100 (1.35 Throughput * 74.21 (Response Time + Think Time)).
Therefore, according to my calculations, we can simulate the load the server experienced on the peak day during the peak hour with 100 users going through the business processes at a pace of 75 seconds between iterations (think time ignored).
So, in order to determine how the system responds under a heavier than normal load, we can double (200 users) or triple (300 users) the value of N and record the average response time for each transaction.
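For what it's worth, the arithmetic above can be written out as a short Python sketch; the numbers are taken straight from the question, nothing else is assumed:
# Little's Law: N = Throughput * (Response Time + Think Time)
pageviews = 4856                 # pageviews in the peak hour
avg_session_duration = 164       # seconds (0:02:44)
pages_per_session = 2.21

throughput = pageviews / 3600                               # ~1.35 pages per second
time_per_page = avg_session_duration / pages_per_session    # ~74.21 s between page requests

n = throughput * time_per_page                              # ~100 concurrent users
print(round(throughput, 2), round(time_per_page, 2), round(n))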
Is this all correct?
When you do a direct observation of the logs for the site, blocked by session duration, what is the maximum number of IP addresses counted in each block?
Little's Law tends to undercount sessions and their overhead in favor of transactional throughput. That's OK if you have instantaneous recovery of your session resources, but most sites hold onto them for a period longer than 110% of the longest inter-request window for a user (the period from one request to the next).
The formula below has always worked well for me if you are looking to calculate pacing:
"Pacing = No. of Users * Duration of Test (in seconds) / Transactions you want to achieve in said Test Duration"
You should be able to get closer to the number of transactions you want using this formula. If it's an API, it is almost always accurate.
For example, say you want to achieve 1000 transactions using 5 users over a one-hour test duration:
Pacing = 5 * 3600/1000
= 18 seconds

Performance testing - Jmeter results

I am using Jmeter (started using it a few days ago) as a tool to simulate a load of 30 threads using a csv data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set my thread group as:
Number of threads: 30
Ramp-up Perod: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean and what is classed as good/bad measurements, and what can be suggested to improve the results. Below is a table of the results collated in the Summary report of Jmeter.
I have done some research, only to find blogs/sites telling me the same info as what is defined on the jmeter.apache.org site. One blog (Nicolas Vahlas) that I came across gave me some very useful information, but it still hasn't helped me understand what to do next with my results.
Can anyone help me understand these results and what I could do next following the execution of this test plan? Or point me in the right direction of an informative blog/site that will help me understand what to do next.
Many thanks.
In my opinion, the deviation is high.
You know your application better than any of us.
You should focus on whether the average response time you got, and the frequency and value of the maximum response time, are acceptable to you and your users. The same applies to throughput.
It shows the average response time is below 0.5 seconds and the maximum response time is also below 1 second, which is generally acceptable, but that should be defined by you (is it acceptable to your users?). If the answer is yes, try with more load to check scaling.
Your requirement mentions that you need to have 30 concurrent users performing different actions. The response time of your requests is low and you have a ramp-up of 30 seconds. Can you please check the total number of active threads during the test? I believe the time for which there are actually 30 concurrent users in the system is pretty short, so the average response time you are seeing seems misleading. I would suggest you run the test for longer, so that there are 30 concurrent users in the system for a sustained period; that would give a correct reading per your requirements.
You can use the Aggregate Report instead of the Summary Report. In performance testing,
Throughput - Requests/Second
Response Time - 90th Percentile and
Target application resource utilization (CPU, Processor Queue Length and Memory)
can be used for analysis. Normally the SLA for websites is 3 seconds, but this requirement changes from application to application.
Your test results look good, assuming the users are actually logging into the system/portal.
Samples: This means the no. of requests sent on a particular module.
Average: Average Response Time, for 300 samples.
Min: Min Response Time, among 300 samples (fastest among 300 samples).
Max: Max Response Time, among 300 samples (slowest among 300 samples).
Standard Deviation: A measure of the variation (for 300 samples).
Error: failure percentage.
Throughput: number of requests processed per second.
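If it helps to see how those columns are derived, here is a small Python sketch that computes the same statistics from a list of raw sample results; the sample data and the 30-second test duration are made up for illustration (JMeter actually measures throughput over the span from the first to the last sample):
import statistics

# (elapsed_ms, success) per sample; made-up data for illustration
results = [(420, True), (380, True), (910, False), (450, True), (505, True)]

elapsed = [e for e, _ in results]
test_duration_s = 30  # wall-clock duration of the test in seconds (illustrative)

print("Samples           :", len(results))
print("Average (ms)      :", statistics.mean(elapsed))
print("Min (ms)          :", min(elapsed))
print("Max (ms)          :", max(elapsed))
print("Std. Deviation    :", round(statistics.pstdev(elapsed), 1))
print("Error %           :", 100 * sum(1 for _, ok in results if not ok) / len(results))
print("Throughput (req/s):", len(results) / test_duration_s)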
Hope this will help.

Tracking metrics using StatsD (via etsy) and Graphite, graphite graph doesn't seem to be graphing all the data

We have a metric that we increment every time a user performs a certain action on our website, but the graphs don't seem to be accurate.
Going off this hunch, we inspected carbon's updates.log and discovered that the action had happened over 4 thousand times today (using grep and wc), but according to the integral of the graph it returned only around 220.
What could be the cause of this? Data is being reported to statsd using the statsd PHP library, calling statsd::increment('metric'); and as stated above, the log confirms that 4,000+ updates to this key happened today.
We are using:
graphite 0.9.6 with statsD (etsy)
After some research through the documentation, and some conversations with others, I've found the problem - and the solution.
The whisper file format is designed with the expectation that you (or your application) will publish updates no more often than the minimum interval in your storage-schemas.conf file. This file is used to configure how much data retention you have at different time-interval resolutions.
My storage-schemas.conf file was set with a minimum retention interval of 1 minute. The default StatsD daemon (from etsy) is designed to flush to carbon (the graphite daemon) every 10 seconds. The reason this is a problem: over a 60-second period StatsD reports 6 times, and each write overwrites the previous one (within that 60-second interval), because you're updating faster than once per minute. This produces really weird results on your graph, because the last 10 seconds of a minute could be completely dead and report a 0 for the activity during that period, which wipes out all of the data you had written for that minute.
To fix this, I had to re-configure my storage-schemas.conf file to store data at a maximum resolution of 10 seconds, so every update from StatsD would be saved in the whisper database without being overwritten.
Etsy published the storage-schemas.conf configuration that they were using for their installation of carbon, which looks like this:
[stats]
priority = 110
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974
This has a 10 second minimum retention time, and stores 6 hours worth of them. However, due to my next problem, I extended the retention periods significantly.
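As a quick sanity check of what a retentions line like that covers, here is a small Python sketch that converts each seconds-per-point:points pair (the older retention syntax used in this example) into a human-readable span:
retentions = "10:2160,60:10080,600:262974"

for entry in retentions.split(","):
    precision, points = (int(x) for x in entry.split(":"))
    total_seconds = precision * points
    print(f"{precision:>4}s precision, {points:>7} points "
          f"-> {total_seconds / 3600:.1f} hours ({total_seconds / 86400:.1f} days)")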
As I let this data collect for a few days, I noticed that it still looked off (and was under reporting). This was due to 2 problems.
StatsD (older versions) only reported an average number of events per second for each 10 second reporting period. This means, if you incremented a key 100 times in 1 second and 0 times for the next 9 seconds, at the end of the 10th second statsD would report 10 to graphite, instead of 100 (100/10 = 10). This failed to report the total number of events for a 10 second period (obviously). Newer versions of statsD fix this problem, as they introduced the stats_counts bucket, which logs the total number of events per metric for each 10 second period (so instead of reporting 10 in the previous example, it reports 100). After I upgraded StatsD, I noticed that the last 6 hours of data looked great, but as I looked beyond the last 6 hours, things looked weird, and the next reason is why:
As graphite stores data, it moves data from high precision retention to lower precision retention. This means, using the etsy storage-schemas.conf example, after 6 hours of 10 second precision, data is moved to 60 second (1 minute) precision. In order to move 6 data points from 10 s to 60 s precision, graphite takes the average of those 6 data points: it takes the total value of the oldest 6 data points and divides it by 6. This gives an average number of events per 10 seconds for that 60 second period (and not the total number of events, which is what we care about specifically). This is just how graphite is designed, and for some cases it might be useful, but in our case it's not what we wanted. To "fix" this problem, I increased our 10 second precision retention time to 60 days. Beyond 60 days, I store the minutely and 10-minutely precisions, but they're essentially there for no reason, as that data isn't as useful to us.
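Here is a tiny illustration, with made-up numbers, of why the default averaging loses the totals when 10-second points are rolled up to 1-minute precision:
ten_second_counts = [100, 0, 0, 25, 0, 0]   # six 10-second totals from statsd: 125 events this minute

averaged = sum(ten_second_counts) / len(ten_second_counts)   # graphite's default roll-up: ~20.8 "per 10s"
summed = sum(ten_second_counts)                              # what a counter metric actually needs: 125

print(averaged, summed)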
I hope this helps someone, I know it annoyed me for a few days - and I know there isn't a huge community of people that are using this stack of software for this purpose, so it took a bit of research to really figure out what was going on and how to get a result that I wanted.
After posting my comment above I found Graphite 0.9.9 has a (new?) configuration file, storage-aggregation.conf, in which one can control the aggregation method per pattern. The available options are average, sum, min, max, and last.
http://readthedocs.org/docs/graphite/en/latest/config-carbon.html#storage-aggregation-conf
