In my test, I observed that the average 90th percentile and the average response time have the same value, say 28.
Can someone explain in which cases this might happen?
It might be the case that response time for all samplers is equal or similar.
Average Response Time is basically arithmetic mean, to wit sum of response times for all samplers divided by their count.
The 90th percentile is a statistical measurement; in the case of JMeter it means that 90% of the sampler response times were smaller than or equal to this time.
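As a rough illustration of the two definitions above, the sketch below computes the arithmetic mean and a nearest-rank 90th percentile for two made-up sets of response times. The helper `nearest_rank` and the sample data are hypothetical, and nearest-rank is only one common way to compute a percentile, so JMeter's exact value may differ slightly.

```python
import math
from statistics import mean

def nearest_rank(values, p):
    """Smallest value such that at least p% of the data is <= that value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Case 1: every sampler takes the same time, so mean and p90 coincide.
identical = [28] * 10

# Case 2: times differ, but one slow outlier pulls the mean up to the p90.
with_outlier = [20] * 8 + [28, 92]

for name, times in [("identical", identical), ("with_outlier", with_outlier)]:
    print(name, "mean =", mean(times), "p90 =", nearest_rank(times, 90))
```

The second case suggests that equal values do not require identical response times: a few slow samples can raise the mean until it happens to match the 90th percentile.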
More information:
JMeter Glossary
Request Statistics Report
Generating Report Dashboard
Calculating a percentile (95th, 99th) in my data set is expensive due to the large number of time series and time ranges ranging from weeks to months. The cost incurred is proportional to the number of samples fetched from the data store and the computational overhead of processing the calculations. I am attempting to optimize the solution by calculating the statistics for smaller time ranges in parts, in stream as data points are ingested -- then estimating metrics for the population from those samples. This approach works accurately for mean and peak (max), but I need a good approximation for percentiles.
population_mean = mean(sample_mean_t0, sample_mean_t1, ... ,sample_mean_tn)
population_max = max(sample_max_t0, sample_max_t1 , ... , sample_max_tn)
To calculate p95, I am calculating 95th percentile over 95th percentile of all samples. Is this a reasonable approximation for calculating 95th percentile? (we are not attempting to solve the problem when there is high degree of skewness). Is there a better approximation I can use for calculating percentiles?
population_p95 = p95(sample_p95_t0, sample_p95_t1, ... , sample_p95_tn)
Does taking the average over sample p95s make more sense? Any reference here to approximate this solution and estimate errors will be helpful.
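One way to get a feel for these options is to try them on synthetic data. The sketch below is only an experiment under stated assumptions: the windows are randomly generated (hypothetical, exponentially distributed values with unequal counts), `nearest_rank` is a helper written for this example, and the result says nothing about error bounds on real data. It also shows that the mean of window means matches the overall mean only when windows hold equal counts; weighting each window mean by its count removes that restriction.

```python
import math
import random

def nearest_rank(values, p):
    """Smallest value such that at least p% of the data is <= that value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic, hypothetical data: 20 windows of unequal size from one distribution.
rng = random.Random(0)
windows = [[rng.expovariate(1 / 100) for _ in range(rng.randint(50, 500))]
           for _ in range(20)]
all_points = [x for w in windows for x in w]

# Per-window summaries, as they might be computed at ingestion time.
summaries = [
    {"count": len(w), "mean": sum(w) / len(w), "max": max(w),
     "p95": nearest_rank(w, 95)}
    for w in windows
]

total = sum(s["count"] for s in summaries)
# Count-weighted mean and max of maxes reproduce the overall values
# (the mean up to floating-point rounding).
weighted_mean = sum(s["mean"] * s["count"] for s in summaries) / total
combined_max = max(s["max"] for s in summaries)
# Both percentile combinations below are heuristics with no general guarantee.
p95_of_p95s = nearest_rank([s["p95"] for s in summaries], 95)
mean_of_p95s = sum(s["p95"] for s in summaries) / len(summaries)
true_p95 = nearest_rank(all_points, 95)

print("true p95:    ", true_p95)
print("p95 of p95s: ", p95_of_p95s)
print("mean of p95s:", mean_of_p95s)
```

Running the same comparison on a slice of real data, where the raw points are still available, would give a more meaningful picture of the error each heuristic introduces.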
In the book Designing Data-Intensive Applications, there is this sentence:
For example, if the 95th percentile response time is 1.5 seconds, that means 95 out of 100 requests take less than 1.5 seconds, and 5 out of 100 requests take 1.5 seconds or more.
The confusing part is the statement that 95 of these requests will take less than 1.5 seconds. Shouldn't it be that 95 of the requests take 1.5 seconds or less, and the remaining 5 take more than 1.5 seconds? Or does the one percent at the 95th percentile take exactly 1.5 seconds, while the 94th percentile and below take less than 1.5, and the 96th percentile and above take more than 1.5? What is the correct reading of these numbers?
I have done some research on this and found several articles. The interesting part is that some say what I say and some don't.
Some of the links that read the percentile similar to 95 of the requests take 1.5 or less:
average 90th percentile response time and average response time
90% percentile is a statistical measurement, in case of JMeter it means that 90% of the sampler response times were smaller than or equal to this time
https://www.dynatrace.com/news/blog/why-averages-suck-and-percentiles-are-great/
so 90 percent of the requests are processed in 3.0 seconds or less
https://www.adfpm.com/adf-performance-monitor-monitoring-with-percentiles
If the 90th percentile of the same transaction is at 1000ms it means that 90% are as fast or faster and only 10% are slower.
Other links that read the percentile similar to 95 of the requests take less than 1.5:
https://www.elastic.co/blog/averages-can-dangerous-use-percentile
In contrast, the 99th percentile says “99% of your values are less than 850ms”, which is a very different picture.
I got the answer from this website and, according to them, both readings are valid. It just depends on how the percentile rank is calculated:
The word “percentile” is used informally in the above definition. In common use, the percentile usually indicates that a certain percentage falls below that percentile. For example, if you score in the 25th percentile, then 25% of test takers are below your score. The “25” is called the percentile rank. In statistics, it can get a little more complicated as there are actually three definitions of “percentile.” Here are the first two (see below for definition 3), based on an arbitrary “25th percentile”:
Definition 1: The nth percentile is the lowest score that is greater than a certain percentage (“n”) of the scores. In this example, our n is 25, so we’re looking for the lowest score that is greater than 25% of the scores.
Definition 2: The nth percentile is the smallest score that is greater than or equal to a certain percentage of the scores. To rephrase this, it’s the percentage of data that falls at or below a certain observation. This is the definition used in AP statistics. In this example, the 25th percentile is the score that’s greater or equal to 25% of the scores.
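The difference between the two definitions can be made concrete with a small sketch. The function names and the score list below are made up for illustration, and the code reflects one reading of the quoted definitions: "greater than" counts only strictly smaller scores, while "greater than or equal to" also counts ties.

```python
def percentile_strict(scores, p):
    """Definition 1: lowest score with at least p% of the scores strictly below it."""
    ordered = sorted(scores)
    n = len(ordered)
    for x in ordered:
        below = sum(1 for s in ordered if s < x)
        if below * 100 >= p * n:
            return x
    return None  # no score is greater than p% of the scores

def percentile_inclusive(scores, p):
    """Definition 2: lowest score with at least p% of the scores at or below it."""
    ordered = sorted(scores)
    n = len(ordered)
    for x in ordered:
        at_or_below = sum(1 for s in ordered if s <= x)
        if at_or_below * 100 >= p * n:
            return x
    return None

scores = [30, 33, 43, 53, 56, 67, 68, 72]  # hypothetical test scores
print(percentile_strict(scores, 25))     # 43
print(percentile_inclusive(scores, 25))  # 33
```

On this data the two definitions give different 25th percentiles, which is the same "less than" versus "less than or equal to" split seen in the quoted articles.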
Can we calculate the overall kth percentile for a time period if we only have the kth percentile of each 1-minute window within that period?
The underlying data is not available. Only the kth percentile and count of underlying data is available.
Are there any existing algorithms available for this?
How approximate will the calculated kth percentile be?
No. If you have only one percentile (and count) for every time period, then you cannot reasonably estimate that same percentile for the entire time period.
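A small counterexample may make this concrete. In the sketch below (hypothetical data, with a nearest-rank helper written for this example), two scenarios have identical per-minute 90th percentiles and counts, yet their overall 90th percentiles differ widely.

```python
import math

def nearest_rank(values, p):
    """Smallest value such that at least p% of the data is <= that value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Two scenarios, each with two one-minute windows of 10 points.
scenario_a = [[1] * 8 + [10, 11], [1] * 8 + [100, 101]]
scenario_b = [[1] * 8 + [10, 200], [1] * 8 + [100, 300]]

for name, minutes in [("A", scenario_a), ("B", scenario_b)]:
    per_minute = [(len(m), nearest_rank(m, 90)) for m in minutes]
    overall = nearest_rank([x for m in minutes for x in m], 90)
    print(name, "per-minute (count, p90):", per_minute, "overall p90:", overall)
# A: per-minute summaries (10, 10) and (10, 100), overall p90 = 11
# B: per-minute summaries (10, 10) and (10, 100), overall p90 = 100
```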
This is because percentiles are only semi-numerical measures (like medians) and don't implicitly tell you enough about the distributions above and below their measured values at each measurement time. There are a few exceptions to the above.
1. If the percentile that you have is the 50th percentile (i.e., the median), then you can do some extrapolation to the median of the whole time period, but it's a bit sketchy and I'm not sure how bad the variance would be.
2. If all of your percentile measures are very close together (compared to the actual range of the measured population), then obviously you can use that as a reasonable estimate of the overall percentile.
3. If you can assume with high assurance that every minute's data is an independent sampling of the exact same population distribution (i.e., there is no time-dependence), then you may be able to combine them, possibly even if the exact distribution is not fully known (it has parameters that are unknown, but still known to be fixed over the time period). Again, I am not sure what the valid functions and variance calculations are for this.
4. If the distribution is known (or can be assumed) to be a specific function or shape with some unknown value or values, and time-dependence has a known role in that function, then you should be able to use weighting and time adjustments to transform it into the same situation as #3 above. So for instance, if the distribution were a time-varying exponential distribution of the form pdf(x; k, t) = (k*t) e^(-(k*t) x), then I believe that you could derive an overall percentile estimate by estimating the value of k, adjusting it for each different minute (t).
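For the exponential example in the last case, one possible approach can be sketched as follows. This is not a vetted statistical method, only an illustration under strong assumptions: each minute's data really is exponential with rate k*t, the value of t for each minute is known, and the reported percentile is close to the true quantile. For an exponential distribution with rate r, the p-quantile is -ln(1 - p) / r, so each minute yields an estimate of k. The function names and the bisection-based mixture quantile are choices made for this sketch.

```python
import math

def estimate_k(p, minute_stats):
    """minute_stats: list of (t, count, q), where q is the p-quantile seen in minute t.

    Each minute gives k_t = -ln(1 - p) / (q * t); the estimates are
    averaged, weighted by the number of points in that minute.
    """
    c = -math.log(1 - p)
    total = sum(n for _, n, _ in minute_stats)
    return sum(n * c / (q * t) for t, n, q in minute_stats) / total

def overall_quantile(p, k, minute_stats, iterations=100):
    """p-quantile of the count-weighted mixture of the per-minute exponentials."""
    total = sum(n for _, n, _ in minute_stats)

    def cdf(x):
        return sum(n * (1 - math.exp(-k * t * x)) for t, n, _ in minute_stats) / total

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:          # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(iterations):  # plain bisection on the mixture CDF
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical input: (t, count, observed p95) for three minutes, generated with k = 2.
p = 0.95
stats = [(t, 100 * t, -math.log(1 - p) / (2 * t)) for t in (1, 2, 3)]
k_hat = estimate_k(p, stats)
print("estimated k:", k_hat, "overall p95:", overall_quantile(p, k_hat, stats))
```

With real, noisy per-minute percentiles the estimate of k will vary from minute to minute, and the quality of the result depends entirely on how well the assumed distribution fits.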
Unfortunately I am not a professional statistician. I have Math/CS background, enough to have some idea of what's mathematically possible/reasonable, but not enough to tell exactly how to do it. If you think that your situation falls into one of the above categories, then you might be able to take it to https://stats.stackexchange.com but you will need to also provide the information I mentioned in those categories and/or detailed and specific information about what you are measuring and how you are measuring it.
Based on statistical intuition, the error will be proportional to the standard deviation of the total set if you are building an approximation for a longer time span out of the discrete per-chunk kth percentiles. (This claim may need further clarification or proof.)
This may be more of a statistics question, and I'd like to find a solution with Excel. I'd rather use simple VBA if any coding is necessary.
Is there a way to estimate the percentile of a specific data point in a skewed distribution? I don't need exact percentiles and only need a reasonable estimate. I work on analyses that rely on weighted average benchmarks reported by multiple sources. All of my sources report the 25th, 50th, 75th, and 90th percentiles as well as the mean and standard deviation. We use these benchmarks to set a target range, and our goal is for our results from a specific analysis to land somewhere within the published percentiles. I'm often asked to indicate what percentile our specific result is at, and all I can provide is broad ranges like 25th-50th, etc. So, I'm then asked to use simple extrapolation to determine the specific percentile of the specific result, and I know that using this method is inaccurate.
Mean and median differ in 99% of cases in my data set, but % difference between mean and median on average is only 6%. Only about 10% of cases have mean and median with greater than 10% difference.
For the 90% of cases with relatively low % difference between mean and median, can I assume the normal distribution?
For cases with higher % difference between mean and median, can I make an assumption that will help me estimate more accurately? I could for these cases just use the normal distribution and send my percentile estimate along with a note indicating that the estimate is likely off in one direction or another, but I'd rather give a better estimate.
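The two estimation approaches discussed here can be compared side by side. The sketch below is in Python rather than Excel or VBA, and the benchmark figures are invented. It shows linear interpolation between the published percentiles next to an estimate that assumes a normal distribution built from the published mean and standard deviation. Neither is presented as the right answer for skewed data; the gap between the two numbers is only a rough indication of how much the normality assumption matters for a given case.

```python
from statistics import NormalDist

def percentile_by_interpolation(x, benchmarks):
    """Linear interpolation between published (percentile, value) pairs.

    benchmarks must be sorted by percentile; returns None outside their range.
    """
    for (p_lo, v_lo), (p_hi, v_hi) in zip(benchmarks, benchmarks[1:]):
        if v_lo <= x <= v_hi:
            if v_hi == v_lo:
                return float(p_lo)
            return p_lo + (p_hi - p_lo) * (x - v_lo) / (v_hi - v_lo)
    return None

def percentile_by_normal(x, mean, sd):
    """Percentile of x if the data were normal with the published mean and sd."""
    return 100 * NormalDist(mean, sd).cdf(x)

# Hypothetical benchmark for one case: 25th, 50th, 75th, and 90th percentiles.
benchmarks = [(25, 40.0), (50, 50.0), (75, 64.0), (90, 80.0)]
published_mean, published_sd = 53.0, 18.0  # also hypothetical

target = 57.0
print("interpolated:", percentile_by_interpolation(target, benchmarks))
print("normal-based:", percentile_by_normal(target, published_mean, published_sd))
```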
Responding to cybernetic.nomad:
First, thanks for commenting! Second, it doesn't seem to work. I think I don't have enough data. The attached image shows an example. The first 5 rows show one set of my weighted average benchmarks for a single case. Below that, I added two lines--one with my "target" amount. This could be any number but, to test out the formula you suggested, I entered my 50th percentile weighted average. The row below that has the results of the formula =percentrank.exc(25th:90th,target). The result should be 0.5 but it's not, so I don't think this works.
I'm calculating the NPS (Net Promoter Scores) for about 50 different sessions at a recent event. Each session was attended by about 50-500 people, and the number of survey responses for each session ranges between 15-400.
If I know:
The number of respondents for each session (sample size)
The number of attendees for each session (population size)
The NPS score for each session (average rating, basically—more info below)
How can I figure out the margin of error and/or confidence intervals for each session in Excel?
What formula would I use where, for example, X = sample size, Y = population size, and Z = avg rating?
I don't need this to be incredibly correct as long as I'm in the ballpark—so you can ignore the NPS part which might throw things off slightly:
This is slightly complicated by the fact that NPS is a weird metric.
It asks "How likely would you be to recommend X to a friend or
colleague?" with a scale from 0-10 (10 = extremely likely, 0 = not at
all likely). You then count every 10 and 9 as a "promoter," count
every 8 and 7 as a "neutral" or "passive," and count everything
between 6 and 0 as a "detractor."
You then get the NPS by subtracting the detractors from the promoters, dividing that number by the total responses, then multiplying it by 100, so: ((Promoters - Detractors)/(Total responses))*100. NPS sort of flattens every response to a +1, 0, or -1, so it might complicate the calculations.
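The calculation described above can be written as a short function. This is a minimal sketch; the function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """NPS from 0-10 ratings: ((promoters - detractors) / total) * 100."""
    promoters = sum(1 for r in ratings if r >= 9)    # 9 and 10
    detractors = sum(1 for r in ratings if r <= 6)   # 0 through 6
    return (promoters - detractors) / len(ratings) * 100

ratings = [10, 9, 9, 8, 7, 6, 10, 2]  # hypothetical survey responses
print(net_promoter_score(ratings))     # 4 promoters, 2 detractors -> 25.0
```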
Assume I've already calculated the NPS for each session. I'm trying to figure out the margin of errors and/or confidence intervals for each session using Excel.
So, for example, my data would look like this:
Again, you can ignore the NPS stuff if it makes it easier and just assume it's an average rating where people were asked to rate each session from -100 to +100. What function(s) would I use in Excel to find the margin of error and/or confidence intervals for each session, given the sample size and target population size, and the average rating?
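A ballpark margin of error can be sketched from just these three inputs, with some caveats. The sketch below is in Python rather than Excel, and it relies on assumptions not stated in the question: responses are treated as a simple random sample of attendees, each response is scored +1, 0, or -1, and the normal approximation is used (which is rough for sessions with only 15 or so responses). Because the shares of promoters and detractors are not given, the variance is replaced by its largest possible value for a given NPS, 1 - (NPS/100)^2, so the result is a conservative upper bound. The finite population correction sqrt((N - n) / (N - 1)) accounts for the sample being a large fraction of the attendees.

```python
import math

def nps_margin_of_error(nps, sample_size, population_size, z=1.96):
    """Conservative margin of error for an NPS, in NPS points.

    z=1.96 corresponds to roughly 95% confidence under a normal approximation.
    """
    m = nps / 100                       # NPS as a mean of +1/0/-1 scores
    variance_bound = 1 - m * m          # worst case: no passives
    standard_error = math.sqrt(variance_bound / sample_size)
    if population_size > 1:
        fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    else:
        fpc = 0.0
    return 100 * z * standard_error * fpc

# Hypothetical session: NPS of 40 from 120 responses out of 300 attendees.
moe = nps_margin_of_error(40, 120, 300)
print(f"NPS 40, margin of error about +/- {moe:.1f} points")
```

The same arithmetic can be typed into a spreadsheet cell by cell; if the promoter and detractor counts are available, using the actual variance instead of the bound would give a tighter interval.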