What factors influence the "Avoids Enormous Network Payloads" message? - pagespeed-insights

According to the official documentation, the target is 1600KB:
https://developers.google.com/web/tools/lighthouse/audits/network-payloads
However, even when the payload is larger than the target, it sometimes still shows as a passed audit:
payload of 2405KB still passes
What are the other conditions that allow large payloads to pass the audit?

Lighthouse scores each audit from 0–100 by mapping the raw value onto a log-normal distribution.
1600KB is not a pass/fail threshold; it is approximately the largest payload that still earns the maximum score of 100.
As of right now the curve is controlled by two values, a 2500KB point of diminishing returns and a 4000KB median, which correspond to scores of about 93 and 50 respectively.
That puts a 2405KB result at a score of ~94, which is sufficient to pass.
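The curve can be sketched with the Python standard library alone. This is not Lighthouse's exact implementation (Lighthouse derives the curve's spread differently); here the spread is simply fitted so that the point of diminishing returns scores 0.93, as described above:

```python
from math import log
from statistics import NormalDist

def lighthouse_score(value_kb, median=4000, podr=2500, podr_score=0.93):
    """Log-normal scoring curve through (median, 0.50) and (podr, podr_score).

    median/podr are the control points quoted above; podr_score is an
    assumption used to fix the curve's spread.
    """
    nd = NormalDist()
    # Choose sigma so the curve passes through (podr, podr_score).
    sigma = log(median / podr) / nd.inv_cdf(podr_score)
    # The score is the log-normal tail mass beyond this payload size.
    return nd.cdf(log(median / value_kb) / sigma)

print(round(lighthouse_score(2405) * 100))  # ~94: large, but still passes
print(round(lighthouse_score(4000) * 100))  # 50 at the median, by construction
```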

Related

How to aggregate data by period in a rrdtool graph

I have an rrd file with average ping times to a server (GAUGE) every minute, and when the server is offline (which is very frequent, for reasons that don't matter now) it stores a NaN/unknown.
I'd like to create a graph with the percentage of each hour that the server is offline, which I think can be achieved by counting every NaN within 60 samples and then dividing by 60.
For now I get to the point where I define a variable that is 1 when the server is offline and 0 otherwise, but I already read the docs and don't know how to aggregate this:
DEF:avg=server.rrd:rtt:AVERAGE CDEF:offline=avg,UN,1,0,IF
Is it possible to do this when creating a graph? Or I will have to store that info in another rrd?
I don't think you can do exactly what you want, but you have a couple of options.
You can define a sliding-window average that shows the percentage of the previous hour that was unknown, and graph that using TREND.
DEF:avg=server.rrd:rtt:AVERAGE:step=60
CDEF:offline=avg,UN,100,0,IF
CDEF:pctoffline=offline,3600,TREND
LINE:pctoffline#ff0000:Offline %
This defines avg as the 1-min time series of ping data. Note we use step=60 to ensure we get the best resolution of data even in a smaller graph. Then we define offline as 100 when the sample is unknown (the server is down) and 0 when it is known. Then, pctoffline is a 1-hour sliding-window average of this, which will in effect be the percentage of time during the previous hour that the server was offline.
However, there's a problem in that RRDTool will silently summarise the source data before you get your hands on it if there are many data points per pixel in the graph (this won't happen when doing a fetch, of course). To get around that, you'd need to have the offline calculation done at store time -- i.e., have a COMPUTE-type DS that is 100 or 0 depending on whether the rtt DS is unknown. Then any averaging will preserve the data (normal averaging omits the unknowns, or the xff setting makes the whole CDP unknown).
rrdtool create ...
DS:rtt:GAUGE:120:0:9999
DS:offline:COMPUTE:rtt,UN,100,0,IF
rrdtool graph ...
DEF:offline=server.rrd:offline:AVERAGE:step=3600
LINE:offline#ff0000:Availability
If you are able to modify your RRD, and do not need historical data, then use of a COMPUTE in this way will allow you to display your data in a 1-hour stepped graph as you wanted.

The 95% of non-normally distributed points around a mean/median

I asked users to tap a location repeatedly. To calculate the size of a target in that location, such that 95% of users will hit that target successfully, I usually measure 2 std of the tap offsets from the centroid. That works if the tap offsets are normally distributed, but my data now is not distributed normally. How can I figure out the equivalent of a 2 std around the mean/median?
If you're only measuring in one dimension, the region within +/-2 std of the mean of a Normal distribution covers about 95% of it (95.4%, strictly). Perhaps it's worth working with quantiles instead: take the interval between the 2.5th and 97.5th percentiles of your sample. This is robust to skew or any other departure from normality.
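A quick standard-library sketch of the difference on skewed data; the exponential "tap offset" sample here is purely illustrative:

```python
import random
from statistics import mean, stdev, quantiles

random.seed(0)
# Skewed (exponential) "tap offsets": decidedly not normal.
offsets = [random.expovariate(1.0) for _ in range(10_000)]

# Normal-theory interval: mean +/- 2 std. Misleading here --
# the lower bound goes negative, which is impossible for this data.
m, s = mean(offsets), stdev(offsets)
normal_lo, normal_hi = m - 2 * s, m + 2 * s

# Distribution-free interval: 2.5th and 97.5th percentiles.
cuts = quantiles(offsets, n=40)  # 39 cut points at 2.5% steps
pct_lo, pct_hi = cuts[0], cuts[-1]

print(f"mean +/- 2 std : [{normal_lo:.3f}, {normal_hi:.3f}]")
print(f"percentiles    : [{pct_lo:.3f}, {pct_hi:.3f}]")
```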

How to calculate waiting time for a customer based on previous customers' waiting times

Can anyone suggest a method to calculate customer waiting time for a restaurant based on previous waiting times? My system stores the waiting time of each customer, and based on these values I want to predict the waiting time for the next customer.
You can't predict an exact figure.
But a simple statistical approach would be:
average( waiting_time ) + ( 2 * standard_deviation( waiting_time ) )
That is, take the average and add two standard deviations.
Assuming that wait time is normally distributed, roughly 95% of your customers will wait no longer than the result of the above equation (a one-sided two-sigma bound actually covers about 97.7%, so this is slightly conservative).
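A minimal sketch of that rule; the historical waits are made-up illustrative numbers:

```python
from statistics import mean, stdev

# Illustrative historical waiting times, in minutes.
waits = [12, 8, 15, 9, 22, 14, 11, 18, 10, 16]

# Average plus two (sample) standard deviations.
estimate = mean(waits) + 2 * stdev(waits)
print(f"Quote new customers up to {estimate:.1f} minutes")  # ~22.3 minutes
```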
A Poisson process is a stochastic process which counts the number of events, and the times at which these events occur, in a given time interval. The time between each pair of consecutive events -- for example, customer waiting time -- has an Exponential distribution. From wiki:
The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous Poisson process.
Prediction
Using maximum likelihood estimation, you can use the inverse sample mean to get the rate parameter of exponential distribution.
Confidence Interval
From wiki:
A simple and rapid method to calculate an approximate confidence interval for the estimation of λ is based on the application of the central limit theorem. This method provides a good approximation of the confidence interval limits, for samples containing at least 15–20 elements. Denoting by N the sample size, the upper and lower limits of the 95% confidence interval are given by λ_lower = λ̂ · (1 − 1.96/√N) and λ_upper = λ̂ · (1 + 1.96/√N), where λ̂ is the estimated rate.
For more details, see Poisson process and Exponential distribution.
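Putting the MLE and the CLT interval together in a short sketch (the historical waits here are simulated from an exponential distribution with a true rate of 0.2 per minute, i.e. a 5-minute mean):

```python
import math
import random

random.seed(1)
# Simulated historical waits (minutes): exponential, true rate 0.2.
waits = [random.expovariate(0.2) for _ in range(200)]

n = len(waits)
rate_hat = n / sum(waits)  # MLE: inverse of the sample mean
mean_wait = 1 / rate_hat   # predicted expected wait for the next customer

# CLT-based 95% confidence interval for the rate (needs roughly N >= 15-20).
half_width = 1.96 / math.sqrt(n)
rate_lo, rate_hi = rate_hat * (1 - half_width), rate_hat * (1 + half_width)

print(f"estimated rate : {rate_hat:.3f}/min  (95% CI {rate_lo:.3f}-{rate_hi:.3f})")
print(f"expected wait  : {mean_wait:.1f} minutes")
```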

In New Relic RPM, I get reports with an Apdex index listed. What is the subscript meaning?

This sounds ridiculous, but New Relic RPM reports an Apdex index in a form like this:
0.92(3.5)
Where the 3.5 is subscripted.
What does the 3.5 mean? I can't find the definition anywhere, and yet there it is in my reports, staring me in the face.
The bracketed/subscripted number is the threshold (in seconds) for your Apdex score. So, in your case, if the full application response (page load) is less than 3.5s then that satisfies the requirement. If your app responds slower than the threshold then your Apdex score is impacted.
This threshold is customizable, so you can select what is appropriate for your application type.
You can read more about Apdex in our docs.
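For reference, this is the standard Apdex formula behind that score (responses up to T count as satisfied, up to 4T as tolerating at half weight, the rest as frustrated); the sample times below are invented to reproduce the 0.92(3.5) from the question:

```python
def apdex(response_times, t):
    """Apdex: (satisfied + tolerating/2) / total, for threshold t seconds."""
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)

# 87 fast, 10 tolerable, 3 very slow responses -> 0.92 at a 3.5s threshold.
times = [1.0] * 87 + [5.0] * 10 + [20.0] * 3
print(f"Apdex(3.5) = {apdex(times, 3.5):.2f}")  # 0.92
```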
The subscripted number is your target response time for that tier. On the user-agent (browser) tier the high-water mark is 7 seconds; if your traffic is US-only, a target of 2 to 4 seconds is world class.
The app-server tier must respond much faster. The high-water-mark default that NR sets is 0.5 seconds (500 milliseconds); a world-class page buffer flush would average 50–200 ms.
Remember all this information is about aggregated averages, not instance data, which will have many outliers and a broad distribution.

Creating a measure that combines a percentage with a low decimal number?

I'm working on a project in Tableau (which uses functions very similar to Excel, if that helps) where I need a single measurement derived from two different measurements, one of which is a low decimal number (2.95 on the high end, 0.00667 on the low) and the other is a percentage (ranging from 29.8 to 100 percent).
Put another way, I have two tables detailing bus punctuality -- one for high-frequency routes, measured in Excess Waiting Time (EWT, in minutes); the other for low-frequency routes, measured as the percentage of buses on time. I have a map of all the routes, and want to colour the lines based on how punctual each route is (thinner lines for routes with a low EWT or a high percentage on time; thicker lines for routes with a high EWT or a low percentage on time). In preparation for this, I've combined both tables and zeroed out the non-existent values.
I thought I'd do something like log(EWT + PercentOnTime), but am realizing that might not give the value I'm wanting (especially because I ultimately need an inverse of one or the other, since low EWT is favourable and high % on time favourable).
Any idea how I'd do this? Thanks!
If you are combining/comparing the metrics in an even manner and the data is relatively linear then all you need to do is normalise them.
If you have the expected EWT range (e.g. 0.00667 to 2.95), then an EWT of 2 normalises to
(2 - 0.00667)/(2.95 - 0.00667) = 0.6772, but because EWT is semantically the inverse of punctuality we use 1 - 0.6772 = 0.3228.
If you do the same for the punctuality percentage range, e.g. 80%:
(80 - 29.8)/(100 - 29.8) = 0.7151
These metrics can now be compared because both are normalised to 0–1 (multiply by 100 to get percentages), assuming the underlying quantities -- inverted EWT and on-time percentage (OTP) -- are analogous.
Thus you can combine them into a single table. You will want to ignore all zeroed values, as a zero is actually an indication that you have no data at that point.
You'll have to use an if statement in a calculated field, something like:
IF [OTP] > 0 THEN ([OTP] - 29.8) / (100 - 29.8) ELSE 1 - (([EWT] - 0.00667) / (2.95 - 0.00667)) END
Hope this helps.
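The same normalisation as a small Python sketch (the function name and the treatment of zeroes as "no data" are my own; the ranges are the ones quoted above):

```python
def punctuality_score(ewt=None, otp=None,
                      ewt_range=(0.00667, 2.95), otp_range=(29.8, 100.0)):
    """Map either metric onto a shared 0-1 scale where 1 = most punctual."""
    if otp is not None and otp > 0:
        lo, hi = otp_range
        return (otp - lo) / (hi - lo)
    if ewt is not None:
        lo, hi = ewt_range
        return 1 - (ewt - lo) / (hi - lo)  # invert: a low EWT is good
    return None  # zeroed-out rows mean "no data here"

print(round(punctuality_score(ewt=2.0), 4))   # 0.3228
print(round(punctuality_score(otp=80.0), 4))  # 0.7151
```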
