Google PageSpeed Insights score differs from dev tools

I get different results when running Google PageSpeed Insights (mobile) from Chrome DevTools and from the Google PageSpeed Insights page.
When I run Audits > Performance (mobile, 3G) from Chrome DevTools, I get a higher score than on the "official" page.
The run from Chrome DevTools says that I've implemented some optimizations, but the test on the Google PageSpeed Insights page still suggests those same optimizations.
I've run the tests at different times, but the score on the Google PageSpeed Insights page is always lower than in Chrome DevTools.
I've implemented optimizations such as "defer images not in view" with lazy loading, and I've deferred the CSS loading, but only the audit in Chrome DevTools recognizes these optimizations.
Can someone help me with this?

I've been having the same issue, and I think the answer is in the following FAQ entry from this guide:
Why do the field data and lab data contradict each other? The Field data says the URL is slow, but the Lab data says the URL is fast!
The field data is a historical report about how a particular URL has performed, and represents anonymized performance data from users in the real-world on a variety of devices and network conditions. The lab data is based on a simulated load of a page on a single device and fixed set of network conditions. As a result, the values may differ.

Related

How to measure performance of browser extension on websites (> 100)

I'm looking for tools for measuring webpage performance with a certain browser (Chrome) extension installed. I would like to know things like the number of requests, time to first byte, slowest call, average call, FCP, LCP, etc.
I've used the developer tools that come with the browser and extensions such as Page load time and Performance-Analyser.
I'm looking for a method or tool that can load pages one by one from a list and let me download the results, so I can test many webpages and batch process the results.
Thanks.

You can use any suitable browser automation framework, e.g. Selenium, which is something of a de-facto standard.
See, for example, 6 Easy Steps to Testing Your Chrome Extension With Selenium.
There is also the Lighthouse tool, which can be executed from shell scripts or programmatically. However, it is more web-oriented, so not all metrics will be applicable and you might get a lot of false-negative results.
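For the batch-processing part, here is a minimal sketch of how the per-page numbers could be reduced to one CSV row each. It assumes the automation framework has already collected the browser's Navigation Timing and Resource Timing entries for each page as plain dicts (for instance by running `JSON.stringify(performance.getEntriesByType("resource"))` in the page through Selenium's `execute_script`). The function names `summarize_page` and `to_csv` are made up for illustration.

```python
import csv
import io
from statistics import mean


def summarize_page(url, nav, resources):
    """Reduce one page's timing entries to a flat row (times in milliseconds).

    nav       -- dict shaped like a PerformanceNavigationTiming entry
    resources -- list of dicts shaped like PerformanceResourceTiming entries
    """
    durations = [r["duration"] for r in resources]
    slowest = max(resources, key=lambda r: r["duration"]) if resources else None
    return {
        "url": url,
        "requests": len(resources),
        # Time to first byte, measured from the start of the navigation.
        "ttfb_ms": nav["responseStart"] - nav["startTime"],
        "slowest_call": slowest["name"] if slowest else "",
        "slowest_ms": slowest["duration"] if slowest else 0,
        "average_ms": round(mean(durations), 1) if durations else 0,
    }


def to_csv(rows):
    """Serialize the summary rows to CSV text for later batch processing."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `summarize_page` once per URL in your list and passing the collected rows to `to_csv` gives a file you can open in a spreadsheet. FCP and LCP are not covered here; they would have to come from paint and largest-contentful-paint entries collected in the same way.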

PageSpeed Insights number of distinct samples to show data for a URL logic

I'm reading the PageSpeed Insights documentation and am wondering if anyone knows how Google is determining what is considered a sufficient number of distinct samples per this FAQ:
Why is the real-world Chrome User Experience Report speed data not available for a URL?
Chrome User Experience Report aggregates real-world speed data from opted-in users and requires that a URL must be public (crawlable and indexable) and have sufficient number of distinct samples that provide a representative, anonymized view of performance of the URL.
I'm building a report centered around Core Web Vitals data and am realizing that some URLs have few data points with CWV timings, so I'm curious exactly how Google handles these situations. I've been searching through docs and articles but haven't found anything with a specific reference.

The exact threshold is kept secret, so that's why you won't find it documented anywhere. However, as a site owner there are a few things you can do to work around a URL not having sufficient data:
Use the Core Web Vitals report by Search Console, which groups similar pages together, making them more likely to collectively exceed that threshold.
Look at origin-level aggregations in PSI or the CrUX API. These include user experiences from all pages on the origin, so it's much less granular, but it gives you a sense of typical experiences overall.
Instrument your site with your own first-party Core Web Vitals monitoring. web-vitals.js can be integrated with your existing analytics provider to track vitals on all of your pages. If you're integrating with Google Analytics, you can link your data with the Web Vitals Report to see how your site is doing.
Use your site with the Web Vitals extension enabled to see the Core Web Vitals data for your own experience. Your local experiences may not be representative of most users, but this can be a great tool for validating expectations vs reality.
Use lab data as a proxy. For example, lab data from Lighthouse in PSI can tell you how a mobile user on a slow connection might experience your page. This should really only be used as a last resort when no other field data is available.
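The "fall back to origin-level data" idea from the list above can be sketched as follows. This is only an illustration under some assumptions: the request bodies follow the CrUX API's `records:queryRecord` shape (`url` or `origin` plus `formFactor`) as I understand it, and `send` is a hypothetical callable you would supply that posts the body to the API and returns `None` when the API has no data for that key.

```python
from urllib.parse import urlsplit

# Endpoint of the CrUX API query method (an API key is required in practice).
CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"


def origin_of(page_url):
    """Return the scheme://host part of a URL."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"


def build_queries(page_url, form_factor="PHONE"):
    """Request bodies to try in order: URL-level first, then origin-level."""
    return [
        {"url": page_url, "formFactor": form_factor},
        {"origin": origin_of(page_url), "formFactor": form_factor},
    ]


def first_available(queries, send):
    """Try each query with send(body) and return (granularity, record).

    send is a hypothetical transport function: it should return the parsed
    record, or None when there is not enough data for the requested key.
    """
    for body in queries:
        record = send(body)
        if record is not None:
            return ("url" if "url" in body else "origin"), record
    return None, None
```

Keeping track of the granularity that actually answered matters for a report: an origin-level number should be labeled as such rather than presented as data for the individual URL.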

Drastically different Google PageSpeed Insights "Lab Data" speeds between Mobile and Desktop experiences?

When running the pages of this website through the Google PageSpeed Insights tool, I receive drastically different "Lab Data" (Time to Interactive, First Contentful Paint, Speed Index) speeds when comparing Mobile and Desktop. Desktop tends to receive values under 2 seconds, and as a result, the PageSpeed Insights score is generally in the 80s or 90s on each page. The Mobile score, however, suggests the page load speed is much slower, upwards of 10 seconds. As you may guess, I cannot reproduce anything close to these loading times on mobile. The mobile and desktop experiences do not differ dramatically, with the primary differences being styling using CSS media queries. Would love any help understanding why these values are so dramatically different!
Images for reference:
Desktop metrics
Mobile metrics

When calculating your mobile score, Page Speed Insights uses simulated CPU and connection throttling to reproduce the conditions people may experience on mobile (no throttling exists on the Desktop score).
Not everyone has a flagship phone (far from it), so they slow the CPU speed of their server by a factor of 4 to simulate the slower CPUs of mid-range and low-end phones.
Similarly, they also simulate a slow 4G connection to account for when people are out and about or have no WiFi connection. So they add additional latency and slow the upload and download speeds to reflect this.
This is why you see such big differences on your site score between mobile and desktop.
If you want to simulate a similar speed yourself you can open developer tools in Google Chrome -> Network -> Look for the drop down that says "online" and change it to "Fast 3G".
Now reload your page and you can see the effects of additional latency and slower download speeds on your waterfall.
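To get a feel for how much latency and bandwidth throttling alone can change a number, here is a rough back-of-the-envelope model. It is not how Lighthouse actually simulates a load; it just adds a fixed latency cost to the raw transfer time. The throttled figures (150 ms round-trip time, roughly 1.6 Mbps down, 4x CPU slowdown) are the values commonly cited for Lighthouse's mobile profile and should be treated as assumptions, as should the "unthrottled" figures, which are made up.

```python
def estimate_ms(size_bytes, rtt_ms, throughput_kbps, round_trips=2):
    """Latency cost plus transfer time for one resource, in milliseconds.

    round_trips is a crude stand-in for connection setup plus the request
    itself; real connections vary.
    """
    # bytes -> bits, divided by kilobits per second, gives milliseconds.
    transfer_ms = size_bytes * 8 / throughput_kbps
    return round_trips * rtt_ms + transfer_ms


def throttled_cpu_ms(measured_ms, slowdown=4):
    """Scale a script's measured execution time by the CPU slowdown factor."""
    return measured_ms * slowdown


if __name__ == "__main__":
    image = 200_000  # a 200 KB image
    print(estimate_ms(image, rtt_ms=150, throughput_kbps=1600))   # 1300.0
    print(estimate_ms(image, rtt_ms=20, throughput_kbps=50_000))  # 72.0
    print(throttled_cpu_ms(250))                                  # 1000
```

Even in this crude model the same 200 KB image goes from about 0.07 s to 1.3 s, and that is for a single resource, before any request chains or JavaScript execution are taken into account.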

According to my analysis, this is due to the images on this page. However, Google PageSpeed Insights is much stricter on mobile than on desktop, so a stark difference between mobile and desktop scores is natural for this tool.
Try compressing the images first (you can use tinypng.com or other online tools), then lazy-load them.

Why are Google Page Speed insight scores so different from GTMetrix, WebPageTest.org, Pingdom, etc?

Is this because it uses a slower connection to the website? I have read that it's a fast 3G connection? Is that used as well as field data?
I have websites that load in under 2 seconds but they fail the PSI tests.

Ryan - Google PageSpeed is a very robust tool. It is also stricter than other tools such as GTMetrix or Pingdom.
There are several factors that impact speed. Expect a variance of 5 to 7 points depending on the location of Google's servers relative to your server. If you are getting a larger variation, the cause could be your CDN rather than your server.
Double-check the results by running Google Lighthouse, which you can find in Chrome DevTools.

Late answer but hopefully this will help people understand the difference.
Short Answer
Page Speed Insights (PSI) simulates a mid-tier mobile phone on a slow 4G connection. You will always score lower on PSI mobile tests because the other sites do not use throttling.
The desktop tab of PSI should give similar results, but it uses newer scoring metrics that the other tools do not appear to have adopted yet (at the time of writing).
Longer Answer
Is this because it uses a slower connection to the website? I have read that it's a fast 3G connection?
Page Speed Insights (PSI) is powered by Lighthouse.
As part of this it uses simulated network throttling to simulate network latency and slower connection speeds (comparable to fast 3G / slow 4G).
It also simulates a slower CPU.
It does both of these to simulate a mid-tier mobile phone on a 4G connection. Mobiles have lower processing power and may be used "on the go" without WiFi.
GTMetrix, WebPageTest.org, Pingdom etc. all check the desktop version of the site.
This is the main reason you will see vastly different scores as they do not apply any form of throttling to the CPU or network speeds.
You should find that you get similar scores if you compare the desktop tab of PSI report to them as that is unthrottled.
Another difference (although I am not 100% sure) is that I think those sites are still using Lighthouse version 5 scoring at their core. Lighthouse changed to version 6 scoring earlier this year, to reflect the items that really matter to the end user. This is why I said "similar" scores in the previous paragraph.
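To illustrate why a change in scoring version moves the number even when the page is unchanged: the overall performance score is a weighted average of per-metric scores, and version 6 changed both the metrics and their weights. The sketch below uses the version 6 weights as I remember them from the Lighthouse scoring documentation (treat them as an assumption and check the current docs), and it takes the per-metric scores (0 to 1) as given rather than deriving them from raw timings.

```python
# Lighthouse v6 metric weights in percent (assumed, see note above).
V6_WEIGHTS = {
    "first-contentful-paint": 15,
    "speed-index": 15,
    "largest-contentful-paint": 25,
    "interactive": 15,
    "total-blocking-time": 25,
    "cumulative-layout-shift": 5,
}


def performance_score(metric_scores, weights=V6_WEIGHTS):
    """Weighted average of per-metric scores (each 0..1), scaled to 0..100."""
    total_weight = sum(weights.values())
    weighted = sum(metric_scores[name] * w for name, w in weights.items())
    return round(100 * weighted / total_weight)


if __name__ == "__main__":
    # Hypothetical page: fast first paint, slow LCP, heavy main-thread work.
    scores = {
        "first-contentful-paint": 0.9,
        "speed-index": 0.9,
        "largest-contentful-paint": 0.5,
        "interactive": 0.8,
        "total-blocking-time": 0.5,
        "cumulative-layout-shift": 1.0,
    }
    print(performance_score(scores))
```

A tool still weighting an older metric set would combine different inputs, so it can report a noticeably different score for the same load.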
Is that used as well as field data?
No. Field data is real-world data, also known as RUM (Real User Metrics). It is collected from real visitors to your site.
It has no effect on your PSI score, as that is calculated each time from "lab data".
Field data is there for diagnostics, as RUM is far more reliable and helps identify problems automated testing may miss, such as an overloaded server or problems at certain screen sizes.
I have websites that load in under 2 seconds but they fail the PSI tests.
Are you sure? It may show 2 seconds on automated tests (for desktop) but in the real world how can you know that?
One way to check is to actually monitor this information on your site. This answer I gave has all the relevant metrics you may want to gather and monitor for site performance.
If you combine that information with screen size and device information you have everything you need to identify issues in near real time.

Real Browser based load testing or Browser level user testing

I am currently working with multiple load testing tools such as JMeter, LoadRunner and Gatling.
All of the above tools work at the protocol level, except for the TrueClient protocol offered by LoadRunner. Real browser testing is now available as well; it is definitely heavy on resource consumption, and tools such as LoadNinja and Flood.IO work on this novel concept.
I have a few queries in this regard:
What will be the scenario where real browser based load testing fits perfectly?
What real browser testing offers which is not possible in protocol based load testing?
I know, we can use Jmeter to Mimic browser behavior for load testing but is there anything different that real browser testing has to offer?

....this novel concept.....
You're showing your age a bit here. Full client testing was state of the art in 1996, before companies shifted en masse to protocol-based testing because it is more efficient in terms of resources. LoadRunner (Mercury, HP, Microfocus), Silk (Segue, Borland, Microfocus), and Robot (Rational, IBM) have retained the ability to use full GUI virtual users (running full clients using functional automation tools) since that time. TruClient is a recent addition which runs a full client but simply does not write the output to the screen, so you get 99% of the benefits and the measurements.
What is the benefit? Historically, two-tier client-server clients were thick, with lots of application processing going on. Having a small number of GUI virtual users combined with protocol virtual users allowed you to measure the cost/weight of the client. The flows to the server might take two seconds, but with the transform and present in the client it might take an additional 10 seconds. You now know where the bottleneck is/was in the user experience.
Well, welcome to the days of future past. The web, once super thin as a presentation layer, has become just as thick as the classical two-tier client-server applications. I might argue thicker, as the modern browser interpreting JavaScript is more of a resource hog than the two-tier compiled apps of years past. It is simply universally available and based upon a common client-server protocol: HTTP.
Now that the web is thick, there is value in understanding the delta between arrival and presentation. You can also observe much of this data in the Performance tab of Chrome. We also have great W3C in-browser metrics which can provide insight into the cost/weight of the local code execution.
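As a small illustration of that delta, the sketch below splits a page load into "arrival" and "presentation" using field names from the W3C Navigation Timing entry (`startTime`, `responseEnd`, `loadEventEnd`). It assumes the entry has already been pulled out of the browser as a plain dict; the function name and the choice of `loadEventEnd` as the end of presentation are mine, not a standard definition.

```python
def arrival_vs_presentation(nav):
    """Split a navigation into server/network time and client-side time.

    nav is a dict shaped like a PerformanceNavigationTiming entry, with all
    values in milliseconds relative to the start of the navigation.
    """
    arrival_ms = nav["responseEnd"] - nav["startTime"]
    client_ms = nav["loadEventEnd"] - nav["responseEnd"]
    total_ms = arrival_ms + client_ms
    return {
        "arrival_ms": arrival_ms,
        "client_ms": client_ms,
        # Fraction of the user's wait spent in the browser, not on the wire.
        "client_share": round(client_ms / total_ms, 2) if total_ms else 0.0,
    }


if __name__ == "__main__":
    # The two-seconds-plus-ten-seconds case described earlier.
    entry = {"startTime": 0, "responseEnd": 2000, "loadEventEnd": 12000}
    print(arrival_vs_presentation(entry))
```

A protocol-level virtual user only ever sees the first of these two numbers, which is the point of sampling a few real browsers alongside the protocol load.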
Shifting the logic to the client has also made it a challenge to reproduce the logic and flow of the JavaScript frameworks when generating the protocol-level dataflows back and forth. Here the old client-server interfaces had a distinct advantage: the protocols were highly structured in terms of data representation. So, even with a complex thick client, it was easy to represent and modify the dataflows at the protocol level (think of a database as an example: rows, columns...). HTML/HTTP is very much unstructured. Your developer can send and receive virtually anything as long as the carrier is HTTP and you can transform it to be used in JavaScript.
To make script creation with complex JavaScript frameworks easier and more time-efficient, the GUI virtual user has come back into vogue. Instead of running a full functional testing tool driving a browser, where we can have 1 browser and 1 copy of the test tool per OS instance, we now have something that scales a bit more efficiently, TruClient, where multiple instances can be run per OS instance. There is no getting around the high resource cost of the underlying browser instance, however.

Let me try to answer your questions below:
What will be the scenario where real browser based load testing fits perfectly?
What real browser testing offers which is not possible in protocol based load testing?
Some companies do real browser-based load testing. However, as you rightly concluded, it is extremely costly to simulate such scenarios. Fintech companies mostly do it when the load is fairly low (say 100 users), the application under test is extremely critical, and it cannot be tested using standard API load tests because it is a legacy application.
I know, we can use JMeter to Mimic browser behaviour for load testing but is there anything different that real browser testing has to offer?
Yes, real browsers execute JavaScript. If the front-end (website) implementation is poor, you cannot catch those issues using service-level load tests. Real browser load testing makes sense if you want to see how the JS written by the developers, or other client-side logic, affects page load times.
It is important to understand that performance testing is not limited to APIs alone but the entire user experience as well.
Hope this helps.

There are 2 types of test you need to consider:
Backend performance test: simulating X real users who are concurrently accessing the web application. The goal is to determine the relationship between an increasing number of virtual users and response time/throughput (number of requests per second), and to identify the saturation point, the first bottleneck, etc.
Frontend performance test: protocol-based load testing tools don't actually render the page. Even if the response from the server comes back quickly, a bug in client-side JavaScript may make rendering take a long time, so you might want to use a real browser (1-2 instances) to collect browser performance metrics.
A well-designed performance test should check both scenarios. It's better to generate the main load using protocol-based tools and, at the same time, access the application with a real browser to perform client-side measurements.
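For the backend side, "identify the saturation point" can be approximated from the test results with a simple heuristic: find the load level after which adding more virtual users no longer buys a meaningful increase in throughput. The sketch below is one hypothetical way to do that; the 5% threshold and the function name are arbitrary choices, not something your load testing tool defines.

```python
def saturation_point(samples, min_gain=0.05):
    """Return the user count after which throughput stops growing.

    samples  -- list of (virtual_users, requests_per_second), sorted by users
    min_gain -- smallest relative throughput increase still counted as growth

    Returns None if throughput keeps growing across all samples.
    """
    for (users_prev, rps_prev), (_, rps_next) in zip(samples, samples[1:]):
        gain = (rps_next - rps_prev) / rps_prev
        if gain < min_gain:
            return users_prev
    return None


if __name__ == "__main__":
    # Made-up results from a stepped load test.
    results = [(10, 100.0), (20, 198.0), (40, 390.0), (80, 400.0), (160, 395.0)]
    print(saturation_point(results))  # 40
```

Running the real-browser measurements around that load level, rather than only on an idle system, shows what the slowdown looks like to an actual user.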
