Origin Summary or Lab Data? - pagespeed-insights

I am struggling a bit to do some final improvements to my site. Search Console complains about CLS being greater than 0.25 for Desktop.
Checking the URLs that Search Console has pointed out, I can see that the CLS for the Origin Summary is above that threshold, but the CLS for the Lab Data is below the accepted value.
I am trying to understand the difference between Origin Summary and Lab Data. It says that the Origin Summary is collected over 28 days, so should I just wait for about 4 weeks now that the Lab Data shows an accepted value?
Screenshot

Related

Different field data for different sites, does anyone know why?

I've been working on improving our Core Web Vitals and thought I'd check another site to compare how we're doing.
I've noticed that one website has just FCP, LCP, FID, and CLS visible, with a percentage image to represent how far away each is from the next stage. Yet for our PageSpeed Insights Field Data we are seeing two additional areas, Speed Index and Total Blocking Time, and we don't see the percentage image to help us gauge.
Does anyone know how we can get ours to show the 4 main areas too please? I have attached images to show what I mean. Appreciate any help, thank you :)
It sounds like your site's PSI results don't have any field data, and the two additional metrics you're seeing are actually the lab data section. The other website may have more traffic than yours and qualify for inclusion in the public Chrome UX Report dataset, which could explain why they have field data in PSI but your site doesn't.
For example, here's a screenshot of the field and lab data sections:
The field data section resembles your screenshot while the lab data section has additional metrics for SI, TTI, and TBT.
All pages tested in PSI will have lab data but only the pages/origins in the Chrome UX Report will have field data available.
For more info about the difference between lab and field data see https://developers.google.com/web/fundamentals/performance/speed-tools#understanding_lab_vs_field_data
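If you want to confirm programmatically whether an origin has field data at all, you can query the Chrome UX Report API directly. Below is a minimal sketch using the requests library; the API key, the example origin, and the crux_field_data helper name are placeholders of mine, and a 404 response means the origin is not in the public CrUX dataset:

import requests

# Chrome UX Report API endpoint (needs an API key from the Google Cloud console).
CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"
API_KEY = "YOUR_API_KEY"  # placeholder

def crux_field_data(origin):
    """Return the CrUX field metrics for an origin, or None if it has none."""
    resp = requests.post(CRUX_ENDPOINT, params={"key": API_KEY}, json={"origin": origin})
    if resp.status_code == 404:
        # The origin has too little traffic to appear in the public CrUX dataset.
        return None
    resp.raise_for_status()
    return resp.json()["record"]["metrics"]

metrics = crux_field_data("https://www.example.com")
if metrics is None:
    print("No field data for this origin, so PSI can only show lab data.")
else:
    print("Field CLS (p75):", metrics["cumulative_layout_shift"]["percentiles"]["p75"])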

PageSpeed Insights - CLS on field data not improving

We implemented CLS optimization 20 days ago, actual values (lab data) are perfect from that time.
CLS in the field data is a different story. It is improving, but very slowly. If it is true that it is calculated over a 28-day period, then we should already be seeing significantly better values.
We started with a CLS of 1.06 and are now at 0.68. Lab data on my computer shows a CLS of 0.001.
Is there any way to validate field data calculation?
Or is there any other reason I am not seeing? Thanks.
First, after 20 days a CLS drop from 1.06 to 0.68 is good; you should level out at about 0.5, which is a big improvement.
Unfortunately the reason you have CLS issues is that you still have problems somewhere.
You see, the synthetic lab tests only measure CLS during the initial page load, at two specific screen sizes.
The field data measures CLS until page unload, and at every screen size.
So your problem is either further down the page or caused by CLS at a different screen size than those tested.
As you have "maxed out" the synthetic tests, the advice in this answer I gave may help you identify the remaining CLS issues; it covers two ways to test using developer tools and how to track real-world data (the best way, in my opinion) to help narrow down the cause.

Is the PageSpeed Insights score taken from the “Lab Data” or the “Field Data”?

I've randomly tested a web link and got a score of 64. However, the Lab Data and Field Data seem quite different; I think it's because the web page owner recently modified the page.
Is the score “64” reflecting the Lab Data or Field Data?
Short Answer
It is the lab data score.
Longer Answer
The score you see there is the "lab data" score; it is the score for the synthetic test you just ran, and it will change every time you run PageSpeed Insights.
"Field Data" does not contribute towards your score in PageSpeed Insights and is purely for diagnostics.
The "Field Data" is calculated over a rolling 28 days, so it is useful for spotting issues that automated tests do not pick up, but it is of little use if you have just done a major update to fix a site issue (for 28 days at least).
Additionally, CLS in "Field Data" is accumulated the whole time someone is on a page (until the "unload" event), while the PSI "Lab Data" CLS is only calculated on the initial, above-the-fold load. That is sometimes another reason for the disparity between results.
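You can see this split in the raw API output yourself: the lab results sit under lighthouseResult and the field data (when present) under loadingExperience. Here is a rough sketch with the requests library; the exact response keys are my reading of the v5 API and worth verifying against the official reference:

import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def lab_vs_field(url, strategy="desktop"):
    resp = requests.get(PSI_ENDPOINT, params={"url": url, "strategy": strategy})
    resp.raise_for_status()
    data = resp.json()

    # Lab data: the displayed PSI score comes from this Lighthouse run.
    lab_score = data["lighthouseResult"]["categories"]["performance"]["score"]
    print("Lab performance score:", round(lab_score * 100))

    # Field data: 28-day CrUX data for this URL (absent for low-traffic pages).
    field = data.get("loadingExperience", {}).get("metrics", {})
    field_cls = field.get("CUMULATIVE_LAYOUT_SHIFT_SCORE")
    if field_cls:
        # The API appears to report CLS multiplied by 100, so 25 corresponds to 0.25.
        print("Field CLS (p75):", field_cls["percentile"] / 100)
    else:
        print("No field CLS data for this URL.")

lab_vs_field("https://www.example.com/")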

Addressing Reliable Output in Newspaper3k

Current Behavior:
In attempting to use the news-aggregator package Newspaper3k, I am unable to produce consistent/reliable output.
System/Environment Setup:
Windows 10
Miniconda3 4.5.12
Python 3.7.1
Newspaper3k 0.2.8
Steps (Code) to Reproduce:
import newspaper
cnn_paper = newspaper.build('http://cnn.com')
print(cnn_paper.size())
Expected Behavior/Output (varies based on current links posted on cnn):
A consistent number of posted links on cnn across consecutive print runs.
Actual Behavior/Output
Running the code the first time produces a different number of links than code run immediately after.
1st Run Print output: 94 (as of time of posting this question)
2nd Run Print output: 0
3rd Run Print output: 18
4th Run Print output: 7
Printing the actual links varies in the same way as the link counts above. I have tried a number of different news sources, and the same unexpected variance occurs. Do I need to change my User-Agent header? Is this a detection issue? How do I produce reliable results?
Any help would be much appreciated.
Thanks.
My issue was resolved by a better understanding of the default caching described under the heading 6.1.3 Article caching in the user documentation.
Apart from my general ignorance, my confusion came from the fact that the Read the Docs documentation listed the caching function as a TODO, as can be seen here.
Upon closer scrutiny, I discovered:
By default, newspaper caches all previously extracted articles and eliminates any article which it has already extracted. This feature exists to prevent duplicate articles and to increase extraction speed.
The return value of cbs_paper.size() changes from 1030 to 2 because when we first crawled cbs we found 1030 articles. However, on our second crawl, we eliminate all articles which have already been crawled. This means 2 new articles have been published since our first extraction.
You may opt out of this feature with the memoize_articles parameter.
You may also pass in the lower level ``Config`` objects as covered in the advanced section.
>>> import newspaper
>>> cbs_paper = newspaper.build('http://cbs.com', memoize_articles=False)
>>> cbs_paper.size()
1030
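Applied to the cnn example from my question, turning memoization off should make consecutive runs comparable again (the counts will still drift a little as CNN publishes new links):

import newspaper

# Build the source without the article cache so each run sees the full link set.
cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
print(cnn_paper.size())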

Get specific results from 3rd party website

I am trying to get results from realtor.ca for all houses that were built between 2000 and 2013. The advanced search does not have this feature, but I am trying to see if it is possible to add the search criteria in the URL.
I looked at the source code and the id for the value seems to be builtin_value. So, I added &builtin_value=2000,2011,2012,2013 to the URL string but this did not work.
After messing around with the site's URL structure for a little bit, I agree that you can't add the builtin_value parameter to the URL and there doesn't seem to be an equivalent.
That being said, I think that in this case the Keywords text box is your friend. I tried some sample searches entering 2001 as the value for the Keywords box and only got back houses that were built in 2001; the same held for other years. You can exploit the fact that nobody enters a number like 2001 unless it refers to the year the house was built.
The exception is the year 2000: entering the keyword 2000 comes back with a plethora of results, including every house with approximately 2000 square feet. So, if you can wriggle a relaxation of requirements out of your project manager or boss so that you only need the years built excluding 2000, then I think you could write a relatively simple program that executes a request for each year (i.e. a for loop iterating over the years 2001 to 2013), accepts the GET responses at face value, and pages through the data by incrementing the CurrentPage URL parameter until you don't get any more results (a rough sketch of this loop follows the example URLs below).
If you really need the year 2000 then you could write a scraper that is more sophisticated than what I am proposing above. It would have to GET each search result and vet that it was indeed built in the year 2000.
For example, consider the following URLs:
A search for all houses in Vancouver built in 1971: https://www.realtor.ca/Residential/Map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertySearchTypeId=1&TransactionTypeId=2&StoreyRange=0-0&BedRange=0-0&BathRange=0-0&Keywords=1971&LongitudeMin=-123.2340278625491&LongitudeMax=-122.85396957397488&LatitudeMin=49.21465057441378&LatitudeMax=49.34746245927539&SortOrder=A&SortBy=1&viewState=m&Longitude=-123.043998718262&Latitude=49.2811012268066&ZoomLevel=12&CurrentPage=1
A search for all houses in Canada built in the year 2000 (lots of non-relevant results) on page 3: https://www.realtor.ca/Residential/map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertySearchTypeId=1&TransactionTypeId=2&StoreyRange=0-0&BedRange=0-0&BathRange=0-0&Keywords=2000&LongitudeMin=-135.1318359375&LongitudeMax=-37.8369140625&LatitudeMin=44.707725934249424&LatitudeMax=70.92742296535133&SortOrder=A&SortBy=1&viewState=m&CurrentPage=3
A search for all houses in Canada built in the year 2001: https://www.realtor.ca/Residential/map.aspx#CultureId=1&ApplicationId=1&RecordsPerPage=9&MaximumResults=9&PropertySearchTypeId=1&TransactionTypeId=2&StoreyRange=0-0&BedRange=0-0&BathRange=0-0&Keywords=2001&LongitudeMin=176.220703125&LongitudeMax=10.810546875&LatitudeMin=4.483729141145389&LatitudeMax=72.9471586872288&SortOrder=A&SortBy=1&viewState=m&CurrentPage=2
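Putting that together, here is a rough sketch of the loop I have in mind. Note that in the URLs above the search parameters live in the URL fragment (after the #), which a plain HTTP request never sends to the server, so fetch_page below is a hypothetical stand-in for however you actually retrieve a page of results (a headless browser, or whatever endpoint the map page calls behind the scenes):

def fetch_page(year, page):
    """Hypothetical helper: return the list of listings for a given
    Keywords year and CurrentPage, however you end up retrieving them."""
    raise NotImplementedError

all_listings = []
for year in range(2001, 2014):       # skip 2000, as discussed above
    page = 1
    while True:
        listings = fetch_page(year, page)
        if not listings:             # no more results for this year
            break
        all_listings.extend(listings)
        page += 1                    # equivalent to bumping CurrentPage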
I hope that helps.

Resources