Lighthouse: Fix assets with an efficient cache policy - cache-control

I am trying to improve the performance of my website, and I saw that I must use an efficient cache policy. Does this mean that when the user enters the website, the website can be stored on their machine, increasing the page loading time, using a cache timeout?
I don't know if I should configure this in the website's meta tags or on the server, I use the following meta tags on my website:
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Cache-Control" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
I believe that I am not using cache on my website, which is why Lighthouse is reporting?
So, what can I do to solve this, change something on the server or in the meta tag? I need an efficient answer
My server is Nginx and I don't use any cache policies on my server.
Picture of Lighthouse

This recommendation is shown when you do not cache "static" files such as CSS, images, JS for at least 6 months (and ideally a year).
At the moment the meta tags you show are saying "do not cache this page" so every time someone arrives at the page it will be served from your server and not from their local machine.
This is fine for the page itself if it is likely to change often but for things like your site CSS and JS that is going to hurt performance.
This answer on Nginx static files config should help you with the base config settings.
However 30 days is not sufficient so you will need to change that to 365 days.
It has been a while since I have had to do this but I believe the following would work (not tested)
location ~* \.(?:ico|css|js|gif|jpe?g|png)$ {
expires 365d;
add_header Vary Accept-Encoding;
access_log off;
}
Doing it via config is far better as you can decide if files that are updated often can be cached for less time (the cache length is a recommendation not something that affects your score. Set the cache as high as you can without painting yourself into a corner on files that update regularly.)
Yet again it has been a while but I believe the file is located at "/etc/nginx/sites-available/default" but I could be wrong there, hopefully someone who is more familiar with Nginx will be able to confirm or correct me on those points.
To test it simply update the file, then open developer tools on the site and head over to the "Network" tab. Reload the page and find a CSS file, image, JS file etc. and then check the cache-control headers for them. They should have a max age of 365 days equivalent ( max-age=31536000 is one year) as a minimum.

Related

Node js - Bundler for http2

I'm currently using babel to transform es6 code to es5 and browserify to bundle it to use it in the browser. Now I've began to using a http2 server (Nginx).
Http2 is more effective when it can load multiple small files instead of one big bundle.
How to best serve multiple js files instead of one big bundle?
I know that SystemJS can load multiple files in development without bundling, and for production you can use a DepCache to define the dependence trees of the modules you are importing
https://github.com/systemjs/systemjs/blob/master/docs/production-workflows.md
This approach would require you to ditch browserfy and change to systemjs as it only uses bundles.
I see that you didn't get the answer on your question till now. Thus I try to help you in spite of HTTP/2 is new for me too (it explains the long text of my answer :-)).
Good information about HTTP/2 can be find on the page https://blog.cloudflare.com/http-2-for-web-developers/. I repeat shortly:
stop concatenating files
stop inlining assets
stop sharding domains
continue minimizing of CSS/JavaScript files
continue loading from CDNs
continue DNS prefetching via <link rel='dns-prefetch' href='...' /> included in <head>
...
I want to add two additional points about the importance of setting HTTP headers Cache-Control and Link:
think about setting Cache-Control HTTP headers (especially max-age, expires and etag) on all content of your page. See details below. I strictly recommend to read the Caching Tutorial.
set Link HTTP header to use SERVER PUSH of HTTP/2.
The setting of HTTP headers LINK: are important to use server push feature of HTTP/2 (see here, here). RFC5988 and Section 19.6.1.2 of RFC2068 describe the feature existing in HTTP 1.1 already. Everybody knows Content-Type: application/json, but in the same way one could set less known Link: <...>; rel=prefetch, described here. For example, one can use
Link: </app/script.js>; rel=preload; as=script
Link: </fonts/font.woff>; rel=preload; as=font
Link: </app/style.css>; rel=preload; as=style
Such links, set on HTML page (like index.html), will informs HTTP server to push the resources together with the response on your HTML page. As the result you save unneeded round-trips and the later requests (after parsing HTML files) and the resources will be displayed immediately. You can consider to set the LINK headers on all images from your page to improve the visibility of your page. See here additional information with nice pictures, which demonstrates the advantage of HTTP/2 server push. If you use PHP then the code could be interesting for you.
The most web developers do some optimizations steps directly or indirectly. The steps are done either during building process or by setting HTTP headers in HTTP responses. One have to review some processes switch off someone and include another one. I try to summarize my results.
you can consider to use webpack instead of browserify to exclude some dependencies from merging. I don't know browserify good enough, but I know that webpack supports externals (see here), which allows to load some modules from CDN. In the next step you can remove any merging at all, but minimize and set cache-control on all your modules.
It's strictly recommended to load CSS/JS/Fonts, which you use, and which you don't developed yourself, from CDN. You should never merge such resources with your JavaScript files (what could you probably do with browserify now). Loading of Bootstrap CSS from your server is not good idea. One should better follow advises from here and use CDN instead ol downloading of all files locally.
The main reason of the usage of CDN is very easy to understand if you examine HTTP headres of the response from https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.1/jquery.min.js for example. You will find something like cache-control: public, max-age=30672000 and expires:Mon, 06 Mar 2017 21:25:04 GMT. Chrome will shows typically Status Code:200 (from cache) and you will see no traffic over the wire. If you explicitly reload the page (by pressing F5) then you will see a response with 222 bytes and Status Code:304. In other words the file will be typically didn't loaded at all. jQuery 2.2.1 stay forever the same. The next version will have another URL. The usage of HTTPS makes sure that the user will load really jQuery 2.2.1. If it's not enough then you can use https://www.srihash.org/ to calculate sha384 value and use extended form of <link> or <script>:
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.1/jquery.min.js"
integrity="sha384-8C+3bW/ArbXinsJduAjm9O7WNnuOcO+Bok/VScRYikawtvz4ZPrpXtGfKIewM9dK"
crossorigin="anonymous"></script>
If the user opens your page with the link then the sha384 hash will be recalculated and verified (by Chrome and Firefox). If the file is not yet in local cache then it will be loaded really quickly too. One short remark by loading the same file from https://code.jquery.com/jquery-2.2.1.min.js one uses HTTP 1.1 today, but from https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.1/jquery.min.js be used HTTP/2 protocol. i recommend to test the protocol by choosing the CDN. You can find here the list of CDNs which supports now HTTP/2. In the same way loading Bootstrap from https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css one would uses HTTP 1.1 today, but one would use HTTP/2 by loading the same data from https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.6/css/bootstrap.min.css.
I spend many time for CDN to make clear that the most advantage of CDN is setting of cashing headers of HTTP response and the usage of immutable URLs. You can do the same in your modules too.
One should think about the time of caching of every content returned from the server. You can use URLs to your modules, which contains version number of your component (like /script/mycomponent1.1.12341) and to change the last part of version number every time on changing the module. You can set long enough value of max-age in cache-control and your components will be cached by web browser of the client.
Finally I'd recommend you to verify that you installed the latest version of OpenSSL and the latest version of nginx. I recommend to verify your web site in http://www.webpagetest.org/ and in https://www.ssllabs.com/ssltest/ to be sure that you don't forget any simple steps.

can files on cloudflare disappear and what about performance

So i'm new in the web development world. But for some plugins i use at angular i notice that some use cloudflare.
like this :
<link rel="stylesheet" href="http://cdnjs.cloudflare.com/ajax/libs/select2/3.4.5/select2.css">
So my question is should i download the files just in case or will they be permanent online ?
And will this improve the speed of my website if u use cloudflare or is it just the other way around. because i notice that some files are not using a compact css file.
So that isn't actually referencing a site on CloudFlare, it is referencing a service called CDNJS, CloudFlare provide CDN services for them. These URLs are intended to the permanent endpoint in order to access scripts from CDNJS.
In ordinary CloudFlare sites, the CDN is transparent. CloudFlare works by proxying requests from the user to the origin server, they sit in the middle and perform caching and optimisation from their 64 data centers around the world, no need to alter your code at all.

HTTPS Failing on Media in Magento

I'm having a problem with a clients magento site that has https enabled on the secure pages,
The website it built heavily around static block content and on the https pages images are pulled from static blocks (over 400 of them) using the media insert in the static block {{media url="media/bla/bla/bob.png"}} these images are comign through as http://site.com/media/bla/bla/bob.png
its not realistic, and it wouldn't make any sense to go through and change all these links to direct links.
Any ideas?
Cheers
Roly!
You are suppose to use the {{store url=""}} or the {{secure_base_url}}media/ in ur blocks
if ur not certain that ur page will be on HTTPS or HTTP the use first one if you know for sure that the request will be HTTPs use second one. (NOTE. Second is a system config path not the actual value that u'll put in the CMS block).
Hope it helps.
Whereas media files are not subject to a fallback, and with the awareness that if the directory level for Magento changes w/r/t the webroot (e.g. http://site.com/ vs. http://site.com/magento/) you can lead with the double-slash network location:
<img src="//media/bla/bob.png" />
Therefore, a search and replace against using the current data in cms_block.content is indicated.
I'll reiterate that this is not appropriate for skin assets due to the fallback.

How to stop search engines from crawling the whole website?

I want to stop search engines from crawling my whole website.
I have a web application for members of a company to use. This is hosted on a web server so that the employees of the company can access it. No one else (the public) would need it or find it useful.
So I want to add another layer of security (In Theory) to try and prevent unauthorized access by totally removing access to it by all search engine bots/crawlers. Having Google index our site to make it searchable is pointless from the business perspective and just adds another way for a hacker to find the website in the first place to try and hack it.
I know in the robots.txt you can tell search engines not to crawl certain directories.
Is it possible to tell bots not to crawl the whole site without having to list all the directories not to crawl?
Is this best done with robots.txt or is it better done by .htaccess or other?
Using robots.txt to keep a site out of search engine indexes has one minor and little-known problem: if anyone ever links to your site from any page indexed by Google (which would have to happen for Google to find your site anyway, robots.txt or not), Google may still index the link and show it as part of their search results, even if you don't allow them to fetch the page the link points to.
If this might be a problem for you, the solution is to not use robots.txt, but instead to include a robots meta tag with the value noindex,nofollow on every page on your site. You can even do this in a .htaccess file using mod_headers and the X-Robots-Tag HTTP header:
Header set X-Robots-Tag noindex,nofollow
This directive will add the header X-Robots-Tag: noindex,nofollow to every page it applies to, including non-HTML pages like images. Of course, you may want to include the corresponding HTML meta tag too, just in case (it's an older standard, and so presumably more widely supported):
<meta name="robots" content="noindex,nofollow" />
Note that if you do this, Googlebot will still try to crawl any links it finds to your site, since it needs to fetch the page before it sees the header / meta tag. Of course, some might well consider this a feature instead of a bug, since it lets you look in your access logs to see if Google has found any links to your site.
In any case, whatever you do, keep in mind that it's hard to keep a "secret" site secret very long. As time passes, the probability that one of your users will accidentally leak a link to the site approaches 100%, and if there's any reason to assume that someone would be interested in finding the site, you should assume that they will. Thus, make sure you also put proper access controls on your site, keep the software up to date and run regular security checks on it.
It is best handled with a robots.txt file, for just bots that respect the file.
To block the whole site add this to robots.txt in the root directory of your site:
User-agent: *
Disallow: /
To limit access to your site for everyone else, .htaccess is better, but you would need to define access rules, by IP address for example.
Below are the .htaccess rules to restrict everyone except your people from your company IP:
Order allow,deny
# Enter your companies IP address here
Allow from 255.1.1.1
Deny from all
If security is your concern, and locking down to IP addresses isn't viable, you should look into requiring your users to authenticate in someway to access your site.
That would mean that anyone (google, bot, person-who-stumbled-upon-a-link) who isn't authenticated, wouldn't be able to access your pages.
You could bake it into your website itself, or use HTTP Basic Authentication.
https://www.httpwatch.com/httpgallery/authentication/
In addition to the provided answers, you can stop search engines from crawling/indexing a specific page on your website in .robot.text. Below is an example:
User-agent: *
Disallow: /example-page/
The above example is especially handy when you have dynamic pages, otherwise, you may want to add the below HTML meta tag on the specific pages you want to be disallowed from search engines:
<meta name="robots" content="noindex, nofollow" />

What is HTTP cache best practices for high-traffic static site?

We have a fairly high-traffic static site (i.e. no server code), with lots of images, scripts, css, hosted by IIS 7.0
We'd like to turn on some caching to reduce server load, and are considered setting the expiry of web content to be some time in the future. In IIS, we can do this on a global level via "Expire web content" section of the common http headers in the IIS response header module. Perhaps setting content to expire 7 days after serving.
All this actually does is sets the max-age HTTP response header, so far as I can tell, which makes sense, I guess.
Now, the confusion:
Firstly, all browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if content has changed. So, I'm not entirely sure what the max-age response header will actually effect?! Could it be older browsers? Or web-caches?
It is possible that we may want to change an image in the site at short notice... I'm guessing that if the max-age is actually used by something that, by its very nature, it won't then check if this image has changed for 7 days... so that's not what we want either
I wonder if a best practice is to partition one's site into folders of content really won't change often and only turn on some long-term expiry for these folders? Perhaps to vary the querystring to force a refresh of content in these folders if needed (e.g. /assets/images/background.png?version=2) ?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a server to six (6). This means if you have 30 or more resources that are all hosted on the same website, only 6 are being downloaded at any time until the page is loaded. You can speed this up a bit by using a content delivery network (CDN) so that everything downloads at once.

Resources