This may sound like a novice question in the Varnish Cache world, but why does WordPress seem to need an external cache plugin installed in order to be fully cached?
The websites load correctly via Varnish; here is the output of a curl -I command:
HTTP/1.1 200 OK
Server: nginx/1.11.12
Date: Thu, 11 Oct 2018 09:39:07 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
Cache-Control: max-age=0, public
Expires: Thu, 11 Oct 2018 09:39:07 GMT
Vary: Accept-Encoding
X-Varnish: 19575855
Age: 0
Via: 1.1 varnish-v4
X-Cache: MISS
Accept-Ranges: bytes
Pragma: public
Cache-Control: public
Vary: Accept-Encoding
With this configuration, WordPress installations are not being cached by default.
After testing multiple cache plugins (some not working, or not working without complex configuration) I found Swift Performance. With its Lite version, simply activating the Cache option really takes full advantage of Varnish, and I can see Varnish working fully, with very good results in stress tests.
This might be fine for a single site in a single environment, but on shared hosting, where every customer can have their own WordPress (or other CMS) installation, it could be a problem.
So is there really no way to take full advantage of Varnish caching without installing complex third-party caching plugins? Why not cache everything by default?
Any suggestions and help will be highly welcome; thanks in advance.
With this configuration, WordPress installations are not being cached by default
By default, if you don't change anything in either the WordPress or the Varnish configuration, the two work together in a way that caches WordPress pages for 120 seconds. So real caching is possible, but it will be a short-lived and highly ineffective cache.
Your specific headers indicate that no caching should happen. They are either sent by Varnish itself (we're all guilty of copy-pasting stuff without thinking about what it does) or by a WordPress plugin (more often a bad one than a good one). Without knowing your specific configuration, it's hard to decipher anything.
Varnish is a transparent HTTP caching proxy, which means that by default it uses the HTTP headers sent by the backend (WordPress), such as Cache-Control, to decide whether a resource can be cached and for how long.
WordPress, in fact, does not send cache-related headers other than in a few specific areas (error pages, login POST submissions, etc.).
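If it did, something as small as the following would be enough for Varnish to start caching anonymous page views. This is a hypothetical must-use-plugin sketch, not something WordPress ships with; the hooks and functions used are standard WordPress APIs, and the 300-second policy is invented:

<?php
/* Hypothetical mu-plugin: emit a Cache-Control header for anonymous
   front-end requests so Varnish (and browsers) may cache them. */
add_action('send_headers', function () {
    // Never advertise caching for logged-in users or the admin area.
    if (is_admin() || is_user_logged_in()) {
        header('Cache-Control: private, no-cache');
        return;
    }
    // Let Varnish cache anonymous pages for 5 minutes.
    header('Cache-Control: public, max-age=300');
});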
The standard approach here is to configure Varnish with the highest TTL. With that:
Varnish has no idea when you update an article's contents or change the theme. The typical solution lies in using a cache invalidation plugin like Varnish HTTP Purge.
The plugin requirement comes from the necessity to purge the cache when content is changed.
Suppose that you update a WordPress page's text. That same page was previously visited and went into the Varnish cache. On the next visit, Varnish will serve the same, now stale, content to all subsequent visitors.
The WordPress plugins for Varnish, like Varnish HTTP Purge, hook into WordPress so that they instruct Varnish to clear the cache when pages are updated. This is their primary purpose.
That kind of approach (a high TTL plus cache purging) is the de facto standard with Varnish. Since Varnish has no information about when you update content, the inner workings of purging the cache live in the application itself. The cache purging feature is either bundled into the CMS code itself (Magento 2, for example, has it out of the box, without any extra plugins) or provided by a WordPress plugin.
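Stripped of options and error handling, what such a plugin does boils down to something like this. It is only a sketch and assumes your VCL is set up to accept PURGE requests from the web server; the hook and HTTP functions used are real WordPress APIs:

<?php
/* Sketch of the core of a purge plugin such as Varnish HTTP Purge.
   Assumes Varnish (via VCL) accepts PURGE requests from this host. */
add_action('save_post', function ($post_id) {
    // Ignore autosaves and revisions; only purge real content updates.
    if (wp_is_post_revision($post_id) || wp_is_post_autosave($post_id)) {
        return;
    }
    // Ask Varnish to drop its cached copy of the updated page.
    wp_remote_request(get_permalink($post_id), array(
        'method'   => 'PURGE',
        'blocking' => false,
    ));
});

Real plugins typically also purge related URLs (front page, feeds, archives); the above only shows the core mechanism.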
Related
I set up an Azure Verizon Premium CDN a few days ago as follows:
Origin: An Azure web app (.NET MVC 5 website)
Settings: Custom Domain, no geo-filtering
Caching Rules: standard-cache (doesn't care about parameters)
Compression: Enabled
Optimized for: Dynamic site acceleration
Protocols: HTTP, HTTPS, custom domain HTTPS
Rules: Force HTTPS via Rules Engine (if request scheme = http, 301 redirect to https://{customdomain}/$1)
So - this CDN has been running for a few days now, but the ADN reports say that nearly 100% (99.36%) of requests have the cache status "CONFIG_NOCACHE" (description: "The object was configured to never be cached in accordance with customer-specific configurations residing on the edge servers, so the response was served via the origin server."). A few (0.64%) of them are "NONE" (description: "The cache was bypassed entirely for this request. For instance, the request was immediately rejected by the token auth module, or the client request method used an uncacheable request method such as "PUT"."). Also, the "Cache Hit" report says "0 hits, 0 misses" for every day. Nothing is coming through the "HTTP Large" side, only "ADN".
I couldn't find these exact messages while searching around, but I've tried:
Updating cache-control header to max-age, public (ie: cache-control: public,max-age=1209600)
Updating the cache-control header to max-age (cache-control: max-age=1209600)
Updating the expires header to a date way in the future (expires: Tue, 19 Jan 2038 03:14:07 GMT)
Using different browsers so the request cache info is different. In Chrome, the request header is "cache-control: no-cache"; in Firefox, it's "Cache-Control: max-age=0". In any case, I'd assume the users of the website wouldn't have these same settings, right?
Refreshing the page a bunch of times and looking at the real-time report to see hits/misses/cache statuses - it shows the same thing, CONFIG_NOCACHE for almost everything.
Tried running a "worldwide" speed test on https://www.dotcom-tools.com/website-speed-test.aspx, but that had the same result - a bunch of "NOCACHE" hits.
Tried adding ADN rules to set the internal and external max age to 864000 sec (10 days).
Tried adding an ADN rule to ignore "no-cache" requests and just return the cached result.
So, the message for "NOCACHE" says it's a node configuration issue... but I haven't really even configured it! I'm so confused. It could also be an application issue, but I feel like I've tried all the different permutations of "cache-control" that I can. Here's an example of one file that I'd expect to be cached:
Ultimately, I would hope that most of the requests are being cached, so I'd expect to see most of them reported as "TCP Hit". Maybe that's incorrect? Thanks in advance for your help!
So, I eventually figured out this issue. Apparently the Azure Verizon Premium CDN ADN platform has "bypass cache" enabled by default.
To disable this behavior, you need to add additional features to your caching rules.
Example:
IF Always
Features:
Bypass Cache Disabled
Force Internal Max-Age Response 200 864000 Seconds
Ignore Origin No-Cache 200
I would like to prevent any caching whatsoever, anywhere, and I have a response header with cache-control: private, proxy-revalidate, no-cache, no-store
But it seems like too much; isn't no-cache, no-store enough to prevent browsers and proxies from caching?
The best practice here is to set your response header as:
cache-control: no-cache, no-store, must-revalidate
This should give you peace of mind.
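For example, in PHP (assuming PHP is your server-side language; the equivalent exists in any stack) this could be sent as:

<?php
// Tell browsers and intermediate caches not to store or reuse the response.
header('Cache-Control: no-cache, no-store, must-revalidate');
header('Pragma: no-cache');   // for old HTTP/1.0 clients and proxies
header('Expires: 0');         // belt and braces for legacy caches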
Nope; nothing is guaranteed to work. If your data flows through or into a machine, that machine has the option to store it no matter what headers you set. That's not to say that every machine will, but all you're doing is asking nicely and hoping the software you're asking will comply with your wishes. It's perfectly possible for that software to completely ignore your request and store the data anyway. If you don't want interim servers to be able to see the data, encrypt it, but if you're hoping to prevent the end client machine from keeping a copy of what you send it, you could well be out of luck.
The only way to guarantee that a machine won't store some particular data is to never send that data to the machine at all (which, I'll admit, doesn't make for a very useful application in a lot of cases).
I am trying to create an ideal .htaccess file that will tick all the boxes for taking advantage of server compression, but only where it makes sense, and caching of resources on public proxies, again where it makes sense. I have been feeling my way through the process and I think I am pretty much there, but I suspect there might be a bit of finessing left to do and I thought I'd invite suggestions. I have my suspicions it's not there yet because of a great tool I have discovered and I have to share that with you to begin with.
www.pingdom.com has a great suite of website analysis tools, many of which are free to use, and personally I think the best is http://tools.pingdom.com/fpt/. This shows you the load time of every element of your page, but more importantly, under its 'Performance Grade' tab it offers a breakdown of where things could be better. Now, I use a number of jQuery resources that are served by Google (and others), and I understand these should exist on many proxy servers. I'm not sure how to say that in my .htaccess file (although I have tried), and sure enough, Pingdom's analysis includes the following feedback:
The following publicly cacheable, compressible resources should have a
"Vary: Accept-Encoding" header:
•http://jmar777.googlecode.com/svn/trunk/js/jquery.easing.1.3.js
•http://kwicks.googlecode.com/svn/branches/v1.5.1/Kwicks/jquery.kwicks-1.5.1.pack.js
Well I thought I'd done that, but then again, perhaps it's up to the servers that actually serve those resources to set those headers, and maybe there's nothing I can do about it? Is that so? Anyway here is my .htaccess file at the moment. Please note I have the caching set insanely low because I am still just experimenting / learning with it. I will adjust this up before I go live with it.
suPHP_ConfigPath /home/mydomain/public_html
<Files php.ini>
order allow,deny
deny from all
</Files>
<ifModule mod_deflate.c>
<filesMatch "\.(js|css|php|htm|html)$">
SetOutputFilter DEFLATE
</filesMatch>
</ifModule>
# 1 HOUR
<filesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf|htm|html|)$">
Header set Cache-Control "max-age=3600, public"
</filesMatch>
# PHP - NO CACHING WANTED DUE TO USING SESSION COOKIES
<filesMatch "\.(php)$">
Header set Cache-Control "private"
</filesMatch>
# STORE BOTH COMPRESSED AND UNCOMPRESSED FILES FOR JS & CSS
<IfModule mod_headers.c>
<FilesMatch "\.(js|css|xml|gz)$">
Header append Vary Accept-Encoding
</FilesMatch>
</IfModule>
You can see I am trying to do a 'Vary: Accept-Encoding' towards the end of the file, but I'm not sure if this is what's needed. How do I tell clients to fetch jQuery and the like from the proxies those files are undoubtedly stored at, and is there anything else I can do to make my .htaccess file deliver my content faster and in a search-engine-friendly way?
Thank you for your thoughts.
Edit:
It seems my questions here were not clear enough so here goes with some clarification:
1) Is the jQuery library, hosted at Google, something whose proxy availability is somehow under the control of my .htaccess settings, because I make remote reference to it in my PHP? And if so, how should I say, in my .htaccess file, 'please cache that library in a proxy for a year or so'?
2) How, too, should I specify that Google-hosted files should be provided compressed and uncompressed via 'Vary: Accept-Encoding'? At a guess I'd say both issues are under Google's control and not mine, so to make that absolutely explicit...
3) Are the compression choices and proxy caching of files like the jQuery library under my control or (in this case) under Google's?
4) Generally, is anything in my .htaccess file expressed in a sub-optimal (long-winded) way, and how could I shorten/compact it?
5) Is anything in the .htaccess file sequenced in a way that might cause problems - for example, I refer to CSS under three separate rules - does the order matter?
(End of Edit).
Is the jQuery library, hosted at Google, something whose proxy availability is somehow under the control of my .htaccess settings, because I make remote reference to it in my PHP? And if so, how should I say, in my .htaccess file, 'please cache that library in a proxy for a year or so'?
This assumption is not correct. The browser decides whether to cache and whether to download based on the header exchange for each request individually. So if a page involves requests to multiple sites, your .htaccess file(s) only influence how it caches your files. How it caches Google's is up to Google to decide. For example, a request to http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js received the response headers:
Age:133810
Cache-Control:public, max-age=31536000
Date:Fri, 17 Feb 2012 21:52:27 GMT
Expires:Sat, 16 Feb 2013 21:52:27 GMT
Last-Modified:Wed, 23 Nov 2011 21:10:59 GMT
Browsers will normally cache for a year but may decide to revalidate on reuse:
If-Modified-Since:Wed, 23 Nov 2011 21:10:59 GMT
And in this case ajax.googleapis.com will reply with a 304 Not Modified and the following headers:
Age:133976
Date:Fri, 17 Feb 2012 21:52:27 GMT
Expires:Sat, 16 Feb 2013 21:52:27 GMT
This short request/response dialogue will typically take ~50 ms, since this content is CDN-delivered.
You might wish to rework your other supplemental questions in this light, since some of them don't apply.
I am making a request to an image, and the response headers that I get back are:
Accept-Ranges:bytes
Content-Length:4499
Content-Type:image/png
Date:Tue, 24 May 2011 20:09:39 GMT
ETag:"0cfe867f5b8cb1:0"
Last-Modified:Thu, 20 Jan 2011 22:57:26 GMT
Server:Microsoft-IIS/7.5
X-Powered-By:ASP.NET
Note the absence of the Cache-Control header.
On subsequent requests in Chrome, Chrome knows to go to the cache to retrieve the image. How does it know to use the cache? I was under the impression that I would have to tell it with the Cache-Control header.
You have both an ETag and a Last-Modified header. It probably uses those. But for that to happen, it still needs to make a request with If-None-Match or If-Modified-Since respectively.
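To make the revalidation concrete, here is a minimal sketch of the server side of that exchange. It is written in PHP purely for brevity (IIS handles this automatically for static files); the ETag and Last-Modified values are taken from your response, and the file name is made up:

<?php
// Sketch of how a server answers a conditional (revalidation) request.
$etag         = '"0cfe867f5b8cb1:0"';
$lastModified = 'Thu, 20 Jan 2011 22:57:26 GMT';

$ifNoneMatch     = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : null;
$ifModifiedSince = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : null;

if ($ifNoneMatch === $etag || $ifModifiedSince === $lastModified) {
    // Validators match: send 304 with no body; the browser reuses its cached copy.
    http_response_code(304);
    exit;
}

// Otherwise send the full response with the validators attached.
header('Content-Type: image/png');
header('ETag: ' . $etag);
header('Last-Modified: ' . $lastModified);
readfile('image.png');   // hypothetical file name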
To set Cache-Control, you have to specify it yourself. You can do it in web.config, in IIS Manager for selected folders (static, images, ...), or set it in code. The HTTP 1.1 standard recommends one year in the future as the maximum expiration time.
Setting an expiration date one year in the future is considered good practice for all static content on your site. Not having it in the headers results in If-Modified-Since requests, which can take longer than first-time requests for small static files. In these calls the ETag header is used.
When you have Cache-Control: max-age=315360000, plain HTTP responses will outnumber If-Modified-Since calls, and because of that it is good to remove the ETag header, resulting in smaller static-file response headers. IIS doesn't have a setting for that, so you have to call response.Headers.Remove("ETag"); in OnPreServerRequestHeaders().
And if you want to optimize your headers further, you can remove X-Powered-By: ASP.NET in the IIS settings and the X-AspNet-Version header (although I don't see it in your response) in web.config - enableVersionHeader="false" in the system.web/httpRuntime element.
For more tips I suggest a great book: http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839
I'm having trouble with a particular version of Pocket IE running under Windows Mobile 5.0. Unfortunately, I'm not sure of the exact version numbers.
We had a problem whereby this particular 'installation' would return a locally cached version of a page when the wireless network was switched off. Fair enough, no problem. We cleared the cache of the handheld and started sending the following headers:
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Last-Modified: Thu, 30 Jul 2009 16:42:08 GMT
The Last-Modified header is calculated on the fly and set to 'now'.
Even so, the handheld seems to be caching these pages: the page is sent with the headers, but then when they disconnect the wireless network and click a link to the page (which was not supposed to be cached), it still returns the cached file.
Are there some other headers that should be sent, or is this just a problem with Pocket IE? Or is it possibly something entirely different?
Thanks!
I'm not sure I can answer your question since I have no Pocket IE to test with, but maybe I can offer something that can help.
This is a very good caching reference: http://www.mnot.net/cache_docs/
Also, I'm not sure whether your example is the pasted results of your headers, or the code that you've set up to send the headers, but I believe the collection of headers in most language implementations (and by extension I assume most browser implementations) is treated as a map; therefore, it's possible you've overwritten "no-store, no-cache, must-revalidate" with the second "Cache-Control" header. In other words, only one can get sent, and if last wins, you only sent "post-check=0, pre-check=0".
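In PHP, for example (your question doesn't show the server code, so this is only an illustration of the point), a second header() call silently replaces the first unless you tell it not to:

<?php
// By default, header() REPLACES any earlier header with the same name:
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0');        // wipes the line above

// Pass false as the second argument to append instead of replace...
header('Cache-Control: no-store, no-cache, must-revalidate');
header('Cache-Control: post-check=0, pre-check=0', false);

// ...or, simpler still, send everything in a single header:
header('Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0');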
You could also try adding max-age=0 to the Cache-Control header.
In my experience both Firefox and IE have seemed more sensitive to pages served by HTTPS as well. You could try that if you have it as an option.
If you still have no luck, and Pocket IE is behaving clearly differently from Windows IE, then my guess is that the handheld has special rules for caching based on the assumption that it will often be away from internet connectivity.
Edit:
After you mentioned CNN.com, I realized that you do not have the "private" directive in your Cache-Control. I think this is what is making CNN.com's page cache but not yours. I believe "private" is the strictest setting available in the Cache-Control header. Try adding it.
For example, here are CNN's headers. (I don't think listing "private" twice has any effect)
Date: Fri, 31 Jul 2009 16:05:42 GMT
Server: Apache
Accept-Ranges: bytes
Cache-Control: max-age=60, private, private
Expires: Fri, 31 Jul 2009 16:06:41 GMT
Content-Type: text/html
Vary: User-Agent,Accept-Encoding
Content-Encoding: gzip
Content-Length: 21221
200 OK
If you don't have the Firefox Web Developer Toolbar, it's a great tool to check the response headers of any site - in the "Information" dropdown, "View Response Headers" is at the bottom.
Although Renesis has been awesome in trying to help me here, I've had to give up.
By 'give up' I mean I've cheated. Instead of trying to resolve this issue on the client side, I went the server-side route.
What I ended up doing was writing a function in PHP that takes a URL and essentially makes it unique. It does this by adding a random GET parameter based on a call to uniqid(). I then do a couple of other little things to it: add a '?' or a '&' to the URL depending on whether other GET parameters already exist, make sure that any '#' anchor is pushed right to the end, and then return that URL to the browser.
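Roughly, the function looks like this (a reconstruction of the idea described above, not the exact code; the parameter name 'nocache' and the example URL are arbitrary):

<?php
// Make a URL unique so no cache will ever have seen it before.
function uncacheable_url($url)
{
    // Split off any '#' fragment so it can be re-appended at the very end.
    $fragment = '';
    $pos = strpos($url, '#');
    if ($pos !== false) {
        $fragment = substr($url, $pos);
        $url      = substr($url, 0, $pos);
    }

    // Use '?' or '&' depending on whether the URL already has GET parameters.
    $separator = (strpos($url, '?') === false) ? '?' : '&';

    // Append a throwaway parameter based on uniqid().
    return $url . $separator . 'nocache=' . uniqid() . $fragment;
}

// e.g. http://example.com/page.php?id=1#top
// becomes http://example.com/page.php?id=1&nocache=4f1a2b...#top
echo uncacheable_url('http://example.com/page.php?id=1#top');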
This essentially resolves the issue as each link the browser ever sees is unique: it's never seen that particular URL before and so can't retrieve it from the cache.
Hackish? Yes. Working? So far, so good.