I have no experience with Varnish, so please bear with me.
We have inserted Google Tag Manager into a clients site. The Tag Manager injects Google Analytics tracking code (and nothing else) into the page. The clients technical service provider has now complained that the Tag Manager prevents the Varnish cache from working.
My guess is that this has nothing to do with the tag manager as such but is rather caused by the cookies from Google Analytics - apparently in the default configuration pages with cookies are not cached. However since I'm not very familiar with Varnish I cannot speak with any authority in the matter.
So my question is: is there any reason why Google Tag Manager itself (not any tags inside the tag manager) would invalidate a Varnish cache on each request ? A web search turned up nothing specific regarding Varnish and GTM.
Thank you for your time,
Eike
Google Tag Manager will not interfere with Varnish cache in any way. The reason being is that the requests for Google Tag Manager are sent to google-analytics.com, not your website.
The cookies are then set by google-analytics.com and are only sent between the clients browser and google-analytics.com.
This means that Google Tag Manager does not actually have any affect on your website apart from the initial Javascript being loaded from there.
In fact varnish does not validate any cookie that is created through javascript, only caches the "set-cookie header" of the http request.
The problem you may be having is, if the "DataLayer" is placed in the html code, the values of the variables do not change as they would be in cache.
To solve this problem, we must make another http call (ex. ajax) does not to cache, it returns the variables for DataLayer.
Related
I am using two application load balancers that are routing requests to 4 backend varnish servers. I got answers to configure the PHP file to purge the cache but I have no idea where to put it and how to execute.
For which type of PHP application are you trying to configure cache purging?
A custom application?
WordPress?
Drupal?
Magento?
Some other CMS or framework?
If you're using an existing platform, CMS, or framework, the documentation will probably state how to configure purging.
Varnish Configuration
Of course, the Varnish VCL code should also be tuned to process purges.
You can find more information about purging (and banning) in Varnish on http://varnish-cache.org/docs/6.0/users-guide/purging.html
Here are the questions you should ask yourself regarding purging. Maybe the documentation of your CMS or framework will answer these as well?
Are you trying to purge individual URLs?
Does your code have pattern matching in play to invalidate multiple URLs at once (uses bans in VCL)
If pattern matching is used, are you sending the invalidation pattern via an HTTP request header?
Does your invalidation code use the URL to identify objects in cache, or does it rely on tagged content?
Are you restricting access to the purging mechanism based on IP address or subnet. If so, please configure an ACL in VCL.
Many WordPress plugins rely on individual URL purging. Other WordPress plugins use bans through request header patterns.
Drupal uses bans, but has a system in place that tags content. The ban patterns don't match URLs, but tags.
Magento uses bans.
Conclusion
If you use a CMS or framework, the purging strategy is set in advance. It's just a matter of configuring your app and making sure the VCL can handle it.
If you have custom code, you have a choice, and you can implement purging or banning.
Please have a look at the user guide section about purging I mentioned above. It should help you understand the underlying mechanism.
I have a setup a azure cdn that point to my webapp. while i am changing in my style sheet and deploying webapp, the styles are updating immediately. so is there no any rquiremtn for purge in this case? does in this case cdn automatically update styles from webapp?
I am working according to this article
https://azure.microsoft.com/en-in/documentation/articles/cdn-websites-with-cdn/
If the URL of the resource remains the same, the CDN servers (and the browsers) are free to cache them. So, if you are using CDN, you need to force a URL change every time the file content changes (commonly done by adding a version string).
Since, it is working for you, either your files are not getting served from the CDN at all or somehow the URL is getting updated.
Look at the URL from where your style sheet is getting fetched (network tab in the browser's debugger). Make sure the URL path is actually from the CDN and not your website directly.
If you have a MVC.net app and you are using System.Web.Optimization.BundleCollection for style bundle, it add a query parameter to the URL embedded in the HTML and changes it if the file contents change. This ensures that the stale cached copies of the resources are not used.
See CDN and bundle caching sections at http://www.asp.net/mvc/overview/performance/bundling-and-minification
No, CDN does not automatically update the CSS for webapp.
To be safe, you should always purge.
CDN is a global service, you saw the CSS update doesn't mean everyone else all see the CSS update. Another IP address might still have the old CSS cached.
Besides, cache control header also plays a role here.
I'm using SSL on my website and it is giving me the lock with yellow triangle icon ("The site uses SSL, but Google Chrome has detected insecure content on the page.")
On clicking the lock icon it says:
Your connection to domainname is encrypted with 256-bit encryption. However, this page includes other resources which are not secure. These resources can be viewed by others while in transit, and can be modified by an attacker to change the look of the page. The connection uses TLS 1.0. The connection is encrypted using AES_256_CBC, with SHA1 for message authentication and DHE_RSA as the key exchange mechanism. The connection is not compressed.
How do I ensure I get the green lock?
You must have resources (images, stylesheets, scripts, etc...) which are embedded on the page but are not served over https. Make sure all your resources are served over https, and that warning should go away.
I had the same problem and it occoured because I included a script from Google Analytics using HTTP.
With a provider like Google, one can simply change HTTP to HTTPS - and it will work. This will not work with all providers.
If you are trying to load something from a website that you own, you will have to secure that website with HTTPS.
Google Chrome will detect this and automatically not load the insecure content (from the HTTP domain) which may take away some functionality from the website.
Certain AV/Malware softwares will also detect this and give a security warning which may frighten your visitors away.
If you are using Google Chrome, then you might not notice such a warning, because the AV/Malware software never sees this HTTP-link because it is blocked by Google Chrome.
And if you do not have the kind of AV/Malware software that detects this then you may never notice such a warning while the visitors are.
What you must do is:
Install Google Chrome and go to the website.
Click on "Tools >> JavaScript Console" and see if any warning appears. (this has been commented by Brad Koch on the question as well)
Go throught the different pages on your website and see if any errors appear - if so, then go change the URLs to HTTPS (if this is possible) or find another provider for this javascript.
make sure all references to resources such as images, js files, css files, ads, etc are served through https. If the uri to the resource is relative, e.g. /images/logo.png, then the resource is fetched from the same host and port and protocol as the page itself, in your case https. I would use fiddler to find what files get fetched over http:// when the page is loaded.
This is what I get when I go to TOOLS and then click on JAVASCRIPT CONSOLE ::
Failed to load resource chrome://thumb/https://accounts.google.com/ServiceLogin?service=chromiumsyn...
s%3A%2F%2Fwww.google.com%2Fintl%2Fen-US%2Fchrome%2Fblank.html%3Fsource%3D1
Failed to load resource chrome://thumb/http://www.xe.com/
What do I do after this?
We have a fairly high-traffic static site (i.e. no server code), with lots of images, scripts, css, hosted by IIS 7.0
We'd like to turn on some caching to reduce server load, and are considered setting the expiry of web content to be some time in the future. In IIS, we can do this on a global level via "Expire web content" section of the common http headers in the IIS response header module. Perhaps setting content to expire 7 days after serving.
All this actually does is sets the max-age HTTP response header, so far as I can tell, which makes sense, I guess.
Now, the confusion:
Firstly, all browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if content has changed. So, I'm not entirely sure what the max-age response header will actually effect?! Could it be older browsers? Or web-caches?
It is possible that we may want to change an image in the site at short notice... I'm guessing that if the max-age is actually used by something that, by its very nature, it won't then check if this image has changed for 7 days... so that's not what we want either
I wonder if a best practice is to partition one's site into folders of content really won't change often and only turn on some long-term expiry for these folders? Perhaps to vary the querystring to force a refresh of content in these folders if needed (e.g. /assets/images/background.png?version=2) ?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a server to six (6). This means if you have 30 or more resources that are all hosted on the same website, only 6 are being downloaded at any time until the page is loaded. You can speed this up a bit by using a content delivery network (CDN) so that everything downloads at once.
In Firefox or Chrome I'd like to prevent a private web page from making outgoing connections, i.e. if the URL starts with http://myprivatewebpage/ or https://myprivatewebpage/ in a browser tab, then that browser tab must be restricted so that it is allowed to load images, CSS, fonts, JavaScript, XmlHttpRequest, Java applets, flash animations and all other resources only from http://myprivatewebpage/ or https://myprivatewebpage/, i.e. an <img src="http://www.google.com/images/logos/ps_logo.png"> (or the corresponding <script>new Image(...) must not be able to load that image, because it's not on myprivatewebpage. I need a 100% and foolproof solution: not even a single resource outside myprivatewebpage can be accessible, not even at low probability. There must be no resource loading restrictions on Web pages other than myprivatewebpage, e.g. http://otherwebpage/ must be able to load images from google.com.
Please note that I assume that the users of myprivatewebpage are willing to cooperate to keep the web page private unless it's too much work for them. For example, they would be happy to install a Chrome or Firefox extension once, and they wouldn't be offended if they see an error message stating that access is denied to myprivatewebpage until they install the extension in a supported browser.
The reason why I need this restriction is to keep myprivatewebpage really private, without exposing any information about its use to webmasters of other web pages. If http://www.google.com/images/logos/ps_logo.png was allowed, then the use of myprivatewebpage would be logged in the access.log of Google's ps_logo.png, so Google's webmasters would have some information how myprivatewebpage is used, and I don't want that. (In this question I'm not interested in whether the restriction is reasonable, but I'm only interested in the technical solutions and its strengths and weaknesses.)
My ideas how to implement the restriction:
Don't impose any restrictions, just rely on the same origin policy. (This doesn't provide the necessary protection, the same origin policy lets all images pass through.)
Change the web application on the server so it generates HTML, JavaScript, Java applets, flash animations etc. which never attempt to load anything outside myprivatewebpage. (This is almost impossibly hard to foolproof everywhere on a complicated web application, especially with user-generated content.)
Over-sanitize the web page using a HTML output filter on the server, i.e. remove all <script>, <embed> and <object> tags, restrict the target of <img src=, <link rel=, <form action= etc. and also restrict the links in the CSS files. (This can prevent all unwanted resources if I can remember all HTML tags properly, e.g. I mustn't forget about <video>. But this is too restrictive: it removes all dyntamic web page functionality like JavaScript, Java applets and flash animations; without these most web applications are useless.)
Sanitize the web page, i.e. add an HTML output filter into the webserver which removes all offending URLs from the generated HTML. (This is not foolproof, because there can be a tricky JavaScript which generates a disallowed URL. It also doesn't protect against URLs loaded by Java applets and flash animations.)
Install a HTTP proxy which blocks requests based on the URL and the HTTP Referer, and force all browser traffic (including myprivatewebpage, otherwebpage, google.com) through that HTTP proxy. (This would slow down traffic to other than myprivatewebpage, and maybe it doesn't protect properly if XmlHttpRequest()s, Java applets or flash animations can forge the HTTP Referer.)
Find or write a Firefox or Chrome extension which intercepts all outgoing connections, and blocks them based on the URL of the tab and the target URL of the connection. I've found https://developer.mozilla.org/en/Setting_HTTP_request_headers and thinkahead.js in https://addons.mozilla.org/en-US/firefox/addon/thinkahead/ and http://thinkahead.mozdev.org/ . Am I correct that it's possible to write a Firefox extension using that? Is there such a Firefox extension already?
Some links I've found for the Chrome extension:
http://www.chromium.org/developers/design-documents/extensions/notifications-of-web-request-and-navigation
https://groups.google.com/a/chromium.org/group/chromium-extensions/browse_thread/thread/90645ce11e1b3d86?pli=1
http://code.google.com/chrome/extensions/trunk/experimental.webRequest.html
As far as I can see, only the Firefox or Chrome extension is feasible from the list above. Do you have any other suggestions? Do you have some pointers how to write or where to find such an extension?
I've found https://developer.mozilla.org/en/Setting_HTTP_request_headers and thinkahead.js in https://addons.mozilla.org/en-US/firefox/addon/thinkahead/ and http://thinkahead.mozdev.org/ . Am I correct that it's possible to write a Firefox extension using that? Is there such a Firefox extension already?
I am the author of the latter extension, though I have yet to update it to support newer versions of Firefox. My initial guess is that, yes, it will do what you want:
User visits your web page without plugin. Web page contains ThinkAhead block that would send a simple version header to the server, but this is ignored as plugin is not installed.
Since the server does not see that header, it redirects the client to a page to install the plugin.
User installs plugin.
User visits web page with plugin. Page sends version header to server, so server allows access.
The ThinkAhead block matches all pages that are not myprivatewebpage, and does something like set the HTTP status to 403 Forbidden. Thus:
When the user visits any webpage that is in myprivatewebpage, there is normal behaviour.
When the user visits any webpage outside of myprivatewebpage, access is denied.
If you want to catch bad requests earlier, instead of modifying incoming headers, you could modify outgoing headers, perhaps screwing up "If-Match" or "Accept" so that the request is never honoured.
This solution is extremely lightweight, but might not be strong enough for your concerns. This depends on what you want to protect: given the above, the client would not be able to see blocked content, but external "blocked" hosts might still notice that a request has been sent, and might be able to gather information from the request URL.