Varnish Cache first time hit

I'm running Varnish on a dedicated server. When I load a page, it is delivered by Apache, and on the second and subsequent hits it is delivered by Varnish Cache (i.e. I can see two transaction IDs in the X-Varnish header).
But when I open the same page from another computer, it is again delivered by the backend (Apache) the first time, and only on further reloads does it come from Varnish.
If a page is already in the Varnish cache, isn't it supposed to be delivered by Varnish even on a new computer's first visit? I've tried simple hello-world PHP files without any database calls, with the same effect. Is something wrong with my VCL file, or does Varnish just work this way?

Check whether you are sending session data (cookies), which makes each request look unique to Varnish. The docs show how to strip cookies.

Jon is right. I had a similar problem. You also need to clear your cookies and cache before testing. Check the response headers on the first visit to see whether the backend tries to set a cookie. If so, you can unset beresp.http.Set-Cookie under vcl_fetch.
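A minimal VCL sketch of both suggestions, assuming Varnish 3.x (which still has vcl_fetch) and assuming no page on the site actually needs cookies:

    sub vcl_recv {
        # Strip cookies from incoming requests so every visitor's request
        # looks the same to Varnish (assumes there are no logged-in pages).
        unset req.http.Cookie;
    }

    sub vcl_fetch {
        # Drop Set-Cookie from backend responses so the object can be
        # cached instead of becoming a hit-for-pass.
        unset beresp.http.Set-Cookie;
    }

With something like this in place, the first visitor warms the cache and every later visitor, from any computer, should get a Varnish hit.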

Related

How to identify https clients through proxy connection

We have developed a corporate NodeJS application served over the http/2 protocol, and we need to identify clients by their IP address because the server needs to send events to clients based on their IP (basically some data about phone calls).
I can successfully get the client IP through req.connection.remoteAddress, but some of the clients can only reach the server through our proxy server.
I know about the x-forwarded-for header, but this doesn't work for us because proxies can't modify HTTP headers in SSL connections.
So I thought I could get the IP on the client side and send it back to the server, for example during the login process.
But, if I'm not wrong, browsers don't expose that information to JavaScript, so we need a way to obtain it first.
Researching it, the only option I found was to ask a server that could tell me the IP from which I'm reaching it.
Of course, over https I can't, because of the proxy. But I can easily enable an http service just to serve the client IP.
But then I found out that browsers block http requests from https-served pages because of the "mixed active content" restriction.
I read about it and found out that "mixed passive content" can still be fetched, and I succeeded in downloading garbage data as an image file through an <img> tag, but when I try to do the same thing using an <object> element I get a "mixed active content" block again, even though the MDN documentation says it is considered passive.
Is there any way to read that data through that (broken) <img> tag, or am I missing something to make the <object> element really passive?
Any other idea to achieve our goal will also be welcome.
Finally I found a solution:
As I said, I was able to perform an http request by inserting an <img> tag.
What I was unable to do was read the downloaded data, regardless of whether it was an actual image or not.
...but the fact is that the request is made, and the URL it goes to is something I can decide beforehand.
So all I need to do is generate a random key for each served login screen and:
Remember it in association with your session data.
Insert a (possibly hidden) <img> tag pointing to an http URL containing that key.
As soon as your http server receives the request to download that image, you can read the real IP through the x-forwarded-for header (trusting your proxy, of course) and resolve which active session it belongs to.
Of course, you must also take care to expire keys, used or not, after a short time, both to avoid a memory leak and to prevent them from being reused with malicious intent.
FINAL NOTE: The only drawback of this approach is the risk that, some day, browsers could start blocking mixed passive content by default too.
For this reason I in fact opted for a double strategy. That is: in addition to the technique explained above, I also implemented an http redirector which does almost the same thing: it redirects all requests to the root route ("/") to our https app, but it does so via a POST request containing a key that has previously been associated with the client's real IP.
This way, if some day the first approach stops working, users would still be able to enter through http first, which is in fact what we are going to do. But the first approach, as long as it keeps working, avoids problems if users bookmark the page from within the app (which results in a bookmark to its https URL).

Cache control header not working

I have set the Cache-Control response header to Cache-Control: public, max-age=86400. But when I refresh the page or open it in a new tab, it always hits my server. The response status I get is 200, the request appears in the server log, and when I check chrome://cache/ the request is not in the list. I have already looked at similar SO questions ("cache-control not working without etag" and "why cache-control:max-age don't work?"), but still with no luck. Tested on Chrome 56.
Chrome disables the cache while DevTools is open, or at least it does in Chrome 59. Open DevTools, go to Network, and uncheck "Disable cache" at the top. Now you should be able to refresh the page and see it in chrome://cache.
Cache-Control tells your browser (and proxy servers like Squid) which resources it must not cache and how long it may reuse the others, but it does not force your browser to cache a resource.
I recommend checking the error logs to see whether the request really reaches the backend or stays in the browser.
In my case, the browser shows 200 OK in the console, but according to the error_log the request never reaches the backend...
The Cache-Control response header is not honored on a page refresh. Try making the request twice without refreshing the page, and you will see it served from cache (the request won't reach your server).
To achieve what you want you might have to cache the response yourself in localStorage, or cache it through a back-end caching library.

Varnish cache and Google Tag Manager

I have no experience with Varnish, so please bear with me.
We have inserted Google Tag Manager into a client's site. The Tag Manager injects the Google Analytics tracking code (and nothing else) into the page. The client's technical service provider has now complained that the Tag Manager prevents the Varnish cache from working.
My guess is that this has nothing to do with the Tag Manager as such but is rather caused by the cookies from Google Analytics - apparently, in the default configuration, pages with cookies are not cached. However, since I'm not very familiar with Varnish, I cannot speak with any authority on the matter.
So my question is: is there any reason why Google Tag Manager itself (not any tags inside the Tag Manager) would invalidate a Varnish cache on each request? A web search turned up nothing specific regarding Varnish and GTM.
Thank you for your time,
Eike
Google Tag Manager will not interfere with the Varnish cache in any way. The reason is that the requests for Google Tag Manager are sent to google-analytics.com, not to your website.
The cookies are then set by google-analytics.com and are only sent between the client's browser and google-analytics.com.
This means that Google Tag Manager does not actually have any effect on your website apart from the initial JavaScript being loaded from there.
In fact, Varnish never sees cookies that are created through JavaScript; it only reacts to cookies carried in the HTTP exchange (the Cookie request header and the Set-Cookie response header).
The problem you may be having is that, if the dataLayer is populated directly in the HTML, the values of its variables will not change, because the HTML is served from cache.
To solve this, make a separate HTTP call (e.g. via Ajax) that is not cached and returns the per-visitor values for the dataLayer, as in the sketch below.
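For example, a hypothetical /datalayer endpoint (the URL is an assumption, not something GTM requires) could be excluded from the cache in VCL so the Ajax call always returns fresh values:

    sub vcl_recv {
        # Hypothetical endpoint returning per-visitor dataLayer values;
        # always go to the backend for it, never serve it from the cache.
        if (req.url ~ "^/datalayer") {
            return (pass);
        }
    }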

Varnish HitPass debugging

I've noticed an issue on one of my sites affecting my content pages, which shouldn't set any cookies, should all return "Cache-Control: public" with a max-age set, and don't require authorization.
My issue is that somehow HitPass objects are making it into my cache, removing the caching from those pages. I need to debug this, but am unsure exactly how best to do it, particularly as I'm unable to replicate the issue.
I notice that Varnish gives me an ID beside the HitPass entry in the varnish log. I assume this is the Varnish ID for the request that generated the HitPass, and that searching back in the varnish log would tell me exactly what was wrong with the response?
Would it be better to just remove the Set-Cookie header from pages that I want to cache? The problem is that vcl_fetch is called even if a URL is passed... Is there any way to tell in vcl_fetch whether or not the current request has been passed by vcl_recv?
Set-Cookie is indeed a reason why you get hit-for-pass objects in your cache. This is an important optimization for sites that are not prepared for caching: a hit-for-pass object lets Varnish go straight to the backend for each of these requests instead of stalling them while waiting for the response to the previous one.
I'm not sure exactly what you want to debug. If it's the Set-Cookie, you should probably either remove it on the backend or write your own rules about which responses to cache and which cookies to ignore. If you still need the Set-Cookie and it carries unique values, hit-for-pass is the best way to handle it.
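A rough VCL 3.x sketch of both ideas raised in the question: mark passed requests in vcl_recv so vcl_fetch can tell them apart, and strip Set-Cookie only on URLs you know are safe to cache (the /admin and /content/ prefixes are only placeholders):

    sub vcl_recv {
        if (req.url ~ "^/admin") {
            # Flag the request so vcl_fetch knows it was passed.
            set req.http.X-Passed = "1";
            return (pass);
        }
    }

    sub vcl_fetch {
        if (!req.http.X-Passed && req.url ~ "^/content/") {
            # Content pages are anonymous; drop any stray Set-Cookie so
            # they are cached instead of becoming hit-for-pass objects.
            unset beresp.http.Set-Cookie;
        }
    }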

Varnish caching too many files and not caching PHP

I'm using Varnish without touching any configuration (just forwarding the port to Apache on 8080).
But I got two issues:
I visit the URL of an image, I delete the image, and when I visit it again it still exists … Varnish cached it … how can I tell Varnish to first check that the file at least exists before serving it from its cache?
The PHP files are not being cached (I mean the HTML content generated by the PHP). I always see Age: 0 in the headers … any clue?
Thank you!
I visit the URL of an image, I delete the image, and when I visit it again it still exists … Varnish cached it … how can I tell Varnish to first check that the file at least exists before serving it from its cache?
Eh, the whole purpose of caching is not having to do the same work (like checking for existence and loading a file, or generating a PHP response) over and over again, but to reuse the generated response. Varnish never knew about the existence of any file to begin with (your backend server did that work), so it can never check whether 'the file at least exists'.
There are, however, ways to instruct Varnish not to cache URLs forever. For instance, if your backend response instructs caches not to reuse the result (certain HTTP response headers indicate this), Varnish will not cache it. Varnish is also smart enough (by default) not to cache responses with cookies (which probably answers your second question). You can tell Varnish to cache a response for only a certain period (like 30 seconds), so your deletes will be picked up quickly, or you can PURGE URLs from Varnish after you change or delete a file. If your backend server does not signal this correctly with its response headers, you can override the behavior by writing your own .vcl file.
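A minimal sketch of the TTL option, assuming Varnish 3.x syntax; the 30-second value and the image extensions are only examples:

    sub vcl_fetch {
        # Cap how long static images are kept so that a deleted file
        # drops out of the cache within 30 seconds at most.
        if (req.url ~ "\.(png|jpe?g|gif)$" && beresp.ttl > 30s) {
            set beresp.ttl = 30s;
        }
    }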
The PHP files are not being cached (I mean the HTML content generated by the PHP). I always see Age: 0 in the headers … any clue?
I can guess: you're setting cookies. But it would really help if you added the response headers to your question.
