Varnish caching too many files and not caching PHP - varnish

I'm using Varnish without touching any configuration (just the port forwarding to Apache on 8080). But I have two issues:
1. I visit the URL of an image, I delete the image, and when I visit it again it still exists … Varnish cached it … how can I tell Varnish to check first whether the file at least exists before serving it from its cache?
2. The PHP files are not being cached (I mean, the HTML content generated by the PHP). I always see Age: 0 in the headers … any clue?
Thank you!

I visit the URL of an image, I delete the image, and when I visit it again it still exists … Varnish cached it … how can I tell Varnish to check first whether the file at least exists before serving it from its cache?
Eh, the whole purpose of caching is not having to do the same work (like checking for the existence of a file and loading it, or generating a PHP response) over and over again, but to reuse the generated response. Varnish never knew about the existence of any file to begin with (your backend server did that work), so it can never check whether 'the file at least exists'.
There are, however, ways to instruct Varnish not to cache URLs forever. For instance: if your backend response instructs any cache not to reuse the result (certain HTTP response headers indicate this), Varnish will not cache it. Varnish is also smart enough (by default) not to cache responses with cookies (which probably answers your second question). You can tell Varnish to cache a response only for a certain period (like 30 seconds), so your deletes will be picked up pretty quickly. You could also PURGE URLs from Varnish after you change or delete a file. If your backend server does not indicate this correctly with its response headers, you can override the behavior by writing your own .vcl file, as sketched below.
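For instance, a minimal Varnish 3-style sketch of the 30-second idea (the TTL value and the image-extension match are assumptions for illustration):

sub vcl_fetch {
    # Cap how long Varnish may reuse cached images, so a deleted
    # file stops being served within 30 seconds at most.
    if (req.url ~ "\.(png|gif|jpe?g)$") {
        set beresp.ttl = 30s;
    }
}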
The PHP files are not being cached (I mean, the HTML content generated by the PHP). I always see Age: 0 in the headers … any clue?
I can guess: you're setting cookies. But it would really help if you added the response headers to your question.
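If cookies do turn out to be the culprit, a common fix is to strip them for pages that don't need them. A minimal sketch, assuming (hypothetically) that only /admin and /login genuinely need cookies:

sub vcl_recv {
    # Anonymous pages never need the client's cookies; removing them
    # lets Varnish cache the PHP-generated HTML.
    if (req.url !~ "^/(admin|login)") {
        unset req.http.Cookie;
    }
}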

Related

Express JS - Static files not being served from cache

I'm running an application on Express and my browser keeps fetching files that should already be cached. The status code for the offending files is 304 and the size is consistently 220 B / 221 B. Other resources (which are getting served properly) show '(from cache)'.
A bit more information: the ETags / file contents haven't changed, and I've set these response headers:
res.set('Cache-Control', 'max-age=345600');
res.set('Expires', new Date(Date.now() + 345600000).toUTCString());
Admittedly, I'm no HTTP expert, but maybe someone can help me understand why this might be happening?
Essentially, the browser IS caching and serving the cached bundles (although it doesn't display the "from cache" message). In order to serve them, it sends a request to the server to check whether the file has changed. If it hasn't changed, the server sends a 304 response code and the browser pulls the file from its cache. This takes about 15-50 ms, so it's not a substantial performance impact.
However, I CAN force the browser to show the file without sending a verification request (like externally hosted libraries, for example). That would require setting Expires/Cache-Control headers for the far future, time-stamping the filenames of static assets, and serving them dynamically (by maybe writing the updated filenames to a configuration file or something like that), but honestly I think this would be more trouble than it's worth.
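For anyone who does want to go that route, here is a minimal sketch, assuming a recent enough express/serve-static that supports the maxAge and immutable options (the /static route and public directory are hypothetical):

var express = require('express');
var app = express();

// Far-future caching for fingerprinted assets (e.g. bundle.3f9a2c.js):
// the browser reuses its copy without sending a conditional request.
app.use('/static', express.static('public', {
  maxAge: '365d',  // sets Cache-Control: public, max-age=31536000
  immutable: true  // marks the response as never changing (newer browsers)
}));

app.listen(3000);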
Just posting this response for anyone who runs into the same issue.

Varnish HitPass debugging

I've noticed an issue on one of my sites whereby my content pages (which shouldn't set any cookies, should all return "Cache-Control: public" with a max-age set, and don't require authorization) are not being cached.
My issue is that somehow HitPass objects are making it into my cache, removing the caching from that page. I need to debug this, but am confused at exactly how best to do this particularly as I'm unable to replicate the issue.
I notice that Varnish gives me an ID beside the HitPass entry in the Varnish log. I assume this is the Varnish ID for the request that generated the HitPass, and that searching back in the log would tell me exactly what was wrong with the response?
Would it be better to just remove the Set-Cookie header from pages that I want to cache? The problem is that vcl_fetch is called even if a URL is passed… Is there any way to tell in vcl_fetch whether or not the current request has been passed by vcl_recv?
Set-Cookie is indeed a reason why you get hit-for-pass objects in your cache. This is an important optimization for sites that aren't prepared for caching: a hit-for-pass object lets Varnish go straight to the backend for each of these requests instead of stalling them to wait for the response to the previous one.
I'm not sure exactly what you want to debug. If it's the Set-Cookie, you should probably either remove it from the backend or write your own rules about which responses to cache and which to ignore. If you still need the Set-Cookie and it has unique values, hit-for-pass is the best way to handle it.
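As for telling in vcl_fetch whether vcl_recv passed the request: as far as I know there is no built-in variable for that in this version of Varnish, but a common trick is to tag the request yourself. A sketch, where the X-Passed header and the pass condition are hypothetical:

sub vcl_recv {
    if (req.http.Authorization) {     # hypothetical pass condition
        set req.http.X-Passed = "1";  # hypothetical marker header
        return (pass);
    }
}

sub vcl_fetch {
    # req.* is still visible here, so the marker tells us this fetch
    # belongs to a passed request rather than a cache miss.
    if (req.http.X-Passed) {
        return (deliver);
    }
}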

Varnish Cache first time hit

I'm running Varnish on a dedicated server. When I load a page, it is delivered via Apache, and on the second and subsequent hits it is delivered via the Varnish cache (i.e. I can see two timestamps in the X-Varnish header).
But when I open up the same page from some other computer, it's again delivered from the backend (Apache) the first time, and on further reloads it comes from Varnish.
If a page is already in the Varnish cache, isn't it supposed to be delivered via Varnish even for a new computer's first visit? I've tried simple hello-world PHP files without any database calls, with the same effect. Might something be wrong with my VCL file, or does Varnish just work this way?
Check whether you are sending session data (cookies), which makes requests look like unique calls to Varnish. The docs show you how to strip cookies.
Jon is right. I had a similar problem. You also need to clear your cookies and cache before testing. Check the response headers of the first visit: if the server tries to set a cookie, you can unset beresp.http.Set-Cookie under vcl_fetch.
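For example, a minimal sketch (this unsets the cookie unconditionally; in practice, restrict it to pages that are safe to share between users):

sub vcl_fetch {
    # Drop the backend's Set-Cookie so the response becomes cacheable.
    unset beresp.http.Set-Cookie;
}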

What are HTTP cache best practices for a high-traffic static site?

We have a fairly high-traffic static site (i.e. no server code), with lots of images, scripts, css, hosted by IIS 7.0
We'd like to turn on some caching to reduce server load, and are considering setting the expiry of web content to some time in the future. In IIS, we can do this at a global level via the "Expire web content" section of the common HTTP headers in the IIS response header module, perhaps setting content to expire 7 days after serving.
All this actually does, so far as I can tell, is set the max-age HTTP response header, which makes sense, I guess.
Now, the confusion:
Firstly, all browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if content has changed. So I'm not entirely sure what the max-age response header will actually affect. Could it be older browsers? Or web caches?
It is possible that we may want to change an image on the site at short notice… I'm guessing that if the max-age is actually honored by something, then by its very nature it won't check whether this image has changed for 7 days… so that's not what we want either.
I wonder if a best practice is to partition one's site into folders of content that really won't change often, and only turn on long-term expiry for those folders? Perhaps varying the query string to force a refresh of content in these folders if needed (e.g. /assets/images/background.png?version=2)?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
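For example (the file names and fingerprints here are hypothetical):

<!-- Before the update, cached with a far-future expiry: -->
<link rel="stylesheet" href="/assets/css/site.3f9a2c.css">
<!-- After updating the CSS, only the short-lived HTML changes: -->
<link rel="stylesheet" href="/assets/css/site.8b1d4e.css">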
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a server to six (6). This means if you have 30 or more resources that are all hosted on the same website, only 6 are being downloaded at any time until the page is loaded. You can speed this up a bit by using a content delivery network (CDN) so that everything downloads at once.

Does the browser always request a cached file?

Does the browser always request a cached file on each request (e.g., a CSS style sheet or .js JavaScript file that has been sent previously)?
I'm not sure, but I think the answer is "no, it does not".
But then why does the Apache log show that the cached file was requested again?
What is the default behavior?
It really depends on how the page is coded. For example, one can force a web browser to request a script from the web server rather than use its cached copy. So, in short, the browser doesn't "always" use the cached copy, but it does most of the time.
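A common way to force such a request is a cache-busting query string (the parameter name and value here are hypothetical): changing it makes the URL unique, so the browser cannot match it against its cached copy.

<script src="/js/app.js?v=2"></script>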
