Varnish HitPass debugging

I've noticed an issue on one of my sites involving my content pages, which shouldn't set any cookies, should all be returning "Cache-Control: public" with a max-age set, and don't require authorization.
My issue is that somehow HitPass objects are making it into my cache, disabling caching for those pages. I need to debug this, but I'm unsure how best to do it, particularly as I'm unable to replicate the issue.
I notice that Varnish gives me an ID beside the HitPass in the Varnish log. I assume this is the ID of the request that generated the HitPass, and that searching back in the Varnish log for it would tell me exactly what was wrong with the response?
Would it be better to just remove the Set-Cookie header from pages that I want to cache? The problem is that vcl_fetch is called even if a URL is passed... Is there any way to tell in vcl_fetch whether or not the current request has been passed by vcl_recv?

Set-Cookie is indeed one reason why you get hit-for-pass objects in your cache. This is an important optimization for sites that weren't prepared for caching. A hit-for-pass object lets Varnish send each of these requests straight to the backend, instead of stalling them while it waits for the response to the previous one.
I'm not sure exactly what you want to debug. If it's the Set-Cookie, you should probably either remove it at the backend or write your own rules for which responses to cache and which ones to ignore. If you still need the Set-Cookie and it has unique values, hit-for-pass is the best way to handle that.
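To make the hit-for-pass mechanism concrete, here is a toy Python model (not Varnish code) of the decision the built-in vcl_fetch makes in Varnish 3.0, as I read the default VCL that ships with it: a backend response with a TTL of zero or less, a Set-Cookie header, or "Vary: *" is stored as a hit-for-pass object for 120 seconds. The function name and its simplified inputs are hypothetical, and other Varnish versions use different built-in rules, so compare against the default VCL of your own version.

```python
from typing import Mapping, Tuple

# TTL that the Varnish 3.0 built-in vcl_fetch gives hit-for-pass objects.
HIT_FOR_PASS_TTL = 120.0  # seconds


def builtin_fetch_decision(ttl: float, headers: Mapping[str, str]) -> Tuple[str, float]:
    """Toy model (hypothetical helper) of Varnish 3.0's built-in vcl_fetch.

    ttl     -- the TTL Varnish computed from Cache-Control/Expires, in seconds
    headers -- the backend response headers (beresp.http.*)

    Returns the action taken and the TTL the stored object would get.
    """
    # HTTP header names are case-insensitive.
    lower = {name.lower(): value for name, value in headers.items()}
    if ttl <= 0 or "set-cookie" in lower or lower.get("vary") == "*":
        # Remember "don't cache this URL" so later requests skip the
        # waiting queue and go straight to the backend.
        return ("hit_for_pass", HIT_FOR_PASS_TTL)
    return ("deliver", ttl)


# A content page that is otherwise cacheable but leaks a session cookie:
page = {"Cache-Control": "public, max-age=300", "Set-Cookie": "PHPSESSID=abc123"}
print(builtin_fetch_decision(300, page))  # ('hit_for_pass', 120.0)

# The same page with the cookie stripped is cached for its full max-age:
del page["Set-Cookie"]
print(builtin_fetch_decision(300, page))  # ('deliver', 300)
```

Under these default rules, a single backend response that leaks a session cookie is enough to turn an otherwise cacheable page into a HitPass for two minutes, after which the next request fetches it from the backend again.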

Related

Amazon Cloudfront removes Referer header

I am using Amazon CloudFront to deliver some HDS files. I have an origin server which checks the HTTP Referer header and blocks the request if the referer is not allowed.
The problem is that CloudFront is removing the Referer header, so it is not forwarded to the origin.
Is it possible to tell Amazon not to do it?
Within days of writing the answer below, changes have been announced to Cloudfront. Cloudfront will now pass through headers you select and can add some headers of its own.
However, much of what I stated below remains true. Note that in the announcement, an option is offered to forward all headers which, as I suggested, would effectively disable caching. There's also an option to forward specific headers, which will cause Cloudfront to cache the object against the complete set of forwarded headers -- not just the uri -- meaning that the effectiveness of the cache is somewhat reduced, since Cloudfront has no option but to assume that the inclusion of the header might modify the response the server will generate for that request.
Each of your CloudFront distributions now contains a list of headers that are to be forwarded to the origin server. You have three options:
None - This option requests the original behavior.
All - This option forwards all headers and effectively disables all caching at the edge.
Whitelist - This option gives you full control of the headers that are to be forwarded. The list starts out empty, and grows as you add more headers. You can add common HTTP headers by choosing them from a list. You can also add "custom" headers by simply entering the name.
If you choose the Whitelist option, each header that you add to the list becomes part of the cache key for the URLs associated with the distribution. Adding a header to the list simply tells CloudFront that the value of the header can affect the content returned by the origin server.
http://aws.amazon.com/blogs/aws/enhanced-cloudfront-customization/
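As a rough illustration of the Whitelist behavior quoted above, the Python sketch below models a cache key as the URI plus the values of the whitelisted request headers. It is a hypothetical model, not CloudFront's actual key format, and the paths and domains are made up; it only shows why whitelisting Referer splits one object into one cache entry per referring page.

```python
from typing import Dict, Iterable, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def cache_key(uri: str, request_headers: Dict[str, str], whitelist: Iterable[str]) -> CacheKey:
    """Hypothetical cache key: the URI plus each whitelisted header's value."""
    # Header names are case-insensitive, so normalise before looking them up.
    lower = {name.lower(): value for name, value in request_headers.items()}
    parts = tuple(sorted((h.lower(), lower.get(h.lower(), "")) for h in whitelist))
    return (uri, parts)


a = {"Referer": "http://site-a.example/player"}
b = {"Referer": "http://site-b.example/player"}

# "None": both requests map to one cached object, whatever the referer.
print(cache_key("/video.f4m", a, []) == cache_key("/video.f4m", b, []))  # True
# "Whitelist" with Referer: every referring page gets its own cache entry.
print(cache_key("/video.f4m", a, ["Referer"]) == cache_key("/video.f4m", b, ["Referer"]))  # False
```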
Cloudfront does remove the Referer header along with several others that are not particularly meaningful -- or whose presence would cause illogical consequences -- in the world of cached content.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
Just like cookies, if the Referer: header were allowed to remain, such that the origin could see it and react to it, that would imply that the object should be cached based on the request plus the referring page, which would seem to largely defeat the cacheability of objects. Otherwise, if the origin did react to an undesired referer and send no-cache responses, that would be all well and good until the first legitimate request came in, the response to which would be served to subsequent requesters regardless of their referer, also largely defeating the purpose.
RFC-2616 Section 13 requires that a cache return a response that has been "checked for equivalence with what the origin server would have returned," and this implies that the response be valid based on all headers in the request.
The same thing goes for User-Agent and other headers an origin server might use to modify its response... if you need to react to these values at the origin, there's little obvious purpose in serving them through a CDN.
Referring page-based tests are quite a primitive measure, the way many people use them, since headers are so trivial to forge.
If you are dealing with a platform that you don't control, and this is something you need to override (with a dummy value, just to keep the existing system "happy,") then a reverse proxy in front of the origin server could serve such a purpose, with Cloudfront using the reverse proxy as its origin.
In today's newsletter, Amazon announced that it is now possible to forward request headers with CloudFront. See: http://aws.amazon.com/de/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/

How to prevent IIS from sending cache headers with ASHX files

My company uses ASHX files to serve some dynamic images. Because the content type is image/jpeg, IIS sends the same headers it would send for static images.
Depending on settings (I don't know all of the settings involved, hence the question) the headers may be any of:
Last-Modified, ETag, Expires
This causes the browser to treat them as cacheable, which leads to all sorts of bugs with the user seeing stale images.
Is there a setting that I can set somewhere that will cause ASHX files to behave the same way as other dynamic pages, like ASPX files? Short of that, is there a setting that will allow me to, across the board, remove Last-Modified, ETag, Expires, etc. and add a no-cache header instead?
The only solutions I've found are:
1) Adding Response.CacheControl = "no-cache" to each handler.
I don't like this because it requires all of the handlers to change and all developers to be aware of it.
2) Setting an HTTP header override on the folder where the handlers live.
I don't like this one because it requires the handlers to be in their own directory. While this may be good practice in general, unfortunately our application is not structured that way, and I cannot just move them because it would break client-facing links.
If nobody provides a better answer I'll have to accept that these are the only two choices.
Add a randomly generated string to the request's query string. This tricks the browser into thinking it is a different request. Example: document.getElementById("myimgcontl").src="myimages.ashx?15923763";
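The same cache-busting idea, sketched in Python for URLs you generate server-side: it appends a unique value to the query string while keeping any existing parameters. The helper name, the parameter name `_` and the use of `uuid4` are my own choices, not something the handler requires. Note that this defeats all browser caching for that URL, including caching you might actually want.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "_") -> str:
    """Return url with a unique query parameter so browsers treat it as a new resource."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep existing parameters (including blank ones) and add a fresh random value.
    params = parse_qsl(query, keep_blank_values=True)
    params.append((param, uuid.uuid4().hex))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


print(cache_busted("myimages.ashx?id=42"))  # e.g. myimages.ashx?id=42&_=<random hex>
```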

What is the best way to debug a VCL file?

I am writing inline C in my VCL file. More specifically I am using Maxmind's GeoIP database to geocode a visitor's IP. I have everything installed, I have followed all the wiki examples for GeoIP database and everything works swimmingly.
I am trying to now do some magic with GeoIP besides the return country examples. I want to return the visitor's city using the method GeoIP_record_by_addr(), which returns a pointer.
Problem: I cannot seem to correctly cast a GeoIPRecord* to char*. I have tried for hours. I get Varnish to compile my VCL file without any errors or notices, but the varnish server responds with 403.
Question: Is there any way I can debug either the inline C or the 403 Varnish is responding with?
Generally, Firebug and varnishlog will be your best friends.
If you want to debug pure VCL, the best way is to write data into HTTP headers ([req/bereq/beresp/resp].http.[header name]) and check their values in Firebug (or in varnishlog if you have few requests).
If you want to debug inline C, you can also play with headers (VRT_SetHdr()), but if your C code makes Varnish crash, you'll see why in /var/log/messages.
You can also check varnishlog to see if varnish crashes...but when varnish crashes, you get timeouts, not 403...
I'd have to see your VCL to understand why you get a 403, but technically it's not an "error" but a "status", meaning that your request has been processed by Varnish (and, unfortunately, forbidden somewhere).
I don't think Varnish would return a 403 unless you tell it to. So there's a good chance the 403 status comes from your web server (backend).
Anyway, your varnish doesn't seem to crash but rather have behavior issues.

Varnish Cache first time hit

I'm running Varnish on a dedicated server. When I load a page, it is delivered via Apache, and on the second and subsequent hits it is delivered via Varnish Cache (i.e. I can see two IDs in the X-Varnish header).
But when I open the same page from another computer, it's again delivered from the backend (Apache) the first time, and on further reloads it comes from Varnish.
If a page is already in Varnish Cache, isn't it supposed to be delivered via Varnish even on a new computer for the first time? I've tried simple hello world php files without any database calls with the same effect. Might it be something wrong with my vcl file or Varnish works this way only?
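A quick way to read the symptom described above is the X-Varnish header itself: on a miss or pass it carries only the ID of the current request, and on a hit it carries two IDs, the current request's and that of the request that originally stored the object. A small Python sketch (the helper name and sample IDs are hypothetical):

```python
def varnish_cache_status(x_varnish: str) -> str:
    """Classify a response from its X-Varnish header value.

    "123456"        -> miss or pass: only the current request's ID
    "123456 123450" -> hit: current ID plus the ID that stored the object
    """
    ids = x_varnish.split()
    if len(ids) == 2:
        return "hit"
    if len(ids) == 1:
        return "miss/pass"
    raise ValueError(f"unexpected X-Varnish value: {x_varnish!r}")


print(varnish_cache_status("1976423542"))             # miss/pass
print(varnish_cache_status("1976423549 1976423542"))  # hit
```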
Check whether you're sending session data (cookies), which then look like unique calls to Varnish. The docs show you how to strip cookies.
Jon is right. I had a similar problem. You also need to clear your cookies and cache before testing. Check whether the response to the first visit tries to set a cookie. If so, you can do "unset beresp.http.Set-Cookie" in vcl_fetch.
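Cookies matter because the Varnish 3.0 built-in vcl_recv, as I read its default VCL, never looks up a GET or HEAD request that carries a Cookie or Authorization header; it passes it to the backend. A simplified Python model of that decision (hypothetical helper; the "pipe" branch for unusual HTTP methods is omitted):

```python
from typing import Mapping


def builtin_recv_decision(method: str, headers: Mapping[str, str]) -> str:
    """Toy model of Varnish 3.0's built-in vcl_recv (pipe branch omitted)."""
    names = {name.lower() for name in headers}  # header names are case-insensitive
    if method not in ("GET", "HEAD"):
        return "pass"  # only GET and HEAD are cached by default
    if "authorization" in names or "cookie" in names:
        return "pass"  # "Not cacheable by default"
    return "lookup"


print(builtin_recv_decision("GET", {}))                           # lookup
print(builtin_recv_decision("GET", {"Cookie": "PHPSESSID=abc"}))  # pass
```

So under the default VCL, a browser that has picked up a cookie goes to the backend on every request, which is why the usual fix is to stop setting the cookie or strip it in VCL.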

Varnish caching too much files and not caching php

I'm using Varnish without touching any configuration (just forwarding to Apache on port 8080).
But I got two issues:
I visit the URL of an image, delete the image, and visit again, and it still exists … Varnish cached it … how can I tell Varnish to first check whether the file AT LEAST exists before serving it from its cache?
The PHP files are not being cached (I mean, the HTML content generated by the PHP). I always see in the Headers: Age: 0 … any clue ?
Thank you !
I visit a URL of an image, I delete the image, and I visit again and it exists … Varnish cached it … how can I tell Varnish to look first if the file AT LEAST exists before serving it from its cache?
Eh, the whole purpose of caching is not having to do the same work (like checking for the existence of a file and loading it, or generating a PHP response) over and over again, but to reuse the generated response. Varnish never knew about the existence of the file to begin with (your backend server did that work), so it can never check whether 'the file at least exists'.
There are, however, ways to tell Varnish not to cache URLs forever. For instance, if your backend response instructs caches not to reuse the result (certain HTTP response headers indicate this), Varnish will not cache it. By default, Varnish is also smart enough not to cache responses that set cookies (which probably answers your second question). You can tell Varnish to cache a response only for a certain period (like 30 seconds), so your deletes will be picked up pretty quickly. You could also PURGE URLs from Varnish after you change or delete a file. If your backend server does not communicate this correctly in its response headers, you can override this behavior by writing your own .vcl file.
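To see why a short TTL (or a purge) makes deletes show up, here is a toy TTL cache in Python. The class, the dictionary standing in for the backend's files, and the 30-second TTL are hypothetical; real Varnish adds grace mode, bans and more. An injected clock keeps the example deterministic.

```python
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal TTL cache: serve a stored response until it is ttl seconds old."""

    def __init__(self, fetch: Callable[[str], Optional[str]], ttl: float,
                 clock: Callable[[], float]) -> None:
        self.fetch = fetch  # backend lookup; returns None when the file is gone
        self.ttl = ttl
        self.clock = clock
        self.store: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, url: str) -> Optional[str]:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # hit: the backend is not consulted at all
        body = self.fetch(url)  # miss or expired: ask the backend
        self.store[url] = (now, body)
        return body

    def purge(self, url: str) -> None:
        """Drop one URL, like sending a PURGE after changing a file."""
        self.store.pop(url, None)


files = {"/img/logo.png": "PNG bytes"}
t = [0.0]
cache = TTLCache(files.get, ttl=30, clock=lambda: t[0])

print(cache.get("/img/logo.png"))  # PNG bytes (fetched from the backend)
del files["/img/logo.png"]         # delete the file on the backend
t[0] = 10
print(cache.get("/img/logo.png"))  # PNG bytes (still served from cache)
t[0] = 31
print(cache.get("/img/logo.png"))  # None (TTL expired, backend says it's gone)
```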
The PHP files are not being cached (I mean, the HTML content generated by the PHP). I always see in the Headers: Age: 0 … any clue?
I can guess: you're setting cookies. But it would really help if you added the response headers to your question.
