Where to put the purge.php file on a Varnish server?

I am using two application load balancers that route requests to 4 backend Varnish servers. I got answers suggesting I configure a PHP file to purge the cache, but I have no idea where to put it or how to execute it.

For which type of PHP application are you trying to configure cache purging?
A custom application?
WordPress?
Drupal?
Magento?
Some other CMS or framework?
If you're using an existing platform, CMS, or framework, the documentation will probably state how to configure purging.
Varnish Configuration
Of course, the Varnish VCL code should also be tuned to process purges.
You can find more information about purging (and banning) in Varnish on http://varnish-cache.org/docs/6.0/users-guide/purging.html
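For individual URL purging, the VCL needs an ACL and a handler for the PURGE request method. Here is a minimal sketch for Varnish 4 and later; the ACL entries are examples and should list your load balancers or application servers:
acl purgers {
    "localhost";
    "10.0.0.0"/8;  # example subnet for your load balancers / app servers
}
sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        # remove the cached object that matches this URL and Host
        return (purge);
    }
}
Your purge.php (or any other HTTP client) then simply sends PURGE requests for the URLs to invalidate. It can live anywhere that can reach the Varnish servers, as long as its IP address matches the ACL.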
Here are the questions you should ask yourself regarding purging; the documentation of your CMS or framework may answer these as well:
Are you trying to purge individual URLs?
Does your code use pattern matching to invalidate multiple URLs at once (this uses bans in VCL; see the sketch below)?
If pattern matching is used, are you sending the invalidation pattern via an HTTP request header?
Does your invalidation code use the URL to identify objects in cache, or does it rely on tagged content?
Are you restricting access to the purging mechanism based on IP address or subnet? If so, configure an ACL in VCL, as in the sketch above.
Many WordPress plugins rely on individual URL purging. Other WordPress plugins use bans through request header patterns.
Drupal uses bans, but has a system in place that tags content. The ban patterns don't match URLs, but tags.
Magento uses bans.
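In the same spirit, here is a sketch of a ban handler that takes the invalidation pattern from a hypothetical X-Ban-Pattern request header and reuses the purgers ACL defined above:
sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Not allowed"));
        }
        # invalidate every cached object on this host whose URL matches the pattern
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.http.x-ban-pattern);
        return (synth(200, "Ban added"));
    }
}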
Conclusion
If you use a CMS or framework, the purging strategy is set in advance. It's just a matter of configuring your app and making sure the VCL can handle it.
If you have custom code, you have a choice: you can implement either purging or banning.
Please have a look at the user guide section about purging I mentioned above. It should help you understand the underlying mechanism.

Related

Honey tokens using AWS API Gateway and Lambda functions

I want to set up fake URLs, or honeytokens, to trick an attacker into following those links, and have a script auto-block the attacker IPs using AWS WAF.
Security is a big thing these days, and our web infrastructure has already been a target of massive brute-force and DDoS attempts. I want to lay tracks so that attackers using directory traversal attacks can be found. For example, common directory listing attacks traverse URLs like ../admin and ../wp-admin while scanning a target website. I want to set up a mechanism to get alerted when any of these non-existent URLs gets browsed.
Questions:
1. Is it possible to redirect part of the web traffic, e.g. www.abc.com/admin, to API Gateway and the remaining www.abc.com traffic to my existing servers?
2. How would I set up DNS entries for this, if it is possible?
3. Is there a different/easier way to achieve this?
Any suggestion is welcome, as I am open to ideas. Thanks
You can first set up a CloudFront distribution with WAF IP blacklisting, then set up a honeypot using API Gateway and Lambda, which becomes an origin for URLs like /admin and /wp-admin.
The following example of bad-bot blocking using honeypots can provide you with a head start.

HTTP cache for Symfony

I want to follow Google's recommendations on cache headers for images, scripts and styles.
After reading Symfony's documentation on HTTP caching, I decided to install FOSHttpCacheBundle. I then set up rules for paths like ^/Resources/ or ^/css/. However, I fail to see the proper headers for my images in Chrome's console.
Alternatively, I have read that, since my web server is serving these resources, it is not Symfony that deals with this matter (yet I read in the docs that the Symfony reverse proxy was good for shared-hosting servers, which is what I have).
So should I just add lines to my .htaccess as explained here, or am I simply misusing FOSHttpCacheBundle? (Or both?)
Static files (including JavaScript files, CSS stylesheets, images, fonts...) are served directly by the web server. As the PHP module is not even loaded for such files, you must configure the server itself to set the proper HTTP headers. You can do it in a .htaccess file if you use Apache, but doing it directly in httpd.conf/apache2.conf/the vhost config (depending on your configuration) is better from a performance point of view.
If you also want to set HTTP cache headers for dynamic content (HTML generated by Symfony...), then you must use FOSHttpCache or any other method provided by Symfony, such as the #[Cache] attribute.

Can Varnish Read a List of Backend Hosts from a Text File?

Is there any way for Varnish to read a list of backend URLs from a text file, and then proxy cache misses to a random URL taken from that file?
What I imagine is something like this pseudocode...
/var/services/backend-urls.conf
http://backend-host-1/path/to/application
http://backend-host-2/path/to/application
http://backend-host-3/path/to/application
# etc
varnish config
sub vcl_miss {
// read a list of urls from a text file
backendHosts = readFile("/var/services/backend-urls.conf");
//choose a random url from the file
randomHost = chooseLineAtRandom(backendHosts);
//proxy the request to the random host
set req.backend = randomHost;
}
To provide some background: I work on a server system comprising a number of backend applications that currently sit behind a front end running Apache. We are evaluating replacing the Apache layer with Varnish so we can benefit from Varnish's caching capabilities. We also have a service discovery framework that knows the endpoint locations for each backend application (the endpoint URLs change periodically as new hosts emerge or are taken out of service).
Currently we use the RewriteMap functionality in mod_rewrite to route requests to the backend services. Then we have a process to maintain the lists of backend services based upon the contents of the service discovery framework.
All this works well for us in Apache, except that Apache is like using a sledgehammer to crack a nut: all we really want is the reverse proxy logic, and the caching in Varnish would be helpful too.
Is there any way to have Varnish read the list of backend URLs from an external resource?
Without resorting to a custom VMOD or C code, the quick answer is no.
The VCL instructions are compiled within Varnish, and that rules out run-time inclusion.
But why not include a separate backend VCL file, listing the current backends, within the main VCL?
That VCL file could be written out on demand. Then, using the varnishadm CLI, you could request a recompile of the VCL, bringing the new config live.
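A minimal sketch of that approach; the file path and backend names are illustrative, and the backends file would be rewritten by your service discovery tooling:
# /etc/varnish/backends.vcl -- generated from backend-urls.conf
backend app1 { .host = "backend-host-1"; .port = "80"; }
backend app2 { .host = "backend-host-2"; .port = "80"; }
backend app3 { .host = "backend-host-3"; .port = "80"; }
# default.vcl
include "/etc/varnish/backends.vcl";
After regenerating the file, load and activate the new configuration without a restart:
varnishadm vcl.load config_v2 /etc/varnish/default.vcl
varnishadm vcl.use config_v2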
I can see two potential solutions.
The first is to have something generate your VCL and backends such as Chef or some custom scripting. You can then process the text file into backend definitions and the necessary VCL to invoke them. To handle the requirement for the random backend you could use a director. I've not dealt with directors myself but it looks like they are meant to solve that requirement. When changes to the backends occur you could rerun the generation script/Chef and tell Varnish to reload its configuration either using varnishadm or service varnish reload to avoid a full restart.
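For the random-backend requirement, here is a sketch using the directors VMOD that ships with Varnish 4+, assuming backends app1 to app3 as in the generated file sketched above:
import directors;
sub vcl_init {
    new pool = directors.random();
    # equal weights, so each backend is equally likely to be chosen
    pool.add_backend(app1, 1.0);
    pool.add_backend(app2, 1.0);
    pool.add_backend(app3, 1.0);
}
sub vcl_recv {
    set req.backend_hint = pool.backend();
}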
The second would be to implement it in C, either via a VMOD as Marcel Dumont suggests or possibly using inline C in your VCL.
With vmod_dynamic you can just use any DNS name as a backend or even service records.
For your use case, one option would be to set up an SRV record in DNS pointing to all your servers and then just use that, as shown for example in the basic-stub.vtc test case.
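A minimal sketch with vmod_dynamic, assuming the VMOD is installed and a hypothetical DNS name app.example.com resolves to your current backends:
import dynamic;
sub vcl_init {
    # re-resolve the backend name at most every 10 seconds
    new d = dynamic.director(port = "80", ttl = 10s);
}
sub vcl_recv {
    set req.backend_hint = d.backend("app.example.com");
}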

Varnish cache and Google Tag Manager

I have no experience with Varnish, so please bear with me.
We have inserted Google Tag Manager into a client's site. The Tag Manager injects the Google Analytics tracking code (and nothing else) into the page. The client's technical service provider has now complained that the Tag Manager prevents the Varnish cache from working.
My guess is that this has nothing to do with the Tag Manager as such but is rather caused by the cookies from Google Analytics; apparently, in the default configuration, pages with cookies are not cached. However, since I'm not very familiar with Varnish, I cannot speak with any authority on the matter.
So my question is: is there any reason why Google Tag Manager itself (not any tags inside the Tag Manager) would invalidate a Varnish cache on each request? A web search turned up nothing specific regarding Varnish and GTM.
Thank you for your time,
Eike
Google Tag Manager will not interfere with the Varnish cache in any way. The reason is that the requests for Google Tag Manager are sent to google-analytics.com, not your website.
The cookies are then set by google-analytics.com and are only sent between the client's browser and google-analytics.com.
This means that Google Tag Manager does not actually have any effect on your website, apart from the initial JavaScript being loaded from there.
In fact, Varnish does not act on cookies created through JavaScript at all; it only reacts to the Cookie and Set-Cookie headers of the HTTP exchange.
The problem you may be having is that, if the dataLayer is embedded in the HTML code, the values of its variables will not change, because the HTML is served from the cache.
To solve this, make a separate HTTP call (e.g. via Ajax) that is excluded from the cache and returns the variables for the dataLayer.
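If the provider's Varnish refuses to cache because of the Google Analytics cookies (_ga, _gid, __utm*), a common remedy is to strip them in VCL before the cache lookup. A sketch for Varnish 4+, assuming no other cookies need special handling:
sub vcl_recv {
    if (req.http.Cookie) {
        # drop Google Analytics cookies so they do not bust the cache
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_ga[^=]*|_gid|__utm[a-z]+)=[^;]*", "");
        # if no cookies remain, remove the header entirely
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}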

How can I prevent hotlinking on Amazon CloudFront?

I use Amazon CloudFront to host all my site's images and videos, to serve them faster to my users, who are pretty scattered across the globe. I also apply pretty aggressive forward caching to the elements hosted on CloudFront, setting Cache-Control to public, max-age=7776000.
I've recently discovered, to my annoyance, that third-party sites are hotlinking to my CloudFront server to display images on their own pages, without authorization.
I've configured .htaccess to prevent hotlinking on my own server, but haven't found a way of doing this on CloudFront, which doesn't seem to support the feature natively. And, annoyingly, Amazon's bucket policies, which could be used to prevent hotlinking, only apply to S3; they have no effect on CloudFront distributions. If you want to take advantage of the policies you have to serve your content from S3 directly.
Scouring my server logs for hotlinkers and manually changing the file names isn't really a realistic option, although I've been doing this to end the most blatant offenses.
You can forward the Referer header to your origin:
Go to CloudFront settings
Edit Distributions settings for a distribution
Go to the Behaviors tab and edit or create a behavior
Set Forward Headers to Whitelist
Add Referer as a whitelisted header
Save the settings in the bottom right corner
Make sure to handle the Referer header on your origin as well.
We had numerous hotlinking issues. In the end we created CSS sprites for many of our images, either adding white space to the bottom/sides or combining images together.
We displayed them correctly on our pages using CSS, but any hotlinks would show the images incorrectly unless they copied the CSS/HTML as well.
We've found that they don't bother (or don't know how).
The official approach is to use signed URLs for your media. For each media item you want to distribute, you can generate a specially crafted URL that only works within a given constraint of time and source IPs.
One approach for static pages is to generate temporary URLs for the media included in the page that are valid for twice the page's caching time. Let's say your page's caching time is 1 day: every 2 days, the links would be invalidated, which forces the hotlinkers to update their URLs. It's not foolproof, as they can build tools to fetch the new URLs automatically, but it should deter most people.
If your page is dynamic, you don't need to worry about trashing your page's cache, so you can simply generate URLs that only work for the requester's IP.
As of Oct. 2015, you can use AWS WAF to restrict access to CloudFront files. Here's an article from AWS that announces WAF and explains what you can do with it. Here's an article that helped me set up my first ACL to restrict access based on the referrer.
Basically, I created a new ACL with a default action of DENY. I added a rule that checks the end of the Referer header string for my domain name (lowercase). If a request passes that rule, it is ALLOWED access.
After assigning my ACL to my CloudFront distribution, I tried to load one of my data files directly in Chrome, and the request was blocked with an error.
As far as I know, there is currently no solution, but I have a few possibly relevant, possibly irrelevant suggestions...
First: Numerous people have asked this on the Cloudfront support forums. See here and here, for example.
Clearly AWS benefits from hotlinking: the more hits, the more they charge us for! I think we (Cloudfront users) need to start some sort of heavily orchestrated campaign to get them to offer referer checking as a feature.
Another temporary solution I've thought of is changing the CNAME I use to send traffic to CloudFront/S3. So let's say you currently send all your images to:
cdn.blahblahblah.com (which points to some CloudFront/S3 bucket)
You could change it to cdn2.blahblahblah.com and delete the DNS entry for cdn.blahblahblah.com.
As a DNS change, that would knock out all the people currently hotlinking before their traffic got anywhere near your server: the DNS entry would simply fail to look up. You'd have to keep changing the CDN CNAME to make this effective (say, once a month?), but it would work.
It's actually a bigger problem than it seems because it means people can scrape entire copies of your website's pages (including the images) much more easily - so it's not just the images you lose and not just that you're paying to serve those images. Search engines sometimes conclude your pages are the copies and the copies are the originals... and bang goes your traffic.
I am thinking of abandoning Cloudfront in favor of a strategically positioned, super-fast dedicated server (serving all content to the entire world from one place) to give me much more control over such things.
Anyway, I hope someone else has a better answer!
This question mentioned image and video files.
Referer checking cannot be used to protect multimedia resources from hotlinking, because some mobile browsers do not send the Referer header when requesting an audio or video file played using HTML5.
I am sure of this for Safari and Chrome on iPhone and for Safari on Android.
Too bad! Thank you, Apple and Google.
How about using signed cookies? You can create a signed cookie using a custom policy, which supports the various kinds of restrictions you want to set and also allows wildcards.
