Cache pictures from a remote server with Varnish

I'm creating a simple page that will contain a lot of pictures. All of them are hosted by a remote provider (on object storage; I only have links to the pictures). To speed up the site I would like to use Varnish to cache these pictures, but I have a problem:
All pictures are served over HTTPS, so I use HAProxy to terminate SSL and then pass the traffic to Varnish. How do I map, in Varnish, the address that should be visible to the end user, such as https://www.website.com/picture.jpg, to the remote address where the picture is hosted (https://www.remotehostedpictures.com/picture.jpg)? In the end the user must see only the first link; the remote address remotehostedpictures.com/picture.jpg must not be visible.

In vcl_recv you should rewrite the request's Host header, and you must declare remotehostedpictures.com as a backend.
In the end you should have something like this (code not tested):
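# Declare the backend before the subroutine that refers to it.
backend remote_host {
    .host = "www.remotehostedpictures.com";
    .port = "80";
}
sub vcl_recv {
    # Send picture requests to the remote host under its own hostname.
    if (req.url ~ "\.jpg") {
        set req.http.host = "www.remotehostedpictures.com";
        set req.backend_hint = remote_host;
    }
}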
By the way, beware of DNS names in the backend's .host. If the name resolves to multiple IPs, Varnish will use the first one. DNS resolution is done at VCL compile time, so if the DNS record changes you must reload your VCL.
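Once picture requests are routed to the remote backend, it may also help to control how long the pictures stay in cache. The following is a minimal, untested sketch in VCL 4.0 syntax. The one-day TTL is an assumption, not something from the question, and so is the idea that the object storage might send Set-Cookie headers that would otherwise keep the response out of the cache:

```vcl
sub vcl_backend_response {
    # Same match as in vcl_recv: only touch picture responses.
    if (bereq.url ~ "\.jpg") {
        # A Set-Cookie header on a picture would make it uncacheable by default.
        unset beresp.http.Set-Cookie;
        # Assumed lifetime; tune it to how often the pictures change.
        set beresp.ttl = 1d;
    }
}
```

If the storage provider already sends sensible Cache-Control headers, the TTL line can be dropped and Varnish will follow those headers instead.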

I think that storing images in Varnish is not a good idea, because Varnish will quickly fill the whole RAM (if you have a lot of images), and when Varnish is full it purges the whole cache. Imagine what happens on the server when the whole cache is purged while you have traffic on your page.
Some time ago I set up such a cache in Varnish, and after a few hours live I had to disable image caching because of the purging (for me, caching page content was most important).
In such situations the best solution is a CDN. You can use an external service such as Cloudflare, or build a simple CDN on Nginx (which only serves static files with an Expires header).
Hope it helps :)

Related

How to disable proxy for CloudFlare?

I changed my domain's nameservers to Cloudflare yesterday, but I'm confused about Cloudflare's caching system.
My server is Node.js based, and even when a client requests the same URL, the server sometimes responds with different content.
From what I found on the internet, Cloudflare uses a cache for fast browsing: it saves the server's content on its cache servers, and when a client requests the same resource again, Cloudflare returns it without contacting the origin server.
So if the resource on the server changes, Cloudflare will return the old resource. Is that right?
I set an A record for a domain and it went into "proxying mode". I cannot change it to the gray connection.
Do I have to pay for that, or is there a way to change it?
Thank you.
Cloudflare should not cache text/html, application/json types of responses. Static javascript files and images may be cached. Here is an article about what will be cached by default.
There should be no reason that you cannot change the "orange cloud" proxied mode to "grey cloud" DNS by clicking the icon.

Angular ssr varnish usage

It is working, but it does not affect performance. Am I using it right?
/etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "4000";
}
I put the Varnish port in the nginx config instead of 4000:
location / {
    proxy_pass http://localhost:6081;
}
My Angular application's desktop performance (Google PageSpeed) is 99%, but the mobile performance is 40-60%.
Varnish's out-of-the-box behavior respects HTTP caching best practices.
This means:
Only cache HTTP GET & HTTP HEAD calls
Don't serve responses from cache when the request contains cookie headers
Don't serve responses from cache when the request contains authorization headers
Don't store responses in cache when set-cookie headers are present
Don't store responses in cache when the cache-control header is a zero TTL or when it contains the following: no-cache, or no-store, or private
Under all other circumstances Varnish will try to serve from cache or store in cache.
This is that behavior written in VCL: https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl
Adapting to the real world
Although these caching best practices make sense, they are not realistic when you look at the real world. In the real world we use cookies all the time.
That's why you'll probably have to write some VCL code to change the behavior of the cache. In order to do so, you have to be quite familiar with the HTTP endpoints of your app, but also the parts where cookies are used.
Parts of your app where cookie values are used on the server-side will have to be excluded from caching
Parts of your app where cookie values aren't used will be stored in cache
Tracking cookies that are only used at the client side will have to be stripped
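The three rules above could be sketched roughly as follows, in VCL 4.0 syntax. This is an untested illustration, not a drop-in configuration: the /account path is hypothetical, and the cookie names (_ga, _gid, _gat, __utm*) are assumed examples of client-side tracking cookies, so both need to be replaced with whatever your own app uses.

```vcl
sub vcl_recv {
    # Hypothetical path where cookie values are used on the server side:
    # never serve it from cache.
    if (req.url ~ "^/account") {
        return (pass);
    }

    if (req.http.Cookie) {
        # Strip tracking cookies that only matter to client-side JavaScript
        # (assumed names; adjust to the cookies you actually see).
        set req.http.Cookie = regsuball(req.http.Cookie,
            "(^|;\s*)(_ga|_gid|_gat|__utm[a-z]+)=[^;]*", "");
        # Drop a leading separator left behind by the removal.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
        # With no cookies left, remove the header so the request can be cached.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Because this vcl_recv does not return for ordinary requests, the built-in VCL still runs afterwards and keeps passing any request that has cookies left.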
How to examine what's going on
The varnishlog binary will help you understand the kind of traffic that is going through Varnish and how Varnish behaves with that traffic.
I've written an in-depth blog post about this, please have a look: https://feryn.eu/blog/varnishlog-measure-varnish-cache-performance/
Writing VCL
Once you've figured out what is causing the drop in performance, you can write VCL to mitigate. Please have a look at the docs site to learn about VCL: https://varnish-cache.org/docs/6.0/index.html
There is reference material in there, a user guide and even a tutorial.
Good luck

How can I return a 500 response for all requests to a specific file at the Varnish level?

Background:
Our network structure brings all traffic into a Varnish installation, which then routes traffic to one of 5 different web servers, based on rules that a previous administrator set up. I don't have much experience with Varnish.
Last night we were being bombarded by requests to a specific file. This file is one that we limit to a specific set of servers, and it has a direct link to our master database, due to reasons. Obviously, this wasn't optimal, and our site was hit pretty hard because of it. What I attempted to do, and failed, was to write a block of code in the Varnish VCL that would return a 500 response for every request to that file, which I could then comment out after the attack period ended.
Question:
What would that syntax be? I've done my googling, but at this point I think it's the fact that I don't know enough about Varnish to be able to word my search properly, so I'm not finding the information that I need.
You can define your own vcl_recv, prior to any other vcl_recv in your configuration, reload Varnish, and you should get the behaviour you're looking for.
sub vcl_recv {
    if (req.url ~ "^/path/to/file(\?.*)?$") {
        return (synth(500, "Internal Server Error"));
    }
}

Varnish keeps caching my tracking software

I have a Varnish setup for one of my sites. I'm using the open source software Piwik for my stats tracking.
Piwik has an option to use a proxy for tracking, which means that the Piwik URL won't be revealed in my source code. Basically it's a PHP file that sits on my WordPress install and sends cURL POSTs to my Piwik install...
Now, I set up my Varnish using:
https://github.com/mattiasgeniar/varnish-3.0-configuration-templates
In vcl_fetch I added:
if (req.url ~ "piwik") {
    set beresp.ttl = 120s;
    return (hit_for_pass);
}
In vcl_recv I added:
if (req.url ~ "piwik") {
    return (pass);
}
What happens is, I see only 50% of the traffic I actually have on the website...
I'm afraid it's because of my vcl_fetch settings...
I read about the differences between pass and hit_for_pass, and from what I understand beresp.ttl is a setting that tells Varnish to keep passing for 120s.
One more thing: W3TotalCache on WP adds some caching headers such as max-age and Expires to my piwik.php file. Without Varnish it still works well and tracks correctly. Is it possible that there is some sort of collision between Varnish and those headers?
Do I get it right?
Why do you think 50% of my tracking is missed?
Thanks.
The Varnish configuration for pass-ing in vcl_recv is correct.
The code you have in vcl_fetch can be removed, it doesn't make any difference at that point because of the code in recv.
Remember that any VCL code that filters response headers in vcl_fetch is also run for pass-ed responses. I'd guess that you are filtering the Set-Cookie that piwik sends.
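If the template does strip Set-Cookie in vcl_fetch, one possible fix is to skip that filtering for the Piwik proxy. Below is an untested sketch in Varnish 3.0 syntax. The plain unset is only an assumed stand-in for whatever header filtering the template actually performs, so adapt the condition to the real code rather than copying this as-is:

```vcl
sub vcl_fetch {
    # Leave responses for the Piwik proxy untouched so its cookie survives;
    # apply the (assumed) cookie filtering to everything else.
    if (req.url !~ "piwik") {
        unset beresp.http.Set-Cookie;
    }
}
```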

Slow down rogue web scrapers on my website and still use Varnish

Imagine there are scrapers crawling my website.
How can I ban them and still whitelist Google's bots?
I think I can find the IP range of Google's bots, and I am thinking of using Redis to store all of the day's accesses; if I see too many requests from the same IP in a short time -> ban.
My stack is Ubuntu Server, Node.js and Express.js.
The main problem I see is that this detection sits behind Varnish, so the Varnish cache would have to be disabled. Any better ideas or good thoughts?
You can use a Varnish ACL [1]. It will possibly be a bit harder to maintain than in Apache, but it will surely work:
acl bad_boys {
    "192.0.2.0"/24;    // Your evil range (placeholder address)
    "198.51.100.7";    // Another evil IP (placeholder address)
}
// ...
sub vcl_recv {
    if (client.ip ~ bad_boys) {
        error 403 "Forbidden";
    }
    // ...
}
// ...
You can also whitelist, check the user agent or use other techniques to make sure you don't block Googlebot... but I would defend myself in Varnish rather than in Apache.
[1] https://www.varnish-cache.org/docs/3.0/reference/vcl.html#acls
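The whitelisting idea can be expressed with a second ACL. Here is a minimal, untested sketch in Varnish 3.0 syntax. The ACL names and all addresses are placeholders from the documentation ranges, not real crawler or attacker addresses, so substitute the ranges you actually trust or want to block:

```vcl
acl trusted_bots {
    "203.0.113.0"/24;  // placeholder: ranges of crawlers you trust
}

acl blocked {
    "192.0.2.0"/24;    // placeholder: a range you want to block
    "198.51.100.7";    // placeholder: a single blocked address
}

sub vcl_recv {
    // Block listed clients unless they are also on the trusted list.
    if (client.ip ~ blocked && client.ip !~ trusted_bots) {
        error 403 "Forbidden";
    }
}
```

Checking the trusted list in the same condition means a crawler range is never blocked by accident, even if it overlaps a wider blocked range.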
You could stop the crawler using robots.txt:
User-agent: BadCrawler
Disallow: /
This solution works only if the crawler follows the robots.txt specification.
