Cached content in Azure CDN from Microsoft expires too early

We are using a Standard Microsoft Azure CDN to serve images for a web application. The images are requested as /api/img?param1=aaa&param2=bbb, so we cache every unique URL. The cache duration is 7 days. We also override the "Cache-Control" header so that an image is only cached for 1 hour by the client browser.
The problem is, the images do not stay in the cache for 7 days. The first day after the images have been requested, they seem to be in the CDN (I verify the X-Cache header and it returns "TCP_HIT"). However, if I make the same requests 2-3 days later, around 25% of the images are no longer cached (the X-Cache header is "TCP_MISS"). The origin server receives and logs these requests, so I am sure they bypass the CDN.
Is there any explanation for this? Do I have to set additional parameters for images to be cached correctly?
We use the following settings:
1. Caching rules: "Cache every unique URL"
2. Rules Engine:
   a. if URL path begins with /api/img
   b. then Cache expiration: [cache behaviour] Override, [duration] 7 days
   c. and then Modify response header: Overwrite, "Cache-Control", "public, max-age=3600"
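To spot-check which cached URLs are still being served from the edge, a short script can loop over a sample of image URLs and report the X-Cache header. This is only a rough sketch; the endpoint hostname and query parameters below are placeholders, not values from the original setup.

# Rough sketch: report the X-Cache and Cache-Control headers for a sample of
# image URLs to see how many are still served from the CDN edge.
import requests

# Placeholder endpoint and parameters - substitute real image URLs here.
urls = [
    "https://myendpoint.azureedge.net/api/img?param1=aaa&param2=bbb",
    "https://myendpoint.azureedge.net/api/img?param1=ccc&param2=ddd",
]

hits = 0
for url in urls:
    response = requests.get(url, timeout=10)
    x_cache = response.headers.get("X-Cache", "<missing>")
    cache_control = response.headers.get("Cache-Control", "<missing>")
    print(f"{x_cache:12} Cache-Control: {cache_control:30} {url}")
    if "HIT" in x_cache.upper():
        hits += 1

print(f"{hits} of {len(urls)} sampled URLs were served from the CDN edge")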

From some folks on the CDN Product Group:
For all but the Verizon Premium SKU, max-age and cache expiration are one and the same thing, so rule 2c (the max-age=3600 header) overrides rule 2b (the 7-day expiration).
The CDN reserves the right to flush entries if they are not used; cache items are evicted with an LRU algorithm.
The Verizon Premium SKU offers two different age values: one for browser-to-edge (the "External Max-Age") and one for edge-to-origin (the original expiration time, or a forced override time; see the docs).
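A toy calculation makes the distinction concrete, using the values from the question (a 3600-second max-age toward the browser and a 7-day override at the edge). On the Premium SKU these two lifetimes can diverge; on the other SKUs they collapse into a single value.

# Toy illustration (not a real API call) of the two lifetimes discussed above.
browser_ttl = 3600          # Cache-Control: max-age=3600 seen by the browser
edge_ttl = 7 * 24 * 3600    # "Cache expiration: Override, 7 days" at the edge

age = 5 * 3600              # the object is requested again 5 hours later
print("fresh in the browser cache:", age < browser_ttl)  # False -> browser revalidates
print("fresh at the CDN edge:", age < edge_ttl)           # True  -> edge could still serve a hit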

Related

Cache in Azure Front Door service

I have set up the Azure Front Door service for three different geographies. Users are routed to the nearest data centre, which works as expected. Currently, I am setting up caching under the routing rules. I need to exclude some files that should not be cached, but I do not see any configuration that allows excluding certain files from caching.
Below is the screenshot of the configuration setting.
https://imgur.com/biy9tjj
Azure Front Door matches a request to a routing rule and then takes the action defined by that rule. So if you need to exclude some files from being cached, you could try creating a separate routing rule with PATTERNS TO MATCH set to the paths of those files, and then disable caching under ROUTE DETAILS in that separate routing rule.
Ref: How Front Door matches requests to a routing rule
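To illustrate why the separate rule takes over for those paths, here is a rough Python sketch of "most specific pattern wins" selection, which approximates how Front Door chooses between a catch-all route and a narrower one. The patterns and caching settings are made up for the example and are not from the screenshot above.

# Approximate sketch of routing-rule selection: the most specific matching
# PATTERNS TO MATCH entry wins, so a narrow rule with caching disabled
# overrides the catch-all rule for those paths. Patterns are hypothetical.
routes = {
    "/*": {"caching": True},              # default route, caching enabled
    "/downloads/*": {"caching": False},   # separate route with caching disabled
}

def pick_route(path):
    candidates = [pattern for pattern in routes if path.startswith(pattern.rstrip("*"))]
    return max(candidates, key=len)       # longest (most specific) prefix wins

print(pick_route("/downloads/report.pdf"), routes[pick_route("/downloads/report.pdf")])
print(pick_route("/index.html"), routes[pick_route("/index.html")])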
While I think Nancy Xiong's answer would work, I don't think this is the correct approach.
Azure Front Door respects Cache-Control headers, so make sure the web server serving the files you don't want cached returns a proper value. A decent starting point might be Cache-Control: no-cache, but check out the docs here for the details and options.
As for Azure Front Door itself, it claims to respect these values (docs here):
Cache-Control response headers that indicate that the response won’t be cached such as Cache-Control: private, Cache-Control: no-cache, and Cache-Control: no-store are honored. However, if there are multiple requests in-flight at a POP for the same URL, they may share the response. If no Cache-Control is present the default behavior is that AFD will cache the resource for X amount of time where X is randomly picked between 1 to 3 days.
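For example, if the origin happened to be a small Flask app (purely an assumption for illustration; the /downloads/ route below is hypothetical and not from the question), the files you don't want cached could be returned with such a header:

# Minimal sketch of an origin handler that marks certain files as non-cacheable.
# Per the quoted docs, Front Door honors Cache-Control: private / no-cache / no-store.
from flask import Flask, make_response, send_from_directory

app = Flask(__name__)

@app.route("/downloads/<path:name>")
def download(name):
    response = make_response(send_from_directory("downloads", name))
    response.headers["Cache-Control"] = "no-store"   # or "no-cache" as a starting point
    return response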

Azure Verizon CDN - 100% Cache CONFIG_NOCACHE

I set up an Azure Verizon Premium CDN a few days ago as follows:
Origin: An Azure web app (.NET MVC 5 website)
Settings: Custom Domain, no geo-filtering
Caching Rules: standard-cache (doesn't care about parameters)
Compression: Enabled
Optimized for: Dynamic site acceleration
Protocols: HTTP, HTTPS, custom domain HTTPS
Rules: Force HTTPS via Rules Engine (if request scheme = http, 301 redirect to https://{customdomain}/$1)
This CDN has been running for a few days now, but the ADN reports say that nearly 100% (99.36%) of requests have the cache status "CONFIG_NOCACHE" (description: "The object was configured to never be cached in accordance with customer-specific configurations residing on the edge servers, so the response was served via the origin server."). A few (0.64%) are "NONE" (description: "The cache was bypassed entirely for this request. For instance, the request was immediately rejected by the token auth module, or the client request method used an uncacheable request method such as "PUT"."). Also, the "Cache Hit" report says "0 hits, 0 misses" for every day. Nothing is coming through the "HTTP Large" side, only "ADN".
I couldn't find these exact messages while searching around, but I've tried:
Updating the cache-control header to max-age, public (i.e. cache-control: public,max-age=1209600)
Updating the cache-control header to max-age (cache-control: max-age=1209600)
Updating the expires header to a date way in the future (expires: Tue, 19 Jan 2038 03:14:07 GMT)
Using different browsers so the request cache info is different. In Chrome, the request is "cache-control: no-cache" in my browser. In Firefox, it'll say "Cache-Control: max-age=0". In any case, I'd assume the users on the website wouldn't have these same settings, right?
Refreshing the page a bunch of times, and looking at the real time report to see hits/misses/cache statuses, and it shows the same thing - CONFIG_NOCACHE for almost everything.
Tried running a "worldwide" speed test on https://www.dotcom-tools.com/website-speed-test.aspx, but that had the same result - a bunch of "NOCACHE" hits.
Tried adding ADN rules to set the internal and external max age to 864000 sec (10 days).
Tried adding an ADN rule to ignore "no-cache" requests and just return the cached result.
So, the message for "NOCACHE" says it's a node configuration issue... but I haven't really even configured it! I'm so confused. It could also be an application issue, but I feel like I've tried all the permutations of "cache-control" that I can.
Ultimately, I would hope that most of the requests are being cached, so I'd see most of them come back as "TCP_HIT". Maybe that's incorrect? Thanks in advance for your help!
So, I eventually figured out this issue. Apparently the Azure Verizon Premium CDN's ADN platform has "bypass cache" enabled by default.
To disable this behavior, you need to add additional features to your caching rules.
Example:
IF Always
Features:
Bypass Cache: Disabled
Force Internal Max-Age: Response 200, 864000 seconds
Ignore Origin No-Cache: Response 200
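One way to confirm the rule change took effect is to request the same asset a couple of times and watch the cache status header move from a miss to a hit. A rough sketch, with a placeholder URL (the exact header name can vary by SKU, but X-Cache is common on Azure CDN endpoints):

# Rough verification sketch: after disabling Bypass Cache, the second request
# for the same URL should come back as a cache hit. The URL is a placeholder.
import requests

url = "https://myendpoint.azureedge.net/content/site.css"

for attempt in range(2):
    response = requests.get(url, timeout=10)
    print(f"request {attempt + 1}: X-Cache={response.headers.get('X-Cache', '<missing>')}")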

azure cdn purge not refreshing cached content

I have an Azure CDN (Verizon, premium) connected to blob storage. I have 2 rules in place based on step 6 in this tutorial. The rules are designed to force the CDN to serve "index.html" when the root of the CDN is called. They may or may not be relevant to the issue, but they are described in Step 6 as follows (a small sketch of what the patterns rewrite follows the list):
Make sure the dropdown says “IF” and “Always”
click the “+” button next to “Features” twice.
set the two newly-created dropdowns to “URL Rewrite”
set all the source and destination dropdowns to the endpoint you created (the value with the endpoint name)
for the first source pattern, set to ((?:[^\?]*/)?)($|\?.*)
for the first destination pattern, set to $1index.html$2
for the second source pattern, set to ((?:[^\?]*/)?[^\?/.]+)($|\?.*)
for the second destination pattern, set to $1/index.html$2
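To sanity-check what the two patterns above actually rewrite, here is a small local approximation in Python; the Verizon rules engine applies these regexes to the relative path, and its exact matching semantics may differ slightly from re.fullmatch.

import re

# Local approximation of the two URL Rewrite rules above: try each source
# pattern against a relative path and show the rewritten result.
rules = [
    (re.compile(r"((?:[^\?]*/)?)($|\?.*)"), r"\1index.html\2"),
    (re.compile(r"((?:[^\?]*/)?[^\?/.]+)($|\?.*)"), r"\1/index.html\2"),
]

for path in ["", "blog/", "blog/post-1", "css/main.css?v=2"]:
    for pattern, destination in rules:
        match = pattern.fullmatch(path)
        if match:
            print(f"{path!r:22} -> {match.expand(destination)!r}")
            break
    else:
        print(f"{path!r:22} -> unchanged")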
I initially uploaded files to blob storage, was able to hit them via the CDN (demonstrating the above rules worked correctly), and then made changes to the local files (debugging) for uploading to blob storage. After updating all files on blob storage and manually purging the CDN endpoint with the "purge all" option checked, I am served the old files by the CDN, and the new files when hitting blob storage directly. This seems to occur for every file (even when hitting the file directly, not just index.html). This still occurs after waiting ~10hrs, clearing browser cache, and trying browsers never before used to access the CDN.
Does anyone know what may be occurring? Is it cached somewhere between my network and the CDN endpoint? I feel like I'm probably missing something very simple...
edit 1: I have another Verizon (non-premium) CDN connected to the same storage container, and it picks up the correct files after a purge; however, even now (24 hrs later) the premium CDN is not serving the updated files.
edit 2: Called Microsoft support for Azure, and they spent about 6 hours investigating to no avail. We eventually tried purging again, and now the updated files are being sent. Still not sure what the issue was.
After I worked with Microsoft and Verizon Digital Media support for a number of weeks, they finally figured out the solution.
In order to avoid interfering with the purge process, the easiest method is to implement the following "IF" statement before the "Features" portion of your rule:
IF | Request Header Wildcard | Name | User-Agent | Does Not Match | Values | ECPurge/* | Ignore Case (checked)
This IF statement will skip this rule altogether for requests made with the purge user agent, allowing the purge request to hit the CDN as normal.
I also found that purging each file separately works: for example, put /css/main.css in 'Content path' when purging instead of using 'purge all'.
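To tell whether a purge has actually propagated, it can help to compare what the CDN endpoint returns against the blob origin directly. A rough sketch; the hostnames, container name, and file path below are placeholders:

# Rough sketch: compare validators from the CDN endpoint and the blob origin.
# If the CDN still reports an older Last-Modified/ETag than the origin after a
# purge, the edge is still serving the stale copy.
import requests

path = "/css/main.css"  # placeholder path
cdn = "https://myendpoint.azureedge.net"
origin = "https://mystorageaccount.blob.core.windows.net/mycontainer"

for label, base in [("CDN", cdn), ("origin", origin)]:
    response = requests.head(base + path, timeout=10)
    print(f"{label:7} Last-Modified: {response.headers.get('Last-Modified')}  "
          f"ETag: {response.headers.get('ETag')}")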
10/14/2019 - With this similar setup (Azure CDN/Blob-Static-Site/Verizon), I was having a cache issue too. I happened to have only 1 URL rewrite rule. What solved it for me was to set a subsequent rule regarding the caching/Max-Age.
Here's how I did that:
Azure Portal -> CDN profiles -> Click your profile (Overview section)
Click the manage icon in the Overview (in the details section/blade). This should open the craptastic Verizon page.
Hover over "Http Large" and then click Rules Engine
Enter a Name/Description in the textbox. For example: "Force Cache Refresh"
Leave If/Always selected
Click the "+" button next to Features
In the new dropdown that appears choose "Force Internal Max-Age" (leave 200 as the response)
Enter a reasonable* value in the field for seconds. (300 for example)
Click the black Add button
It says it could take 4 hours for this rule to take effect; for me it was about 2.
Lastly, for this method the order of the rules should not matter, but for the rewrite rules above keep in mind that you can move rules up and down in priority. For me the order is: HTTPS redirect first, URL rewrite (for React routing) second, and Force Cache Refresh last.
*note: This is a balance. Since purge is not working for me, you want the static app/site to update after you publish, but you don't want tons of needless traffic just to refresh the cache either. For my dev server I settled on 5 minutes; I think production will be 1 hour... haven't decided.
is everything working now?
I see that your rules are probably more complicated than you need. If the goal is just to have root "/" rewrite to /index.html, this is the only rule you would need:
If always
URL Rewrite - Source: "/"; Destination "/index.html"

Varnish caching - age gets reset

I have a very simple site and am setting up varnish cache on it. The server is nginx.
The cache seems to be automatically purged after 120 seconds: when I visit the site, I see the Age header being reset.
Can anyone point me towards where to change this so that pages are cached indefinitely, or until I manually purge Varnish?
You did not mention your OS or distribution, but for example on CentOS /etc/sysconfig/varnish sets the defaults for Varnish. Amongst those defaults is VARNISH_TTL=120, which sets the default TTL to 120 seconds.
If you only wish to set a high TTL for all objects, you can just edit the default one in /etc/sysconfig/varnish.
If the backend sends expiration headers to Varnish, Varnish will treat them as the real expiration date, just like a web browser would, and will purge its content when the header expires.
You should make sure the backend doesn't send Cache-Control headers to Varnish, and that only Varnish adds Cache-Control headers when sending data to browsers.
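To see where the 120-second reset comes from, it can help to watch the Age and Cache-Control headers over a few minutes: if the backend sends a short max-age, that is what Varnish is obeying; if nothing is sent, the default TTL applies. A rough sketch with a placeholder URL:

# Rough sketch: poll the site and print Age and Cache-Control so you can see
# whether objects expire at the backend's max-age or at Varnish's default TTL.
import time
import requests

url = "http://example.com/"  # placeholder; point this at the Varnish-fronted site

for _ in range(6):
    response = requests.get(url, timeout=10)
    print(f"Age: {response.headers.get('Age', '<missing>'):>5}   "
          f"Cache-Control: {response.headers.get('Cache-Control', '<missing>')}")
    time.sleep(30)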

What are HTTP cache best practices for a high-traffic static site?

We have a fairly high-traffic static site (i.e. no server code), with lots of images, scripts, and CSS, hosted on IIS 7.0.
We'd like to turn on some caching to reduce server load, and are considering setting the expiry of web content to some time in the future. In IIS, we can do this at a global level via the "Expire web content" section of the common HTTP headers in the IIS response headers module, perhaps setting content to expire 7 days after serving.
As far as I can tell, all this actually does is set the max-age HTTP response header, which makes sense, I guess.
Now, the confusion:
Firstly, all the browsers I've checked (IE9, Chrome, FF4) seem to ignore this and still make conditional requests to the server to see if the content has changed. So I'm not entirely sure what the max-age response header will actually affect. Could it be older browsers? Or web caches?
It is possible that we may want to change an image on the site at short notice... I'm guessing that if max-age is actually honoured by something then, by its very nature, that something won't check whether the image has changed for 7 days... so that's not what we want either.
I wonder if a best practice is to partition the site into folders of content that really won't change often and only turn on long-term expiry for those folders? Perhaps also vary the querystring to force a refresh of content in those folders if needed (e.g. /assets/images/background.png?version=2)?
Anyway, having looked through the (rather dry!) HTTP specification, and some of the tutorials, I still don't really have a feel for what's right in our situation.
Any real-world experience of a situation similar to ours would be most appreciated!
Browsers fetch the HTML first, then all the resources inside (css, javascript, images, etc).
If you make the HTML expire soon (e.g. 1 hour or 1 day) and then make the other resources expire after 1 year, you can have the best of both worlds.
When you need to update an image, or other resource, you just change the name of that file, and update the HTML to match.
The next time the user gets fresh HTML, the browser will see a new URL for that image, and get it fresh, while grabbing all the other resources from a cache.
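One common way to "change the name of the file" without doing it by hand is to fingerprint assets with a hash of their content at build time and reference the hashed name from the HTML. A minimal sketch; the file names and paths are illustrative only:

# Minimal build-time sketch: copy each asset to a content-hashed filename so it
# can be served with a very long max-age; changing the file changes its URL.
import hashlib
import shutil
from pathlib import Path

def fingerprint(asset_path: Path) -> Path:
    digest = hashlib.md5(asset_path.read_bytes()).hexdigest()[:8]
    hashed = asset_path.with_name(f"{asset_path.stem}.{digest}{asset_path.suffix}")
    shutil.copyfile(asset_path, hashed)
    return hashed

# e.g. background.png -> background.3f2a9c1d.png; the HTML is then updated to
# reference the new name, while old copies can keep being served from caches.
print(fingerprint(Path("assets/images/background.png")))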
Also, at the time of this writing (December 2015), Firefox limits the maximum number of concurrent connections to a single server to six (6). This means that if you have 30 or more resources all hosted on the same website, only 6 are downloading at any one time until the page is loaded. You can speed this up a bit by serving some resources from a content delivery network (CDN) on a different hostname, so that more of them can download in parallel.
