Implement full HTML page caching on CDN - kentico

We are trying to implement full-page HTML caching using a CDN on our Kentico portal engine site. To do this, we need to set the Cache-Control of the documents, not only the assets, to "public". I tried adding the code below to the BeginRequest event in my global.asax to test it, but for some reason the document's Cache-Control response header is always set to no-cache. Did Kentico set it intentionally? I would think so, because it has its own caching mechanism built in, but to use a CDN we need to set the cache to public. Is there a way to override this?
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetMaxAge(new TimeSpan(1, 0, 0));
I also tried modifying PortalTemplate.aspx.cs to add a cache-control meta tag, but that did not work either.
tags.Text += "<meta http-equiv=\"cache-control\" content=\"public\" />";
The response header is always
cache-control:no-cache, must-revalidate
content-encoding:deflate
content-type:text/html; charset=utf-8
date:Fri, 02 Mar 2018 18:38:03 GMT
expires:-1
pragma:no-cache
server:Microsoft-IIS/10.0
status:200
vary:Accept-Encoding
x-aspnet-version:4.0.30319
x-frame-options:SAMEORIGIN
x-powered-by:ASP.NET

I was able to override it in the PreSendRequestHeaders event in global.asax.
protected void Application_PreSendRequestHeaders(Object source, EventArgs e)
{
    //removed some code for brevity
    var headers = Response.Headers;
    headers.Remove("cache-control");
    headers.Remove("pragma");
    headers.Remove("expires");
    headers.Remove("set-cookie");
    headers.Add("cache-control", "public, max-age=" + TimeSpan.FromHours(1).TotalSeconds.ToString());
}
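To confirm that an override like this took effect, it can help to check the response headers from outside the site. The following is a minimal Python sketch, not part of Kentico or any CDN's API: the function name looks_cdn_cacheable is hypothetical, and treating Set-Cookie as a blocker is an assumption (the exact rules differ between CDNs).

```python
def looks_cdn_cacheable(headers):
    """Rough check of whether a shared cache is likely to store a response.

    `headers` is a dict of response header names to values.
    """
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower() for d in h.get("cache-control", "").split(",") if d.strip()]
    names = {d.split("=", 1)[0] for d in directives}

    if names & {"no-cache", "no-store", "private"}:
        return False
    if "no-cache" in h.get("pragma", "").lower():
        return False
    # Assumption: a response that sets a cookie is treated as not cacheable.
    if "set-cookie" in h:
        return False
    return "public" in names


# Headers as first reported in the question.
before = {"cache-control": "no-cache, must-revalidate", "expires": "-1", "pragma": "no-cache"}
# Headers after the PreSendRequestHeaders override.
after = {"cache-control": "public, max-age=3600"}

print(looks_cdn_cacheable(before), looks_cdn_cacheable(after))
```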

Adding in a great article for static sites by one of the MVPs
https://www.kenticotricks.com/blog/static-sites-with-kentico-cloud

Related

WebPageTest shows no static cache but I got 304 response in repeated run

I am testing my webpage with webpagetest.org
On my page, there are a bunch of images. I can see them well cached in the repeated run: (304 response is marked as yellow in WebPageTest waterfall result)
However, the "Cache static content" check does not cover those resources.
I found the difference is that those scripts and styles have cache-control: max-age=2592000, while those media resources have cache-control: max-age=0 in the server response. Does it mean that WebPageTest will neglect these responses with max-age=0 in static cache checking?
Does it mean that WebPageTest will neglect these responses with max-age=0 in static cache checking?
The documentation states that resources which include a specific indication of non-cacheability will not be subject to the 'Cache Static' check:
Applicable Objects
Any non-html object with a mime type of "text/*", "*javascript*" or "image/*" that does not explicitly have an Expires header of 0 or -1, a cache-control header of "private", "no-store" or "no-cache" or a pragma header of "no-cache"
While max-age=0 isn't included in that list, it has much the same effect as no-cache (the response is stale immediately and must be revalidated before reuse), so WebPageTest is likely treating it the same way and excluding those objects from this check.
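The quoted rule can be written out as a small predicate. This is a Python sketch of the documented wording only, not WebPageTest's actual code; the function name is hypothetical, and the max-age=0 handling is the guess made above, so it sits behind a flag.

```python
def subject_to_cache_static_check(mime_type, headers, treat_max_age_zero_as_no_cache=True):
    """Return True if the object would be covered by the 'Cache Static' check."""
    mime = mime_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return False  # the rule applies to non-html objects only
    if not (mime.startswith("text/") or "javascript" in mime or mime.startswith("image/")):
        return False

    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if h.get("expires") in ("0", "-1"):
        return False
    if "no-cache" in h.get("pragma", ""):
        return False

    directives = [d.strip() for d in h.get("cache-control", "").split(",") if d.strip()]
    if {"private", "no-store", "no-cache"} & set(directives):
        return False
    # Assumption: max-age=0 is handled like no-cache (not stated in the documentation).
    if treat_max_age_zero_as_no_cache and "max-age=0" in directives:
        return False
    return True


print(subject_to_cache_static_check("image/png", {"Cache-Control": "max-age=0"}))
print(subject_to_cache_static_check("application/javascript", {"Cache-Control": "max-age=2592000"}))
```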

CloudFront Modify JS / CSS Content

My website's theme is broken when I serve JS and CSS via CloudFront. Further troubleshooting shows that some of the JS and CSS content differs from the origin, and I suspect this is the reason. Is it possible that CF has some kind of optimization feature that modifies our JS/CSS content? If yes, how can we disable it or fix this problem?
I believe it is not a caching problem, because the origin's files have not changed since CF was enabled. Also, I've tried invalidating /wp-content/uploads/sites/2386/bb-plugin/cache/* but I still get the same behavior. As shown in the print screen below, I've also set query string to "Forward all, cache based on all".
Below are the JS and CSS files that are different by comparing the origin and CF, and my CF settings print screen:
JS
(Origin) https://www.seeustosee.com/wp-content/uploads/sites/2386/bb-plugin/cache/2650-layout.js?ver=774d199e19697e00bc26b83ff78afa2c
(CF) https://da4e1j5r7gw87.cloudfront.net/wp-content/uploads/sites/2386/bb-plugin/cache/2650-layout.js?ver=774d199e19697e00bc26b83ff78afa2c
CSS
(Origin) https://www.seeustosee.com/wp-content/uploads/sites/2386/bb-plugin/cache/2650-layout.css?ver=774d199e19697e00bc26b83ff78afa2c
(CF) https://da4e1j5r7gw87.cloudfront.net/wp-content/uploads/sites/2386/bb-plugin/cache/2650-layout.css?ver=774d199e19697e00bc26b83ff78afa2c
CF Behavior Settings
https://imgur.com/XiPDq0X
CloudFront does not modify payload. Even when Compress Objects Automatically is enabled (which it isn't), the compression is transparent gzip that results in a response body identical to the original, after decompression.
But take a look at your response headers, and you'll see the problem. Your origin server is Nginx, but you don't have CloudFront configured to use that server as the origin for these requests. You have CloudFront sending the requests to an Amazon S3 bucket. The JS file there is from August 28, 2019.
Content-Type: application/javascript
Content-Length: 18371
Date: Fri, 31 Jan 2020 02:21:42 GMT
Last-Modified: Wed, 28 Aug 2019 06:53:02 GMT
Server: AmazonS3
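A quick way to spot this kind of mismatch is to compare the two copies and look at which server answered. The Python sketch below assumes the headers and bodies have already been fetched with any HTTP client; the function names and the sample values (including "nginx" as the origin's Server value) are illustrative.

```python
import hashlib


def get_header(headers, name):
    """Case-insensitive header lookup; returns None when the header is absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def compare_copies(origin_headers, origin_body, cdn_headers, cdn_body):
    """Summarise whether the CDN copy matches the origin and who served each."""
    return {
        "same_body": hashlib.sha256(origin_body).digest() == hashlib.sha256(cdn_body).digest(),
        "origin_server": get_header(origin_headers, "Server"),
        "cdn_server": get_header(cdn_headers, "Server"),
        "cdn_last_modified": get_header(cdn_headers, "Last-Modified"),
    }


report = compare_copies(
    {"Server": "nginx"}, b"var layout = 2;",
    {"Server": "AmazonS3", "Last-Modified": "Wed, 28 Aug 2019 06:53:02 GMT"}, b"var layout = 1;",
)
print(report)
```

A differing body together with a different Server value points at the wrong origin being configured rather than at the CDN rewriting content.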

How does a web browser determine what to do with a resource?

In the browser's address bar, I can specify a resource using any extension or none, e.g., http://www.something.com/someResource.someExtension. How does the browser determine what to do with this resource? e.g., should the browser parse it as an HTML document, or treat it as some script? Is there a notion of a resource type? Thank you.
P.S. I could not believe what I was thinking! :( (see my flaw in the comment to Luka's answer). How could the browser look at a resource locally! The browser is a client, and the resource resides on the server side. Duh! (I've found myself on this "mental" drug occasionally)
The HTTP response returned by the server typically contains a "Content-Type: text/html" or similar line (application/octet-stream, etc.).
Here's an example (the easiest way to view similar results is to open firebug's Net tab):
Cache-Control public, max-age=60
Content-Encoding gzip
Content-Length 9334
Content-Type text/html; charset=utf-8 <-- here it is
Date Sat, 05 May 2012 20:34:36 GMT
Expires Sat, 05 May 2012 20:35:36 GMT
Last-Modified Sat, 05 May 2012 20:34:36 GMT
Vary *
It looks at the Mime Type of the document.
HTML pages have the mime type text/html, JPEG images have image/jpeg
More information: http://en.wikipedia.org/wiki/Internet_media_type
It does so using MIME types: http://en.wikipedia.org/wiki/Internet_media_type
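As a rough illustration of the idea in these answers, the Python sketch below extracts the media type from a Content-Type header and maps it to an action. The mapping is deliberately simplified and hypothetical; real browsers apply further rules (for example content sniffing and Content-Disposition).

```python
def media_type(content_type_header):
    """Return the bare media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def describe_handling(content_type_header):
    """Simplified, illustrative mapping from media type to what a browser does."""
    mt = media_type(content_type_header)
    if mt == "text/html":
        return "parse as HTML document"
    if mt.startswith("image/"):
        return "display as image"
    if mt.startswith("text/"):
        return "display as plain text"
    if mt == "application/octet-stream":
        return "offer as download"
    return "depends on the browser"


print(media_type("text/html; charset=utf-8"))
print(describe_handling("image/jpeg"))
```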

Trying to pass PCI compliance but have a cross-site scripting issue

I'm currently trying to pass PCI compliance for one of my client's sites but the testing company are flagging up a vulnerability that I don't understand!
The (site removed) details from the testing company are as follows:
The issue here is a cross-site scripting vulnerability that is commonly associated with e-commerce applications. One of the tests appended a harmless script in a GET request on the end of your site url. It flagged as a cross-site scripting vulnerability because this same script that was entered by the user (our scanner) was returned by the server unsanitized in the header. In this case, the script was returned in the header so our scanner flagged the vulnerability.
Here is the test I ran from my terminal to duplicate this:
GET /?osCsid=%22%3E%3Ciframe%20src=foo%3E%3C/iframe%3E HTTP/1.0
Host:(removed)
HTTP/1.1 302 Found
Connection: close
Date: Tue, 11 Jan 2011 23:33:19 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: http://www.(removed).co.uk/index.aspx?osCsid="><iframe src=foo></iframe>
Set-Cookie: ASP.NET_SessionId=bc3wq445qgovuk45ox5qdh55; path=/; HttpOnly
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 203
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
The solution to this issue is to sanitize user input on these types of requests, making sure characters that could trigger executable scripts are not returned on the header or page.
Firstly, I can't reproduce the tester's result: I only ever get a 200 response, which doesn't include the Location header, and I never get the "Object moved" page. Secondly, I'm not sure how (on IIS 6) to stop it returning a header with the query string in it. Lastly, why does code in the header matter? Surely browsers wouldn't actually execute code from an HTTP header?
Request: GET /?osCsid=%22%3E%3Ciframe%20src=foo%3E%3C/iframe%3E HTTP/1.0 Host:(removed)
The <iframe src=foo></iframe> is the issue here.
Response text:
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
The response link is:
http://www.(removed).co.uk/index.aspx?osCsid="><iframe src=foo></iframe>
Which contains the contents from the request string.
Basically, someone can send someone else a link where your osCsid contains text that allows the page to be rendered in a different way. You need to make sure that the osCsid input is sanitized or filtered against values like this. For example, I could provide a string that loads whatever JavaScript I want, or makes the page render entirely differently.
As a side note, it tries to forward your browser to that non-existent page.
It turned out that I have a Response.Redirect for any pages accessed over https that don't need to be secure, and this was returning the location as part of the redirect. Changing this to:
Response.Status = "301 Moved Permanently";
Response.AddHeader("Location", Request.Url.AbsoluteUri.Replace("https:", "http:"));
Response.End();
fixed the issue.
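The underlying idea is that the redirect target should carry the query string in percent-encoded form rather than as raw markup. Below is a minimal Python sketch of that idea using the standard library; the function name encode_redirect_target and the example.co.uk host are placeholders, and keeping "%" in the safe set (to avoid double-encoding already-escaped input) is an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_redirect_target(url):
    """Percent-encode the path and query so raw markup never reaches the Location header."""
    parts = urlsplit(url)
    # "%" is kept as-is so input that is already escaped is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&/%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


raw = 'http://www.example.co.uk/index.aspx?osCsid="><iframe src=foo></iframe>'
print(encode_redirect_target(raw))
```

Encoding alone does not validate the value; checking that osCsid matches the expected session-id format would still be a sensible extra step.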

How does GMail implement Comet?

With the help of HttpWatch, I tried to figure out how GMail implements Comet.
I logged in to GMail with two accounts, one in IE and the other in Firefox, and chatted in GTalk within GMail using a magic word like "WASSUP". Then I logged off both GMail accounts and filtered out any HTTP content without the "WASSUP" string. The result shows which HTTP request is the streaming channel. (Note: I have to log off. Otherwise, the never-ending HTTP request would not show its content in HttpWatch.)
The result is interesting. The URL for the stream channel looks like:
https://mail/channel/bind?VER=8&at=xn3j33vcvk39lkfq.....
It is no surprise that GMail does Comet in IE with an IFRAME. The HTTP content starts with "<html><body>".
Originally, I guessed that GMail does Comet in Firefox with multipart XmlHttpRequest. To my surprise, the response doesn't have a "multipart/x-mixed-replace" content type. The response headers are as below:
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Date: Sat, 20 Mar 2010 01:52:39 GMT
X-Frame-Options: ALLOWALL
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Server: GSE
X-XSS-Protection: 0
Unfortunately, HttpWatch doesn't tell whether an HTTP request is from XmlHttpRequest or not. The content is not HTML but JSON. It looks like a response for XHR, but that would not work for Comet without multipart/x-mixed-replace, right?
Is there any other way to figure out how GMail implements Comet?
Update:
After further investigation, I believe GMail implements Comet this way:
1) in IE, it uses a forever-hidden-iframe;
2) in Firefox, it uses forever-XHR without the multipart/x-mixed-replace header. The client responds when (readyState == 3) OR (readyState == 4), that is, in both the interactive state and the complete state.
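The forever-XHR approach depends on the client seeing the same, ever-growing response text on each progress event and processing only the part it has not seen yet. The Python sketch below models that bookkeeping in memory; the class name is hypothetical, and newline-delimited messages are an assumption, since GMail's actual framing is not described here.

```python
class StreamConsumer:
    """Tracks how much of a growing response has already been processed."""

    def __init__(self):
        self._offset = 0  # characters of the response already consumed

    def on_progress(self, response_text):
        """Call with the full text received so far, as on readyState 3 or 4.

        Returns the newly completed messages; a trailing partial message is
        left for the next call.
        """
        unseen = response_text[self._offset:]
        last_newline = unseen.rfind("\n")
        if last_newline == -1:
            return []
        complete = unseen[:last_newline]
        self._offset += last_newline + 1
        return [line for line in complete.split("\n") if line]


consumer = StreamConsumer()
print(consumer.on_progress('{"a":1}\n{"b"'))
print(consumer.on_progress('{"a":1}\n{"b":2}\n'))
```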
Per this article,
So what is the solution used by Google Gmail? The solution is really simple, straight forward and very portable! What Gmail did is requesting an endless html page that contains streams of Javascript portions. Give it a try, It’s very powerful. So, we will have on the client side a js file that processes the responses, and another endless html that contains the Javascript Streams.
The rest of the article goes into much more detail, including an exploration of alternatives as well as the specific one picked by GMail.

Resources